Category Archives: Data Warehousing

Secure Kafka Cluster

A while ago I wrote about Oracle best practices for building a secure Hadoop cluster; you can find the details here. In that blog I intentionally didn't mention Kafka security, because the topic deserves a dedicated article. Now it's time to do this, and this blog will be devoted to Kafka security only.

Kafka Security challenges

1) Encryption in motion. By default you communicate with a Kafka cluster over an unsecured network, and anyone who can listen to the network between your client and the Kafka cluster can read the message content.

The way to avoid this is to use an on-wire encryption technology – SSL/TLS. With SSL/TLS you encrypt data on the wire between your client and the Kafka cluster.

Communication without SSL/TLS:

SSL/TLS communication:

After you enable SSL/TLS communication, you will have the following sequence of steps for writing/reading a message to/from the Kafka cluster:

2) Authentication. Now we encrypt the traffic between client and server, but here is another challenge – the server doesn't know whom it is communicating with. In other words, you have to enable a mechanism that does not allow UNKNOWN users to work with the cluster. The default authentication mechanism in the Hadoop world is the Kerberos protocol. Here is the workflow, which shows the sequence of steps to enable secure communication with Kafka:

Kerberos is the trusted way to authenticate a user on the cluster and make sure that only known users can access it.

3) Authorization. Once you have authenticated a user on your cluster (and you know that you are working as Bob or Alice), you may want to apply some authorization rules, like setting up permissions for certain users or groups – in other words, defining what a user can and can't do. Sentry can help you with this. Sentry's philosophy is that users belong to groups, groups have roles, and roles have permissions.

4) Encryption at rest. Another security aspect is encryption at rest, which is when you want to protect data stored on disk. Kafka is not intended for long-term data storage, but it can hold data for days or even weeks. We have to make sure that data stored on the disks can't be stolen and then read without the encryption key.

Security implementation. Step 1 – SSL/TLS

There is no strict sequence of steps for a security implementation, but as a first step I recommend doing the SSL/TLS configuration. As a baseline I took Cloudera's documentation. To keep your security setup organized, create a directory on your Linux machine where you will put all the files (start with one machine; later you will need to do the same on the other Kafka servers):

$ sudo mkdir -p /opt/kafka/security

$ sudo chown -R kafka:kafka /opt/kafka/security

A Java KeyStore (JKS) is a repository of security certificates – either authorization certificates or public key certificates – plus corresponding private keys, used for instance in SSL encryption. We will need to generate a key pair (a public key and associated private key), wrap the public key into an X.509 self-signed certificate stored as a single-element certificate chain, and store this certificate chain and the private key in a new keystore entry identified by the alias selfsigned.

# keytool -genkeypair -keystore keystore.jks -keyalg RSA -alias selfsigned -dname "CN=localhost" -storepass 'welcome2' -keypass 'welcome3'

If you want to check the content of the keystore, you may run the following command:

# keytool -list -v -keystore keystore.jks

Alias name: selfsigned

Creation date: May 30, 2018

Entry type: PrivateKeyEntry

Certificate chain length: 1

Certificate[1]:

Owner: CN=localhost

Issuer: CN=localhost

Serial number: 2065847b

Valid from: Wed May 30 12:59:54 UTC 2018 until: Tue Aug 28 12:59:54 UTC 2018

As the next step we will need to extract a copy of the cert from the java keystore that was just created:

# keytool -export -alias selfsigned -keystore keystore.jks -rfc -file server.cer

Enter keystore password: welcome2

Then create a trust store by making a copy of the default Java trust store. The main difference between a trustStore and a keyStore is that a trustStore (as the name suggests) stores certificates from trusted Certificate Authorities (CAs), which are used to verify the certificate presented by the server in an SSL connection, while a keyStore stores the private key and its own identity certificate, which the program presents to the other party (server or client) to verify its identity. You can find more details here. In my case, on Big Data Cloud Service, I performed the following command:

# cp /usr/java/latest/jre/lib/security/cacerts /opt/kafka/security/truststore.jks

Check the files we have so far:

# ls -lrt

-rw-r--r-- 1 root root 113367 May 30 12:46 truststore.jks

-rw-r--r-- 1 root root   2070 May 30 12:59 keystore.jks

-rw-r--r-- 1 root root   1039 May 30 13:01 server.cer

Put the certificate that was just extracted from the keystore into the trust store (note: "changeit" is the standard password of the default Java trust store):

# keytool -import -alias selfsigned -file server.cer -keystore truststore.jks -storepass changeit

Check the file size afterwards (it's bigger because it includes the new certificate):

# ls -lrt

-rw-r--r-- 1 root root   2070 May 30 12:59 keystore.jks

-rw-r--r-- 1 root root   1039 May 30 13:01 server.cer

-rw-r--r-- 1 root root 114117 May 30 13:06 truststore.jks
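Besides comparing file sizes, you can verify the import directly by listing the alias that was just added to the trust store:

# keytool -list -keystore truststore.jks -storepass changeit -alias selfsigned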

It may seem too complicated, so I decided to depict all those steps in one diagram:

So far, all those steps have been performed on a single machine (some random broker). But you will need the keystore and truststore files on each Kafka broker, so let's copy them (note: this syntax works on Big Data Appliance, Big Data Cloud Service, or Big Data Cloud at Customer):

# dcli -C "mkdir -p /opt/kafka/security"

# dcli -C "chown kafka:kafka /opt/kafka/security"

# dcli -C -f /opt/kafka/security/keystore.jks -d /opt/kafka/security/keystore.jks

# dcli -C -f /opt/kafka/security/truststore.jks -d /opt/kafka/security/truststore.jks

After doing all these steps, you need to make some configuration changes in Cloudera Manager for each node (go to Cloudera Manager -> Kafka -> Configuration). In addition to this, on each node you have to change the listeners in the "Kafka Broker Advanced Configuration Snippet (Safety Valve) for kafka.properties".
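The Cloudera Manager screenshots are not reproduced here, but roughly speaking each broker needs to know its SSL listener endpoint and where its keystore and truststore live. A minimal sketch of the relevant properties, using the files and passwords created above and with the host name adjusted per broker (in your environment the keystore/truststore settings may be configured through Cloudera Manager's dedicated TLS/SSL fields rather than the safety valve):

listeners=SSL://kafka1.us2.oraclecloud.com:9093
ssl.keystore.location=/opt/kafka/security/keystore.jks
ssl.keystore.password=welcome2
ssl.key.password=welcome3
ssl.truststore.location=/opt/kafka/security/truststore.jks
ssl.truststore.password=changeit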

Also, make sure that in Cloudera Manager you have security.inter.broker.protocol set to SSL. After the node restart, when all brokers are up and running, let's test it:

# openssl s_client -debug -connect kafka1.us2.oraclecloud.com:9093 -tls1_2

Certificate chain

0 s:/CN=localhost

   i:/CN=localhost

Server certificate

-----BEGIN CERTIFICATE-----

MIICxzCCAa+gAwIBAgIEIGWEezANBgkqhkiG9w0BAQsFADAUMRIwEAYDVQQDEwls

b2NhbGhvc3QwHhcNMTgwNTMwMTI1OTU0WhcNMTgwODI4MTI1OTU0WjAUMRIwEAYD

VQQDEwlsb2NhbGhvc3QwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQCI

53T82eoDR2e9IId40UPTj3xg3khl1jdjNvMiuB/vcI7koK0XrZqFzMVo6zBzRHnf

zaFBKPAQisuXpQITURh6jrVgAs1V4hswRPrJRjM/jCIx7S5+1INBGoEXk8OG+OEf

m1uYXfULz0bX9fhfl+IdKzWZ7jiX8FY5dC60Rx2RTpATWThsD4mz3bfNd3DlADw2

LH5B5GAGhLqJjr23HFjiTuoQWQyMV5Esn6WhOTPCy1pAkOYqX86ad9qP500zK9lA

hynyEwNHWt6GoHuJ6Q8A9b6JDyNdkjUIjbH+d0LkzpDPg6R8Vp14igxqxXy0N1Sd

DKhsV90F1T0whlxGDTZTAgMBAAGjITAfMB0GA1UdDgQWBBR1Gl9a0KZAMnJEvxaD

oY0YagPKRTANBgkqhkiG9w0BAQsFAAOCAQEAaiNdHY+QVdvLSILdOlWWv653CrG1

2WY3cnK5Hpymrg0P7E3ea0h3vkGRaVqCRaM4J0MNdGEgu+xcKXb9s7VrwhecRY6E

qN0KibRZPb789zQVOS38Y6icJazTv/lSxCRjqHjNkXhhzsD3tjAgiYnicFd6K4XZ

rQ1WiwYq1254e8MsKCVENthQljnHD38ZDhXleNeHxxWtFIA2FXOc7U6iZEXnnaOM

Cl9sHx7EaGRc2adIoE2GXFNK7BY89Ip61a+WUAOn3asPebrU06OAjGGYGQnYbn6k

4VLvneMOjksuLdlrSyc5MToBGptk8eqJQ5tyWV6+AcuwHkTAnrztgozatg==

-----END CERTIFICATE-----

subject=/CN=localhost

issuer=/CN=localhost

No client certificate CA names sent

Server Temp Key: ECDH, secp521r1, 521 bits

SSL handshake has read 1267 bytes and written 441 bytes

New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-GCM-SHA384

Server public key is 2048 bit

Secure Renegotiation IS supported

Compression: NONE

Expansion: NONE

SSL-Session:

    Protocol  : TLSv1.2

    Cipher    : ECDHE-RSA-AES256-GCM-SHA384

    Session-ID: 5B0EAC6CA8FB4B6EA3D0B4A494A4660351A4BD5824A059802E399308C0B472A4

    Session-ID-ctx:

    Master-Key: 60AE24480E2923023012A464D16B13F954A390094167F54CECA1BDCC8485F1E776D01806A17FB332C51FD310730191FE

    Key-Arg   : None

    Krb5 Principal: None

    PSK identity: None

    PSK identity hint: None

    Start Time: 1527688300

    Timeout   : 7200 (sec)

    Verify return code: 18 (self signed certificate)

Well, it seems our SSL connection is up and running. Time to try to put some messages into the topic:

# kafka-console-producer --broker-list kafka1.us2.oraclecloud.com:9093 --topic foobar

18/05/30 13:56:28 WARN clients.NetworkClient: Connection to node -1 could not be established. Broker may not be available.

18/05/30 13:56:28 WARN clients.NetworkClient: Connection to node -1 could not be established. Broker may not be available.

The reason for this error is that we don't have properly configured clients. We will need to create and use client.properties and jaas.conf files.

# cat /opt/kafka/security/client.properties

security.protocol=SSL

ssl.truststore.location=/opt/kafka/security/truststore.jks

ssl.truststore.password=changeit

-bash-4.1# cat jaas.conf

KafkaClient {

      com.sun.security.auth.module.Krb5LoginModule required

      useTicketCache=true;

    };

# export KAFKA_OPTS="-Djava.security.auth.login.config=/opt/kafka/security/jaas.conf"

Now you can try again to produce messages:

# kafka-console-producer --broker-list kafka1.us2.oraclecloud.com:9093 --topic foobar --producer.config client.properties

Hello SSL world

No errors – already good! Let's try to consume the message:

# kafka-console-consumer --bootstrap-server kafka1.us2.oraclecloud.com:9093 --topic foobar --from-beginning --consumer.config /opt/kafka/security/client.properties

Hello SSL world

Bingo! So, we created secure communication between the Kafka cluster and the Kafka client and wrote a message there.

Security implementation. Step 2 – Kerberos

So, we have Kafka up and running on a Kerberized cluster, yet we can write and read data from the cluster without a Kerberos ticket.

$ klist

klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc_1001)

This is not how it's supposed to work. We assume that if we protect the cluster with Kerberos, it should be impossible to do anything without a ticket. Fortunately, it's relatively easy to configure communication with a Kerberized Kafka cluster.

First, make sure that you have enabled Kerberos authentication in Cloudera Manager (Cloudera Manager -> Kafka -> Configuration):

Second, go to Cloudera Manager again and change the value of "security.inter.broker.protocol" to SASL_SSL. Note: Simple Authentication and Security Layer (SASL) is a framework for authentication and data security in Internet protocols. It decouples authentication mechanisms from application protocols, in theory allowing any authentication mechanism supported by SASL to be used in any application protocol that uses SASL. Very roughly, in this blog post you may think of SASL as equal to Kerberos. After this change, you will need to modify the listeners protocol on each broker (to SASL_SSL) in the "Kafka Broker Advanced Configuration Snippet (Safety Valve) for kafka.properties" setting, as sketched below, and also modify the Kafka client credentials; then you are ready to restart the Kafka cluster and write/read data to/from it.
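A minimal sketch of the listener line in the safety valve, assuming the same host and port used earlier (only the protocol prefix changes compared to the SSL-only setup):

listeners=SASL_SSL://kafka1.us2.oraclecloud.com:9093

The updated client configuration then looks like this: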

$ cat /opt/kafka/security/client.properties

security.protocol=SASL_SSL

sasl.kerberos.service.name=kafka

ssl.truststore.location=/opt/kafka/security/truststore.jks

ssl.truststore.password=changeit

After this you may try to read data from the Kafka cluster:

$ kafka-console-consumer --bootstrap-server kafka1.us2.oraclecloud.com:9093 --topic foobar --from-beginning --consumer.config /opt/kafka/security/client.properties

Caused by: org.apache.kafka.common.KafkaException: javax.security.auth.login.LoginException: Could not login: the client is being asked for a password, but the Kafka client code does not currently support obtaining a password from the user. not available to garner  authentication information from the user

The error may mislead you, but the real reason is the absence of a Kerberos ticket:

$ klist

klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc_1001)

$ kinit oracle

Password for oracle@BDACLOUDSERVICE.ORACLE.COM:

$ kafka-console-consumer --bootstrap-server kafka1.us2.oraclecloud.com:9093 --topic foobar --from-beginning --consumer.config /opt/kafka/security/client.properties

Hello SSL world

Great, it works! But now we have to run kinit every time before reading/writing data from the Kafka cluster. Instead, for convenience, we may use a keytab. To do this you will need to go to the KDC server and generate a keytab file there:

# kadmin.local

Authenticating as principal hdfs/admin@BDACLOUDSERVICE.ORACLE.COM with password.

kadmin.local: xst -norandkey -k testuser.keytab testuser

Entry for principal testuser with kvno 2, encryption type aes256-cts-hmac-sha1-96 added to keytab WRFILE:testuser.keytab.

Entry for principal testuser with kvno 2, encryption type aes128-cts-hmac-sha1-96 added to keytab WRFILE:testuser.keytab.

Entry for principal testuser with kvno 2, encryption type des3-cbc-sha1 added to keytab WRFILE:testuser.keytab.

Entry for principal testuser with kvno 2, encryption type arcfour-hmac added to keytab WRFILE:testuser.keytab.

Entry for principal testuser with kvno 2, encryption type des-hmac-sha1 added to keytab WRFILE:testuser.keytab.

Entry for principal testuser with kvno 2, encryption type des-cbc-md5 added to keytab WRFILE:testuser.keytab.

kadmin.local:  quit

# ls -l

-rw——-  1 root root    436 May 31 14:06 testuser.keytab

Now that we have the keytab file, we can copy it to the client machine and use it for Kerberos authentication. Don't forget to change the owner of the keytab file to the user who will run the script:

$ chown opc:opc /opt/kafka/security/testuser.keytab

Also, we will need to modify jaas.conf file:

$ cat /opt/kafka/security/jaas.conf

KafkaClient {

      com.sun.security.auth.module.Krb5LoginModule required

      useKeyTab=true

      keyTab="/opt/kafka/security/testuser.keytab"

      principal="testuser@BDACLOUDSERVICE.ORACLE.COM";    

};
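Before using the keytab, it doesn't hurt to verify which principals it actually contains; klist can read a keytab directly:

$ klist -kt /opt/kafka/security/testuser.keytab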

It seems we are fully ready to consume messages from the topic. Even though we have oracle as the Kerberos principal in the OS ticket cache, we connect to the cluster as testuser (according to jaas.conf):

$ kafka-console-consumer --bootstrap-server kafka1.us2.oraclecloud.com:9093 --topic foobar --from-beginning --consumer.config /opt/kafka/security/client.properties

18/05/31 15:04:45 INFO authenticator.AbstractLogin: Successfully logged in.

18/05/31 15:04:45 INFO kerberos.KerberosLogin: [Principal=testuser@BDACLOUDSERVICE.ORACLE.COM]: TGT refresh thread started.

Hello SSL world

Security Implementation Step 3 – Sentry

In the previous step we configured authentication, which answers the question "who am I?". Now it's time to set up an authorization mechanism, which answers the question "what am I allowed to do?". Sentry has become a very popular engine in the Hadoop world, and we will use it for Kafka authorization. As I noted earlier, Sentry's philosophy is that users belong to groups, groups have roles, and roles have permissions:

And we will need to follow this with Kafka as well. But we will start with some service configurations first (Cloudera Manager -> Kafka -> Configuration):

Also, it's very important to add the kafka user to "sentry.service.admin.group" in the Sentry config (Cloudera Manager -> Sentry -> Configuration):

Well, once we know who connects to the cluster, we may restrict them from reading particular topics (in other words, perform authorization).

Note: to perform administrative operations with Sentry, you have to work as the kafka user.

$ id

uid=1001(opc) gid=1005(opc) groups=1005(opc)

$ sudo find /var -name kafka*keytab -printf "%T+\t%p\n" | sort|tail -1|cut -f 2

/var/run/cloudera-scm-agent/process/1171-kafka-KAFKA_BROKER/kafka.keytab

$ sudo cp /var/run/cloudera-scm-agent/process/1171-kafka-KAFKA_BROKER/kafka.keytab /opt/kafka/security/kafka.keytab

$ sudo chown opc:opc /opt/kafka/security/kafka.keytab

obtain Kafka ticket:

$ kinit -kt /opt/kafka/security/kafka.keytab kafka/`hostname`

$ klist

Ticket cache: FILE:/tmp/krb5cc_1001

Default principal: kafka/kafka1.us2.oraclecloud.com@BDACLOUDSERVICE.ORACLE.COM

 

Valid starting     Expires            Service principal

05/31/18 15:52:28  06/01/18 15:52:28  krbtgt/BDACLOUDSERVICE.ORACLE.COM@BDACLOUDSERVICE.ORACLE.COM

    renew until 06/05/18 15:52:28

Before configuring and testing Sentry with Kafka, we will need to create an unprivileged user to whom we will give grants (the kafka user is privileged and bypasses Sentry). There are a few simple steps. First, create a test (unprivileged) user on each Hadoop node (this syntax works on Big Data Appliance, Big Data Cloud Service, and Big Data Cloud at Customer):

# dcli -C "useradd testsentry -u 1011"

We should remember that Sentry relies heavily on groups, so we have to create a group and put the "testsentry" user there:

# dcli -C "groupadd testsentry_grp -g 1017"

After the group has been created, we should put the user in it:

# dcli -C "usermod -g testsentry_grp testsentry"

Check that everything is as we expect:

# dcli -C "id testsentry"

10.196.64.44: uid=1011(testsentry) gid=1017(testsentry_grp) groups=1017(testsentry_grp)

10.196.64.60: uid=1011(testsentry) gid=1017(testsentry_grp) groups=1017(testsentry_grp)

10.196.64.64: uid=1011(testsentry) gid=1017(testsentry_grp) groups=1017(testsentry_grp)

10.196.64.65: uid=1011(testsentry) gid=1017(testsentry_grp) groups=1017(testsentry_grp)

10.196.64.61: uid=1011(testsentry) gid=1017(testsentry_grp) groups=1017(testsentry_grp)

Note: you have to have the same user ID and group ID on each machine. Now verify that Hadoop can look up the group:

# hdfs groups testsentry

testsentry : testsentry_grp

All these steps have to be performed as root. Next, you should create a testsentry principal in the KDC (it's not mandatory, but it keeps things organized and easy to understand). Go to the KDC host and run the following commands:

# kadmin.local 

Authenticating as principal root/admin@BDACLOUDSERVICE.ORACLE.COM with password. 

kadmin.local:  addprinc testsentry

WARNING: no policy specified for testsentry@BDACLOUDSERVICE.ORACLE.COM; defaulting to no policy

Enter password for principal "testsentry@BDACLOUDSERVICE.ORACLE.COM": 

Re-enter password for principal "testsentry@BDACLOUDSERVICE.ORACLE.COM": 

Principal "testsentry@BDACLOUDSERVICE.ORACLE.COM" created.

kadmin.local:  xst -norandkey -k testsentry.keytab testsentry

Entry for principal testsentry with kvno 1, encryption type aes256-cts-hmac-sha1-96 added to keytab WRFILE:testsentry.keytab.

Entry for principal testsentry with kvno 1, encryption type aes128-cts-hmac-sha1-96 added to keytab WRFILE:testsentry.keytab.

Entry for principal testsentry with kvno 1, encryption type des3-cbc-sha1 added to keytab WRFILE:testsentry.keytab.

Entry for principal testsentry with kvno 1, encryption type arcfour-hmac added to keytab WRFILE:testsentry.keytab.

Entry for principal testsentry with kvno 1, encryption type des-hmac-sha1 added to keytab WRFILE:testsentry.keytab.

Entry for principal testsentry with kvno 1, encryption type des-cbc-md5 added to keytab WRFILE:testsentry.keytab.

Now we have everything set up for the unprivileged user. Time to start configuring Sentry policies. Since kafka is a superuser, we may run admin commands as the kafka user; for managing Sentry settings we will need to use it. To obtain Kafka credentials we need to run:

$ kinit -kt /opt/kafka/security/kafka.keytab kafka/`hostname`

$ klist 

Ticket cache: FILE:/tmp/krb5cc_1001

Default principal: kafka/kafka1.us2.oraclecloud.com@BDACLOUDSERVICE.ORACLE.COM

Valid starting     Expires            Service principal

06/15/18 01:37:53  06/16/18 01:37:53  krbtgt/BDACLOUDSERVICE.ORACLE.COM@BDACLOUDSERVICE.ORACLE.COM

    renew until 06/20/18 01:37:53

First we need to create a role. Let's call it testsentry_role:

$ kafka-sentry -cr -r testsentry_role

Let's check that the role has been created:

$ kafka-sentry -lr

admin_role

testsentry_role


Once the role is created, we will need to grant this role some permissions on a certain topic:

$ kafka-sentry -gpr -r testsentry_role -p "Host=*->Topic=testTopic->action=write"

and also describe:

$  kafka-sentry -gpr -r testsentry_role -p "Host=*->Topic=testTopic->action=describe"

As the next step, we have to grant read and describe on the consumer group that will be used to read from this topic:

$ kafka-sentry -gpr -r testsentry_role -p "Host=*->Consumergroup=testconsumergroup->action=read"

$ kafka-sentry -gpr -r testsentry_role -p "Host=*->Consumergroup=testconsumergroup->action=describe"
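Note that the permission listing shown a bit further below also includes read on the topic itself, which a consumer needs as well; presumably a grant like the following (written by analogy with the commands above) was issued too:

$ kafka-sentry -gpr -r testsentry_role -p "Host=*->Topic=testTopic->action=read"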

The next step is linking the role and the group: we will need to assign testsentry_role to testsentry_grp (the group automatically inherits all of the role's permissions):

$ kafka-sentry -arg -r testsentry_role -g testsentry_grp

After this, let's check that our mapping works fine:

$ kafka-sentry -lr -g testsentry_grp

testsentry_role

Now let's review the list of permissions that our role has:

$ kafka-sentry -r testsentry_role -lp

HOST=*->CONSUMERGROUP=testconsumergroup->action=read

HOST=*->TOPIC=testTopic->action=write

HOST=*->TOPIC=testTopic->action=describe

HOST=*->TOPIC=testTopic->action=read

It's also very important to have the consumer group in the client properties file:

$ cat /opt/kafka/security/client.properties

security.protocol=SASL_SSL

sasl.kerberos.service.name=kafka

ssl.truststore.location=/opt/kafka/security/truststore.jks

ssl.truststore.password=changeit

group.id=testconsumergroup

After everything is set, we will need to switch to the testsentry user for testing:

$ kinit -kt /opt/kafka/security/testsentry.keytab testsentry

$ klist 

Ticket cache: FILE:/tmp/krb5cc_1001

Default principal: testsentry@BDACLOUDSERVICE.ORACLE.COM

Valid starting     Expires            Service principal

06/15/18 01:38:49  06/16/18 01:38:49  krbtgt/BDACLOUDSERVICE.ORACLE.COM@BDACLOUDSERVICE.ORACLE.COM

    renew until 06/22/18 01:38:49

test writes:

$ kafka-console-producer --broker-list kafka1.us2.oraclecloud.com:9093 --topic testTopic --producer.config /opt/kafka/security/client.properties

> testmessage1

> testmessage2

>

Seems everything is OK; now let's test a read:

$ kafka-console-consumer --bootstrap-server kafka1.us2.oraclecloud.com:9093 --topic testTopic --from-beginning --consumer.config /opt/kafka/security/client.properties

testmessage1

testmessage2

Now, to show Sentry in action, I'll try to read messages from another topic, which is outside the allowed topics for our test group.

$ kafka-console-consumer --from-beginning --bootstrap-server kafka1.us2.oraclecloud.com:9093 --topic foobar --consumer.config /opt/kafka/security/client.properties

18/06/15 02:54:54 INFO internals.AbstractCoordinator: (Re-)joining group testconsumergroup

18/06/15 02:54:54 WARN clients.NetworkClient: Error while fetching metadata with correlation id 13 : {foobar=UNKNOWN_TOPIC_OR_PARTITION}

18/06/15 02:54:54 WARN clients.NetworkClient: Error while fetching metadata with correlation id 15 : {foobar=UNKNOWN_TOPIC_OR_PARTITION}

18/06/15 02:54:54 WARN clients.NetworkClient: Error while fetching metadata with correlation id 16 : {foobar=UNKNOWN_TOPIC_OR_PARTITION}

18/06/15 02:54:54 WARN clients.NetworkClient: Error while fetching metadata with correlation id 17 : {foobar=UNKNOWN_TOPIC_OR_PARTITION}

So, as we can see, we could not read from a topic that we are not authorized to read.
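The same applies to writes: still working as testsentry, an attempt to produce to a topic the role has no write privilege on (foobar, for example) should be rejected with an authorization error. The command to try would simply be:

$ kafka-console-producer --broker-list kafka1.us2.oraclecloud.com:9093 --topic foobar --producer.config /opt/kafka/security/client.properties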

Systematizing all this, I'd like to put the user-group-role-privileges flow in one picture:

I'd also like to summarize the steps required for getting the list of privileges for a certain user (testsentry in my example):

// Run as superuser – Kafka

$ kinit -kt /opt/kafka/security/kafka.keytab kafka/`hostname`

$ klist 

Ticket cache: FILE:/tmp/krb5cc_1001

Default principal: kafka/cfclbv3872.us2.oraclecloud.com@BDACLOUDSERVICE.ORACLE.COM

Valid starting     Expires            Service principal

06/19/18 02:38:26  06/20/18 02:38:26  krbtgt/BDACLOUDSERVICE.ORACLE.COM@BDACLOUDSERVICE.ORACLE.COM

    renew until 06/24/18 02:38:26

// Get list of the groups to which a certain user belongs

$ hdfs groups testsentry

testsentry : testsentry_grp

// Get list of the roles for a certain group

$ kafka-sentry -lr -g testsentry_grp

testsentry_role

// Get list of permissions for a certain role

$ kafka-sentry -r testsentry_role -lp

HOST=*->CONSUMERGROUP=testconsumergroup->action=read

HOST=*->TOPIC=testTopic->action=describe

HOST=*->TOPIC=testTopic->action=write

HOST=*->TOPIC=testTopic->action=read

HOST=*->CONSUMERGROUP=testconsumergroup->action=describe

Based on what we saw above, our user testsentry can read and write to the topic testTopic. To read data, he has to belong to the consumer group "testconsumergroup".

Security Implementation Step 4 – Encryption At Rest

The last part of the security journey is encryption of the data that you store on the disks. There are multiple ways to do this; one of the most common is Navigator Encrypt.


Big Data SQL 3.2.1 is Now Available

Just wanted to give a quick update.  I am pleased to announce that Oracle Big Data SQL 3.2.1 is now available.   This release provides support for Oracle Database 12.2.0.1.  Here are some key details:

  • Existing customers using Big Data SQL 3.2 do not need to take this update; Oracle Database 12.2.0.1 support is the reason for the update.
  • Big Data SQL 3.2.1 can be used for both Oracle Database 12.1.0.2 and 12.2.0.1 deployments
  • For Oracle Database 12.2.0.1, Big Data SQL 3.2.1 requires the April Release Update plus the Big Data SQL 3.2.1 one-off patch
  • The software is available on ARU.  The Big Data SQL 3.2.1 installer will be available on edelivery soon 
    • Big Data SQL 3.2.1 Installer ( Patch 28071671).  Note, this is the complete installer; it is not a patch.
    • Oracle Database 12.2.0.1 April Release Update (Patch 27674384). 
    • Big Data SQL 3.2.1 one-off on top of April RU (Patch 26170659).  Ensure you pick the appropriate release in the download page.  This patch must be applied to each database server and Grid Infrastructure.

Also, check out this new Big Data SQL Tutorial series on Oracle Learning Library.  The series includes numerous videos that help you understand Big Data SQL capabilities.  It includes:

  • Introducing the Oracle Big Data Lite Virtual Machine and Hadoop
  • Introduction to Oracle Big Data SQL
  • Hadoop and Big Data SQL Architectures
  • Oracle Big Data SQL Performance Features
  • Information Lifecycle Management 


When Screen Scraping became API calling – Gathering Oracle OpenWorld Session Catalog with …



A dataset with all sessions of the upcoming Oracle OpenWorld 2017 conference is nice to have – for experiments and demonstrations with many technologies. The session catalog is exposed at a website here.

With searching, filtering and scrolling, all available sessions can be inspected. If data is available in a browser, it can be retrieved programmatically and persisted locally in for example a JSON document. A typical approach for this is web scraping: having a server side program act like a browser, retrieve the HTML from the web site and query the data from the response. This process is described for example in this article – https://codeburst.io/an-introduction-to-web-scraping-with-node-js-1045b55c63f7 – for Node and the Cheerio library.

However, server side screen scraping of HTML will only be successful when the HTML is static. Dynamic HTML is constructed in the browser by executing JavaScript code that manipulates the browser DOM. If that is the mechanism behind a web site, server side scraping is at the very least considerably more complex (as it requires the server to emulate a modern web browser to a large degree). Selenium has been used in such cases – to provide a server side, programmatically accessible browser engine. Alternatively, screen scraping can also be performed inside the browser itself – as is supported for example by the Getsy library.

As you will find in this article – when server side scraping fails, client side scraping may be a much too complex solution. It is very well possible that the rich client web application is using a REST API that provides the data as a JSON document – an API that our server side program can also easily leverage. That turned out to be the case for the OOW 2017 website – so instead of complex HTML parsing and server side or even client side scraping, the challenge at hand resolves to nothing more than a little bit of REST calling. Read the complete article here.

PaaS Partner Community

For regular information on business process management and integration, become a member of the SOA & BPM Partner Community. For registration, please visit www.oracle.com/goto/emea/soa (OPN account required). If you need support with your account, please contact the Oracle Partner Business Center.


Technorati Tags: SOA Community,Oracle SOA,Oracle BPM,OPN,Jürgen Kress


Loading Data to the Object Store for Autonomous Data Warehouse Cloud

So you got your first service instance of your autonomous data warehouse set up, you experienced the performance of the environment using the sample data, went through all tutorials and videos, and are getting ready to rock-n-roll. But the one thing you're not sure about is this Object Store. Yes, you used it successfully as described in the tutorial, but what's next? And what else is there to know about the Object Store?

First and foremost, if you are interested in understanding a bit more about what this Object Store is, you should read the following blog post from Rachna, the Product Manager for the Object Store among other things. It introduces the Object Store, how to set it up and manage files with the UI, plus a couple of simple command line examples (don’t get confused by the term ‘BMC’, that’s the old name of Oracle’s Cloud Infrastructure; that’s true for the command line utility as well, which is now called oci). You should read that blog post to get familiar with the basic concepts of the Object Store and a cloud account (tenant).

The documentation and blog posts are great, but now you actually want to use it to load data into ADWC.  This means loading more (and larger) files, more need for automation, and more flexibility.  This post will focus on exactly that: becoming productive with command line utilities without being a developer, and leveraging the power of the Oracle Object Store to upload many files in one go and even upload larger files in parallel without any major effort.

The blog post will cover both:

  • The Oracle oci command line interface for managing files
  • The Swift REST interface for managing files

 

Using the oci command line interface

The Oracle oci command line interface (CLI) is a tool that enables you to work with Oracle Cloud Infrastructure objects and services. It's a thin layer on top of the oci APIs (typically REST) and one of Oracle's open source projects (the source code is on GitHub).

Let’s quickly step through what you have to do for using this CLI. If you do not want to install anything, that is fine, too. In that case feel free to jump to the REST section in this post right away, but you’re going to miss out on some cool stuff that the CLI provides you out of the box.

To get going with the utility is really simple, as simple as one-two-three

  1. Install oci cli following the installation instructions on GitHub (a rough sketch of the install and configuration commands follows this list).
    I just did this on an Oracle Linux 7.4 VM instance that I created in the Oracle Cloud and had the utility up and running in no time.
     
  2. Configure your oci cli installation.
    You need a user created in the Oracle Cloud account that you want to use, and that user must have the appropriate privileges to work with the object store. A keypair is used for signing API requests, with the public key uploaded to Oracle. Only the user calling the API should possess the private key. All this is described in the configuration section of the CLI. 

    That is probably the part that takes you the most time of the setup. You have to ensure you have UI console access when doing this, since you have to upload the public key for your user.
     

  3. Use oci cli.
    After successful setup you can use the command line interface to manage your buckets for storing all your files in the Cloud, among other things.
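For reference, here is a rough sketch of steps 1 and 2 on a Linux box; the installer script URL and the interactive "oci setup config" walkthrough are the ones documented in the OCI CLI GitHub repository at the time of writing, so check the installation instructions for the current procedure:

$ bash -c "$(curl -L https://raw.githubusercontent.com/oracle/oci-cli/master/scripts/install/install.sh)"

$ oci setup config

The setup dialog asks for your user OCID, tenancy OCID, and region, and can generate the API signing key pair whose public key you then upload for your user in the console.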

 

First steps with oci cli

The focus of the command line interface is on ease of use and making its usage as self-explanatory as possible, with a comprehensive built-in help system in the utility. Whenever you want to know something without looking around, use the --help, -h, or -? syntax for a command, irrespective of how many parameters you have already entered. So you can start with oci -h and let the utility guide you.

For the purpose of file management the important category is the object store category, with the main tasks of:

  • Creating, managing, and deleting buckets
    This task is probably done by an administrator for you, but we will cover it briefly nevertheless
     
  • Uploading, managing, and downloading objects (files)
    That’s your main job in the context of the Autonomous Data Warehouse Cloud

That’s what we are going to do now.

Creating a bucket

Buckets are containers that store objects (files). Like other resources, buckets belong to a compartment, a collection of resources in the Cloud that can be used as an entity for privilege management. To create a bucket you have to know the compartment id. That is the only time we have to deal with these cloud-specific unique identifiers; all other object (file) operations use names.

So let’s create a bucket. The following creates a bucket named myFiles in my account ADWCACCT in a compartment given to me by the Cloud administrator.

$ oci os bucket create --compartment-id ocid1.tenancy.oc1..aaaaaaaanwcasjdhfsbw64mt74efh5hneavfwxko7d5distizgrtb3gzj5vq --namespace-name adwcaact --name myFiles

{
  "data": {
    "compartment-id": "ocid1.tenancy.oc1..aaaaaaaanwcasjdhfsbw64mt74efh5hneavfwxko7d5distizgrtb3gzj5vq",
    "created-by": "ocid1.user.oc1..aaaaaaaaomoqtk3z7y43543cdvexq3y733pb5qsuefcbmj2n5c6ftoi7zygq",
    "etag": "c6119bd6-98b6-4520-a05b-26d5472ea444",
    "metadata": {},
    "name": "myFiles",
    "namespace": "adwcaact",
    "public-access-type": "NoPublicAccess",
    "storage-tier": "Standard",
    "time-created": "2018-02-26T22:16:30.362000+00:00"
  },
  "etag": "c6119bd6-98b6-4520-a05b-26d5472ea733"
}

The operation returns with the metadata of the bucket after successful creation. We’re ready to upload and manage files in the object store.

Upload your first file with oci cli

You can upload a single file very easily with the oci command line interface. And, as promised before, you do not even have to remember any ocid in this case … .

$ oci os object put --namespace adwcacct --bucket-name myFiles --file /stage/supplier.tbl

Uploading object  [####################################]  100%
{
  "etag": "662649262F5BC72CE053C210C10A4D1D",
  "last-modified": "Mon, 26 Feb 2018 22:50:46 GMT",
  "opc-content-md5": "8irNoabnPldUt72FAl1nvw=="
}

After a successful upload you can check the md5 sum of the file; that's basically the fingerprint confirming that the data on the other side (in the cloud) is not corrupt and is the same as the local copy (on the machine where the data is coming from). The only "gotcha" is that OCI is using base64 encoding, so you cannot just do a simple md5. The following command solves this for me on my Mac:

$ openssl dgst -md5 -binary supplier.tbl |openssl enc -base64
8irNoabnPldUt72FAl1nvw==

Now that’s a good start. I can use this command in any shell program, like the following which loads all files in a folder sequentially to the object store: 

for i in `ls *.tbl`
do
  oci os object put --namespace adwcacct --bucket-name myFiles --file $i
done

You can write it to load multiple files in parallel, load only files that match a specific name pattern, and so on. You get the idea: whatever you can do with a shell, you can do here.
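For instance, here is a sketch that uploads every .tbl file from a staging directory (the /stage path is made up) with at most four uploads in flight at a time, by backgrounding each put and pausing while the job count is at the limit:

for f in /stage/*.tbl
do
  oci os object put --namespace adwcacct --bucket-name myFiles --file "$f" &
  while [ "$(jobs -rp | wc -l)" -ge 4 ]; do sleep 1; done
done
wait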

Alternatively, if it's just about loading all the files in a directory, you can achieve the same with the oci cli by using its bulk upload capabilities. The following shows this briefly:

oci os object bulk-upload -ns adwcacct -bn myFiles --src-dir /MyStagedFiles

{
  "skipped-objects": [],
  "upload-failures": {},
  "uploaded-objects": {
    "chan_v3.dat": {
      "etag": "674EFB90B1A3CECAE053C210D10AC9D9",
      "last-modified": "Tue, 13 Mar 2018 17:43:28 GMT",
      "opc-content-md5": "/t4LbeOiCz61+Onzi/h+8w=="
    },
    "coun_v3.dat": {
      "etag": "674FB97D50C34E48E053C230C10A1DF8",
      "last-modified": "Tue, 13 Mar 2018 17:43:28 GMT",
      "opc-content-md5": "sftu7G5+bgXW8NEYjFNCnQ=="
    },
    "cust1v3.dat": {
      "etag": "674FB97D52274E48E053C210C10A1DF8",
      "last-modified": "Tue, 13 Mar 2018 17:44:06 GMT",
      "opc-content-md5": "Zv76q9e+NTJiyXU52FLYMA=="
    },
    "sale1v3.dat": {
      "etag": "674FBF063F8C50ABE053C250C10AE3D3",
      "last-modified": "Tue, 13 Mar 2018 17:44:52 GMT",
      "opc-content-md5": "CNUtk7DJ5sETqV73Ag4Aeg=="
    }
  }
}

Uploading a single large file in parallel 

Ok, now we can load one or many files to the object store. But what do you do if you have a single large file that you want to get uploaded? The oci command line offers built-in multi-part loading where you do not need to split the file beforehand. The command line provides you built-in capabilities to (A) transparently split the file into sized parts and (B) to control the parallelism of the upload.

$ oci os object put -ns adwcacct -bn myFiles --file lo_aa.tbl --part-size 100 --parallel-upload-count 4

While the load is ongoing you can list all in-progress uploads, but unfortunately without any progress bar; the progress bar is reserved for the initiating session:

$ oci os multipart list -ns adwcacct -bn myFiles
{
  "data":
   [    
    {
      "bucket": "myFiles",
      "namespace": "adwcacct",
      "object": "lo_aa.tbl",
      "time-created": "2018-02-27T01:19:47.439000+00:00",
      "upload-id": "4f04f65d-324b-4b13-7e60-84596d0ef47f"
    }
  ]
}

While a serial process for a single file gave me somewhere around 35 MB/sec upload on average, the parallel load sped things up quite a bit, so it's definitely cool functionality (note that your mileage will vary and is probably mostly dependent on your Internet/proxy connectivity and bandwidth).

If you're interested in how that works, here is a link from Rachna, who explains the inner workings of this functionality in more detail.

Using the Swift REST interface

Now, after having covered the oci utility, let's briefly look into what we can do out of the box, without the need to install anything. Yes, without installing anything you can leverage the REST endpoints of the object storage service. All you need to know is your username/SWIFT password and your environment details, e.g. which region you're uploading to, the account (tenant), and the target bucket.

This is where the real fun starts, and this is where it can become geeky, so we will focus only on the two most important aspects of dealing with files and the object store: uploading and downloading files.

Understanding how to use Openstack Swift REST

File management with REST is just as simple as with the oci cli command. Similar to the setup of the oci cli, you have to know the basic information about your Cloud account, namely:

  • a user in the cloud account that has the appropriate privileges to work with a bucket in your tenancy. This user also has to be configured to have a SWIFT password (see here how that is done).
  • a bucket in one of the object stores in a region (we are not going to discuss how to use REST to do this). The bucket/region defines the rest endpoint, for example if you are using the object store in Ashburn, VA, the endpoint is https://swiftobjectstorage.us-ashburn-1.oraclecloud.com)

The URI for accessing your bucket is built as follows:
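Based on the Ashburn endpoint mentioned above, the pattern is https://swiftobjectstorage.<region>.oraclecloud.com/v1/<account (namespace)>/<bucket>/<object>. As a hedged sketch (the user name, auth token placeholder, namespace adwcacct, and bucket myFiles are illustrative values, not a real tenancy), uploading and then downloading the file from earlier could look like this, using HTTP basic authentication with the user name and Swift/auth token:

$ curl -u 'jane.doe@example.com:<auth_token>' -T supplier.tbl https://swiftobjectstorage.us-ashburn-1.oraclecloud.com/v1/adwcacct/myFiles/supplier.tbl

$ curl -u 'jane.doe@example.com:<auth_token>' https://swiftobjectstorage.us-ashburn-1.oraclecloud.com/v1/adwcacct/myFiles/supplier.tbl -o supplier.tbl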


Object Store Service operations. Part 1 – Loading data

One of the most common and clear trends in the IT market is Cloud, and one of the most common and clear trends in the Cloud is Object Store. You may find some introductory information here. Many Cloud providers, including Oracle, assume that the data lifecycle starts from the Object Store:

You land data there and then either read or load it with different services, such as ADWC or BDCS, for example. Oracle has two flavors of Object Store Service (OSS): OSS on OCI (Oracle Cloud Infrastructure) and OSS on OCI-C (Oracle Cloud Infrastructure Classic).

In this post, I'm going to focus on OSS on OCI-C, mostly because OSS on OCI was perfectly explained by Hermann Baer here and by Rachna Thusoo here.

Upload/Download files.

As in Hermann's blog, I'll focus on the most frequent operations: upload and download. There are multiple ways to do so, for example:

- Oracle Cloud WebUI

- REST API

- FTM CLI tool

- Third-party tools such as CloudBerry

- Big Data Manager (via ODCP)

- Hadoop client with Swift API

- Oracle Storage Software Appliance

Let's start with the easiest one – the web interface.

Upload/Download files. WebUI.

For sure, you have to start by logging in to the cloud services:

Then you have to go to the Object Store Service:

After this, drill down into the Service Console and you will be able to see the list of containers within your OSS:

To create a new container (bucket in OCI terminology), simply click on "Create Container" and give it a name:

After it has been created, click on it and go to the "Upload object" button:

Click and click again, and here we are – the file is in the container:

Let's try to upload a bigger file... oops, we got an error:

So, it seems we have a 5GB limitation. Fortunately, there is "Large object upload":

which will allow us to upload files bigger than 5GB:

And what about downloading? It's easy: simply click download and land the file on the local file system.

Upload/Download files. REST API.

The WebUI may be a good way to upload data when a human operates it, but it's not too convenient for scripting. If you want to automate your file uploads, you may use the REST API. You may find all details regarding the REST API here; alternatively, you may use the script I'm publishing below, which can hint at some basic commands:

#!/bin/bash
shopt -s expand_aliases

alias echo="echo -e"

USER="alexey.filanovskiy@oracle.com"
PASS="MySecurePassword"

OSS_USER="storage-a424392:${USER}"
OSS_PASS="${PASS}"
OSS_URL="https://storage-a424392.storage.oraclecloud.com/auth/v1.0"

echo "curl -k -sS -H \"X-Storage-User: ${OSS_USER}\" -H \"X-Storage-Pass:${OSS_PASS}\" -i \"${OSS_URL}\""
out=`curl -k -sS -H "X-Storage-User: ${OSS_USER}" -H "X-Storage-Pass:${OSS_PASS}" -i "${OSS_URL}"`
while [ $? -ne 0 ]; do
echo "Retrying to get token\n"
        sleep 1;
        out=`curl -k -sS -H "X-Storage-User: ${OSS_USER}" -H "X-Storage-Pass:${OSS_PASS}" -i "${OSS_URL}"`
done

AUTH_TOKEN=`echo "${out}" | grep "X-Auth-Token" | sed 's/X-Auth-Token: //;s/\r//'`
STORAGE_TOKEN=`echo "${out}" | grep "X-Storage-Token" | sed 's/X-Storage-Token: //;s/\r//'`
STORAGE_URL=`echo "${out}" | grep "X-Storage-Url" | sed 's/X-Storage-Url: //;s/\r//'`

echo "Token and storage URL:"
echo "\tOSS url:       ${OSS_URL}"
echo "\tauth token:    ${AUTH_TOKEN}"
echo "\tstorage token: ${STORAGE_TOKEN}"
echo "\tstorage url:   ${STORAGE_URL}"

echo "\nContainers:"
for CONTAINER in `curl -k -sS -u "${USER}:${PASS}" "${STORAGE_URL}"`; do
echo "\t${CONTAINER}"
done

FILE_SIZE=$((1024*1024*1))
CONTAINER="example_container"
FILE="file.txt"
LOCAL_FILE="./${FILE}"
FILE_AT_DIR="/path/file.txt"
LOCAL_FILE_AT_DIR=".${FILE_AT_DIR}"
REMOTE_FILE="${CONTAINER}/${FILE}"
REMOTE_FILE_AT_DIR="${CONTAINER}${FILE_AT_DIR}"


for f in "${LOCAL_FILE}" "${LOCAL_FILE_AT_DIR}"; do
        if [ ! -e "${f}" ]; then
echo "\nInfo: File "${f}" does not exist. Creating ${f}"
                d=`dirname "${f}"`
                mkdir -p "${d}";
                tr -dc A-Za-z0-9 </dev/urandom | head -c "${FILE_SIZE}" > "${f}"
                #dd if="/dev/random" of="${f}" bs=1 count=0 seek=${FILE_SIZE} &> /dev/null
        fi;
done;

echo "\nActions:"

echo "\tListing containers:\t\t\t\tcurl -k -vX GET -u \"${USER}:${PASS}\" \"${STORAGE_URL}/\""
echo "\tCreate container \"oss://${CONTAINER}\":\t\tcurl -k -vX PUT -u \"${USER}:${PASS}\" \"${STORAGE_URL}/${CONTAINER}\""
echo "\tListing objects at container \"oss://${CONTAINER}\":\tcurl -k -vX GET -u \"${USER}:${PASS}\" \"${STORAGE_URL}/${CONTAINER}/\""

echo "\n\tUpload \"${LOCAL_FILE}\" to \"oss://${REMOTE_FILE}\":\tcurl -k -vX PUT -T \"${LOCAL_FILE}\" -u \"${USER}:${PASS}\" \"${STORAGE_URL}/${CONTAINER}/\""
echo "\tDownload \"oss://${REMOTE_FILE}\" to \"${LOCAL_FILE}\":\tcurl -k -vX GET -u \"${USER}:${PASS}\" \"${STORAGE_URL}/${REMOTE_FILE}\" > \"${LOCAL_FILE}\""

echo "\n\tDelete \"oss://${REMOTE_FILE}\":\tcurl -k -vX DELETE -u \"${USER}:${PASS}\" \"${STORAGE_URL}/${REMOTE_FILE}\""

echo "\ndone"

I put the content of this script into the file oss_operations.sh, gave it execute permission, and ran it:

$   chmod +x oss_operations.sh
$   ./oss_operations.sh

the output will look like:

curl -k -sS -H "X-Storage-User: storage-a424392:alexey.filanovskiy@oracle.com" -H "X-Storage-Pass:MySecurePassword" -i "https://storage-a424392.storage.oraclecloud.com/auth/v1.0"
Token and storage URL:
        OSS url:       https://storage-a424392.storage.oraclecloud.com/auth/v1.0
        auth token:    AUTH_tk45d49d9bcd65753f81bad0eae0aeb3db
        storage token: AUTH_tk45d49d9bcd65753f81bad0eae0aeb3db
        storage url:   https://storage.us2.oraclecloud.com/v1/storage-a424392

Containers:
        123_OOW17
        1475233258815
        1475233258815-segments
        Container
...
Actions:
        Listing containers:                             curl -k -vX GET -u "alexey.filanovskiy@oracle.com:MySecurePassword" "https://storage.us2.oraclecloud.com/v1/storage-a424392/"
        Create container "oss://example_container":             curl -k -vX PUT -u "alexey.filanovskiy@oracle.com:MySecurePassword" "https://storage.us2.oraclecloud.com/v1/storage-a424392/example_container"
        Listing objects at container "oss://example_container": curl -k -vX GET -u "alexey.filanovskiy@oracle.com:MySecurePassword" "https://storage.us2.oraclecloud.com/v1/storage-a424392/example_container/"

        Upload "./file.txt" to "oss://example_container/file.txt":      curl -k -vX PUT -T "./file.txt" -u "alexey.filanovskiy@oracle.com:MySecurePassword" "https://storage.us2.oraclecloud.com/v1/storage-a424392/example_container/"
        Download "oss://example_container/file.txt" to "./file.txt":    curl -k -vX GET -u "alexey.filanovskiy@oracle.com:MySecurePassword" "https://storage.us2.oraclecloud.com/v1/storage-a424392/example_container/file.txt" > "./file.txt"

        Delete "oss://example_container/file.txt":      curl -k -vX DELETE -u "alexey.filanovskiy@oracle.com:MySecurePassword" "https://storage.us2.oraclecloud.com/v1/storage-a424392/example_container/file.txt"

Upload/Download files. FTM CLI.

The REST API may seem a bit cumbersome and quite hard to use. But the good news is that there is a kind of intermediate solution, a command line interface – FTM CLI. Again, the full documentation is available here, but I'd like to briefly explain what you can do with FTM CLI. You can download it here, and after unpacking it's ready to use!

$   unzip ftmcli-v2.4.2.zip
...
$   cd ftmcli-v2.4.2
$   ls -lrt
total 120032
-rwxr-xr-x 1 opc opc      1272 Jan 29 08:42 README.txt
-rw-r--r-- 1 opc opc  15130743 Mar  7 12:59 ftmcli.jar
-rw-rw-r-- 1 opc opc 107373568 Mar 22 13:37 file.txt
-rw-rw-r-- 1 opc opc       641 Mar 23 10:34 ftmcliKeystore
-rw-rw-r-- 1 opc opc       315 Mar 23 10:34 ftmcli.properties
-rw-rw-r-- 1 opc opc    373817 Mar 23 15:24 ftmcli.log

You may note that there is a file ftmcli.properties; it can simplify your life if you configure it once. You may find the documentation here, and here is my example of this config:

$   cat ftmcli.properties
#saving authkey
#Fri Mar 30 21:15:25 UTC 2018
rest-endpoint=https\://storage-a424392.storage.oraclecloud.com/v1/storage-a424392
retries=5
user=alexey.filanovskiy@oracle.com
segments-container=all_segments
max-threads=15
storage-class=Standard
segment-size=100

Now we have all the connection details and we can use the CLI in the simplest way possible. There are a few basic commands available with FTM CLI, but as a first step I'd suggest authenticating a user (enter the password once):

$   java -jar ftmcli.jar list --save-auth-key
Enter your password:

If you use "--save-auth-key", it will save your password and will not ask you for it next time:

$   java -jar ftmcli.jar list
123_OOW17
1475233258815
...

You may refer to the documentation for the full list of commands, or simply run ftmcli without any arguments:

$   java -jar ftmcli.jar
...
Commands:
upload            Upload a file or a directory to a container.
download          Download an object or a virtual directory from a container.
create-container  Create a container.
restore           Restore an object from an Archive container.
list              List containers in the account or objects in a container.
delete            Delete a container in the account or an object in a container.
describe          Describes the attributes of a container in the account or an object in a container.
set               Set the metadata attribute(s) of a container in the account or an object in a container.
set-crp           Set a replication policy for a container.
copy              Copy an object to a destination container.

Let's try to accomplish the standard flow for OSS – create a container, upload a file there, list the objects in the container, describe the container properties, and delete it.

# Create container
$   java -jar ftmcli.jar create-container container_for_blog
                  Name: container_for_blog
          Object Count: 0
            Bytes Used: 0
         Storage Class: Standard
         Creation Date: Fri Mar 30 21:50:15 UTC 2018
         Last Modified: Fri Mar 30 21:50:14 UTC 2018
Metadata
---------------
x-container-write: a424392.storage.Storage_ReadWriteGroup
x-container-read: a424392.storage.Storage_ReadOnlyGroup,a424392.storage.Storage_ReadWriteGroup
content-type: text/plain;charset=utf-8
accept-ranges: bytes
Custom Metadata
---------------
x-container-meta-policy-georeplication: container

# Upload file to container
$   java -jar ftmcli.jar upload container_for_blog file.txt
Uploading file: file.txt to container: container_for_blog
File successfully uploaded: file.txt
Estimated Transfer Rate: 16484KB/s

# List files into Container
$   java -jar ftmcli.jar list container_for_blog
file.txt

# Get Container Metadata
$   java -jar ftmcli.jar describe container_for_blog
                  Name: container_for_blog
          Object Count: 1
            Bytes Used: 434
         Storage Class: Standard
         Creation Date: Fri Mar 30 21:50:15 UTC 2018
         Last Modified: Fri Mar 30 21:50:14 UTC 2018

Metadata
---------------
x-container-write: a424392.storage.Storage_ReadWriteGroup
x-container-read: a424392.storage.Storage_ReadOnlyGroup,a424392.storage.Storage_ReadWriteGroup
content-type: text/plain;charset=utf-8
accept-ranges: bytes

Custom Metadata
---------------
x-container-meta-policy-georeplication: container

# Delete container
$   java -jar ftmcli.jar delete container_for_blog
ERROR:Delete failed. Container is not empty.

# Delete with force option
$   java -jar ftmcli.jar delete -f container_for_blog
Container successfully deleted: container_for_blog

Another great thing about FTM CLI is that it lets you easily manage upload performance out of the box. In ftmcli.properties there is a property called "max-threads". It may vary between 1 and 100. Here is a test case illustrating this:

-- Generate 10GB file
$   dd if=/dev/zero of=file.txt count=10240 bs=1048576

-- Upload file in one thread (around 18MB/sec rate)
$   java -jar ftmcli.jar upload container_for_blog /home/opc/file.txt
Uploading file: /home/opc/file.txt to container: container_for_blog
File successfully uploaded: /home/opc/file.txt
Estimated Transfer Rate: 18381KB/s

-- Change number of threads from 1 to 99 in config file
$   sed -i -e 's/max-threads=1/max-threads=99/g' ftmcli.properties

-- Upload file in 99 threads (has around 68MB/sec rate)
$   java -jar ftmcli.jar upload container_for_blog /home/opc/file.txt
Uploading file: /home/opc/file.txt to container: container_for_blog
File successfully uploaded: /home/opc/file.txt
Estimated Transfer Rate: 68449KB/s

So, it's a very simple and at the same time powerful tool for Object Store operations, and it can help you with scripting them.
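As a small hedged example (the /MyStagedFiles directory is made up, and container_for_blog is the container created above), a loop like this uploads every .dat file in a directory using the same upload command shown earlier:

for f in /MyStagedFiles/*.dat
do
  java -jar ftmcli.jar upload container_for_blog "$f"
done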

Upload/Download files. CloudBerry.

Another way to interact with OSS is to use an application; for example, you may use CloudBerry Explorer for OpenStack Storage. There is a great blog post which explains how to configure CloudBerry for Oracle Object Store Service Classic, so I will start from the point where I have already configured it. When you log in, it looks like this:

You may easily create a container in CloudBerry:

And for sure you may easily copy data from your local machine to OSS:

There's nothing more to add here; CloudBerry is a convenient tool for browsing Object Stores and doing small copies between the local machine and OSS. For me personally, it looks like Total Commander for OSS.

Upload/Download files. Big Data Manager and ODCP.

Big Data Cloud Service (BDCS) has a great component called Big Data Manager. This is a tool developed by Oracle which allows you to manage and monitor a Hadoop cluster. Among other features, Big Data Manager (BDM) allows you to register an Object Store in the Stores browser and easily drag and drop data between OSS and other sources (Database, HDFS…). When you copy data to/from HDFS you use ODCP, an optimized version of the Hadoop DistCp tool.

This is a very fast way to copy data back and forth. Fortunately, JP has already written about this feature, so I can simply give a link. If you want to see concrete performance numbers, you can go here to the A-Team blog page.

Without Big Data Manager, you can manually register the OSS on a Linux machine and invoke the copy command from bash. The documentation will show you all the details, and I will show just one example:

# add account:
$   export CM_ADMIN=admin
$   export CM_PASSWORD=SuperSecurePasswordCloderaManager
$   export CM_URL=https://cfclbv8493.us2.oraclecloud.com:7183
$   bda-oss-admin add_swift_cred --swift-username "storage-a424392:alexey.filanovskiy@oracle.com" --swift-password "SecurePasswordForSwift" --swift-storageurl "https://storage-a424392.storage.oraclecloud.com/auth/v2.0/tokens" --swift-provider bdcstorage
# list of credentials:
$   bda-oss-admin list_swift_creds
Provider: bdcstorage
    Username: storage-a424392:alexey.filanovskiy@oracle.com
    Storage URL: https://storage-a424392.storage.oraclecloud.com/auth/v2.0/tokens
# check files on OSS swift://[container name].[Provider created step before]/:
$   hadoop fs -ls  swift://alextest.bdcstorage/
18/03/31 01:01:13 WARN http.RestClientBindings: Property fs.swift.bdcstorage.property.loader.chain is not set
Found 3 items
-rw-rw-rw-   1  279153664 2018-03-07 00:08 swift://alextest.bdcstorage/bigdata.file.copy
drwxrwxrwx   -          0 2018-03-07 00:31 swift://alextest.bdcstorage/customer
drwxrwxrwx   -          0 2018-03-07 00:30 swift://alextest.bdcstorage/customer_address

Now you have the OSS configured and ready to use. You may copy data with ODCP; here you may find the entire list of sources and destinations. For example, if you want to copy data from HDFS to OSS, you have to run:

$   odcp hdfs:///tmp/file.txt swift://alextest.bdcstorage/

ODCP is a very efficient way to move data from HDFS to Object Store and back.
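
The copy works in the other direction as well; assuming the same source/destination syntax, pulling an object from OSS back into HDFS could look like this:

$   odcp swift://alextest.bdcstorage/file.txt hdfs:///tmp/file.copy.txt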

If you come from the Hadoop world and are used to the Hadoop fs API, you may use it with the Object Store as well (after configuring it as above); for example, to load data into OSS you run:

$   hadoop fs -put /home/opc/file.txt swift://alextest.bdcstorage/file1.txt
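
The rest of the familiar Hadoop fs commands work against the Object Store in the same way; for example, to list the container and download an object back to the local filesystem (paths are for illustration only):

$   hadoop fs -ls swift://alextest.bdcstorage/
$   hadoop fs -get swift://alextest.bdcstorage/file1.txt /home/opc/file1.copy.txt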

Upload/Download files. Oracle Storage Cloud Software Appliance.

Object Store is a fairly new concept, and of course there are ways to smooth the migration to it. Years ago, when HDFS was new and unfamiliar, many people didn’t know how to work with it, and technologies such as NFS Gateway and HDFS FUSE appeared. Both allowed mounting HDFS as a regular Linux filesystem and working with it as with a normal filesystem. Oracle Cloud Infrastructure Storage Software Appliance does something similar for the Object Store. You can find the full documentation here, a brief video here, and the software download here. In this blog I will just show one example of its usage. This picture will help me explain how the Storage Cloud Software Appliance works:

You can see that the customer needs to run an on-premises Docker container, which carries the whole required stack. I’ll skip the details, which you may find in the documentation above, and will just show the concept.

# Check oscsa status
[on-prem client] $   oscsa info
Management Console: https://docker.oracleworld.com:32769
If you have already configured an OSCSA FileSystem via the Management Console,
you can access the NFS share using the following port.

NFS Port: 32770

Example: mount -t nfs -o vers=4,port=32770 docker.oracleworld.com:/ /local_mount_point

# Run oscsa
[on-prem client] $   oscsa up

There (on the Docker image, which you deploy on some on-premises machine) you will find a Web UI where you can configure the Storage Appliance:

After login, you see a list of configured Object Stores:

In this console you can connect the linked container to this on-premises host:

After it has been connected, you will see the “disconnect” option:

After you connect a device, you have to mount it:

[on-prem client] $   sudo mount -t nfs -o vers=4,port=32770 localhost:/devoos /oscsa/mnt
[on-prem client] $   df -h|grep oscsa
localhost:/devoos  100T  1.0M  100T   1% /oscsa/mnt

Now you can upload a file into the Object Store:

[on-prem client] $   echo "Hello Oracle World" > blog.file
[on-prem client] $   cp blog.file /oscsa/mnt/
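
You can immediately verify the write locally, since the mount behaves like a regular filesystem:

[on-prem client] $   ls -l /oscsa/mnt/blog.file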

The copy to the Object Store is asynchronous, so after a while you will be able to find the file there:

The only restriction I wasn’t able to overcome is that the filename changes during the copy.

Conclusion.

Object Store is here, and it will become more and more popular. That means there is no way to escape it, and you have to get familiar with it. The blog post above showed that there are multiple ways to work with it, starting from user-friendly tools (like CloudBerry) and ending with the low-level REST API.


Big Data SQL Quick Start. Correlate real-time data with historical benchmarks – Part 24

In Big Data SQL 3.2 we introduced a new capability – Kafka as a data source. I’ve posted some details about how it works, with some simple examples, over here. Now I want to talk about why you would want to run queries over Kafka. Here is Oracle’s concept picture for the Data Warehouse:

You have a stream (real-time data), a data lake where you land raw information, and cleaned enterprise data. This is just a concept that could be implemented in many different ways; one of them is depicted here:

Kafka is the hub for streaming events, where you accumulate data from multiple real-time producers and provide this data to many consumers (this could be real-time processing, such as Spark Streaming, or loading data in batch mode into the next data warehouse tier, such as Hadoop). 

In this architecture, Kafka contains the stream data and is able to answer the question “what is going on right now”, whereas the database stores operational data and Hadoop stores historical data, so those two sources answer the question “how it used to be”. Big Data SQL allows you to run SQL over all three sources and correlate real-time events with historical ones.

Example of using Big Data SQL over Kafka and other sources.

Above I’ve explained why you may want to query Kafka with Big Data SQL; now let me give a concrete example. 

Input for the demo example:

- We have a company called MoviePlex, which sells video content all around the world.

- There are two streaming datasets: network data, which contains information about network errors, the condition of routing devices and so on, and the facts of movie sales.

- Both datasets stream in real time into Kafka.

- We also have historical network data, which we store in HDFS (because of the cost of storing this data), historical sales data (which we store in the database) and multiple dimension tables, stored in the RDBMS as well.

Based on this, we have a business case: monitor the revenue flow, correlate current traffic with the historical benchmark (depending on the day of the week and hour of the day), and try to find the reason in case of failures (network errors, for example).
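
As a rough sketch of what such a correlation query could look like with Big Data SQL (the connection string, table and column names below are hypothetical and only illustrate joining a Kafka-backed external table with a benchmark table stored in the database):

$   sqlplus movieplex/welcome1@orcl <<'EOF'
-- movie_sales_kafka - external table over the Kafka sales topic (Big Data SQL)
-- sales_benchmark   - regular Oracle table with historical averages per country/weekday/hour
SELECT k.country,
       SUM(k.amount)      AS current_revenue,
       MAX(b.avg_revenue) AS benchmark_revenue
  FROM movie_sales_kafka k
  JOIN sales_benchmark b
    ON b.country     = k.country
   AND b.day_of_week = TO_CHAR(SYSDATE, 'D')
   AND b.hour_of_day = TO_CHAR(SYSDATE, 'HH24')
 GROUP BY k.country
 ORDER BY current_revenue;
EOF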

Using Oracle Data Visualization Desktop, we’ve created a dashboard which shows how real-time traffic correlates with the statistical benchmark and also shows the number of network errors by country:

The blue line is a historical benchmark.

Over time we see that some errors appear in some countries (left dashboard), but current revenue is more or less the same as it used to be.

After a while revenue starts going down.

This trend keeps going.

There are a lot of network errors in France. Let’s drill down into the itemized traffic:

Indeed, we found that overall revenue goes down because of France, and the cause is network errors.

Conclusion:

1) Kafka stores real-time data and answers the question “what is going on right now”.

2) The database and Hadoop store historical data and answer the question “how it used to be”.

3) Big Data SQL can query data from Kafka, Hadoop and the database within a single query (joining the datasets).

4) This allows us to correlate historical benchmarks with real-time data through a SQL interface and use it with any SQL-compatible BI tool. 


Review of Big Data Warehousing at OpenWorld 2017 – Now Available


Did you miss OpenWorld 2017? Then my latest book is definitely something you will want to download! If you went to OpenWorld, this book is also for you, because it covers all the most important big data warehousing messages and sessions from the five days of OpenWorld.

Following on from OpenWorld 2017 I have put together a comprehensive review of all the big data warehousing content from OpenWorld 2017. This includes all the key sessions and announcements from this year’s Oracle OpenWorld conference. This review guide contains the following information:

Chapter 1 Welcome – an overview of the contents.  

Chapter 2 Let’s Go Autonomous - containing all you need to know about Oracle’s new, fully-managed Autonomous Data Warehouse Cloud. This was the biggest announcement at OpenWorld so this chapter contains videos, presentations and podcasts to get you up to speed on this completely new data warehouse cloud service.

Chapter 3 Keynotes – Relive OpenWorld 2017 by watching the most important highlights from this year’s OpenWorld conference with our on demand video service which covers all the major keynote sessions.

Chapter 4 Key Presenters – a list of the most important speakers by product area such as database, cloud, analytics, developer and big data. Each biography includes all relevant social media sites and pages.

Chapter 5 Key Sessions - a list of all the most important sessions with links to download the related presentations organized

Chapter 6 Staying Connected – Details of all the links you need to keep up to date on Oracle’s strategy and products for Data Warehousing and Big Data.  This covers all our websites, blogs and social media pages.

This review is available in three formats:

1) For highly evolved users, i.e. Apple users, who understand the power of Apple’s iBook format, your multi-media enabled iBook version is available here.

2) For Windows users who are forced to endure a 19th-Century style technological experience, your PDF version is available here.

3) For Linux users, Oracle DBAs and other IT dinosaurs, all of whom are allergic to all graphical user interfaces, the basic version of this comprehensive review is available here.

I hope you enjoy this review and look forward to seeing you next year at OpenWorld 2018, October 28 to November 1. If you’d like to be notified when registration opens for next year’s Oracle OpenWorld then register your email address here.
 


New Release: BDA 4.10 is now Generally Available

As of today, BDA version 4.10 is Generally Available. As always, please refer to If You Struggle With Keeping your BDAs up to date, Then Read This to learn about the innovative release process we use for BDA software.

This new release includes a number of features and updates:

  • Support for Migration From Oracle Linux 5 to Oracle Linux 6 - Clusters on Oracle Linux 5 must first be upgraded to v4.10.0 on Oracle Linux 5 and can then be migrated to Oracle Linux 6. This process must be done one server at a time. HDFS data and Cloudera Manager roles are retained. Please review the documentation for the entire process carefully before starting.

    • BDA v4.10 is the last release built for Oracle Linux 5 and no further upgrades for Oracle Linux 5 will be released.
  • Updates to NoSQL DB, Big Data Connectors, Big Data Spatial & Graph
    • Oracle NoSQL Database 4.5.12
    • Oracle Big Data Connectors 4.10.0
    • Oracle Big Data Spatial & Graph 2.4.0
  • Support for Oracle Big Data Appliance X7 systems – Oracle Big Data Appliance X7 is based on the X7–2L server. The major enhancements in the Big Data Appliance X7–2 hardware are:

    • CPU update: 2 x 24-core Intel Xeon processors
    • Updated disk drives: 12 x 10TB 7,200 RPM SAS drives
    • 2 M.2 150GB SATA SSD drives (replacing the internal USB drive)
    • Vail Disk Controller (HBA)
    • Cisco 93108TC-EX–1G Ethernet switch (replacing the Catalyst 4948E).
  • Spark 2 Deployed by Default – Spark 2 is now deployed by default on new clusters and also during upgrade of clusters where it is not already installed.
  • Oracle Linux 7 can be Installed on Edge Nodes – Oracle Linux 7 is now supported for installation on Oracle Big Data Appliance edge nodes running on X7–2L, X6–2L or X5–2L servers. Support for Oracle Linux 7 in this release is limited to edge nodes.
  • Support for Cloudera Data Science Workbench – Support for Oracle Linux 7 on edge nodes provides a way for customers to host Cloudera Data Science Workbench (CDSW) on Oracle Big Data Appliance. CDSW is a web application that enables access from a browser to R, Python, and Scala on a secured cluster. Oracle Big Data Appliance does not include licensing or official support for CDSW. Contact Cloudera for licensing requirements.
     
  • Scripts for Download & Configuration of Apache Zeppelin, Jupyter Notebook, and RStudio –  This release includes scripts to assist in download and configuration of these commonly used tools. The scripts are provided as a convenience to users. Oracle Big Data Appliance does not include official support for the installation and use of Apache Zeppelin, Jupyter Notebook, or RStudio.
     
  • Improved Configuration of Oracle’s R Distribution and ORAAH –  For these tools, much of the environment configuration that was previously done by the customer is now automated.
  • Node Migration Optimization – Node migration time has been improved by eliminating some steps.
  • Support for Extending Secure NoSQL DB clusters

This release is based on Cloudera Enterprise (CDH 5.12.1 & Cloudera Manager 5.12.1) as well as Oracle NoSQL Database (4.5.12).

  • Cloudera 5 Enterprise includes CDH (Core Hadoop), Cloudera Manager, Apache Spark, Apache HBase, Impala, Cloudera Search and Cloudera Navigator
  • The BDA continues to support all security options for CDH Hadoop clusters: Kerberos authentication – MIT or Microsoft Active Directory, Sentry Authorization, HTTPS/Network encryption, Transparent HDFS Disk Encryption, Secure Configuration for Impala, HBase, Cloudera Search and all Hadoop services configured out-of-the-box.
  • Parcels for Kafka 2.2, Spark 2.2, Kudu 1.4 and Key Trustee Server 5.12 are included in the BDA Software Bundle


OpenWorld 2017: Must-See Sessions for Day 3 – Tuesday

Day 3, Tuesday, is here and this is my definitive list of Must-See sessions for today. Today we are focused on the new features in Oracle Database 18c – multitenant, in-memory, Oracle Text, machine learning, Big Data SQL and more. These sessions are what Oracle OpenWorld is all about: the chance to learn about the latest technology from the real technical experts.

MONDAY’s MUST-SEE GUIDE

Don’t worry if you are not able to join us in San Francisco for this year’s conference because I will be providing a comprehensive review after the conference closes on Thursday.

The review will include links to download the presentations for each of my Must-See sessions and links to any hands-on lab content as well.

Have a great conference.

If you are here in San Francisco then enjoy the conference – it’s going to be an awesome conference this year.

Don’t forget to make use of our Big DW #oow17 smartphone app which you can access by pointing your phone at this QR code:


OpenWorld 2017 – Must-See Sessions for Day 1


It all starts today –  OpenWorld 2017. Each day I will provide you with a list of must-see sessions and hands-on labs. This is going to be one of the most exciting OpenWorlds ever!

Today is Day 1, so here is my definitive list of Must-See sessions for the opening day. The list is packed full of really excellent speakers such as Franck Pachot, Ami Aharonovich, Galo Balda and Rich Niemiec. These sessions are what Oracle OpenWorld is all about: the chance to learn from the real technical experts.

Of course you need to end your first day in Moscone North Hall D for Larry Ellison’s welcome keynote – it’s going to be a great one!
 

SUNDAY’S MUST-SEE GUIDE

Don’t worry if you are not able to join us in San Francisco for this year’s conference because I will be providing a comprehensive review after the conference closes on Thursday.

The review will include links to download the presentations for each of my Must-See sessions and links to any hands-on lab content as well. Have a great conference.

If you are here in San Francisco then enjoy the conference – it’s going to be an awesome conference this year.

Don’t forget to make use of our Big DW #oow17 smartphone app which you can access by pointing your phone at this QR code:
 

