Category Archives: Data Warehousing

When Screen Scraping became API calling – Gathering Oracle OpenWorld Session Catalog with …



A dataset with all sessions of the upcoming Oracle OpenWorld 2017 conference is nice to have – for experiments and demonstrations with many technologies. The session catalog is exposed at a website here.

With searching, filtering and scrolling, all available sessions can be inspected. If data is available in a browser, it can be retrieved programmatically and persisted locally, for example in a JSON document. A typical approach for this is web scraping: having a server side program act like a browser, retrieve the HTML from the web site and query the data from the response. This process is described for example in this article – https://codeburst.io/an-introduction-to-web-scraping-with-node-js-1045b55c63f7 – for Node and the Cheerio library.

However, server side screen scraping of HTML will only be successful when the HTML is static. Dynamic HTML is constructed in the browser by executing JavaScript code that manipulates the browser DOM. If that is the mechanism behind a web site, server side scraping is at the very least considerably more complex (as it requires the server to emulate a modern web browser to a large degree). Selenium has been used in such cases – to provide a server side, programmatically accessible browser engine. Alternatively, screen scraping can also be performed inside the browser itself – as is supported for example by the Getsy library.

As you will find in this article – when server side scraping fails, client side scraping may be a much too complex solution. It is very well possible that the rich client web application is using a REST API that provides the data as a JSON document – an API that our server side program can also easily leverage. That turned out to be the case for the OOW 2017 website – so instead of complex HTML parsing and server side or even client side scraping, the challenge at hand resolves to nothing more than a little bit of REST calling. Read the complete article here.
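As a hedged illustration only (the endpoint URL, query parameters and response field below are hypothetical placeholders, not the actual OOW 2017 catalog API), such a call can be as simple as:

# Fetch one page of sessions as JSON and store it locally.
# The URL, parameters and the ".sessions" field are assumptions for illustration.
curl -sS "https://events.example.com/api/v1/sessions?conference=oow17&page=1" \
     -H "Accept: application/json" \
     -o sessions_page1.json

# Quick sanity check with jq (optional)
jq '.sessions | length' sessions_page1.json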

PaaS Partner Community

For regular information on business process management and integration, become a member of the SOA & BPM Partner Community. For registration, please visit www.oracle.com/goto/emea/soa (OPN account required). If you need support with your account, please contact the Oracle Partner Business Center.


Technorati Tags: SOA Community,Oracle SOA,Oracle BPM,OPN,Jürgen Kress


Loading Data to the Object Store for Autonomous Data Warehouse Cloud

So you got your first service instance of your autonomous data warehouse set up, you experienced the performance of the environment using the sample data, went through all tutorials and videos and are getting ready to rock-n-roll. But the one thing you're not sure about is this Object Store. Yes, you used it successfully as described in the tutorial, but what's next? And what else is there to know about the Object Store?

First and foremost, if you are interested in understanding a bit more about what this Object Store is, you should read the following blog post from Rachna, the Product Manager for the Object Store among other things. It introduces the Object Store, how to set it up and manage files with the UI, plus a couple of simple command line examples (don’t get confused by the term ‘BMC’, that’s the old name of Oracle’s Cloud Infrastructure; that’s true for the command line utility as well, which is now called oci). You should read that blog post to get familiar with the basic concepts of the Object Store and a cloud account (tenant).

The documentation and blog posts are great, but now you actually want to use it to load data into ADWC. This means loading more (and larger) files, more need for automation, and more flexibility. This post will focus on exactly that: becoming productive with command line utilities without being a developer, and leveraging the power of the Oracle Object Store to upload many files in one go and even upload larger files in parallel without any major effort.

The blog post will cover both:

  • The Oracle oci command line interface for managing files
  • The Swift REST interface for managing files

 

Using the oci command line interface

The Oracle oci command line interface (CLI) is a tool that enables you to work with Oracle Cloud Infrastructure objects and services. It's a thin layer on top of the OCI APIs (typically REST) and one of Oracle's open source projects (the source code is on GitHub).

Let's quickly step through what you have to do to use this CLI. If you do not want to install anything, that is fine, too. In that case feel free to jump to the REST section in this post right away, but you're going to miss out on some cool stuff that the CLI provides you out of the box.

Getting going with the utility is really simple, as simple as one-two-three:

1. Install oci cli following the installation instructions on GitHub.
    I just did this on an Oracle Linux 7.4 VM instance that I created in the Oracle Cloud and had the utility up and running in no time.
     
  2. Configure your oci cli installation.
    You need a user created in the Oracle Cloud account that you want to use, and that user must have the appropriate privileges to work with the object store. A keypair is used for signing API requests, with the public key uploaded to Oracle. Only the user calling the API should possess the private key. All this is described in the configuration section of the CLI. 

    That is probably the part of the setup that takes the most time. Make sure you have UI console access when doing this, since you have to upload the public key for your user (a minimal sketch of this step follows the list below).
     

  3. Use oci cli.
    After successful setup you can use the command line interface to manage your buckets for storing all your files in the Cloud, among other things.
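A minimal sketch of steps 1 and 2, assuming a Linux host; the install one-liner and the config file layout follow the public oci-cli documentation at the time of writing, and every OCID, fingerprint and path below is a placeholder you must replace with your own values:

# 1. Install the CLI (see the GitHub installation instructions for alternatives)
bash -c "$(curl -L https://raw.githubusercontent.com/oracle/oci-cli/master/scripts/install/install.sh)"

# 2. Generate an API signing keypair; the public key is what you upload in the console for your user
mkdir -p ~/.oci
openssl genrsa -out ~/.oci/oci_api_key.pem 2048
openssl rsa -pubout -in ~/.oci/oci_api_key.pem -out ~/.oci/oci_api_key_public.pem
# The key fingerprint you need for the config file:
openssl rsa -pubout -outform DER -in ~/.oci/oci_api_key.pem | openssl md5 -c

# 3. Point the CLI at your tenancy, user and key (~/.oci/config; all values are placeholders)
cat > ~/.oci/config <<'EOF'
[DEFAULT]
user=ocid1.user.oc1..<your-user-ocid>
fingerprint=<fingerprint-from-the-command-above>
key_file=~/.oci/oci_api_key.pem
tenancy=ocid1.tenancy.oc1..<your-tenancy-ocid>
region=us-ashburn-1
EOF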

 

First steps with oci cli

The focus of the command line interface is on ease of use and on making its usage as self-explanatory as possible, with a comprehensive built-in help system in the utility. Whenever you want to know something without looking around, use the --help, -h, or -? syntax for a command, irrespective of how many parameters you have already entered. So you can start with oci -h and let the utility guide you.

For the purpose of file management the important category is the object store category, with the main tasks of:

  • Creating, managing, and deleting buckets
    This task is probably done by an administrator for you, but we will cover it briefly nevertheless
     
  • Uploading, managing, and downloading objects (files)
    That’s your main job in the context of the Autonomous Data Warehouse Cloud

That’s what we are going to do now.

Creating a bucket

Buckets are containers that store objects (files). Like other resources, buckets belong to a compartment, a collection of resources in the Cloud that can be used as an entity for privilege management. To create a bucket you have to know the compartment id. That is the only time we have to deal with these cloud-specific unique identifiers; all other object (file) operations use names.

So let’s create a bucket. The following creates a bucket named myFiles in my account ADWCACCT in a compartment given to me by the Cloud administrator.

$ oci os bucket create --compartment-id ocid1.tenancy.oc1..aaaaaaaanwcasjdhfsbw64mt74efh5hneavfwxko7d5distizgrtb3gzj5vq --namespace-name adwcaact --name myFiles

{
  "data": {
    "compartment-id": "ocid1.tenancy.oc1..aaaaaaaanwcasjdhfsbw64mt74efh5hneavfwxko7d5distizgrtb3gzj5vq",
    "created-by": "ocid1.user.oc1..aaaaaaaaomoqtk3z7y43543cdvexq3y733pb5qsuefcbmj2n5c6ftoi7zygq",
    "etag": "c6119bd6-98b6-4520-a05b-26d5472ea444",
    "metadata": {},
    "name": "myFiles",
    "namespace": "adwcaact",
    "public-access-type": "NoPublicAccess",
    "storage-tier": "Standard",
    "time-created": "2018-02-26T22:16:30.362000+00:00"
  },
  "etag": "c6119bd6-98b6-4520-a05b-26d5472ea733"
}

The operation returns with the metadata of the bucket after successful creation. We’re ready to upload and manage files in the object store.

Upload your first file with oci cli

You can upload a single file very easily with the oci command line interface. And, as promised before, you do not even have to remember any OCID in this case.

$ oci os object put --namespace adwcacct --bucket-name myFiles --file /stage/supplier.tbl

Uploading object  [####################################]  100%
{
  "etag": "662649262F5BC72CE053C210C10A4D1D",
  "last-modified": "Mon, 26 Feb 2018 22:50:46 GMT",
  "opc-content-md5": "8irNoabnPldUt72FAl1nvw=="
}

After a successful upload you can check the md5 sum of the file; that's basically the fingerprint proving that the data on the other side (in the cloud) is not corrupt and is the same as the local copy (on the machine the data came from). The only "gotcha" is that OCI uses base64 encoding, so you cannot just compare a plain md5 checksum. The following command solves this for me on my Mac:

$ openssl dgst -md5 -binary supplier.tbl |openssl enc -base64
8irNoabnPldUt72FAl1nvw==

Now that’s a good start. I can use this command in any shell program, like the following which loads all files in a folder sequentially to the object store: 

# Upload every .tbl file in the current directory, one after the other
for i in *.tbl
do
  oci os object put --namespace adwcacct --bucket-name myFiles --file "$i"
done

You can write it to load multiple files in parallel, load only files that match a specific name pattern, etc. You get the idea. Whatever you can do with a shell you can do.
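For example, a minimal sketch of a variant that uploads only files matching a name pattern and runs a few uploads in parallel (the file pattern and the degree of parallelism are arbitrary choices here):

# Upload only lineorder_*.tbl files, four uploads at a time (-P controls parallelism)
ls /stage/lineorder_*.tbl | xargs -P 4 -I{} \
  oci os object put --namespace adwcacct --bucket-name myFiles --file {}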

Alternatively, if it's just about loading all the files in a directory, you can achieve the same with the oci cli by using its bulk upload capability. The following shows this briefly:

oci os object bulk-upload -ns adwcacct -bn myFiles --src-dir /MyStagedFiles

{
  "skipped-objects": [],
  "upload-failures": {},
  "uploaded-objects": {
    "chan_v3.dat": {
      "etag": "674EFB90B1A3CECAE053C210D10AC9D9",
      "last-modified": "Tue, 13 Mar 2018 17:43:28 GMT",
      "opc-content-md5": "/t4LbeOiCz61+Onzi/h+8w=="
    },
    "coun_v3.dat": {
      "etag": "674FB97D50C34E48E053C230C10A1DF8",
      "last-modified": "Tue, 13 Mar 2018 17:43:28 GMT",
      "opc-content-md5": "sftu7G5+bgXW8NEYjFNCnQ=="
    },
    "cust1v3.dat": {
      "etag": "674FB97D52274E48E053C210C10A1DF8",
      "last-modified": "Tue, 13 Mar 2018 17:44:06 GMT",
      "opc-content-md5": "Zv76q9e+NTJiyXU52FLYMA=="
    },
    "sale1v3.dat": {
      "etag": "674FBF063F8C50ABE053C250C10AE3D3",
      "last-modified": "Tue, 13 Mar 2018 17:44:52 GMT",
      "opc-content-md5": "CNUtk7DJ5sETqV73Ag4Aeg=="
    }
  }
}

Uploading a single large file in parallel 

Ok, now we can load one or many files to the object store. But what do you do if you have a single large file that you want to upload? The oci command line offers built-in multi-part loading where you do not need to split the file beforehand: it can (A) transparently split the file into sized parts and (B) control the parallelism of the upload.

$ oci os object put -ns adwcacct -bn myFiles --file lo_aa.tbl --part-size 100 --parallel-upload-count 4

While the load is ongoing you can list all in-progress uploads, but unfortunately without any progress bar; the progress bar is reserved for the initiating session:

$ oci os multipart list -ns adwcacct -bn myFiles
{
  "data":
   [    
    {
      "bucket": "myFiles",
      "namespace": "adwcacct",
      "object": "lo_aa.tbl",
      "time-created": "2018-02-27T01:19:47.439000+00:00",
      "upload-id": "4f04f65d-324b-4b13-7e60-84596d0ef47f"
    }
  ]
}
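If an initiating session dies, such an in-progress upload can be left behind; as a hedged sketch, it can be cleaned up with the upload id reported above (check oci os multipart abort -h for the exact parameter names and any confirmation flags on your CLI version):

$ oci os multipart abort -ns adwcacct -bn myFiles --object-name lo_aa.tbl --upload-id 4f04f65d-324b-4b13-7e60-84596d0ef47f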

While a serial process for a single file gave me somewhere around 35 MB/sec upload throughput on average, the parallel load sped things up quite a bit, so it's definitely cool functionality (note that your mileage will vary and is probably mostly dependent on your Internet/proxy connectivity and bandwidth).

If you're interested in more details about how that works, here is a link from Rachna, who explains the inner workings of this functionality.

Using the Swift REST interface

Now, after having covered the oci utility, let's briefly look into what we can do out of the box, without the need to install anything. Yes, without installing anything you can leverage the REST endpoints of the object storage service. All you need to know is your username/SWIFT password and your environment details, e.g. which region you're uploading to, the account (tenant) and the target bucket.

This is where the real fun starts, and this is where it can become geeky, so we will focus only on the two most important aspects of dealing with files and the object store: uploading and downloading files.

Understanding how to use Openstack Swift REST

File management with REST is just as simple as it is with the oci cli. Similar to the setup of the oci cli, you have to know the basic information about your Cloud account, namely:

  • a user in the cloud account that has the appropriate privileges to work with a bucket in your tenancy. This user also has to be configured to have a SWIFT password (see here how that is done).
  • a bucket in one of the object stores in a region (we are not going to discuss how to use REST to do this). The bucket/region defines the REST endpoint; for example, if you are using the object store in Ashburn, VA, the endpoint is https://swiftobjectstorage.us-ashburn-1.oraclecloud.com.

The URI for accessing your bucket is built as follows:
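As a hedged sketch (all values in angle brackets are placeholders, not taken from the post): the URI combines the regional Swift endpoint, your object storage namespace (the tenant) and the bucket name, and objects can then be uploaded with plain curl.

# General shape of the URI (placeholders in angle brackets):
#   https://swiftobjectstorage.<region>.oraclecloud.com/v1/<namespace>/<bucket>/<object>
#
# Hedged example: upload a file with curl using the Swift credentials described above
# (<username> is your cloud user, <swift-password> the Swift password set for that user)
curl -X PUT -T supplier.tbl \
     -u '<username>:<swift-password>' \
     "https://swiftobjectstorage.us-ashburn-1.oraclecloud.com/v1/<namespace>/myFiles/supplier.tbl"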


Object Store Service operations. Part 1 – Loading data

One of the most common and clear trends in the IT market is Cloud, and one of the most common and clear trends in the Cloud is the Object Store. You may find some introductory information here. Many Cloud providers, including Oracle, assume that the data lifecycle starts with the Object Store:

You land data there and then either read or load it with different services, such as ADWC or BDCS, for example. Oracle has two flavors of the Object Store Service (OSS): OSS on OCI (Oracle Cloud Infrastructure) and OSS on OCI-C (Oracle Cloud Infrastructure Classic).

In this post, I'm going to focus on OSS on OCI-C, mostly because OSS on OCI was perfectly explained by Hermann Baer here and by Rachna Thusoo here.

Upload/Download files.

As in Hermann's blog, I'll focus on the most frequent operations: upload and download. There are multiple ways to do so. For example:

- Oracle Cloud WebUI

- REST API

- FTM CLI tool

- Third-party tools such as CloudBerry

- Big Data Manager (via ODCP)

- Hadoop client with Swift API

- Oracle Storage Software Appliance

Let's start with the easiest one – the web interface.

Upload/Download files. WebUI.

For sure, you have to start by logging in to the cloud services:

Then you have to go to the Object Store Service:

After this, drill down into the Service Console and you will see the list of containers within your OSS:

To create a new container (bucket in OCI terminology), simply click on "Create Container" and give it a name:

After it has been created, click on it and use the "Upload object" button:

Click, click again, and here we are – the file is in the container:

Let's try to upload a bigger file… oops, we get an error:

So it seems there is a 5GB limit. Fortunately, there is "Large object upload":

which allows us to upload files bigger than 5GB:

And what about downloading? It's easy: simply click download and the file lands on the local file system.

Upload/Download files. REST API.

The WebUI may be a good way to upload data when a human operates it, but it's not very convenient for scripting. If you want to automate your file uploading, you may use the REST API. You may find all the details regarding the REST API here; alternatively, you may use the script below, which hints at some of the basic commands:

#!/bin/bash
shopt -s expand_aliases

alias echo="echo -e"

USER="alexey.filanovskiy@oracle.com"
PASS="MySecurePassword"

OSS_USER="storage-a424392:${USER}"
OSS_PASS="${PASS}"
OSS_URL="https://storage-a424392.storage.oraclecloud.com/auth/v1.0"

# Request an auth token from the OSS authentication endpoint (retry until it succeeds)
echo "curl -k -sS -H \"X-Storage-User: ${OSS_USER}\" -H \"X-Storage-Pass:${OSS_PASS}\" -i \"${OSS_URL}\""
out=`curl -k -sS -H "X-Storage-User: ${OSS_USER}" -H "X-Storage-Pass:${OSS_PASS}" -i "${OSS_URL}"`
while [ $? -ne 0 ]; do
        echo "Retrying to get token\n"
        sleep 1;
        out=`curl -k -sS -H "X-Storage-User: ${OSS_USER}" -H "X-Storage-Pass:${OSS_PASS}" -i "${OSS_URL}"`
done

# Parse the token and the account storage URL out of the response headers
AUTH_TOKEN=`echo "${out}" | grep "X-Auth-Token" | sed 's/X-Auth-Token: //;s/\r//'`
STORAGE_TOKEN=`echo "${out}" | grep "X-Storage-Token" | sed 's/X-Storage-Token: //;s/\r//'`
STORAGE_URL=`echo "${out}" | grep "X-Storage-Url" | sed 's/X-Storage-Url: //;s/\r//'`

echo "Token and storage URL:"
echo "\tOSS url:       ${OSS_URL}"
echo "\tauth token:    ${AUTH_TOKEN}"
echo "\tstorage token: ${STORAGE_TOKEN}"
echo "\tstorage url:   ${STORAGE_URL}"

# List the containers in the account
echo "\nContainers:"
for CONTAINER in `curl -k -sS -u "${USER}:${PASS}" "${STORAGE_URL}"`; do
        echo "\t${CONTAINER}"
done

FILE_SIZE=$((1024*1024*1))
CONTAINER="example_container"
FILE="file.txt"
LOCAL_FILE="./${FILE}"
FILE_AT_DIR="/path/file.txt"
LOCAL_FILE_AT_DIR=".${FILE_AT_DIR}"
REMOTE_FILE="${CONTAINER}/${FILE}"
REMOTE_FILE_AT_DIR="${CONTAINER}${FILE_AT_DIR}"

# Create small test files filled with random characters if they do not exist yet
for f in "${LOCAL_FILE}" "${LOCAL_FILE_AT_DIR}"; do
        if [ ! -e "${f}" ]; then
                echo "\nInfo: File ${f} does not exist. Creating ${f}"
                d=`dirname "${f}"`
                mkdir -p "${d}";
                tr -dc A-Za-z0-9 </dev/urandom | head -c "${FILE_SIZE}" > "${f}"
                #dd if="/dev/random" of="${f}" bs=1 count=0 seek=${FILE_SIZE} &> /dev/null
        fi;
done;

# Print ready-to-run curl commands for the most common operations
echo "\nActions:"

echo "\tListing containers:\t\t\t\tcurl -k -vX GET -u \"${USER}:${PASS}\" \"${STORAGE_URL}/\""
echo "\tCreate container \"oss://${CONTAINER}\":\t\tcurl -k -vX PUT -u \"${USER}:${PASS}\" \"${STORAGE_URL}/${CONTAINER}\""
echo "\tListing objects at container \"oss://${CONTAINER}\":\tcurl -k -vX GET -u \"${USER}:${PASS}\" \"${STORAGE_URL}/${CONTAINER}/\""

echo "\n\tUpload \"${LOCAL_FILE}\" to \"oss://${REMOTE_FILE}\":\tcurl -k -vX PUT -T \"${LOCAL_FILE}\" -u \"${USER}:${PASS}\" \"${STORAGE_URL}/${CONTAINER}/\""
echo "\tDownload \"oss://${REMOTE_FILE}\" to \"${LOCAL_FILE}\":\tcurl -k -vX GET -u \"${USER}:${PASS}\" \"${STORAGE_URL}/${REMOTE_FILE}\" > \"${LOCAL_FILE}\""

echo "\n\tDelete \"oss://${REMOTE_FILE}\":\tcurl -k -vX DELETE -u \"${USER}:${PASS}\" \"${STORAGE_URL}/${REMOTE_FILE}\""

echo "\ndone"

I put the content of this script into a file called oss_operations.sh, give it execute permission and run it:

$ chmod +x oss_operations.sh
$ ./oss_operations.sh

The output will look like:

curl -k -sS -H "X-Storage-User: storage-a424392:alexey.filanovskiy@oracle.com" -H "X-Storage-Pass:MySecurePassword" -i "https://storage-a424392.storage.oraclecloud.com/auth/v1.0"
Token and storage URL:
        OSS url:       https://storage-a424392.storage.oraclecloud.com/auth/v1.0
        auth token:    AUTH_tk45d49d9bcd65753f81bad0eae0aeb3db
        storage token: AUTH_tk45d49d9bcd65753f81bad0eae0aeb3db
        storage url:   https://storage.us2.oraclecloud.com/v1/storage-a424392

Containers:
        123_OOW17
        1475233258815
        1475233258815-segments
        Container
...
Actions:
        Listing containers:                             curl -k -vX GET -u "alexey.filanovskiy@oracle.com:MySecurePassword" "https://storage.us2.oraclecloud.com/v1/storage-a424392/"
        Create container "oss://example_container":             curl -k -vX PUT -u "alexey.filanovskiy@oracle.com:MySecurePassword" "https://storage.us2.oraclecloud.com/v1/storage-a424392/example_container"
        Listing objects at container "oss://example_container": curl -k -vX GET -u "alexey.filanovskiy@oracle.com:MySecurePassword" "https://storage.us2.oraclecloud.com/v1/storage-a424392/example_container/"

        Upload "./file.txt" to "oss://example_container/file.txt":      curl -k -vX PUT -T "./file.txt" -u "alexey.filanovskiy@oracle.com:MySecurePassword" "https://storage.us2.oraclecloud.com/v1/storage-a424392/example_container/"
        Download "oss://example_container/file.txt" to "./file.txt":    curl -k -vX GET -u "alexey.filanovskiy@oracle.com:MySecurePassword" "https://storage.us2.oraclecloud.com/v1/storage-a424392/example_container/file.txt" > "./file.txt"

        Delete "oss://example_container/file.txt":      curl -k -vX DELETE -u "alexey.filanovskiy@oracle.com:MySecurePassword" "https://storage.us2.oraclecloud.com/v1/storage-a424392/example_container/file.txt"

Upload/Download files. FTM CLI.

The REST API may seem a bit cumbersome and quite hard to use. The good news is that there is an intermediate solution, a command line interface – FTM CLI. Again, the full documentation is available here, but I'd like to briefly explain what you can do with FTM CLI. You can download it here and, after unpacking, it's ready to use!

$ unzip ftmcli-v2.4.2.zip
...
$ cd ftmcli-v2.4.2
$ ls -lrt
total 120032
-rwxr-xr-x 1 opc opc      1272 Jan 29 08:42 README.txt
-rw-r--r-- 1 opc opc  15130743 Mar  7 12:59 ftmcli.jar
-rw-rw-r-- 1 opc opc 107373568 Mar 22 13:37 file.txt
-rw-rw-r-- 1 opc opc       641 Mar 23 10:34 ftmcliKeystore
-rw-rw-r-- 1 opc opc       315 Mar 23 10:34 ftmcli.properties
-rw-rw-r-- 1 opc opc    373817 Mar 23 15:24 ftmcli.log

You may note that there is a file called ftmcli.properties; it may simplify your life if you configure it once. You may find the documentation here, and here is my example of this config:

$ cat ftmcli.properties
#saving authkey
#Fri Mar 30 21:15:25 UTC 2018
rest-endpoint=https\://storage-a424392.storage.oraclecloud.com/v1/storage-a424392
retries=5
user=alexey.filanovskiy@oracle.com
segments-container=all_segments
max-threads=15
storage-class=Standard
segment-size=100

Now we have all the connection details and we can use the CLI. There are a few basic commands available with FTM CLI, but as a first step I'd suggest authenticating the user (enter the password once):

$ java -jar ftmcli.jar list --save-auth-key
Enter your password:

If you use --save-auth-key, it will save your password and will not ask you for it next time:

$ java -jar ftmcli.jar list
123_OOW17
1475233258815
...

You may refer to the documentation for the full list of commands, or simply run FTM CLI without any arguments:

$ java -jar ftmcli.jar
...
Commands:
upload            Upload a file or a directory to a container.
download          Download an object or a virtual directory from a container.
create-container  Create a container.
restore           Restore an object from an Archive container.
list              List containers in the account or objects in a container.
delete            Delete a container in the account or an object in a container.
describe          Describes the attributes of a container in the account or an object in a container.
set               Set the metadata attribute(s) of a container in the account or an object in a container.
set-crp           Set a replication policy for a container.
copy              Copy an object to a destination container.

Let's try to accomplish the standard flow for OSS – create a container, upload a file there, list objects in the container, describe container properties and delete it.

# Create container
$ java -jar ftmcli.jar create-container container_for_blog
                  Name: container_for_blog
          Object Count: 0
            Bytes Used: 0
         Storage Class: Standard
         Creation Date: Fri Mar 30 21:50:15 UTC 2018
         Last Modified: Fri Mar 30 21:50:14 UTC 2018
Metadata
---------------
x-container-write: a424392.storage.Storage_ReadWriteGroup
x-container-read: a424392.storage.Storage_ReadOnlyGroup,a424392.storage.Storage_ReadWriteGroup
content-type: text/plain;charset=utf-8
accept-ranges: bytes
Custom Metadata
---------------
x-container-meta-policy-georeplication: container

# Upload file to container
$ java -jar ftmcli.jar upload container_for_blog file.txt
Uploading file: file.txt to container: container_for_blog
File successfully uploaded: file.txt
Estimated Transfer Rate: 16484KB/s

# List files into Container
$ java -jar ftmcli.jar list container_for_blog
file.txt

# Get Container Metadata
$ java -jar ftmcli.jar describe container_for_blog
                  Name: container_for_blog
          Object Count: 1
            Bytes Used: 434
         Storage Class: Standard
         Creation Date: Fri Mar 30 21:50:15 UTC 2018
         Last Modified: Fri Mar 30 21:50:14 UTC 2018

Metadata
---------------
x-container-write: a424392.storage.Storage_ReadWriteGroup
x-container-read: a424392.storage.Storage_ReadOnlyGroup,a424392.storage.Storage_ReadWriteGroup
content-type: text/plain;charset=utf-8
accept-ranges: bytes

Custom Metadata
---------------
x-container-meta-policy-georeplication: container

# Delete container
$ java -jar ftmcli.jar delete container_for_blog
ERROR:Delete failed. Container is not empty.

# Delete with force option
$ java -jar ftmcli.jar delete -f container_for_blog
Container successfully deleted: container_for_blog

Another great thing about FTM CLI is that it lets you easily manage upload performance out of the box. In ftmcli.properties there is a property called "max-threads". It may vary between 1 and 100. Here is a test case illustrating this:

-- Generate a 10GB file
$ dd if=/dev/zero of=file.txt count=10240 bs=1048576

-- Upload the file in one thread (around an 18MB/sec rate)
$ java -jar ftmcli.jar upload container_for_blog /home/opc/file.txt
Uploading file: /home/opc/file.txt to container: container_for_blog
File successfully uploaded: /home/opc/file.txt
Estimated Transfer Rate: 18381KB/s

-- Change the number of threads from 1 to 99 in the config file
$ sed -i -e 's/max-threads=1/max-threads=99/g' ftmcli.properties

-- Upload the file in 99 threads (around a 68MB/sec rate)
$ java -jar ftmcli.jar upload container_for_blog /home/opc/file.txt
Uploading file: /home/opc/file.txt to container: container_for_blog
File successfully uploaded: /home/opc/file.txt
Estimated Transfer Rate: 68449KB/s

So, it's a very simple and at the same time powerful tool for operations with the Object Store; it may help you with scripting those operations.
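For example, a small sketch that wraps the upload command shown above in a loop to push every file from a staging directory (the directory and file pattern are arbitrary here):

# Upload every .dat file in a staging directory with FTM CLI, one after the other
for f in /home/opc/stage/*.dat
do
  java -jar ftmcli.jar upload container_for_blog "$f"
done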

Upload/Download files. CloudBerry.

Another way to interact with OSS is to use an application; for example, you may use CloudBerry Explorer for OpenStack Storage. There is a great blog post which explains how to configure CloudBerry for Oracle Object Store Service Classic, so I will start from the point where I have already configured it. When you log in, it looks like this:

You may easily create a container in CloudBerry:

And for sure you may easily copy data from your local machine to OSS:

There's nothing to add here: CloudBerry is a convenient tool for browsing Object Stores and doing small copies between the local machine and OSS. For me personally, it looks like Total Commander for OSS.

Upload/Download files. Big Data Manager and ODCP.

Big Data Cloud Service (BDCS) has a great component called Big Data Manager. This is a tool developed by Oracle which allows you to manage and monitor a Hadoop cluster. Among other features, Big Data Manager (BDM) allows you to register an Object Store in the Stores browser and easily drag and drop data between OSS and other sources (Database, HDFS…). When you copy data to/from HDFS you use ODCP, an optimized version of the Hadoop DistCp tool.

This is a very fast way to copy data back and forth. Fortunately, JP already wrote about this feature, so I can simply give a link. If you want to see concrete performance numbers, you can go here to the A-Team blog page.

Without Big Data Manager, you can manually register OSS on a Linux machine and invoke the copy command from bash. The documentation shows you all the details; I will show just one example:

# add account:
$ export CM_ADMIN=admin
$ export CM_PASSWORD=SuperSecurePasswordCloderaManager
$ export CM_URL=https://cfclbv8493.us2.oraclecloud.com:7183
$ bda-oss-admin add_swift_cred --swift-username "storage-a424392:alexey.filanovskiy@oracle.com" --swift-password "SecurePasswordForSwift" --swift-storageurl "https://storage-a424392.storage.oraclecloud.com/auth/v2.0/tokens" --swift-provider bdcstorage
# list of credentials:
$ bda-oss-admin list_swift_creds
Provider: bdcstorage
    Username: storage-a424392:alexey.filanovskiy@oracle.com
    Storage URL: https://storage-a424392.storage.oraclecloud.com/auth/v2.0/tokens
# check files on OSS swift://[container name].[Provider created step before]/:
$ hadoop fs -ls swift://alextest.bdcstorage/
18/03/31 01:01:13 WARN http.RestClientBindings: Property fs.swift.bdcstorage.property.loader.chain is not set
Found 3 items
-rw-rw-rw-   1  279153664 2018-03-07 00:08 swift://alextest.bdcstorage/bigdata.file.copy
drwxrwxrwx   -          0 2018-03-07 00:31 swift://alextest.bdcstorage/customer
drwxrwxrwx   -          0 2018-03-07 00:30 swift://alextest.bdcstorage/customer_address

Now you have OSS configured and ready to use. You may copy data with ODCP; here you may find the entire list of sources and destinations. For example, if you want to copy data from HDFS to OSS, you run:

$ odcp hdfs:///tmp/file.txt swift://alextest.bdcstorage/

ODCP is a very efficient way to move data from HDFS to Object Store and back.

If you are from the Hadoop world and are used to the Hadoop fs API, you may use it with the Object Store as well (after configuring it). For example, to load data into OSS, you run:

$ hadoop fs -put /home/opc/file.txt swift://alextest.bdcstorage/file1.txt
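Downloading works the same way with the standard Hadoop fs commands; a minimal sketch using the provider configured above:

# List the container and pull a file back to the local filesystem
$ hadoop fs -ls swift://alextest.bdcstorage/
$ hadoop fs -get swift://alextest.bdcstorage/file1.txt /home/opc/file1.copy.txt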

Upload/Download files. Oracle Storage Cloud Software Appliance.

The Object Store is a fairly new concept, and for sure there are ways to smooth the migration. Years ago, when HDFS was new and undiscovered, many people didn't know how to work with it, and a few technologies such as NFS Gateway and HDFS-fuse appeared. Both allowed you to mount HDFS on a Linux filesystem and work with it like a normal filesystem. The Oracle Cloud Infrastructure Storage Software Appliance allows something similar. You can find all the documentation here, a brief video here, and the software download here. In this blog I just show one example of its usage. This picture will help me explain how the Storage Cloud Software Appliance works:

You may see that the customer needs to install an on-premises Docker container, which contains all the required stack. I'll skip the details, which you may find in the documentation above, and will just show the concept.

# Check oscsa status
[on-prem client] $ oscsa info
Management Console: https://docker.oracleworld.com:32769
If you have already configured an OSCSA FileSystem via the Management Console,
you can access the NFS share using the following port.

NFS Port: 32770

Example: mount -t nfs -o vers=4,port=32770 docker.oracleworld.com:/ /local_mount_point

# Run oscsa
[on-prem client] $ oscsa up

There (on the Docker image, which you deploy on some on-premises machine) you can find a WebUI where you can configure the Storage Appliance:

After login, you see a list of configured Object Stores:

In this console you may connect the linked container with this on-premises host:

After it has been connected, you will see the option "disconnect":

After you connect a device, you have to mount it:

[on-prem client] $ sudo mount -t nfs -o vers=4,port=32770 localhost:/devoos /oscsa/mnt
[on-prem client] $ df -h|grep oscsa
localhost:/devoos  100T  1.0M  100T   1% /oscsa/mnt

Now you can upload a file into the Object Store:

[on-prem client] $ echo "Hello Oracle World" > blog.file
[on-prem client] $ cp blog.file /oscsa/mnt/

This is an asynchronous copy to the Object Store, so after a while you will be able to find the file there:

The only restriction, which I wasn't able to overcome, is that the filename changes during the copy.

Conclusion.

The Object Store is here and it will become more and more popular. This means there is no way to escape it, and you have to get familiar with it. The blog post above showed that there are multiple ways to deal with it, starting from user-friendly tools (like CloudBerry) and ending with the low-level REST API.


Big Data SQL Quick Start. Correlate real-time data with historical benchmarks – Part 24

In Big Data SQL 3.2 we introduced a new capability – Kafka as a data source. I've posted some details about how it works, with some simple examples, over here. But now I want to talk about why you would want to run queries over Kafka. Here is Oracle's concept picture of a data warehouse:

You have some stream (real-time) data, a data lake where you land raw information, and cleaned enterprise data. This is just a concept, which could be implemented in many different ways; one of them is depicted here:

Kafka is the hub for streaming events, where you accumulate data from multiple real-time producers and provide this data to many consumers (it could be real-time processing, such as Spark Streaming, or you could load data in batch mode to the next data warehouse tier, such as Hadoop).

In this architecture, Kafka contains stream data and is able to answer the question "what is going on right now", whereas the database stores operational data and Hadoop stores historical data; those two sources answer the question "how it used to be". Big Data SQL allows you to run SQL over those three sources and correlate real-time events with historical ones.

Example of using Big Data SQL over Kafka and other sources.

Above I've explained the concept of why you may need to query Kafka with Big Data SQL; now let me give a concrete example.

Input for the demo example:

- We have a company called MoviePlex, which sells video content all around the world

- There are two stream datasets – network data, which contains information about network errors, conditions of routing devices and so on, and the facts of movie sales

- Both are streamed in real time into Kafka

- Also, we have historical network data, which we store in HDFS (because of the cost of storing this data), historical sales data (which we store in the database) and multiple dimension tables, stored in the RDBMS as well.

Based on this we have a business case – monitor the revenue flow, correlate current traffic with the historical benchmark (depending on the day of the week and hour of the day) and try to find the reason in case of failures (network errors, for example).

Using Oracle Data Visualization Desktop, we've created a dashboard which shows how real-time traffic correlates with the statistical benchmark and also shows the number of network errors by country:

The blue line is a historical benchmark.

Over time we see that some errors appear in some countries (left dashboard), but current revenue is more or less the same as it used to be.

After a while revenue starts going down.

This trend keeps going.

A lot of network errors in France. Let’s drill down into itemized traffic:

Indeed, we caught that overall revenue goes down because of France, and the cause of this is network errors.

Conclusion:

1) Kafka stores real-time data and answers the question "what is going on right now"

2) The database and Hadoop store historical data and answer the question "how it used to be"

3) Big Data SQL can query data from Kafka, Hadoop and the database within a single query (joining the datasets)

4) This allows us to correlate historical benchmarks with real-time data through a SQL interface and use it with any SQL-compatible BI tool


Review of Big Data Warehousing at OpenWorld 2017 – Now Available


Did you miss OpenWorld 2017? Then my latest book is definitely something you will want to download! If you went to OpenWorld this book is also for you because it covers all the most important big data warehousing messages and sessions during the five days of OpenWorld.

Following on from OpenWorld 2017 I have put together a comprehensive review of all the big data warehousing content from OpenWorld 2017. This includes all the key sessions and announcements from this year’s Oracle OpenWorld conference. This review guide contains the following information:

Chapter 1 Welcome – an overview of the contents.  

Chapter 2 Let’s Go Autonomous - containing all you need to know about Oracle’s new, fully-managed Autonomous Data Warehouse Cloud. This was the biggest announcement at OpenWorld so this chapter contains videos, presentations and podcasts to get you up to speed on this completely new data warehouse cloud service.

Chapter 3 Keynotes – Relive OpenWorld 2017 by watching the most important highlights from this year’s OpenWorld conference with our on demand video service which covers all the major keynote sessions.

Chapter 4 Key Presenters – a list of the most important speakers by product area such as database, cloud, analytics, developer and big data. Each biography includes all relevant social media sites and pages.

Chapter 5 Key Sessions - a list of all the most important sessions with links to download the related presentations organized

Chapter 6 Staying Connected – Details of all the links you need to keep up to date on Oracle’s strategy and products for Data Warehousing and Big Data.  This covers all our websites, blogs and social media pages.

This review is available in three formats:

1) For highly evolved users, i.e. Apple users, who understand the power of Apple’s iBook format, your multi-media enabled iBook version is available here.

2) For Windows users who are forced to endure a 19th-Century style technological experience, your PDF version is available here.

3) For Linux users, Oracle DBAs and other IT dinosaurs, all of whom are allergic to all graphical user interfaces, the basic version of this comprehensive review is available here.

I hope you enjoy this review and look forward to seeing you next year at OpenWorld 2018, October 28 to November 1. If you’d like to be notified when registration opens for next year’s Oracle OpenWorld then register your email address here.
 


New Release: BDA 4.10 is now Generally Available

As of today, BDA version 4.10 is Generally Available. As always, please refer to If You Struggle With Keeping your BDAs up to date, Then Read This to learn about the innovative release process we follow for BDA software.

This new release includes a number of features and updates:

  • Support for Migration From Oracle Linux 5 to Oracle Linux 6 - Clusters on Oracle Linux 5 must first be upgraded to v4.10.0 on Oracle Linux 5 and can then be migrated to Oracle Linux 6. This process must be done one server at a time. HDFS data and Cloudera Manager roles are retained. Please review the documentation for the entire process carefully before starting.

    • BDA v4.10 is the last release built for Oracle Linux 5 and no further upgrades for Oracle Linux 5 will be released.
  • Updates to NoSQL DB, Big Data Connectors, Big Data Spatial & Graph
    • Oracle NoSQL Database 4.5.12
    • Oracle Big Data Connectors 4.10.0
    • Oracle Big Data Spatial & Graph 2.4.0
  • Support for Oracle Big Data Appliance X7 systems – Oracle Big Data Appliance X7 is based on the X7-2L server. The major enhancements in Big Data Appliance X7-2 hardware are:

    • CPU update: 2 24-core Intel Xeon processors
    • Updated disk drives: 12 10TB 7,200 RPM SAS drives
    • 2 M.2 150GB SATA SSD drives (replacing the internal USB drive)
    • Vail Disk Controller (HBA)
    • Cisco 93108TC-EX-1G Ethernet switch (replacing the Catalyst 4948E).
  • Spark 2 Deployed by Default – Spark 2 is now deployed by default on new clusters and also during upgrade of clusters where it is not already installed.
  • Oracle Linux 7 can be Installed on Edge Nodes – Oracle Linux 7 is now supported for installation on Oracle Big Data Appliance edge nodes running on X7-2L, X6-2L or X5-2L servers. Support for Oracle Linux 7 in this release is limited to edge nodes.
  • Support for Cloudera Data Science Workbench – Support for Oracle Linux 7 on edge nodes provides a way for customers to host Cloudera Data Science Workbench (CDSW) on Oracle Big Data Appliance. CDSW is a web application that enables access from a browser to R, Python, and Scala on a secured cluster. Oracle Big Data Appliance does not include licensing or official support for CDSW. Contact Cloudera for licensing requirements.
     
  • Scripts for Download & Configuration of Apache Zeppelin, Jupyter Notebook, and RStudio –  This release includes scripts to assist in download and configuration of these commonly used tools. The scripts are provided as a convenience to users. Oracle Big Data Appliance does not include official support for the installation and use of Apache Zeppelin, Jupyter Notebook, or RStudio.
     
  • Improved Configuration of Oracle's R Distribution and ORAAH – For these tools, much of the environment configuration that was previously done by the customer is now automated.
  • Node Migration Optimization – Node migration time has been improved by eliminating some steps.
  • Support for Extending Secure NoSQL DB clusters

This release is based on Cloudera Enterprise (CDH 5.12.1 & Cloudera Manager 5.12.1) as well as Oracle NoSQL Database (4.5.12).

  • Cloudera 5 Enterprise includes CDH (Core Hadoop), Cloudera Manager, Apache Spark, Apache HBase, Impala, Cloudera Search and Cloudera Navigator
  • The BDA continues to support all security options for CDH Hadoop clusters: Kerberos authentication – MIT or Microsoft Active Directory, Sentry Authorization, HTTPS/Network encryption, Transparent HDFS Disk Encryption, Secure Configuration for Impala, HBase, Cloudera Search and all Hadoop services configured out-of-the-box.
  • Parcels for Kafka 2.2, Spark 2.2, Kudu 1.4 and Key Trustee Server 5.12 are included in the BDA Software Bundle


OpenWorld 2017: Must-See Sessions for Day 3 – Tuesday

Day 3, Tuesday, is here and this is my definitive list of Must-See sessions for today. Today we are focused on the new features in Oracle Database 18c – multitenant, in-memory, Oracle Text, machine learning, Big Data SQL etc etc. These sessions are what Oracle OpenWorld is all about: the chance to learn about the latest technology from the real technical experts.

MONDAY’s MUST-SEE GUIDE

Don’t worry if you are not able to join us in San Francisco for this year’s conference because I will be providing a comprehensive review after the conference closes on Thursday.

The review will include links to download the presentations for each of my Must-See sessions and links to any hands-on lab content as well.

Have a great conference.

If you are here in San Francisco then enjoy the conference – it’s going to be an awesome conference this year.

Don’t forget to make use of our Big DW #oow17 smartphone app which you can access by pointing your phone at this QR code:


OpenWorld 2017 – Must-See Sessions for Day 1


It all starts today –  OpenWorld 2017. Each day I will provide you with a list of must-see sessions and hands-on labs. This is going to be one of the most exciting OpenWorlds ever!

Today is Day 1, so here is my definitive list of Must-See sessions for the opening day. The list is packed full of really excellent speakers such as Franck Pachot, Ami Aharonovich, Galo Balda and Rich Niemiec. These sessions are what Oracle OpenWorld is all about: the chance to learn from the real technical experts.

Of course you need to end your first day in Moscone North Hall D for Larry Ellison's welcome keynote – it's going to be a great one!
 

SUNDAY’S MUST-SEE GUIDE

Don’t worry if you are not able to join us in San Francisco for this year’s conference because I will be providing a comprehensive review after the conference closes on Thursday.

The review will include links to download the presentations for each of my Must-See sessions and links to any hands-on lab content as well. Have a great conference.

If you are here in San Francisco then enjoy the conference – it’s going to be an awesome conference this year.

Don’t forget to make use of our Big DW #oow17 smartphone app which you can access by pointing your phone at this QR code:
 



UPDATED: Big Data Warehousing Must See Guide for Oracle OpenWorld 2017


*** UPDATED *** Must-See Guide now available as PDF and via Apple iBooks Store

This updated version now contains details of all the most important hands-on labs AND a day-by-day calendar. This means that our comprehensive guide now covers absolutely everything you need to know about this year’s Oracle OpenWorld conference. Now, when you arrive at Moscone Conference Center you are ready to get the absolute most out of this amazing conference.

The updated, and still completely free, big data warehousing Must-See guide for OpenWorld 2017 is now available for download from the Apple iBooks Store – click here – and in PDF format – click here.

Just so you know…this guide contains the following information:

Chapter 1 – Introduction to the must-see guide.

Chapter 2 – A guide to the key highlights from last year's conference so you can relive the experience or see what you missed. Catch the most important highlights from last year's OpenWorld conference with our on demand video service which covers all the major keynote sessions. Sit back and enjoy the highlights. The second section explains why you need to attend this year's conference and how to justify it to your company.

Chapter 3 – Full list of Oracle Product Management and Development presenters who will be at this year's OpenWorld. Links to all their social media sites are included alongside each profile. Read on to find out about the key people who can help you and your teams build the FUTURE using Oracle's Data Warehouse and Big Data technologies.

Chapter 4 – List of the "must-see" sessions and hands-on labs at this year's OpenWorld by category. It includes all the sessions and hands-on labs by the Oracle Product Management and Development teams along with key customer sessions. Read on for the list of the best, most innovative sessions at Oracle OpenWorld 2017.

Chapter 5 – Day-by-day "must-see" guide. It includes all the sessions and hands-on labs by the Oracle Product Management and Development teams along with key customer sessions. Read on for the list of the best, most innovative sessions at Oracle OpenWorld 2017.

Chapter 6 – Details of all the links you need to keep up to date on Oracle's strategy and products for Data Warehousing and Big Data. This covers all our websites, blogs and social media pages.

Chapter 7 – Details of our exclusive web application for smartphones and tablets, which provides you with a complete guide to everything related to data warehousing and big data at OpenWorld 2017.

Chapter 8 – Information to help you find your way around the area surrounding the Moscone Conference Center; this section includes some helpful maps.

Let me know if you have any comments. Enjoy and see you in San Francisco.

