Tag Archives: Start

Start Planning for Black Friday and Cyber Monday 2018 Now


Black Friday and Cyber Monday are about to hit yet again. Last year, 154 million people shopped, up from 2015, and the event is only predicted to get bigger. It'll be interesting to see how 2017 shapes up, but in the meantime, there are some things you can do as a retailer, for both brick-and-mortar and online stores, to start planning for next year and beyond.

How to handle the crush of shoppers

Every retailer’s biggest focus and investment, for both e-commerce and brick-and-mortar, needs to be on scaling. If your backend falls apart because of a crush of traffic, everything you set up will be for naught. You need a system that can handle the large traffic load and the pounding of people hitting refresh and trampling through your store.

Moving to the cloud, or at least hybridizing part of your environment, prior to Black Friday must be a priority. The cloud allows you to rapidly scale your operations based on traffic load, at a significant cost savings over running your datacenter in-house. It's a key part of keeping your network flexible, agile, and scalable.

For brick-and-mortar stores, the key to increasing revenue is optimizing the movement of people in and out of the store. The traffic flooding into these stores on Black Friday can represent several multiples of the typical daily traffic. In order to be successful, the floor staff must be equipped with real-time information about inventory levels, customer flows, in-store deal announcements, and answers to the questions asked by panicked, manic shoppers.

The best way to arm staff is to supply mobile devices that enable them to communicate with each other and provide that real-time information at their fingertips. You can even provide POS systems directly on those devices so they can check people out right in the middle of the store, significantly reducing the crush at the front counters.

Brick-and-mortar retailers must also stand out from the noise of their competitors beyond just giving excellent customer service. To differentiate, they must offer compelling real-time deals that draw crowds and satisfy shoppers' needs. But customers must receive this information in a timely manner. The same digital systems that keep floor staff up to date can help customers determine their Black Friday game plan. With the right analytics in place offering a 360-degree view of the customer, you can provide personalized deals and recommendations that compel shoppers to prioritize your shops on their Black Friday agenda.

How to make the online experience as exciting as the in-store experience

Online goes a bit differently. Shoppers may plan their Cyber Monday or Black Friday based on email newsletters and ads they’ve received, but there’s a ton of competition. Sending real-time offers and encouraging shoppers to subscribe to deal alerts throughout the day can help you stand out from the noise and leverage the loyalty you’ve built throughout the rest of the year.

Online retailers can leverage the same technologies driving in-store personalization to send customized deals and ads to consumers based on their shopping habits, with the help of products like analytics and integration. Researchers have found that consumers who receive more targeted ads are more than twice as likely to buy the advertised product as consumers who see non-targeted ads.

The same systems can identify shopping trends in real time, allowing for dynamic pricing and deal creation based on predicted shopping trends and inventory levels. This can help dramatically decrease the logistical headaches that come with coordinating your Black Friday and Cyber Monday sales while offering deals that drive purchasing behavior without discounting so deeply that you lose profits.
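To make the idea concrete, here is a minimal sketch of a dynamic-discount rule driven by inventory and predicted demand. The function and its thresholds are hypothetical illustrations, not part of any product mentioned here:

```python
def dynamic_discount(inventory, expected_demand,
                     base_discount=0.10, max_discount=0.40):
    """Hypothetical rule: deepen the discount when stock outpaces
    predicted demand, and shrink it when inventory is scarce."""
    if expected_demand <= 0:
        return max_discount          # no predicted demand: clear the stock
    ratio = inventory / expected_demand
    if ratio > 1.5:                  # overstocked relative to demand
        return max_discount
    if ratio < 0.5:                  # scarce: protect the margin
        return base_discount / 2
    return base_discount

print(dynamic_discount(inventory=900, expected_demand=300))   # overstocked item
print(dynamic_discount(inventory=100, expected_demand=300))   # scarce item
```

In practice the `expected_demand` input would come from the trend-prediction analytics described above, recomputed throughout the day.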

To see how TIBCO is helping retailers digitally transform, download the whitepaper: The Digital Challenge in Retail or visit our Retail page.


The TIBCO Blog

Big Data SQL Quick Start. Correlate real-time data with historical benchmarks – Part 24

In Big Data SQL 3.2 we introduced a new capability: Kafka as a data source. I've posted some details about how it works, with some simple examples, over here. But now I want to talk about why you would want to run queries over Kafka. Here is Oracle's concept picture of a data warehouse:

You have a stream (real-time data), a data lake where you land raw information, and cleaned enterprise data. This is just a concept, which could be implemented in many different ways; one of them is depicted here:

Kafka is the hub for streaming events, where you accumulate data from multiple real-time producers and provide this data to many consumers (this could be real-time processing, such as Spark Streaming, or you could load the data in batch mode to the next data warehouse tier, such as Hadoop).

In this architecture, Kafka contains stream data and is able to answer the question "what is going on right now," whereas the database stores operational data and Hadoop stores historical data; those two sources answer the question "how it used to be." Big Data SQL allows you to run SQL over all three sources and correlate real-time events with historical ones.

An example of using Big Data SQL over Kafka and other sources.

Above, I explained why you might need to query Kafka with Big Data SQL; now let me give a concrete example.

Input for demo example:

- We have a company called MoviePlex, which sells video content all around the world.

- There are two streaming datasets: network data, which contains information about network errors, the condition of routing devices, and so on; and the facts of movie sales.

- Both stream into Kafka in real time.

- Also, we have historical network data, which we store in HDFS (because of the cost of storing this data), historical sales data (which we store in the database), and multiple dimension tables, stored in the RDBMS as well.

Based on this, we have a business case: monitor the revenue flow, correlate current traffic with the historical benchmark (depending on the day of the week and hour of the day), and try to find the reason for any failures (network errors, for example).
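The core of that correlation can be sketched in a few lines: index the historical benchmark by day of week and hour of day, then measure how far current revenue deviates from it. The numbers below are made up for illustration; they are not the MoviePlex data:

```python
from datetime import datetime

# Toy historical benchmarks keyed by (weekday, hour); weekday 4 = Friday.
benchmark = {(4, 20): 1000.0, (4, 21): 1200.0}

def deviation(ts: datetime, revenue: float) -> float:
    """Fractional deviation of current revenue from the historical
    benchmark for the same day of week and hour of day."""
    expected = benchmark[(ts.weekday(), ts.hour)]
    return (revenue - expected) / expected

# Friday, 20:30 -- revenue is 30% below the benchmark for that slot.
print(deviation(datetime(2017, 11, 24, 20, 30), 700.0))
```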

Using Oracle Data Visualization Desktop, we created a dashboard that shows how real-time traffic correlates with the statistical benchmark and also shows the number of network errors by country:

The blue line is a historical benchmark.

Over time we see that some errors appear in certain countries (left dashboard), but current revenue is more or less the same as it used to be.

After a while, revenue starts going down.

This trend keeps going.

There are a lot of network errors in France. Let's drill down into itemized traffic:

Indeed, we caught that overall revenue went down because of France, and the cause was network errors.


1) Kafka stores real-time data and answers the question "what is going on right now."

2) The database and Hadoop store historical data and answer the question "how it used to be."

3) Big Data SQL can query data from Kafka, Hadoop, and the database within a single query (joining the datasets).

4) This allows us to correlate historical benchmarks with real-time data through a SQL interface and use the result with any SQL-compatible BI tool.


Oracle Blogs | Oracle The Data Warehouse Insider Blog

Big Data SQL Quick Start. Big Data SQL over Kafka – Part 23

Big Data SQL 3.2 brings a few interesting features. Among them, one of the most interesting is the ability to read Kafka. Before drilling down into the details, I'd like to explain in a nutshell what Kafka is.

What is Kafka?

You may find the full scope of information about Kafka here, but in a nutshell, it's a distributed, fault-tolerant messaging system. It allows you to connect many systems in an organized fashion. Instead of connecting each system peer to peer:

you may land all your company-wide messages on one system and consume them from there, like this:

Kafka is a kind of data-hub system, where you land messages and serve them afterwards.

More technical details.

I'd like to introduce a few key Kafka terms.

1) Kafka Broker. This is the Kafka service, which you run on each server and which handles all read and write requests.

2) Kafka Producer. A process that writes data to Kafka.

3) Kafka Consumer. A process that reads data from Kafka.

4) Message. The name describes itself; I'll just add that messages have a key and a value. Unlike keys in NoSQL databases, Kafka's key is not indexed. It serves application purposes (you may put some application logic in the key) and administrative purposes (each message with the same key goes to the same partition).

5) Topic. Messages are organized into topics. Database folks would compare a topic to a table.

6) Partition. It's good practice to divide a topic into partitions for performance and maintenance purposes. Messages with the same key go to the same partition; if the key is absent, messages are distributed in round-robin fashion.

7) Offset. The offset is the position of each message within the topic. The offset is indexed, which allows you to quickly access a particular message.
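Points 6 and 7 can be modeled in a few lines of Python. This is a simplified sketch of the routing rule, not Kafka's actual partitioner (which hashes keys with murmur2):

```python
import itertools

def choose_partition(key, num_partitions, _rr=itertools.count()):
    """Messages with the same key always land in the same partition;
    keyless messages are spread round-robin (simplified model)."""
    if key is not None:
        return hash(key) % num_partitions
    return next(_rr) % num_partitions

# Same key -> same partition, deterministically.
assert choose_partition("user-42", 8) == choose_partition("user-42", 8)
# No key -> partitions are rotated through.
print([choose_partition(None, 4) for _ in range(6)])
```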

When do you delete data?

One of the basic Kafka concepts is retention: Kafka does not keep data forever, nor does it wait for all consumers to read a message before deleting it. Instead, the Kafka administrator configures a retention policy for each topic: either an amount of time for which to store messages before deleting them, or a maximum amount of data to retain, beyond which older messages are purged. Two parameters control this: log.retention.ms and log.retention.bytes.

log.retention.bytes governs the amount of data to retain in the log for each topic partition. This is a per-partition limit: multiply it by the number of partitions to get the total data retained for the topic.
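Because the limit is per partition, a topic's total footprint is easy to underestimate. A one-line sketch of the arithmetic:

```python
def topic_retention_bytes(retention_bytes_per_partition, num_partitions):
    """log.retention.bytes is enforced per partition, so a topic can
    retain up to the per-partition limit times the partition count."""
    return retention_bytes_per_partition * num_partitions

# 1 GiB per partition across 8 partitions -> 8 GiB retained for the topic.
print(topic_retention_bytes(1 << 30, 8))   # 8589934592
```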

How to query Kafka data with Big Data SQL?

To query Kafka data, you need to create a Hive table first. Let me show an end-to-end example. I have a JSON file:

$   cat web_clicks.json
{ click_date: "38041", click_time: "67786", date: "2004-02-26", am_pm: "PM", shift: "second", sub_shift: "evening", item_sk: "396439", web_page: "646"}
{ click_date: "38041", click_time: "41831", date: "2004-02-26", am_pm: "AM", shift: "first", sub_shift: "morning", item_sk: "90714", web_page: "804"}
{ click_date: "38041", click_time: "60334", date: "2004-02-26", am_pm: "PM", shift: "second", sub_shift: "afternoon", item_sk: "151944", web_page: "867"}
{ click_date: "38041", click_time: "53225", date: "2004-02-26", am_pm: "PM", shift: "first", sub_shift: "afternoon", item_sk: "175796", web_page: "563"}
{ click_date: "38041", click_time: "47515", date: "2004-02-26", am_pm: "PM", shift: "first", sub_shift: "afternoon", item_sk: "186943", web_page: "777"}
{ click_date: "38041", click_time: "73633", date: "2004-02-26", am_pm: "PM", shift: "second", sub_shift: "evening", item_sk: "118004", web_page: "647"}
{ click_date: "38041", click_time: "43133", date: "2004-02-26", am_pm: "AM", shift: "first", sub_shift: "morning", item_sk: "148210", web_page: "930"}
{ click_date: "38041", click_time: "80675", date: "2004-02-26", am_pm: "PM", shift: "second", sub_shift: "evening", item_sk: "380306", web_page: "484"}
{ click_date: "38041", click_time: "21847", date: "2004-02-26", am_pm: "AM", shift: "third", sub_shift: "morning", item_sk: "55425", web_page: "95"}
{ click_date: "38041", click_time: "35131", date: "2004-02-26", am_pm: "AM", shift: "first", sub_shift: "morning", item_sk: "185071", web_page: "118"}

and I’m going to load it into Kafka with standard Kafka tool “kafka-console-producer”:

$   cat web_clicks.json|kafka-console-producer --broker-list bds2:9092,bds3:9092,bds4:9092,bds5:9092,bds6:9092 --topic json_clickstream

To check that the messages have appeared in the topic, you may use the following command:

$   kafka-console-consumer --zookeeper bds1:2181,bds2:2181,bds3:2181 --topic json_clickstream --from-beginning

After I've loaded this file into the Kafka topic, I create a table in Hive.

Make sure that you have oracle-kafka.jar and kafka-clients*.jar in your hive.aux.jars.path:

and here:

After this you may run the following DDL in Hive:

hive> CREATE EXTERNAL TABLE json_web_clicks_kafka
row format serde 'oracle.hadoop.kafka.hive.KafkaSerDe'
stored by 'oracle.hadoop.kafka.hive.KafkaStorageHandler'
hive> describe json_web_clicks_kafka;
hive> select * from json_web_clicks_kafka limit 1;

As soon as the Hive table has been created, I create an ORACLE_HIVE table in Oracle:

SQL> CREATE TABLE json_web_clicks_kafka (
topic varchar2(50),
partitionid integer,
VALUE  varchar2(4000),
offset integer,
timestamp timestamp,
timestamptype integer
)
-- the external clause below is a reconstruction; it was truncated in the original post
ORGANIZATION EXTERNAL (
  TYPE ORACLE_HIVE
  DEFAULT DIRECTORY DEFAULT_DIR
  ACCESS PARAMETERS (com.oracle.bigdata.tablename=default.json_web_clicks_kafka)
)
REJECT LIMIT UNLIMITED;
Here you also have to keep in mind that you need to add oracle-kafka.jar and kafka-clients*.jar to your bigdata.properties file, on both the database and Hadoop sides. I have a dedicated blog about how to do this here.

Now we are ready to query:

SQL> SELECT * FROM json_web_clicks_kafka

json_clickstream	209	{ click_date: "38041", click_time: "43213"..."}	0	26-JUL-17 PM	1
json_clickstream	209	{ click_date: "38041", click_time: "74669"... }	1	26-JUL-17 PM	1

Oracle 12c provides powerful capabilities for working with JSON, such as the dot API. It allows us to easily query the JSON data as a structure:

SELECT t.value.click_date,
       t.value.click_time
  FROM json_web_clicks_kafka t;

38041	40629
38041	48699

Working with AVRO messages.

In many cases, customers use Avro as a flexible, self-describing format for exchanging messages through Kafka. We certainly support it, and in a very easy and flexible way.

I have a topic that contains Avro messages, and I define a Hive table over it:

row format serde 'oracle.hadoop.kafka.hive.KafkaSerDe'
stored by 'oracle.hadoop.kafka.hive.KafkaStorageHandler'
describe web_sales_kafka;
select * from web_sales_kafka limit 1;

Here I set 'oracle.kafka.table.value.type'='avro' and also have to specify 'oracle.kafka.table.value.schema'. After this, we have structure.

In a similar way I define a table in Oracle RDBMS:

  topic varchar2(50),
  partitionid integer,
  offset integer,
  timestamp timestamp, 
  timestamptype INTEGER
      ( com.oracle.bigdata.tablename: web_sales_kafka

And we are good to query the data!

Performance considerations.

1) Number of Partitions.

This is the most important thing to keep in mind; there is a nice article about how to choose the right number of partitions. For Big Data SQL purposes, I'd recommend using slightly more partitions than you have CPU cores in your Big Data SQL cluster.

2) Query fewer columns

Use the column-pruning feature: in other words, list only the necessary columns in your SELECT and WHERE clauses. Here is an example.

I created a void PL/SQL function that does nothing. But PL/SQL can't be offloaded to the cell side, so all the data will be moved to the database side:

SQL> create or replace function fnull(input number) return number is
Result number;
begin
  -- body reconstructed (truncated in the original): simply pass the value through
  Result := input;
  return Result;
end fnull;
/

After this I ran a query that requires one column and checked how much data was returned to the DB side:


“cell interconnect bytes returned by XT smart scan” 5741.81MB

After this I repeated the same test case with 10 columns:


“cell interconnect bytes returned by XT smart scan” 32193.98 MB

So, hopefully, this test case clearly shows that you should query only the columns you actually need.

3) Indexes

There are no indexes other than on the offset column. The fact that messages have a key must not mislead you: the key is not indexed. Only the offset allows quick random access.
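A toy model makes the difference concrete: reading by offset is a direct positional lookup, while finding messages by key requires scanning the partition. This is an illustration of the access pattern, not Kafka code:

```python
class PartitionLog:
    """Toy partition: messages are addressable by offset (position),
    while keys are not indexed and require a full scan."""
    def __init__(self):
        self._messages = []

    def append(self, key, value):
        self._messages.append((key, value))
        return len(self._messages) - 1        # the new message's offset

    def read(self, offset):
        return self._messages[offset]         # O(1) positional access

    def find_by_key(self, key):
        # O(n): every message must be inspected.
        return [o for o, (k, _) in enumerate(self._messages) if k == key]

log = PartitionLog()
log.append("user-42", "click A")
log.append(None, "click B")
log.append("user-42", "click C")
print(log.read(1))                 # (None, 'click B')
print(log.find_by_key("user-42"))  # [0, 2]
```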

4) Warm up your data

If you want to read the same data quickly many times, you have to warm it up by running "select *"-style queries.

Kafka relies on the Linux filesystem cache, so to read the same dataset quickly on subsequent runs, you have to read it once first.

Here is an example:

- I clean up the Linux filesystem cache:

dcli -C "sync; echo 3 > /proc/sys/vm/drop_caches"

- I run the first query:


It took 278 seconds.

- The second and third runs took only 92 seconds.

5) Use a bigger replication factor

Here is an example. I have two tables: one created over a Kafka topic with replication factor = 1, the second over a Kafka topic with replication factor = 3.


This query took 278 seconds for the first run and 92 seconds for subsequent runs.


This query took 279 seconds for the first run, but only 34 seconds for subsequent runs.

6) Compression considerations

Kafka supports different types of compression. If you store the data in JSON or XML format, the compression ratio can be significant. Here is an example of the numbers you might see:

Data format and compression type Size of the data, GB
JSON on HDFS, uncompressed 273.1
JSON in Kafka, uncompressed 286.191
JSON in Kafka, Snappy 180.706
JSON in Kafka, GZIP 52.2649
AVRO in Kafka, uncompressed 252.975
AVRO in Kafka, Snappy 158.117
AVRO in Kafka, GZIP 54.49

This feature may save some disk space, but taking into account that Kafka is primarily used as a temporal store (for, say, one week or one month), I'm not sure it makes much sense. You will also pay a performance penalty (and burn more CPU) querying this data.

I’ve run a query like:

SQL> select count(1) from ...

and got the following results:

Type of compression Elapsed time, sec
uncompressed 76
snappy 80
gzip 92

So, uncompressed is the leader; gzip and Snappy are slower (not dramatically, but slower). Taking this into account, as well as the fact that Kafka is a temporal store, I wouldn't recommend using compression without an exceptional need.

7) Parallelize your processing.

If for some reason you are using a small number of partitions, you can use the Hive metadata parameter "oracle.kafka.partition.chunk.size" to increase parallelism. This parameter defines the size of an input split, so if you set it to 1MB and your topic holds 4MB in total, the topic will be processed with 4 parallel threads.
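The resulting parallelism is just the topic size divided by the chunk size, rounded up. A quick sketch of that arithmetic (a simplified model of the behaviour described above):

```python
import math

def num_splits(topic_bytes, chunk_size_bytes):
    """Each input split covers at most chunk_size_bytes, so the topic
    is read by ceil(topic_size / chunk_size) parallel tasks."""
    return math.ceil(topic_bytes / chunk_size_bytes)

# A 4 MB topic with a 1 MB chunk size is processed by 4 parallel threads.
print(num_splits(4 * 1024 * 1024, 1024 * 1024))   # 4
```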

Here is the test case:

- Drop Kafka topic

$   kafka-topics --delete --zookeeper cfclbv3870:2181,cfclbv3871:2181,cfclbv3872:2181 --topic store_sales

- Create again with only one partition

$   kafka-topics --create --zookeeper cfclbv3870:2181,cfclbv3871:2181,cfclbv3872:2181 --replication-factor 3 --partitions 1 --topic store_sales

- Check it

$   kafka-topics --describe --zookeeper cfclbv3870:2181,cfclbv3871:2181,cfclbv3872:2181 --topic store_sales
Topic:store_sales       PartitionCount:1        ReplicationFactor:3     Configs:
      Topic: store_sales      Partition: 0    Leader: 79      Replicas: 79,76,77      Isr: 79,76,77

- Check the size of input file:

$   du -h store_sales.dat
19G     store_sales.dat

- Load data to the Kafka topic

$   cat store_sales.dat|kafka-console-producer --broker-list cfclbv3870.us2.oraclecloud.com:9092,cfclbv3871.us2.oraclecloud.com:9092,cfclbv3872.us2.oraclecloud.com:9092,cfclbv3873.us2.oraclecloud.com:9092,cfclbv3874.us2.oraclecloud.com:9092 --topic store_sales  --request-timeout-ms 30000  --batch-size 1000000

- Create Hive External table

hive> CREATE EXTERNAL TABLE store_sales_kafka
row format serde 'oracle.hadoop.kafka.hive.KafkaSerDe'
stored by 'oracle.hadoop.kafka.hive.KafkaStorageHandler'

- Create Oracle external table

   (	TOPIC VARCHAR2(50), 
      VALUE VARCHAR2(4000), 
      ( com.oracle.bigdata.tablename=default.store_sales_kafka

- Run test query

SQL> SELECT COUNT(1) FROM store_sales_kafka;

It took 142 seconds.

- Re-create Hive external table with ‘oracle.kafka.partition.chunk.size’ parameter equal 1MB

hive> CREATE EXTERNAL TABLE store_sales_kafka
row format serde 'oracle.hadoop.kafka.hive.KafkaSerDe'
stored by 'oracle.hadoop.kafka.hive.KafkaStorageHandler'

- Run query again:

SQL> SELECT COUNT(1) FROM store_sales_kafka;

Now it took only 7 seconds.

A 1MB split is quite small; for big topics we recommend using 256MB.

8) Querying small topics.

Sometimes you need to query really small topics (a few hundred messages, for example), but very frequently. In this case, it makes sense to create the topic with fewer partitions.

Here is the test case example:

- Create topic with 1000 partitions

$   kafka-topics --create --zookeeper cfclbv3870:2181,cfclbv3871:2181,cfclbv3872:2181 --replication-factor 3 --partitions 1000 --topic small_topic

- Load only one message there

$   echo "test"|kafka-console-producer --broker-list cfclbv3870.us2.oraclecloud.com:9092,cfclbv3871.us2.oraclecloud.com:9092,cfclbv3872.us2.oraclecloud.com:9092,cfclbv3873.us2.oraclecloud.com:9092,cfclbv3874.us2.oraclecloud.com:9092 --topic small_topic

- Create hive external table

hive> CREATE EXTERNAL TABLE small_topic_kafka
row format serde 'oracle.hadoop.kafka.hive.KafkaSerDe'
stored by 'oracle.hadoop.kafka.hive.KafkaStorageHandler'

- Create Oracle external table

SQL> CREATE TABLE small_topic_kafka (
topic varchar2(50),
partitionid integer,
VALUE varchar2(4000),
offset integer,
timestamp timestamp,
timestamptype integer

- Query all rows from it

SQL> SELECT * FROM small_topic_kafka

It took 6 seconds.

- Create a topic with only one partition, put a single message in it, and run the same SQL query over it

$   kafka-topics --create --zookeeper cfclbv3870:2181,cfclbv3871:2181,cfclbv3872:2181 --replication-factor 3 --partitions 1 --topic small_topic
$   echo "test"|kafka-console-producer --broker-list cfclbv3870.us2.oraclecloud.com:9092,cfclbv3871.us2.oraclecloud.com:9092,cfclbv3872.us2.oraclecloud.com:9092,cfclbv3873.us2.oraclecloud.com:9092,cfclbv3874.us2.oraclecloud.com:9092 --topic small_topic
SQL> SELECT * FROM small_topic_kafka

Now it takes only 0.5 seconds.

9) Type of data in Kafka messages.

You have a few options for storing data in Kafka messages, and you certainly want pushdown processing. Big Data SQL supports pushdown operations only for JSON. This means that everything you can expose through JSON will be pushed down to the cell side and processed there.


- The query which could be pushed down to the cell side (JSON):


- The query which could not be pushed down to the cell side (XML):

 .getNumberVal() = 233183247;

If the amount of data is not significant, you can use Big Data SQL to process it directly. If we are talking about big data volumes, you can process it once and convert it into a different file format on HDFS with a Hive query:

hive> select xpath_int(value,'/operation/col[@name="WR_ORDER_NUMBER"]/after/text()') from WEB_RETURNS_XML_KAFKA limit 1 ;

10) JSON vs. Avro format in Kafka topics

Continuing from the previous point, you may be wondering which semi-structured format to use. The answer is easy: use whatever your data source produces; there is no significant performance difference between Avro and JSON. For example, a query like:


will complete in 112 seconds for JSON and in 105 seconds for Avro.

The JSON topic will take 286.33 GB, while Avro takes 202.568 GB. There is some difference, but not enough to be worth converting from the original format.

How do you bring data from OLTP databases into Kafka? Use GoldenGate!

Oracle GoldenGate is a well-known product for capturing commit logs on the database side and bringing the changes into a target system. The good news is that Kafka can play the role of the target system. I'll skip the detailed explanation of this feature because it's already explained in great detail here.


Oracle Blogs | Oracle The Data Warehouse Insider Blog

Whether it’s a Garage Start up or 100-year manufacturer, NetSuite Fuels Business Growth

Posted by David Turner, Senior Marketing Director, EMEA, Oracle NetSuite

There are exciting opportunities for organisations today to grow and innovate. That could mean going into new markets, launching new products and services, or coming up with new business models. There are always ways to expand your business.


There are also challenges to that growth, however. Data is locked away in siloes in the organisation: it's not real-time, it's not accessible, and you can't always analyse it. Compliance and regulation are growing ever more complex, country by country. Systems don't talk to each other. Attracting and retaining talent is tough, and on top of all that there are always new competitors entering the market.

This ‘hairball’ of disconnected systems hinders visibility across the organisation. As operations become ever more complex many companies are forced to resort to spreadsheets and manual processes to paper over the cracks between disparate systems, making it hard to see what’s really going on. With half of start-ups failing within five years, it’s more vital than ever to monitor the health of the business and identify the drivers of growth.

Having a unified cloud business system is absolutely key to tackling these challenges. At our NetSuite Next Ready Business Tour in London this week, NetSuite customers detailed the real-world challenges they’re facing and how NetSuite is helping to overcome them.


London-based home fashion label Buster + Punch has grown rapidly over the past four years. Founded in a London garage, the company has since grown to include an ecommerce website, a showroom in London, and a retail store in Stockholm. It now has more than 71 stockists selling its products across 27 countries.

Buster + Punch CEO Martin Preen explained to the audience at the event: “We were growing very fast in lots of different markets, different languages, an omnichannel business and lots of different siloes everywhere – and quite frankly getting one picture of the organisation was impossible.”

That drove Buster + Punch to standardize, streamline and scale its operations on NetSuite OneWorld. With one unified cloud solution, Buster + Punch can extend its global growth, including a move to build a presence in the US.

At the other end of the scale is Sheffield-based OSL Cutting Technologies, a manufacturing business that has been around since 1865. OSL Cutting Technologies manufactures and imports magnetic drilling machines and cutting tools. Matthew Grey, managing director at OSL Cutting Technologies, told the Next Ready Business Tour attendees that his business has seen a lot of change and transition in the last few years.

“We have a distribution hub in the US and China and supply chain all over the world. That offers some interesting challenges in terms of building systems to support it. We acquired a business in 2015 and that left us with four systems in one business,” he said.

The company implemented NetSuite OneWorld in May 2017 to manage financials, multi-currency accounting and financial consolidation, CRM, email marketing and advanced manufacturing processes and it has already improved its on-time delivery, reporting and streamlined its financial operations.

These organisations are using NetSuite to regain control of their data and systems and extract clear, actionable insights. Since our acquisition by Oracle, and with the increased resources that brings, that's something we are better placed than ever to help businesses do, as we expand our cloud platform capabilities to cater for any industry, country, language, and currency. Ultimately our mission remains the same as ever: to help you grow your business.

Posted on Fri, October 20, 2017 by NetSuite


The NetSuite Blog

Big Data SQL Quick Start. Custom SerDe – Part 20


Many thanks to Bilal Ibdah, who is the actual author of this content; I'm just publishing it on the Big Data SQL blog.

A modernized data warehouse is a data warehouse augmented with insights and data from a big data environment, typically Hadoop. Now, rather than moving and pushing the Hadoop data into a database, companies tend to expose this data through a unified layer that allows access to all data storage platforms: Hadoop, Oracle Database, and NoSQL, to be more specific.

The problem arises when the data we want to expose is stored in its native format and at the lowest possible granularity, for example packet data, which can be in a binary format (PCAP). A typical use of packet data is in the telecommunications industry, where this data is generated by a packet core and can contain raw data records, known in the telecom industry as XDRs.

Here is an example of a traditional architecture, where source data is loaded into a mediation layer, text (CSV) files are then parsed by an ETL engine, and the data is finally loaded into the database:


Here is an alternative architecture, where you load the data directly into HDFS (which is part of your logical data warehouse) and then parse it on the fly while the SQL runs:


In this blog we're going to use Oracle Big Data SQL to expose and access raw data stored in PCAP format living in Hadoop.

The first step is to store the PCAP files in HDFS using the "copyFromLocal" command.


This is what the pcap file looks like in HDFS:


In order to expose this file using Big Data SQL, we need to register it in the Hadoop metastore. Once it's registered, Big Data SQL can access the metadata, create an external table, and run pure Oracle SQL queries over the file. But registering the file requires unlocking its content with a custom SerDe; more details here.

Start by downloading the PCAP project from GitHub here; the project contains two components:

  • The hadoop-pcap-lib, which can be used in MapReduce jobs and,
  • The hadoop-pcap-serde, which can be used to query PCAPs in HIVE

For this blog, we will only use the serde component.

If the serde project hasn't been compiled yet, compile it in an IDE or in a command window using the command "mvn package -e -X".


Copy the output jar named “hadoop-pcap-serde-1.1-SNAPSHOT-jar-with-dependencies.jar” found in the target folder to each node in your hadoop cluster:


Then add the pcap serde to the HIVE environment variables through Cloudera Manager:


Then save the changes and restart HIVE (you might also need to redeploy the configuration and restart the stale services).

Now let's create a Hive table and test the serde; use the statements below to create the table:

DROP TABLE pcap;
ADD JAR hadoop-pcap-serde-1.1-SNAPSHOT-jar-with-dependencies.jar;
SET net.ripe.hadoop.pcap.io.reader.class=net.ripe.hadoop.pcap.DnsPcapReader;
-- the CREATE line below is a reconstruction; it was truncated in the original post
CREATE EXTERNAL TABLE pcap (ts bigint,
                             ts_usec string,
                             protocol string,
                             src string,
                             src_port int,
                             dst string,
                             dst_port int,
                             len int,
                             ttl int,
                             dns_queryid int,
                             dns_flags string,
                             dns_opcode string,
                             dns_rcode string,
                             dns_question string,
                             dns_answer array<string>,
                             dns_authority array<string>,
                             dns_additional array<string>)
ROW FORMAT SERDE 'net.ripe.hadoop.pcap.serde.PcapDeserializer'
STORED AS INPUTFORMAT 'net.ripe.hadoop.pcap.io.PcapInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'hdfs:///user/oracle/pcap/';

Now it's time to test the serde in Hive; let's run the query below:

select * from pcap limit 5;


The query ran successfully. Next we will create an Oracle external table that points to the pcap file using Big Data SQL. For this purpose we need to add the pcap serde file to the Big Data SQL environment variables (this must be done on each node in your Hadoop cluster): create a directory such as “/home/oracle/pcapserde/” on each server in the Oracle Big Data Appliance, copy the serde jar to each node, then browse to /opt/oracle/bigdatasql/bdcell-12.1.

Add the pcap jar file to the environment variables list in the configuration file “bigdata.properties”.


The class also needs to be updated in the bigdata.properties file on the database nodes.

First we need to copy the jar to the database nodes: 

  • Copy jar to db side
  • Add jar to class path
  • Create db external table and run query
  • Restart “bdsql” service in Cloudera Manager

After this we are good to define an external table in the Oracle RDBMS and query it!


It’s worth highlighting that in the last query we parse and query binary data on the fly.


Oracle Blogs | Oracle The Data Warehouse Insider Blog

Big Data SQL Quick Start. Complex Data Types – Part 21

Many thanks to Dario Vega, who is the actual author of this content. I’m just publishing it on this blog.

A common, potentially mistaken approach that people take regarding the integration of NoSQL, Hive and ultimately Big Data SQL is to use only an RDBMS perspective rather than an integration point of view. People generally think about all the features and data types they’re already familiar with from one of these products, rather than realizing that the actual data is stored in Hive (or NoSQL) rather than the RDBMS, or without understanding that the data will be queried from the RDBMS.

When using Big Data SQL with complex types, we tend to use JSON/SQL without taking care of the differences between how Oracle Database and Hive handle complex types. Why? Because the complex types are mapped to varchar2 in JSON format, so we read the data in JSON style instead of in the original system’s style.

The best example of this: from a JSON perspective (ECMA-404), the Map type does not exist.

Programming languages vary widely on whether they support objects, and if so, what characteristics and constraints the objects offer. The models of object systems can be wildly divergent and are continuing to evolve. JSON instead provides a simple notation for expressing collections of name/value pairs. Most programming languages will have some feature for representing such collections, which can go by names like record, struct, dict, map, hash, or object.
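To make the name/value idea concrete, here is a small Python sketch; the phone-book values are invented for illustration, and Python’s dict plays the role of the collection that JSON’s object notation describes:

```python
import json

# A "map" in Hive terms serializes to a plain JSON object (name/value pairs);
# there is no distinct Map type -- exactly the point ECMA-404 makes.
phoneinfo = {"work": "617-656-9208", "cell": "408-656-2016", "home": "213-879-2134"}

encoded = json.dumps(phoneinfo, sort_keys=True)  # a JSON object, nothing more
decoded = json.loads(encoded)                    # back to a name/value collection

print(encoded)
print(decoded == phoneinfo)  # the round trip loses nothing
```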

The following built-in collection functions are supported in Hive:

  • int size(Map): Returns the number of elements in the map type.

  • array map_keys(Map): Returns an unordered array containing the keys of the input map.

  • array map_values(Map): Returns an unordered array containing the values of the input map.
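These Hive functions are easy to picture via their Python dict counterparts; this is an illustration of the semantics, not Hive itself, and the sample map is invented:

```python
# Hive's size(), map_keys() and map_values() correspond to len(), keys() and
# values() on a Python dict. The returned arrays are unordered, so we sort
# them here purely to get a deterministic result.
phoneinfo = {"work": "617-656-9208", "cell": "408-656-2016", "home": "213-879-2134"}

map_size = len(phoneinfo)            # int size(Map)
keys = sorted(phoneinfo.keys())      # array map_keys(Map)
values = sorted(phoneinfo.values())  # array map_values(Map)

print(map_size, keys, values)
```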

Are they supported in the RDBMS? The answer is no, though possibly yes when using APEX, PL/SQL or Java programs.

In the same way, there is also a difference between Impala and Hive.

Lateral views. In CDH 5.5 / Impala 2.3 and higher, Impala supports queries on complex types (STRUCT, ARRAY, or MAP), using join notation rather than the EXPLODE() keyword. See Complex Types (CDH 5.5 or higher only) for details about Impala support for complex types.

The Impala complex type support produces result sets with all scalar values, and the scalar components of complex types can be used with all SQL clauses, such as GROUP BY, ORDER BY, all kinds of joins, subqueries, and inline views. The ability to process complex type data entirely in SQL reduces the need to write application-specific code in Java or other programming languages to deconstruct the underlying data structures.
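As a rough analogy (Python, not Impala; the sample row is invented), the join notation effectively turns each map entry into its own row of scalars joined to the parent row:

```python
# Each (key, value) entry of the map becomes its own output row, combined with
# the parent row's scalar columns -- the shape Impala's join notation produces.
row = {"zipcode": "02610", "lastname": "ACEVEDO",
       "phoneinfo": {"work": "617-656-9208", "cell": "408-656-2016"}}

flattened = [
    (row["zipcode"], row["lastname"], key, value)
    for key, value in row["phoneinfo"].items()
]

for rec in flattened:
    print(rec)
```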

Best practices: we would advise taking a conservative approach.

This is because the mappings between the NoSQL data model, the Hive data model, and the Oracle RDBMS data model are not 1-to-1.
For example, the NoSQL data model is quite rich, and there are many things one can do with nested classes in NoSQL that have no counterpart in either Hive or Oracle Database (or both). As a result, integration of the three technologies had to take a ‘least-common-denominator’ approach, employing mechanisms common to all three.

But let me show a sample.

Impala code

SELECT zipcode, lastname, firstname, ssn, gender, PHONEINFO.*
FROM rmvtable_hive_parquet, rmvtable_hive_parquet.PHONEINFO AS PHONEINFO
WHERE zipcode = '02610' AND lastname = 'ACEVEDO' AND firstname = 'TAMMY' AND ssn = 576228946;

+---------+----------+-----------+-----------+--------+------+--------------+
| zipcode | lastname | firstname | ssn       | gender | key  | value        |
+---------+----------+-----------+-----------+--------+------+--------------+
| 02610   | ACEVEDO  | TAMMY     | 576228946 | female | work | 617-656-9208 |
| 02610   | ACEVEDO  | TAMMY     | 576228946 | female | cell | 408-656-2016 |
| 02610   | ACEVEDO  | TAMMY     | 576228946 | female | home | 213-879-2134 |
+---------+----------+-----------+-----------+--------+------+--------------+

Oracle code:

`phoneinfo` IS JSON
FROM pmt_rmvtable_hive_json_api a
WHERE a.json_column.zipcode = '02610' AND a.json_column.lastname = 'ACEVEDO' AND a.json_column.firstname = 'TAMMY' AND a.json_column.ssn = 576228946;

ZIPCODE   : 02610
SSN       : 576228946
GENDER    : female
PHONEINFO : {"work":"617-656-9208","cell":"408-656-2016","home":"213-879-2134"}

QUESTION: How do we transform this JSON PHONEINFO into two “arrays” (keys, values)? Map behavior expected.

Unfortunately, the nested-path JSON_TABLE operator is only available for JSON arrays. On the other hand, when using JSON we can access each field as a column.
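For the question above, one pragmatic option is to split the PHONEINFO value client-side once it arrives as JSON text. This Python sketch (not JSON_TABLE; the value is taken from the earlier output) shows the Map-like keys/values behavior:

```python
import json

# PHONEINFO arrives from Big Data SQL as a varchar2 holding a JSON object.
phoneinfo_json = '{"work":"617-656-9208","cell":"408-656-2016","home":"213-879-2134"}'

# Parse it and split into the two "arrays" the question asks for.
parsed = json.loads(phoneinfo_json)
keys = list(parsed.keys())
values = list(parsed.values())

# Zipping the two arrays back together reconstructs the original map.
pairs = dict(zip(keys, values))
print(pairs == parsed)  # True
```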

FROM pmt_rmvtable_hive_orc a
WHERE zipcode = '02610' AND lastname = 'ACEVEDO' AND firstname = 'TAMMY' AND ssn = 576228946;

02610  ACEVEDO  TAMMY  576228946  female  533933353734363933  617-656-9208  213-879-2134  408-656-2016

And what about using map columns in the WHERE clause, looking for a specific phone number?

Impala code

SELECT zipcode, lastname, firstname, ssn, gender, PHONEINFO.*
FROM rmvtable_hive_parquet, rmvtable_hive_parquet.PHONEINFO AS PHONEINFO
WHERE PHONEINFO.key = 'work' AND PHONEINFO.value = '617-656-9208';

+---------+------------+-----------+-----------+--------+------+--------------+
| zipcode | lastname   | firstname | ssn       | gender | key  | value        |
+---------+------------+-----------+-----------+--------+------+--------------+
| 89878   | ANDREWS    | JEREMY    | 848834686 | male   | work | 617-656-9208 |
| 00183   | GRIFFIN    | JUSTIN    | 976396720 | male   | work | 617-656-9208 |
| 02979   | MORGAN     | BONNIE    | 904775071 | female | work | 617-656-9208 |
| 14462   | MCLAUGHLIN | BRIAN     | 253990562 | male   | work | 617-656-9208 |
| 83193   | BUSH       | JANICE    | 843046328 | female | work | 617-656-9208 |
| 57300   | PAUL       | JASON     | 655837757 | male   | work | 617-656-9208 |
| 92762   | NOLAN      | LINDA     | 270271902 | female | work | 617-656-9208 |
| 14057   | GIBSON     | GREGORY   | 345334831 | male   | work | 617-656-9208 |
| 04336   | SAUNDERS   | MATTHEW   | 180588967 | male   | work | 617-656-9208 |
| 23993   | VEGA       | JEREMY    | 123967808 | male   | work | 617-656-9208 |
+---------+------------+-----------+-----------+--------+------+--------------+
Fetched 852 row(s) in 99.80s

But let me continue showing the same code on Oracle (querying on work phone).

Oracle code

`phoneinfo` IS JSON
FROM pmt_rmvtable_hive_parquet a

35330  SIMS        DOUGLAS  295204437  male    {"work":"617-656-9208","cell":"901-656-9237","home":"303-804-7540"}
43466  KIM         GLORIA   358875034  female  {"work":"617-656-9208","cell":"978-804-8373","home":"415-234-2176"}
67056  REEVES      PAUL     538254872  male    {"work":"617-656-9208","cell":"603-234-2730","home":"617-804-1330"}
07492  GLOVER      ALBERT   919913658  male    {"work":"617-656-9208","cell":"901-656-2562","home":"303-804-9784"}
20815  ERICKSON    REBECCA  912769190  female  {"work":"617-656-9208","cell":"978-656-0517","home":"978-541-0065"}
48250  KNOWLES     NANCY    325157978  female  {"work":"617-656-9208","cell":"901-351-7476","home":"213-234-8287"}
48250  VELEZ       RUSSELL  408064553  male    {"work":"617-656-9208","cell":"978-227-2172","home":"901-630-7787"}
43595  HALL        BRANDON  658275487  male    {"work":"617-656-9208","cell":"901-351-6168","home":"213-227-4413"}
77100  STEPHENSON  ALBERT   865468261  male    {"work":"617-656-9208","cell":"408-227-4167","home":"408-879-1270"}

852 rows selected.
Elapsed: 00:05:29.56

In this case, we can also use the dot notation: A.PHONEINFO.work = '617-656-9208'.

Note: to get familiar with the Database JSON API, you can follow this blog series: https://blogs.oracle.com/jsondb


Oracle Blogs | Oracle The Data Warehouse Insider Blog

Big Data SQL Quick Start. Binary Images and Big Data SQL – Part 22


Many thanks to Dario Vega, who is the actual author of this content, I’m just publishing it in the Big Data SQL blog.

Create a Hive table with a binary field and cast it to a BLOB type in the RDBMS when using Big Data SQL.

[oracle@tvpbdaacn13 dvega]$ /usr/bin/avro-tools tojson avro.file.dvega | more
"license":{"bytes":"S11641384"}}

[oracle@tvpbdaacn13 dvega]$ /usr/bin/parquet-tools head parquet.file.dvega
zipcode = 00566
lastname = ALEXANDER
firstname = PETER
ssn = 637221663
gender = male
license = UzY4NTkyNTc4
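A side note on the listings: parquet-tools prints the binary license column base64-encoded. A quick Python check, using the value from the listing above, decodes it back to a readable license string:

```python
import base64

# parquet-tools shows binary columns as base64; decoding recovers the raw bytes.
encoded_license = "UzY4NTkyNTc4"
raw = base64.b64decode(encoded_license)

# Same "S<digits>" shape as the license avro-tools printed.
print(raw.decode("ascii"))
```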

Simulating using Linux tools

On hive:

create table image_new_test (img binary);

On Oracle:

SQL> CREATE TABLE image_new_test
        com.oracle.bigdata.cluster= tvpbdaacluster3
        com.oracle.bigdata.tablename: pmt.image_new_test 

On Linux:

base64 -w 10000000 YourImage.PNG > YourImage.BASE64
# Be sure the file contains only one line before copying it to Hadoop; if not, fix it:
wc -l YourImage.BASE64

# You can concatenate many images into the same BASE64 file - one image per line

hadoop fs -put Capture.BASE64 hdfs://tvpbdaacluster3-ns/user/hive/warehouse/pmt.db/image_new_test

or use Hive LOAD commands.
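The single-line constraint can also be produced in script form. This Python sketch (file contents are fake placeholders) builds one base64 line per image, mirroring the shell steps above:

```python
import base64

def to_base64_line(image_bytes: bytes) -> str:
    """Encode an image as a single base64 line (no wrapping), ready for
    'hadoop fs -put' into the Hive table's directory."""
    return base64.b64encode(image_bytes).decode("ascii")

# One image per line, as the Hive text table expects.
images = [b"\x89PNG fake image 1", b"\x89PNG fake image 2"]
lines = [to_base64_line(img) for img in images]

assert all("\n" not in line for line in lines)  # each image stays on one line
print("\n".join(lines))
```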

Validate using SQL Developer.


Compare to the original one.


Original tables:

SQL> create table image (id number, img BLOB);

Insert an image using SQL Developer.

REM create an external table to copy the dmp files to hadoop

     TYPE oracle_datapump
     LOCATION ('filename1.dmp')

Hive Tables:

# copy files to hadoop, e.g. to /user/dvega/images/filename1.dmp

   ROW FORMAT SERDE 'oracle.hadoop.hive.datapump.DPSerDe'
      INPUTFORMAT  'oracle.hadoop.hive.datapump.DPInputFormat'
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
      LOCATION '/user/oracle/dvega/images/';

create table image_hive_text
as select * from image_hive_dmp;

Big Data SQL tables:

    com.oracle.bigdata.cluster= tvpbdaacluster3
    com.oracle.bigdata.tablename: pmt.image_hive_dmp

    com.oracle.bigdata.cluster= tvpbdaacluster3
    com.oracle.bigdata.tablename: pmt.image_hive_text


Oracle Blogs | Oracle The Data Warehouse Insider Blog

Want To Build A Culture Of Innovation? Start With Experimentation

The September issue of the Harvard Business Review features a cover story on design thinking’s coming of age. We have been applying design thinking within SAP for the past 10 years, and I’ve witnessed the growth of this human-centered approach to innovation first hand.

Design thinking is, as the HBR piece points out, “the best tool we have for … developing a responsive, flexible organizational culture.”

This means businesses are doing more to learn about their customers by interacting directly with them. We’re seeing this change in our work on d.forum — a community of design thinking champions and “disruptors” from across industries.

Meanwhile, technology is making it possible to know exponentially more about a customer. Businesses can now make increasingly accurate predictions about customers’ needs well into the future. The businesses best able to access and pull insights from this growing volume of data will win. That requires a fundamental change for our own industry; it necessitates a digital transformation.

So, how do we design this digital transformation?

It starts with the customer and an application of design thinking throughout an organization – blending business, technology and human values to generate innovation. Business is already incorporating design thinking, as the HBR cover story shows. We in technology need to do the same.


Design thinking plays an important role because it helps articulate what the end customer’s experience is going to be like. It helps focus all aspects of the business on understanding and articulating that future experience.

Once an organization is able to do that, the insights from that consumer experience need to be drawn down into the business, with the central question becoming: What does this future customer experience mean for us as an organization? What barriers do we need to remove? Do we need to organize ourselves differently? Does our process need to change – if it does, how? What kind of new technology do we need?

Then an organization must look carefully at roles within itself. What does this knowledge of the end customer’s future experience mean for an individual in human resources, for example, or finance? Those roles can then be viewed as end experiences unto themselves, with organizations applying design thinking to learn about the needs inherent to those roles. They can then change roles to better meet the end customer’s future needs. This end customer-centered approach is what drives change.

This also means design thinking is more important than ever for IT organizations.

We, in the IT industry, have been charged with being responsive to business, using technology to solve the problems business presents. Unfortunately, business sometimes views IT as the organization keeping the lights on. If we make the analogy of a store: business is responsible for the front office, focused on growing the business where consumers directly interact with products and marketing; while the perception is that IT focuses on the back office, keeping servers running and the distribution system humming. The key is to have business and IT align to meet the needs of the front office together.

Remember what I said about the growing availability of consumer data? The business best able to access and learn from that data will win. Those of us in IT organizations have the technology to make that win possible, but the way we are seen and our very nature needs to change if we want to remain relevant to business and participate in crafting the winning strategy.

We need to become more front office and less back office, proving to business that we are innovation partners in technology.

This means, in order to communicate with businesses today, we need to take a design thinking approach. We in IT need to show we have an understanding of the end consumer’s needs and experience, and we must align that knowledge and understanding with technological solutions. When this works — when the front office and back office come together in this way — it can lead to solutions that a company could otherwise never have realized.

There are different qualities, of course, between front office and back office requirements. The back office is the foundation of a company and requires robustness, stability, and reliability. The front office, on the other hand, moves much more quickly. It is always changing with new product offerings and marketing campaigns. Technology must also show agility, flexibility, and speed. The business needs both functions to survive. This is a challenge for IT organizations, but it is not an impossible shift for us to make.

Here’s the breakdown of our challenge.

1. We need to better understand the real needs of the business.

This means learning more about the experience and needs of the end customer and then translating that information into technological solutions.

2. We need to be involved in more of the strategic discussions of the business.

Use the regular invitations to meetings with business as an opportunity to surface the deeper learning about the end consumer and the technology solutions that business may otherwise not know to ask for or how to implement.

The IT industry overall may not have a track record of operating in this way, but if we are not involved in the strategic direction of companies and shedding light on the future path, we risk not being considered innovation partners for the business.

We must collaborate with business, understand the strategic direction and highlight the technical challenges and opportunities. When we do, IT will become a hybrid organization – able to maintain the back office while capitalizing on the front office’s growing technical needs. We will highlight solutions that business could otherwise have missed, ushering in a digital transformation.

Digital transformation goes beyond just technology; it requires a mindset. See What It Really Means To Be A Digital Organization.

This story originally appeared on SAP Business Trends.

Top image via Shutterstock



Digitalist Magazine

It’s Time To Start Catering To The Omnichannel Shopper

If your brand isn’t among those planning a significant spend on mobile marketing in 2016, you need to stop treating it like a fad and step up to meet your competition. Usage statistics show that today people live and work while on the move, and the astronomical rise of mobile ad spending proves it.

According to eMarketer, mobile ad spending experienced triple-digit growth in 2013 and 2014. While it slowed in 2015, don’t let that fool you: mobile ad spending was $19.2 billion in 2013, and eMarketer’s forecast for next year is $101.37 billion, or 51 percent of the digital market.

  1. Marketers follow consumer behavior, and consumers rely on their mobile devices. The latest findings show that two-thirds of Americans are now smartphone owners. Around the world, there are two billion smartphone users and, particularly in developing regions, eMarketer notes “many consumers are accessing the internet mobile-first and mobile-only.”
  2. The number of mobile users has already surpassed the number of desktop users, as has the number of hours people spend on mobile Internet use, and business practices are changing as a result. Even Google has taken notice; earlier this year the search giant rolled out what many referred to as “Mobilegeddon”—an algorithm update that prioritizes mobile-optimized sites.

The implications are crystal clear: To ignore mobile is to ignore your customers. If your customers can’t connect with you via mobile—whether through an ad, social, or an optimized web experience—they’ll move to a competitor they can connect with.

Consumers prefer mobile — and so should you

Some people think mobile marketing has made things harder for marketers. In some ways, it has: It’s easy to make missteps in a constantly changing landscape.

At the same time, however, modern brands can now reach customers at any time of the day, wherever they are, as more than 90 percent of users now have a mobile device within arm’s reach 24/7. This has changed marketing, allowing brands to build better and more personalized connections with their fans.

  • With that extra nudge from Google, beating your competition and showing up in search by having a website optimized for devices of any size is essential.
  • Search engine optimization (SEO) helps people find you online; SEO integration for mobile is even more personalized, hyper local, and targeted to an individual searcher.
  • In-app advertisements put your brand in front of an engaged audience.
  • Push messages keep customers “in the know” about offers, discounts, opportunities for loyalty points, and so much more.

And don’t forget about the power of apps, whose usage takes up 85 percent of the total time consumers spend on their smartphones. Brands like Nike and Starbucks are excellent examples of how to leverage the power of being carried around in someone’s pocket.

Personal computers have never been able to offer such a targeted level of reach. We’ve come to a point where marketing without mobile isn’t really marketing at all.

Mobile marketing tools are on the upswing too

As more mobile-empowered consumers move from their desks to the street, the rapid rise of mobile shows no signs of slowing down. This is driving more investment into mobile marketing solutions and programs.

According to VentureBeat’s Mobile Success Landscape, mobile engagement—which includes mobile marketing automation—is second only to app analytics in terms of investment. Mobile marketing has become a universe unto itself, one that businesses are eager to measure more effectively.

Every day, mobile marketing is becoming ever more critical for businesses. Brands that fail to incorporate mobile into their ad, content, and social campaigns will be left wondering where their customers have gone.

For more content like this, follow Samsung Business on Insights, Twitter, LinkedIn, YouTube, and SlideShare.

The post Mobile Marketing Continues to Explode appeared first on Millennial CEO.

Photo credit: Samsung Galaxy S3 via photopin (license)



Digitalist Magazine

TSMC Will Start Volume Production Of iPhone's A11 Chips

Taiwanese semiconductor manufacturer and Apple’s chip partner TSMC will reportedly start volume production of A11 chips from April 2017, which will be supplied to the new-generation iPhones to be launched in September 2017.

The A11 chips will adopt the 10-nanometer FinFET manufacturing process. However, this is not TSMC’s first 10-nanometer chip product: the company started using a 10-nanometer manufacturing process late last year and began supplying those chips in the first quarter of 2017.

TSMC’s production and supply planning for A11 chips is basically the same as that for the iPhone 7. By the end of 2017, the company is expected to make 100 million A11 chips, slightly more than the supply of A10 Fusion chips. The A10 Fusion is a quad-core chip used in iPhone 7 products and adopted a 16-nanometer process.

The new A11 chips will be used in the iPhone 7s or iPhone 8. TSMC will start volume production of A11 chips in April 2017, and shipments may reach 50 million units before July 2017.
