Big Data SQL Quick Start. Complex Data Types – Part 21

Many thanks to Dario Vega, who is the actual author of this content. I’m just publishing it on this blog.

A common potentially mistaken approach that people take regarding the integration of NoSQL, Hive and ultimately BigDataSQL is to use only a RDBMS perspective and not an integration point of view. People generally think about all the features and data types they’re already familiar with from their experience using one of these products; rather than realizing that the actual data is stored in the Hive (or NoSQL) database rather than RDBMS. Or without understanding that the data will be querying from RDBMS. 

When using Big Data SQL with complex types, we are thinking to use JSON/SQL without taking care of differences between Oracle Database and Hive use of Complex Types. Why ? Because the complex types are mapped to varchar2 in JSON format, so we are reading the data in JSON style instead of the original system. 

The Best sample of this is from a Json perspective JSON ECMA-404 – Map type does not exist. 

Programming languages vary widely on whether they support objects, and if so, what characteristics and constraints the objects offer. The models of object systems can be wildly divergent and are continuing to evolve. JSON instead provides a simple notation for expressing collections of name/value pairs. Most programming languages will have some feature for representing such collections, which can go by names like record, struct, dict, map, hash, or object.

The following built-in collection functions are supported in Hive:

  • int size (Map) Returns the number of elements in the map type.

  • array map_keys(Map) Returns an unordered array containing the keys of the input map.

  • array map_values(Map)Returns an unordered array containing the values of the input map.

Are they supported in RDBMS? the answer is NO but may be YES if using APEX PL/SQL or JAVA programs. 

In the same way, there is also a difference between Impala and Hive.

Lateral views. In CDH 5.5 / Impala 2.3 and higher, Impala supports queries on complex types (STRUCT, ARRAY, or MAP), using join notation rather than the EXPLODE() keyword. See Complex Types (CDH 5.5 or higher only) for details about Impala support for complex types.

The Impala complex type support produces result sets with all scalar values, and the scalar components of complex types can be used with all SQL clauses, such as GROUP BY, ORDER BY, all kinds of joins, subqueries, and inline views. The ability to process complex type data entirely in SQL reduces the need to write application-specific code in Java or other programming languages to deconstruct the underlying data structures.


Best practices We would advise taking a conservative approach.

This is because the mappings between the NoSQL data model, the Hive data model, and the Oracle RDBMS data model is not 1-to-1.
For example, the NoSQL data model is quite a rich and there are many things one can do with nested classes in NoSQL that have no counterpart in either Hive or Oracle Database (or both). As a result, integration of the three technologies had to take a ‘least-common-denominator’ approach; employing mechanisms common to all three.

But let me show a sample

Impala code

`phoneinfo`map<string,string>
 
impala>SELECT
        ZIPCODE
       ,LASTNAME
       ,FIRSTNAME
       ,SSN
       ,GENDER
       ,PHONEINFO.*FROM rmvtable_hive_parquet, rmvtable_hive_parquet.PHONEINFO AS PHONEINFO
WHERE zipcode ='02610'AND lastname ='ACEVEDO'AND firstname ='TAMMY'AND ssn =576228946
;
 
+---------+----------+-----------+-----------+--------+------+--------------+| zipcode | lastname | firstname | ssn       | gender |KEY| VALUE        |+---------+----------+-----------+-----------+--------+------+--------------+|02610| ACEVEDO  | TAMMY     |576228946| female |WORK|617-656-9208||02610| ACEVEDO  | TAMMY     |576228946| female | cell |408-656-2016||02610| ACEVEDO  | TAMMY     |576228946| female | home |213-879-2134|+---------+----------+-----------+-----------+--------+------+--------------+

Oracle code:

`phoneinfo`IS JSON
 
 
SQL>SELECT/*+ MONITOR  */
     a.json_column.zipcode
    ,a.json_column.lastname
    ,a.json_column.firstname
    ,a.json_column.ssn
    ,a.json_column.gender
    ,a.json_column.phoneinfo
FROM pmt_rmvtable_hive_json_api a
WHERE a.json_column.zipcode ='02610'AND a.json_column.lastname ='ACEVEDO'AND a.json_column.firstname ='TAMMY'AND a.json_column.ssn =576228946 ;
 
 
ZIPCODE : 02610 
LASTNAME : ACEVEDO
FIRSTNAME : TAMMY
SSN : 576228946
GENDER : female
PHONEINFO :{"work":"617-656-9208","cell":"408-656-2016","home":"213-879-2134"}

QUESTION : How to transform this JSON – PHONEINFO in two “arrays” keys, values- Map behavior expected.

Unfortunately, the nested path JSON_TABLE operator is only available for JSON ARRAYS. In the other side, when using JSON, we can access to each field as columns.

SQL>SELECT/*+ MONITOR  */
   ZIPCODE
  ,LASTNAME
  ,FIRSTNAME
  ,SSN
  ,GENDER
  ,LICENSE
  ,a.PHONEINFO.work
  ,a.PHONEINFO.home
  ,a.PHONEINFO.cell
FROM pmt_rmvtable_hive_orc a  WHERE zipcode ='02610'AND lastname ='ACEVEDO'AND firstname ='TAMMY'AND ssn =576228946;
 
 
 
ZIPCODE 	     LASTNAME		  FIRSTNAME		      SSN GENDER	       LICENSE		  WORK		  HOME		  CELL
-------------------- -------------------- -------------------- ---------- -------------------- ------------------ --------------- --------------- ---------------02610		     ACEVEDO		  TAMMY 		576228946 female	       533933353734363933617-656-9208213-879-2134408-656-2016

and what about using map columns on the where clause Looking for a specific phone number

Impala code

`phoneinfo`map<string,string>SELECT
   ZIPCODE
  ,LASTNAME
  ,FIRSTNAME
  ,SSN
  ,GENDER
  ,PHONEINFO.*FROM rmvtable_hive_parquet, rmvtable_hive_parquet.PHONEINFO AS PHONEINFO
WHERE PHONEINFO.key='work'AND PHONEINFO.value ='617-656-9208'
;
 
+---------+------------+-----------+-----------+--------+------+--------------+| zipcode | lastname   | firstname | ssn       | gender |KEY| VALUE        |+---------+------------+-----------+-----------+--------+------+--------------+|89878| ANDREWS    | JEREMY    |848834686| male   |WORK|617-656-9208||00183| GRIFFIN    | JUSTIN    |976396720| male   |WORK|617-656-9208||02979| MORGAN     | BONNIE    |904775071| female |WORK|617-656-9208||14462| MCLAUGHLIN | BRIAN     |253990562| male   |WORK|617-656-9208||83193| BUSH       | JANICE    |843046328| female |WORK|617-656-9208||57300| PAUL       | JASON     |655837757| male   |WORK|617-656-9208||92762| NOLAN      | LINDA     |270271902| female |WORK|617-656-9208||14057| GIBSON     | GREGORY   |345334831| male   |WORK|617-656-9208||04336| SAUNDERS   | MATTHEW   |180588967| male   |WORK|617-656-9208|
...
|23993| VEGA       | JEREMY    |123967808| male   |WORK|617-656-9208|+---------+------------+-----------+-----------+--------+------+--------------+
 
Fetched 852ROW(s) IN99.80s

But let me continue showing the same code on Oracle (querying on work phone).

Oracle code

`phoneinfo`IS JSON
 
 
SELECT/*+ MONITOR */
   ZIPCODE
  ,LASTNAME
  ,FIRSTNAME
  ,SSN
  ,GENDER
  ,PHONEINFO
FROM pmt_rmvtable_hive_parquet  a
WHERE JSON_QUERY("A"."PHONEINFO" FORMAT JSON , '$  .work' RETURNING VARCHAR2(4000) ASIS  WITHOUTARRAY WRAPPER NULLON   ERROR)='617-656-9208'
;
 
35330		     SIMS		  DOUGLAS		295204437 male		       {"work":"617-656-9208","cell":"901-656-9237","home":"303-804-7540"}43466		     KIM		  GLORIA		358875034 female	       {"work":"617-656-9208","cell":"978-804-8373","home":"415-234-2176"}67056		     REEVES		  PAUL			538254872 male		       {"work":"617-656-9208","cell":"603-234-2730","home":"617-804-1330"}07492		     GLOVER		  ALBERT		919913658 male		       {"work":"617-656-9208","cell":"901-656-2562","home":"303-804-9784"}20815		     ERICKSON		  REBECCA		912769190 female	       {"work":"617-656-9208","cell":"978-656-0517","home":"978-541-0065"}48250		     KNOWLES		  NANCY 		325157978 female	       {"work":"617-656-9208","cell":"901-351-7476","home":"213-234-8287"}48250		     VELEZ		  RUSSELL		408064553 male		       {"work":"617-656-9208","cell":"978-227-2172","home":"901-630-7787"}43595		     HALL		  BRANDON		658275487 male		       {"work":"617-656-9208","cell":"901-351-6168","home":"213-227-4413"}77100		     STEPHENSON 	  ALBERT		865468261 male		       {"work":"617-656-9208","cell":"408-227-4167","home":"408-879-1270"}852ROWS selected.
 
Elapsed: 00:05:29.56

In this case, we can also use the dot-notation A.PHONEINFO.work = ‘617-656-9208′

Note: for make familiar with Database JSON API you may use follow blog series: https://blogs.oracle.com/jsondb

Let’s block ads! (Why?)

Oracle Blogs | Oracle The Data Warehouse Insider Blog