Category Archives: Data Warehousing

The first really hidden gem in Oracle Database 12c Release 2: runtime modification of external table parameters

We missed documenting some functionality!

With the next milestone for Oracle Database 12c Release 2 just taking place – the on-premises availability for Linux x86-64, Solaris Sparc64, and Solaris x86-64, in addition to the Oracle Cloud – I used this as an excuse to play around with it for a bit ... and found that we somehow missed documenting new functionality. Bummer. But still better than the other way around. ;-)

We missed documenting the capability to override some parameters of an external table at runtime.

So I decided to quickly blog about this, not only to fill the gap in the documentation (a doc bug is filed already) but also to ruthlessly hijack the momentum and start highlighting new functionality (there are more blog posts to come, specifically around my pet peeve Partitioning, but that's for later).

So what does it mean to override some parameters of an external table at runtime?

It simply means that you can use one external table definition stub as a proxy for external data access of different files, with different reject limits, at different points in time, without the need to issue DDL to modify the external table definition.

The usage is pretty simple and straightforward, so let me quickly demonstrate this with a not-so-business-relevant sample table. The prerequisite SQL for this example is at the end of this blog post and might make its way onto GitHub as well; I have not managed that yet and just wanted to get this blog post out.

Here is my rather trivial external table definition. It has worked for me since version 9, so why not use it with 12.2 as well.

CREATE TABLE et1 (col1 NUMBER, col2 NUMBER, col3 NUMBER)
ORGANIZATION EXTERNAL
(TYPE ORACLE_LOADER
  DEFAULT DIRECTORY d1
  ACCESS PARAMETERS
  ( RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
    NOBADFILE
    NOLOGFILE
    FIELDS TERMINATED BY ","
   )
  LOCATION ('file1.txt')
)
   REJECT LIMIT UNLIMITED
;

Pretty straightforward vanilla external table. Let’s now see how many rows this external table returns (the simple “data generation” is at the end of this blog):

SELECT count(*) FROM et1;

  COUNT(*)
----------
        99
So far, so good. And now the new functionality. We will now access the exact
same external table but tell the database to do a runtime modification of the file (location) we are accessing:

SELECT count(*) FROM et1
EXTERNAL MODIFY
(LOCATION ('file2.txt'));

  COUNT(*)
----------
         9
As you can see, the row count changes without me having made any change to the external table definition, such as an ALTER TABLE. You will also see that nothing has changed in the external table definition:

SQL> SELECT table_name, location FROM user_external_locations WHERE table_name='ET1';

TABLE_NAME                     LOCATION
------------------------------ ------------------------------
ET1                            file1.txt
And there's one more thing. You might be asking yourself right now: why do I have to specify a location for the initial external table creation at all? The answer is simple: you do not have to anymore.

Here is my external table without a location:

CREATE TABLE et2 (col1 NUMBER, col2 NUMBER, col3 NUMBER)
ORGANIZATION EXTERNAL
(TYPE ORACLE_LOADER
  DEFAULT DIRECTORY d1
  ACCESS PARAMETERS
  ( RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
    NOBADFILE
    NOLOGFILE
    FIELDS TERMINATED BY ","
   )
)
   REJECT LIMIT UNLIMITED
;

When I now select from it, guess what: I won't get any rows back. The location is NULL.

SQL> SELECT * FROM et2;

no rows selected
Personally, I am not sure I would go for this approach. If you plan to always override the location, I would rather have at least one dummy file in the location with a control record containing some pre-defined values, so that you can tell whether there really are no records or whether there is a programmatic mistake (a quick sketch of this idea follows below). But like many things in life, that's a choice. As you saw, you don't have to.
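Here is a minimal sketch of that control-record idea. The sentinel value (-1) and the file names are purely illustrative assumptions, not part of the original example: keep one small control file with a single known row, always include it in the override, and check that the sentinel actually arrived.

-- control.txt contains a single sentinel row, e.g. -1,0,0 (illustrative)
SELECT COUNT(CASE WHEN col1 <> -1 THEN 1 END) AS data_rows,
       COUNT(CASE WHEN col1  = -1 THEN 1 END) AS control_rows
FROM et2
EXTERNAL MODIFY
(LOCATION ('control.txt', 'file2.txt'));
-- control_rows = 0 points to a problem with the override or the directory,
-- rather than a genuinely empty data file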

Using this stub table in the same way as before gives me access to my data.

SELECT count(*) FROM et2
EXTERNAL MODIFY
(LOCATION ('file2.txt'));

  COUNT(*)
----------
         9

You get the idea. Pretty cool stuff. 

Aaah, and to complete the short functional introduction: the following clauses can be overridden: DEFAULT DIRECTORY, LOCATION, ACCESS PARAMETERS (BADFILE, LOGFILE, DISCARDFILE) and REJECT LIMIT.
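As a quick illustration of combining several of these overrides in a single query, here is a sketch; the directory object d2 and the file name are made-up names for the example, so adjust them to your environment:

SELECT count(*)
FROM et1
EXTERNAL MODIFY
( DEFAULT DIRECTORY d2
  LOCATION ('file3.txt')
  REJECT LIMIT 10
);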

That's about it for now for the online modification capabilities for external tables. I am sure I have forgotten some little details here and there, but there are always so many things to talk (write) about that you will never catch it all. And hopefully the documentation will cover it all sooner rather than later.

Stay tuned for now. There are more blog posts about 12.2 to come. And please, if you have any comments about this specific one or suggestions for future ones, let me know. You can always reach me at hermann.baer@oracle.com.

Cheers, over and out.

And here's the very simple "data generation" I used for the examples above to get "something" into my files. Have fun playing.

rem my directory
rem
create or replace directory d1 as '/tmp';

rem create some dummy data in /tmp
rem
set linesize 300 pagesize 5000
spool /tmp/file1.txt
select rownum ||','|| 1 ||','|| 1 ||',' from dual connect by level < 100;
spool off

spool /tmp/file2.txt
select rownum ||','|| 22 ||','|| 22 ||',' from dual connect by level < 10;
spool off


Data loading into HDFS – Part 3. Streaming data loading

In my previous blogs, I have already written about data loading into HDFS. In the first blog, I covered data loading from generic servers into HDFS. The second blog was devoted to offloading data from Oracle RDBMS. Here I want to explain how to load streaming data into Hadoop. Before anything else, I want to note that I will not cover Oracle GoldenGate for Big Data here, simply because it deserves a dedicated blog post. Today I'm going to talk about Flume and Kafka.

What is Kafka? 

Kafka is a distributed service bus. OK, but what is a service bus? Let's imagine that you have a few data systems, and each one needs data from the others. You could link them directly, like this:

[Diagram: systems linked point-to-point]

but this quickly becomes very hard to manage. Instead, you could have one centralized system that accumulates data from all sources and acts as a single point of contact for all systems, like this:

[Diagram: systems connected through a central service bus]

What is Flume? 

"Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store." – this definition from the documentation explains pretty well what Flume is. Flume was historically developed for loading data into HDFS. But why couldn't I just use the Hadoop client?

Challenge 1. Small files.

Hadoop was designed for storing large files, and despite a lot of optimizations around the NameNode over the last few years, it is still recommended to store only big files. If your source has a lot of small files, Flume can collect them and flush the collection in batch mode, as a single big file. I always use the analogy of a glass and drops: you can collect one million drops in one glass, and after that you have one glass of water instead of one million drops.
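To make the "glass of drops" idea concrete, here is a sketch of the HDFS sink roll settings that control when Flume closes one big file and starts the next one. The property names are standard Flume HDFS sink settings, but the agent/sink names and the thresholds are just illustrative:

# roll a new HDFS file after ~128 MB or 1,000,000 events, never on a timer,
# so that many small events end up in one big file
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.rollInterval = 0
agent.sinks.hdfsSink.hdfs.rollSize = 134217728
agent.sinks.hdfsSink.hdfs.rollCount = 1000000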

Challenge 2. Lots of data sources

Let's imagine that I have an application (or even two, on two different servers) that produces files which I want to load into HDFS.


Life is good; if the files are large enough, this is not going to be a problem.

But now let's imagine that I have 1000 application servers and each one wants to write data into HDFS. Even if the files are large, this workload will collapse your Hadoop cluster. If you don't believe me, just try it (but not on a production cluster!). So we have to have something in between HDFS and our data sources.


Now it is time for Flume. You could build a two-tier architecture: the first tier collects data from the different sources, and the second tier aggregates it and loads it into HDFS.

[Diagram: two-tier Flume architecture]

In my example I depict 1000 sources, handled by 100 Flume servers on the first tier, which in turn load data onto the second tier; the second tier connects directly to HDFS, and in my example that is only two connections, which is affordable. You can find more details here; I just want to add that the general practice is to use one aggregation agent for every 4-16 client agents.

I also want to note that it is good practice to use an Avro sink when you move data from one tier to the next. Here is an example of the Flume config file:

agent.sinks = avroSink
agent.sinks.avroSink.type = avro
agent.sinks.avroSink.channel = memory
agent.sinks.avroSink.hostname = avrosrchost.example.com
agent.sinks.avroSink.port = 4353
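For completeness, here is a sketch of what the matching aggregation-tier agent could look like: an Avro source that receives the events sent by the first-tier avroSink above and an HDFS sink that writes them out. The agent name, bind address and HDFS path are illustrative assumptions:

# aggregation-tier agent (illustrative names and paths)
collector.sources = avroIn
collector.channels = memCh
collector.sinks = hdfsOut

# Avro source listening for events from the first-tier avroSink
collector.sources.avroIn.type = avro
collector.sources.avroIn.bind = 0.0.0.0
collector.sources.avroIn.port = 4353
collector.sources.avroIn.channels = memCh

# in-memory buffer between source and sink
collector.channels.memCh.type = memory
collector.channels.memCh.capacity = 10000

# HDFS sink that lands the aggregated stream in HDFS
collector.sinks.hdfsOut.type = hdfs
collector.sinks.hdfsOut.channel = memCh
collector.sinks.hdfsOut.hdfs.path = /flume/events
collector.sinks.hdfsOut.hdfs.fileType = DataStream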

Kafka Architecture.

You can find deep technical presentations about Kafka here and here (I actually took a few slides from them), and a very interesting technical video here. In this article, I will just recap the key terms and concepts.


Producer – a process that writes data into the Kafka cluster. It could be part of an application, or edge nodes could play this role.

Consumer – a process that reads data from the Kafka cluster.

Broker – a member of the Kafka cluster. The set of brokers forms the Kafka cluster.

Flume Architecture.

You can find a lot of useful information about Flume in this book; here I will just highlight the key concepts.


Flume has three major components:

1) Source – where I get the data from.

2) Channel – where I buffer it. It could be memory or disk, for example.

3) Sink – where I load my data. For example, it could be another tier of Flume agents, HDFS or HBase.


Between the source and the channel there are two minor components: the Interceptor and the Selector.

With an Interceptor you can do simple processing; with a Selector you can choose a channel depending on the message header. A small configuration sketch follows below.
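This sketch attaches a host interceptor to a source and uses a multiplexing selector to route events to different channels based on a header; the agent, source and channel names, the interceptor choice and the header values are illustrative assumptions:

# add the sending host to every event (simple processing via an interceptor)
agent.sources.src1.channels = ch1 ch2
agent.sources.src1.interceptors = i1
agent.sources.src1.interceptors.i1.type = host

# route events to a channel depending on the value of the "datatype" header
agent.sources.src1.selector.type = multiplexing
agent.sources.src1.selector.header = datatype
agent.sources.src1.selector.mapping.clickstream = ch1
agent.sources.src1.selector.mapping.logs = ch2
agent.sources.src1.selector.default = ch2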

Flume and Kafka similarities and differences.

It's a frequent question: "What is the difference between Flume and Kafka?" The answer could be very long, but let me briefly explain the key points.

1) Pull and Push.

Flume accumulates data until some condition is met (number of events, size of the buffer, or a timeout) and then pushes it to its destination.

Kafka accumulates data until a client initiates a read, so the client pulls data whenever it wants.

2) Data processing

Flume can do simple transformations via interceptors.

Kafka doesn't do any data processing; it just stores the data.

3) Clustering

Flume is usually a set of independent single instances.

Kafka is a cluster, which means it has benefits such as high availability and scalability out of the box, without extra effort.

4) Message size

Flume doesn't have any obvious restrictions on the size of a message.

Kafka was designed for messages of a few KB.

5) Coding vs Configuring

Flume is usually a configurable tool (users usually don't write code; instead, they use its configuration capabilities).

With Kafka, you have to write code to load and unload the data.

Flafka.

Many customers are thinking about choosing the right technology for handling their data streaming: either Flume or Kafka. Stop choosing; use both. It's quite a common use case, and it is named Flafka. You can find a good explanation and nice pictures here (I actually borrowed a few screenshots from there).

First of all, Flafka is not a dedicated project. It's just a bunch of Java classes for integrating Flume and Kafka.


Now Kafka can be either a source for Flume, via the following Flume config:

flume1.sources.kafka-source-1.type = org.apache.flume.source.kafka.KafkaSource

or a channel, via the following directive:

flume1.channels.kafka-channel-1.type = org.apache.flume.channel.kafka.KafkaChannel 

Use Case 1. Kafka as a source or channel

If you have Kafka as an enterprise service bus (see my example above), you may want to load data from your service bus into HDFS. You could do this by writing a Java program, but if you don't like that, you can use Kafka as a Flume source.


In this case, Kafka can also be useful for smoothing peak load, and Flume provides flexible routing.

Also, you could use Kafka as a Flume channel for high-availability purposes (it is distributed by design). A configuration sketch for this use case follows below.
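Here is a sketch of that first use case using the Flafka-era KafkaSource class mentioned above; the topic name, ZooKeeper address and HDFS path are illustrative assumptions:

flume1.sources = kafka-source-1
flume1.channels = mem-channel-1
flume1.sinks = hdfs-sink-1

# read messages from a Kafka topic
flume1.sources.kafka-source-1.type = org.apache.flume.source.kafka.KafkaSource
flume1.sources.kafka-source-1.zookeeperConnect = zkhost1:2181
flume1.sources.kafka-source-1.topic = my_topic
flume1.sources.kafka-source-1.channels = mem-channel-1

flume1.channels.mem-channel-1.type = memory

# land the stream in HDFS
flume1.sinks.hdfs-sink-1.type = hdfs
flume1.sinks.hdfs-sink-1.channel = mem-channel-1
flume1.sinks.hdfs-sink-1.hdfs.path = /flume/kafka/my_topic
flume1.sinks.hdfs-sink-1.hdfs.fileType = DataStream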

Use case 2. Kafka as a sink.

If you use Kafka as an enterprise service bus, you may want to load data into it. The native way for Kafka is a Java program, but if you feel that it will be more convenient with Flume (just using a few config files), you have that option. The only thing you need is to configure Kafka as a sink.


Use case 3. Flume as the tool to enrich data.

As I already mentioned, Kafka does not do any data processing. It just stores data without any transformation. You can use Flume as a way to add some extra information to your Kafka messages. To do this, you need to define Kafka as a source, implement an interceptor that adds some information to each message, and write the result back to Kafka in a different topic.


Conclusion.

There are two major tools for loading streaming data: Flume and Kafka. There is no single right answer about which to use, because each tool has its own advantages and disadvantages. That is essentially why Flafka was created; it is just a combination of the two tools.


Dealing with very very long string lists using Database 12.2


Oracle RDBMS 11gR2 introduced the LISTAGG function for working with string values. It can be used to aggregate values from groups of rows and return a concatenated string where the values are typically separated by a comma or semi-colon – you can determine this yourself within the code by supplying your own separator symbol.

Based on the number of posts across various forums and blogs, it is widely used by developers. However, there is one key issue that has been highlighted by many people: when using LISTAGG on data sets that contain very large strings it is possible to create a list that is too long. This causes the following overflow error to be generated:

ORA-01489: result of string concatenation is too long.

Rather annoyingly for developers and DBAs, it is very difficult to determine ahead of time whether the concatenation of the values within the specified LISTAGG measure_expr will cause an ORA-01489 error. Many people have posted workarounds to resolve this problem, including myself. Probably the most elegant and simple solution has been to use the 12c MATCH_RECOGNIZE feature; however, this requires 12c Release 1, which was not always available to all DBAs and/or developers.
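For context, one widely used pre-12.2 workaround (shown here only as a sketch, and not the MATCH_RECOGNIZE approach mentioned above) is to aggregate into a CLOB via XMLAGG, so that the VARCHAR2 limit never applies; note that XMLELEMENT will escape characters such as & and <:

SELECT g.country_region,
       RTRIM(XMLAGG(XMLELEMENT(e, c.cust_first_name||' '||c.cust_last_name||',')
                    ORDER BY c.country_id
             ).EXTRACT('//text()').GETCLOBVAL(), ',') AS Customer
FROM customers c, countries g
WHERE g.country_id = c.country_id
GROUP BY g.country_region
ORDER BY g.country_region;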

If you want to replicate the problem and you have access to the sample SH schema then try executing this query:

SELECT
g.country_region,
LISTAGG(c.cust_first_name||' '||c.cust_last_name, ',') WITHIN GROUP (ORDER BY c.country_id) AS Customer
FROM customers c, countries g
WHERE g.country_id = c.country_id
GROUP BY country_region
ORDER BY country_region;

All the samples in this post use our sample SH schema. Once we release the on-premises version of 12.2 you will be able to download the Examples file for your platform from the database home page on OTN. I do have a tutorial ready to go on LiveSQL; however, there is currently a technical issue with using very very very long strings: essentially, running my LISTAGG workshop generates a 'Data value out of range' error, so as soon as it's fixed I will update this blog post with a link to the tutorial.

What have we changed in 12.2?

One way of resolving ORA-01489 errors is to simply increase the size of VARCHAR2 objects.

Larger object sizes

The size limit for VARCHAR2 objects is determined by the database parameter MAX_STRING_SIZE. You can check the setting in your database using the following command:

show parameter MAX_STRING_SIZE

In my demo environment this returns the following:

NAME            TYPE   VALUE
--------------- ------ --------
max_string_size string STANDARD

Prior to Oracle RDBMS 12.1.0.2 the upper limit for VARCHAR2 was 4K. With Oracle RDBMS 12.1.0.2 this limit was raised to 32K. This increase may solve a lot of issues, but it does require a change to the database parameter MAX_STRING_SIZE: setting MAX_STRING_SIZE = EXTENDED enables the new 32767-byte limit.

ALTER SYSTEM SET max_string_size=extended SCOPE=SPFILE;

However, with the increasing interest in big data sources it is clear that there is still considerable potential for ORA-01489 errors as you use the LISTAGG feature within queries against extremely large data sets.

What is needed is a richer syntax within the LISTAGG function and this has now been implemented as part of Database 12c Release 2.

Better list management

With 12.2 we have made it easier to manage lists that are likely to generate an error because they are too long. There is a whole series of new keywords that can be used:

  • ON OVERFLOW ERROR
  • ON OVERFLOW TRUNCATE
  • WITH COUNT vs. WITHOUT COUNT

Let's look a little closer at each of these features...

1. Keeping Pre-12.2 functionality

If you want your existing code to continue to return an error if the string is too long then the great news is that this is the default behaviour. When the length of the LISTAGG string exceeds the VARCHAR2 limit then the standard error will be returned:

ERROR at line xxx:
ORA-01489: result of string concatenation is too long

However, where possible I would recommend adding “ON OVERFLOW ERROR” to your LISTAGG code to make it completely clear that you are expecting an error when an overflow happens:

SELECT 
g.country_region,
LISTAGG(c.cust_first_name||' '||c.cust_last_name, ',' ON OVERFLOW ERROR) WITHIN GROUP (ORDER BY c.country_id) AS Customer
FROM customers c, countries g
WHERE g.country_id = c.country_id
GROUP BY country_region
ORDER BY country_region;

So it's important to note that by default the truncation features are disabled, and you will need to change any existing code if you don't want an error to be raised.

2. New ON OVERFLOW TRUNCATE… keywords

If you want to truncate the list of values at the 4K or 32K boundary then you need to use the newly added keywords ON OVERFLOW TRUNCATE as shown here:

SELECT 
g.country_region,
LISTAGG(c.cust_first_name||' '||c.cust_last_name, ',' ON OVERFLOW TRUNCATE) WITHIN GROUP (ORDER BY c.country_id) AS Customer
FROM customers c, countries g
WHERE g.country_id = c.country_id
GROUP BY country_region
ORDER BY country_region;

When truncation occurs we truncate back to the end of the last complete value, at which point you can control how you tell the user that the list has been truncated. By default we append three dots '...' to the string as an indicator that truncation has occurred, but you can override this as follows:

SELECT 
g.country_region,
LISTAGG(c.cust_first_name||' '||c.cust_last_name, ',' ON OVERFLOW TRUNCATE '***') WITHIN GROUP (ORDER BY c.country_id) AS Customer
FROM customers c, countries g
WHERE g.country_id = c.country_id
GROUP BY country_region
ORDER BY country_region;

If you want to keep the existing pre-12.2 behaviour, where we return an error if the string is too long, then you can either rely on the default behaviour or explicitly state that an error should be returned (always a good idea to avoid relying on default behaviour, in my opinion) by using the keywords:

SELECT 
g.country_region,
LISTAGG(c.cust_first_name||' '||c.cust_last_name, ',' ON OVERFLOW ERROR) WITHIN GROUP (ORDER BY c.country_id) AS Customer
FROM customers c, countries g
WHERE g.country_id = c.country_id
GROUP BY country_region
ORDER BY country_region;

which will now generate the normal error message – i.e. replicates the pre-12.2 behaviour:

ORA-01489: result of string concatenation is too long
01489. 00000 - "result of string concatenation is too long"
*Cause: String concatenation result is more than the maximum size.
*Action: Make sure that the result is less than the maximum size.

Of course you can simply omit the new keywords and get the same behaviour:

SELECT 
g.country_region,
LISTAGG(c.cust_first_name||' '||c.cust_last_name, ',') WITHIN GROUP (ORDER BY c.country_id) AS Customer
FROM customers c, countries g
WHERE g.country_id = c.country_id
GROUP BY country_region
ORDER BY country_region;

which, as before, generates the normal error message – i.e. replicates the pre-12.2 behaviour:

ORA-01489: result of string concatenation is too long
01489. 00000 - "result of string concatenation is too long"
*Cause: String concatenation result is more than the maximum size.
*Action: Make sure that the result is less than the maximum size.

3. How many values are missing?

If you need to know how many values were removed from the list to make it fit into the available space then you can use the keywords 'WITH COUNT', which is the default behaviour. Alternatively, if you don't want a count at the end of the truncated string, you can use the keywords 'WITHOUT COUNT'.

SELECT 
g.country_region,
LISTAGG(c.cust_first_name||' '||c.cust_last_name, ',' ON OVERFLOW TRUNCATE '***' WITH COUNT) WITHIN GROUP (ORDER BY c.country_id) AS Customer
FROM customers c, countries g
WHERE g.country_id = c.country_id
GROUP BY country_region
ORDER BY country_region;

4. Do we split values when truncation occurs?

No. When determining where to force the truncation we take into account the full length of each value. Therefore, if you consider the example that we have been using, which creates a list of customer names within each country, we will always include the customer's full name "Keith Laker" (i.e. first name + last name). There has to be enough space to add the complete string (first + last name) to the list; otherwise the whole string "Keith Laker" is removed and the truncation indicator is inserted. It's not possible for the last value in the string to be only the first name, with the last name truncated/removed.

5. How do we calculate the overall length of the string values?

The characters indicating that an overflow has occurred are appended at the end of the list of values, which in this case is the default value of three dots '...'. The overflow functionality traverses backwards from the maximum possible length to the end of the last complete value in the LISTAGG clause, then it adds the user-defined separator followed by the user-defined overflow indicator, followed by the output from the 'WITH COUNT' clause, which adds a counter at the end of a truncated string to indicate the number of values that have been removed/truncated from the list.

Summary

With Database 12c Release 2 we have tackled the ORA-01489 error in two ways: 1) increased the size of VARCHAR2 objects to 32K and 2) extended the functionality of LISTAGG to allow greater control over the management of extremely long lists. Specifically, there are several new keywords:

  • ON OVERFLOW TRUNCATE
  • ON OVERFLOW ERROR (default behaviour)
  • WITH COUNT (default behaviour)
  • WITHOUT COUNT

Hopefully this new functionality will mean that all those wonderful workarounds for dealing with “ORA-01489: result of string concatenation is too long“ errors that have been created over the years can now be replaced by standard SQL functionality. 


SQL Pattern Matching Deep Dive – Part 6, state machines


The obvious way to start this particular post is to pose a couple of simple questions: what is a state machine and why should you care? In general I would say that you don’t need to know about or care about state machines. That’s the beauty of using SQL for pattern matching. The MATCH_RECOGNIZE clause encapsulates all the deep technical modelling and processing that has to be performed to run pattern matching on a data set. However, there are times when it is useful, probably vital, that you understand what is going on behind the scenes and one of the most obvious situations is when backtracking happens.

Therefore, the content covered in this post is going to be a gentle lead-in to my next post, where I am going to discuss the concept of "backtracking" and the dreaded ORA-30009 error.

Let's start our voyage of discovery... When you attempt to run a SQL statement containing a MATCH_RECOGNIZE clause, during the compilation phase we generate a finite state machine based on the PATTERN and DEFINE clauses in your statement. What is a finite state machine? According to Wikipedia:

A finite-state machine (FSM)…is a mathematical model of computation.…it is conceived as an abstract machine that can be in one of a finite number of states. The machine is in only one state at a time…changes from one state to another when initiated by a triggering event or condition; this is called a transition. A particular FSM is defined by a list of its states, and the triggering condition for each transition.

Reference from Wikipedia: https://en.wikipedia.org/wiki/Finite-state_machine

A state machine, which corresponds to the PATTERN and DEFINE elements of your MATCH_RECOGNIZE clause, can be represented by a directed graph called a state diagram. This diagram shows each of the possible states for the "machine" and the conditions that force the machine to either remain in its current state or move to the next state. Below is a simple example of a state machine:

[Diagram: a simple state machine with four states]

In the diagram above, each state is represented by a node (ellipse), which in this case are marked as "State 1" to "State 4". The arrows, known as edges, show the transition(s) from one state to another. If you look at states 2 and 4 you will notice that they have two edges, although these edges are shown at different vertical positions on the diagram. When drawing a proper state diagram, each edge is labeled with the event (condition) that triggers the transition. Events (conditions) that don't cause a change of state are represented by a circular arrow returning to the original state, and these can be seen on states 2 and 4.

The precedence for reading the information is top-down. What this means is that when in State 2 the FSM will test to see if State 3 can be reached, and if it can't it will then test to see if State 2 can be maintained. The reverse is true for State 4, where the FSM will test to see if State 4 can be maintained and, if it can't, it will then, in this example, either end, having determined that a match has completed, or start backtracking to try and complete a match. I am sure you can now see how this is going to link into my next blog post.

State machines are not limited to just pattern matching. They have all sorts of other uses. If you want a gentle diversion to look at state machines in a little more detail then try this article by Enrique Ortiz from the OTN magazine in August 2004: Managing the MIDlet Life-Cycle with a Finite State Machine.

All of this flows directly into the keywords that appear (or don't appear) in the explain plans, which were covered in the post MATCH_RECOGNIZE and the Optimizer from January 2015. As a quick refresher, essentially there are four new keywords that you need to be aware of:

  • MATCH RECOGNIZE
  • SORT
  • BUFFER
  • DETERMINISTIC FINITE AUTO

The first three bullet points are reasonably obvious. The last keyword is linked to the use of a state machine. Its appearance, or lack of appearance, affects the way our pattern is applied to our data set, but that is all explained in the blog post. Most of my MATCH_RECOGNIZE examples are based on the stock ticker data set. Let's assume that we are searching for V-shaped patterns in our data set (https://docs.oracle.com/database/121/DWHSG/pattern.htm#CACHHJJG):

SELECT *
FROM Ticker MATCH_RECOGNIZE (
PARTITION BY symbol
ORDER BY tstamp
MEASURES STRT.tstamp AS start_tstamp,
LAST(DOWN.tstamp) AS bottom_tstamp,
LAST(UP.tstamp) AS end_tstamp
ONE ROW PER MATCH
AFTER MATCH SKIP TO LAST UP
PATTERN (STRT DOWN+ UP+)
DEFINE
DOWN AS price < PREV(down.price),
UP AS price > PREV(up.price)
) MR
ORDER BY MR.symbol, MR.start_tstamp;

This is what the state diagram would look like:

[State diagram for PATTERN (STRT DOWN+ UP+)]

These diagrams can be really helpful when you have more complex patterns and you need to consider the impact of backtracking. This post is all about laying the building blocks for my next post on backtracking and the dreaded ORA-30009 error. If you have managed to read this far then you are guaranteed to be ready for an in-depth look at what happens inside MATCH_RECOGNIZE when we have to move from right to left through our state diagram in an attempt to find a complete match. Now you should know everything you need to know about state machines, and I am going to carry over the "why care" part to the next post...

If you want a recap of where we are in this series of pattern matching deep dive posts, here is the full list:

As per usual, if you have any comments or questions then feel free to contact me directly via email: keith.laker@oracle.com 


Amazon fiber optic project seeks funding

A project aimed at connecting remote areas in Brazil’s Amazon rainforest needs R$ 70m ($ 21m) in funding in order to resume work in 2017.

The resources are required for the purchase and roll-out of additional fibre optic infrastructure as well as other equipment needed for the implementation.

Currently, the Amazônia Conectada (Connected Amazon) project, led by the Brazilian Army, connects some 52 riverside municipalities to the Internet through subfluvial fiber optic cables.

One of the heads at the Army's Integrated Telematics Center, Colonel Eduardo Wolski, told Brazilian portal Convergência Digital that the body is hoping to secure backing from the federal and state governments as well as the private sector – particularly telecom providers interested in expanding their foothold in the north of the country – so that it can carry on.

Lenovo launches New Glass C200: A quick tour


Cyanogen shutting down services and OS by December 31

(Image: Cyanogen)

Cyanogen late Friday announced all services and nightly builds will be discontinued no later than December 31.

Leaked files reveal scope of Israeli firm’s phone cracking tech


(Image: file photo)

Earlier this year, we were sent a series of large, encrypted files purportedly belonging to a US police department as a result of a leak at a law firm, which was insecurely synchronizing its backup systems across the internet without a password.

Among the files was a series of phone dumps created by the police department with specialist equipment, which was created by Cellebrite, an Israeli firm that provides phone-cracking technology.

CIO leadership: Two female role models


(Image from iStockphoto)

In part one of my conversation with Kim Stevenson and Adriana Karaboutis, we discussed the challenges and opportunities facing women in technology.

Also read:

Women in technology: Advice from Intel and Biogen

This second post offers advice from Andi and Kim on how CIOs can add value to their organizations. Both women have years of CIO experience; Kim was CIO at Intel and Andi worked in that role at Dell. Having been in the role and then left to take on business positions, they are each highly qualified to offer advice to CIOs.

Kim is currently Chief Operating Officer for the Client, IoT and System Architecture Group at Intel and Adriana is Executive Vice President Technology, Business Solutions & Corporate Affairs at Biogen. Both women are on the boards of public companies.

Turnbull’s agile struggle is all glitz and no grunt

“Welcome to the ideas boom. There’s never been a more exciting time to be Australian. Agile. Nimble. Innovation.” Remember those words from a little over a year ago, when Prime Minister Malcolm Turnbull launched Australia’s National Innovation and Science Agenda?

Are we still excited? Actually, were we even excited to begin with?