
Four Really Real Meanings of Real-Time

March 16, 2016   Big Data

This blog was originally posted on the “Big Data Page by Paige” Blog.

Our director of engineering told me that she had a customer ask if we could do real-time data processing with Syncsort DMX-h. Knowing that real-time means different things to different people, the engineer asked what exactly the customer meant by real-time. He said, “We want to be able to move our data out of the database and into Hadoop in real-time every two hours.”

When she told me that story, I wanted to quote Inigo Montoya from “The Princess Bride.” You keep using that word, “real-time.” I do not think it means what you think it means.

But what does real-time actually mean? And what do you really mean when you say real-time? What do other people usually mean when they say real-time? How can you tell which meaning people are using? And what the heck is near real-time?

Here are four different things that I believe real-time really means, and how to determine which meaning you’re using.


Sub-Second Response

When engineers say “real-time”, they are usually referring to sub-second response time. In this kind of real-time data processing, nanoseconds count. Extreme levels of performance are key to success.

Example quotes:

“Our cyber-security process has to respond in real-time to stop automated attacks from stealing customer data.”

“This stock exchange application has to bid in real-time or we’ll lose money.”

If this is what you mean when you say “real-time data processing”, then you need the data to come in, the condition for response to be evaluated, and the response to happen, all generally in less than a second. And if someone else’s system can do it a few nanoseconds faster, you might lose out. In this kind of real-time, pushing the limits of performance isn’t a bonus; it’s a necessity.
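
As a rough illustration of that come-in, evaluate, respond loop, here is a minimal Python sketch that times one response against a budget. The one-second budget comes from the description above; the function names and the failed-login rule are hypothetical, made up only for this example.

```python
import time

# Assumed budget: the text above only says "less than a second".
SUB_SECOND_BUDGET_S = 1.0


def respond_within_budget(handler, event, budget_s=SUB_SECOND_BUDGET_S):
    """Run handler(event) and report whether it finished inside the budget."""
    start = time.perf_counter()
    result = handler(event)
    elapsed_s = time.perf_counter() - start
    return result, elapsed_s, elapsed_s < budget_s


def block_if_suspicious(event):
    # Hypothetical rule: block any source with more than 100 failed logins.
    return "block" if event["failed_logins"] > 100 else "allow"


result, elapsed_s, on_time = respond_within_budget(
    block_if_suspicious, {"failed_logins": 500}
)
print(result, on_time)
```

A real sub-second system would measure the whole path, including network and queueing time, not just the handler. The sketch only shows where the budget check sits.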

Human Comfortable Response Time

What this kind of real-time processing comes down to is a commandment: “Thou shalt not bore or frustrate the users.” The performance requirement for this kind of processing is usually a couple of seconds.

Example quotes:

“We need real-time drill down on visualizations for our business intelligence team, no matter how big the data.”

“This website needs to respond to user requests in real-time or we’ll lose sales.”

If this is what you mean when you say “real-time”, then performance matters, but it may not be the number one criterion. In some cases, a difference of a single second can be critical. For instance, if a person clicks on an ad on a web page, and the page takes 4 seconds to load, the user is likely to get bored and go look at a different web page. If that same page had loaded in 3 seconds, that user might have bought something on that web page.

For the most part, however, as long as the data gets crunched and the application responds before the user decides to go surf somewhere else, or check email or something, then the performance requirement is met.

Event-Driven

If, when you say “real-time”, you mean the opposite of scheduled, then you mean event-driven. Instead of happening at a particular time interval, event-driven data processing happens when a certain action or condition triggers it. The performance requirement is generally that processing finishes before another event happens.

Example quotes:

“As changes are made to the database, the replication process copies them out to the cluster in real-time.”

“A listener watches for data to arrive from our customers in this location, then loads it into the system in real-time.”

In some cases, you don’t know precisely when you’ll need data processing done, but as soon as a certain thing happens, that’s when the need for data processing is triggered. Common event examples are changes in the data or user actions.

There are actually two different performance requirements for event-driven data processing. First, the data processing system has to be finished working and ready to start again before the next event happens. So if, on average, the events happen no closer together than five minutes, a data processing time frame of 2-3 minutes is excellent. If the events tend to happen an average of 10 seconds apart, then clearly a 2-3 minute processing time would be unacceptable.

The second performance requirement may be more arbitrary. It’s the business SLA. If, for example, you want to be able to assure the CEO that his dashboards have the most current data up to the minute, then the data processing has to be able to complete within a minute of any data change in order to meet that deadline.
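
The two requirements can be written down as two small checks. This is a minimal Python sketch using the numbers from the examples above (2-3 minutes of processing, events five minutes or 10 seconds apart, a one-minute SLA). The function names are hypothetical, and comparing against the average gap is a simplification, since bursts of events can arrive closer together than the average.

```python
def keeps_up(processing_s: float, mean_gap_s: float) -> bool:
    """Requirement 1: be finished and ready before the next event is expected."""
    return processing_s < mean_gap_s


def meets_sla(processing_s: float, sla_s: float) -> bool:
    """Requirement 2: finish within the freshness window the business promised."""
    return processing_s <= sla_s


# Worst case of the "2-3 minute" processing time from the example above.
WORST_CASE_PROCESSING_S = 3 * 60

print(keeps_up(WORST_CASE_PROCESSING_S, 5 * 60))  # events five minutes apart: True
print(keeps_up(WORST_CASE_PROCESSING_S, 10))      # events 10 seconds apart: False
print(meets_sla(WORST_CASE_PROCESSING_S, 60))     # up-to-the-minute dashboard: False
```

Note that the same 3-minute job passes the first check for five-minute events but still fails the one-minute SLA, which is why the two requirements have to be considered separately.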

Streaming Data Processing

If, when you say “real-time”, you mean the opposite of batch processing, then you mean streaming data processing. In batch processing, data is gathered together, and all records or other data units are processed in one big bundle until they’re all done. In streaming data processing, the data is processed as it flows in, one unit at a time. And once the data starts coming in, it generally doesn’t end. The performance requirement for streaming data processing is that you must process data as fast as it flows in.
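
To make the contrast concrete, here is a minimal Python sketch of the same per-record work run both ways. Everything in it is hypothetical: `process` stands in for whatever your job does to one record, and `endless_source` stands in for a feed that never finishes.

```python
import itertools
from typing import Iterable, Iterator, List


def process(record: str) -> str:
    # Hypothetical per-record work: trim whitespace and upper-case.
    return record.strip().upper()


def run_batch(records: List[str]) -> List[str]:
    """Batch: the whole bundle is gathered first, then processed until done."""
    return [process(r) for r in records]


def run_streaming(source: Iterable[str]) -> Iterator[str]:
    """Streaming: handle each record as it arrives; the source may never end."""
    for record in source:
        yield process(record)


def endless_source() -> Iterator[str]:
    # Stand-in for a feed that keeps producing records forever.
    for n in itertools.count():
        yield f"event-{n}"


print(run_batch([" a", "b "]))
# Only look at the first three results; the stream itself has no end.
print(list(itertools.islice(run_streaming(endless_source()), 3)))
```

The batch function needs a complete list before it can start, while the streaming function produces its first result as soon as the first record shows up and never needs to see the end of the data.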

Example quotes:

“We’re sifting through Twitter data in real-time for mentions of our company to keep an eye on sentiment.”

“The server information in this data center is monitored in real-time to catch problems early.”

More and more, when people say “real-time data processing” these days, they are most likely referring to streaming data processing. Streaming data processing has some very specific, and sometimes tricky-to-implement, requirements. You have to be able to process the data continuously, without start-up or clean-up overhead. Micro-batch streaming data processing frameworks like Spark Streaming have found a way to handle start-up and clean-up needs while still keeping up with streaming performance speed requirements. Streaming data processing also requires a way to deal with occasional system failures without massive data loss. In some cases, data loss is acceptable, but in others, it isn’t.
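
The micro-batch idea can be sketched in a few lines of plain Python. This is not the Spark Streaming API, only an illustration of the concept: the stream is cut into small bundles so that any start-up and clean-up cost is paid once per bundle instead of once per record. Spark Streaming cuts its batches by time interval; this sketch cuts by record count so the result is repeatable. The function names and the "acme" company name are hypothetical.

```python
from typing import Iterable, Iterator, List


def micro_batches(source: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Cut a (possibly endless) stream into small fixed-size bundles."""
    batch: List[str] = []
    for record in source:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush whatever is left if the source does end.
        yield batch


def count_mentions(batch: List[str], company: str = "acme") -> int:
    # Hypothetical per-batch work: count records that mention the company.
    return sum(1 for text in batch if company in text.lower())


tweets = ["Love Acme!", "lunch time", "acme is down", "hello", "ACME rocks"]
for batch in micro_batches(tweets, batch_size=2):
    print(len(batch), count_mentions(batch))
```

Smaller batches bring the result closer to record-at-a-time streaming, and larger ones spread the per-batch overhead over more records. The sketch does nothing about failures, so a crash here would lose the batch in progress.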

Near Real-Time

The takeaway from all these different meanings of real-time is not that you’re using the word wrong, or one definition is more right than another. It’s that when you’re thinking about implementing a real-time data processing application, it’s important to consider what kind of real-time you really mean. Based on that, you can determine what level of performance you will require.

That’s not real-time. That’s near real-time. Duh!

So, what does near real-time mean? Well, near real-time is essentially something engineers say because they’re cringing inside about how ambiguous the word “real-time” has become. Seriously, why does one little hyphenated word suddenly have four different meanings? What’s up with this English language drift thing? It’s so imprecise!

Near real-time basically means any one of the definitions I mentioned, aside from sub-second response time. Although, I have heard an engineer or two use real-time to mean streaming.

So, What Does Real-Time Really, Really Mean?

At the recent Spark Summit East, Syncsort GM Tendu Yogurtcu was asked, “What trends do you see coming up?” to which she responded, “A lot more customers are moving to real-time data processing.” Ali Ghodsi, CEO of Databricks, had the same opinion. “We’re seeing a real push for real-time.” He also saw trends toward breaking data silos and doing more advanced analytics, “all in real-time.” Tendu also said that the future lies in combining streaming and batch on one platform.

In the same interview, Tendu and Dave Vellante of theCUBE delved into the question that spurred this blog post: what is the real definition of real-time? Dave came up with a great one that covers all four of the above meanings, and Tendu called back to it: “Respond before you lose the customer.” This, in some ways, is the best possible way to think about real-time when designing data processing systems.

Regardless of what level of performance your system has in any given situation, if you end up losing the customer, then it’s simply too slow. Maybe you should try moving up to real-time data processing.


Syncsort blog
