The only constant in Big Data is change, or is it?

October 16, 2015 | Big Data

Attending Strata + Hadoop World 2015 in New York City is always an amazing experience because there is so much you can do and learn. The sessions, tutorials, networking, and vendor visits are eye-opening every time. Seeing how fast the big data ecosystem changes and evolves in such a short amount of time is incredible, but it can also be overwhelming. In some ways, Strata + Hadoop World is a microcosm of the Hadoop ecosystem. It’s exciting. It’s promising. But once you try to digest everything thoroughly, it’s exhausting. This time out, I took a step back and considered how best to use my limited time at such an extensive event. Since the technology evolves so quickly, I decided not to focus on the technology itself; I wanted to see how people actually use it.

[Image: Strata + Hadoop World is where cutting-edge science and new business fundamentals intersect and merge.]

With this in mind, I reached out to as many attendees, speakers, and vendors as I could to learn what goes on in their big data ecosystems. What I found was, to be honest, kind of shocking. For all the promise of “just load all your data into HDFS and your business will instantly gain insight and make money hand over fist,” no one I could find actually did this. Instead, the process was virtually the same as it was before Hadoop was even born. The only thing that changed was the underlying technology, which evolved to accommodate the growing volume, variety, and velocity of the data as well as the growing demands of the business.

Back in the day, you had maybe three to five data sources that you wanted to pull into your data warehouse so that you could report on that data. Having that few data sources made things like data wrangling, quality, lineage, and analysis trivial to do, even by hand (a very primitive form of technology). Maybe you’d convert the data from the write-optimized format of your transactional systems to a read-optimized format like a star or snowflake schema, making reporting much easier and faster. You could even do this on your data warehouse in staging tables. You could then build OLAP (Online Analytical Processing) cubes for easier slicing and dicing of the data for analysis and insights.
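
As a concrete illustration of that staging-to-star-schema reshape, here is a minimal sketch written in PySpark (chosen purely for illustration; a classic warehouse would do the same thing in SQL over staging tables). All table names, columns, and paths are hypothetical.

    # Reshape write-optimized transactional rows into a read-optimized
    # star schema: one narrow fact table plus lookup dimensions.
    # All names and paths below are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("star-schema-sketch").getOrCreate()

    # Staging: raw order rows copied as-is from the transactional system.
    staging = spark.read.parquet("/warehouse/staging/orders")

    # Dimension: one row per customer, keyed for joins from the fact table.
    dim_customer = (staging
        .select("customer_id", "customer_name", "customer_region")
        .dropDuplicates(["customer_id"]))

    # Fact: numeric measures plus foreign keys into the dimensions.
    fact_sales = staging.select(
        "order_id", "customer_id", "product_id", "order_date",
        (F.col("quantity") * F.col("unit_price")).alias("revenue"))

    # Reports now join a narrow fact table to small dimensions,
    # which is what makes the read side fast.
    dim_customer.write.mode("overwrite").parquet("/warehouse/dim_customer")
    fact_sales.write.mode("overwrite").parquet("/warehouse/fact_sales")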

Demands (real-time, self-service, etc.) and data growth (more sources, larger volumes) are increasing exponentially, while technology is changing just as fast to keep pace. New tools seem to emerge daily, each handling one specific function of the data pipeline at scale. These tools are fantastic at what they focus on, but their messaging tends to gloss over the rest of the data pipeline (e.g., analytics tools promising you no longer need ETL). In my conversations with attendees, I heard that these promises fall short, leaving people confused, frustrated, and unsure where to start.

My advice for them is simple: don’t change your process, change your technology.

When you take a technology-first approach, it’s like holding a hammer: everything starts to look like a nail. Find the nail first. Take, for instance, batch SQL queries on your data warehouse or mainframe. Studies show that a majority of data warehouses are performance- or capacity-constrained, so offloading those workloads to new technology is a perfect nail. A familiar, well-defined use case like this also lets you focus on the new technology rather than on the intricacies of defining a brand-new use case.

Tools like Syncsort DMX-h can make the transition to Hadoop even easier. If your SQL ELT jobs were written long ago, by someone no longer at the company, or are just really long and messy, SILQ can help you visualize them and offload them to Hadoop with the click of a button. At Strata, I noticed that every single session with “Spark” in the title was packed to the brim, because so many attendees were moving workloads within Hadoop from MapReduce to Spark. Two years ago, I remember the same phenomenon with the move from MapReduce v1 to MapReduce v2. DMX-h helped its customers make that transition seamlessly with Intelligent Execution: it doesn’t matter where you end up deploying the job (on-site, cloud, Hadoop), the design of the job is the same. This helps developers keep pace with quickly changing big data technologies without having to learn an entirely new stack every 12 to 18 months.
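
To make the MapReduce-to-Spark migration concrete, here is a minimal sketch of the kind of job attendees were moving (it illustrates the migration itself, not DMX-h or Intelligent Execution): the classic sum-per-key aggregation that once required a hand-written mapper and reducer, expressed as a single Spark job. The input path and column names are hypothetical.

    # The MapReduce staple (mapper emits a key/value pair, reducer sums
    # values per key) expressed as one Spark DataFrame job.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("mr-to-spark-sketch").getOrCreate()

    # Hypothetical input: one JSON record per event.
    events = spark.read.json("hdfs:///data/events")

    # map (emit user_id, bytes) + reduce (sum per user_id) in one line.
    totals = events.groupBy("user_id").agg(F.sum("bytes").alias("total_bytes"))

    totals.write.mode("overwrite").parquet("hdfs:///data/totals")

Note that the same script runs on a laptop, a YARN cluster, or in the cloud simply by changing the --master argument to spark-submit; designing the job once and choosing the deployment target later is the same idea the paragraph above attributes to Intelligent Execution.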

To keep pace with the business’s growing demand for real-time data, Syncsort DMX-h supports Apache Kafka, which allows developers to blend real-time and batch data (a Lambda Architecture) within the same graphical UI.
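
For a sense of what blending real-time and batch data looks like at the code level, here is a minimal sketch using Spark’s Kafka source rather than DMX-h’s graphical UI. The broker address, topic, and warehouse path are hypothetical, and it assumes the spark-sql-kafka connector package is on the classpath.

    # Enrich live Kafka events with batch reference data in a single job:
    # a speed layer and a batch layer, blended Lambda-style.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-blend-sketch").getOrCreate()

    # Real-time side: events arriving on a (hypothetical) Kafka topic.
    live = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "events")
        .load()
        .selectExpr("CAST(key AS STRING) AS customer_id",
                    "CAST(value AS STRING) AS payload"))

    # Batch side: dimension data already sitting in the warehouse.
    customers = spark.read.parquet("/warehouse/dim_customer")

    # Stream-static join: each live event picks up its batch-loaded row.
    enriched = live.join(customers, "customer_id", "left")

    query = (enriched.writeStream
        .format("console")
        .outputMode("append")
        .start())
    query.awaitTermination()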

In fact, Wikibon just published a research report, “Simplifying and Future-Proofing Hadoop,” that addresses this dilemma and shows how you can get started (and stay ahead) with tools like Syncsort DMX-h that hide much of the complexity.


Source: Syncsort blog
