Integrating Streaming Big Data with Kafka

May 23, 2016   Big Data

Kafka is a distributed messaging system that has generated a lot of interest among our customers. Some were already using a messaging system but wanted to switch to Kafka to massively increase the number of messages they generate and process. Others realize that Kafka could be a game changer for their business by allowing them to stream message data for real-time analysis. One of many use cases: cable or telephone companies can get alerts on events like dropped calls as they happen, so they can respond immediately. Hospitals can get alerts to life-threatening conditions from machines monitoring their patients. Online banking systems can get near real-time updates through customer events processed and integrated via a Kafka data bus.

So, how could these Enterprise customers get started using Kafka?

It made sense for us to add streaming data support to DMX-h by reading from and writing to Kafka topics. This allowed our users to select Kafka topics as sources and targets from our GUI, and leverage the full ETL power of DMX-h when analyzing the contents of Kafka messages. DMX-h users could also leverage a MapReduce or Spark cluster to parallelize the consumption of Kafka messages by letting each node of the cluster read from a subset of partitions in the Kafka topic.
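How a cluster might split a topic's partitions across nodes can be sketched in a few lines. This is a hypothetical illustration, not DMX-h's actual implementation; the names `num_partitions`, `num_nodes`, and `assign_partitions` are invented for the example.

```python
def assign_partitions(num_partitions: int, num_nodes: int) -> list[list[int]]:
    """Round-robin assignment of partition ids to cluster nodes,
    so each node reads a disjoint subset of the topic's partitions."""
    assignment = [[] for _ in range(num_nodes)]
    for p in range(num_partitions):
        assignment[p % num_nodes].append(p)
    return assignment

# A topic with 8 partitions consumed by a 3-node cluster:
print(assign_partitions(8, 3))  # → [[0, 3, 6], [1, 4, 7], [2, 5]]
```

Because Kafka guarantees ordering only within a partition, a disjoint assignment like this lets nodes consume in parallel without coordinating on individual messages.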

The Kafka project has done a lot of maturing in the past year. When we started our development, the Kafka APIs were still changing, and the changes were not backwards compatible. Our engineering team started working with Kafka 0.8.2.2 and kept up with the changes as new versions were released. Based on our discussions with our Confluent and Cloudera partners, we decided to release support once Kafka 0.9.0.0 was available, as the API had stabilized and security had been added.

During development, we worked closely with some of our customers as design partners. Based on their feedback, we decided to support consuming messages in batches defined either by a number of messages or by a time interval, for greater flexibility. We also made sure DMX-h is a reliable Kafka consumer: messages are marked as committed in Kafka only once DMX-h has written them to persistent storage, such as a file or database table. This avoids loss of data if any part of the ETL process fails or needs to be restarted.
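The commit-after-persist pattern described above can be sketched with an in-memory stand-in for a Kafka partition. This is a minimal, hypothetical sketch of the at-least-once pattern, not DMX-h code; `InMemoryTopic`, `consume_batch`, and the parameter names are invented for illustration.

```python
import time

class InMemoryTopic:
    """Stand-in for a Kafka partition: a message log plus a committed offset."""
    def __init__(self, messages):
        self.messages = list(messages)
        self.committed = 0  # offset of the first unconsumed-and-uncommitted message

def consume_batch(topic, storage, max_messages=100, max_seconds=5.0):
    """Read up to max_messages, or until max_seconds elapse, then persist.
    The committed offset advances only AFTER the write succeeds, so a crash
    mid-batch loses nothing -- the batch is simply re-read on restart."""
    start = time.monotonic()
    batch = []
    pos = topic.committed
    while len(batch) < max_messages and time.monotonic() - start < max_seconds:
        if pos >= len(topic.messages):
            break                      # no more messages available right now
        batch.append(topic.messages[pos])
        pos += 1
    if batch:
        storage.extend(batch)          # persist first (a file or DB table in DMX-h)
        topic.committed = pos          # commit only after the write succeeds
    return batch
```

Reversing the last two steps (commit, then write) would turn the guarantee into at-most-once: a crash between the commit and the write would silently drop the batch.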

If any messages are rejected by Kafka, the DMX-h producer records the rejected messages in a file, so the process can be corrected and the rejected messages can be re-processed, with no loss of data.
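A reject-file scheme like the one described can be sketched as follows. Again this is an illustrative stand-in, not the DMX-h producer: `produce_with_reject_log` and the `send` callback are hypothetical names, and a real producer would also record offsets or error codes alongside each rejected message.

```python
def produce_with_reject_log(messages, send, reject_path):
    """Attempt to send each message; append any failure to a reject file
    so the cause can be fixed and the messages re-sent with no data loss."""
    sent = []
    with open(reject_path, "a") as rejects:
        for msg in messages:
            try:
                send(msg)              # e.g. a Kafka producer send in real code
                sent.append(msg)
            except Exception as err:
                # Record the message and the reason so it can be re-processed.
                rejects.write(f"{msg}\t{err}\n")
    return sent
```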

By adding point-and-click support for Kafka sources and targets in DMX-h, we’ve made this technology more accessible to Enterprise customers and allowed them to combine batch and streaming data processing in a single platform. Customers don’t have to write code to leverage Kafka, or rewrite it when they upgrade to new versions.

Syncsort blog
