Expert Interview (Part 3): Databricks’ Damji Discusses Security, Cloud and Notebooks

August 11, 2017   Big Data

Syncsort’s Paige Roberts recently caught up with Jules Damji (@2twitme), the Spark Community Evangelist at Databricks, and they enjoyed a long conversation. In Part 3 of this four-part interview series, we’ll look more at the importance of security to Spark users, the overwhelming move of a lot of Big Data processing to the Cloud, and what the Databricks Platform brings to the table.

In case you missed it: in Part 1, we looked at the Apache Spark community, and in Part 2, we covered how the Spark and Hadoop ecosystems are merging, which supports AI development.

Paige Roberts: So, we’ve talked a lot about the new single API for Spark, a single API for Datasets and DataFrames. I can build my application once; I can run it in streaming, I can run it in batch. It doesn’t even matter anymore. I can execute it on this engine now, and maybe next year, I can execute it on another engine, and I won’t have to rewrite it every time. You won’t have to rebuild if it uses the same API. That’s very similar to a Syncsort message that we’ve been calling Intelligent Execution, or Design Once, Deploy Anywhere.

Someone asked at Reynold Xin’s talk, “What do you do when you go from RDD to DataFrames?” The answer was, “Well, you have to re-write.”

[Both laugh]

Damji: Yeah. We can’t quite do it that far back.
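
The "build once, run in batch or in streaming" idea discussed above can be illustrated with a minimal sketch. This is a plain-Python analogy, not Spark's API: the function `count_errors`, the event fields `service` and `level`, and the sample data are all hypothetical names chosen for illustration. The point is only that the same transformation is written once against a common interface (here, any iterable) and then fed either a finite batch or a stream that is consumed one event at a time.

```python
from typing import Dict, Iterable, Iterator, Tuple


def count_errors(events: Iterable[dict]) -> Iterator[Tuple[str, int]]:
    """Yield (service, running error count) for every ERROR event.

    The logic only assumes it can iterate over events, so it does not
    care whether the source is a finite list or an incremental stream.
    """
    counts: Dict[str, int] = {}
    for event in events:
        if event["level"] == "ERROR":
            counts[event["service"]] = counts.get(event["service"], 0) + 1
            yield event["service"], counts[event["service"]]


# Hypothetical sample data, used for both execution modes.
SAMPLE = [
    {"service": "api", "level": "ERROR"},
    {"service": "api", "level": "INFO"},
    {"service": "db", "level": "ERROR"},
    {"service": "api", "level": "ERROR"},
]


def event_stream() -> Iterator[dict]:
    """Stand-in for a streaming source: hands out one event at a time."""
    for event in SAMPLE:
        yield event


# "Batch" mode: the whole dataset is available up front.
batch_result = list(count_errors(SAMPLE))

# "Streaming" mode: the same function consumes events as they arrive.
stream_result = list(count_errors(event_stream()))

print(batch_result == stream_result)
```

In this sketch only the data source changes between the two runs; the transformation itself is untouched, which is the property the interview describes for a unified API.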

Roberts: Still, that’s a very exciting and appealing model for a lot of folks, designing jobs once and having them execute wherever without re-designing. One of the things I see that Spark has as a distinct advantage over everybody else is just the level of the APIs. They are so much easier to use, they are so much more robust. Even more so with version 2.x. That seems to broaden your community, and make it easier for the community to add to the Spark ecosystem.

Damji: It does make a huge difference in community support and participation.

Roberts: So, one thing we haven’t touched on much is the Databricks business model. How does it work?

Damji: That’s a good question. Hardly anyone has effectively cracked the code on how to monetize purely open source technology. Probably one of the few companies that a lot of newer companies model themselves on is Red Hat.

Red Hat had a model of saying, “We are going to take Linux, which is open source, and we are gonna add proprietary and coveted enterprise features on it to make it available and suitable for an enterprise. Then we are going to charge for a subscription and provide support and services with it since Linux is our core competency. We have the brilliant hackers who can write your kind of device drivers and that sort of thing.”

Roberts: We know it better than anyone else.

Damji: Exactly. We know it better than anyone, so one added value is a core competency. Another is enterprise kinds of security, which you won’t usually get in open source out of the box or from downloading from the repo. Kafka is going the same way with Confluent, right?

So, I think that’s the trend. Whoever provides the best experience for Apache Spark on their particular platform is going to win. Databricks provides the best Apache Spark platform, along with a Unified Analytics Platform that brings people, processes and infrastructure (or platforms) together. We provide a unified workspace with notebooks, in which data engineers and data scientists can collaborate; we provide the best IO access for all your storage. We provide enterprise-grade security for both data at rest and data in motion. And we provide a fine-grained pool of serverless clusters.

As more and more data is going into the Cloud, people are more and more worried about sensitive data, and how do you protect that? So, security comes as part of this augmented offering.

Roberts: They are! A lot of our customers are banks and insurance companies, and they’re really concerned with information security.

Damji: Financial institutions are a good example, and we have customers in that vertical. Financial institutions are warming up to the fact that Cloud is the future, and a good alternative. We have the same vision. So, we provide this unified analytics platform powered by Apache Spark with other stuff around it, which is Databricks specific. It gives you this comprehensive platform, which differentiates between computing and storage, because we don’t tell you what storage to use.

Roberts: Store it however you want.

Damji: Right. You can store it however you want. We’ll give you the ability to bring the data in quickly and process it fast and write it back quickly. All these different aspects of Databricks bring tremendous value to our customers: security, fast IO access, core competency of Apache Spark, and the integrated workspace of notebooks.

Data scientists, ETL engineers and business analysts can work collaboratively through the Databricks notebook platform. You bring the data in, you explore the data, you do your ETL, you write notebooks, you create pipelines. So, those are the added features for our customers that come on top of open source. But underneath, it is powered by Apache Spark.

Finally, you also get the ability to productionize your jobs using our job scheduler, and the ability to manage your entire infrastructure without worrying about it.

Roberts: And as long as you keep making Apache Spark better and better, and the community keeps jumping in and loving it, then you guys have got a good future.

Damji: Yes! If you try our Community Edition, you’ll actually see those benefits. If you start using our Professional Edition, you begin to see more. Every time we create a new release, we release it for our customers as well as the community. They get that instantaneously.

Roberts: That’s about as fast as it gets.

Don’t miss the final post of this four-part conversation with Jules Damji (Monday, August 14th), which features more about Spark and Databricks, and the advantages of Cloud data analysis.

Syncsort + Trillium Software Blog
