Expert Interview (Part 1): Wikibon’s James Kobielus Discusses the Explosive Impact of Machine Learning

It’s hard to mention the topics of automation, artificial intelligence or machine learning without various parties speculating that technology will soon throw everybody out of their jobs. But James Kobielus (@jameskobielus) sees the whole mass unemployment scenario as overblown.

The Future of AI: Kobielus Sees Progress Over Fear

Sure, AI is automating a lot of knowledge-based and not-so-knowledge-based functions right now. It is causing dislocations in our work and in our world. But the way Kobielus looks at it, AI is not only automating human processes, it’s augmenting human capabilities.

“We make better decisions, we can be more productive … We’re empowering human beings to do far more with less time,” he says. “If fewer people are needed for things we took for granted, that trend is going to continue.”

It’s anybody’s guess how the world will look in the future, Kobielus says. But he doesn’t believe in the nightmare scenarios in which AI puts everyone out of a job. Why? Basic economics.

The industries that are deploying AI won’t have the ability to get customers if everyone is out of a job.

“There needs to be buying power in order to power any economy, otherwise the AI gravy train will stop,” he says.

Kobielus is the lead analyst with Wikibon, which offers market research, webinars and consulting to clients looking for guidance on technology. His career in IT spans more than three decades, three-quarters of it in analyst roles for different firms. Before joining Wikibon, he spent five years at IBM as a data science evangelist, a thought leadership marketing position espousing all things Big Data and data science.

He talks regularly on issues surrounding Big Data, artificial intelligence, machine learning and deep learning.

How Machine Learning is Impacting Industry Today

Machine learning is a term that’s been around for a while now, Kobielus says. At its core, it’s simply using algorithms and analytics to find patterns in data that you wouldn’t have been able to find otherwise. Regression models and support vector machines are examples of more established forms of machine learning. Today, newer crops of algorithms are lumped under what are called neural networks or recurrent neural networks.

“That’s what people think of as machine learning – it’s at the heart of industry now,” Kobielus says.

Brands are using these neural network tools for face and voice recognition, natural language processing and speech recognition.

Applied to text-based datasets, machine learning is often used to identify concepts and entities so that they can be distilled algorithmically to determine people’s intentions or sentiments.
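To make that concrete, here is a minimal sketch of the kind of text-classification pipeline described above. The library (scikit-learn) and the toy data are illustrative assumptions; the interview names no specific tools.

```python
# A minimal sentiment-classification sketch using scikit-learn.
# Library choice and toy data are illustrative assumptions only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "I love this product, it works great",
    "Terrible experience, would not recommend",
    "Absolutely fantastic support team",
    "The device broke after two days",
]
labels = [1, 0, 1, 0]  # 1 = positive sentiment, 0 = negative

# TF-IDF turns raw text into numeric features; logistic regression is
# one of the "established" model families mentioned above.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["the support team was fantastic"]))  # [1]
```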

“More and more of what we see in the machine learning space is neural networks that are deeper,” Kobielus says. “[They’re] not just identifying a face, but identifying a specific face and identifying the mood and context of situation.”

They’re operating at much higher levels of sophistication.

And rather than just being used in a mainframe, more often these algorithms are embedded in chips that are being put into phones, smart cars and other “smart” technologies.

Consumers are using these technologies daily when they unlock their phones using facial recognition, ask questions of tools like Alexa or automatically tag their friends in Facebook photos.

More and more industries are embracing deep learning: machine learning that can process media objects like audio and video in real time, offering automated transcription, speech-to-text and facial recognition, for instance, or the ability to infer the intent of a user from their gestures or their words.

Beyond just translating or offering automated transcriptions, machine learning provides a real-time map of all the people and places being mentioned and shares how they relate to each other.
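As a small illustration of extracting the people and places mentioned in text, here is a hedged sketch using spaCy, an assumed library choice rather than anything named in the interview:

```python
# A sketch of extracting the people and places mentioned in text,
# using spaCy (an assumed library choice; none is named above).
# Setup: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Angela Merkel met Emmanuel Macron in Paris on Tuesday.")

# Each entity is a building block for the kind of "map" of people
# and places described in the paragraph above.
for ent in doc.ents:
    print(ent.text, ent.label_)
# Angela Merkel PERSON / Emmanuel Macron PERSON / Paris GPE / Tuesday DATE
```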

Looking at the internet of things market, anybody in the consumer space that wants to build a smart product is embedding deep learning capabilities right now.

Top Examples of Machine Learning: Self-Driving Cars and Translations

Kobielus points to self-driving vehicles as a prime example of how machine learning is being used.

“They would be nothing if it weren’t for machine learning – that’s their brains.”

Self-driving vehicles process a huge variety of input including images, sonar, proximity, and speed as well as the behavior of the people inside, inferring their intent, where they want to go, what alternative routes might be acceptable based on voice, gestures, their history of past travel and more.

Kobielus is also excited about advances in translation services made possible by machine learning.

“Amazon Translate, machine translation between human languages in real time, is becoming scary accurate, almost more accurate than human translation,” Kobielus says.

In the not-too-distant future, he predicts that people will be able to just wear an earpiece that will translate a foreign language in real-time so they will be able to understand what people are saying around them enough to at least get by, if not more.

“The perfect storm of technical advances is coming together to make it available to everybody at a low cost,” he says.

Learn more about the top Big Data trends for 2018 in Syncsort’s eBook based on their annual Big Data survey.

Expert Interview (Part 2): Alation’s CEO Sangani Discusses Big Data Management Trends and Best Practices

In Part 1 of this two-part interview, Satyen Sangani (@satyx), CEO and co-founder of Alation, spoke about data cataloging. In today’s Part 2, he provides his thoughts on trends and best practices in Big Data management.

What are some of the more outdated or inefficient processes involved with accessing relevant data today? What is slowing businesses down?

It can take a long time to extract data from the data lake and get the right data to the right person exactly when they need it.

Businesses are moving to self-service analytics solutions where it isn’t necessary to have the involvement of the IT department to access and work with data. However, self-service tools often fail at helping users understand how to appropriately use the data. Specifically, they don’t always know which data sets to use, which definitions to use or which metrics are correct.

What should companies be doing today to prepare for how they’ll use data in the future? What should their long-term strategies look like?

Ultimately, you want to get data, business context and technical context in front of your employees as quickly as possible. The days where you could take months to prepare a report are over.

Given this, companies need to spend time thinking about a) how they can get data to their employees as fast as possible, and b) how to train their workforces to find, understand and use that data to get insights fast.

What’s one piece of advice you find yourself repeating to your clients over and over? Something you wish more companies were doing to get more out of their data?

Data governance has traditionally implied a top down, command and control oriented approach. Such an approach generally works when compliance is the primary goal, but when the goal is to get data consumers to use data more often, it’s important to take an iterative and agile approach to data governance.

It’s less about prescribing rules than reacting to users by gently correcting and improving their behavior.

What trends or innovations in Big Data management are you following today? Why do they excite you?

Self-service is, of course, a big one. We also like distributed computation engines like Presto and Spark. The notion that we can disconnect compute from storage is finally becoming a reality.

AI and Machine Learning need to be embedded into every layer of the stack. There’s too much manual work in data and that manual work comes at the cost of speed.

To learn how to put your legacy data to work for you, and plan and launch successful data lake projects with tips and tricks from industry experts, download the TDWI Checklist Report: Building a Data Lake with Legacy Data

Expert Interview (Part 3): Databricks’ Damji Discusses Security, Cloud and Notebooks

Syncsort’s Paige Roberts recently caught up with Jules Damji (@2twitme), the Spark Community Evangelist at Databricks, and they enjoyed a long conversation. In Part 3 of this four-part interview series, we’ll look more at the importance of security to Spark users, the overwhelming move of a lot of Big Data processing to the Cloud, and what the Databricks Platform brings to the table.

In case you missed it: in Part 1, we looked at the Apache Spark community, and in the second post, we covered how the Spark and Hadoop ecosystems are merging, which supports AI development.

Paige Roberts: So, we’ve talked a lot about the new single API for Spark, a single API for Datasets and DataFrames. I can build my application once; I can run it in streaming, I can run it in batch. It doesn’t even matter anymore. I can execute it on this engine now, and maybe next year, I can execute it on another engine, and I won’t have to rewrite it every time. You won’t have to rebuild if it uses the same API. That’s very similar to a Syncsort message we’ve been calling Intelligent Execution, or Design Once, Deploy Anywhere.

Someone asked at Reynold Xin’s talk, “What do you do when you go from RDD to DataFrames?” The answer was, “Well, you have to re-write.”

[Both laugh]

Damji: Yeah. We can’t quite do it that far back.
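To illustrate the single-API idea Roberts describes, here is a minimal sketch in PySpark 2.x Structured Streaming; the paths, schema and column names are placeholders, not anything from the interview:

```python
# A minimal sketch of the "design once" idea in Spark 2.x Structured
# Streaming. Paths, schema and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("single-api-demo").getOrCreate()

def summarize(df):
    # The same DataFrame logic serves both batch and streaming input.
    return df.filter(col("amount") > 100).groupBy("region").count()

# Batch: read a static directory of JSON files.
batch_df = spark.read.json("/data/events/")
summarize(batch_df).show()

# Streaming: point the identical logic at a streaming source.
stream_df = spark.readStream.schema(batch_df.schema).json("/data/events/")
(summarize(stream_df)
    .writeStream.outputMode("complete")
    .format("console")
    .start()
    .awaitTermination())
```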

Roberts: Still, that’s a very exciting and appealing model for a lot of folks, designing jobs once and having them execute wherever without re-designing. One of the things I see that Spark has as a distinct advantage over everybody else is just the level of the APIs. They are so much easier to use, they are so much more robust. Even more so with version 2.x. That seems to broaden your community, and make it easier for the community to add to the Spark ecosystem.

Damji: It does make a huge difference in community support and participation.

So, one thing we haven’t touched on much is about the Databricks business model. How does it work?

That’s a good question. Hardly anyone has effectively cracked the code on how to monetize open source technology alone. Probably one of the few companies that a lot of newer companies model themselves on is Red Hat.

Red Hat had a model of saying, “We are going to take Linux, which is open source, and we are gonna add proprietary and coveted enterprise features on it to make it available and suitable for an enterprise. Then we are going to charge for a subscription and provide support and services with it since Linux is our core competency. We have the brilliant hackers who can write your kind of device drivers and that sort of thing.”

We know it better than anyone else.

Exactly. We know it better than anyone, so one added value is a core competency. Another is enterprise-grade security, which you won’t usually get in open source out of the box or from downloading from the repo. Kafka is going the same way with Confluent, right?

So, I think that’s the trend. Whoever provides the best experience for Apache Spark on their particular platform is going to win. Databricks provides the best Apache Spark platform, along with a Unified Analytics Platform that brings people, processes and infrastructure (or platforms) together. We provide the unified workspace with notebooks, which data engineers and data scientists can collaborate on; we provide the best IO access for all your storage. We provide enterprise-grade security for both data at rest and data in motion. And we provide a fine-grained pool of serverless clusters.

As more and more data is going into the Cloud, people are more and more worried about sensitive data, and how do you protect that? So, security comes as part of this augmented offering.

They are! A lot of our customers are banks, insurance companies, and they’re really concerned with information security.

Financial institutions are a good example, and we have customers in that vertical. Financial institutions are warming up to the fact that Cloud is the future, and a good alternative. We have the same vision. So, we provide this unified analytics platform powered by Apache Spark with other stuff around it, which is Databricks specific. It gives you this comprehensive platform, which separates compute from storage, because we don’t tell you what storage to use.

Related: Expert Interview: Livy and Spot are Apache Spark and Cyber Security Projects, Not the Names of Sean Anderson’s Dogs

Store it however you want.

Right. You can store it however you want. We’ll give you the ability to bring the data in quickly and process it fast and write it back quickly. All these different aspects of Databricks bring tremendous value to our customers: security, fast IO access, core competency of Apache Spark, and the integrated workspace of notebooks.

The data scientist and ETL engineers and business analysts can work collaboratively through the Databricks notebook platform. You bring the data in, you explore the data, you do your ETL, you write notebooks, you create pipelines. So, that’s the added features for our customers that come on top of open source. But underneath it is powered by Apache Spark.

Finally, you also get the ability to productionize your jobs using our job scheduler, and the ability to manage your entire infrastructure without worrying about it.

And as long as you keep making Apache Spark better and better, and the community keeps jumping in and loving it, then you guys have got a good future.

Yes! If you try our Community Edition, you’ll actually see those benefits. If you start using our Professional Edition, you begin to see more. Every time we create a new release, we release it for our customers as well as the community. They get that instantaneously.

That’s about as fast as it gets.

Don’t miss the final post of this four-part conversation with Jules Damji (Monday, August 14th), which features more about Spark and Databricks, and the advantages of Cloud data analysis.

Big Data is constantly evolving – are you playing by the new rules? Download our eBook The New Rules for Your Data Landscape today!

Expert Interview (Part 2): Robert Corace of SoftServe Discusses Digital Transformation Challenges

In Part 1, Robert talked about the critical importance of digital transformation for organizations. In Part 2, he highlights the results of recent research on digital transformation with a focus on the common challenges organizations face. He also provides some examples of innovative strategies that companies such as Netflix and Amazon are using to tackle these digital transformation challenges.

What are the most common frustrations or challenges your clients are coming to you to solve? How do you help them?

As our recent research showed, security is the chief concern and the biggest challenge to solve. As I already mentioned, data mining and analytics is also a struggle for many, as well as experience design and organizational inflexibility.

On a higher, more strategic level, though, many companies understand that they need to transform, but they lack clear vision into what areas they need to focus on, where to start, and how to move forward with their transformative initiatives in the fastest and most efficient way. That’s why we have SoftServe Labs, in fact, to help our clients with research and proof-of-concept before they make large investments.

I wouldn’t describe these as purely challenges, though, as these companies also stand to gain a lot. Digital asset management, Cloud computing, mobile technologies, and the Internet of Things (IoT), approached as part of digital transformation efforts, can bring a lot of benefits to consumer-facing operations, retail, the finance and banking sector, and many others.

What are some of the digital transformation challenges facing organizations today in harnessing their data?

Numerous security breaches and hacking attacks serve as proof that we haven’t yet solved the security challenges facing all businesses, small and large. Privacy is also a big concern, especially when it comes to access to personal data in healthcare, education, state and government organizations, etc.

Data security is one of the common digital transformation challenges facing businesses today.

Another aspect of it is legacy software that cannot handle the amounts of data that require daily processing, and it can’t all be replaced within a couple of days due to the financial and resource strains that would put upon organizations. Artificial Intelligence (AI), though hugely promising, is not yet at the stage where it can automate decision making for truly impactful processes beyond initial analysis. However, it can facilitate and speed them up considerably.

Related: The Future of Artificial Intelligence in Sales and Account Planning

It is also very important to remember that harnessing data is not an end in itself, but rather a means to help organizations achieve their business and strategic goals. And the consumer, a human being, is at the heart of all of it. So, no purely technical solution, no matter how powerful or innovative, will bring true value if it’s not applied correctly or as part of a well-thought-out, comprehensive strategy.

What organizations do you feel have been especially forward-thinking and/or innovative at leveraging their data? What can we learn from them to solve our own digital transformation challenges?

Well, when it comes to leveraging data and personalization, giants like Google and Netflix immediately come to mind. It’s interesting how, by thoroughly analyzing data and making the right predictions, Netflix managed to reduce the range of content available on their platform while improving customer satisfaction.

And look how Amazon is using data from different sensors and machine learning to disrupt the grocery business with their “Amazon Go” retail store.

When it comes to attracting new customers, which is also a challenge for traditional companies, I like the example of L’inizio Pizza Bar in New York. Their manager decided to attract Pokémon Go players to the place, and he spent just $10 to have Pokémon characters lured to his restaurant. The business went up by 75 percent. So, it’s never about technology or software only, it’s about innovative thinking and human ingenuity.

How can organizations manage their data assets more efficiently and effectively? What should their data management strategies include?

With the “Internet of Everything” and connected everything blurring the concepts of office and home devices as well as working hours and workplace, data assets need to be secure and protected and accessible from a variety of different devices, in different formats and easily searchable.

For some organizations – most likely in the government sector, finance, and insurance – it will require switching to an intranet to secure their assets from unauthorized access or potential loss of information. For others, where remote access from any place, at any time, is a higher priority, omni-channel support and compatibility will be the key focus. The challenges here include the already discussed legacy software and integration issues.

According to IDC research, by 2022 almost all data in the digital universe – 93 percent – will be unstructured. It will also most likely be content in different formats, including audio and video files, images, interactive content, etc. Not only will this require greater storage and processing capacity, it also means that this data will need to be easily searchable and user-friendly if we want it to be used rather than merely stored.

When it comes to customer-facing content, another requirement is consistency across various channels. On the whole, when it comes to data, the current leaders in asset management are platform providers. With these platforms, instead of building their own solutions from scratch, which is a costly and time-consuming approach, businesses can quickly customize and scale a ready-made solution, adding and discarding additional features depending on their current needs.

What are some of the most exciting Big Data trends or innovations you’re following right now? Why do they interest you?

SoftServe’s 2016 Big Data survey showed 62 percent of organizations expect to implement machine learning by 2018, so machine learning and Artificial Intelligence are clearly huge Big Data trends we’re following right now. Chatbots, as a customer-facing form of AI technology, have gained momentum and are quickly becoming an area of huge interest for all kinds of user support activities.

But from a high-level perspective it’s nothing new, really. Once again, it’s all focused around building a better, different experience for a consumer, so machine learning, AI and chatbots are in fact just new(ish), possibly more effective ways to achieve the same goal: leveraging data to improve customer experience and stay relevant in an increasingly competitive marketplace.

For more on challenges driving digital transformations, download the eBook “Hadoop Perspectives for 2017” which offers an in-depth look at the results of Syncsort’s annual Hadoop survey, including five trends to watch for in 2017.

Expert Interview Series: Robin Bloor Discusses the Future of Big Data and the Role of Hadoop, Spark, Kafka and Big Data Analytics

Robin Bloor believes that Hadoop and big data analytics are still relatively immature, but that active archive on Hadoop is a no-brainer. He believes the most exciting things will happen at the hardware level, fundamentally changing the way we develop software and conduct analytics.

What is Hadoop, really? What are its advantages and disadvantages? Where is the future of big data and data analytics headed? Robin Bloor, chief analyst for The Bloor Group, discusses these issues and more with Syncsort’s Paige Roberts. The Bloor Group is an independent research and news media analysis firm that specializes in assessing and analyzing enterprise-level software solutions, technologies, services, and markets. The firm is based in Austin, Texas.

What do you think Hadoop is for?

Robin Bloor: You could say Hadoop is an operating system but it’s not like any operating system that has existed before. And even if it was, it’s a long way short of being mature. Hadoop is an operating environment for data.

What about some particular uses of Hadoop?

Bloor: Well active archive is almost a no-brainer compared to previous choices, especially if you’ve got something expensive like a mainframe or Teradata or a big Oracle warehouse. Using Hadoop for active archive, it’s almost like, don’t even bother working out what the other options are. Just do it.

Don’t even think about it. It’s that obvious. That’s one of the reasons Syncsort came up with the mainframe distributable format. If you can archive without even changing your data, and still get to it, then like you said, it’s so far ahead of every other option out there, it’s a no-brainer.

Right. The other applications for Hadoop center around analytics. The analytics applications on Hadoop are interesting because you’ve got this mix now of near real-time and visual analytics. And visual analytics are all about knowledge discovery of one sort or another.

What do you think about the data lake/data hub concept?

This again is an immature thing but the concept of the data lake really is the idea of having a common point for the governance of all data – and everything that is involved in governance. So it’s data life cycle, it’s provenance and lineage, it’s data cleansing, metadata capture, it’s all of those things.

What are the advantages and disadvantages?

Well, if you look at what we used to have in the data warehouse world, you have the set of OLTP [transactional] applications that are recording the activities of the business, and you pipe that data into a data warehouse in order to get feedback on the activities of the business. With the data lake, it now applies to all data. It’s gone beyond the OLTP systems. It applies to data outside your organization as well as within the organization. So it’s much broader in terms of its area of action. And it now deals in an awful lot of data that wasn’t formally stored before. It’s called unstructured data, but it’s basically data in any form, rather than just well-structured database data.

What do you think is the most expensive IT or data management activity?

I actually think it’s going to vary from context to context, so there is going to be more than one answer to this question, but in the general sense, I think the governing of data to get it into a properly usable form is the greatest area of expense. A usable form would be something we could run a business application against so BI, analytics, whatever you are going to put on top of that. Getting it to that state from the point of ingest, whatever that was and wherever it came from.

What do you think of Spark?

I think it’s immature. I think because it has acquired lots of momentum, it may become a fulcrum of how streaming is done, but then again it may not. Its relevance is in two parts: One is to deal with streaming data and the other is to actually bring down latencies for things that need to be parallelized. The old architecture depended on stored data, on SSD or on spinning disks. It is superseded by a memory-based capability.

What streaming data processing technology looks the most exciting to you?

It’s nearly all immature. The products that existed to deal with streaming in terms of CEP [Complex Event Processing] technology are now, in one way or another, being evolved into the Hadoop environment. The most exciting stuff is actually the proprietary stuff. Things like Striim. Striim is one of the most exciting things because it’s actually delivering applications off the data stream. That’s more interesting than a platform that can deal with streaming, because it hasn’t been proven out by applications.

There is interesting development with Flink. There is interesting development with Apex as well, in that platform area. Then there are some interesting streaming products, and Striim is one of those, but there are others.

What do you think of Kafka?

I think it’s going to be the fulcrum of an awful lot of activity. There is a necessity to have a very distributable message management environment that replaces the old Enterprise Service Buses, and Kafka is it. And it’s probably not going to be dethroned.
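For readers who want to see what that message bus looks like in practice, here is a minimal sketch using the kafka-python client; the library choice, topic name and servers are illustrative assumptions:

```python
# A minimal producer/consumer pair using the kafka-python client
# (an assumed library choice; topic and servers are placeholders).
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("orders", b'{"order_id": 42, "amount": 99.5}')
producer.flush()

consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    group_id="billing",            # consumer groups spread the load
    auto_offset_reset="earliest",  # a new group reads from the start
)
for message in consumer:
    print(message.value)
```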

I agree with that. So, Spark is the darling of the big data world right now. What do you think the next big thing is going to be?

It depends on the area of application. One will be Kafka. It’s going to become more and more important, because it deals with the distribution problem. Therefore, the importance of Kafka will rise up above perhaps even the importance of Spark or whatever replaces Spark, if it gets replaced.

But then there’s the next generation stuff. The next generation stuff is the blockchain [database] and everything that goes with that. But that’s five years away from becoming really important. Most people don’t see the blockchain yet, but they will.

Tell me about the blockchain.

It’s the technology around bitcoin. The reason most people won’t look at it in the big data world right now is that it is actually transactional technology. The advantages of the blockchain that make it important are twofold: The first advantage is that its security is bulletproof. It’s unbreakable security. It’s a write-only system, which is actually the correct way to handle data. Data should never be updated. It should always be written. The other reason that the blockchain is interesting is because it distributes the responsibility for transactional integrity as wide as you want it, basically as wide as it could be. The blockchain goes everywhere.
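A toy append-only hash chain makes that write-only idea concrete. This is a conceptual sketch, not a real blockchain implementation:

```python
# A toy append-only hash chain illustrating the "write-only" property
# described above; a conceptual sketch, not a real blockchain.
import hashlib
import json

def make_block(data, prev_hash):
    block = {"data": data, "prev_hash": prev_hash}
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()
    ).hexdigest()
    return block

chain = [make_block("genesis", "0" * 64)]
chain.append(make_block("alice pays bob 10", chain[-1]["hash"]))
chain.append(make_block("bob pays carol 4", chain[-1]["hash"]))

# Editing any earlier block would change its hash and break every
# later link, which is why records are appended, never updated.
for prev, curr in zip(chain, chain[1:]):
    assert curr["prev_hash"] == prev["hash"]
```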

Hmm.

No one else will answer that question that way. I can guarantee it.

I am pretty sure you’re right. I’ll have to learn more about blockchain. So, one more question. What are you currently researching that’s interesting?

I think the most interesting thing that’s happening right now, because I think it will change a lot of things – it’s difficult to see what the changes will all be yet – is the changes in hardware that are happening at the moment. And there are a lot of them. This is not just one thing. It’s many things. They’re going to change some of the fundamental parameters of the way we do things.

You’ve got this 3D memory stuff that’s coming out of Intel. We’re looking at various technologies that are actively using FPGA [Field Programmable Gate Array chips]. The marriage of GPU and CPU is coming. There’s the battle – I’ve no idea how it resolves – between the ARM chip and the Intel chip. The ARM has been winning, and if it gets too powerful then Intel will be dethroned. You’ve got system-on-a-chip (SOC), which means that fundamentally you will be designing a different kind of hardware environment at the chip level. You take all of that together, and you realize that all those things are independent. You’re looking at possibly the emergence of new hardware platforms.

We have a hardware platform right now that is fundamentally CPU, memory, store. You could get hardware platforms that are possibly far more versatile than that. A node consisting of CPU, memory and storage is a particular kind of component in an architecture. If you actually have chains of system-on-a-chip, for instance, you have environments where the ability to distribute data and distribute processing is way different than it is right now. That will all have a radical impact.

And if parallelism is important then you really have to work out how you’re going to use CPU, which is fundamentally serial processing, GPU, which is fundamentally parallel processing, and FPGA which is fundamentally logic on the iron. All of those, how they balance, I have no idea. But those are the most radical things that are happening, because it will change everything above them.

Chips are the foundation of everything in data processing. If they change, everything changes. A few years back, I saw rumors that they were going to replace all the chips in data centers with ARM chips eventually. From a green point of view, that would be wonderful because of the lower energy requirements. But from a software point of view, that would be hugely disruptive. Every bit of software that has ever been written is going to be wrong, and have to be redone.

It might not be wrong actually. Linux runs on the ARM chips, and everything runs on Linux. So you could put it all on there.

But you have an actual different balance of things. An ARM chip has a different processing capability from an X86 chip. So, you’re going to have different arrangements. Yes, you can port the software, but the software behavior, the latencies and things like that, will have different patterns. It’s not going to be that easy.

And the more tightly coupled the software is to the chip behavior, the more it expects X86 style processing, the more it will be particularly off. Anything that uses vectorization to match chip cache for instance, is not going to work.

Yes, but the work that’s going on in the area of Software Defined Networks ultimately resolves the problem, when you get sophisticated enough, because it knows what resources are there and deploys on the resources according to what the workload is.

Sort of like a giant optimizer?

Well, that’s a very simple statement of something that’s not very simple to do. But as that gets more sophisticated, then the problem of “hard” and “soft” changes. “Where the rubber meets the road,” is how I think of it. Once you have that, you can have the hardware layer be as clever as you want, and have Software Defined Networking work out how to use it.

So that’s how I think it will happen.

We’ll see in a few years if you’re right.

There’s always the possibility that that’s completely wrong and something else will happen.

[laughter] That’s the trouble with looking into a technological crystal ball.

Editor’s Note: In addition to Robin’s excitement about the possibility of new hardware platforms, you can hear him talk about the cool new frameworks and data sources out there, and the promise they hold, including Spark and MapReduce, Kafka and NiFi, and how organizations can maximize the potential of their legacy systems as well as their data lake by fusing legacy and modern architectures, in The Bloor Group’s “Hot Technologies of 2016” webcast.

Expert Interview: Part 3 — Dr. Ellen Friedman Discusses Streaming Data Architectures and What to Look for in a Good Streaming Message Layer

With big data, not only can a shipping company keep track of the data for one ship, but the port authorities can collect and track metrics from all the ships in all the ports, all over the world.

In this Syncsort Expert Interview, Syncsort’s Paige Roberts speaks with scientist, writer, and author of numerous books on big data, Ellen Friedman. The two discuss how Hadoop fits in the industry, what other tools work well for big data streaming and batch processing, and about Friedman’s latest book.

Tell me about your book.

Well, I just did! [laughs] The content I was talking about [in Part 1 and Part 2 of this interview] is kind of the heart of the book. But, there is more. The book is called “Streaming Architecture: New Designs Using Apache Kafka and MapR Streams.” It’s a book that should work well both for people who are actually the ones technically building these systems, and for people who are not. That’s the approach we take with all six of our books.

High level for the business person, and then drill down into the code for the technical person?

Right. It helps the very technical implementer because it gives them a chance to think about the basics behind what they’re doing. They don’t always have the time to do that.

We talk about why people use streaming and give a number of use cases. We talk about the stream-based architecture that I just described to you and why the messaging system is very important and how it can be used differently.

The third chapter is all about microservices … what the concept of microservices is, why that’s useful, why organizations that move to that style have seen a lot of success with it. You don’t have to do streaming, obviously, to set up microservices, but a stream is a new way to start them, and I think sometimes people are surprised to realize how well streaming supports microservices. We explain how.

The fourth chapter is called Apache Kafka, and we explain the power of Kafka, how it works, templates, some sample programs … Chapter five turns around and does the same thing with MapR Streams. Then we have a couple of chapters that just take specific use cases. One is an anomaly detection case. The book shows how to build it using Stream system architecture, and why that could be an advantage to you.

The last use case … [laughing] I’m laughing because one of our figures has a little joke built into it, but it’s using the example of IoT (internet of things) data, looking at container shipping, just a mind-boggling scale of data to transport …

Ted Dunning used that as an example in his talk. (At Strata + Hadoop World 2016)

Well, I was at Strata Singapore in December. I was on the 22nd floor of some building meeting with a customer but I was distracted and looked out the window, and I could see the container port.  A huge percentage of the world’s container shipping goes through there. I’ve written about it before, but I’ve never been there before. Staring out the window there looking at the scale, the sheer amount of ships … it’s like your brain melts. It’s just stunning. When you think that all those containers can be just covered with sensors that are reporting back. There’s sensors on the ship. You can have an onboard cluster. You can stream data to that cluster. It can then stream data to a cluster at the port, which is maybe owned by the shipping company, so they’re tracking what’s happening with their stuff. They can send that data around the world.

Like to the port authority.

…who is interested not just in that one company at that one port. The port authority is interested in what’s happening in all the ports. That’s where the geo-distributed feature of MapR Streams comes in. Then the ship leaves, loads up its stuff, and chugs off to the next port. While it’s at sea, it’s collecting data about what’s happened on its on-board cluster. I’m not saying everyone’s doing this right now. I’m saying it’s the potential of what we see happening. Meanwhile, that shipping cluster the company has in Tokyo can be, with the MapR Stream replication, sending that data to Singapore before the ship ever gets there. So, now Singapore has an accurate record of what’s supposed to be coming in on the ship. The ship comes in and says, “This is what’s happened while we were at sea. Let me update you about what’s happened”. It’s this beautifully synchronized thing.

Pretty amazing. We live in interesting times.

I think we do. I just find that to be a mind-boggling example, even more so because … I could see the scale, see all those ships and all those containers. I just thought, “Oh, my God. What a huge job.” I tell people, “If you read the book, you have to look for that little Easter Egg of an example.”

At the end of the book, we talk about how, if you are interested in taking this style of approach with Apache Kafka and MapR Streams, you migrate your system. It gives some pointers for how to do it. MapR has the rights to the book for something like 30 days, so they are giving it away, and doing a book signing here as well. MapR has it available online for free download. I know there is a PDF, and I think they are also offering it as an e-book, which is a little easier reading. The other books published by O’Reilly are available as free PDFs at MapR.com, which includes the series called “Practical Machine Learning.” Two are set up as interactive e-books.

Expert Interview: Part 2 — Dr. Ellen Friedman Discusses Streaming Data Architectures and What to Look for in a Good Streaming Message Layer

Spark Streaming actually isn’t real-time. It is very, very close to real-time. However, it still won’t be replacing Hadoop.

In this Syncsort Expert Interview, Syncsort’s Paige Roberts speaks with scientist, writer, and author of numerous books on big data, Ellen Friedman. The two discuss how Hadoop fits in the industry, what other tools work well for big data streaming and batch processing, and Friedman’s latest book, “Streaming Architecture: New Designs Using Apache Kafka and MapR Streams”.

In Part 1, Ellen discussed what people are using Hadoop for, and where Spark and Drill fit in. In Part 2, she talks about streaming data: what she finds most exciting about technologies and strategies for streaming, including cool things happening in the streaming data processor space, streaming architecture, metadata management for streaming data, and streaming messaging systems.

Let’s talk about streaming data … What looks most exciting to you right now, as far as technologies and strategies?

I’m so excited about this topic, I really am.

People start off by saying, “I need real-time insight, I need an analytic tool, I need the right algorithm, and a processor that can handle streaming data.”

I think one of the tools that comes to mind first is Spark Streaming. It’s so popular, and it is a very good tool for working in memory. People say that they use it for real-time. It actually can’t do real-time… It approaches it by micro-batching, which is very clever. And for many people’s latency requirements, that is often adequate.

People were going in before and adding Apache Storm, the pioneer in real-time processing. They’ve done a beautiful job. People do say, though, Storm is a little bit hard, a little crankier to use, but it’s worth the effort.

Right now, though, I’m so excited about a new project, a top-level Apache project called Apache Flink. Just as Spark started out of Berkeley, Flink started out of several universities in Europe, in Berlin and in Stockholm. It began as a Ph.D. project called Stratosphere. Then, when it came into the Apache Foundation, it became Flink. Flink does the same sorts of things as Spark Streaming or Storm. The difference between Spark Streaming and Flink is that Flink is real-time. It isn’t approximately real-time, it is real-time.

They took Spark Streaming’s approach and turned it around. Instead of, “Go from batch and approximate real-time by micro-batching,” they say, “Let’s do it the other way around. Let’s do good, clean real-time streaming,” and you can have it go back toward batch by changing the time window. That’s a more efficient way to do it.
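A toy illustration of that inversion, in plain Python rather than Flink code, since the point is the concept, not the API:

```python
# A plain-Python toy (not Flink code) of the inversion described
# above: micro-batching groups events into fixed time windows, and
# batch processing is just the limiting case of one giant window.
import itertools

events = [(t, f"event-{t}") for t in range(10)]  # (timestamp, payload)

def micro_batches(stream, window):
    # Spark-Streaming-style: emit one batch per fixed time window.
    for _, batch in itertools.groupby(stream, key=lambda e: e[0] // window):
        yield [payload for _, payload in batch]

print(list(micro_batches(iter(events), window=3)))    # small batches
print(list(micro_batches(iter(events), window=1000))) # one "batch job"

def event_at_a_time(stream):
    # Flink-style: act on each event as it arrives, no batching delay.
    for timestamp, payload in stream:
        print("processed", payload, "at t =", timestamp)

event_at_a_time(iter(events))
```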

People say Flink is very developer friendly. It’s a tremendously growing community. It gives people another alternative, but one they can use for both batch and streaming processes. I think it’s great that people now have a selection of tools to look at.

For real-time insight with very low latency, those are great processing tools. There are others, Apache Apex, for instance. There’s a lot of tools. That’s not the only one, or two or three. Look at what fits your situation and your previous expertise the best.

There are some cool things happening in the streaming data processor space.

Definitely. But, let’s move upstream from that. If people are good and clever and safe, they realize that to deliver streaming data to a process like that, you don’t want to just throw it in there. You want some kind of a clue if there’s a rupture in the process. You don’t want to lose your data. So, you begin to look at the whole range of tools you can use to deliver data.

You can use Flume, or others, but the tool that we think is so powerful is Apache Kafka. It works differently than the others. And now, I’m additionally excited because MapR has developed a messaging system called MapR Streams that is integrated into the whole MapR platform. It uses the Apache Kafka 0.9 API. They work very similarly. There are a few things you can do with Streams that Kafka wouldn’t be able to do for you, plus the fact that it’s integrated into the whole platform. As I said, it simplifies things. But at the heart of it, they are approaching this the same way. I think MapR Streams and Apache Kafka are both excellent tools for this messaging.

But I want to talk to you about something more fundamental than the technologies, and that’s really what our book “Streaming Architectures” is about.

The architecture for streaming.

Exactly. Instead of just talking about tools, what I do in the book is to talk about what are the capabilities you want to look for in that messaging layer to support the kind of architecture that we think makes the best use of streaming data. Because right now, those two tools, Apache Kafka and MapR Streams, are absolutely the tools of choice. But people constantly develop new tools. So, it’s not about this tool or that tool. It’s about what does a tool need to do for you? Do you understand it’s capabilities and why they’re an advantage? If so, you’ll recognize other good new tools as they get developed.

So, what do you feel are the capabilities to look for in a good streaming messaging system?

I think the big idea is that it’s not just about using streaming data for a single workflow, a single data flow, toward that real-time insight. The value of the messaging layer technology and the value of that right architecture goes way beyond that. It’s much broader.

Kafka and MapR Streams are very scalable, very high throughput, without the performance being a tradeoff against latency. Usually, if you can do one, you can’t do the other. Well, Kafka and Streams do both very well. The level at which they perform is a little different, but they’re both off in a class almost by themselves. They also have to be reliable, obviously, but they’re both excellent at that.

Another feature to look for is that they need to be able to support multiple producers of data in that same stream or topic, and multiple consumers or consumer groups. They both can be partitioned, so that helps with load balancing and so forth.

The consumers subscribe to topics and so the data shows up and is available for immediate use, but it’s decoupled from the consumer. So these messaging systems provide the data for immediate use, yet the data is durable. It’s persistent. You don’t have to have the consumer running constantly. They don’t have to be coordinated. The consumer may be there and use the data immediately; the consumer may start up later; you may add a new consumer.

That decoupling is incredibly important.

It makes that message stream be re-playable. What that does for you is make that stream become a way to support micro services, which is hugely powerful. Both Kafka and MapR Streams have that feature.
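Here is a small sketch of that replay property using the kafka-python client (an assumed choice; MapR Streams exposes a Kafka-compatible API, so the same pattern applies). The topic and servers are placeholders:

```python
# A sketch of the replay property using the kafka-python client (an
# assumption; MapR Streams exposes a Kafka-compatible API, so the
# same pattern applies). Topic and servers are placeholders.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    group_id="late-arriving-analytics",
    enable_auto_commit=False,
)
# A consumer added long after the data was produced can still rewind
# and replay the entire retained stream from the first offset.
consumer.poll(timeout_ms=1000)  # join the group, get partitions
consumer.seek_to_beginning()
for message in consumer:
    print(message.offset, message.value)
```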

Back to the idea of flexibility that we discussed earlier. These message systems work for batch processes as well as streaming processes. It’s no longer just a queue that’s upstream from a streaming application, it becomes the heart of a whole architecture where you put your organization together.

You can have multiple consumers coming in and saying, “Oh, you were streaming that data toward the streaming application because you needed to make a real-time dashboard, blah blah blah. But, look. The data in that event stream, is something I want to analyze differently for this database or for this search document.” You just tap into that data stream and use the data, and it doesn’t interfere with what’s going on over there. It’s absolutely a different way of doing things. It simplifies life. Both Kafka and MapR Streams support all of those features and architecture. And I think this is a shift that people are just beginning to relate to.

Shifting to a new way of thinking and building can be difficult.

One of the nice things about this decoupling and the flexibility of using a good messaging system is that it makes the transition easier, as well. You can start it in parallel and then take the original offline. A transition is never easy to do, but it makes it much less painful than it could be.

MapR has one aspect that’s different. It’s a very new feature. It goes one step further, and actually collects together a lot of topics (thousands, hundreds of thousands, even millions of topics) into a structure that MapR calls the Stream. There isn’t an equivalent in Kafka. The Stream is a beautiful thing. It’s a management-level piece of technology. You don’t need it for everything. But, if you have a lot of topics, this is a really great thing.

Kind of a metadata management for streaming data?

Well … for the topics that you want to manage in a similar way, you can set up multiple streams. There may be one topic in a stream, there may be 100,000 topics collected into that stream. But for all the ones that you want to manage together, you can set various policies at the Stream level. That makes it really convenient. You can set policies on access, for instance at the Stream level. You can set time-to-live.

People should get it out of their head that if you’re streaming data it’s going to be a “use it or lose it” kind of situation, that the data is gone because you don’t have room to store it. It doesn’t mean you have to store all your streaming data, it just means that if you want to, you have the option. You have a configurable time-to-live. The time-to-live can be…

Seven seconds or seven days …

Or, if you want to just keep it, you basically set that date to infinity and you’ve persisted everything. You have the option to do what you want, and you can go back and change it, too. You can set those policies. With MapR Streams, you don’t have to go in and set it …

To 100,000 different topics.

Right. You can do it collectively. Say, that whole project, we want to keep for one year, and then we’re done. Or, that whole project, we want to keep it for one day, and we’re done. You can set access rights as well at the project level. You can set who in your organization has access to what data.

This group can access this project, this group can access that project, but they can’t access this other data.

That’s right. MapR Streams gives you that additional advantage.
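For comparison, stock Kafka sets retention per topic rather than once at the Stream level. A hedged sketch with the kafka-python admin client; the topic name and values are illustrative assumptions:

```python
# Stock Kafka sets retention per topic rather than per stream. A
# sketch with the kafka-python admin client; the topic name and
# values are illustrative assumptions.
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
admin.create_topics([
    NewTopic(
        name="project-telemetry",
        num_partitions=3,
        replication_factor=1,
        # Keep messages for one year. Setting retention.ms to -1 keeps
        # them forever, the "set that date to infinity" option above.
        topic_configs={"retention.ms": str(365 * 24 * 3600 * 1000)},
    )
])
```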

And here’s another, different way of thinking about streaming architecture. MapR Streams has really efficient geo-distributed replication. Say you’re trying to push streaming data out to data centers in multiple places around the world, you want that to happen right away, and you want to do it efficiently. You just replicate the stream to the other data center. It’s a very powerful capability.

That is organized at the stream level, as well, so again, you might say, “These topics, I want the same time-to-live or I want the same access, but these I want to replicate to five data centers, and these ones I don’t. So, I’ll make two different streams.”

It’s a good management system. These are elegant additional features, but I think at the heart of it, even if you don’t have that capability, you still have the capability to bring a stream-first architecture to most systems. Then, streaming isn’t the specialized thing, it becomes the norm.

You pull data away from that into various kinds of streams, and decide, “I’m going to do a batch process here, a virtualized database there, and I’m going to do this thing in real time.”

Right now, Kafka and MapR Streams are the two messaging technologies that we like, but it doesn’t mean they will be the only ones in the field. That’s why I think it’s important for people to look at what the capability is, rather than just looking at the tools. A tool may be the popular tool now, but there may be even better ones later.

Is there anything else people should keep in mind relating to streaming architectures?

In this whole industry, looking at how people use anything related to the whole Hadoop ecosystem, I think future-proofing is something you need to keep in mind. People are very new to this, in some cases, and one thing they can do by jumping in and using any of these technologies is build up expertise. They’re not even sure, in some cases, exactly how they want to use it. The sooner they begin to build expertise internally, the better position they’ll be in by the time they want to put something into production.

On Friday, in Part 3, Ellen will talk about her book, “Streaming Architecture: New Designs Using Apache Kafka and MapR Streams”.

Expert Interview Series: Part 1 — Dr. Ellen Friedman Discusses Increased Flexibility in Big Data Tools and Changing Business Cultures

In this Syncsort Expert Interview, Syncsort’s Paige Roberts speaks with scientist, writer, and author of numerous books on big data, Dr. Ellen Friedman. The two discuss how Hadoop fits in the industry, what other tools work well for big data streaming and batch processing, and Friedman’s latest book, “Streaming Architecture: New Designs Using Apache Kafka and MapR Streams”.

Ellen Friedman is a consultant and commentator, currently writing mainly about big data topics. She is a committer for the Apache Mahout project and a contributor to the Apache Drill project. With a PhD in Biochemistry, she has years of experience as a research scientist and has written about a variety of technical topics including molecular biology, nontraditional inheritance, and oceanography. Ellen is on Twitter at @Ellen_Friedman.

Hadoop isn’t just one thing. There is Hadoop itself, with all its multiple parts, plus the whole ecosystem of Hadoop infrastructure and architecture.

In your experience, what are people using Hadoop for?

I wrote a book a little over a year ago with Ted Dunning called “Real World Hadoop”. For that book, I looked at how MapR customers are using Hadoop. We tried to talk about Hadoop so that it wasn’t MapR specific. MapR is a different kind of platform, so some of the workflows we showed were a little simpler than in standard Hadoop. But you could do the same work with other systems, you would just have to have more pieces. The goal was to understand why people are doing this, and not just what drives them to it, like marketing. What are their needs? In some cases, companies are just now starting to recognize their own needs and how Hadoop fits.

Why is that, do you think?

Hadoop sounds like a specialized thing, but Hadoop isn’t just one thing. There’s Hadoop itself, which has multiple parts. A larger issue is the whole Hadoop ecosystem, which includes the whole collection of tools people use.

In general terms, Hadoop allows companies to scale up to levels of data they might not have even considered using before. And, by allow, I mean that it makes it feasible; it’s practical, and it’s affordable. So, Hadoop is opening doors, not just to do the same things people have been doing, but now at lower cost. People are beginning to ask questions that have never been asked, and take on applications they could never have approached before, because you actually do need a threshold of data to be able to do that.

The other purpose, that I think is less obvious to people until they really start working with the system, is that the Hadoop style approach, including the MapR platform, opens the door to a new kind of flexibility. Now with the new ability to handle streaming data, that agility changes the way people can analyze and process data.

Part of what I try to do is to help people recognize that they have a different option in terms of flexibility. That means they can begin on the human culture side of their organization to rethink how they approach their problems. Because otherwise, they aren’t using the tool to its full advantage.

Can you give some examples of the kind of flexibility you mean?

One example is what happens as people begin to use a NoSQL database, like the MapR Database … it uses the Apache HBase API, so it’s like a columnar data store. It actually has a second version, which uses a JSON document-style interface that works sort of like MongoDB, so it really goes in two directions. But I think the basic principle is the same. … With those NoSQL databases (Apache HBase, MongoDB, and the others), you can put in more data, you can put in raw data, and you can do these early steps of the ETL, even if you’re going to use a traditional relational database later. You’re doing that early-stage processing, starting with huge data and going down to smaller data…

Aggregating and filtering.

Yes, and you can do it cheaper. It’s better to hand a relational database the good stuff, right? So you do your early processing on your Hadoop platform, in HBase or whatever. It gives you a different kind of flexibility.
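
To make that concrete, here is a minimal sketch, not from the interview, of the early aggregate-and-filter pass Friedman describes, written against HBase with the community happybase Python client. It assumes an HBase Thrift gateway running on localhost; the “web_events” table and its “d:status” and “d:bytes” columns are hypothetical.

```python
# Early-stage ETL sketch: scan raw events in HBase, filter and
# aggregate, and keep only the small result for the relational store.
# Assumes a local HBase Thrift gateway and a hypothetical
# "web_events" table with "d:status" and "d:bytes" columns.
from collections import defaultdict

import happybase

connection = happybase.Connection('localhost')
table = connection.table('web_events')

bytes_by_status = defaultdict(int)

# Huge data in: scan every raw row, reading only the two columns we need.
for row_key, data in table.scan(columns=[b'd:status', b'd:bytes']):
    status = data[b'd:status'].decode()
    if status.startswith('5'):  # filter: keep server errors only
        bytes_by_status[status] += int(data[b'd:bytes'])

# Small data out: only this aggregate goes on to the BI database.
print(dict(bytes_by_status))
```

Only the final aggregate crosses into the relational system; the raw rows stay cheap to keep in HBase, which is exactly the “hand the relational database the good stuff” pattern.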

Another flexibility example is that, unlike with a traditional relational database, you don’t have to pre-decide every question you want to ask. You don’t have to know exactly how you’re going to use the data. That’s not to say you start with no clue; I don’t mean that at all. But let’s say I know I want to use it for one particular purpose; I’m heading toward a relational BI database, for example.

But you’re still storing that raw data in HBase or wherever. You didn’t have to use it and throw it away because you couldn’t afford to store it. Now you can ask, “What other questions, what other exploration do I want to do in that data?”

You don’t have to have a big committee to decide it. Because the expense of going back and asking a different question in Hadoop is different than saying I want to do an entire new line of analytics through a relational BI database. That’s a big commitment.

That type of flexibility means people can defer decisions. They can save the data and then figure out what they want to do with it later. On the human side, they need to transform their thinking in order to take real advantage of that.

How do you feel about Spark?

As new tools come along, like Apache Spark, people ask, “Well, hasn’t Spark replaced Hadoop?” I say, “Spark certainly is replacing many of the situations where people were using Hadoop’s computational framework, MapReduce, but that doesn’t mean it’s replacing all of Hadoop. It’s replacing just the piece that now runs in Spark.”
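
As an illustration, here is a minimal PySpark sketch of the kind of filter-and-aggregate job that would once have been hand-written as MapReduce. The input path and field names are hypothetical, and a running Spark installation is assumed.

```python
# A filter-and-aggregate job of the sort once written as MapReduce,
# expressed in a few lines of PySpark. Path and fields are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("mapreduce-replacement").getOrCreate()

# Read raw JSON events; Spark infers the schema from the files.
events = spark.read.json("hdfs:///raw/events/")

# The old map step becomes a filter; the reduce step, a groupBy/count.
daily_errors = (
    events
    .filter(F.col("status") >= 500)
    .groupBy("date")
    .count()
)

daily_errors.show()
```

The rest of the Hadoop stack (the distributed file system, the NoSQL store, the cluster resources) is untouched; only the computation layer changes, which is Friedman’s point.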

Tell me about Drill. MapR has put a lot of emphasis on Drill, I’ve noticed.

Apache Drill is a very fast, very efficient SQL query engine that works on Hadoop, or without Hadoop.

What makes Drill different?

Drill works on these new styles of data, like Parquet and JSON, and it even works on nested data. And you don’t have to do a lot of pre-defining of schema; it discovers schema on the fly. The other thing that’s unusual about it, as opposed to SQL-like tools such as Hive or Impala, is that it’s standard ANSI SQL. So it connects the traditional people and their traditional BI tools directly into this Hadoop-based, big data world in a much more seamless way.
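
As an illustration of that schema-on-the-fly idea, here is a minimal sketch that fires a standard SQL query at raw JSON files through Apache Drill’s REST endpoint, with no table definition or load step first. The file path and field names are hypothetical; a Drillbit is assumed to be running locally on its default HTTP port, 8047.

```python
# Query raw JSON files with ANSI SQL via Drill's REST API; no schema
# was declared anywhere. Path and field names are hypothetical.
import requests

query = """
    SELECT t.user_id, COUNT(*) AS events
    FROM dfs.`/data/clickstream/*.json` t
    GROUP BY t.user_id
    ORDER BY events DESC
    LIMIT 10
"""

resp = requests.post(
    "http://localhost:8047/query.json",
    json={"queryType": "SQL", "query": query},
)
resp.raise_for_status()

for row in resp.json()["rows"]:
    print(row)
```

Getting from raw files to a first answer in minutes rather than weeks is the “time to the starting line” advantage Friedman goes on to describe.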

But one of the great things about Drill, back to this theme of flexibility, is this: people ask, “How fast do its queries run? How does it compare directly to one choice or another?” Well, it’s very fast. It’s not always the fastest; it depends on the situation. But really, to understand whether it’s a good tool, the question is not just “How fast did the race run once the starting gun went off and you ran that query?” That’s important. But how long did it take you to get to the starting line? Was that three weeks of preparing your data to run the query? With Drill, those three weeks may become an hour or two.

So, if I got to the race in two hours and ran it in 20 seconds, versus three weeks to get there and 15 seconds to run it: which was faster?

Not only do you get to the end of that race faster, but now you realize you can take insight from that first query and turn around in the moment and say, “Oh, now I see. I also have another question.” You loop back and you ask that second question.

So when people say, “How fast is it?” I say, “One question you should ask is, ‘How long does it take me to form the second query?’” Because you can begin to build insight on insight at the rate people sit down and talk, like a conversation.

You look at a result, have a cup of coffee, chat with a colleague, go back the same day and take it to the next step. That’s just not possible in other systems, where it takes hours and hours, or days, or weeks to prepare the next query.

When it takes that long, your thinking is different. The expense of doing the next set of questions is different. So you say, “Do I really want to ask that question? Is it worth it?”

As opposed to following that train of thought and going, “Well, what about this? What if we do that?”

Exactly. So one thing that I try to help people see is that with Apache Drill, it’s not just that initial convenience, which is huge. You also want to structure your teams differently to take full advantage of it: how they do their work, how they think about things, how they move the work forward, what their goals are.

So, to go back to the beginning, the purpose of Hadoop is to give you access to new and unstructured data, and to give you access to the traditional data you’ve been using before, but at larger scale and much lower expense. And, on the other side, it lets you start thinking in new ways: save data, go back to it later.

You don’t always know what is going to be the important question about that data. You see that in research all the time. When people do a study, they save what they think is important at the time. You go back and say, “If only we had asked people in the case study this question.” And you can’t help that.

I think Hadoop lets people start to use data in a much more fundamental and exciting way. But it’s not Hadoop versus traditional ways … it’s how you connect those together that optimizes the system.

Tomorrow, in Part 2, Ellen talks about streaming data: what she finds most exciting about streaming technologies and strategies, including cool things happening in the streaming data processor space, streaming architecture, metadata management for streaming data, and streaming messaging systems.



New Citrix CEO discusses business-IT balancing act


There’s no question Citrix lost its way in recent years, and the path back home has involved some unpleasantries.

Those included shedding technologies that don’t match Citrix’s core purpose, employee layoffs and replacing its longtime — and beloved — CEO, Mark Templeton.

The monthslong CEO replacement mission ended in January, with the hiring of Kirill Tatarinov, a former engineer and longtime Microsoft Business Solutions executive who led the Dynamics CRM and ERP team from 2007 to 2015. It’s no shock that the new Citrix CEO comes from partner Microsoft, but Tatarinov’s appointment came as a surprise to some industry insiders who didn’t recognize his name.

Tatarinov paid SearchContentManagement a visit to help us learn more about him and his mission, which is to drive home Citrix’s technology message about its place in the mobile workspace and business apps space, while balancing the needs of longtime users and downstream market changes — all under the watchful eye of vocal investors, who need Citrix to improve profitability.

Here’s an excerpt from our interview with the new Citrix CEO, which can also be heard in full in the podcast below:

Let’s start with you telling us how you came to be the new Citrix CEO.

Kirill Tatarinov: When I look at my career, and I’m celebrating 30 years in the industry, it’s split into two significant focus areas. I spent the formative years of my career focused on infrastructure technologies, which is close to what Citrix is doing. I am an engineer by training [and] the first product I wrote was a log analyzer for [IBM] System/360 … Then, I worked on multiple technologies and products for managing and delivering infrastructure solutions in the service space, both pre-Microsoft and at Microsoft.

In 2007, I made a switch toward ‘biz apps’; I was intrigued by the space. I always viewed [business] apps as something IT people and those who build IT systems needed to understand much more in order to see the whole business context. That was my second career at Microsoft, where I took over Dynamics ERP and CRM. It gave me a tremendous opportunity to understand the business side of the enterprise, to really think more about how business works holistically, and to understand what IT needs to do to enable business to move faster, enable transformation and say ‘yes’ more often to more things.

That foundation, from the infrastructure years and the last eight years on the biz apps side, gives me a unique perspective … that I now have an opportunity to apply, to essentially bring it back to IT people, make IT people heroes again [and] enable them to do amazing things for the business.

What are some of the things you’d like to bring to Citrix in terms of business applications, which the company doesn’t offer now?

Tatarinov: Enabling our customers to be more connected to the business, to serve the needs of the end users, of the organization and the business processes. That’s really transformational …

We need to continue to push the envelope, to educate the IT constituency to be much more connected — to be business-savvy and business-oriented …

I’ve found the most innovative companies had CIOs who were deeply rooted in the business and could articulate the needs of the business. … Here at Citrix, we can empower IT people to be closely connected to the business, and, therefore, do more things for the business.

Being able to communicate IT needs to business people is critical for businesses and also for you, within Citrix. There’s pressure from Citrix shareholders to deliver quick returns on tech investments, whereas users want something different; they want something innovative and useful for their everyday needs. How will you balance what the shareholders need versus what your end users want?

Tatarinov: It is always a balance. There are always the needs of multiple constituencies, and as a leader, you need to be thoughtful of the needs of your people, your customers, your investors, shareholders, partners and the ecosystem. It’s an everyday balance I need to strike: consistently thinking about how that model moves, and how to address and delight every single one of those audiences.

I imagine when Citrix users see that you have an engineering background — you are one of them, at heart — they hope you’ll see things from their perspective, whereas someone who is a pure business person would care more about watching the numbers.

Tatarinov: Possibly — [but] I love the numbers.

Come on, Kirill. Make [your customers] a promise.


Tatarinov: I’m proud to say during my Microsoft years, I was trained by the best. It is a balance. When people ask me, ‘What is your unique strength?’ — it is this balance. I am not just a technical guy; I am not just a business guy — though I love both. It’s the balance of the two that I bring to the table.

When I spoke with Mark Templeton last year, he was very excited about the possibilities of Octoblu with IoT [Internet of Things]. It allows you to do things like mix GoToMeeting with an iBeacon in a conference room, so when people arrive, the conference starts automatically. The success of that vision requires integration partners, customers and service providers to actually use Octoblu, along with tools like iBeacon. Is that happening now?

Tatarinov: IoT enables an amazing number of things … I haven’t looked at that scenario [yet]. [But IoT] is absolutely a mega trend we are studying and participating in across the board. Our NetScaler technology will play a critical role in enabling a broad range of sensors and devices to connect into the corporate network in a secure fashion.

The Citrix GoTo product line is being spun off into a separate company. What’s the status of that, and how will the spin-off help users who rely on the GoTo products?

Tatarinov: We are on track … we plan to spin it off into a separate company by the end of 2016. Both Citrix and the new company will benefit from focus.

The world is changing into an interconnected universe, where what in the past could only happen inside conglomerates can now happen through independent companies working together. What we see happening at Citrix is part of that phenomenon. It will be easier for people to understand what Citrix stands for and what the company we [spin off] stands for… It will also be easier for independent business leaders to run those respective businesses to greater success.

What does Citrix stand for?

Tatarinov: Citrix stands for secure delivery of apps and data to people anywhere around the world, and enabling any device on any network.

Lastly, what has surprised you the most about being the CEO of a big technology company?

Tatarinov: There’s been tremendous learning over the last 60 days. And one of the things that really surprised me, impressed me and that I absolutely love is how much the industry — our customers — love Citrix. I have not seen that anywhere else I’ve worked … It’s a tribute to the past leadership — the culture that was established that enabled that level of trust and support … our customers, our partners and even industry analysts want us to succeed. That makes me wake up energized every morning to go to work to fulfill our promise to everyone we work with and work for.




Salesforce discusses impact of Wave, Lightning on Sales Cloud

SAN FRANCISCO — Cloud CRM platform vendor Salesforce has long touted its products as fully customizable and deeply integrated with other back-office applications. At Dreamforce 2015, the company’s annual user conference, Sara Varni, Salesforce’s senior vice president of the Marketing and Sales Clouds, outlined a strategy of providing users with a central platform to manage all of their key business units. “Customers need to see how putting CRM at the center affects sales, but also how it affects the rest of the organization,” Varni said.

Varni discussed Salesforce Lightning Experience, which aims to give sales reps the data they need faster and to componentize the app-creation process, enabling sales teams to customize the elements of apps the way they see fit. She also talked about Salesforce Wave, the native analytics capability of the company’s Salesforce1 platform, and how it can potentially give sales managers insight into the sales pipeline and assist in team performance management. “It’s about giving [sales managers] insights and not having other departments be bottlenecks,” she said.

For more, check out the video above; and for more on Dreamforce 2015, check out SearchCRM.



