Tag Archives: Expert

Expert Interview (Part 2): Cloudera’s Mike Olson on the Gartner Hype Cycle

At the Cloudera Sessions event in Munich, Germany, Paige Roberts of Syncsort sat down with Mike Olson, Chief Strategy Officer of Cloudera. In the first part of the interview, Mike Olson went into what’s new at Cloudera, how machine learning is evolving, and the adoption of the Cloud in organizations.

In this part he talks about Gartner’s latest hype cycle, and where he sees things going.

Paige:   Did you see Gartner’s latest hype cycle? They say that Hadoop will be obsolete before it reaches the plateau of productivity.

Mike:    Yes, and I’ll say that Gartner’s conclusions on Big Data just don’t match ours. We’ve got lots of serious customers doing really mission critical production workloads on our platform. I’m not sure who they’re talking to that’s leading to these conclusions. I will say that if you view the Big Data landscape as really just Hadoop, there’s all kinds of reasons to be skeptical, right?

Especially if you just look at it as MapReduce and HDFS.

That’s right, and it’s perfectly fair to say those are awful alternatives to traditional relational databases. In fact, it’s legit to say there’s going to be a place for Oracle, SAP HANA, Teradata, Microsoft SQL Server, and DB2 Parallel Edition for the long term. Scale-out platforms are never going to be good at online transaction processing.

Distributed transactions have been hard forever and nothing about Hadoop makes it easier, but using tools like Impala to do high-performance analytic queries gives companies an alternative for certain parts of their traditional relational workloads on the scale-out platform. We’re not just bullish, we’ve been quite successful in delivering those capabilities to the enterprise.

The Gartner hype cycle, if you look at the terminology, has the peak of inflated expectations, then the trough of disillusionment, and then the plateau of productivity. Maybe Gartner’s current downbeat outlook is because right now we’re in the trough, and it’s the plateau of productivity, broadly across the industry, that we have to get to. We’ve said publicly that we’ve got more than a thousand customers, more than 600 in the Global 8,000, running this platform in production for a bunch of very demanding workloads.

I have to wonder if they’re looking at the Hadoop of 10 years ago, as opposed to now. It used to be you had just MapReduce and HDFS which was really limited, but now it’s 25 different projects including Spark, and all these other capabilities, and that’s a completely different kind of distribution.

Frankly I think that if you look at Hadoop as just Hadoop, then there’s a bunch of stuff it doesn’t do. But the ecosystem has evolved way beyond that.

Yeah, it’s growing all the time. Actually, I do a “What is Hadoop” presentation, and it starts with a slide that gives the basics of Hadoop 1.x: “Here’s this cool thing, and let me explain it to you.” Then it shows the ecosystem progressing, slide after slide: it grew, and it grew, and it grew and grew some more.

HDFS and MapReduce are always going to be part of our platform. They’re really important. But you can now spin up a cluster on Microsoft Azure using the ADLS object store with Spark running on top of that, and there’s no HDFS or MapReduce anywhere near that thing.

It is no longer the be-all and end-all of Hadoop.

It’s a much more expansive and capable ecosystem than it used to be.

Tune in for the final installment of this interview, where Mike Olson shares his view on women in tech and explains the difference between Cloudera Altus and Director.

Make sure to check out our latest eBook, 6 Key Questions About Getting More from Your Mainframe with Big Data Technologies, and learn what you need to ask yourself to get around the challenges, and reap the promised benefits of your next Big Data project.

Syncsort + Trillium Software Blog

Expert Interview (Part 1): Mike Olson on Cloudera, Machine Learning, and the Adoption of the Cloud

At the Cloudera Sessions event in Munich, Germany, Paige Roberts of Syncsort sat down with Mike Olson, Chief Strategy Officer of Cloudera. In this first of a three-part interview, Mike Olson goes into what’s new at Cloudera, how machine learning is evolving, and the adoption of the Cloud in organizations.

Paige:   So, first off, go ahead and introduce yourself.

Mike:    I’m Mike Olson from Cloudera. I’m a co-founder and am chief strategy officer of the company and I’m excited to speak with you, Paige.

I’m excited to speak with you too. So, what is going on at Cloudera right now that’s really exciting?

Well, I spent some time in the sessions here in Munich today talking about what we’re seeing in the adoption of machine learning and some of these advanced analytic techniques. That’s really exciting, like the use cases that are getting built, using these new analytic techniques…it’s pretty awesome. I mean in healthcare, diagnosing disease better than ever before, delivering better treatments. Even intervening in real time when patients need special care and we can detect that because they’re Internet connected, they’re wearing connected devices. So, lots of cool use cases.

That is pretty cool.

That’s driven by some investments that we’ve made over the last couple of years. So, we bought a company in San Francisco called Sense.IO. That technology and that team basically turned into the Cloudera Data Science Workbench. I was really excited by that.

I just saw a presentation earlier by somebody who said they were using it.

We think it makes developing those apps that much easier. About a month ago we concluded our acquisition of a really interesting Brooklyn-based machine learning research firm, Fast Forward Labs.

Hilary Mason! Yeah.

She’s been awesome for a long time, and now she’s running the research function at Cloudera, continuing to track the sort of cutting edge, what’s going to happen in ML (machine learning) and AI (artificial intelligence), applied to real enterprise workloads. So, we know more, we’ve got a much better informed opinion about that stuff now. I’m really excited about what that means for us.

That’s cool! So, is there anything out on the landscape that you see coming that’s got you worried, or got you excited, or got you wondering?

If I were to highlight just a couple of things, I wouldn’t say worried, but respectfully attentive. Large enterprises were definitely afraid of the Cloud before.

Yes, they were.

And now they’re clearly beginning to embrace the cloud. They’re trying to decide how to integrate their business practices into these new security regimes that the Cloud provides, and I absolutely believe the Cloud is at least as secure as your own data center, but you need to be sure that you’re using it properly, right?

Yeah.

But the shift from a traditional on-premises IT mindset to a cloudy one is confusing and disruptive to a lot of large enterprises, and we’re spending time with our clients, helping them think about …

Getting over the hump.  

Yeah, that’s right. And I don’t know that I would say I’m worried about it. I think it’s a big opportunity. People can do stuff in the Cloud because it’s easy to spin up a bunch of infrastructure, do some work, and then spin it back down. You could never do that on-premises, right?

No, you can’t. You can’t commit to having a thousand-node cluster that you only need for two days. [Laughs]

No, that’s right. Who’s going to call their hardware vendor and say, I need three racks for a week, right? But you can do that on Amazon, on Azure, or on Google. So, helping them over that stumbling block is taking some time from us.

Yeah, I can understand that.

I told you all these reasons that I’m really excited about machine learning. If I were to highlight a modest concern I’ve got, it’s that ML is pretty hype-y, and maybe we’re contributing to that a little bit. We’re very bullish about it. I will say, we’ve got hundreds of customers actually doing it in production. This is real stuff. But you hear these terms like artificial intelligence and cognitive computing… and honestly, what we’re doing is training models on large amounts of historical data to recognize anomalous behavior in new data. It’s way more pragmatic and practical than words like cognitive computing make it sound. So, I worry that we’ll, as an industry, overpromise and then disappoint. These computers aren’t thinking, right?
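As an aside, the pragmatic version Olson describes, a model trained on historical data that flags anomalous behavior in new data, can be sketched with nothing fancier than a statistical baseline. The function names and 3-sigma threshold below are illustrative, not Cloudera’s:

```python
import statistics

def fit_baseline(history):
    """Learn a simple baseline (mean, stdev) from historical values."""
    return statistics.mean(history), statistics.stdev(history)

def is_anomalous(value, baseline, threshold=3.0):
    """Flag a new observation whose z-score exceeds the threshold."""
    mean, stdev = baseline
    return abs(value - mean) / stdev > threshold

# Hypothetical historical sensor readings from a connected device.
history = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.1, 9.7]
baseline = fit_baseline(history)
print(is_anomalous(10.2, baseline))  # an ordinary reading
print(is_anomalous(25.0, baseline))  # far outside the historical range
```

Real production systems use far richer models, but the shape is the same: fit on history, score the new data.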

Yeah, people are thinking SkyNet, and they’re getting Siri or Alexa. [Chuckle]

Exactly. By the way, Siri and Alexa are totally awesome at what they do. But if you really break it down, it’s speaker-independent voice recognition plus some good integrated search technology. That really isn’t The Matrix.

[Laughs] No. We’re a little ways from bots taking over the world.

Indeed, indeed.

Make sure to check out part 2 of this interview, where Mike Olson goes into the Gartner hype cycle and what’s on the horizon.

Also, learn about 5 key Big Data trends in the coming year by checking out our report, 2018 Big Data Trends: Liberate, Integrate & Trust.


Expert Interview (Part 4): Katharine Jarmul on Anonymization and Introducing Randomness to Test Data Sets

At the recent Cloudera Sessions event in Munich, Germany, Paige Roberts, Syncsort’s Big Data Product Marketing Manager, had a chat with Katharine Jarmul, founder of KJamistan data science consultancy, and author of Data Wrangling with Python from O’Reilly. She had just given an excellent presentation on the implications of GDPR for the European data science community. For this final installment, we’ll discuss some of the work Katharine is doing in the areas of anonymization so that data can be repurposed without violating privacy, and creating artificial data sets that have the kind of random noise that makes real data sets so problematic.

In the first part of the interview, we talked about the importance of being able to explain your machine learning models – not just to comply with regulations like GDPR, but also to make the models more useful.

In part 2, Katharine Jarmul went beyond the basic requirements of GDPR again, to discuss some of the important ethical drivers behind studying the data fed to machine learning models. Biased data sets can make a huge impact in a world increasingly driven by machine learning.

In part 3, we talked about being a woman in a highly technical field, the challenges of creating an inclusive company culture, and how bias doesn’t only exist in machine learning data sets.

Roberts: We’ve hit on several subjects here. What else are you working on?

Katharine Jarmul: I’ve been doing a lot more research on things like fuzzing data, test data, and how that relates to anonymization. I’ll be doing a series on that, but there are also some other cool libraries and things I can point to about that. As data scientists, we spend so much time cleaning our data, but how do we mess up our data? Not only to test our own workflow, and determine if it’s working properly, but also to perhaps do things like release it to third parties, or to other people, and have it be anonymized.
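A minimal sketch of what “messing up” a record for pipeline testing might look like. The function name, rates, and noise types are invented for illustration, not taken from Jarmul’s work:

```python
import random

def fuzz_record(record, null_rate=0.2, seed=None):
    """Return a copy of the record with some fields randomly degraded,
    mimicking the missing values and stray formatting found in real data."""
    rng = random.Random(seed)
    fuzzed = {}
    for key, value in record.items():
        roll = rng.random()
        if roll < null_rate:
            fuzzed[key] = None                  # simulate a missing field
        elif roll < null_rate * 2 and isinstance(value, str):
            fuzzed[key] = "  " + value.upper()  # simulate formatting noise
        else:
            fuzzed[key] = value
    return fuzzed

clean = {"name": "Ada Lovelace", "city": "London", "age": 36}
print(fuzz_record(clean, seed=7))
```

Running the cleaned pipeline against fuzzed copies of its own output is one cheap way to check whether the workflow actually survives contact with messy inputs.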

Yeah, there’s the example of the Netflix Prize, where the guy de-anonymized that data. And Netflix was like, “Oh, oops.”

Oops [laughing].

That was supposed to be anonymous data. We thought it was anonymous data.

Yeah, I’m also on a big kick to find out how we can create synthetic data that really looks like our data that has…

You can test with it.

Exactly.

I worked doing healthcare data integration for a long time. We were doing EDI to COBOL which is a big jump in translation. All the pipelines we built were tested with fake data sets. I talked to the guys in charge of the team, told them that the minute we put real data through this system, it’s going to crash and burn. I don’t care how many of these EDI transactions you build with Marcus Welby, MD, and Barney and Betty Rubble, it’s not going to break the system like real data. Real data is always messier than we expect.

Yeah, and I think that if we find ways to be able to test with some of that noise, maybe we can even choose exactly what types of noise, or what types of randomness we want to pursue, then we can make sure that our validation is working properly. And if we don’t have validation, we should probably set that up [laughs].
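Picking exact noise types and confirming that validation catches each one, as described above, might look like this minimal sketch. The fields and rules are invented for illustration:

```python
def validate(record):
    """Reject records with missing or malformed required fields."""
    errors = []
    if not record.get("email") or "@" not in record["email"]:
        errors.append("bad email")
    if not isinstance(record.get("age"), int) or not 0 <= record["age"] <= 130:
        errors.append("bad age")
    return errors

# Inject specific kinds of noise and confirm the validation catches each one.
assert validate({"email": "a@b.com", "age": 30}) == []
assert "bad email" in validate({"email": None, "age": 30})
assert "bad age" in validate({"email": "a@b.com", "age": "thirty"})
print("validation catches the injected noise")
```

Each assertion deliberately corrupts one field, which is the point: you choose the type of randomness, then verify the pipeline notices it.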

We probably need some of that, yeah. Maybe. Just a thought.[laughing]

What could go wrong, right? [laughing]

Well, thank you for talking to me.

Thanks so much, Paige. That was fun.

Yeah. I always enjoy these interviews. I always learn something new.

Read our new eBook, Keep Your Data Lake Pristine with Big Data Quality Tools, for a look at how the proper software can help align people, process, and technology to ensure trusted, high-quality Big Data.

More on GDPR:

If you want to learn more about GDPR compliance and how Syncsort can help, be sure to view the webcast recording of Michael Urbonas, Director of Data Quality Product Marketing, on Data Quality-Driven GDPR: Compliance with Confidence.

On a related subject, be sure to read the post by Keith Kohl, Syncsort’s VP of Product Management: Data Quality and GDPR Are Top of Mind at Collibra Data Citizens ’17


Expert Interview (Part 3): Katharine Jarmul on Women in Tech and the Impact of Biased Data in Both Human & Machine Learning Models

At the recent Cloudera Sessions event in Munich, Germany, Paige Roberts, Syncsort’s Big Data Product Marketing Manager, had a chat with Katharine Jarmul, founder of KJamistan data science consultancy, and author of Data Wrangling with Python from O’Reilly. She had just given an excellent presentation on the implications of GDPR for the European data science community. Part 3 dives into the position of being one of the women in tech, the challenges of creating an inclusive company culture, and how bias doesn’t only exist in machine learning data sets.

In the first part of the interview, we talked about the importance of being able to explain your machine learning models – not just to comply with regulations like GDPR, but also to make the models more useful.

In part 2, Katharine Jarmul went beyond the basic requirements of GDPR, to discuss some of the important ethical drivers behind studying the data fed to machine learning models. A biased data set can make a huge impact in a world increasingly driven by machine learning.

Paige Roberts: I know, I’m probably a little obsessive about it, but one of the things I do is look around at every event and calculate the percentage of women to men. And I must say, the percentage of women at this event is a little low.

Katharine Jarmul: Yeah.

So, do you find yourself in that situation a lot? Do you get that, “I’m the only woman in the room” feeling?

I would say that one of the biggest problems I see in terms of women in technology is not that there aren’t a lot of amazing women interested in tech; it’s that it’s difficult for a lot of really talented women in tech to get recognized and promoted.

It feels like women have to be twice as good, to be recognized as half as good.

Yeah. And I think we’re finding out now, there’s a lot of other minority groups as well, who find it difficult, such as women of color. Maybe you have to work four times as hard. We see this exponential thing, and when you’re at an event where it’s mainly executives, or people that have worked their way up for a while, then you just tend to see fewer women, and that’s really sad. I don’t see it as a pipeline problem. I know a lot of people talk about it as a pipeline problem, and yeah, okay, we could have a better pipeline.

Yeah, we need a few more women graduating, but that’s not the problem. The problem is they don’t get as far as they should once they graduate.

Exactly, and maybe eventually they leave because they are tired of not being promoted, having somebody else promoted over them, not getting the cool projects so they can shine.

And some of it is just cultural in tech companies. You get that exclusionary feeling. I had a conversation recently, somebody I was talking to… Oh, I was talking to Tobi Bosede. She’s a woman of color, and she’s a machine learning engineer who did a presentation at Strata. She said something along the lines of, the guys I work with say, “Let’s go play basketball after work.” And everybody on the team does. She’s thinking, “I don’t even like basketball. I don’t really want to go play basketball with the guys after work, but I still feel left out.”

Yeah, I get that. It’s difficult to make a good team culture that’s inclusive. I think you must really work for it. I know some great team leads who are doing things that help, but I think especially if say, you’re a white guy that didn’t grow up with a lot of diversity in your family or your neighborhood, it might be more difficult for you to learn how to create that culture. You must work for it. It’s not just going to happen.

It’s almost like a biased data set in your life. You don’t recognize bias in yourself, until you stop and think about it. It doesn’t just jump out and make itself known.

Of course.

I did an interview with Neha Narkhede, she’s the CTO at Confluent, and she was talking about hiring bias. Even as a woman of color herself, when hiring, she catches herself doing it, and must stop and think, and deliberately avoid bias. It’s in your own head. You think, I should know better.

Yeah, yeah. And I think these unconscious biases are things that we have, as humans. We all have some affinity bias, right? So, if somebody is like me, I’m going to automatically think that they’re clearer. They think like me, so I can more easily see their point. That’s fine but, one of the things that helps teams grow is having arguments, …

Having different points of view, and accepting that, “Okay, this guy thinks completely different from me, but maybe he’s got a point.”

I find myself doing the thing where I think, “Why did they disagree with me? How could they?”

They’re wrong, obviously. [laughing]

[laughing] Especially when I notice that I’m doing it like that, I say, “Okay, I need to sit down and think through this. Is there perhaps a kernel of truth here? Or something that bothers me because it doesn’t necessarily fit into my world view? And should I, perhaps, poke at that a little bit, and figure it out?”

Stop and think, introspect.

Yeah [laughs].

That’s a good word. I like that.

We have our own mental models, and we need to question the bias in them, too.

Be sure to check out part 4 of this interview where we’ll discuss some of the work Ms. Jarmul is doing in the areas of anonymization so that data can be repurposed without violating privacy, and creating artificial data sets that have the kind of random noise that makes real data sets so problematic.

For a look at 5 key Big Data trends in the coming year, check out our report, 2018 Big Data Trends: Liberate, Integrate & Trust

Related Posts:

Neha Narkhede, CTO of Confluent, Shares Her Insights on Women in Big Data

Yolanda Davis, Sr Software Engineer at Hortonworks, on Women in Technology

Katharine Jarmul on If Ethics is Not None

Katharine Jarmul on PyData Amsterdam Keynote on Ethical Machine Learning


Expert Interview (Part 2): Katharine Jarmul on the Ethical Drivers for Data Set Introspection Beyond GDPR Compliance

At the recent Cloudera Sessions event in Munich, Germany, Paige Roberts of Syncsort had a chat with Katharine Jarmul, founder of KJamistan data science consultancy, and author of Data Wrangling with Python from O’Reilly. She had just given an excellent presentation on the implications of GDPR for the European data science community.

In the first part of the interview, we talked about the importance of being able to explain your machine learning models – not just to comply with regulations like GDPR, but also to make the models more useful.

In this part, Katharine Jarmul will go beyond the basic requirements of GDPR again, to discuss some of the important ethical drivers behind studying the data fed to machine learning models. A biased data set can make a huge impact in a world increasingly driven by machine learning.

In part 3, we’ll talk about being a woman in a highly technical field, the challenges of creating an inclusive company culture, and how bias doesn’t only exist in machine learning data sets.

In the final installment, we’ll discuss some of the work Ms. Jarmul is doing in the areas of anonymization so that data can be repurposed without violating privacy, and creating artificial data sets that have the kind of random noise that makes real data sets challenging.

Paige Roberts: Okay, another interesting thing you were talking about in your presentation was the ethics involved in this area. If you’ve got that black box, you don’t know where your data came from, or maybe you didn’t really study it enough. You didn’t sit down and, how did you put it? Introspect. You didn’t really think about where that data came from, and how it can affect people’s lives.

Katharine Jarmul: Yeah, there’s been a lot of research coming out about this. Particularly when we have a sampling problem. For example, let’s say we have a bunch of customers, and only 5% are aliens. I will use that term, just as if they were Martians. We have these aliens that are using our product, and because we have this sampling bias, any statistical measurement we take of that 5% is not really going to make any sense, right? So, we need to recognize that our algorithm is probably not going to treat these folks fairly. Let’s think about how to combat that problem. There are a lot of great mathematical ways to do so. There are also ways you can decide to choose a different sampling strategy, or choose to treat groups separately in your classification. There are a lot of ways to fight this, but first you must recognize that it’s a problem.
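A small sketch of why treating groups separately matters. The numbers and group labels below are invented for illustration: a model that fails the 5% minority can still post an excellent pooled accuracy, which is exactly how the problem stays hidden.

```python
def accuracy(pairs):
    """Fraction of (predicted, actual) pairs that match."""
    return sum(p == a for p, a in pairs) / len(pairs)

# 95% majority the model handles well, 5% minority it handles badly.
majority = [(1, 1)] * 90 + [(0, 0)] * 5   # all 95 predictions correct
minority = [(0, 1)] * 4 + [(1, 1)] * 1    # only 1 of 5 correct

pooled = accuracy(majority + minority)
per_group = {"majority": accuracy(majority), "minority": accuracy(minority)}
print(pooled)      # 0.96, looks fine
print(per_group)   # {'majority': 1.0, 'minority': 0.2}, reveals the problem
```

The pooled number says the model is fine; the per-group breakdown says it fails the minority four times out of five.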

If you don’t recognize that there MIGHT be a problem, you don’t even look for it, so you never realize it’s there.

Exactly, and I think that’s key to a lot of these things that are coming out that are really embarrassing for some companies. It’s not that they’re bad companies, and it’s not that they’re using horrible algorithms. It’s that if you don’t think through all the potential ramifications for all the potential groups, it’s really easy to get to a point where you have to say, “Oops, we didn’t know that,” and issue a big public apology.

Something like: I have made decisions on who gets a loan, or who gets a scholarship, or who gets promoted, or who gets hired, and it’s all based on biased data. I didn’t stop and think, “Oh, my dataset might be biased.” And now my machine learning algorithm is propagating it. There were a lot of talks about that at Strata. Hilary Mason did a good one on that.

Oh excellent. Her work at Fast Forward Labs on interpretability is some of the best in terms of pushing the limit for how we apply interpretability, and therefore this accountability that comes with that, to “black box” models.

Because if you don’t know how your model works, you can’t tell when it’s biased.

Exactly. And if you spend absolutely ALL of your time on “How can I get the precision higher?” and “How can I get the recall higher?”, and none of your time on “Wait, what might happen if I give the model data it has never seen before? Data from perhaps a person with a different color of skin, or a person with a different income level, or whatever it is. How might the model react?” If you’re not thinking about those things at all, then they’ll really sneak up on you [laughs].

Be sure to check out part 3 of this interview, where we’ll discuss the challenges of women in tech, and the biases that exist, not just in our data sets, but also in our culture and our own minds.

If you want to learn more about GDPR compliance and how Syncsort can help, be sure to view the webcast recording of Michael Urbonas, Syncsort’s Director of Data Quality Product Marketing, on Data Quality-Driven GDPR: Compliance with Confidence.

Related posts:

Katharine Jarmul on If Ethics is Not None

Katharine Jarmul on PyData Amsterdam Keynote on Ethical Machine Learning

Keith Kohl, Syncsort’s VP of Product Management on Data Quality and GDPR Are Top of Mind at Collibra Data Citizens ’17


Expert Interview (Part 1): Katharine Jarmul on GDPR and the Need for Explainability in Machine Learning Models

At the recent Cloudera Sessions event in Munich, Germany, Paige Roberts of Syncsort had a chat with Katharine Jarmul, founder of KJamistan data science consultancy, and author of Data Wrangling with Python from O’Reilly. She had just given an excellent presentation on the implications of GDPR for the European data science community.

In this first part of a 4-part interview, Katharine Jarmul discusses the importance of being able to explain your machine learning models – not just to comply with regulations like GDPR, but also to make the models more useful.

Paige Roberts: Let’s get started by having you introduce yourself.

Katharine Jarmul: I’m Katharine Jarmul, and I’m a data scientist. I founded my own company called KJamistan. We are both an American company and a German company. I work with clients on solving data problems. If they need somebody with a smaller, more agile approach, if they need a proof of concept, or particularly if they’re trying to figure out where to move forward with ethical, or interpretable, or testable models, those are the kinds of areas that I have started to specialize in, and that I care a lot about.

So: “Have smart data scientist brain. Will travel.”

Yeah, [laughing] to some degree, yeah.

I just saw your presentation, and a couple of things jumped out at me. One was that you did a really good job of explaining how GDPR affects the machine learning and AI community in Europe. Can you give a little bit of a summary, or a general idea of what you talked about?

GDPR changes a few of the ways that we have to inform users about automated processing of their data. And to do so, we want to take an inventory of how we’re doing that currently, and put some thought into it. For some people, this is already very well-documented. It’s put together well, things are tested and interpretable. They already have a lot of transparency. I know for other people, this is really scary.

It’s a black box.

Yeah. And they haven’t thought a lot about how to make it repeatable, how to make it accountable. This is something that I think we should all be thinking about anyway, regardless of GDPR. What GDPR gives us is a motivation to get started on that. I would say we should go beyond it to create something really accountable, and reliable. Then, we can explain it to everyone within our companies. Then everybody knows: How does the decision process work? What type of data are we using?

Everyone gets it.

It’s silly to me when I talk with people, and realize that there are fiefdoms of data, and fiefdoms of automated processing. Nobody knows how the other departments are using things. I think that that’s really dangerous in a lot of ways.

It’s bad when companies don’t even know how their own decisions work.

Yeah, exactly [laughs]. How am I supposed to explain it to a customer as, let’s say, a sales representative, or as somebody on the customer service team? If it’s not clear and easy to explain, then how can I even sell the product, or how can I explain the problem?

How can you justify decisions? Well, I made this decision. Why? A machine learning algorithm that I don’t understand, working on some data that I don’t know anything about, told me it was a good decision. That is not a good justification.

Exactly. And I think that some of the folks moving us forward in terms of that are people who work in medicine and the financial industries. They have legal ramifications and/or life-or-death ramifications of trusting something they can’t explain. I think they’re really making some good inroads into how we can have really accurate, advanced models that we can still explain.

We must understand how a conclusion or prediction was reached. Don’t just give the answer and not explain how we got there.

Really great things are happening in interpretability research. Quite a lot of the research is asking area experts to provide the ground truth. So, they’re trying to say, “Okay, what things can we basically take away from the model and say that this should always be this way?”

What do we already know is always, basically true?

Exactly. And if we have those things that we already know are ground truths, and we then build a model on top of those, that model is usually way more performant. So, we need to talk with the area experts.

It’s going to be a lot better if you take advantage of that expertise. It’s a theme of this industry. People and machine learning working together are always going to be better than the people or the machine by themselves.

Exactly. When you read the history of AI, this is the original thing that the folks working in cybernetics and so forth really wanted: how can we use the machine to make better decisions, rather than just listen to whatever decision the machine made [laughs].

Sure, let’s just let the machine order us around blindly, yeah. That’s a good idea.

[laughing] Exactly, I mean [chuckles] we have a brain for a reason.

Be sure to check out part 2 of this interview, where Katharine Jarmul will go beyond the basic requirements of GDPR, to discuss some of the ethical drivers behind studying the data fed to machine learning models.

If you want to learn more about GDPR compliance and how Syncsort can help, be sure to view the webcast recording of Michael Urbonas, Director of Data Quality Product Marketing on Data Quality-Driven GDPR: Compliance with Confidence.

Related Posts:

Katharine Jarmul on Towards Interpretable Reliable Models

Keith Kohl, Syncsort’s VP of Product Management on Data Quality and GDPR Are Top of Mind at Collibra Data Citizens ’17


Expert Interview (Part 2): James Kobielus on Reasons for Data Scientist Insomnia including Neural Network Development Challenges

In the first half of our two-part conversation with Wikibon lead analyst James Kobielus (@jameskobielus), he discussed the incredible impact of machine learning in helping organizations make better business decisions and be more productive. In today’s Part 2, he addresses what aspects of machine learning should be keeping data scientists up at night. (Hint: neural networks)

Several Challenges Involved with Developing Neural Networks

Developing these algorithms is not without its challenges, Kobielus says.

The first major challenge is finding data.

Algorithms can’t do magic unless they’ve been “trained.” And in order to train them, the algorithms require fresh data. But acquiring this training data set is a big hurdle for developers.

For eCommerce sites, this is less of a problem – they have their own data in the form of transaction histories, site visits and customer information that can be used to train the model and determine how predictive it is.
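The workflow Kobielus describes – train on your own transaction history, then check how predictive the model is – can be sketched in a few lines of Python. Everything below is hypothetical: the features, the records, and the nearest-centroid classifier all stand in for a real training pipeline.

```python
# A toy "train on your own transaction history" sketch. A tiny
# nearest-centroid classifier predicts whether a visitor will buy,
# from hypothetical (visits, minutes_on_site) features.

def centroid(rows):
    """Mean of each feature column."""
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]

def train(examples):
    """examples: list of (features, label) pairs; returns class centroids."""
    by_label = {}
    for feats, label in examples:
        by_label.setdefault(label, []).append(feats)
    return {label: centroid(rows) for label, rows in by_label.items()}

def predict(model, feats):
    """Assign the label whose centroid is closest (squared distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(feats, c))
    return min(model, key=lambda label: dist(model[label]))

# Hypothetical training data: (site visits, minutes on site) -> bought?
history = [
    ([1, 2], "no"), ([2, 3], "no"), ([1, 1], "no"),
    ([8, 30], "yes"), ([9, 25], "yes"), ([7, 40], "yes"),
]
model = train(history)
print(predict(model, [8, 28]))  # a heavy visitor looks like a buyer
```

With real transaction data the shape is the same: a labeled history goes in, a model comes out, and fresh examples are scored against it.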

But the process of amassing those training data sets when you don’t have data is trickier – developers have to rely upon commercial data sets that they’ve purchased or open source data sets.

After getting the training data, which might come from a dozen different sources, the next challenge is aggregating it so the data can be harmonized with a common set of variables. Another challenge is having the ability to cleanse data to make sure it’s free of contradictions and inconsistencies. All this takes time and resources in the form of databases, storage, processing and data engineers. This process is expensive but essential. (For more on this, read Uniting Data Quality and Data Integration)
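The aggregation step described above can be illustrated with a small sketch: map each source's columns onto a common schema, then flag records whose sources contradict each other. The source names, field names, and records are all hypothetical; a real pipeline would use data integration tooling rather than hand-written dictionaries.

```python
# Harmonize two hypothetical sources onto a common set of variables,
# then collect contradictions between them during the merge.

SCHEMA_MAP = {
    "crm":     {"cust_id": "id", "e_mail": "email"},
    "billing": {"customer": "id", "email_addr": "email"},
}

def harmonize(source, record):
    """Rename one source's fields to the shared variable names."""
    return {SCHEMA_MAP[source][k]: v for k, v in record.items()}

def merge(records):
    """Group harmonized records by id; collect contradictions."""
    merged, conflicts = {}, []
    for rec in records:
        row = merged.setdefault(rec["id"], {})
        for k, v in rec.items():
            if k in row and row[k] != v:
                conflicts.append((rec["id"], k, row[k], v))
            row[k] = v
    return merged, conflicts

rows = [
    harmonize("crm",     {"cust_id": 7, "e_mail": "a@example.com"}),
    harmonize("billing", {"customer": 7, "email_addr": "b@example.com"}),
]
merged, conflicts = merge(rows)
print(conflicts)  # the two sources disagree on customer 7's email
```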

Third, organizations need data scientists, who are expensive resources. They need to find enough people to manage the whole process – from building to training to evaluating to governing.

“Finding the right people with the right skills, recruiting the right people is absolutely essential,” Kobielus says.

Before jumping into machine learning, organizations should also make sure it makes sense for their business strategies.

Industries like finance and marketing have made a clear case for implementing Big Data. In finance, it allows firms to do high-level analysis to detect things like fraud. And in marketing, CMOs have found it useful to develop algorithms that let them conduct sentiment analysis on social media.

There are a lot of uses for it to be sure, Kobielus says, but there are methods for deriving insights from data that don’t involve neural networks. It’s up to the business to determine whether using neural networks is overkill for their purposes.

“It’s not the only way to skin these cats,” he says.

If you already have the tools in place, then it probably makes sense to keep using them. Or, if you find traditional tools can’t address needs like transcription or facial recognition, then it probably makes sense to go to a newer form of machine learning.

What Should Really Be Keeping Data Scientists Up at Night 

While those in the tech industry might be fretting over whether AI will displace the gainfully employed or that there’s a skills deficit in the field, Kobielus has other worries related to data science.

For one, the algorithms used for machine learning and AI are really complex and they drive so many decisions and processes in our lives.

“What if something goes wrong? What if a self-driving vehicle crashes? What if the algorithm does something nefarious in your bank account? How can society mitigate the risks?” Kobielus asks.

When there’s a negative outcome, the question asked is who’s responsible. The person who wrote the algorithm? The data engineer? The business analyst who defined the features?

These are the questions that should keep data scientists, businesses, and lawyers up at night. And the answers aren’t clear-cut.

In order to start answering some of these questions, there needs to be algorithmic transparency, so that there can be algorithmic accountability.

Ultimately, everyone is responsible for the outcome.

There’s a huge legal gray area when it comes to machine learning because the models used are probabilistic and you can’t predict every single execution path for a given probabilistic application built on ML.

“There’s a limit beyond which you can anticipate the particular action of a particular algorithm at a particular time,” Kobielus says.

For algorithmic accountability, there need to be audit trails. But an audit log for any given application has the potential to be larger than all the databases on Earth. Not just that, but how would you roll it up into a coherent narrative to hand to a jury?
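One possible shape for such an audit trail – a sketch, not an established standard; the model name, fields, and scoring rule are invented for illustration – is to record the inputs, model version, and output of every automated decision so it can be reconstructed later:

```python
# Wrap a scoring function so every decision leaves an audit-trail entry.

import json
import time

AUDIT_LOG = []  # in practice: an append-only store, not a Python list

def audited(model_fn, model_version):
    """Return a wrapped model that logs each prediction it makes."""
    def wrapper(features):
        decision = model_fn(features)
        AUDIT_LOG.append({
            "ts": time.time(),
            "model": model_version,
            "inputs": features,
            "decision": decision,
        })
        return decision
    return wrapper

# Hypothetical scoring rule standing in for a real model.
score = audited(lambda f: "approve" if f["income"] > 3 * f["debt"] else "review",
                model_version="risk-v1.2")

score({"income": 90, "debt": 10})
score({"income": 40, "debt": 30})
print(json.dumps(AUDIT_LOG[-1], default=str)[:80])  # a serializable record
```

Even this trivial wrapper shows why the logs balloon: one entry per prediction, with the full input vector attached.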

“Algorithmic accountability should keep people up at night,” he says.

Just as he said concerns about automation are overblown, Kobielus says it’s also unnecessary to worry that there aren’t enough skilled data scientists working today.

Data science is getting easier.

In the early days of the web, developers had to know underlying protocols like HTTP, but today nobody needs to worry about the protocol plumbing anymore. It will be the same for machine learning, Kobielus says. Increasingly, the underlying data is being abstracted away by higher-level tools that are more user friendly.

“More and more, these things can be done by average knowledge workers, and it will be executed by the underlying infrastructure,” he says.

Does Kobielus worry about the job security of data scientists, then? Not really. He believes data science automation tools will allow data scientists to do more with less and, hopefully, free them to develop their skills in more challenging and creative realms.

For 5 key trends to watch for in the next 12 months, check out our new report: 2018 Big Data Trends: Liberate, Integrate & Trust

Expert Interview (Part 1): Wikibon’s James Kobielus Discusses the Explosive Impact of Machine Learning

It’s hard to mention the topics of automation, artificial intelligence or machine learning without various parties speculating that technology will soon throw everybody out of their jobs. But James Kobielus (@jameskobielus) sees the whole mass unemployment scenario as overblown.

The Future of AI: Kobielus Sees Progress Over Fear

Sure, AI is automating a lot of knowledge-based and not-so-knowledge-based functions right now. It is causing dislocations in our work and in our world. But the way Kobielus looks at it, AI is not only automating human processes, it’s augmenting human capabilities.

“We make better decisions, we can be more productive … We’re empowering human beings to do far more with less time,” he says. “If fewer people are needed for things we took for granted, that trend is going to continue.”

It’s anybody’s guess how the world will look in the future, Kobielus says. But he doesn’t believe in the nightmare scenarios in which AI puts everyone out of a job. Why? Basic economics.

The industries that are deploying AI won’t have the ability to get customers if everyone is out of a job.

“There needs to be buying power in order to power any economy, otherwise the AI gravy train will stop,” he says.

Kobielus is the lead analyst with Wikibon, which offers market research, webinars and consulting to clients looking for guidance on technology. His career in IT spans more than three decades and three-quarters of it has been in analyst roles for different firms. Before going to Wikibon, he spent five years at IBM as a data science evangelist in a thought leadership marketing position espousing all things Big Data and data science.

He talks regularly on issues surrounding Big Data, artificial intelligence, machine learning and deep learning.

How Machine Learning is Impacting Industry Today

Machine learning is a term that’s been around for a while now, Kobielus says. At its core, it’s simply using algorithms and analytics to find patterns in data that you wouldn’t have been able to find otherwise. Regression models and support vector machines are examples of more established forms of machine learning. Today, newer crops of algorithms are lumped under what are called neural networks or recurrent neural networks.
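For contrast with the neural networks discussed below, here is a minimal sketch of one of those established methods: logistic regression, fitted by plain gradient descent. The one-feature dataset and its interpretation are invented for illustration.

```python
# Logistic regression on a toy, hypothetical dataset:
# p(y=1 | x) = sigmoid(w*x + b), fitted by per-sample gradient descent.

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(xs, ys, lr=0.5, steps=2000):
    """Learn weight w and bias b by minimizing log-loss."""
    w = b = 0.0
    for _ in range(steps):
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y   # gradient of the log-loss
            w -= lr * err * x
            b -= lr * err
    return w, b

xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]   # e.g. years as a customer
ys = [0,   0,   0,   1,   1,   1  ]   # e.g. renewed subscription?
w, b = fit(xs, ys)
print(sigmoid(w * 4.0 + b) > 0.8)  # a long-tenure customer scores high
```

Nothing "deep" here: one weight, one bias, and a pattern a human could eyeball, which is exactly why such methods remain a sensible first resort.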

“That’s what people think of as machine learning – it’s at the heart of industry now,” Kobielus says.

Brands are using these neural network tools for face and voice recognition, natural language processing and speech recognition.

Applied to text-based datasets, machine learning is often used to identify concepts and entities so that they can be distilled algorithmically to determine people’s intentions or sentiments.
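As a toy illustration of distilling text into a sentiment signal: production systems learn from large corpora, but this hypothetical lexicon-based scorer shows the input/output shape of the task.

```python
# A naive lexicon-based sentiment scorer (hypothetical word lists).
# Real systems use learned models; this only illustrates the interface.

POSITIVE = {"love", "great", "excellent", "happy"}
NEGATIVE = {"hate", "terrible", "awful", "angry"}

def sentiment(text):
    """Return 'positive', 'negative', or 'neutral' for a piece of text."""
    words = text.lower().split()
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this great product"))
print(sentiment("the sky is blue"))
```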

“More and more of what we see in the machine learning space is neural networks that are deeper,” Kobielus says. “[They’re] not just identifying a face, but identifying a specific face and identifying the mood and context of [the] situation.”

They’re operating at much higher levels of sophistication.

And rather than just being used in a mainframe, more often these algorithms are embedded in chips that are being put into phones, smart cars and other “smart” technologies.

Consumers are using these technologies daily when they unlock their phones using facial recognition, ask questions to tools like Alexa or automatically tag their friends on Facebook photos.

More and more industries are embracing deep learning – machine learning that is able to process media objects like audio and video in real time, offering automated transcription, speech to text, facial recognition, for instance. Or, the ability to infer the intent of a user from their gesture or their words.

Beyond just translating or offering automated transcriptions, machine learning provides a real-time map of all the people and places being mentioned and shares how they relate to each other.

Looking at the internet of things market, anybody in the consumer space that wants to build a smart product is embedding deep learning capabilities right now.

Top Examples of Machine Learning: Self-Driving Cars and Translations

Kobielus points to self-driving vehicles as a prime example of how machine learning is being used.

“They would be nothing if it weren’t for machine learning – that’s their brains.”

Self-driving vehicles process a huge variety of input including images, sonar, proximity, and speed, as well as the behavior of the people inside – inferring their intent, where they want to go, and what alternative routes might be acceptable, based on voice, gestures, their history of past travel and more.

Kobielus is also excited about advances in translation services made possible by machine learning.

“Amazon Translate, [machine] translation between human languages in real time, is becoming scary accurate, almost more accurate than human translation,” Kobielus says.

In the not-too-distant future, he predicts that people will be able to just wear an earpiece that will translate a foreign language in real-time so they will be able to understand what people are saying around them enough to at least get by, if not more.

“The perfect storm of technical advances are coming together to make it available to everybody at a low cost,” he says.

Learn more about the top Big Data trends for 2018 in Syncsort’s eBook based on their annual Big Data survey.

Expert Interview (Part 1): Gregory Piatetsky-Shapiro on Exciting and Worrisome Advances in Artificial Intelligence

In Part 1 of this two-part interview, Gregory Piatetsky-Shapiro of KDnuggets (@KDnuggets) discusses how today’s advances in deep learning are cause for both excitement and concern.

Tracking Big Data’s Evolution in Data Science, AI & Machine Learning

Machines have always fascinated Piatetsky-Shapiro – ever since he was a kid reading stories about robots by Isaac Asimov and other sci-fi authors.

He discovered his love for programming while studying computer science at Technion, where he spent a few weeks in the summer of his first year programming a computer (in APL) to play battleships. “I was soundly defeated by my own program,” he says. “That gave me an appreciation for the abilities of technology. I became more interested in creating programs than playing them.”

Piatetsky-Shapiro’s passion for understanding data and helping others stay up to date on developments in databases led him to launch the first Knowledge Discovery in Databases workshop in 1989, which later grew into full-fledged KDD conferences.

In 1993, after the third KDD workshop, he started KDnuggets News, an e-newsletter focused on data mining and knowledge discovery.  The first issue went to 50 researchers who attended the workshop. Today, the KDnuggets brand has more than 200,000 subscribers across email, Twitter, Facebook and LinkedIn. With over 500,000 visitors in October 2017, KDnuggets.com has become a go-to resource for data science and analytics news, software, jobs, courses, education and more.

Piatetsky-Shapiro is one of the leading voices in Big Data – a field he says is somewhat amorphous, encapsulating infrastructure and database management, and closely connected to data science, machine learning and artificial intelligence.

(Note: what is now called “data science” was earlier called “data mining” or “knowledge discovery” but it refers to the same field dedicated to analyzing and understanding data and extracting useful knowledge from it.) 

Exciting AI Advances with Deep Learning

“What is really most exciting now is deep learning,” he says.

While the concept of multi-level (deep) neural networks has been around since the 1960s, there wasn’t enough data, computing power or clever algorithms to use them effectively. But in the past few years, this approach – rebranded as “deep learning” – has finally had sufficient data and processing power, and has been achieving amazing feats almost every week.
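The “multi-level” idea itself is easy to illustrate: stacking layers lets a network compute functions that no single linear layer can. In this sketch the weights are hand-set rather than learned, purely to show the structure – a two-layer network computes XOR, the classic function a single linear layer cannot represent.

```python
# A minimal multi-level neural network: two stacked layers compute XOR.
# Weights are hand-chosen for illustration, not learned.

def step(z):
    """Threshold activation: fire if the weighted sum is positive."""
    return 1 if z > 0 else 0

def layer(inputs, weights, biases):
    """One fully-connected layer with a step activation."""
    return [step(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def xor_net(a, b):
    hidden = layer([a, b], weights=[[1, 1], [-1, -1]], biases=[-0.5, 1.5])
    # hidden[0] computes OR(a, b); hidden[1] computes NAND(a, b)
    out = layer(hidden, weights=[[1, 1]], biases=[-1.5])  # AND of the two
    return out[0]

print([xor_net(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```

Modern deep networks differ in scale (millions of learned weights, many layers, smooth activations), but the principle of composing simple layers into richer functions is the same.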

Examples of Deep Learning Breakthroughs

There are many examples of deep learning being deployed today.

Consumers who speak to their smartphone assistants like Siri or Cortana, or to Amazon Alexa or Google Home, are getting good results thanks to deep learning.

Google’s recent improvements in machine translation are another big advance, thanks to deep learning.

It used to be that computers did machine translation using hand-crafted rules derived by thousands of linguistic experts. However, powered by large amounts of text and an advanced deep learning network, Google switched in 2016 to Google Neural Machine Translation, which eliminates all manual rules and translates entire sentences at a time. This has significantly improved the quality of translations.

Finally, Piatetsky-Shapiro mentioned AlphaGo, a computer program developed by Google DeepMind to play the ancient Chinese game of Go. In 2016, AlphaGo, trained partly on thousands of human championship games, defeated world champion Lee Sedol 4:1.

In 2017, an improved version called AlphaGo Zero combined deep learning and reinforcement learning methods and learned to play from scratch, entirely by self-play. After three days and a few million games, the new version reached the level of the program that had defeated the Go world champion in 2016. After 40 days, AlphaGo Zero achieved a superhuman level and defeated the previous version 100:0.

Today, it’s considered the strongest Go player in history.

“It’s very exciting and it’s also very scary,” Piatetsky-Shapiro says. As AlphaGo Zero improved its game play, it more and more often chose moves very different from those of human experts.

Be sure to continue to Part 2 of this interview, where Piatetsky-Shapiro discusses self-driving Artificial Intelligence and how businesses can approach it.

For more Big Data insights, check out our report, 2018 Big Data Trends: Liberate, Integrate & Trust, to see what every business needs to know in the upcoming year, including 5 key trends to watch for in the next 12 months!

Expert Interview (Part 2): Piatetsky-Shapiro on Self-Driving Artificial Intelligence and How Business Should Approach It

In the first half of our two-part conversation with data scientist and KDnuggets founder Gregory Piatetsky-Shapiro, he provided examples of advances in deep learning that continue to push the field of AI full steam ahead. In today’s Part 2, Piatetsky-Shapiro notes some concerns about artificial intelligence as it continues to advance, and how businesses should approach incorporating AI.

Artificial Intelligence: Some Causes for Concern

The period of time when humans and computers collaborate to solve problems might not last very long. It’s not a matter of if, but when, computers will be able to do jobs better than we can. The question we should be asking now is: what will humans do?

In the short term, Piatetsky-Shapiro says he’s concerned about the use of technology to automate repetitive tasks previously done by humans. Even devices with limited intelligence will be able to complete jobs that are structured and highly repetitive. For example, toll booths on the Mass Pike were removed, and the collectors’ jobs were replaced by E-ZPass radio technology plus cameras that photograph license plates and recognize the numbers – a limited form of computer vision.

In regards to the developments in the field of Artificial General Intelligence (AGI) – machine learning that is able to perform the same intellectual tasks that humans can – Piatetsky-Shapiro tends to side with entrepreneur Elon Musk and physicist Stephen Hawking. It could put humanity at risk.

“I think we are not likely to have AGI in the next 10 years, but people, in general, have [a] very poor track record of predicting long-term events.”

Even if the probability of AGI is small, its impact could be huge. A program like AlphaGo Zero demonstrates that computers can achieve superhuman ability in a relatively narrow field, and that once they do, they are probably using different logic than we do, Piatetsky-Shapiro says.

“What if AGI values are not aligned with what we want to do? That’s a serious problem.”

While the AI Now Institute was founded this year at NYU to address the problem of incorporating values training in artificial intelligence, Piatetsky-Shapiro says he doesn’t think we should pretend there would be any guarantees about the way programs behave. Just like a parent can’t guarantee their children won’t rebel against the values they’re raised with, we shouldn’t assume machines would always follow the rules we put in place.

“If it is really intelligent, it will have its own values.”

How Businesses Should Approach Artificial Intelligence 

There are no best practices yet for companies wanting to incorporate AI and machine learning into their business strategies, because the technology has only been viable for a few years. With that in mind, brands need to be aware of both the capabilities of these tools and their limitations.

He shared three guidelines to follow or be aware of when using artificial intelligence:

  1. In order to use these tools effectively, companies need large sets of data – at least 100,000 examples. The more recent the data and the more frequently it’s refreshed, the more effective the predictions will be.
  2. Make sure there are people in the organization who understand the technology and can guide its development.
  3. Have realistic expectations. There’s a lot of randomness when it comes to predicting human behavior. If you build a model that gives you perfect predictions, chances are you probably have false predictors.
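Guideline 3 can be made concrete with a rough sketch: if a single input determines the label exactly, suspect a false predictor (for example, a field derived from the outcome itself) rather than a genuinely perfect model. The field names and records below are hypothetical.

```python
# Flag features whose value alone determines the label exactly -
# a common signature of a false predictor leaking the outcome.

def suspicious_features(rows, label_key):
    """Return features that map each of their values to a single label."""
    flagged = []
    features = [k for k in rows[0] if k != label_key]
    for feat in features:
        mapping = {}   # feature value -> first label seen with it
        ok = True
        for row in rows:
            v, y = row[feat], row[label_key]
            if mapping.setdefault(v, y) != y:
                ok = False   # same value, different labels: not suspicious
                break
        if ok:
            flagged.append(feat)
    return flagged

rows = [
    {"age": 34, "refund_issued": 1, "churned": 1},
    {"age": 51, "refund_issued": 0, "churned": 0},
    {"age": 29, "refund_issued": 1, "churned": 1},
    {"age": 34, "refund_issued": 0, "churned": 0},
]
print(suspicious_features(rows, "churned"))  # refund_issued tracks the label
```

A flag like this is a prompt to investigate, not a verdict: a feature can track the label by coincidence on small samples, which is exactly the randomness the guideline warns about.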

To better manage and leverage all the data they’re collecting, Piatetsky-Shapiro recommends enabling more interactive access.

“I think the approach of dumping everything together in one Data Lake and hoping you’ll discover something is probably not very useful,” he says.

Instead, have specific questions you want to answer, and look at the data with those goals in mind. Look at what gives the best return on investment and what gives value. Many Big Data projects that create big data lakes are not able to show ROI.

“Start with business value and proceed from there,” he says.

Finally, invest in good quality data visualization. Humans are still the best at interpreting data, so the visuals should clearly present patterns that allow business stakeholders to make better decisions.

For more Big Data insights, check out our report, 2018 Big Data Trends: Liberate, Integrate & Trust, to see what every business needs to know in the upcoming year, including 5 key trends to watch for in the next 12 months!

Syncsort + Trillium Software Blog