Data science handbook: 3 tips for becoming a data scientist

The impact of data science continues to reverberate through industry, government, and nonprofits. Employers are hunting for data science talent, and a wave of master’s programs has sprung up to serve aspiring data acolytes.

But how do you actually enter the field?

To get a clearer picture of the state of data science, how employers and employees alike can take advantage of it, and how you can enter the field, we spoke with some of the field’s most prominent voices: DJ Patil, co-coiner of the term “data scientist”; Michelangelo D’Agostino, formerly of Obama 2012’s data team; and Clare Corthell, creator of The Open Source Data Science Masters.

We’ve distilled their insights into three main pieces of advice, shared below.

(Their full interviews are available for free online as a part of the Data Science Handbook Pre-Release.)

1. Seek fast, collaborative environments

In all of the unicorn-chasing that goes on when companies try to hire the perfect data scientist, it’s easy to forget the importance of collaboration.

Patil said it is a common mistake for both employers and data scientists to forget that data science is a collaborative endeavor:

“People make a mistake by forgetting that Data Science is a team sport. People might point to people like me or (Jeff) Hammerbacher or Hilary (Mason) or Peter Norvig and they say, oh look at these people! It’s false, it’s totally false, there’s not one single data scientist that does it all on their own.

Data science is a team sport. Somebody has to bring the data together, somebody has to move it, someone needs to analyze it, someone needs to be there to bounce ideas around.”

(The authors have also published 120 Data Science Interview Questions on VB Insight.)

Given this, employers should keep in mind that they are hiring to build a team of data scientists who can work together, not a cadre of lone wolves.

Aspiring data scientists should seek out such collaborative cultures to maximize their ability to learn, grow, and steep themselves in the culture of teamwork that makes for successful data science.

As Patil concluded:

“In academia, the first thing you do is sit at your desk and then close the door. There’s no door anywhere in Silicon Valley; you’re out on the open floor. These people are very much culture shocked when people tell them, ‘No, you must be working, collaborating, engaging, fighting, debating, rather than hiding behind the desk and the door.’”

2. Delve deeply into hard, dirty problems

Learning how to work on difficult problems, and developing strategies for approaching them, was one of the most valuable skills that D’Agostino picked up during his astrophysics PhD at UC Berkeley. To get the experience that will ultimately become relevant to data science, D’Agostino suggested:

“Work on a hard problem for a long time and figure out how to push through and not be frustrated when something doesn’t work, because things just don’t work most of the time. You just have to keep trying and keep having faith that you can get a project to work in the end. Even if you try many, many things that don’t work, you can find all the bugs, all the mistakes in your reasoning and logic and push through to a working solution in the end.”

For students, this means you should always be looking for ways to apply your classwork or research to real, live datasets. Doing so teaches you the nuances of dealing with large, messy datasets and lets you understand much more than the theory in your textbook.

D’Agostino explained to us:

“You can read about it, and people can teach you techniques, but until you’ve actually dealt with a nasty data set that has a formatting issue or other problems, you don’t really appreciate what it’s like when you have to merge a bunch of data sets together or make a bunch of graphs to sanity check something and all of a sudden nothing makes sense in your distributions and you have to figure out what’s going on.”

For current practitioners, it means wrangling with the formidable mathematical and engineering challenges that naturally arise in data science. Instead of trusting the standard tools, open the mysterious black boxes of machine learning and figure out for yourself what is going on.

Summarizing this view, D’Agostino concluded:

“For aspiring data scientists, take your time to get your hands dirty by digging deeply into hard data, and mess around with demanding intellectual and engineering problems. The difficulty will stretch your mind and abilities, ultimately growing and improving you.”

3. Bootstrap yourself with projects to demonstrate mastery

There is no single path to becoming a data scientist. Although a PhD might be common among aspiring data scientists, Corthell’s path illustrates that it is not the only way.

Corthell was designing for an early-stage startup when she realized that many design decisions could be augmented by data about user behavior. Eventually she left the startup, and she used her newfound free time to think about what she really wanted to do.

“On a long layover in Barcelona, I ordered an espresso and wrote down the technical skills I would need to dissect meta-trends and understand user data. That list laid out 6 months of full-time work, after which I’d really be able to do some damage. This became the Open Source Data Science Masters.”

To acquire those skills, she designed a curriculum for herself. Corthell embarked on a six-month journey of self-education and turned her lack of formal work experience as a data scientist into an opportunity to demonstrate her determination to become one. She also created visibility for herself by publishing the resources and courses she used as The Open Source Data Science Masters (OSDSM).

In constructing the OSDSM, Corthell sought out and synthesized a wide variety of publicly available online courses, tutorials, and websites. Using these resources, she steeped herself in the nuts and bolts of data science. She set up intermediate projects to test her understanding. Her tenacity won her the attention of many, including the admiration of Mattermark, a startup where Corthell now works as a data scientist.

“As Patient Zero of a new type of Internet-based institution-free education, I didn’t know what to expect. It was impossible to know how I would be judged and whether I would benefit from my experiment. This type of ambiguity usually makes people extremely uncomfortable. It’s like leaving a six-year-old in the library by herself instead of putting her in class with a teacher. What is she going to do?

I knew that it would be a risk, but I took a leap of faith and left myself alone in the library. In the end, the greatest reward didn’t come from the curriculum, it came from what taking a risk demonstrated about me. It led me to a tribe that respected the risk I had taken, and valued the grit that it required to follow through.”

The amount of information that is freely available on the Internet is staggering but surmountable.

Don’t be disheartened if you don’t fit the typical mold of a data scientist; treat that as a chance to demonstrate your independence and self-discipline. Take a look at Corthell’s OSDSM and use it as inspiration for charting your own growth.


VentureBeat » Big Data News | VentureBeat