
So you want to build a data science team?

January 26, 2015   Big Data

Internet companies looking to start a data science team often feel overwhelmed by the challenges and specific characteristics of hiring, building, and growing such a team.

They can be confused by the jargon, hype, and buzzwords surrounding certain technologies, algorithms, and skills. Moreover, starting a team of this kind is not the same as starting an average software development team: profiles are more specialized, terminology is more exotic, and there is little consensus in the market about best practices and the state of the art.

A major international retailer recently approached me for advice on building an in-house data science team from scratch for its e-commerce division, and I would like to share the elements that I believe every company should clarify before embarking on this endeavour.

In this post, I cover three key topics: accountability, resources, and team composition.

Accountability

From the beginning, everyone should know exactly where the team sits in the organizational chart and who its main stakeholders are.

There are multiple approaches. Some organizations place the data science team under the CTO, others under the CFO or even the CMO. Others prefer a federated model, with specialists distributed across departments and supervised by a project manager, while still others take the R&D route, where the team has no specific agenda or stakeholder and is free to decide what to work on.

The right choice depends on the company's organization, culture, and resources, as well as on the team's mission. Failing to decide this from the beginning can lead to confusion in the team's daily activities.

Because data science is a fashionable topic, more than one manager may want the team under her command. If these competing expectations are not addressed right from the start, they can cause friction and confusion that seriously affect the performance of the newly formed team.




Resources

Anyone familiar with the current job market knows that technical talent in this area does not come cheap, yet it is surprising how often budgets are not planned accordingly.

For an Internet company with 300 or more employees trying to create a centralized team with a specific mission (e.g., recommendation engines or customer reactivation), a good starting point is a team of 5 to 8 people: one technical project manager, 1 to 2 hardcore data scientists responsible for modeling, and 3 to 5 data engineers who deploy the production code.
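The head-count arithmetic in the paragraph above can be checked with a short Python sketch. Only the ranges (one project manager, 1 to 2 data scientists, 3 to 5 data engineers) come from the text; the role keys and the ROLE_RANGES dictionary are illustrative assumptions.

```python
from itertools import product

# (min, max) head count per role, taken from the recommended ranges above.
# The key names are hypothetical labels chosen for this sketch.
ROLE_RANGES = {
    "technical_project_manager": (1, 1),
    "data_scientist": (1, 2),
    "data_engineer": (3, 5),
}


def team_compositions(role_ranges):
    """Yield every head-count combination allowed by the per-role ranges."""
    names = list(role_ranges)
    ranges = [range(lo, hi + 1) for lo, hi in (role_ranges[n] for n in names)]
    for counts in product(*ranges):
        yield dict(zip(names, counts))


if __name__ == "__main__":
    sizes = sorted({sum(c.values()) for c in team_compositions(ROLE_RANGES)})
    print(f"team sizes: {sizes[0]} to {sizes[-1]}")  # → team sizes: 5 to 8
    for comp in team_compositions(ROLE_RANGES):
        print(sum(comp.values()), comp)
```

The six combinations cover every total from 5 to 8, and in each of them data engineers outnumber data scientists, which matches the emphasis on getting models into production.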

Over time, teams can grow, and similar teams with different missions can emerge. Building a quality team therefore represents a significant commitment, and this should be clear to every stakeholder.

Team Composition

Once the available resources and the expected team size are determined, the next big question is whom to hire. For a regular HR department, this quickly becomes an impossible task: mailboxes are soon flooded with résumés full of exotic qualifications and never-before-heard terms.

At this stage it is also very easy to be swayed by the media or by technology vendors. Hence, the company should define which hard skills and technologies are relevant, whether education weighs more than experience, whether big names on a résumé carry extra weight, and whether it is really necessary to hire very senior engineers or long-experienced postdocs. This is easier said than done, as many questions remain unanswered while the team is still in its seed stage.
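One way to force these decisions to be made explicitly is to write them down as a weighted scoring rubric before screening starts. The sketch below is a hypothetical illustration, not a method from the article: the criteria names, the weights, and the 0 to 5 rating scale are placeholder assumptions that a hiring team would replace with its own choices.

```python
from dataclasses import dataclass

# Hypothetical weights: placeholders a hiring team would agree on up front,
# not recommendations. They encode answers such as "does education weigh
# more than experience?" and "do big names carry extra weight?".
WEIGHTS = {
    "hard_skills": 0.4,
    "experience": 0.3,
    "education": 0.2,
    "brand_names": 0.1,
}


@dataclass
class Candidate:
    name: str
    scores: dict  # criterion -> interviewer rating on an assumed 0-5 scale


def weighted_score(candidate, weights=WEIGHTS):
    """Combine per-criterion ratings into a single number using the agreed weights."""
    missing = set(weights) - set(candidate.scores)
    if missing:
        raise ValueError(f"{candidate.name} is missing ratings for {sorted(missing)}")
    return sum(weights[c] * candidate.scores[c] for c in weights)


if __name__ == "__main__":
    pool = [
        Candidate("A", {"hard_skills": 4, "experience": 2, "education": 5, "brand_names": 5}),
        Candidate("B", {"hard_skills": 5, "experience": 4, "education": 3, "brand_names": 1}),
    ]
    for cand in sorted(pool, key=weighted_score, reverse=True):
        print(cand.name, round(weighted_score(cand), 2))
```

With these placeholder weights, candidate B (about 3.9) ranks above candidate A (about 3.7) even though A has the stronger education and bigger names on the résumé. Changing the weights changes the ranking, which is exactly the trade-off the team has to settle before the résumés arrive.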

My advice is to start with solid basics rather than look for the über-exotic. In the team's first year or two, the objective is to lay a foundation and justify the team's existence through quick wins, harvesting the low-hanging fruit.

Taking the upper end of the example above, 8 individuals, and considering that the company may not be able to compete with the Googles and Facebooks of this world in prestige, remuneration, and perks, a good initial composition could look as follows:

Technical Project Manager: This person has 3 to 5 years of experience managing similar teams that deal with quantitative subjects. Preferably, she has a solid technical background and, although she is not expected to code, she is capable of doing so. Beyond the skills expected of any project manager, she understands the algorithms and techniques the team uses; ideally, she can also do code reviews.

Data Scientist: Someone with a solid quantitative background. Ideally, she holds a Ph.D. in Physics, Mathematics, Computer Science, Biology, or a related discipline. She should be judged by the quality of her research, where she has published, and what she has contributed.

It is entirely possible to be an expert in machine learning and be really bad at software development. Hence, it is very important not to assume anything and to double-check her coding skills. Unless you want to build a more academic R&D team, someone who cannot code will not be very helpful, especially in the team's early days. It is also important to verify how hands-on the candidate is, as candidates from academia sometimes have the wrong expectations of what industry needs from them.

Data Engineer: This person does not need to be very academic. She can be a solid software developer with an interest in quantitative topics, but she must have a very solid understanding of algorithms, data structures, and software engineering in general. Double-check the algorithms part (especially computational complexity): many engineers have a poor understanding of the subject, yet it is essential for every robust data team. Overall, her code must be excellent. Try to find individuals who actively contribute to open source projects. Ideally, this person uses the same technology stack as your data scientists (e.g., Python or Scala).
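To make "double-check computational complexity" concrete, here is a hypothetical interview-style task that is not from the article: detect whether a list of customer IDs contains duplicates. Both functions below are correct, but the first compares every pair (O(n²)), while the second makes one pass with a hash set (O(n) expected time, assuming hashable IDs). At the data volumes a data team handles, that difference decides whether a job finishes in seconds or in hours.

```python
import timeit


def has_duplicates_quadratic(ids):
    """O(n^2): compares every pair of elements."""
    n = len(ids)
    for i in range(n):
        for j in range(i + 1, n):
            if ids[i] == ids[j]:
                return True
    return False


def has_duplicates_linear(ids):
    """O(n) expected time: a single pass with a hash set (elements must be hashable)."""
    seen = set()
    for x in ids:
        if x in seen:
            return True
        seen.add(x)
    return False


if __name__ == "__main__":
    data = list(range(2_000))  # no duplicates, so both functions scan everything
    for fn in (has_duplicates_quadratic, has_duplicates_linear):
        seconds = timeit.timeit(lambda: fn(data), number=1)
        print(f"{fn.__name__}: {seconds:.4f} s")
```

Asking a candidate to explain why the second version scales and when it would not (for example, with unhashable records) is a quick way to probe the understanding described above.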

The seniority of each position depends on the company and its budget. However, I do not recommend hiring very senior individuals at the beginning. They often have very specific expectations, whereas in the team's early days its scope and nature can change dramatically.

In addition, data teams have to build their own platforms at first, as the data they need may not exist yet or may not be in the format they need. This means doing unglamorous tasks and getting their hands dirty.

Therefore, it is preferable to have ambitious and adaptable individuals, even if they might not be very experienced.


Rodrigo Rivera is a tech entrepreneur and founder of Emplido, an analytics recruiting company acquired by Experteer Inc. In Asia and Europe, he has built and led data science teams for Internet companies in the areas of product management, advertising technology, customer relationship management, and sales.


VentureBeat » Big Data News | VentureBeat
