Data are becoming the new raw material of business
The Economist

What Kind of Data Scientist Do You Need?

An article by Data Incubator founder Michael Li was featured in Harvard Business Review today. The original post appears on the Harvard Business Review website.


If you’re looking to hire a data scientist to join your company, you’re not alone. At The Data Incubator, we work with hundreds of companies that are looking to find data scientists from our Fellowship Program. In our experience, candidates usually come from one of two disciplines: computation or statistics.

Candidates with a strong science or math background usually have had rigorous statistical training in distinguishing between signal and noise and can tell when they are “overfitting” a complex model. Those with a computer science background frequently have the software engineering chops to handle large amounts of data by taking advantage of parallel and distributed computing. While all data scientists need to be functional in both, we’ve found that people coming from each of these backgrounds have quite different strengths and weaknesses. So which type of background should you look for when hiring? That will depend on your business — and whether you’re hiring for a digital or non-digital department. 
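To make “overfitting” concrete, here is a minimal sketch in Python. It is illustrative only: the data is synthetic, and the function names (`fit_line`, `fit_memorizer`, `mse`) are invented for this example. A simple straight-line model is compared with a model that memorizes its training data; the memorizer looks perfect on the data it has seen and noticeably worse on data it has not.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_memorizer(xs, ys):
    """Overly flexible model: return the y of the nearest training point."""
    points = list(zip(xs, ys))
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic data (an assumption for illustration): a linear signal plus noise.
rng = random.Random(0)
xs = [i / 400 for i in range(400)]
ys = [2 * x + rng.gauss(0, 1) for x in xs]

# Alternate points go to the training set and the holdout set.
train_x, train_y = xs[0::2], ys[0::2]
test_x, test_y = xs[1::2], ys[1::2]

models = {
    "line": fit_line(train_x, train_y),
    "memorizer": fit_memorizer(train_x, train_y),
}
results = {
    name: (mse(m, train_x, train_y), mse(m, test_x, test_y))
    for name, m in models.items()
}
for name, (train_err, test_err) in results.items():
    print(f"{name}: train MSE = {train_err:.2f}, holdout MSE = {test_err:.2f}")
```

The large gap between training error and holdout error is the warning sign that statistically trained candidates learn to look for.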

Digital departments. Think about the departments in the digital economy that have a regular profusion of data, generated from mobile, tablet, laptop, or desktop sources. Mobile apps, e-commerce, wearables, and digital advertising are just a few of the businesses that fall into this category. When data is plentiful, analytics often benefits from the unreasonable effectiveness of data: the idea that as we are able to learn from more data, we are able to achieve increasingly accurate models. Doing so certainly requires a deep knowledge of statistics. But a strong computational background is needed even more. These companies often benefit from having data scientists with software engineering backgrounds who can quickly build the systems that learn new trends in real time.

Notice that this isn’t a company-wide designation but a department-based one. So-called “Old Economy” companies often have digital departments that manage websites and mobile apps, even if their core business is not digital. For example, we work with a large consumer financial institution that hires our Fellows as data scientists for their digital advertising department. Regardless of the industry, companies often have digital departments that are data rich.

Non-digital departments. The consumer financial institution we work with also hires our Fellows for their credit department that assesses underwriting risks. But its ideal candidate profile is very different. Here, the data comes in more slowly and is more expensive to collect: loan defaults result in direct loss to investor capital, and borrowers who are initially reliable could become insolvent months or years after the initial credit decision.

Because of the delayed feedback, one has to be very careful to vet models up front. In this case, pure software engineering will not be nearly as useful for a company’s survival. Instead, a strong statistics background can ensure the credit models withstand rigorous statistical scrutiny and do not overfit the data. For many non-digital businesses, regulators also need to be satisfied, adding an extra layer of scrutiny and statistical rigor. For example, in credit scoring, lenders are not allowed to discriminate on the basis of various protected classes (e.g. race or sex). Through the Disparate Impact Doctrine, or “Effects Test,” lenders must be careful not just to avoid explicit consideration of a protected class in their model, but also to avoid using factors that might implicitly be proxies. For example, while not overtly racist, factors like zip code and school district correlate highly with race. Penalties and the negative press from a racist or sexist model would be disastrous for the company.
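One rough way to see whether a factor could act as a proxy is to ask how well it predicts the protected class on its own. The sketch below, in Python, is a simplified illustration and not a legal or regulatory test: the records are made up, and the helper names (`majority_accuracy`, `proxy_accuracy`) are hypothetical. It compares the accuracy of guessing a person's group from zip code against the accuracy of always guessing the most common group.

```python
from collections import Counter, defaultdict

def majority_accuracy(labels):
    """Accuracy of always guessing the most common group."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

def proxy_accuracy(records):
    """Accuracy of guessing the most common group within each feature value."""
    by_value = defaultdict(list)
    for feature_value, group in records:
        by_value[feature_value].append(group)
    correct = sum(max(Counter(groups).values()) for groups in by_value.values())
    return correct / len(records)

# Made-up (zip code, group) records, for illustration only.
records = (
    [("10001", "A")] * 8 + [("10001", "B")] * 2
    + [("10002", "A")] * 2 + [("10002", "B")] * 8
    + [("10003", "A")] * 5 + [("10003", "B")] * 5
)

baseline = majority_accuracy([group for _, group in records])
with_zip = proxy_accuracy(records)
print(f"baseline = {baseline:.2f}, using zip code = {with_zip:.2f}")
# prints "baseline = 0.50, using zip code = 0.70"
```

In this toy data, zip code lifts the guess from 50% to 70% accuracy, which suggests it carries information about group membership and deserves closer review before it goes into a credit model.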

Even though both the digital and credit groups belong to the same company, their data science needs vary greatly. Departments where data can be expensive to collect (either in up-front costs or potential losses), where data arrives slowly, or where there is intense regulatory scrutiny will benefit from having data scientists with a stronger statistical background. Those departments where data is cheap and plentiful, where data arrives quickly, or that aren’t bound by a lot of regulation should be on the lookout for candidates with stronger software engineering skills.

What is a hiring manager to do? As a first step, hiring managers should be on the lookout for these skills on resumes. But keep in mind that resume screening is far from perfect. For example, someone with a PhD in physics could easily have a strong statistical or computational background. At The Data Incubator, we go one step further and assess candidates’ skills through technical challenges that force them to show us, not just tell us, where their strengths lie. And companies that hire from our program are implementing their own technical tests as well.

When training or hiring data scientists, it’s important to keep in mind the broader business context in which data scientists need to operate (for example, will your data scientist be producing analytics for machines or humans?). Savvy managers don’t just hire data scientists who graduated from brand-name universities or have work experience at top companies; instead, they look at each potential hire’s skills and experiences to make sure the candidate’s skills and the company’s needs align.

Editor’s Note: The Data Incubator is a data science education company. We offer a free eight-week Fellowship helping candidates with PhDs and master’s degrees enter data science careers. Companies can hire talented data scientists or enroll employees in our data science corporate training.
