Hiring a data scientist can be a tricky process. The actual definition of “Data Scientist” is vague, the day-to-day job of someone with “Data Scientist” in their job title varies dramatically between organizations, and people come to the field from a wide variety of backgrounds. Examining the past of a data scientist candidate is a science in itself, one worthy of a blog post of its own. Today we're going to stick to building an interview that examines the present.
Most data scientist job interviews fall short of exploring the full range of topics necessary to determine a proper fit. There are three key areas to examine: technical skillset, pace of work, and communication style. Based on what I hear from candidates and peers in the data science world, most interviews are almost entirely technical, and even then, they typically only cover a portion of the relevant skills. But there's a lot more to a data scientist's job than technical work. Their ability to work at the pace of your business and communicate effectively with their team are just as important as their technical chops.
Whether this is your first or your 10th data science hire, it's worth revisiting your interview process and identifying what your process filters for. Even if you think you're already doing a good job, look at your team today across all the axes below (not just technical depth). Structure the interview process to find more people who fit the attributes you believe are necessary, while bringing in complementary experience and skills.
For the most part, it's hard to suggest exact questions to ask in the interview, simply because there are few one-size-fits-all questions that would be valuable. Instead, I'll explain how to explore all three areas during the screening process, and provide some suggestions for how to come up with these questions. We'll go in order of the areas in which companies most frequently err.
Interviewing Data Scientists for Pace of Work
Pace of work is the area where the most (and biggest) mistakes are made, especially at smaller companies. The spectrum of pace in data science varies widely, from research-heavy positions that focus solely on publishing papers, to data scientists embedded with marketing teams and tasked with quick testing and agile decision-making. Most organizations simply neglect to consider how this will factor into a role.
To succeed in this area, consider your general business environment and your needs, and then figure out how you want a data scientist to work within those needs. Certain types of work take longer, like building systems. Other things are faster, like solving discrete problems. If you're not comfortable waiting for someone to build out a system, hire an expert problem solver who thrives on finding the right answers, not getting every detail perfect.
One of the biggest mistakes companies make in this area is passing on candidates whose experience is very different from the type of role for which they're hiring. Many of the most promising candidates for roles at either end of the spectrum come from backgrounds that are very different from those roles. Data scientists coming from academia, having spent months or years on a single project, may be great at juggling many projects at once and spending just hours or days on each. A business analyst, with data skills honed shipping daily or weekly reports, could be a great candidate for a data science role as part of a product team, with a monthly or quarterly project pace.
Sometimes tertiary experience can speak to a person's ability to work at a specific pace better than the primary evidence on their resume. Ask about projects or hobbies that may shine a light on their capabilities. If you can't base this decision on evidence of relevant experience, perhaps you can hedge your bet by looking for evidence of adaptability. Ask for examples of times when they've had to make a significant change to their style of work and adjust to a new environment.
Plenty of candidates specifically seek out roles that are very different from their past experience, hoping for a change of pace. At this point, the decision comes down to your own risk tolerance. Consider the positive characteristics that made you consider the candidate in the first place. Do the pros of hiring someone with a strong diversity of experience, technical strength, and communication ability outweigh the cons of uncertainty around their pace of work? The answer will depend on how much of that uncertainty your organization can absorb.
Interviewing Data Scientists for Communication Style
Communication style is another area that frequently gets neglected in data scientist job interviews. Due to the wide range of technical skillsets and backgrounds of the candidate pool, most job interviews are structured to filter candidates based on these characteristics alone. Interviewers expect that they'll be able to pick up on a candidate's communication prowess through osmosis instead of asking specific questions designed to understand communication style. This is rarely adequate. Diligence in background and technical prowess is all well and good, but not when it comes at the expense of communication; data science doesn't take place in a vacuum.
Thankfully, communication is one of the easiest things to explore in a job interview. It doesn't take much to develop a set of questions to test whether a candidate can communicate complexity to both data-savvy and non-data-savvy audiences. Set aside at least a short part of the interview for the following:
“Describe to me something you've built/created/operated/solved and are proud of. Include as much detail as necessary to explain it.”
Follow the answer with the next question:
“Now describe that to me as if I were five years old.” You can also replace this with, “Now describe that to me as if I didn't know anything about the subject.”
This will show you whether they can adequately explain concepts and systems at the granularity necessary for others to collaborate on them, as well as at a high enough level for stakeholders without expertise in that domain to understand them.
Interviewing Data Scientists for Technical Depth
Every company has a different stack and a different dataset. This means the technical portion of a data scientist job interview is the part that will vary most between companies, and it is the hardest part for which to create universally relevant questions. That said, it's also likely to be the easiest portion in which to judge fit objectively: they can solve problems, or they can't.
You've likely heard of the Data Science Venn Diagram, as illustrated by Drew Conway:
(Courtesy of drewconway.com)
The Data Science Venn Diagram puts forth three areas of technical proficiency for data science: math & statistics knowledge, substantive expertise, and hacking skills. Each area will be necessary at some level for every data scientist. I've seen plenty of companies over-index on one area at the expense of the other two, and it never ends well.
For instance, a data scientist focused mostly on research will likely need to be much stronger in math and statistics than in hacking skills. But those hacking skills still matter, because they play a critical role in finding opportunistic ways to optimize the systems used in that research. A data scientist tasked with implementing new product features that use data science as part of their function will need to be much deeper in hacking skills, so that they can manage the implementation of those new features. But they'll still need a deep understanding of statistics, so that the math behind those features produces the best possible outcomes. And a data scientist working in a field with niche data types and esoteric processes, like healthcare, will require much more substantive expertise. But without math and programming depth, they won't be able to put that expertise to use.
There's another side to this coin as well; you shouldn't interview a candidate for technical skills they will never use in the role. Someone with deep knowledge of machine learning may be attractive on paper. But if that's not part of the job, then prioritizing that candidate may result in a pass on someone who would have been great in the role. Or worse, you may hire someone whose expectations don't align with reality, and have to go back to square one when they leave after just a few months.
Structure your technical interview to explore all three areas with a problem-solving question for each, at the level of technical depth necessary to be successful in the role. Due to the changing nature of technology and the speed at which languages come and go, these questions should be built such that they can be approached without specific stack experience. Technical questions should be challenging, and having questions with multiple stages is often helpful to see just how deep someone can get into a problem.
These ideas should provide a framework for interviewing data scientist candidates. The nature of this field means there's no one-size-fits-all interview structure. But you can use the three focus areas above to build your own, with the confidence that you're covering the bases you need to ensure a good fit.