According to Glassdoor, the US starting salary for a data scientist is around $120,000, and it lists more than 290 data scientist jobs on its Australian website, with salaries being considerably “demand driven”.
This means that opportunities are opening for data scientists, but true contenders are few and far between. Instead, companies find themselves interviewing candidates who are barely qualified and inadequately skilled.
Yet these data science pretenders can often present themselves as a solid choice. Companies desperate to get data science skills on board can easily fall for these imposters, which leads to wasted time and money.
With that in mind, iTWire spoke to Alec Gardner, director, Global Analytics Business Consulting at Teradata, about how to spot data science pretenders and how to hire the right one.
“Having the wrong person in the role means data analytics projects are unlikely to yield the anticipated business-transforming results. And, because data science is an emerging discipline, it’s easy for data science pretenders to convince the rest of the organisation that it’s not their fault the projects are stalling or not delivering accurate, useful insights,” Gardner said.
What does a data scientist do?
Experienced data scientists are a rare commodity and organisations should snap them up when they find them. An experienced data scientist or effective data science team turns data into actionable insights, which can mean the difference between overtaking competitors and lagging behind them.
Data scientists play a vital role in the success of an organisation because of their ability to identify business problems, identify the right data to help solve a problem and communicate the solution back to the executive team.
Data scientists can derive much deeper and more varied insights from their data. They can then recommend improvements in areas of the business ranging from supply chain and logistics to product development and customer acquisition.
What happens if an organisation hires a data science pretender?
Data science pretenders can be hard to spot, especially if there is no one already in the organisation with data analytics skills and knowledge. So, it’s important to be on the lookout for early warning signs that suggest the data scientist is not delivering on his or her promises.
If the data scientist has a poor mathematical background and no real passion for mathematics, they are unlikely to excel. Data scientists should also have strong computing skills and demonstrate strong communication skills early. This is a key sign that they can give context to information and work well with others.
A heavy focus on collecting the data rather than analysing it can indicate that the data scientist is only comfortable with one half of the equation. If the data scientist was hired to help optimise processes or drive decision-making, it will soon become apparent if they’re failing to deliver the information and insights that can help achieve those goals. Managers should set measurable objectives for the data scientist to hit in the first few months that will establish whether the person is capable or a pretender.
How can an organisation ensure data scientists are successful?
Even the most experienced and knowledgeable data scientists can potentially deliver less-than-impressive results if the right conditions for success aren’t in place.
One of the most important of these is mapping outcomes to core business objectives. Data can be exciting and fascinating, and it can take analysts on complex journeys, so it’s crucial to keep business outcomes in mind or the project may not deliver a return on investment. Managers should check in with their data scientist regularly to ensure they’re keeping those business goals top of mind throughout every facet of the project.
It’s also important to put strong communicators in both the data science team and the business unit. It can be worth designating a representative from both teams to make sure communication flows clearly between them.
There is no better way for a data scientist to understand the business than to be embedded in the team needing insights. Isolating data scientists from the business only serves to make it more difficult for them to understand the business objectives they’re supposed to be helping to achieve. Instead, they should sit with the team, absorbing knowledge and information that may not otherwise be apparent.
Employers should also look at building a team with varied skills. Diversity brings new ideas, approaches, and ways of looking at existing problems. It’s important to find data scientists with a range of different skills so each can bring a unique value to the team.
How can businesses avoid hiring the wrong person in a data scientist role?
There are few, if any, industry-wide qualifications or certifications that employers can check to see whether a data scientist is proficient, so it comes down to the interview process. There are several questions the interviewer can ask; depending on the answers, they’ll know if they’re talking to a true data scientist.
For example, the interviewer can ask the candidate to nominate their preferred machine learning algorithm. If the candidate doesn’t have a strong preference and can’t explain why they prefer one over the other, then they’re probably not suited for the role. Interviewers can also ask how the candidate would prove that changes to an algorithm have made an improvement. The candidate should talk about making sure the before and after results of the changes are repeatable, and were tested in a controlled environment with the same data and equipment on all occasions.
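As a rough illustration of what a good answer to the "prove the improvement" question might look like in practice, the sketch below scores a "before" and an "after" model on identical, seeded data splits so that the comparison can be repeated exactly. Everything in it is an assumption made for illustration: the synthetic data, the two toy models (`fit_baseline`, `fit_candidate`) and the choice of mean absolute error as the metric are hypothetical stand-ins, not something described by Gardner or Teradata.

```python
import random
import statistics


def make_data(seed, n=200):
    """Synthetic (x, y) pairs; a stand-in for a real, frozen evaluation set."""
    rng = random.Random(seed)  # fixed seed, so the same data is produced every run
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 3.0 * x + 2.0 + rng.gauss(0, 1.0)) for x in xs]


def fit_baseline(train):
    """Hypothetical 'before' model: always predicts the mean training target."""
    mean_y = statistics.fmean(y for _, y in train)
    return lambda x: mean_y


def fit_candidate(train):
    """Hypothetical 'after' model: least-squares straight line through the training set."""
    mean_x = statistics.fmean(x for x, _ in train)
    mean_y = statistics.fmean(y for _, y in train)
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_abs_error(model, test):
    """Average absolute prediction error on the held-out pairs."""
    return statistics.fmean(abs(model(x) - y) for x, y in test)


def compare(seeds=(0, 1, 2, 3, 4)):
    """Score both models on the same train/test split, one split per seed."""
    results = []
    for seed in seeds:
        data = make_data(seed)
        train, test = data[:150], data[150:]
        before = mean_abs_error(fit_baseline(train), test)
        after = mean_abs_error(fit_candidate(train), test)
        results.append((seed, before, after))
    return results


if __name__ == "__main__":
    for seed, before, after in compare():
        print(f"seed={seed} before={before:.3f} after={after:.3f} gain={before - after:.3f}")
```

Because both models see exactly the same data for each seed, any difference in the score comes from the change itself rather than from a lucky split, and running `compare()` a second time returns the same numbers. A candidate might reasonably go further, for example by adding a statistical test across the seeds.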
A popular question for interviewers is to get the candidate to give an example approach for root cause analysis. Their response should go much deeper than a simple definition. It should demonstrate the candidate’s previous experience in deconstructing code during troubleshooting and that they were able to solve the issue after pinpointing the cause.
Getting specific, it can be worthwhile asking the candidate to explain when they would use different tools, such as Spark versus MapReduce. There are numerous answers here. For example, Spark’s in-memory processing using resilient distributed datasets (RDDs) is faster than MapReduce on small datasets because MapReduce has a higher input/output (I/O) overhead. However, flexibility is the key attribute to look for. MapReduce may get the same answer as Spark but more slowly, so knowing when to use which approach and why is crucial.
These technically specific questions are just a start, but they can quickly separate data science pretenders from those with genuine knowledge and experience.
Companies that hire the right data scientists can see their business operations and profits improve as a direct result. Hiring a data science pretender can do the opposite, so it’s important to get it right.