The data science job market is hot and an incredible number of companies, large and small, are advertising a desperate need for talent.
Before jumping on the first 6-figure offer you get, it would be wise to ask the penetrating questions below to make sure that the seemingly golden opportunity in front of you isn't actually pyrite.
1) Do they have data?
You might get a good laugh at this one and probably assume that this company interviewing you must have data as they are interviewing you (a data scientist). However, you know what they say about ass-u-ming, right?
If the company tells you that the data is coming (similar to "the check is in the mail"), start asking a lot more questions. Ask if the needed data sharing agreements have been signed and even ask to see them. If not, ask what the backup plan is if (or when) the data does not arrive. Trust me, it always takes longer than everyone thinks.
To be an entrepreneur means to be an optimist at some level because otherwise no one would do something with such a low probability of success. Thus, it is pretty easy for an entrepreneur to assume that getting data will not be that hard. It will only be after months of stalled negotiations and several failures that they will give up on getting the data or, in startup parlance, pivot. In the meantime, you had best figure out some other ways of being useful and creating value for your new organization.
2) Who will you report to and what is her or his background?
So, really what you are asking is: does the person who will claim me as a minion actually have experience with data and do they understand the amount of time that wrangling data can take?
If you are reporting to a management/executive type, this question is all important and your very survival likely depends on the answer.
First, go read the Gervais Principle at ribbonfarm. From my experience, the ideas aren't too far off the mark.
Second, many data-related tasks are conceptually trivial. However, these tasks can take an amount of time seemingly inversely proportional to their simplicity. Or, even worse, something that is conceptually very simple may be mathematically or statistically very challenging or require many difficult and time-consuming steps. Something like counting the number of tweets for or against a particular topic is trivial for people but much less so for algorithms.
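To make the tweet example concrete, here is a minimal sketch of the "five minute" version of that task. Everything in it is made up for illustration: the tweets, the word lists, and the function names are my own assumptions, not real data or a recommended method.

```python
# Hypothetical tweets about some policy; invented for illustration only.
TWEETS = [
    "I love the new policy",
    "I do not love the new policy",                    # negation
    "Oh great, another policy. Just what we needed.",  # sarcasm
    "The policy is bad news for its critics",          # "bad" refers to the critics
]

# Tiny, assumed word lists. A real lexicon would be far larger.
POSITIVE = {"love", "great", "good"}
NEGATIVE = {"hate", "bad", "awful"}


def naive_stance(text):
    """Label a tweet 'for', 'against', or 'unclear' by counting keywords."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    pos = len(words & POSITIVE)
    neg = len(words & NEGATIVE)
    if pos > neg:
        return "for"
    if neg > pos:
        return "against"
    return "unclear"


def count_stances(tweets):
    """Tally the naive labels across a list of tweets."""
    counts = {"for": 0, "against": 0, "unclear": 0}
    for tweet in tweets:
        counts[naive_stance(tweet)] += 1
    return counts


print(count_stances(TWEETS))
```

A person reading these four tweets would score them two for and two against. The keyword counter reports three for and one against, because it stumbles on the negation, the sarcasm, and the tweet where "bad" describes the critics rather than the policy. Closing that gap is where the time goes.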
Further, as everyone knows, data wrangling on any project can consume 80% or more of the total project time and, unless that manager has worked with data, she or he may not understand this reality. The rule of thumb to never forget is that if someone does not understand something, that person will almost always underappreciate it. I swear there must be a class in American MBA programs that teaches that if you don't understand something, it must be simple and only take five minutes.
If you are reporting to a CTO-type, the situation may seem better but it actually might be worse. Software engineering and development do not equal data science. Technical experience, most of the time, does not equal data experience. Having gone through a few semesters of calculus does not a statistics background make. Hopefully, I have made my point. There is a reason we call the fields software **engineering** (nice and predictable) and data **science** (conducting experiments to test hypotheses). However, many technically-oriented people may believe they know more than they actually do.
Third, your communications strategy will change radically depending on your boss's background. Do they want the sordid details of how you worked through the data or do they just want the bottom-line impact?
The short version for #2 is that time expectations are important to flesh out up front and are highly dependent on your boss's background.
3) How will your progress and/or performance be measured?
Knowing how to succeed in your new workplace is pretty important and the expectations surrounding data science are stratospheric at the moment. Keep your eyes peeled for a good quick win that lets you demonstrate your value (and this is a question that I would directly ask).
The giant red flag here is if you will be included in an "agile" software process with data work shoehorned into short-term sprints along with the engineering or development team. Data science is science and many tasks will have you dealing with the dreaded unknown unknowns. In other words, you are exploring terra incognita, a process that is unpredictable at best. Managing data scientists is very different from managing software engineers.
4) How many other data scientists/practitioners will you be working with, and how many are in the company overall?
What you are trying to understand here is how data-driven (versus ego-driven) the company that you are thinking of joining is.
If the company has existed for more than a few years and has few data science or analyst types, it is probably ego driven. Put another way, decisions are made by the HiPPOs (the HIghest Paid Person's Opinions). If your data analyses are going to be used for internal decision making, this possibly puts you, the new hire, directly against the HiPPOs. Guess who will win that fight? If you are going into this position, make sure you will be arming the HiPPO with knowledge as opposed to fighting directly against other HiPPOs.
5) Has anyone ever run analyses on the company's data?
This one is critical if you will be doing any type of retrospective analyses based on previously collected data. If you simply ask the company if they have ever looked at their data, the answer is often yes regardless of whether or not they have, as most companies don't want to admit that they haven't. Instead, ask what types of analyses the company has done on its data, whether the examination covered all of the company's data, and who did the work (being careful to inquire about this person's background and credentials).
The reason this line of questioning is so important is that the first time you plumb the depths of a company's database, you are likely to dig up some skeletons. And by likely I really mean certainly. In fact, going through historically collected data is much like an archeological excavation. As you go further back into the database, you go through deeper layers of the history of the organization and will learn much. You might find out when they changed contractors or when they decided to stop collecting a particular field that you just happen to need. You might see when the servers went down for a day or when a particularly well-hidden bug prevented database writes for a few weeks. The important point here is that you might uncover issues that some people still present in the company would prefer not to be unearthed. My simple advice: tread lightly.
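For the curious, the first pass of such an excavation can be as simple as the sketch below. It is only an illustration under my own assumptions: the rows, the `day` and `referrer` field names, and both helper functions are hypothetical, and a real database will need more care (time zones, expected quiet days, and so on).

```python
from datetime import date, timedelta


def find_write_gaps(days, min_gap_days=1):
    """Return (first_missing_day, last_missing_day) for each run of days with no rows."""
    seen = sorted(set(days))
    gaps = []
    for prev, cur in zip(seen, seen[1:]):
        missing = (cur - prev).days - 1
        if missing >= min_gap_days:
            gaps.append((prev + timedelta(days=1), cur - timedelta(days=1)))
    return gaps


def last_populated(rows, field):
    """Return the latest day on which `field` was non-null, or None if it never was."""
    populated = [row["day"] for row in rows if row.get(field) is not None]
    return max(populated) if populated else None


# Hypothetical rows standing in for a historical table.
rows = [
    {"day": date(2020, 1, 1), "referrer": "a"},
    {"day": date(2020, 1, 2), "referrer": "b"},
    {"day": date(2020, 1, 3), "referrer": None},
    {"day": date(2020, 1, 10), "referrer": None},
]

gaps = find_write_gaps([row["day"] for row in rows])
print(gaps)                              # days with no writes at all
print(last_populated(rows, "referrer"))  # when the field stopped being collected
```

On this toy table the first check flags the stretch from January 4 through January 9 with no rows at all, and the second shows that `referrer` was last filled in on January 2. Neither result tells you why it happened, which is exactly the conversation you will need to have carefully.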