Career Overview: What I Do As a Data Scientist
Sit in any Silicon Valley coffee shop for a while and you will no doubt hear someone talk about data analytics. What is it and who calculates these numbers? The task of the data scientist is to find patterns in large amounts of data and relate them to solutions in the real world.
Finding trends in big data is a fairly simple concept, but it is certainly difficult to put into practice. To learn more about what a data scientist does, we spoke with Dan Mullinger, who works in Think Big’s data science practice and uses his academic background to consult on business and engineering solutions.
Tell us a little about yourself and your experience.
I am a data scientist with a degree in Mathematical Sciences and Organizational Psychology; I also have a significant academic background in computer science and sociology. I have spent my career in statistics, analytics, and technology, but almost entirely within business groups, which has largely shaped my professional worldview. Today I am the director of data analysis at Think Big and have worked for the company for four years.
What prompted you to choose your career path?
For most of my life I have been interested in human organization and have loved making quantitative decisions. In college, I was the student who wanted to take the work of Axelrod and Hamilton from game theory courses and apply it to sociology and social psychology. Working to help businesses become data driven feels like a professional extension of that identity.
How did you get a job? What kind of education and experience did you need?
Early in my career I held a variety of roles in statistics, research, and technology. The work was similar to what I do now, namely analyzing real-world data using open source technologies, but it came before the term “data scientist” appeared. Then, in 2010, I got into big data, although it took me a year or so to really appreciate what Hadoop and similar tools can do for data science. After that, I connected with Rick Farnell, President of Think Big, who inspired me to create the Professional Services Data Scientist team because of the impact that data science has on the enterprise. While my statistics and technology background was critical, I think my social science education and my experience with business teams have mattered most in my role. They allow me to think through problems, take into account the human factors behind the math, and do it with an eye to practical implementation within the organization.
What do you do beyond what most people see? What do you actually spend most of your time on?
Most people have heard of “data processing” and now know that it is an important part of working with data. However, many people are unaware of how cross-functional data science is and how much time it takes to align business, analytics, and technology teams. Especially in an enterprise where teams have multiple competing programs, getting these groups to speak the same language and agree on priorities is an important part of the job.
What misconceptions do people often have about your job?
The biggest misconception about data science is that it’s all about “algorithms.” I constantly run into people and would-be data scientists who think our job is to choose between a neural network and a support vector machine. In truth, data science starts with transforming a business case into an analytic agenda. Developing hypotheses, understanding data, learning patterns, and assessing impacts take much longer than choosing algorithms.
What are your average work hours?
Data scientists are professionals and should expect a professional’s work week. Currently, that is 60 hours per week.
What personal tips and shortcuts have made your job easier?
Two tips have made life easier for our data science teams. The first is to keep an internal blog that quickly records daily results (even with visuals). These are not formal reports but informal documentation of the results of analyses over time, which supports a shared understanding of the data and the analysis among data scientists, project managers, and others. It also helps other data scientists who will study the same data in later months.
The second tip is to create a “runbook” after any modeling work. It documents which models were used, why they were developed, and how to replicate any analysis that was done. This ensures that our work is repeatable, even by yourself. When you’re busy, it’s easy to forget a test you ran three months ago.
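As an illustration only (this is not from the interview), a runbook entry can be as small as a structured record saved next to the analysis. The sketch below assumes a hypothetical `RunbookEntry` class with illustrative field names; the model, purpose, and steps in the usage example are made up.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class RunbookEntry:
    """Hypothetical record of one modeling effort: what, why, and how to rerun it."""
    model: str                 # which model was used
    purpose: str               # why it was developed
    data_sources: list         # where the input data came from
    steps: list = field(default_factory=list)  # ordered steps to replicate the analysis

    def to_json(self) -> str:
        # Serialize to JSON so the entry can be stored alongside the code.
        return json.dumps(asdict(self), indent=2)


# Made-up example values, for illustration only.
entry = RunbookEntry(
    model="logistic regression",
    purpose="estimate churn risk for the retention team",
    data_sources=["hive: customers.monthly_activity"],
    steps=["extract features with Hive", "fit model in R", "score holdout set"],
)
print(entry.to_json())
```

Keeping the entry as plain JSON means it can be read months later without rerunning any code.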
What do you do differently from your coworkers or peers in the same profession? What do they do instead?
I spend less time looking for new technologies than many of my peers. Instead, I focus on a core set that I know well. Today, tools like Hive over Hadoop, R, and Python take me very far. I’ve watched teams lose countless cycles trying to do something in a “new” way, spending more time making new technology work than innovating in approach. It’s a delicate balance, but I try to wait until I see intelligent uses of new tools, rather than waiting until I feel like my existing tools are failing.
What’s the worst part of the job and how do you deal with it?
For a data scientist, the most frustrating thing is creating models or doing work that never becomes part of the organization’s ongoing processes. While a certain part of data science is R&D, we want our work to be meaningful and used by the organization. A canonical example is the Netflix Prize, whose winning solution was never put into production because it was considered too expensive (although it was certainly valuable to practitioners). To deal with this, we have checklists that we fill out before starting a project. They ensure that we understand the business model, that there are key performance indicators (KPIs) tied to results, and that there is a path to operationalization so that our work is integrated and lasting.
What is the most enjoyable part of the job?
I love seeing customers become data driven: customers who now have running models, tools that support answering questions, and, critically, meaningful processes that move them from data to KPIs to decision making. This is the real purpose of data science, and it’s great to see it in action.
What advice can you give to people who need to use your services?
One of the things that is rarely talked about is how high the attrition rate is in data science. While this can be partly attributed to a competitive marketplace, I have long believed that most of it comes from companies hiring data scientists before they have a plan for using them, or expecting data scientists to solve business problems in a bubble. I often see data scientists in client organizations who sit on technology teams and create models that are never used meaningfully. And I’ve seen these groups dissolve for lack of a mission.
You don’t hire a plumber to build your house; you expect them to work with other professionals, or even to be guided by architects. Likewise, don’t hire a data scientist and expect them to build your business. Chances are, the position you posted requires statistics and technical skills. Have a goal and a plan for how to combine these skills with the drivers of your business, even before you start hiring.
How much money can you expect at your job?
It certainly varies, but it’s a well-paid role. Even in their first year, data scientists often make over $80,000. The salaries of experienced data scientists depend on their place in the organization. Those in technical positions on top teams can certainly more than double that. But the highest paid data scientists are those who have learned to work in business roles, much like analytics is usually structured in enterprises. They can earn up to $400,000.
How do you move up in your field?
There are several ways. Some data scientists are part of a technology organization (more often those working in big data) and have the same growth path as many engineers, moving up to team management. Others operate within the business (similar to how traditional analytics is structured in enterprises) and can grow into management, ownership of solutions and products, and so on. I don’t know if we’ve seen many paths from this new field to top analytics roles so far (at least in large companies), but I suspect they will come from the business side.
What do your customers underestimate / overestimate?
They underestimate the importance of well-defined and well-communicated KPIs. These metrics of throughput, rather than output, are what data scientists are most likely to be able to measure and report on when showing the impact of a model. In enterprises, the relationship between throughput and revenue is complex and difficult to assess. Well-defined KPIs serve as the focal point for communication between data science and the business, set clear goals and objectives, and are the foundation for data governance. They also help data scientists answer the frequently asked question, “When should I stop iterating on the model?” When the target goes beyond an accuracy percentage or error rate and is a KPI, success can be clearly defined, and so can the point at which someone is just spinning their wheels.
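One way to make that stopping question concrete is a small rule that compares each model iteration against an agreed KPI target. This is a minimal sketch, not a method described in the interview: the `KpiTarget` class, the `stopping_decision` function, the thresholds, and the assumption that a higher KPI value is better are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class KpiTarget:
    """Hypothetical KPI agreed with the business (higher is assumed to be better)."""
    name: str
    target: float      # KPI value that counts as success
    min_gain: float    # smallest per-iteration improvement still worth pursuing
    patience: int = 2  # consecutive small-gain iterations tolerated before stopping


def stopping_decision(history, kpi):
    """Decide whether to keep iterating, given the KPI value after each iteration."""
    if not history:
        return "continue"
    if history[-1] >= kpi.target:
        return "stop: target met"
    # Improvement between consecutive iterations.
    gains = [later - earlier for earlier, later in zip(history, history[1:])]
    recent = gains[-kpi.patience:]
    # Stalled: the last `patience` iterations each gained less than `min_gain`.
    if len(recent) == kpi.patience and all(g < kpi.min_gain for g in recent):
        return "stop: stalled"
    return "continue"


# Made-up numbers, for illustration only.
kpi = KpiTarget(name="offer conversion rate", target=0.15, min_gain=0.005)
print(stopping_decision([0.10, 0.12, 0.121, 0.1215], kpi))
```

With these made-up numbers the last two iterations each add less than `min_gain`, so the rule reports a stall, which is the "spinning their wheels" case.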
What advice would you give to those who want to enter your profession?
Spend as much time studying analytic communication as you spend studying models. The popularity of machine learning has led many data scientists to lean on the machine when analyzing data, leaving them unable to communicate the results. I’ve seen data scientists try to explain results by teaching C-level executives what a random forest is (with predictable results). Analytic communication is not about teaching your CEO to be a data scientist, but about interpreting models and relating them to outcomes that matter. Unfortunately, even related statistical methodologies, such as sensitivity and reliability analysis, have been forgotten as “algorithms” dominate many data science curricula.