24 January 2019

Building data science capabilities: the six functions of data teams

by Wojciech Gryc

The terms “data science” and “data engineering” are used frequently and loosely, often meaning different things to different people. Companies are building data teams with objectives that can be fairly diverse, often leading to confusion around hiring, goals, and what success actually looks like. For example, how would you define a “data scientist”? Some would argue for hiring only PhD-level researchers with backgrounds in AI, while others would say anyone with undergraduate coursework or coding school experience in statistics would suffice.

The real challenge is defining what a data team is expected to do and being clear about which needs it must meet. Before you build a data-oriented team, decide what problems the team is meant to solve, i.e., what function it will play. Depending on that function, its key performance indicators, skill requirements, and hiring strategy will differ significantly.

In this post, I lay out the six major functions of data teams. I’ll discuss KPIs, metrics, and other managerial topics in a future post.

In my experience, data teams typically play some combination of the following six functions:

Data Engineering. These teams focus on data warehousing, IT support, and ensuring that the information and data your company uses are stored effectively and scalably. This role is closer to IT and operations than to strategic business decisions, and hiring DevOps engineers or people with a background in IT architecture is critical.
Reporting. These teams generate reports on KPIs, make dashboards available, and administer business intelligence tools. The goal is to ensure the rest of the organization has access to the data and metrics it needs to make effective decisions. Note that, as with data engineering, the team is responsible for making reports available in a timely manner, but not for interpreting the results or making decisions based on them; that is up to the business units receiving the reports.
Insights and Analytics. Here the team moves beyond generating reports and keeping infrastructure running to developing insights. This is where your data science team takes on a creative, problem-solving role: in addition to generating reports, it is tasked with finding insights into specific problems or challenges that business units are facing. This requires hiring people who are more consultative in nature and who have the communication skills to promote their insights and help others understand their significance.
Business Process Automation. Insights and analytics generate opportunities to improve your company, but it is up to other business units to act on those insights. In the “process automation” function, the data team becomes an enabler and executor: it is now responsible for acting on its own insights by automating processes and measuring whether this improves the company. Automation can be rules-based and seemingly simple, but culturally it is a big step beyond the “insights and analytics” function, because your company has to learn to trust a process and remove the human who oversaw the decision. Many companies are not comfortable doing this. This function also means holding your data team accountable for performance and results, so you need to hire people who are comfortable being held to KPIs and deadlines.
Prediction and Optimization. If you’re automating a process, collecting data, and then using that data to make further, better, or new decisions, you are turning the entire automation function into a positive feedback loop. With prediction and optimization, you begin using research, machine learning, and statistics to find new ways of solving problems and to optimize decision-making. This is much harder than process automation because you might have to make unintuitive decisions and simply observe their effects. Furthermore, there may be times when the models, forecasts, or decisions are wrong, and your company needs to be comfortable with that possibility. Hiring here tends to be the most difficult: you need data professionals who can do research and build models, but also team members who can communicate, execute, and be held accountable in a business environment. This is an extremely difficult balance to strike.
Product Development and Research. Here, rather than automating or optimizing processes, the team tries to turn the entire algorithm into a self-serve product. This sort of product development is risky and expensive: it often requires hiring researchers who are flexible in their skill sets and world-class in their capabilities, and who may still fail to turn their work into a product. To make matters worse, product development is a creative, design-oriented pursuit that requires people who can listen and respond to customers, which can sit uneasily with software engineers or AI researchers who have deeply technical backgrounds. This function often requires several team members with different, complementary strengths, and the work itself can seem very unstructured, with unclear timelines due to the amount of experimentation involved.

The biggest mistake most companies make when building a data team is failing to define its focus. While prediction, optimization, and product research all sound very exciting, few companies have the data infrastructure and data sets needed to pursue them. Furthermore, many lack a culture that enables teams to trust algorithmic, automated decision-making.

Overinvesting in these functions before you are ready can be a costly mistake: you’ll hire people who are overqualified for their day-to-day work or, worse, build teams that generate research and insights the organization isn’t ready for. This leads to demotivated employees, wasted resources, and promising data initiatives that eventually fail.

With that in mind, a safe approach to building your capabilities is to start with the simpler functions and work down the list in order. For example, you can’t do reporting (#2) without some form of data engineering (#1), and a company that isn’t comfortable with process automation (#4) will struggle to delegate decision-making entirely to an algorithm (#5).

By taking this linear approach, you make the later functions more likely to be effective and successful, and your data science teams and initiatives more likely to succeed and sustain themselves.