Why is there a career as a data scientist?

Tom Becker, Regional Vice President Central Europe at Alteryx, warns that technological developments could automate many of the data scientist's tasks in the future. In this interview with COMPUTERWOCHE, he explains what that means for data scientists and how the job profile will change.

Astronaut, firefighter, veterinarian - those were dream jobs of our childhood. Has the job of data scientist taken on this role?

Becker: From my point of view, it is high time to critically question the hype surrounding this profession. Taken literally, a data scientist is a scientist who works with data. The digital transformation is now being driven forward in companies. For this we need people who work with data at an operational level and produce analyses. However, that does not mean that every data worker needs to be trained as a data scientist.

Data science no longer lives up to the role of all-purpose miracle weapon that it may once have had. We see two main reasons for this. First, data projects require specialist knowledge from the business departments, so data teams need a strong vertical alignment and more domain expertise. Second, more and more data science tasks are being automated, offered through self-service apps, or made available just a click away in the cloud. This means that significantly more employees in the business departments have access to analyses.

What are the consequences of this and do you advise against studying data science?

Becker: I don't see it that critically. However, if more people are to work with data in the future, handling it must become simpler. We should therefore expand knowledge of data analysis in the business departments. Students already learn some of the topics taught in today's data science courses in other subjects such as computer science, mathematics or mechanical engineering.

And now comes the decisive innovation. Today we can provide employees with completely new tools to evaluate data faster and more easily. The cloud simplifies the way we use powerful analysis tools and the possibilities of artificial intelligence (AI), deep learning and machine learning (ML). There are also data management platforms that provide new data pipelines with a click of the mouse. Users from all departments access these self-service solutions and are able to make decisions more quickly.

So do we need an additional course on data science in every apprenticeship?

Becker: The role of the data scientist is changing and becoming broader. Statisticians with a mathematical and scientific background are becoming universal data specialists with programming knowledge. We need additional definitions in order to classify the more specialized fields of activity. Sure, the classic data scientist will continue to develop models that generate added value from data. However, companies need new data-savvy employees, such as data workers and data engineers. We should therefore make sure that the value of data is recognized during specialist training and that appropriate data analytics courses are integrated.

Being able to handle data is already fundamental for the economy and society. These skills should be taught in school. I have already taught lessons in elementary school on programming Lego Mindstorms, the robotics platform for the well-known plastic building blocks. That already helps to teach children how to handle data, computers and robots.

So we will see further specialization among data scientists?

Becker: In any case, we need new roles for emerging specialist areas relating to data management and data analysis. Technologies such as machine learning, deep learning and AI depend on fail-safe infrastructures, data pipelines that are always available and, most importantly, high data quality. This is where data engineers help develop the IT infrastructure. There is also a need for specialists such as machine learning engineers who, for example, set up IoT environments and ensure that self-learning systems are created. Then we have the broad field of data quality. In the future, we could see a data quality security officer who ensures that the data quality is correct, because incorrect input to an ML model also leads to incorrect analyses by an AI application. Today you can apply as a data scientist for all of these tasks. But do these experts also have the necessary domain knowledge?

How analytic process automation simplifies data analysis

So it is getting more complex again because a lot of specialists are needed?

Becker: Not necessarily. The work of data specialists will be further simplified by the automation of processes. This is supported by a new generation of solutions for analytic process automation, which make it easier to collaborate on data and analyses. According to an analysis by Forrester, citizen data scientists or data workers will be able to handle more tasks than highly qualified data specialists by 2021.

The following figures also show why automation is urgently needed. Many employees spend most of their time looking for data. This includes the highly paid data scientists. According to IDC, data analysts spend up to 70 percent of their time searching for data. Data workers waste up to 44 percent of their working time on unsuccessful research. In addition, data workers use between four and seven different software tools for their data-related tasks, which also means wasted time.

Are there already solutions for this automated data world?

Becker: Everyone in the IT industry is talking about digital transformation. However, it must first take hold in people's minds before it becomes a reality in the workplace. In my view, a new data culture is needed, and this is more important than hiring a group of highly paid academics. Last year there was a survey by NewVantage Partners. The IT consultancy found that 72 percent of companies have not defined a data culture and 69 percent do not see themselves as a data-driven organization. The analysts at IDC offer a revealing finding: a third of business decision-makers have considerable difficulty using data in a more targeted manner for business decisions.

For me, the question is how a data scientist, especially one working as a lone wolf, can make a lasting difference. So it is time to talk about the fundamentals of the company.

  1. Julia Ertl, Accenture
    “Data science projects used to start with proofs of concept, which were often isolated and very experimental analyses. However, a lot has happened since then in building IT infrastructure, and the much bigger challenge is actually leveraging the results. The crux of the matter now is to bring the IT infrastructure together with the organization, its processes and, above all, people. To do this, the right people have to be brought on board, and new knowledge and new roles have to be built up.”
  2. Dr. Kay Knoche, Pegasystems
    “In many cases the status quo is that companies are operating completely blind, which makes things harder than they already are. We always advise our customers to make a decision from the existing data so that at least one action is operationalized. The end results, the KPIs, can then be continuously measured against each other to determine which model performs best in the end.”
  3. Mehmet Yildizoglu, Data Reply
    “It's about how you can get as much as possible out of the respective use case with the various models and create added value. You can't say in advance which algorithm will deliver the best fit for the problem. You have to try it out, and if you want to put a solution into operation, it takes more than a pure data scientist. That is also the reason why the data scientist's profile is changing: away from the purely academic view and towards going live, paired with software engineering know-how.”
  4. Manuel Namyslo, SAP
    “There is still a big gap between the data scientist and IT: models that were developed locally are discarded just because nobody knows how to integrate them into the system landscape. There is great demand for a platform in which data pipelines can be set up, models can be put into production and workflows can be stored. Because at the end of the day, the knowledge I gain from the data has to be reflected in the company's business processes.”
  5. Walter Obermeier, UiPath
    “Face recognition in China is a good example of the fact that there are always two ways of looking at data protection. On the one hand, nobody wants to be recognized everywhere. On the other hand, people in Europe also want security. But the two do not go together. A machine learning tool only takes the data that is made available to it. The danger does not come from machine learning, but from the question of which data may be used, when, how and for what purpose.”
  6. Dr. Christian Schneider, wetter.com
    “No matter what you invent, no matter how good it may be - you can almost always abuse it for bad things. So that machine learning does not fall into disrepute, the framework conditions must be set in such a way that the algorithm is only used for the corresponding task.”

Do you think that a call for this kind of upheaval will be heard in the current situation?

Becker: Right now is the right time. Many people work from home and are forced to work purely digitally. Children are learning how to use e-learning platforms because schools are not yet fully open. A transformation is already taking place in people's minds. However, there is a risk that we will fall back into old patterns once the situation has normalized.

At the beginning of the Corona crisis, practically all everyday work processes were digitized ad hoc so that employees could work remotely. These processes generate new data that makes the performance of an organization transparent and reveals deficits. This data can now help optimize supply chains, which will be extremely important as the economy restarts.

In our projects, we campaign to ensure that employees are empowered to use this data. Self-service tools help them carry out analyses quickly and without IT experts, and take over tasks that were previously carried out only by data scientists. Anyone who has to go on short-time work due to the Corona crisis will receive a free online course on Udacity from us, which teaches the basics of data science in 150 hours.

So the bottom line is: the data scientist is dead, long live the data scientist?

Becker: The operational tasks related to data science have changed. Data Science 2.0 is on its way, for example in the form of the automated rollout of models for data analysis. Software manufacturers such as DataRobot use ML to develop automated analysis models that employees in the business departments can use even without a scientific background. However, under certain circumstances, employees with statistical know-how may still be required to interpret the data, which supports the thesis that above all we have to define new data science specialties.

Above all, we need to simplify and automate the data analysis tasks. With the new concept of Analytic Process Automation, companies are creating the organizational basis for this, as it unites people, processes and data. Only the combination of these three factors enables sustainable change in the workplace, as it democratizes the use of data, i.e. makes it possible for everyone. We call it a new data culture. In this way, every employee becomes a data worker who can take on tasks in the operational area that would otherwise end up on the table of a data scientist. (fm / pg)