01/13/2022 11:00

Why do we think data science is an "art"?

Dr. Jabe Wilson says, “Data scientist is like an artist working on a canvas…”, and we strongly agree with him!

A data scientist works on a problem continuously, progressively developing alternative solutions that lead to a deeper understanding of it. This is much like an artist working on a painting over several weeks and watching the canvas change toward his or her vision. In this blog post, I will describe the process behind this art: how data scientists solve problems and where creativity comes into play.

We can summarize the steps data scientists take to solve a problem as follows:

  • Problem definition
  • Data creation and feature engineering
  • Algorithmic design
  • Adjusting the whole system

Problem definition is essential, and it is the step where the math is most heavily involved. In general, you build your hypothesis here and then formulate it mathematically. This step shapes your choice of algorithm and every design decision that follows. To define the problem, you should answer questions such as “How can I formulate the business goals as measurable and optimizable performance metrics?”, “What are my possible inputs and desired outputs?”, “Is the problem supervised or unsupervised?”, “Is it a classification or a regression problem?” and “Is there a spatial, temporal or categorical relation in the data?” Your answers to these questions define your problem and shape your overall design. Unfortunately, answering all of them concretely is not always easy; however, this is also the most fun part.
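
As one illustration of turning a business goal into a measurable and optimizable metric, consider a hypothetical retailer for whom running out of stock costs roughly nine times more than overstocking. One common way to encode that asymmetry is the quantile (pinball) loss. The sketch below assumes that metric; the helper name `pinball_loss`, the demand figures and q = 0.9 are illustrative choices, not something prescribed by this post.

```python
# Illustrative sketch: `pinball_loss` is a hypothetical helper, not a library API.


def pinball_loss(actual, forecast, q):
    """Mean quantile (pinball) loss at quantile level q.

    Under-forecasts (actual > forecast) are weighted by q and
    over-forecasts by (1 - q), so q > 0.5 penalizes shortfalls more.
    """
    if not 0 < q < 1:
        raise ValueError("q must lie strictly between 0 and 1")
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")
    total = 0.0
    for y, y_hat in zip(actual, forecast):
        err = y - y_hat  # positive means we forecast too low
        total += max(q * err, (q - 1) * err)
    return total / len(actual)


# Hypothetical weekly demand and two candidate forecasts that each miss by 10 units.
demand = [100, 120, 90]
too_low = [90, 110, 80]
too_high = [110, 130, 100]
print(f"{pinball_loss(demand, too_low, q=0.9):.2f}")   # 9.00
print(f"{pinball_loss(demand, too_high, q=0.9):.2f}")  # 1.00
```

Because the two penalty slopes are q and 1 - q, a model trained to minimize this loss is pushed toward the q-th quantile of demand rather than its mean, which matches the "better too much than too little" goal of the hypothetical business.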

Data creation and feature engineering are where you decide on your features and explore your data. Although deep learning promises to handle this end to end, that is not always achievable with limited data. This stage includes many preprocessing steps: removing noise and outliers, reducing dimensionality, scaling, and so on. The observations you make here also shape the design and performance of your algorithm. For example, you might find that your target values are strongly correlated with one of the features. Still, this correlation could mean nothing, because “correlation does not imply causation”: both variables may simply be driven by a third factor. If your model wrongly relies on such a relation, it may perform poorly.
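
The preprocessing steps and the correlation pitfall above can be sketched in a few lines of plain Python. This is an illustrative sketch, not a recommended pipeline: Tukey's IQR fences are only one of many outlier rules, z-scoring is only one of several scaling choices, and the sensor readings and the temperature, ice cream and sunburn series are invented for the example.

```python
import random
import statistics

# Illustrative sketch: helper names and all data below are hypothetical.


def remove_outliers_iqr(values, k=1.5):
    """Keep only points inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# 1) Preprocessing: drop an obvious spike, then scale the remaining readings.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 55.0, 10.2]
clean = remove_outliers_iqr(readings)  # the 55.0 spike is dropped
scaled = standardize(clean)
print(clean)

# 2) Correlation without causation: both series depend on a hidden third variable.
rng = random.Random(42)
temperature = [rng.uniform(10, 35) for _ in range(200)]
ice_cream_sales = [3.0 * t + rng.gauss(0, 5) for t in temperature]
sunburn_cases = [0.5 * t + rng.gauss(0, 1) for t in temperature]
r = pearson(ice_cream_sales, sunburn_cases)
print(f"r = {r:.2f}")  # strongly positive, yet neither variable causes the other
```

A model that predicts sunburn cases from ice cream sales would look accurate only as long as the hidden temperature pattern holds; a sales spike caused by something else, such as a promotion, would break it.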

The algorithmic choice depends heavily on the previous steps, and it is where ingenuity comes in. Treating an algorithm as a black box that always tells the truth is a misconception. Methods that reveal, to some degree, what happens inside such a black box (for example, inside deep learning models) are still in the early stages of research, which is why “explainability” matters. Furthermore, complex algorithms often fail in real-life applications because of overfitting: most of the time you won’t have enough data for modelling, or the system you are modelling changes constantly. It is possible to obtain high scores by selecting algorithms through trial and error. However, this process is costly or even infeasible, and it can waste valuable time. At every design decision, you must ask “why” to stay aware of what the process and the algorithms are actually doing.
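
To make the overfitting point concrete, the sketch below compares a simple least-squares line with a deliberately over-flexible model: a degree-9 polynomial that passes exactly through all 10 training points (Lagrange interpolation, used here only as a stand-in for any high-capacity model). The helper names, the data-generating function, the noise level and the random seed are all hypothetical.

```python
import random

# Illustrative sketch: helper names, data and seed are hypothetical.


def fit_line(xs, ys):
    """Ordinary least-squares line y ≈ a*x + b: a simple, low-capacity model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b


def fit_interpolant(xs, ys):
    """Lagrange polynomial through every training point: a model that memorizes the data."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of `model` on the points (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def true_process(x):
    return 2.0 * x + 1.0  # hypothetical underlying relation


rng = random.Random(7)
train_x = [i / 9 for i in range(10)]  # only 10 training points
train_y = [true_process(x) + rng.gauss(0, 0.3) for x in train_x]
test_x = [rng.uniform(0, 1) for _ in range(200)]
test_y = [true_process(x) + rng.gauss(0, 0.3) for x in test_x]

line = fit_line(train_x, train_y)
interpolant = fit_interpolant(train_x, train_y)  # degree 9 for 10 points
for name, model in [("line", line), ("degree-9 interpolant", interpolant)]:
    print(f"{name:>20}: train MSE {mse(model, train_x, train_y):.3f}, "
          f"test MSE {mse(model, test_x, test_y):.3f}")
```

The interpolant always reaches zero training error, which is exactly why a training score alone says little. On fresh test points it typically does worse than the straight line, because it has fitted the noise along with the signal.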

You also need extensive knowledge of your algorithms, including their parameters, strengths and weaknesses, because you must tune them to avoid overfitting or underfitting. Monitoring the algorithm's learning process and performance with the proper metrics keeps the system under your control. A data scientist should therefore also have a good command of monitoring tools and metrics.
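
One common way to monitor a learning process is to track training and validation loss every epoch and stop once the validation loss stops improving (early stopping). The following is a minimal, dependency-free sketch under that assumption; the `EarlyStopping` class, its `patience` and `min_delta` parameters, the learning rate and the synthetic linear data are all hypothetical choices for illustration, not a specific library's API.

```python
import random

# Illustrative sketch: class, parameters and data are hypothetical.


class EarlyStopping:
    """Signals a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # smallest decrease that counts as an improvement
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.stale_epochs = 0

    def update(self, epoch, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss, self.best_epoch, self.stale_epochs = val_loss, epoch, 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


def mse(w, b, xs, ys):
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_with_monitoring(train, val, lr=0.1, max_epochs=500, patience=5):
    """Gradient descent for y ≈ w*x + b, logging train and validation MSE every epoch."""
    (xs, ys), (vx, vy) = train, val
    w = b = 0.0
    best = (w, b)
    history = []  # (train_mse, val_mse) per epoch
    stopper = EarlyStopping(patience=patience, min_delta=1e-6)
    n = len(xs)
    for epoch in range(max_epochs):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        w -= lr * 2 * sum(r * x for r, x in zip(residuals, xs)) / n
        b -= lr * 2 * sum(residuals) / n
        history.append((mse(w, b, xs, ys), mse(w, b, vx, vy)))
        stop = stopper.update(epoch, history[-1][1])
        if stopper.best_epoch == epoch:
            best = (w, b)  # keep the parameters with the best validation score
        if stop:
            break
    return best, history, stopper


rng = random.Random(0)
xs = [rng.uniform(0, 1) for _ in range(40)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.1) for x in xs]
(w, b), history, stopper = fit_with_monitoring((xs[:30], ys[:30]), (xs[30:], ys[30:]))
print(f"ran {len(history)} epochs, best epoch {stopper.best_epoch}, "
      f"val MSE {stopper.best_loss:.4f}, w={w:.2f}, b={b:.2f}")
```

The recorded `history` is what you would plot or send to a monitoring dashboard: a validation curve that flattens or rises while the training curve keeps falling is the usual sign of overfitting, while two curves that both stay high point to underfitting.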

Lastly, following the latest developments in the literature is a must. Data science is a vast and dynamic field. Nearly every day, new algorithms, methods, tools and results are introduced in different domains, and they may significantly change your understanding, your approach, your solution and even the problem you are working on.

To sum up, data science is a field that requires expertise, sensitivity and artistic mastery at every stage. This is even more true for artificial intelligence systems used in real operations. At Predy, we aim to produce predictions for your future needs, a task normally carried out by expert staff or conventional software systems. To meet and exceed your performance expectations, we have little or no room for imperfection at any stage. That is why we believe that “data science is art”, and we define our values as “Meticulousness”, “Patience” and “Creativity”: the skills that a top data science artist should have.

We are meticulous; we love data and study every detail of it.

We are patient; we aim for the best and push to the limit to achieve it.

Our team is a collection of creative people; we think outside the box and consider every possible option.
