Faux pas of data science

Data scientist has been the “sexiest” job of the past few years, mostly thanks to the buzzword bubble created around it. With the emergence of so-called Big Data came the need for a new term, much like the coining of the phrase business intelligence (BI). Some of you might still remember that it used to be called a decision support system (DSS). Regardless of the name, it evolved slowly alongside computer science and the entrance of computers into daily life.

I am fine with the DSS or BI naming; it still captures the gist of how and when acquiring raw data and transforming it into meaningful, useful information can help support a business.

I am also fine with the slow evolution from decision support to research to data mining to machine learning to data science. For me, it is still just crunching the numbers, knowing mathematics and statistics, and all the “non-fancy” stuff: cleaning, normalizing and de-duplicating data, exploring and exploring some more, peer reviews, and diving into the data again until finally reaching the “fancy” part of drawing conclusions and going to the business people to help them with their decisions.

What I am not fine with is the following:

  1. Data science combines all the standard practices and knowledge a statistician must know!
  2. Data science is sexy when it comes to knowing and understanding the algorithms for multivariate statistics, for making predictions and for finding patterns in the data. This is sexy, but to get to this point, one must be a mathematician/statistician with many years of experience. The rest is just crap! Assuring data quality (no business wants to hear about it and nobody wants to do it; but in reality, if your data is of poor quality, don’t expect good-quality results), sitting countless hours with one or two variables to understand their behavior, correlation and causality, diving into the literature to find a smoothing algorithm that gives a better result, etc. Well, this is not really crap, but it is usually what “buzzword” people don’t like to mention!
  3. With Big Data come big, big, big problems. Eventual consistency is probably the biggest lie ever (its abuse is similar to the abuse of statistical significance and p-values). Having inconsistent data represents a big challenge. Big Data made a big promise that a lot of data scientists couldn’t deliver (not because of a lack of knowledge, but usually because of a lack of time or money). Big Data never cared to look into the relational model. It was never meant for businesses to adopt it in order to extract relevant information. But again, this was not the fault of data scientists, but of slowly adapting businesses. Stories about the 4 Vs (volume, velocity, variety, value) can be misleading, mainly because the technology behind the 4 Vs is usually a separate story from real research and data mining (unless you are dealing with stream analysis or pushing new models into your business daily; and even then, week-old data will be sufficient for proving a point).
  4. Everyone wants to be a data scientist. Yes, and I want a pony. No, no. I want a rainbow unicorn. Being a data scientist means dedication: reading piles of books full of formulas (usually hard to understand, but they actually make sense!), sitting with random data sets, switching between various mathematical/statistical/database/scripting programs and languages in order to, well, just prepare the data.
  5. All the new technologies are boosting the egos of non-data-scientists with the false vision that a simple prediction of your company’s sales can be done with a couple of clicks. I can’t argue with that. My only question is: would the result of this five-minute drag-and-drop prediction be of any relevance? Or correct?
  6. Everyone likes data scientists. But nobody likes statisticians. Or mathematicians. The former are usually abusive toward data and lie about the results, and the latter are philosophers with countless formulas proving the existence of life at the fifteenth decimal place. But the reality is, data scientist = statistician + mathematician. So get over it! I still vividly remember how, 20+ years ago, “data science” was neglected and its reputation was… well, it wasn’t.
  7. R and Python are the next best thing I have to learn. Well, don’t, if you don’t intend to use them. Go and learn something more useful. Spanish, for example. R has been around in the community for more than 20 years; it wasn’t invented just recently. Neither was Python. And we have been using both to support business decisions. If you would like to learn R, ask yourself: 1) Do I know any statistics? and 2) Can I explain the difference between Naive Bayes and the Pearson correlation coefficient? If you answer both negatively, I suggest you start learning Spanish.
  8. Programming is in many aspects very close to the theory of statistics. Sampling, for example, is one of those areas where good programming knowledge will boost your abilities in data sampling and in different approaches to probability theory.
  9. Salaries are relative. Data scientists can earn very good salaries, especially those who are able to combine a) knowledge of statistics/mathematics with b) computer literacy (programming, data manipulation) and c) a very good understanding of business processes. A lot of the knowledge and understanding comes from experience and repetitive work, the rest from determination and intelligence.
  10. It is hard to be a data scientist in a medium-sized to large company! But much easier in a small one or as a freelancer.
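
Point 8 mentions sampling. As a minimal sketch (assuming plain Python with only the standard library; the function names and the toy region data are hypothetical and not from the original post), here is the difference between a simple random sample and a stratified sample, where every group keeps its share of rows:

```python
import random
from collections import defaultdict

def simple_random_sample(rows, k, seed=None):
    """Draw k rows uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(rows, k)

def stratified_sample(rows, key, fraction, seed=None):
    """Draw about the same fraction (0 < fraction <= 1) from every stratum key(row)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample = []
    for group in strata.values():
        # keep at least one row per stratum so small groups are not lost
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample

# hypothetical data: 90 rows from "north", 10 rows from "south"
rows = [("north", i) for i in range(90)] + [("south", i) for i in range(10)]
strat = stratified_sample(rows, key=lambda r: r[0], fraction=0.1, seed=1)
srs = simple_random_sample(rows, 10, seed=1)
```

With a fraction of 0.1, the stratified sample always contains 9 “north” rows and 1 “south” row, while a simple random sample of 10 rows may, by chance, contain no “south” rows at all.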

So the next time you use the term data science or data scientist, or label yourself as one, keep in mind a couple of the points above. And unless you have done some kind of research for years and still get a kick out of it, please don’t call it a sexy job. You might offend someone.

2 comments on “Faux pas of data science”
  1. stefflocke says:

    Good post!

    So my personal journey was a lot of maths, mainly pure, then getting my degree in Philosophy. I always worked in data- and analysis-heavy roles, taking on more responsibility and scope of work. A few years back I had to brush up on my stats when I built some models for predicting default etc. This was when I learnt R. I studied more around R & stats and continued using R and building models.

    I’m now a “Lead Data Scientist” and I like to poke fun at myself by wearing jeans, a geeky t-shirt and a blazer so I “look the part”, but I can’t yet bring myself to go Mac. My stats are stronger than those of people in BI and I know enough to hire the next people, who will be stronger on the modelling side, but my main focus is developing the initial infrastructure to support data science within the company, building the first models, and building a team of data scientists.

    I would suggest that people who don’t want to be data scientists but want to do BI better should learn R (or python) – it is a fantastic data analysis tool providing analysts with the means to achieve more in their day jobs through scripted and reproducible data manipulation and data visualisations.

    I don’t think everyone should be a data scientist, and I’m still very tongue-in-cheek about my own status as a Data Scientist, but I do think more BI people should be learning R (and if they learn some stats along the way then woohoo) as it can really help them do their jobs better.


  2. tomaztsql says:

    Thank you, Steff, for your insights into your experience and for drawing a line between a data scientist and a BI person. The two can be very similar in many ways, but the main difference is in the ways and methods they use to draw conclusions.


Tomaz doing BI and DEV with SQL Server and R, Python, Power BI, Azure and beyond