My predictions for 2021 – Data and analytics

The year 2020 had a tremendous impact on our lives and drove many changes. Since it was a year of radical changes (which we may or may not have been prepared for, but had to accept), these will certainly influence what 2021 brings us.

I have made a short, curated list of predictions for where data and analytics might head in 2021. For better clarity, I have grouped them into the relevant areas, mostly covering:
– Data Engineering
– Data Analytics
– Machine Learning
– Cloud Technology
– Languages and Roles
– Data Governance

Data Engineering will continue to grow and will see an additional boom in 2021. Data consolidation will expand this role, and the success of any ML project will depend on it even more heavily. A new wave of ETL tools will emerge, making data transition, transformation and availability easier, faster and more reliable. Depending on the infrastructure, these might become even bigger players for data pipelining, data tool chains and ETL: dbt, Panoply, Airflow, Matillion, Dataform and Alteryx. All are vendor-agnostic; some are also great for connecting different tools, platforms and operating systems, and some are also great tools for data analytics. Exclusivity will be bought by developing fast drivers, APIs and connections between different data silos.

Following the expansion of data engineering teams, tasks and operations, people will become more mindful of data strategy, a term that will be used more and more. It broadly describes a strong data management vision: prioritising data and aligning data analytics activities with key organisational priorities, with goals such as concepts and standards, collaboration, reuse, improved accuracy, access and sharing in mind. This will be driven, especially in Europe, across many organisations by data growth and by the need to align data teams with organisational goals.

Data Analytics was reshaped to some extent in 2020 by the changing workplace, customer experience and the faster digitalisation of daily life. Graph analytics will gain further traction because of the pandemic, cybersecurity and the need for tracking activities. Real-time dashboards and data visualisation will play a further role in feeding consumers correct and unbiased information, and storytelling will gain further popularity because of the changes in everyone's daily life. All of these will contribute to understanding the basics of what is going on, making basic business decisions and understanding the underlying reasons why changes have happened. Many aspects of data analytics will play a key role in understanding the dramatic changes and impact of the pandemic and related events. We can therefore also expect more logs to be generated and kept for longer periods of time, opening up many new opportunities.

Machine Learning (AI) will continue to rise in mid-size to large organisations and will continue to decline in small ones. Data scientists will continue to hunger for meaningful training datasets. They will feed these to their ML algorithms to understand predictions and changes over time, and push the results to cloud-based services or SaaS applications. Having more compute power will also put more pressure on data scientists to capture and ingest every single change. Encapsulated environments will further drive expansion in data science. Platforms such as Databricks will grow in popularity and usability and will help the DataOps ecosystem in large enterprises, making data more actionable for data science.

CI/CD and MLOps will continue to bloom and should gain even more traction in 2021. The year 2020 was the explosion year, offering many tools to data scientists. After this explosion of startups and offerings, there might be some consolidation, and only a few frontrunner vendors will remain. More focus will be put on developing solutions that require more and more effort due to rapid data changes, which push the build/deploy cycle of prediction models to a higher frequency. This will also make testing more difficult and version control more complex.

Natural language processing will see even further growth in 2021, mostly due to the digitalisation of many daily processes and the storing of many conversations. The health industry, like other industries, will also gain hugely from NLP.

Machine learning will get further commoditised, as many cloud services and cloud platforms are offering ML out of the box. On the other hand, the need for white-box models (as opposed to black-box ML algorithms) will be addressed in many of the platforms, with features ranging from interpretability and explainability to fairness and many more.

Cloud technologies will have several players that will advocate new standards. Snowflake will become one of the top 3 in the field of data warehousing, bringing new data warehouse concepts to the cloud. Decoupling compute from storage, making it available cross-platform and cross-language, and ingesting any type of data anywhere will bring the cloud closer and into everyday use at big organisations. The cloud will be used even more in 2021 because of changes in the workplace and in how we work, so additional services for making work easier, collaborating better and exchanging work will attract a lot of funding from investors, and many smaller start-ups will flourish.

Live recordings of work in bigger companies will drive appetite in this direction, with the help of cloud storage and services. Fog computing (as it relates to edge computing) will be the buzzword of the year among companies that deal with IoT and organisations adopting IoT.

Everything as Code will revolutionise the “as Code” concept in 2021, making it a bigger part of DevOps teams.

Languages and Roles will also change in 2021. New data roles such as Cloud Data Prep, Analytics Engineer, Data Trustee and Data Lake Engineer, and mash-up roles such as DataOps Engineer, will appear more and more in large organisations. Data teams will start aligning their methodologies with core software development for better data understanding and better data services for other data-orientated teams.

DataOps practices will become part of data teams and data engineering and, in 2022 or later, of almost every team, because fast-growing business needs will shape new business use cases and cloud technologies will push data literacy further. In 2021, having knowledge in Python, R, Scala, Julia, PowerShell, Spark, or Machine Learning will not be an advantage anymore, but more a prerequisite for any data-orientated position.

Many of the roles that emerged in 2019 and 2020 will stabilise further and continue to grow.

R and Python, alongside Scala and Julia, will remain and hold an even stronger position in data science. But the need for a general understanding of SQL, JavaScript, Bash/PowerShell, Java and C++ will become even greater.

Spark will be the key technology for 2021 when we talk about data science and infrastructure, alongside Presto and others. Investing in Spark in 2021 will pay off.

Data Governance will become a much bigger focus in 2021 than it has been in the past 10 years. With the surge of data teams, DataOps and data officers, catalogues, definitions and business rules will be the cornerstones of data trust. Having trustworthy data will speed up many of the later data ingestion, preparation and analysis processes, making data much more agile and operationalised to business needs. Governance will be a key link between smart data cleaning and better ETL/data chaining/data processing operations, supporting a stronger data management vision and building strong business cases on top.
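To make the idea of catalogues, definitions and business rules a bit more tangible, here is a minimal Python sketch. It is only an illustration under my own assumptions: the `orders` columns, the `Rule` class and the helper names `violations` and `split_trusted` are all hypothetical and do not come from any particular governance tool.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    """One catalogue entry: a column, its agreed definition and a check."""
    column: str
    definition: str
    check: Callable[[Any], bool]


# Hypothetical catalogue for an imaginary "orders" dataset.
CATALOGUE: List[Rule] = [
    Rule("order_id", "Positive integer identifying an order",
         lambda v: isinstance(v, int) and v > 0),
    Rule("country", "Two-letter upper-case country code",
         lambda v: isinstance(v, str) and len(v) == 2 and v.isupper()),
    Rule("amount", "Non-negative order value",
         lambda v: isinstance(v, (int, float)) and v >= 0),
]


def violations(row: Dict[str, Any]) -> List[str]:
    """Return the catalogue columns that are missing or break their rule."""
    return [rule.column for rule in CATALOGUE
            if rule.column not in row or not rule.check(row[rule.column])]


def split_trusted(rows: List[Dict[str, Any]]) -> Tuple[list, list]:
    """Separate rows that pass every rule from rows that break at least one."""
    trusted, rejected = [], []
    for row in rows:
        broken = violations(row)
        if broken:
            rejected.append((row, broken))
        else:
            trusted.append(row)
    return trusted, rejected


if __name__ == "__main__":
    sample = [
        {"order_id": 1, "country": "SI", "amount": 10.5},
        {"order_id": -5, "country": "si", "amount": 3},
    ]
    ok, bad = split_trusted(sample)
    print(len(ok), "trusted rows;", len(bad), "rejected rows")
    for row, broken in bad:
        print("rejected", row, "because of", broken)
```

Keeping the definition next to the check is a deliberate choice in this sketch: the same entry documents what a column means and decides whether a row can be trusted before it reaches later ingestion or analysis steps.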

Feel free to comment, post your views, agree, disagree, and debate. 🙂 I know we are bad at making such predictions, but it is always nice to share a vision and to have a counter-argument as an incentive for further thinking.

As always, stay healthy and happy coding!

Posted in thoughts, Uncategorized
5 comments on “My predictions for 2021 – Data and analytics”
  1. […] by data_admin [This article was first published on R – TomazTsql, and kindly contributed to R-bloggers]. (You can report issue about the content on this page […]


  2. Opac says:

    “Having knowledge in Python, R, Scala, Julia, PowerShell, Spark, or Machine Learning will not be an advantage anymore, but more a prerequisite for any data-orientated position.” Are we back to unicorning? At our organization we are having trouble holding on to even the most basic data talent. I predict that data science will remain a seller’s market in 2021.


    • tomaztsql says:

      Bigger organisations will continue to flourish in the realm of data science; the question is about small to mid-sized ones. As for unicorn positions: the speed of changing technologies, job postings and the mesh of knowledge (e.g. starting to work with cloud technologies) will set the bar high, as we have seen over the past 15 years.
      And I am not saying this is correct, but the trend is clearly going in this direction.


