R or Python? Python or R? The ongoing debate.

On every SQL community event, where there could be a cluster of sessions dedicated to BI or analytics, I would have people asking me, “which one would you recommend?” or “which one I  prefer?”

2018-01-28 15_15_13-Edit Post ‹ TomazTsql — WordPress.com

So, questions about recommendation and preferences are in my opinion the hardest one. And not that I would know my preferences but because you are inadvertently creating someone’s taste or preferences by imposing yours. And expressing taste through someone else taste is even harder.

My initial reaction is a counter-question, why are you asking this? Simply because my curiosity goes beyond the question of preferring A over B, respectively.  And most of the time, the answer I get is, “because everyone is asking this question” or “because someone said this and the other said that”. In none of the cases, and I mean literally in none, I got the response (the one I would love to get) back like “we are running this algorithm and there are issues…” or “this library suits us better…”. So the community is mainly focused on asking themselves which one is better, instead of asking, can R / Python do the job. And I can assure you, that both can do the job! Period.

Image I ask you, would you prefer Apple iPhone over Samsung Galaxy, respectively? Or if I would ask you, would you prefer BMW over Audi, respectively? In all the cases, both phones or both cars will get the job done. So will Python or R, R or Python. So instead of asking which one I prefer, ask your self, which one suits my environment better? If your background is more statistics and less programming, take R, if you are more into programming and less into statistics, take Python; in both cases you will have faster time to accomplish results with your preferred language. If you ask me, can I do gradient boosting or ANOVA or MDS in Python or in R, the answer will be yes, you can do both in any of the languages.

Important questions are therefore the one that will give you fast results, easier adaptation and adoption, will give a better fit into your environment and will have less impact on your daily tasks.

Some might say, R is a child’s play language, while Python is a real programming language. Or some might say, Python is so complex and you have to program everything, whereas in R, everything is ready. And so on and on. All these allegations have some truth, but to fully understand them, I guess one needs to understand the background of the people saying this.  Obviously, Python in comparison to R is more general purpose scripting and programming language, therefore the number of packages is 10x higher, when compared to R. And both come with variety of different packages, giving users a specific functions, classes and procedures to execute their results. R on the other hand has had it’s moment in past couple of years and the community grew rapidly, whereas Python community is in it’s steady phase.

When you are deciding which one to select, here are some questions to be answered:

  • how big my corporate environment and how many end users will I have
  • who is the end user and how will the end user handle the results
  • what is current general knowledge with the language
  • which statistical and predictive algorithms will the company be using
  • would there be a need to parallel and distributed on-prem computations
  • if needed, do we need to connect (or copy/paste) the code to the cloud
  • how fast can the company adopt the language and the amount of effort needed
  • which language would fit easier with existing BI stack and visualization tools
  • how is your data centralized and silosd and which data sources are you using
  • governance and providence issues
  • installation, distribution of the core engine and packages
  • selection and the costs of IDE and GUI
  • corporate support and SLA
  • possibility to connect to different data sources
  • released dates of the most useful packages
  • community support
  • third party tools and additional programs for easier usage of the language
  • total cost of using the language once completely in place
  • asses the risk of using an GNU/open source software

After answering these questions, I implore you to do the stress and load tests against your datasets and databases to see, what perform better.

All in all, both languages, when doing statistical and predictive analysis, also have couple of annoyances that should also be addressed:

  • memory limitations (unless spilling to disk)
  • language specifics (e.g.: R is case-sensitive, Python is indent-sensitive and both will annoy you)
  • parallel and distributed computations (CPU utilization, multi-threading)
  • multi-OS running environment
  • cost of GUI/IDE
  • engine and package dependencies and versioning
  • and others

So next time, when you ask yourself or overhear the conversation in the community, which one is better (bigger, faster, stable,…), start asking the questions on your needs and effort to adopt it. Otherwise, I always add, learn both. It does not hurt to learn and use both (for at least the statistical and predictive purposes).

All best!

Tagged with: , , ,
Posted in Uncategorized
9 comments on “R or Python? Python or R? The ongoing debate.
  1. […] leave a comment for the author, please follow the link and comment on their blog: R – TomazTsql. R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data […]

    Like

  2. […] article was first published on R – TomazTsql, and kindly contributed to […]

    Like

  3. Andrew says:

    Critiques on post after reading:
    * R is becoming two languages: original base R language and RStudio’s “Tidyverse” R language (based on RStudio team and Wickham and all them)
    * R is like a BMW, not an Audi
    * Author uses abbreviations without defining them
    * English is not the writer’s first language

    Like

  4. […] Tomaz Kastrun shares his thoughts on the topic of R versus Python: […]

    Like

  5. Gerald Britton says:

    Why is this even a question? Apples and oranges. R is a domain-specific (stats) language. Python is general purpose. Use either Python or R if you know it and it does what you need for analysis. Use Python (or Java or C# or your-favorite-language) for other stuff.

    Like

  6. […] run across a few pieces that ask the question about where to spend time. There’s a blog on R v Python (I’ve seen quite a few of these) and a thread on deciding if ML skills are something a […]

    Like

  7. davegugg says:

    Great article, thanks for the perspective.

    I would just add that comparing an Audi to a BMW is like comparing a fancy steakhouse to McDonalds.

    Like

  8. Noah Laycock says:

    As someone who has very little experience with both R and Python, I can say that R has been easier to so far. I have taken a semester-long class in Python and i am about a month down in an R class i am currently enrolled in. During my first month of my Python class, I was completely lost and had no ambition to further my knowledge in the language. R, however, has been fun and I’ve been enjoying the simplicity of RStudio. I agree that R has a bigger use with statistics and Python is used for more of a general use, and that is probably why I enjoyed R more, as I am a math major with a statistics minor. I believe the ease of RStudio is one of the major reasons why I enjoy R more than Python.

    Like

  9. […] run across a few pieces that ask the question about where to spend time. There’s a blog on R v Python (I’ve seen quite a few of these) and a thread on deciding if ML skills are something a […]

    Like

Leave a comment

Follow TomazTsql on WordPress.com
Programs I Use: SQL Search
Programs I Use: R Studio
Programs I Use: Plan Explorer
Rdeči Noski – Charity

Rdeči noski

100% of donations made here go to charity, no deductions, no fees. For CLOWNDOCTORS - encouraging more joy and happiness to children staying in hospitals (http://www.rednoses.eu/red-noses-organisations/slovenia/)

€2.00

Top SQL Server Bloggers 2018
TomazTsql

Tomaz doing BI and DEV with SQL Server and R, Python, Power BI, Azure and beyond

Discover WordPress

A daily selection of the best content published on WordPress, collected for you by humans who love to read.

Revolutions

Tomaz doing BI and DEV with SQL Server and R, Python, Power BI, Azure and beyond

tenbulls.co.uk

tenbulls.co.uk - attaining enlightenment with the Microsoft Data and Cloud Platforms with a sprinkling of Open Source and supporting technologies!

SQL DBA with A Beard

He's a SQL DBA and he has a beard

Reeves Smith's SQL & BI Blog

A blog about SQL Server and the Microsoft Business Intelligence stack with some random Non-Microsoft tools thrown in for good measure.

SQL Server

for Application Developers

Business Analytics 3.0

Data Driven Business Models

SQL Database Engine Blog

Tomaz doing BI and DEV with SQL Server and R, Python, Power BI, Azure and beyond

Search Msdn

Tomaz doing BI and DEV with SQL Server and R, Python, Power BI, Azure and beyond

R-bloggers

Tomaz doing BI and DEV with SQL Server and R, Python, Power BI, Azure and beyond

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Data Until I Die!

Data for Life :)

Paul Turley's SQL Server BI Blog

sharing my experiences with the Microsoft data platform, SQL Server BI, Data Modeling, SSAS Design, Power Pivot, Power BI, SSRS Advanced Design, Power BI, Dashboards & Visualization since 2009

Grant Fritchey

Intimidating Databases and Code

Madhivanan's SQL blog

A modern business theme

Alessandro Alpi's Blog

DevOps could be the disease you die with, but don’t die of.

Paul te Braak

Business Intelligence Blog

Sql Insane Asylum (A Blog by Pat Wright)

Information about SQL (PostgreSQL & SQL Server) from the Asylum.

Gareth's Blog

A blog about Life, SQL & Everything ...