Advent of 2020, Day 10 – Using Azure Databricks Notebooks with SQL for Data engineering tasks

Series of Azure Databricks posts:

Yesterday we were working our way to get data from Azure Storage to Azure Databricks using dbfs mount function and sorted credentials (secret, key).

Log into Azure Databricks portal and create a new Notebook (day 7 explains how to create Notebook). In this notebook, we will use SQL to explore data engineering tasks. I have given my a name Day10_SQL_EngineeringTasks and selecting default Language Python. Attach a cluster to Notebook.

And here is the Notebook, and how it looks:

  1. Exploring Databases with SHOW

SHOW is a useful clause to see that your database holds.

%sql
SHOW TABLES;
%sql 
SHOW TABLES FROM default;
%sql 
SHOW TABLES IN default LIKE 'day6*'

2, Creating database and getting information with DESCRIBE

Creating a database is simple, by defining the location and adding some information.

%sql 
CREATE DATABASE IF NOT EXISTS Day10 COMMENT 'This is a sample database for day10' LOCATION '/user';

Getting some additional information can be done with DESCRIBE clause.

%sql
DESCRIBE DATABASE EXTENDED Day10;

3. Creating tables and connecting it with CSV

For the underlying CSV we will create a table. We will be using CSV file from Day 6, and it should be still available on location dbfs:/FileStore/Day6_data_dbfs.csv. This dataset has three columns (Date, Temperature and City) and it should be good starting example.

%sql
USE Day10;

DROP TABLE IF EXISTS temperature;
CREATE TABLE temperature (date STRING, mean_daily_temp STRING, city STRING)

And we can check the content of the table and the database:

%sql
USE Day10;

SELECT * FROM temperature
%sql
SHOW TABLES IN Day10;

And now connect CSV with the table (or view):

%sql
USE Day10;

DROP VIEW IF EXISTS temp_view2;
CREATE TEMPORARY VIEW temp_view2
USING CSV
OPTIONS (path "/FileStore/Day6Data_dbfs.csv", header "true", mode "FAILFAST")

And check the content:

%sql
USE Day10;
SELECT * FROM temp_view2

IF you would want to change the data type of a particular column, you can also do it as:

%sql
USE Day10;

ALTER TABLE temperature CHANGE COLUMN mean_daily_temp INT

4. Creating a JOIN between two tables

Let’s create two sample tables :

%sql
USE Day10;

DROP TABLE IF EXISTS temp1;
DROP TABLE IF EXISTS temp2;


CREATE TABLE temp1 (id_t1 INT, name STRING, temperature INT);
CREATE TABLE temp2 (id_t2 INT, name STRING, temperature INT);

And add some insert statements:

%sql
USE Day10;

INSERT INTO temp1 VALUES    (2, 'Ljubljana', 1);
INSERT INTO temp1 VALUES    (3, 'Seattle', 5);
INSERT INTO temp2 VALUES    (1, 'Ljubljana', -3);
INSERT INTO temp2 VALUES    (2, 'Seattle`', 3);

And create an inner join

%sql
USE Day10;

SELECT 
t1.Name as City1
,t2.Name AS City2
,t1.temperature*t2.Temperature AS MultipliedTemperature

FROM temp1 AS t1
JOIN temp2 AS t2
ON t1.id_t1 = t2.id_t2
WHERE 
t1.name <> t2.name
LIMIT 1

If you follow the notebook, you will find some additional information, but all in all, the HIVE SQL is ANSI compliant and getting started, should be no problem. When using notebook, each cell must have a language defined at the beginning, unless it is a language of kernel. %sql for SQL language, %md for Markdown, %r for R language, %scala for Scala. Beware, these language pointers are case sensitive, so %sql will interpret as SQL script, where as %SQL will return an error.

Tomorrow we will check and explore how to use R to do data engineering, but mostly the data analysis tasks. So, stay tuned.

Complete set of code and Notebooks will be available at the Github repository.

Happy Coding and Stay Healthy!

Tagged with: , , , , , , , , ,
Posted in Azure Databricks, Uncategorized
23 comments on “Advent of 2020, Day 10 – Using Azure Databricks Notebooks with SQL for Data engineering tasks
  1. […] by data_admin [This article was first published on R – TomazTsql, and kindly contributed to R-bloggers]. (You can report issue about the content on this page […]

    Like

  2. […] Dec 10: Using Azure Databricks Notebooks with SQL for Data engineering tasks […]

    Like

  3. […] Dec 10: Using Azure Databricks Notebooks with SQL for Data engineering tasks […]

    Like

  4. […] Dec 10: Using Azure Databricks Notebooks with SQL for Data engineering tasks […]

    Like

  5. […] Dec 10: Using Azure Databricks Notebooks with SQL for Data engineering tasks […]

    Like

  6. […] Dec 10: Using Azure Databricks Notebooks with SQL for Data engineering tasks […]

    Like

  7. […] Dec 10: Using Azure Databricks Notebooks with SQL for Data engineering tasks […]

    Like

  8. […] Dec 10: Using Azure Databricks Notebooks with SQL for Data engineering tasks […]

    Like

  9. […] Dec 10: Using Azure Databricks Notebooks with SQL for Data engineering tasks […]

    Like

  10. […] Dec 10: Using Azure Databricks Notebooks with SQL for Data engineering tasks […]

    Like

  11. […] Dec 10: Using Azure Databricks Notebooks with SQL for Data engineering tasks […]

    Like

  12. […] Dec 10: Using Azure Databricks Notebooks with SQL for Data engineering tasks […]

    Like

  13. […] Dec 10: Using Azure Databricks Notebooks with SQL for Data engineering tasks […]

    Like

  14. […] Dec 10: Using Azure Databricks Notebooks with SQL for Data engineering tasks […]

    Like

  15. […] Dec 10: Using Azure Databricks Notebooks with SQL for Data engineering tasks […]

    Like

  16. […] Dec 10: Using Azure Databricks Notebooks with SQL for Data engineering tasks […]

    Like

  17. […] Dec 10: Using Azure Databricks Notebooks with SQL for Data engineering tasks […]

    Like

  18. […] Dec 10: Using Azure Databricks Notebooks with SQL for Data engineering tasks […]

    Like

  19. […] Dec 10: Using Azure Databricks Notebooks with SQL for Data engineering tasks […]

    Like

  20. […] Dec 10: Using Azure Databricks Notebooks with SQL for Data engineering tasks […]

    Like

  21. […] Dec 10: Using Azure Databricks Notebooks with SQL for Data engineering tasks […]

    Like

  22. […] Dec 10: Using Azure Databricks Notebooks with SQL for Data engineering tasks […]

    Like

Leave a comment

Follow TomazTsql on WordPress.com
Programs I Use: SQL Search
Programs I Use: R Studio
Programs I Use: Plan Explorer
Rdeči Noski – Charity

Rdeči noski

100% of donations made here go to charity, no deductions, no fees. For CLOWNDOCTORS - encouraging more joy and happiness to children staying in hospitals (http://www.rednoses.eu/red-noses-organisations/slovenia/)

€2.00

Top SQL Server Bloggers 2018
TomazTsql

Tomaz doing BI and DEV with SQL Server and R, Python, Power BI, Azure and beyond

Discover WordPress

A daily selection of the best content published on WordPress, collected for you by humans who love to read.

Revolutions

Tomaz doing BI and DEV with SQL Server and R, Python, Power BI, Azure and beyond

tenbulls.co.uk

tenbulls.co.uk - attaining enlightenment with the Microsoft Data and Cloud Platforms with a sprinkling of Open Source and supporting technologies!

SQL DBA with A Beard

He's a SQL DBA and he has a beard

Reeves Smith's SQL & BI Blog

A blog about SQL Server and the Microsoft Business Intelligence stack with some random Non-Microsoft tools thrown in for good measure.

SQL Server

for Application Developers

Business Analytics 3.0

Data Driven Business Models

SQL Database Engine Blog

Tomaz doing BI and DEV with SQL Server and R, Python, Power BI, Azure and beyond

Search Msdn

Tomaz doing BI and DEV with SQL Server and R, Python, Power BI, Azure and beyond

R-bloggers

Tomaz doing BI and DEV with SQL Server and R, Python, Power BI, Azure and beyond

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Data Until I Die!

Data for Life :)

Paul Turley's SQL Server BI Blog

sharing my experiences with the Microsoft data platform, SQL Server BI, Data Modeling, SSAS Design, Power Pivot, Power BI, SSRS Advanced Design, Power BI, Dashboards & Visualization since 2009

Grant Fritchey

Intimidating Databases and Code

Madhivanan's SQL blog

A modern business theme

Alessandro Alpi's Blog

DevOps could be the disease you die with, but don’t die of.

Paul te Braak

Business Intelligence Blog

Sql Insane Asylum (A Blog by Pat Wright)

Information about SQL (PostgreSQL & SQL Server) from the Asylum.

Gareth's Blog

A blog about Life, SQL & Everything ...