Advent of 2021, Day 5 – Setting up Spark Cluster

Series of Apache Spark posts:

So far, we have explored the Spark architecture and looked into the differences between local and cluster mode.

If you navigate to your local installation of Apache Spark (/usr/local/Cellar/apache-spark/3.2.0/bin), you can run Spark in Scala, Python, or R with the following commands.

For Scala:

spark-shell --master local

For Python:

pyspark --master local

And for R:

sparkR --master local

The web UI changes accordingly and shows the application you started in that language.
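The value passed to --master decides where the work runs. As a rough illustration (a simplified sketch, not Spark's own parser; the helper name describe_master is hypothetical), the Python function below classifies the forms used in this post: local runs with a single worker thread, local[N] with N threads, local[*] with one thread per logical core, and spark://HOST:PORT hands the work to a standalone master.

```python
import os
import re
from typing import Optional, Tuple


def describe_master(master: str) -> Tuple[str, Optional[int]]:
    """Classify a --master value (simplified sketch, not Spark's own parser).

    Returns (mode, threads); threads is None when a cluster manager
    schedules the work instead of local threads.
    """
    if master == "local":
        return ("local", 1)  # a single worker thread, no parallelism
    m = re.fullmatch(r"local\[(\*|\d+)\]", master)
    if m:
        # local[*] uses one thread per logical core, local[N] uses N threads
        threads = (os.cpu_count() or 1) if m.group(1) == "*" else int(m.group(1))
        return ("local", threads)
    if re.fullmatch(r"spark://[^:/,\s]+:\d+", master):
        return ("standalone", None)  # e.g. spark://tomazs-MacBook-Air.local:7077
    raise ValueError(f"master value not covered by this sketch: {master!r}")


print(describe_master("local"))     # ('local', 1)
print(describe_master("local[4]"))  # ('local', 4)
```

Other master values, such as yarn, are deliberately left out and raise ValueError in this sketch.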

SparkR application UI

Spark can run on its own or on top of several existing cluster managers, so there are several deployment options. If you decide to use Hadoop and YARN, you usually have to install everything on every node: Java (the JDK) and Hadoop, together with all the needed configuration. This kind of installation is preferred when setting up several nodes, and you will also be installing HDFS, which comes with Hadoop. A good example and explanation is available here.

Spark Standalone Mode

Besides running on Hadoop YARN, Kubernetes, or Mesos, Spark can use its own standalone cluster manager, which is the simplest way to deploy a Spark application on a private cluster.

In local mode, the web UI is available at http://localhost:4040. In standalone mode, the master's web UI is available at http://localhost:8080, while each running application still serves its own UI on port 4040.
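To check which of the two UIs is currently up, a small Python sketch can simply try to open a TCP connection to each port. It only tells you that something is listening on the port, not that it is Spark; the helper name ui_is_up is hypothetical.

```python
import socket


def ui_is_up(port: int, host: str = "localhost", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, or host unreachable
        return False


# 4040: web UI of a running application; 8080: standalone master's web UI
for name, port in [("application UI", 4040), ("standalone master UI", 8080)]:
    print(f"{name} on port {port}: {'up' if ui_is_up(port) else 'down'}")
```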

Installing Spark in standalone mode is simple: you place a compiled version of Spark on each node of the cluster.

To start a cluster manually, navigate to the folder /usr/local/Cellar/apache-spark/3.2.0/libexec/sbin and run start-master.sh:

bash start-master.sh

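Once it has started, open the master's web UI at http://localhost:8080. The page shows the master URL (spark://HOST:PORT, port 7077 by default) that workers and applications use to connect to it.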

We can now add a worker by running this command from the same sbin folder:

bash start-worker.sh spark://tomazs-MacBook-Air.local:7077
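By default, a standalone worker offers all of the machine's cores and its total memory minus 1 GiB. When master and worker share one laptop, you may want to cap that; start-worker.sh passes options such as --cores and --memory on to the worker. The Python sketch below only assembles that command line (the helper name start_worker_cmd and the SPARK_HOME fallback path are assumptions) and leaves running it to subprocess.

```python
import os
from typing import List, Optional

# Assumption: the Homebrew layout used in this post; $SPARK_HOME overrides it.
SPARK_HOME = os.environ.get("SPARK_HOME", "/usr/local/Cellar/apache-spark/3.2.0/libexec")


def start_worker_cmd(master_url: str, cores: Optional[int] = None,
                     memory: Optional[str] = None) -> List[str]:
    """Build a start-worker.sh command line, optionally capping its resources."""
    cmd = ["bash", os.path.join(SPARK_HOME, "sbin", "start-worker.sh"), master_url]
    if cores is not None:
        cmd += ["--cores", str(cores)]  # cores this worker offers to applications
    if memory is not None:
        cmd += ["--memory", memory]     # memory offered to executors, e.g. "4G"
    return cmd


cmd = start_worker_cmd("spark://tomazs-MacBook-Air.local:7077", cores=2, memory="4G")
print(" ".join(cmd))
# To actually start the worker: subprocess.run(cmd, check=True)
```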

The CLI returns a message saying that the worker is starting and where it writes its log.

Refresh the Spark master's web UI and check that the worker node is listed.

Connecting and running an application

To run an application on the Spark cluster, pass the master URL spark://tomazs-MacBook-Air.local:7077 to the SparkContext constructor.

Or simply run the following command in the folder /usr/local/Cellar/apache-spark/3.2.0/bin:

spark-shell --master spark://tomazs-MacBook-Air.local:7077
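The same connection can be made from Python by handing the master URL to SparkConf and the SparkContext constructor. This is a sketch that assumes pyspark is importable (it is when the script is launched with spark-submit, or after installing pyspark into your Python environment); check_master_url and connect are hypothetical helper names, and the check accepts only the single-master spark://HOST:PORT form.

```python
import re

# Master URL from this post; take yours from the master's web UI (port 8080).
MASTER_URL = "spark://tomazs-MacBook-Air.local:7077"


def check_master_url(url: str) -> str:
    """Fail fast unless url has the single-master form spark://HOST:PORT."""
    if not re.fullmatch(r"spark://[^:/,\s]+:\d+", url):
        raise ValueError(f"not a standalone master URL: {url!r}")
    return url


def connect(url: str = MASTER_URL, app_name: str = "advent-2021-day5"):
    """Create a SparkContext attached to the standalone master (needs pyspark)."""
    # Imported here so the URL check above also works without pyspark installed.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster(check_master_url(url)).setAppName(app_name)
    return SparkContext(conf=conf)
```

Calling sc = connect() and then, for example, sc.parallelize(range(100)).sum() runs a small job on the worker, and the application appears under Running Applications on the master's web UI. Call sc.stop() when you are done.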

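With the spark-submit command, we can run an application on the Spark standalone cluster. Note that for Python applications the standalone cluster manager only supports client deploy mode (the default), so the driver runs on the machine you submit from; the --class option is only needed for Java and Scala applications. Make sure --executor-memory fits within the memory a worker offers, otherwise no executor can be launched on it. Navigate to /usr/local/Cellar/apache-spark/3.2.0/bin and execute: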

spark-submit \
  --master spark://tomazs-MacBook-Air.local:7077 \
  --executor-memory 20G \
  --total-executor-cores 100 \
  python1hello.py
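When you submit repeatedly, for example from a scheduler, it can help to build the command as an argument list instead of a shell string. The Python sketch below assembles the spark-submit call shown above without --class, which is only used for Java and Scala applications; the helper name spark_submit_cmd and its parameters are assumptions.

```python
from typing import List, Optional


def spark_submit_cmd(master: str, app: str, executor_memory: Optional[str] = None,
                     total_executor_cores: Optional[int] = None,
                     app_args: Optional[List[str]] = None) -> List[str]:
    """Build a spark-submit argument list for a Python application."""
    cmd = ["spark-submit", "--master", master]
    if executor_memory is not None:
        cmd += ["--executor-memory", executor_memory]  # memory per executor
    if total_executor_cores is not None:
        # upper bound on the cores the application may use across the cluster
        cmd += ["--total-executor-cores", str(total_executor_cores)]
    cmd.append(app)           # the Python file comes after all spark-submit options
    cmd += app_args or []     # anything after the file is passed to the script
    return cmd


cmd = spark_submit_cmd("spark://tomazs-MacBook-Air.local:7077", "python1hello.py",
                       executor_memory="20G", total_executor_cores=100)
print(" ".join(cmd))
# To submit: subprocess.run(cmd, check=True), with Spark's bin folder on PATH
```

Keeping the arguments as a list avoids shell quoting problems when paths or arguments contain spaces.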

Here, python1hello.py is a Python script as simple as:

x = 1
if x == 1:
    print("Hello, x = 1.")
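This script never creates a SparkContext, so it runs entirely in the driver process and nothing is scheduled on the worker. A script that does use the cluster could look like the Monte Carlo estimate of π below, which is the same idea Spark's SparkPi example uses. This is a sketch: the names inside, estimate_pi, and main are hypothetical, and estimate_pi takes an existing SparkContext so the sampling logic can be checked without a cluster.

```python
import random
from operator import add
from typing import List


def inside(_: int) -> int:
    """Throw one random point at the square [-1, 1]^2; 1 if it hits the unit circle."""
    x, y = random.uniform(-1, 1), random.uniform(-1, 1)
    return 1 if x * x + y * y <= 1 else 0


def estimate_pi(sc, samples: int = 100_000, partitions: int = 2) -> float:
    """Estimate pi with a map/reduce job; sc is an existing SparkContext."""
    hits = sc.parallelize(range(samples), partitions).map(inside).reduce(add)
    # the circle covers pi/4 of the square, so scale the hit rate by 4
    return 4.0 * hits / samples


def main(argv: List[str]) -> None:
    # pyspark is importable when the file is launched with spark-submit
    from pyspark import SparkContext

    sc = SparkContext(appName="advent-2021-day5-pi")  # master comes from --master
    samples = int(argv[0]) if argv else 100_000
    print(f"Pi is roughly {estimate_pi(sc, samples):.4f}")
    sc.stop()
```

To submit it, end the file with if __name__ == "__main__": main(sys.argv[1:]) (plus import sys) and pass the file to spark-submit with the standalone --master URL; while the job runs, it shows up on the master's web UI.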

Tomorrow we will look into IDEs and start working with the code.

The complete set of code, documents, notebooks, and all of the materials will be available in the GitHub repository: https://github.com/tomaztk/Spark-for-data-engineers

Happy Spark Advent of 2021! 🙂

Posted in Spark, Uncategorized
