Apache Spark Installation Guide
Apache Spark installation steps - Verifying Java and Scala setup
Updated: May 20, 2025
Apache Spark is a powerful open-source framework for distributed data processing, widely used for big data analytics and machine learning. Whether you're working on Windows or Ubuntu, installing Spark is a straightforward process if you follow the right steps. This guide will walk you through the installation process for both operating systems, including prerequisites, installation steps, and verification.
Before installing Apache Spark, ensure your system meets the following requirements:
- Java 8 or later
- Scala (used later in this guide for developing Spark applications)
- A utility that can extract .tar/.tgz archives (tar ships with Ubuntu and with modern Windows)

Step 1: Verify Java Installation
Apache Spark requires Java 8 or later. To verify whether Java is installed on your system:

On Windows
Open Command Prompt or PowerShell and run:
java -version
If Java is installed, you'll see output like:
java version "1.8.0_281"
Java(TM) SE Runtime Environment (build 1.8.0_281-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.281-b09, mixed mode)
If Java is not installed, download it from the official Java website and install it.
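If you prefer a package manager, a hedged alternative is winget; this assumes the Microsoft.OpenJDK.17 package id, which you can confirm first with winget search OpenJDK:
winget install Microsoft.OpenJDK.17
Afterwards, reopen the terminal and re-run java -version.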
On Ubuntu
Run the following command in the terminal:
java -version
If Java is not installed, install it using:
sudo apt install default-jdk -y
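Some tools expect JAVA_HOME to be set as well. A minimal sketch that derives it from the java binary installed above (assuming the symlink layout used by default-jdk):
echo "export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which java))))" >> ~/.bashrc
source ~/.bashrc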
Step 2: Verify Scala Installation
Scala is another prerequisite for Apache Spark. To check if Scala is installed:

On Windows
Run the following command in Command Prompt or PowerShell:
scala -version
If Scala is installed, you'll see output like:
Scala code runner version 2.12.18 -- Copyright 2002-2023, LAMP/EPFL and Lightbend, Inc.
If Scala is not installed, proceed to the next step.

On Ubuntu
Run:
scala -version
If Scala is not installed, install it using:
sudo apt install scala -y
Step 3: Install Scala
If Scala is not installed (or you want a specific version), download it from the official Scala website. For this tutorial, we'll use Scala 2.12.18, which matches the Scala version that Spark 3.5.x is built against.
On Windows
Extract the downloaded archive:
tar xvf scala-2.12.18.tgz
Move the extracted folder to C:\Scala, then add it to your PATH for the current session:
set PATH=%PATH%;C:\Scala\bin
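The set command above only lasts for the current Command Prompt session. A sketch for persisting the change (note that setx truncates values longer than 1,024 characters, so check your PATH length first):
setx PATH "%PATH%;C:\Scala\bin"
Alternatively, edit Path by hand under System Properties > Environment Variables.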
On Ubuntu
Download the archive:
wget https://www.scala-lang.org/files/archive/scala-2.12.18.tgz
Extract the .tar file:
tar xvf scala-2.12.18.tgz
Move the extracted folder to /usr/local/scala:
sudo mv scala-2.12.18 /usr/local/scala
Add Scala to your PATH:
echo 'export PATH=$PATH:/usr/local/scala/bin' >> ~/.bashrc
source ~/.bashrc
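To confirm the setup, open a new terminal (or reuse this one, since .bashrc was just sourced) and run:
scala -version
As an extra sanity check, this one-liner prints the version via the standard library (assuming the scala runner is now on your PATH):
scala -e 'println(util.Properties.versionNumberString)'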
Step 4: Download and Install Apache Spark
Download the latest version of Apache Spark from the official Spark website. For this tutorial, we'll use Spark 3.5.3 prebuilt for Hadoop 3.
Download the .tgz file (on Windows, you can also grab it through your browser):
wget https://dlcdn.apache.org/spark/spark-3.5.3/spark-3.5.3-bin-hadoop3.tgz
Optionally, verify the integrity of the download against its SHA-512 checksum:
wget https://downloads.apache.org/spark/spark-3.5.3/spark-3.5.3-bin-hadoop3.tgz.sha512
shasum -a 512 -c spark-3.5.3-bin-hadoop3.tgz.sha512
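If the archive is intact, the check should end with:
spark-3.5.3-bin-hadoop3.tgz: OK
On Ubuntu, sha512sum -c works the same way if shasum is not available.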
Extract the .tgz file:
tar xvf spark-3.5.3-bin-hadoop3.tgz
On Windows
Move the extracted folder to C:\Spark, then add it to your PATH for the current session:
set PATH=%PATH%;C:\Spark\bin
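Note: on Windows, spark-shell often logs warnings about a missing winutils.exe, because Spark expects a few Hadoop helper binaries. The shell generally still works for local use; if you want to silence the warnings, the usual workaround (an assumption here, not part of the official Spark download) is to place a winutils.exe build matching your Hadoop version in C:\hadoop\bin and run:
set HADOOP_HOME=C:\hadoop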
On Ubuntu
Move the extracted folder to /opt/spark:
sudo mv spark-3.5.3-bin-hadoop3 /opt/spark
Set SPARK_HOME and add Spark to your PATH:
echo 'export SPARK_HOME=/opt/spark' >> ~/.bashrc
echo 'export PATH=$PATH:$SPARK_HOME/bin' >> ~/.bashrc
source ~/.bashrc
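Before launching anything, you can confirm the environment variables took effect:
echo $SPARK_HOME
spark-submit --version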
Step 5: Verify the Installation

On Windows
Open Command Prompt or PowerShell and run:
spark-shell
If Spark is installed correctly, you’ll see the Spark logo and the Scala shell prompt.
On Ubuntu
Run:
spark-shell
You should see the Spark logo and the Scala shell prompt.
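To go a step further, run a quick job inside the shell. A minimal sketch; spark-shell pre-creates the spark session and imports the needed implicits:
// count the even numbers in a Dataset of 0-99
val nums = spark.range(100)
println(nums.filter($"id" % 2 === 0).count())  // prints 50
Type :quit to exit the shell.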
Step 6: Start a Standalone Cluster (Optional)
On Ubuntu, you can run Spark's built-in standalone cluster manager. The launch scripts live in $SPARK_HOME/sbin, which is not on the PATH configured above:
$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/start-worker.sh spark://<master-host>:7077
Replace <master-host> with the host shown on the master's web UI at http://localhost:8080.
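With the master and worker running, you can attach a shell to the cluster. A sketch, assuming the master URL reported on the web UI:
spark-shell --master spark://<master-host>:7077
Jobs started from this shell will show up under Running Applications on the master's UI.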
Conclusion
By following this guide, you've successfully installed Apache Spark on Windows or Ubuntu. You're now ready to start building and running Spark applications for big data processing. For advanced tasks, explore configuring multi-node clusters or working with Spark DataFrames.
Frequently Asked Questions

What is Apache Spark?
Apache Spark is a distributed computing framework for big data processing and analytics.

Can Spark run without Hadoop?
Yes, Spark can run in standalone mode without Hadoop.

How do I verify that Spark is installed correctly?
Run spark-shell and check for the Spark logo and Scala prompt.

What are the prerequisites for installing Spark?
Java, Scala, and a compatible operating system (Windows or Ubuntu).

How do I start a standalone Spark cluster?
Use the start-master.sh and start-worker.sh commands.
Happy coding! 🚀