Spark architecture tutorial

Updated: 06/06/2021 by Computer Hope

The Architecture of a Spark Application

Spark supports multiple widely used programming languages (Python, Java, Scala, and R), includes libraries for diverse tasks ranging from SQL to streaming and machine learning, and runs anywhere from a laptop to a cluster of thousands of servers. This makes it an easy system to start with and to scale up to big data processing at incredibly large scale.

Spark Architecture Overview

The Spark driver

The behavior of a Spark job depends on its “driver” component. When the driver runs on the machine from which the job is submitted, Spark is said to run in “client mode”. In client mode:
  • You can see the detailed logs in the terminal.
  • You can terminate the job from the terminal itself.
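To make this concrete, here is a minimal sketch of a driver program in Scala; the object name and the numbers are placeholders, not part of Spark's API. When such an application is submitted in client mode, its main method, and therefore the driver, runs in the terminal session that launched it.

  import org.apache.spark.sql.SparkSession

  object ClientModeExample {
    def main(args: Array[String]): Unit = {
      // The driver process starts here, on the machine that submitted the job.
      val spark = SparkSession.builder()
        .appName("client-mode-example")
        .getOrCreate()

      // Work declared here is broken into tasks and shipped to executors;
      // logs and results flow back to the submitting terminal.
      val evens = spark.range(0, 1000000).filter("id % 2 = 0").count()
      println(s"Even numbers: $evens")

      spark.stop()
    }
  }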

The cluster manager

The Spark Driver and Executors do not exist in a void, and this is where the cluster manager comes in. The cluster manager is responsible for maintaining a cluster of machines that will run your Spark Application(s). Somewhat confusingly, a cluster manager will have its own “driver” (sometimes called master) and “worker” abstractions.
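Which cluster manager an application talks to is selected through its master URL. The sketch below shows this in Scala; the host names and port numbers are placeholders.

  import org.apache.spark.sql.SparkSession

  // The master URL tells Spark which cluster manager to connect to.
  // Common forms (addresses here are placeholders):
  //   local[*]             no cluster manager; everything in one local JVM
  //   spark://host:7077    Spark's standalone cluster manager
  //   yarn                 Hadoop YARN
  //   k8s://https://host   Kubernetes
  val spark = SparkSession.builder()
    .appName("cluster-manager-example")
    .master("spark://host:7077") // standalone master; placeholder address
    .getOrCreate()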

The Spark executors

Spark executors are the processes that perform the tasks assigned by the Spark driver. Executors have one core responsibility: take the tasks assigned by the driver, run them, and report back their state (success or failure) and results. Each Spark Application has its own separate executor processes.
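As a rough sketch of how this looks from the application side, the snippet below requests a hypothetical executor layout (4 executors with 2 cores and 2 GiB of memory each; the sizes are illustrative, and spark.executor.instances only applies on managers such as YARN or Kubernetes) and then runs a job whose tasks those executors execute.

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .appName("executor-example")
    // Hypothetical sizing: 4 executors, each with 2 cores and 2 GiB of memory.
    .config("spark.executor.instances", "4")
    .config("spark.executor.cores", "2")
    .config("spark.executor.memory", "2g")
    .getOrCreate()

  // The driver splits this job into 8 tasks (one per partition); each executor
  // runs its share and reports state and results back to the driver.
  val total = spark.sparkContext
    .parallelize(1 to 100, numSlices = 8)
    .map(_ * 2)
    .sum()
  println(s"Total: $total")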

Execution modes

An execution mode lets you determine where the aforementioned resources are physically located when you run your application. You have three modes to choose from, sketched briefly after this list:
  • Cluster mode
  • Client mode
  • Local mode
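As a brief sketch: local mode can be selected directly in code, while client and cluster mode are normally chosen when the application is launched. The file name app.jar below is a placeholder.

  import org.apache.spark.sql.SparkSession

  // Local mode: driver and executors run as threads inside a single JVM,
  // which is convenient for development and testing on a laptop.
  val spark = SparkSession.builder()
    .appName("local-mode-example")
    .master("local[*]") // use all cores on the local machine
    .getOrCreate()

  // Client and cluster mode are normally selected at submission time with
  // spark-submit's --deploy-mode flag, e.g.:
  //   spark-submit --master yarn --deploy-mode client  app.jar
  //   spark-submit --master yarn --deploy-mode cluster app.jar

In cluster mode the driver itself is launched on one of the cluster's machines, whereas in client mode it stays on the submitting machine, as described in the driver section above.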