Spark Tutorial for Beginners

Updated: 01/20/2021 by Computer Hope

Apache Spark began at UC Berkeley in 2009 as the Spark research project, licensed under the Apache License 2.0. Spark was first published the following year in a paper entitled “Spark: Cluster Computing with Working Sets” by Matei Zaharia, Mosharaf Chowdhury, Michael Franklin, Scott Shenker, and Ion Stoica of the UC Berkeley AMPLab. At the time, Hadoop MapReduce was the dominant parallel programming engine for clusters, being the first open-source system to tackle data-parallel processing on clusters of thousands of nodes.
Apache Spark is a lightning-fast cluster computing framework designed for real-time processing. Spark is an open-source project from the Apache Software Foundation. It overcomes the limitations of Hadoop MapReduce and extends the MapReduce model to process data more efficiently.
To conclude, Spark is an extremely versatile Big Data platform with features that are built to impress. Since it is an open-source framework, it is continuously improving and evolving, with new features and functionalities being added to it. As the applications of Big Data become more diverse and expansive, so will the use cases of Apache Spark.

Now, let’s look at the features of Apache Spark.

Features of Spark

  • Fault Tolerance
  • Dynamic in Nature
  • Lazy Evaluation
  • Real-Time Stream Processing
  • Speed
  • Reusability
  • Advanced Analytics
  • In-Memory Computing
  • Support for Multiple Languages
  • Integration with Hadoop
  • Cost Efficiency