Apache Spark began at UC Berkeley in 2009 as the Spark research project, which was first published the following year in a paper entitled “Spark: Cluster Computing with Working Sets” by Matei Zaharia, Mosharaf Chowdhury, Michael Franklin, Scott Shenker, and Ion Stoica of the UC Berkeley AMPLab. At the time, Hadoop MapReduce was the dominant parallel programming engine for clusters, being the first open-source system to tackle data-parallel processing on clusters of thousands of nodes.

Apache Spark is a fast cluster computing framework designed for real-time processing. It is an open-source project of the Apache Software Foundation, licensed under the Apache License 2.0. Spark overcomes the limitations of Hadoop MapReduce and extends the MapReduce model so that it can be used efficiently for a wider range of data processing.

To conclude, Spark is a versatile Big Data platform. Since it is an open-source framework, it continues to improve and evolve as new features and functionality are added. As the applications of Big Data become more diverse and expansive, so will the use cases of Apache Spark.