Spark RDD Operations-Transformation & Action with Example

Updated: 01/20/2021 by Computer Hope

The following lists cover some of the common transformations and actions supported by Spark. Refer to the RDD API doc (Scala, Java, Python, R) and the pair RDD functions doc (Scala, Java) for details.

Spark RDD Operations-Transformation

  • map(func)
  • flatMap(func)
  • filter(func)
  • mapPartitions(func), where func maps an Iterator to an Iterator
  • mapPartitionsWithIndex(func)
  • sample(withReplacement, fraction, seed)
  • union(otherDataset)
  • intersection(otherDataset)
  • distinct()
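  • reduceByKey(func)
  • foldByKey(zeroValue)(func)
  • combineByKey(createCombiner, mergeValue, mergeCombiners)
  • aggregateByKey(zeroValue)(seqOp, combOp)
  • groupByKey()
  • sortByKey([ascending])
  • join(otherDataset)
  • cartesian(otherDataset)
  • coalesce(numPartitions)
  • repartition(numPartitions)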
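
To make a few of the element-wise transformations above concrete, here is a minimal plain-Python sketch of what they compute. It models an RDD as an ordinary list so that it runs without a Spark installation. The helper names (rdd_map, rdd_flat_map, and so on) are hypothetical and not part of any Spark API, and the input lines are made up; the comments name the PySpark call each helper is meant to mirror.

```python
from itertools import chain

# Hypothetical helpers: each one models a single RDD transformation on a list.

def rdd_map(data, func):
    # Mirrors rdd.map(func): exactly one output element per input element.
    return [func(x) for x in data]

def rdd_flat_map(data, func):
    # Mirrors rdd.flatMap(func): func returns a sequence and the results are flattened.
    return list(chain.from_iterable(func(x) for x in data))

def rdd_filter(data, func):
    # Mirrors rdd.filter(func): keep only elements for which func is true.
    return [x for x in data if func(x)]

def rdd_union(data, other):
    # Mirrors rdd.union(other): plain concatenation, duplicates are kept.
    return list(data) + list(other)

def rdd_distinct(data):
    # Mirrors rdd.distinct(): duplicates removed.
    # Spark gives no ordering guarantee; this model keeps first-seen order.
    return list(dict.fromkeys(data))

lines = ["to be", "or not to be"]                      # made-up input

lengths = rdd_map(lines, len)                          # [5, 12]
words = rdd_flat_map(lines, str.split)                 # ['to', 'be', 'or', 'not', 'to', 'be']
long_words = rdd_filter(words, lambda w: len(w) > 2)   # ['not']
unique_words = rdd_distinct(words)                     # ['to', 'be', 'or', 'not']
combined = rdd_union([1, 2], [2, 3])                   # [1, 2, 2, 3]
```

Note the difference between map and flatMap: map returns one value per line, while flatMap returns one value per word.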

Spark RDD Operations-Action

  • reduce(func)
  • collect()
  • count()
  • first()
  • take(n)
  • takeSample(withReplacement, num, [seed])
  • takeOrdered(n, [ordering])
  • saveAsTextFile(path)
  • countByKey()
  • foreach(func)
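
The practical difference between the two lists is that transformations are lazy, while actions force evaluation and return a result to the driver. The sketch below models that behaviour with plain Python iterators. LazyDataset is a hypothetical stand-in and not a Spark class, although its method names follow the standard RDD methods map, filter, collect, take, count, and reduce.

```python
from functools import reduce as _reduce
from itertools import islice

class LazyDataset:
    """Hypothetical stand-in for an RDD: transformations are recorded, actions run them."""

    def __init__(self, source, steps=()):
        self._source = source   # any re-iterable source, e.g. a list
        self._steps = steps     # tuple of functions: iterator -> iterator

    # Transformations: return a new dataset and compute nothing.
    def map(self, func):
        return LazyDataset(self._source, self._steps + (lambda it: map(func, it),))

    def filter(self, func):
        return LazyDataset(self._source, self._steps + (lambda it: filter(func, it),))

    # Actions: run the recorded steps and return a plain Python value.
    def _run(self):
        it = iter(self._source)
        for step in self._steps:
            it = step(it)
        return it

    def collect(self):
        return list(self._run())

    def take(self, n):
        return list(islice(self._run(), n))

    def count(self):
        return sum(1 for _ in self._run())

    def reduce(self, func):
        return _reduce(func, self._run())

calls = []

def square(x):
    calls.append(x)   # record each invocation to make laziness visible
    return x * x

ds = LazyDataset([1, 2, 3, 4]).map(square).filter(lambda x: x % 2 == 0)
calls_before_action = len(calls)        # 0, nothing has run yet

evens = ds.collect()                    # [4, 16]
first = ds.take(1)                      # [4]
n = ds.count()                          # 2
total = ds.reduce(lambda a, b: a + b)   # 20
```

As in this model, Spark recomputes the chain of transformations for every action unless the RDD is cached or persisted.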


Below is a screenshot of the output of the Word Count program in Apache Spark.
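
A word count is usually written as three steps: flatMap to split lines into words, map to turn each word into a (word, 1) pair, and reduceByKey to sum the pairs per word. The sketch below reproduces those steps in plain Python so that it runs without Spark. reduce_by_key is a hypothetical helper that mirrors what reduceByKey computes, and the input lines are made up for illustration.

```python
def reduce_by_key(pairs, func):
    # Hypothetical helper mirroring rdd.reduceByKey(func):
    # merge all values that share a key using func.
    merged = {}
    for key, value in pairs:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged

lines = ["spark makes big data simple", "big data needs spark"]  # made-up input

# Step 1 (flatMap): split every line into words.
words = [word for line in lines for word in line.split()]

# Step 2 (map): pair each word with the count 1.
pairs = [(word, 1) for word in words]

# Step 3 (reduceByKey): sum the counts per word.
counts = reduce_by_key(pairs, lambda a, b: a + b)
# counts == {'spark': 2, 'makes': 1, 'big': 2, 'data': 2, 'simple': 1, 'needs': 1}
```

In PySpark the same pipeline is commonly written as sc.textFile(path).flatMap(lambda l: l.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b), where sc is an existing SparkContext and path is a placeholder for the input file.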
