Apache Hadoop Components

This Hadoop tutorial is designed to be an all-in-one package answering your questions about the Hadoop components.

Components of Hadoop

  • HDFS (Hadoop Distributed File System)
  • MapReduce

HDFS

HDFS is the Hadoop Distributed File System, which runs on inexpensive commodity hardware. It is the storage layer of Hadoop. Files in HDFS are broken into block-sized chunks called data blocks, and these blocks are stored on the slave nodes in the cluster. The block size is 128 MB by default, and we can configure it to suit our requirements.
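For example, a client can override the default block size through the standard dfs.blocksize property. This is a minimal sketch assuming a configured HDFS client on the classpath and a reachable cluster; the file path is illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // dfs.blocksize is given in bytes; here we ask for 256 MB
        // instead of the 128 MB default.
        conf.setLong("dfs.blocksize", 256L * 1024 * 1024);
        FileSystem fs = FileSystem.get(conf);
        // Files written through this handle are split into 256 MB blocks.
        fs.create(new Path("/tmp/example.dat")).close(); // illustrative path
        fs.close();
    }
}
```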


MapReduce

It is the data processing layer of Hadoop: a software framework for writing applications that process vast amounts of data (terabytes to petabytes) in parallel on a cluster of commodity hardware.
The MapReduce framework works on key-value pairs. A MapReduce job is the unit of work the client wants to perform, and it mainly consists of the input data, the MapReduce program, and the configuration information. Hadoop runs a MapReduce job by dividing it into two types of tasks, map tasks and reduce tasks, which Hadoop YARN schedules and runs on the nodes in the cluster.
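To make those key-value pairs concrete, here is a minimal word-count sketch against the standard org.apache.hadoop.mapreduce API. The class names are illustrative, and in a real project each class would live in its own file:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map task: for each line of input, emit a (word, 1) pair per word.
class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        for (String token : line.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}

// Reduce task: sum all the 1s emitted for the same word.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable c : counts) {
            sum += c.get();
        }
        context.write(word, new IntWritable(sum));
    }
}
```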



Core components of Hadoop 1 (MRv1)

  • JobTracker -- the master daemon that manages all the jobs and resources in the cluster.
  • TaskTrackers -- agents deployed to each machine in the cluster to run the map and reduce tasks.
  • JobHistory Server -- a component that tracks completed jobs; it is typically deployed as a separate function or together with the JobTracker.

To conclude, Hadoop is built from three layers:
  • Storage layer – HDFS
  • Batch processing engine – MapReduce
  • Resource management layer – YARN

HDFS ‐

HDFS (Hadoop Distributed File System) is the storage unit of Hadoop. It is responsible for storing different kinds of data as blocks in a distributed environment, and it follows a master-and-slave topology.
The components of HDFS are the NameNode (the master, which keeps the file system metadata) and the DataNodes (the slaves, which store the actual data blocks).
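As a small illustrative sketch (the file path comes from the command line, and a reachable cluster configuration is assumed), a client can ask the NameNode which DataNodes hold each block of a file:

```java
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowBlockLocations {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(new Path(args[0]));
        // The NameNode answers this metadata query; the blocks themselves
        // are stored on the DataNodes listed for each location.
        for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("offset " + block.getOffset()
                    + " -> " + Arrays.toString(block.getHosts()));
        }
        fs.close();
    }
}
```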

MapReduce ‐

For processing large data sets in parallel across a Hadoop cluster, the Hadoop MapReduce framework is used. Data analysis uses a two-step map and reduce process.
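A minimal driver sketch for the two-step process, reusing the illustrative WordCountMapper and WordCountReducer classes sketched earlier (input and output paths are taken from the command line):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        // Step 1: the map phase emits (word, 1) pairs.
        job.setMapperClass(WordCountMapper.class);
        // Step 2: the reduce phase sums the counts for each word.
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```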

YARN ‐

YARN (Yet Another Resource Negotiator) is the processing framework introduced in Hadoop 2; it does not exist in Hadoop 1. It manages cluster resources and provides an execution environment for processes. The main components of YARN are the ResourceManager and the NodeManager.
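As a sketch of the two daemons working together (it assumes a running YARN cluster and the YARN client libraries on the classpath), a client can ask the ResourceManager for a report on every NodeManager:

```java
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListClusterNodes {
    public static void main(String[] args) throws Exception {
        // The YarnClient talks to the ResourceManager, which tracks the
        // state and capacity reported by every NodeManager in the cluster.
        YarnClient client = YarnClient.createYarnClient();
        client.init(new YarnConfiguration());
        client.start();
        for (NodeReport node : client.getNodeReports()) {
            System.out.println(node.getNodeId() + " -> " + node.getCapability());
        }
        client.stop();
    }
}
```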

Conclusion

Hadoop is open-source software developed by the Apache Software Foundation (ASF). You can download Hadoop directly from the project website at http://hadoop.apache.org. Cloudera is a company that provides support, consulting, and management tools for Hadoop; it also ships a distribution called Cloudera's Distribution Including Apache Hadoop (CDH).
The components covered in this article make up the core of the Hadoop ecosystem, offering a fault-tolerant, scalable platform for storing and processing big data. Other Hadoop ecosystem projects and tools, such as Hive, Pig, and Spark, offer more features and higher-level abstractions for various use cases.