Advanced Apache Hadoop commands

Updated: 01/06/2022 by Computer Hope

A number of commands in the File System Shell interact directly with HDFS and with the other file systems Hadoop supports, such as the S3 file system and HFTP (a read-only file system for accessing HDFS over HTTP).

Advanced Hadoop commands are typically used for more specialized or administrative tasks in a Hadoop cluster. Here are some advanced Hadoop commands and operations:

Hadoop advanced commands

Advanced commands cover cluster administration tasks such as checking HDFS health, managing datanodes, controlling safe mode, and setting permissions and replication. The following commands are commonly used:

1) Hadoop jar command

The hadoop jar command runs a MapReduce job packaged in a JAR file. Additional parameters, such as the HDFS input and output paths and job configuration properties, can be passed on the command line (mapred.map.tasks is the legacy name for mapreduce.job.maps):

hadoop jar myjob.jar -Dmapred.map.tasks=5 input_path output_path

2) Hadoop job history

Display detailed history for a completed job, including its tasks and counters. In newer releases, the hadoop job command is deprecated in favor of mapred job:

hadoop job -history job_id
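To view a job's history you first need its job ID. A sketch of looking one up with the mapred command (the job ID shown is a placeholder; substitute one from your own cluster):

```shell
# List all jobs known to the history server, with their IDs.
mapred job -list all

# Show the detailed history for one of the listed jobs.
mapred job -history job_1700000000000_0001
```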

3) Hadoop configuration

Hadoop has no command for modifying configuration at runtime; configuration parameters are set in files such as core-site.xml and hdfs-site.xml. However, the current value of a configuration key can be read from the command line with hdfs getconf:

hdfs getconf -confKey fs.defaultFS

4) hdfs dfsadmin -report

Report the status of the HDFS cluster, including total and remaining capacity and per-datanode information about blocks:

hdfs dfsadmin -report
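On a large cluster the full report is long, so it is often filtered with standard shell tools. A sketch (the grep patterns are illustrative and depend on your Hadoop version's report format):

```shell
# Summarize overall capacity and node count from the dfsadmin report.
hdfs dfsadmin -report | grep -E 'Configured Capacity|DFS Remaining|Live datanodes'
```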

5)hdfs dfsadmin -refreshNodes

Make the namenode re-read its datanode include and exclude files, so that added, decommissioned, or removed datanodes are picked up without a restart.

hdfs dfsadmin -refreshNodes
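refreshNodes is most often used while decommissioning a datanode. A sketch of the workflow, assuming dfs.hosts.exclude in hdfs-site.xml points at /etc/hadoop/conf/dfs.exclude (the path and hostname are illustrative and vary by installation):

```shell
# 1. Add the datanode's hostname to the exclude file referenced by
#    dfs.hosts.exclude in hdfs-site.xml (path is installation-specific).
echo "datanode3.example.com" >> /etc/hadoop/conf/dfs.exclude

# 2. Tell the namenode to re-read the include/exclude files.
hdfs dfsadmin -refreshNodes

# 3. Watch the report until the node's state shows "Decommissioned".
hdfs dfsadmin -report
```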

6) hdfs dfs -chown

Change the owner (and optionally the group) of files or directories in HDFS. Add -R to apply the change recursively.

hdfs dfs -chown user:group /path/to/file

7) hdfs dfs -chmod

Change the permissions of files or directories in HDFS.

hdfs dfs -chmod 755 /path/to/file
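Besides octal modes, hdfs dfs -chmod also accepts symbolic modes, and -R applies a change recursively:

```shell
# Recursively make a directory tree group-readable.
hdfs dfs -chmod -R g+r /user/data

# Symbolic form equivalent to mode 755 on a single file.
hdfs dfs -chmod u=rwx,go=rx /path/to/file
```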

8) Hadoop safe mode

Safe mode is a maintenance state in which HDFS is read-only and does not replicate or delete blocks. The namenode enters safe mode automatically at startup and leaves once enough blocks have been reported by datanodes.

hdfs dfsadmin -safemode get
hdfs dfsadmin -safemode enter
hdfs dfsadmin -safemode leave
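Scripts that run right after a cluster restart often need to block until the namenode has left safe mode on its own; -safemode wait does exactly that (the JAR and paths below are placeholders):

```shell
# Block until the namenode leaves safe mode, then run the job.
hdfs dfsadmin -safemode wait
hadoop jar myjob.jar input_path output_path
```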

Hadoop basic commands

In the Hadoop ecosystem, basic Hadoop commands are frequently utilised for routine file and job operations, especially when working with the Hadoop Distributed File System (HDFS) and Hadoop MapReduce. These are a few fundamental Hadoop commands:

mkdir:

This works like the UNIX mkdir command and creates a directory in HDFS. Use the -p flag to create parent directories as needed.
hadoop fs -mkdir /user/hadoop/

ls:

This works like the UNIX ls command and lists the contents of a directory in HDFS. For a recursive listing of the directories and files under a folder, use the -R flag (hadoop fs -ls -R); the older -lsr form is deprecated.
hadoop fs -ls /user/hadoop/

put:

This command copies files from the local file system to HDFS and is similar to -copyFromLocal. By default it fails if the destination file already exists; pass the -f flag to overwrite the existing file instead.
hadoop fs -put sample.txt /user/data/
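put also accepts several source files at once, and -f forces an overwrite. A short sketch (the file names are illustrative):

```shell
# Copy several local files into one HDFS directory.
hadoop fs -put file1.txt file2.txt /user/data/

# Overwrite an existing file instead of failing.
hadoop fs -put -f sample.txt /user/data/
```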

cat:

This command is similar to the UNIX cat command and is used for displaying the contents of a file on the console.
hadoop fs -cat /user/data/developerTxt.txt

get:

This command copies files from HDFS to the local file system; it is the opposite of the put command.
hadoop fs -get /develop/sample.txt /user/data/
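A related command, getmerge, concatenates every file in an HDFS directory into one local file, which is handy for collecting the part-* outputs of a MapReduce job (paths are illustrative):

```shell
# Merge all files under an HDFS directory into a single local file.
hadoop fs -getmerge /develop/output/ merged_output.txt
```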

setrep:

Set the replication factor for a file or directory. The -w flag waits until the new replication level is reached before returning.
hdfs dfs -setrep -w 3 /path/to/file
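The current replication factor of a file can be checked afterwards with the %r format of hdfs dfs -stat:

```shell
# Print the replication factor of a file.
hdfs dfs -stat %r /path/to/file
```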

fsck:

hdfs fsck checks the health of the file system and reports missing, over-replicated, under-replicated, and corrupt blocks. To check the entire file system:
$ hdfs fsck /
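To see which blocks make up a particular file and where each replica is stored, fsck can be pointed at a path with extra flags (the path is illustrative):

```shell
# Show files, their blocks, and the datanodes holding each replica.
hdfs fsck /user/data/sample.txt -files -blocks -locations
```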

rm:

This command is similar to the UNIX rm command and removes a file from HDFS. Use -rm -r to delete directories and their contents recursively.
hadoop fs -rm [-f] [-r|-R] PATH
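When HDFS trash is enabled, deleted files are first moved to a trash directory; -skipTrash deletes them immediately, which frees space at once but makes the deletion unrecoverable:

```shell
# Recursively delete a directory, bypassing the trash.
hadoop fs -rm -r -skipTrash /user/data/old_output
```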

Conclusion:

In this article, we covered advanced Hadoop commands, which are fundamental building blocks of the Hadoop ecosystem, along with their use cases. Check the next article for a first example of Map and Reduce.