Configuring Hive with Hadoop: A Step-by-Step Guide
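Apache Hive is a data warehouse system built on top of Hadoop that lets you query large datasets stored in HDFS using a SQL-like language. Hive depends on a correctly configured Hadoop installation: both need to agree on where HDFS runs and where table data is stored. This guide covers a single-machine setup: environment variables, core-site.xml, hive-site.xml, starting Hadoop, creating the warehouse directory, initializing the metastore, and connecting with Beeline.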
Open the .bashrc file and add the following environment variables:
nano ~/.bashrc
Add these lines at the end of the file:
export HADOOP_HOME=/path/to/hadoop
export HIVE_HOME=/path/to/hive
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin
Save the file by pressing CTRL+O and exit with CTRL+X. Then, update the environment:
source ~/.bashrc
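Before moving on, it can help to confirm that both variables point at real installations. The following is a minimal sketch, not something shipped with Hadoop or Hive: check_home is a hypothetical helper name, and it assumes each installation keeps its launcher under bin/ (bin/hadoop and bin/hive).

```shell
# check_home NAME DIR TOOL
# Reports whether DIR is set and contains an executable bin/TOOL.
check_home() {
  name=$1 dir=$2 tool=$3
  if [ -z "$dir" ]; then
    echo "$name is not set"
    return 1
  fi
  if [ ! -x "$dir/bin/$tool" ]; then
    echo "$name=$dir has no executable bin/$tool"
    return 1
  fi
  echo "$name=$dir looks usable"
}

status=0
check_home HADOOP_HOME "${HADOOP_HOME:-}" hadoop || status=1
check_home HIVE_HOME "${HIVE_HOME:-}" hive || status=1
[ "$status" -eq 0 ] || echo "Fix the exports in ~/.bashrc, then run: source ~/.bashrc"
```

If either check fails, the most likely cause is that the /path/to/hadoop or /path/to/hive placeholder was not replaced with the actual installation directory.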
Open core-site.xml in the Hadoop configuration directory:
nano $HADOOP_HOME/etc/hadoop/core-site.xml
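Add the following property. If the file already contains an empty <configuration> element, put the property inside it rather than adding a second <configuration> element: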
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Open hive-site.xml in the Hive configuration directory (nano creates the file if it does not exist yet):
nano $HIVE_HOME/conf/hive-site.xml
Add the following settings:
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>
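A typo in a property name is easy to miss, so it can be worth reading a value back from core-site.xml or hive-site.xml after editing. The sketch below is one way to do that with awk. xml_property is a hypothetical helper, not a Hadoop or Hive command, and it assumes plain <name> and <value> elements laid out as in the snippets above; it is not a general XML parser. The demo runs against a throwaway copy of the Hive settings so that it works on any machine.

```shell
# xml_property FILE NAME
# Prints the <value> of the first property whose <name> equals NAME.
xml_property() {
  awk -v key="$2" '
    /<name>/ { name = $0; gsub(/.*<name>|<\/name>.*/, "", name) }
    /<value>/ && name == key {
      value = $0; gsub(/.*<value>|<\/value>.*/, "", value)
      print value
      exit
    }
  ' "$1"
}

# Demo on a temporary file holding the same settings as hive-site.xml above.
demo=$(mktemp)
cat > "$demo" <<'EOF'
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>
EOF
xml_property "$demo" hive.metastore.warehouse.dir
xml_property "$demo" javax.jdo.option.ConnectionURL
rm -f "$demo"
```

To check the real files, pass $HIVE_HOME/conf/hive-site.xml or $HADOOP_HOME/etc/hadoop/core-site.xml as the first argument. For Hadoop-side settings, hdfs getconf -confKey fs.defaultFS reports the value Hadoop itself resolved.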
Format the Hadoop NameNode. Do this only the first time, because formatting wipes any existing HDFS metadata:
hdfs namenode -format
Start the Hadoop services:
start-dfs.sh
start-yarn.sh
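To confirm that the daemons came up, you can inspect the output of jps, the JDK's Java process lister. The sketch below assumes a single-node setup in which start-dfs.sh starts NameNode, DataNode, and SecondaryNameNode and start-yarn.sh starts ResourceManager and NodeManager. missing_daemons is a hypothetical helper name.

```shell
# Reads jps output on stdin and prints each expected daemon that is absent.
# grep -w matches whole words, so SecondaryNameNode does not count as NameNode.
missing_daemons() {
  listing=$(cat)
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    printf '%s\n' "$listing" | grep -qw "$daemon" || echo "$daemon"
  done
}

if command -v jps >/dev/null 2>&1; then
  missing=$(jps | missing_daemons)
  if [ -z "$missing" ]; then
    echo "All expected Hadoop daemons are running"
  else
    # Left unquoted on purpose so the names print on one line.
    echo "Not running:" $missing
  fi
else
  echo "jps not found; it ships with the JDK"
fi
```

If a daemon is reported as not running, its log file under $HADOOP_HOME/logs is usually the first place to look.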
Create the Hive warehouse directory in HDFS (the -p flag also creates the missing parent directories /user and /user/hive):
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod g+w /user/hive/warehouse
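A quick way to confirm the result is to list the directory itself and look at its permission string. This is a rough sketch: group_writable is a hypothetical helper, and it assumes the listing starts with a permission string such as drwxrwxr-x, where the sixth character is the group write bit.

```shell
WAREHOUSE_DIR=/user/hive/warehouse

# Succeeds when the permission string has "w" in the group write position.
group_writable() {
  case "$1" in
    ?????w*) return 0 ;;
    *) return 1 ;;
  esac
}

if command -v hadoop >/dev/null 2>&1; then
  # -d lists the directory entry itself rather than its contents.
  perms=$(hadoop fs -ls -d "$WAREHOUSE_DIR" 2>/dev/null | awk '/^[d-]/ { print $1; exit }')
  if group_writable "$perms"; then
    echo "$WAREHOUSE_DIR exists and is group-writable ($perms)"
  else
    echo "$WAREHOUSE_DIR is missing or not group-writable"
  fi
else
  echo "hadoop not found on PATH"
fi
```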
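If you're using Derby as the metastore database (as configured in hive-site.xml), initialize the schema. With the relative databaseName used there, Derby creates metastore_db in the current working directory, so run this command from the directory where you will start HiveServer2: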
$HIVE_HOME/bin/schematool -dbType derby -initSchema
Start HiveServer2. It runs in the foreground and keeps this terminal occupied:
$HIVE_HOME/bin/hiveserver2
Open another terminal and connect with Beeline:
$HIVE_HOME/bin/beeline -u jdbc:hive2://localhost:10000 -n hive
Run the following query to check the available databases:
show databases;
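HiveServer2 can take a little while to start listening, so a Beeline connection attempted right away may be refused. The sketch below waits for port 10000 and then runs the same query non-interactively with Beeline's -e option. wait_for_port is a hypothetical helper, it relies on bash's /dev/tcp feature (so run it with bash), and the 60-second limit is an arbitrary choice.

```shell
# wait_for_port HOST PORT TRIES
# Retries once per second until a TCP connection to HOST:PORT can be opened.
wait_for_port() {
  host=$1 port=$2 tries=$3
  while [ "$tries" -gt 0 ]; do
    # The subshell opens the connection and closes it again on exit.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

if command -v beeline >/dev/null 2>&1; then
  if wait_for_port localhost 10000 60; then
    beeline -u jdbc:hive2://localhost:10000 -n hive -e 'show databases;'
  else
    echo "Nothing is listening on port 10000 yet; check the HiveServer2 terminal for errors"
  fi
else
  echo "beeline not found on PATH"
fi
```

On a fresh installation the query should list the built-in default database.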
You have now configured Apache Hive with Hadoop and can run SQL queries on large datasets stored in HDFS.