Apache Hive Installation – Install Hive on Ubuntu in terminal
Apache Hive is a data warehouse system built on top of Hadoop that lets you query and analyze data using SQL-like syntax. This guide walks through installing Apache Hive 4.0.0 on Ubuntu, configuring HiveServer2, and connecting to Hive with the Beeline command-line shell.
Before you start, Java and Hadoop must already be installed and working on your machine.
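A quick way to confirm the prerequisites is to check that the relevant commands are on your PATH. The following is a minimal sketch; the helper names have_cmd and check_prereqs are made up for this guide, and finding a command on PATH does not prove that Hadoop is actually running.

```shell
#!/bin/sh
# have_cmd NAME: returns 0 if NAME resolves to a command, 1 otherwise.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# check_prereqs: print one status line per tool and return the number missing.
# The tool list (java, hadoop, hdfs) is an assumption based on this guide's steps.
check_prereqs() {
    missing=0
    for cmd in java hadoop hdfs; do
        if have_cmd "$cmd"; then
            echo "ok:      $cmd"
        else
            echo "missing: $cmd"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

check_prereqs || echo "Install or fix the missing tools before continuing."
```

If anything is reported as missing, fix your Java or Hadoop setup first; the later steps depend on both.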
Step 1: Download Hive 4.0.0 from the official Apache website (if this URL returns 404, the release has likely been moved to archive.apache.org):
wget https://dlcdn.apache.org/hive/hive-4.0.0/apache-hive-4.0.0-bin.tar.gz
Step 2: Extract the downloaded tar file:
tar -xzf apache-hive-4.0.0-bin.tar.gz
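Before relying on the extracted files, it can be worth confirming that the archive downloaded intact. The sketch below assumes you have copied the expected SHA-256 digest from the Apache release page; verify_sha256 is a hypothetical helper written for this guide, not part of Hive.

```shell
#!/bin/sh
# verify_sha256 FILE EXPECTED_HEX
# Returns 0 when FILE's SHA-256 digest equals EXPECTED_HEX, 1 otherwise.
verify_sha256() {
    actual=$(sha256sum "$1" | awk '{print $1}') || return 1
    [ "$actual" = "$2" ]
}

# Hypothetical usage; replace the placeholder with the published digest:
# verify_sha256 apache-hive-4.0.0-bin.tar.gz "<expected digest>" && echo "checksum ok"
```

If the function reports a mismatch, download the archive again rather than continuing with a possibly corrupted file.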
Step 3: Set the Hive environment variables in your ~/.bashrc file:
nano ~/.bashrc
Add the following lines:
export HIVE_HOME="/home/ubuntu/apache-hive-4.0.0-bin"
export PATH=$PATH:$HIVE_HOME/bin
Save the file by pressing CTRL+O, then exit with CTRL+X.
Step 4: Load the updated environment variables:
source ~/.bashrc
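To check that the variables were picked up, you can test whether $HIVE_HOME/bin really is one of the PATH entries. This is a small sketch; path_contains is a helper name invented for this guide.

```shell
#!/bin/sh
# path_contains DIR: returns 0 if DIR is one of the colon-separated PATH entries.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if [ -n "${HIVE_HOME:-}" ] && path_contains "$HIVE_HOME/bin"; then
    echo "HIVE_HOME=$HIVE_HOME and its bin directory is on PATH"
else
    echo "HIVE_HOME is unset or \$HIVE_HOME/bin is not on PATH; re-check ~/.bashrc"
fi
```

Wrapping PATH in colons before matching avoids false positives from entries that merely start with the same prefix.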
Step 5: Edit the core-site.xml file in Hadoop's configuration directory (/etc/hadoop/ here; on a tarball install it is usually $HADOOP_HOME/etc/hadoop/):
nano /etc/hadoop/core-site.xml
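Add the following properties inside the file's existing <configuration> element. The hive in each property name is the OS user that runs HiveServer2; if you start it as a different user, substitute that user name: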
<configuration>
<property>
<name>hadoop.proxyuser.hive.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.hosts</name>
<value>*</value>
</property>
</configuration>
Save and exit, then restart Hadoop so the new proxy-user settings take effect.
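A typo in a property name fails silently, so a quick check of the edited file can save time later. The sketch below does a plain text match rather than parsing the XML, and has_property is a hypothetical helper name, not a Hadoop tool.

```shell
#!/bin/sh
# has_property FILE NAME: returns 0 if FILE contains a <name>NAME</name> element.
# This is a fixed-string match, not an XML parse, so it only catches omissions and typos.
has_property() {
    grep -qF "<name>$2</name>" "$1"
}

# Hypothetical usage against the file edited in Step 5:
# for p in hadoop.proxyuser.hive.groups hadoop.proxyuser.hive.hosts; do
#     has_property /etc/hadoop/core-site.xml "$p" || echo "missing: $p"
# done
```

Once Hadoop is running again, hdfs getconf -confKey hadoop.proxyuser.hive.hosts should print the effective value, which is a stronger check than inspecting the file.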
Step 6: Create Hive directories in HDFS:
hadoop fs -mkdir -p /tmp
hadoop fs -mkdir -p /user/hive/warehouse
Step 7: Grant write permissions:
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
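To confirm that the directories from Steps 6 and 7 exist, you can query HDFS with hadoop fs -test -d. This is a minimal sketch that only defines helpers; hdfs_dir_exists and report_hive_dirs are names made up for this guide, and the check needs a running HDFS.

```shell
#!/bin/sh
# hdfs_dir_exists PATH: returns 0 if PATH exists in HDFS and is a directory.
hdfs_dir_exists() {
    hadoop fs -test -d "$1"
}

# report_hive_dirs: print the status of each directory this guide creates.
report_hive_dirs() {
    for d in /tmp /user/hive/warehouse; do
        if hdfs_dir_exists "$d"; then
            echo "present: $d"
        else
            echo "absent:  $d"
        fi
    done
}
```

Call report_hive_dirs after Step 7; any line starting with "absent" means the corresponding mkdir did not succeed.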
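Step 8: Initialize the metastore schema in the embedded Derby database (with the default configuration this creates a metastore_db directory in the current working directory, so run the next step from the same directory):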
$HIVE_HOME/bin/schematool -dbType derby -initSchema
Step 9: Start HiveServer2:
$HIVE_HOME/bin/hiveserver2
Step 10: Leave HiveServer2 running, open a new terminal, and connect using Beeline:
$HIVE_HOME/bin/beeline -u jdbc:hive2://localhost:10000 -n hive
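HiveServer2 can take a little while to start listening on port 10000, so the first connection attempt may be refused. One option is a small retry loop like the sketch below; retry is a generic helper written for this guide, and the usage line assumes that beeline exits with a non-zero status when it cannot connect.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        n=$((n + 1))
        # Sleep only if another attempt is still to come.
        if [ "$n" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Hypothetical usage: try up to 30 times, 2 seconds apart.
# retry 30 2 "$HIVE_HOME/bin/beeline" -u jdbc:hive2://localhost:10000 -n hive -e 'show databases;'
```

The -e option runs a single statement and exits, which makes the call suitable as a readiness probe before you open an interactive session.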
Step 11: Run a simple Hive query to list databases:
show databases;
You have now installed Apache Hive 4.0.0 on Ubuntu and can run SQL queries against the data in your Hadoop cluster. For further Hive tutorials and advanced query operations, see orientalguru.co.in.