Apache Hive Installation – Install Hive on Ubuntu in 5 Minutes

Apache Hive is a data warehouse tool built on top of Hadoop. This guide walks you through installing Apache Hive 4.0.0 on Ubuntu, configuring HiveServer2, and interacting with Hive through the Beeline command-line shell.

What is Apache Hive?

Apache Hive is a data warehouse system that lets you query and analyze data stored in Hadoop using a SQL-like language (HiveQL). To install Hive, you need working Java and Hadoop installations on your Linux system.

Prerequisites

Java installed and configured
Hadoop up and running
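
Before going further, it can help to confirm that both prerequisites are reachable from your shell. The sketch below is one way to do that; the helper name check_prereqs is made up for this guide and is not part of Java or Hadoop.

```shell
# check_prereqs is a hypothetical helper: it reports whether each named
# command is on PATH and returns non-zero if any of them is missing.
check_prereqs() {
  local missing=0 cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found: $cmd"
    else
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# The two tools this guide depends on.
check_prereqs java hadoop || echo "Install the missing prerequisites before continuing."
```

If both are found, java -version and hadoop version print the installed versions.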


Step-by-step guide to installing Apache Hive 4.0.0 on Ubuntu

1. Download Hive

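Step 1: Download Hive 4.0.0 from the official Apache website. If this release is no longer available on the download mirror, Apache keeps older releases at archive.apache.org/dist/hive/: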

wget https://dlcdn.apache.org/hive/hive-4.0.0/apache-hive-4.0.0-bin.tar.gz
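
Apache publishes checksum files alongside its release archives, so it is worth verifying the download before extracting it. The following is a minimal sketch, assuming sha256sum is available (it ships with Ubuntu's coreutils); verify_sha256 is a hypothetical helper name, and the expected digest is something you copy yourself from the official download page.

```shell
# verify_sha256 is a hypothetical helper: it compares a file's SHA-256 digest
# with an expected value that you copy from the official Apache download page.
verify_sha256() {
  local file=$1 expected=$2 actual
  actual=$(sha256sum "$file" | awk '{print $1}')
  if [ "$actual" = "$expected" ]; then
    echo "checksum OK: $file"
  else
    echo "checksum MISMATCH: $file" >&2
    return 1
  fi
}

# Usage (the second argument is a placeholder, not the real digest):
# verify_sha256 apache-hive-4.0.0-bin.tar.gz "<digest from the download page>"
```

A mismatch usually means an incomplete or corrupted download, in which case downloading the file again is the simplest fix.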

Step 2: Extract the downloaded tar file:

tar -xzf apache-hive-4.0.0-bin.tar.gz

2. Configuring Hive Files

Step 3: Set Hive environment variables in the .bashrc file:

nano ~/.bashrc

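Add the following lines, adjusting the path to wherever you extracted the archive (the example assumes the home directory of a user named ubuntu):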

export HIVE_HOME="/home/ubuntu/apache-hive-4.0.0-bin"
export PATH=$PATH:$HIVE_HOME/bin

Save the file by pressing CTRL+O and then Enter, then exit with CTRL+X.

Step 4: Load the updated environment variables:

source ~/.bashrc
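
To confirm the variables took effect, you can check that HIVE_HOME is set and that its bin directory is on PATH. This is a small sketch; path_contains is a hypothetical helper written for this guide.

```shell
# path_contains is a hypothetical helper: succeeds if directory $1 is an entry
# in the colon-separated list $2 (defaults to the current PATH).
path_contains() {
  case ":${2:-$PATH}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if [ -n "${HIVE_HOME:-}" ] && path_contains "$HIVE_HOME/bin"; then
  echo "HIVE_HOME=$HIVE_HOME and its bin directory is on PATH"
else
  echo "HIVE_HOME is unset or \$HIVE_HOME/bin is not on PATH; re-check ~/.bashrc"
fi
```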

Step 5: Edit the core-site.xml file in Hadoop’s configuration directory. This guide uses /etc/hadoop/; in many installations the file lives under $HADOOP_HOME/etc/hadoop/ instead:

nano /etc/hadoop/core-site.xml

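Add the two properties below inside the file's existing <configuration> element rather than creating a second one. They let the hive account act on behalf of other users; if HiveServer2 runs under a different OS account, use that account name in place of hive in the property names: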

<configuration>
  <property>
    <name>hadoop.proxyuser.hive.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hive.hosts</name>
    <value>*</value>
  </property>
</configuration>

Save and exit. If Hadoop is already running, restart it so the new proxy-user settings take effect.
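
A quick text check can catch a typo in the property names before you move on. The sketch below only searches the file for the two property names and does not validate the XML; has_proxyuser is a hypothetical helper, and the path and account name in the usage line are the ones assumed in this guide.

```shell
# has_proxyuser is a hypothetical helper: succeeds if both proxy-user
# properties for account $2 appear in the Hadoop config file $1.
has_proxyuser() {
  local file=$1 user=$2
  grep -qF "<name>hadoop.proxyuser.$user.groups</name>" "$file" &&
    grep -qF "<name>hadoop.proxyuser.$user.hosts</name>" "$file"
}

# Usage, with the path and account name used in this guide:
# has_proxyuser /etc/hadoop/core-site.xml hive && echo "proxy user configured"
```

To see the value Hadoop actually resolves, hdfs getconf -confKey hadoop.proxyuser.hive.hosts prints it from the loaded configuration.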

3. Create Hive Warehouse Directory in HDFS

Step 6: Create Hive directories in HDFS:

hadoop fs -mkdir -p /tmp
hadoop fs -mkdir -p /user/hive/warehouse

Step 7: Grant write permissions:

hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
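
Once the directories are created, you may want to confirm they exist before initializing the metastore. The following sketch relies on hadoop fs -test -d, which succeeds when the given path is a directory; check_hdfs_dirs is a hypothetical helper name.

```shell
# check_hdfs_dirs is a hypothetical helper: it prints the status of each HDFS
# path given and returns non-zero if any of them is not a directory.
check_hdfs_dirs() {
  local rc=0 dir
  for dir in "$@"; do
    if hadoop fs -test -d "$dir"; then
      echo "ok: $dir"
    else
      echo "missing: $dir"
      rc=1
    fi
  done
  return "$rc"
}

# Usage, with the directories created in Step 6:
# check_hdfs_dirs /tmp /user/hive/warehouse
```

To inspect the permissions set in Step 7, hadoop fs -ls -d /tmp /user/hive/warehouse lists the directories themselves rather than their contents.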

4. Initialize Derby Database

Step 8: Initialize the Derby database:

$HIVE_HOME/bin/schematool -dbType derby -initSchema
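
With the default embedded Derby configuration, the metastore is created as a metastore_db directory inside whatever directory you run schematool from, and running -initSchema again in the same place typically fails because the schema already exists. The wrapper below is a sketch that guards against that; init_derby_once is a hypothetical name, and it assumes the default metastore_db location has not been changed in hive-site.xml.

```shell
# init_derby_once is a hypothetical wrapper: it runs schematool only when no
# metastore_db directory exists yet in the given directory (default: current).
init_derby_once() {
  local dir=${1:-.}
  if [ -d "$dir/metastore_db" ]; then
    echo "metastore_db already exists in $dir; skipping initialization"
    return 0
  fi
  (cd "$dir" && "$HIVE_HOME/bin/schematool" -dbType derby -initSchema)
}

# Usage, from the directory where you will later start HiveServer2:
# init_derby_once
```

Start HiveServer2 from the same directory so it finds this database. Running schematool -dbType derby -info there reports the schema version that was initialized.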

5. Launch HiveServer2

Step 9: Start HiveServer2:

$HIVE_HOME/bin/hiveserver2

Step 10: Open a new terminal and connect using Beeline:

$HIVE_HOME/bin/beeline -u jdbc:hive2://localhost:10000 -n hive
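
HiveServer2 can take a while to start accepting connections, so the Beeline command above may report a refused connection if it is run too early. One way to handle this is to wait for the port to open first. The sketch below uses bash's /dev/tcp redirection, so it assumes bash rather than a plain POSIX shell; wait_for_port is a hypothetical helper, and port 10000 is the one from the JDBC URL in this guide.

```shell
# wait_for_port is a hypothetical helper: it polls host $1, port $2 once per
# second and gives up after $3 seconds (default 60). Requires bash.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-60} waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # The subshell opens a TCP connection and closes it again on exit.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

# Usage: wait up to two minutes, then connect.
# wait_for_port localhost 10000 120 && "$HIVE_HOME/bin/beeline" -u jdbc:hive2://localhost:10000 -n hive
```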

6. Verify Hive Installation

Step 11: Run a simple Hive query to list databases:

show databases;
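
The same check can be run without an interactive session by passing the query to Beeline's -e option. This is a minimal sketch: run_hive_query is a hypothetical wrapper, HIVE_JDBC_URL and HIVE_USER are made-up variable names for overriding the defaults from this guide, and it assumes beeline is on PATH as set up in Step 3.

```shell
# run_hive_query is a hypothetical wrapper: it runs one HiveQL statement
# through Beeline and exits. HIVE_JDBC_URL and HIVE_USER are illustrative
# overrides, not variables that Hive itself reads.
run_hive_query() {
  beeline -u "${HIVE_JDBC_URL:-jdbc:hive2://localhost:10000}" \
          -n "${HIVE_USER:-hive}" \
          -e "$1"
}

# Usage:
# run_hive_query "show databases;"
```

On a fresh installation the result should include the built-in default database.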

Conclusion

You have now installed Apache Hive 4.0.0 on Ubuntu, initialized a Derby-backed metastore, and connected to HiveServer2 with Beeline, so you can run SQL queries against data stored in Hadoop. For further Hive tutorials and advanced query operations, stay tuned to orientalguru.co.in!
