Internal (Managed) Tables: Step-by-Step Guide

3/17/2025

Introduction

Apache Hive is a data warehousing system for querying and managing large datasets stored in Hadoop. In Apache Hive, a plain CREATE TABLE statement produces an internal (managed) table, meaning Hive owns both the table metadata and the underlying data files. This guide walks through creating, loading, querying, and dropping an internal table, and closes with a few best practices.

What are Internal (Managed) Tables?

Internal Tables, also known as Managed Tables, are tables where Hive controls both the metadata and the data. When a managed table is dropped, Hive removes the table definition from the metastore and the data files from storage. These tables are useful when Hive should own the full lifecycle of the data.

Key Features of Internal Tables:

  • Data is stored under Hive’s warehouse directory, set by hive.metastore.warehouse.dir (default: /user/hive/warehouse).
  • Dropping the table deletes both metadata and actual data.
  • Ideal for temporary datasets or fully managed Hive workflows.

Step-by-Step Guide to Creating an Internal Table

Step 1: Launch Hive

To begin, open your terminal and start the Hive shell by running:

hive

Step 2: Create an Internal Table

Use the CREATE TABLE statement to define the schema for your Internal Table. For example:

CREATE TABLE employees (
    emp_id INT,
    name STRING,
    age INT,
    department STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

Because the statement has no EXTERNAL keyword, this creates an Internal Table named employees, stored as comma-delimited text in a directory under Hive’s warehouse directory.
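
One way to confirm the table type and storage location is to inspect the table metadata. The following is a minimal sketch, assuming the employees table was created in the current database:

```sql
-- Show detailed metadata for the employees table.
-- For a managed table, the output should include "Table Type: MANAGED_TABLE"
-- and a Location under the configured warehouse directory.
DESCRIBE FORMATTED employees;

-- Print the warehouse directory configured for this session.
SET hive.metastore.warehouse.dir;
```

The exact layout of the DESCRIBE FORMATTED output varies between Hive versions, so look for the Table Type and Location rows rather than relying on a fixed position.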

Step 3: Load Data into the Table

You can populate the table with either the LOAD DATA command or INSERT INTO statements.

Method 1: Load Data from a Local File

LOAD DATA LOCAL INPATH '/home/user/employees.csv' INTO TABLE employees;

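With the LOCAL keyword, Hive copies the file from the local file system into the table’s directory under the warehouse, and the original file stays in place. Without LOCAL, the path refers to a file already in HDFS, and that file is moved rather than copied.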
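
Two variations of LOAD DATA are worth knowing. The sketch below uses a hypothetical HDFS path, /data/staging/employees.csv, which is an assumption for illustration only:

```sql
-- Hypothetical HDFS path. Without LOCAL, the source file is moved
-- into the table's directory instead of being copied.
LOAD DATA INPATH '/data/staging/employees.csv' INTO TABLE employees;

-- OVERWRITE replaces the table's existing data instead of appending to it.
LOAD DATA LOCAL INPATH '/home/user/employees.csv' OVERWRITE INTO TABLE employees;
```

LOAD DATA is a pure copy or move operation. Hive does not parse or validate the file at load time, so a file that does not match the table's delimiter or column order only shows up as NULL or misaligned values when you query it.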

Method 2: Insert Data Manually

INSERT INTO TABLE employees VALUES (1, 'John Doe', 30, 'IT');
INSERT INTO TABLE employees VALUES (2, 'Jane Smith', 28, 'HR');
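
Several rows can also be inserted in a single statement. A minimal sketch, with sample values that are illustrative only:

```sql
-- Insert multiple rows with one statement (requires Hive 0.14 or later).
INSERT INTO TABLE employees VALUES
    (3, 'Alice Brown', 35, 'Finance'),
    (4, 'Bob Lee', 41, 'IT');
```

Each INSERT statement typically runs as its own job and writes at least one new file, so batching rows this way tends to produce fewer small files than issuing one statement per row.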

Step 4: Query the Table

Once the data is loaded, you can run queries like:

SELECT * FROM employees;
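
Beyond selecting every row, the same table supports filtering and aggregation. The queries below are a short sketch using only the columns defined in Step 2:

```sql
-- Employees in one department, oldest first.
SELECT emp_id, name, age
FROM employees
WHERE department = 'IT'
ORDER BY age DESC;

-- Headcount and average age per department.
SELECT department, COUNT(*) AS headcount, AVG(age) AS avg_age
FROM employees
GROUP BY department;
```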

Step 5: Drop the Table (Optional)

If you no longer need the table, you can drop it:

DROP TABLE employees;

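Because employees is a Managed Table, Hive removes the data files along with the metadata. If HDFS trash is enabled, the files are first moved to the trash directory and can be recovered until the trash is emptied; otherwise they are deleted immediately.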
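
DROP TABLE accepts two optional clauses that change this behavior. The following sketch shows both; the statements are alternatives, and you would normally run only one of them:

```sql
-- IF EXISTS makes the intent explicit: no error if the table is absent.
DROP TABLE IF EXISTS employees;

-- PURGE skips the trash, so the data cannot be restored from it afterwards.
DROP TABLE IF EXISTS employees PURGE;
```

Use PURGE only when you are sure the data is no longer needed, since it removes the safety net that trash would otherwise provide.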

Best Practices for Internal Tables

  • Use Internal Tables for datasets that are entirely managed by Hive.
  • Be cautious when dropping tables: unless HDFS trash is enabled, the deleted data cannot be recovered.
  • Regularly back up important data if you need to retain it.
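
As one possible way to back up a managed table from within Hive, the sketch below shows two options. The table name employees_backup and the HDFS path /backups/hive/employees are hypothetical and chosen only for illustration:

```sql
-- Option 1: copy the current rows into a separate table (CREATE TABLE AS SELECT).
CREATE TABLE employees_backup AS
SELECT * FROM employees;

-- Option 2: export the table's data and metadata to a hypothetical HDFS directory.
EXPORT TABLE employees TO '/backups/hive/employees';
```

Note that a table created with CREATE TABLE AS SELECT is itself a managed table, so it stays inside the warehouse and is subject to the same drop behavior. An export directory lives outside the warehouse and can later be restored with IMPORT.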

Conclusion

Internal (Managed) Tables suit data whose entire lifecycle lives in Hive: Hive stores the files in its warehouse directory and removes them when the table is dropped. With the steps in this guide, you can create, load, query, and drop a Managed Table.
