Creating an External Hive Table and Importing Data

We will see how to create an external table in Hive and how to import data into it.

External Table

An external table in Hive does not store its data in the Hive warehouse directory; only the metadata about the table is stored in the Hive metastore. While creating an external table, any directory on HDFS can be pointed to as the table's data location, and all files inside that directory are treated as table data.

When an external table is dropped, only the table metadata is deleted from the Hive metastore. The directory containing the data remains intact.

External tables thus allow a user to manage data outside of Hive.
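
For contrast, here is a minimal sketch of a managed table, created without the EXTERNAL keyword (the table name Employee_managed is hypothetical). Its data files live under the Hive warehouse directory, and dropping the table deletes the data along with the metadata:

CREATE TABLE Employee_managed(
ID BIGINT,
NAME STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';

-- data files live under the warehouse directory, e.g. /user/hive/warehouse/employee_managed
drop table Employee_managed;  -- deletes the metadata AND the data files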

Data on HDFS

We have a directory '/home/hadoop/employee_data/' containing a data file employees_1.csv. We will treat this directory as the location for the external table Employee.

Contents of employees_1.csv:-

1,aarti,28,30000
2,sakshi,22,20000
3,mahesh,25,25000
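
If the directory and file are not on HDFS yet, they can be put there with standard hadoop fs commands along these lines (a sketch, assuming employees_1.csv sits in the current local directory):

hadoop fs -mkdir -p /home/hadoop/employee_data
hadoop fs -put employees_1.csv /home/hadoop/employee_data/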

Create Table

We will create an external table Employee using the directory '/home/hadoop/employee_data/' on HDFS as the data location. We need to use the EXTERNAL keyword while creating an external table :-

CREATE EXTERNAL TABLE Employee(
ID BIGINT,
NAME STRING,
AGE INT,
SALARY BIGINT
)
COMMENT 'This is External table Employee using textfile format'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE LOCATION '/home/hadoop/employee_data';
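
To confirm that the table was registered as external, we can inspect its metadata; in the output of the following query, the Table Type field should read EXTERNAL_TABLE (output trimmed here for brevity):

DESCRIBE FORMATTED Employee;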

Verify Data

Once the table is created, we can see that the data from the files in the HDFS directory can be queried as the Employee table's data.

select * from Employee;
+--------------+----------------+---------------+------------------+--+
| employee.id  | employee.name  | employee.age  | employee.salary  |
+--------------+----------------+---------------+------------------+--+
| 1            | aarti          | 28            | 30000            |
| 2            | sakshi         | 22            | 20000            |
| 3            | mahesh         | 25            | 25000            |
+--------------+----------------+---------------+------------------+--+

Add Data to the Table by Dumping Files into HDFS Directory

We have a file '/home/hadoop/employees_2.csv' on HDFS. Now let us copy this file into the '/home/hadoop/employee_data/' directory.

Contents of employees_2.csv :-

10001,rajesh,29,50000
10002,rahul,23,25000
10003,dinesh,35,70000
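
Since employees_2.csv is already on HDFS, the copy is a plain HDFS file operation, for example:

hadoop fs -cp /home/hadoop/employees_2.csv /home/hadoop/employee_data/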

Let's see if the data from the newly dumped file is visible as the Employee table's data.

select * from Employee;
+--------------+----------------+---------------+------------------+--+
| employee.id  | employee.name  | employee.age  | employee.salary  |
+--------------+----------------+---------------+------------------+--+
| 1            | aarti          | 28            | 30000            |
| 2            | sakshi         | 22            | 20000            |
| 3            | mahesh         | 25            | 25000            |
| 10001        | rajesh         | 29            | 50000            |
| 10002        | rahul          | 23            | 25000            |
| 10003        | dinesh         | 35            | 70000            |
+--------------+----------------+---------------+------------------+--+

Add Data to the Table by Using LOAD DATA Query

We can also add data to the table by using the LOAD DATA query. We have a file '/home/hadoop/employees_3.csv' on HDFS. We could directly copy this file into the '/home/hadoop/employee_data/' directory, or we can use the LOAD DATA query.

Contents of employees_3.csv :-

20001,john,38,36000
20002,sam,32,28000
20003,joe,35,29000

Load Data :-

LOAD DATA INPATH '/home/hadoop/employees_3.csv' INTO TABLE Employee;

We can see that the file employees_3.csv gets moved into the '/home/hadoop/employee_data/' directory. Because the INPATH refers to a location on HDFS (no LOCAL keyword), LOAD DATA moves the file rather than copying it, so it no longer exists at '/home/hadoop/employees_3.csv'.
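
We can verify this by listing the table directory; employees_3.csv should now appear alongside the earlier files:

hadoop fs -ls /home/hadoop/employee_data/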

Verify Data :-

select * from Employee;
+--------------+----------------+---------------+------------------+--+
| employee.id  | employee.name  | employee.age  | employee.salary  |
+--------------+----------------+---------------+------------------+--+
| 1            | aarti          | 28            | 30000            |
| 2            | sakshi         | 22            | 20000            |
| 3            | mahesh         | 25            | 25000            |
| 10001        | rajesh         | 29            | 50000            |
| 10002        | rahul          | 23            | 25000            |
| 10003        | dinesh         | 35            | 70000            |
| 20001        | john           | 38            | 36000            |
| 20002        | sam            | 32            | 28000            |
| 20003        | joe            | 35            | 29000            |
+--------------+----------------+---------------+------------------+--+

Drop Table

Dropping the external table does not delete the data from the '/home/hadoop/employee_data/' directory. All the data inside this directory remains intact.

drop table Employee;
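
Since only the metadata is gone, re-running the CREATE EXTERNAL TABLE statement from the Create Table section over the untouched directory makes all nine rows queryable again, with no data reload needed (a sketch of the round trip):

CREATE EXTERNAL TABLE Employee(
ID BIGINT,
NAME STRING,
AGE INT,
SALARY BIGINT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE LOCATION '/home/hadoop/employee_data';

select * from Employee;  -- the same nine rows come back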

 
