We will see different ways for inserting data into a Hive table.
We have a table Employee in Hive with the following schema:-
0: jdbc:hive2://localhost:10000> desc employee;
+-----------+------------+----------+--+
| col_name | data_type | comment |
+-----------+------------+----------+--+
| id | bigint | |
| name | string | |
| age | int | |
| salary | bigint | |
+-----------+------------+----------+--+
1. Load data from File/Directory
We can load data into a Hive table directly from a file OR from a directory(all the files in the directory will be loaded in the Hive table).
Input File :-
On our HDFS, we have a file ‘/home/hadoop/employee.csv‘ containing the following data. We will load this data in our Employee table :-
1,rajesh,28,30000 2,rahul,22,20000 3,mahesh,25,10000
Load Data :-
#To overwrite the data in the table use - LOAD DATA INPATH '/home/hadoop/employee.csv' OVERWRITE INTO TABLE Employee; #To append the data in the table use - LOAD DATA INPATH '/home/hadoop/employee.csv' INTO TABLE Employee;
After loading of data is successful, the file ‘/home/hadoop/employee.csv’ will get deleted.
2. Insert data into Hive tables from queries
We can load result of a query into a Hive table. To insert data into the table Employee using a select query on another table Employee_old use the following:-
#Overwrite data from result of a select query into the table INSERT OVERWRITE TABLE Employee SELECT id, name, age, salary from Employee_old; #Append data from result of a select query into the table INSERT INTO TABLE Employee SELECT id, name, age, salary from Employee_old;
3. Directly insert values
We can directly insert rows into a Hive table. In this case Hive actually dumps the rows into a temporary file and then loads that file into the Hive table.
#Insert a single row INSERT INTO table Employee values(50000, 'Rakesh', 28, 57000); #Insert Multiple rows INSERT INTO table Employee values(60001, 'Sudip', 34, 62000),(70001, 'Suresh', 45, 76000);