Some Configuration Properties in Hive
This post covers some of the configuration properties available in Hive.

Hive Warehouse Directory
hive.metastore.warehouse.dir
Location of the directory on HDFS that is used for storing the Hive warehouse data.
Default value: /user/hive/warehouse

0: jdbc:hive2://localhost:10000> show conf "hive.metastore.warehouse.dir";
+-----------------------+---------+-------------------------------------------------+--+
| default               | type    | desc                                            |
+-----------------------+---------+-------------------------------------------------+--+
| /user/hive/warehouse  |…
Export and Import a Hive Table/Partition
EXPORT
The EXPORT command exports the data of a table or partition to a specified output location. The metadata is exported along with the data.

EXPORT a table:
EXPORT table employee to '/home/hadoop/employee';

EXPORT a partition:
EXPORT table employee partition(department='BIGDATA') to '/home/hadoop/employee_bigdata';…
What is Hive
Apache Hive is a data warehouse infrastructure for querying, analyzing and summarizing data stored in Hadoop's HDFS. It provides an SQL-like language called HiveQL, applies schema on read, and implicitly converts queries into MapReduce, Tez or Spark jobs. Some of Hive's features: Different storage formats for data in…
Data Storage Formats in Hive
This post covers the different file formats for storing data in a Hive table. Choosing the right file format for a Hive table can save a lot of disk space and improve the performance of Hive queries.

TEXTFILE
The Textfile format stores data as plain text files. Textfile format enables rapid development…
Managed and External Tables in Hive
Hive allows us to create two types of tables: managed tables and external tables.

Managed Tables
Hive manages both the table and its data. When a managed table is dropped, Hive deletes the table's data as well as the table metadata from the Hive metastore. When we create a Hive…
Partitioning in Hive
Partitioning
We can use the partitioning feature of Hive to divide a table into partitions. Each partition of a table is associated with particular value(s) of the partition column(s). Partitioning allows Hive to run queries on a specific set of data in the table based on the value of partition…
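The idea of reading only the matching partition can be sketched in plain Python. This is a toy model rather than Hive itself: the rows, the `department` partition column and the helper `query_partition` are all hypothetical, and the `column=value` keys only mimic the way Hive names partition directories.

```python
from collections import defaultdict

# Hypothetical rows of an employee table: (id, name, department).
rows = [
    (1, "Asha", "BIGDATA"),
    (2, "Ben", "WEB"),
    (3, "Chen", "BIGDATA"),
]

# Group rows by the partition column, one bucket per distinct value.
# The "department=<value>" keys imitate Hive's partition directory names.
partitions = defaultdict(list)
for emp_id, name, department in rows:
    partitions[f"department={department}"].append((emp_id, name))


def query_partition(department):
    """Return rows from the single matching partition without scanning the rest."""
    return partitions.get(f"department={department}", [])


print(sorted(partitions))
print(query_partition("BIGDATA"))
```

A filter on the partition column touches one bucket only, which is why partitioned tables can answer such queries without a full scan.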
Select Query with Joins in Hive
Joins are used in a query to combine data from two or more tables based on the values of some columns. This post shows how to write queries using joins in Hive.

Hive Tables
We have the following two tables in Hive.

Employee table containing data about employees:
0: jdbc:hive2://localhost:10000>…
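As a rough local illustration of join behaviour, the sketch below runs an inner join and a left outer join against an in-memory SQLite database instead of Hive. The `Department` table, the `dept_id` column and all sample rows are assumptions made for this example; the join syntax used is standard SQL that HiveQL also accepts.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical schemas; Hive would declare the text columns as STRING.
conn.execute("CREATE TABLE Employee (id BIGINT, name TEXT, dept_id INT)")
conn.execute("CREATE TABLE Department (dept_id INT, dept_name TEXT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "Asha", 10), (2, "Ben", 20), (3, "Chen", None)],
)
conn.executemany(
    "INSERT INTO Department VALUES (?, ?)",
    [(10, "BIGDATA"), (20, "WEB")],
)

# Inner join: only employees with a matching department row are returned.
inner_rows = conn.execute(
    "SELECT e.name, d.dept_name FROM Employee e "
    "JOIN Department d ON e.dept_id = d.dept_id ORDER BY e.id"
).fetchall()

# Left outer join: every employee is returned; missing matches become NULL.
left_rows = conn.execute(
    "SELECT e.name, d.dept_name FROM Employee e "
    "LEFT OUTER JOIN Department d ON e.dept_id = d.dept_id ORDER BY e.id"
).fetchall()

print(inner_rows)
print(left_rows)
```

The employee without a department is dropped by the inner join and kept, with a NULL department name, by the left outer join.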
Select Query with Group by clause in Hive
This post shows how to write a Select query using the Group by clause in Hive.

Hive Table
We have a table 'Employee' in Hive with the following schema and data.

0: jdbc:hive2://localhost:10000> desc Employee;
+-------------+------------+----------+--+
| col_name    | data_type  | comment  |
+-------------+------------+----------+--+
| id          | bigint     |          |
|…
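A small aggregation can be tried locally with SQLite standing in for Hive. In this sketch only the `id` column comes from the schema shown above; the `name` and `department` columns and the sample rows are hypothetical, and the query uses plain SQL that HiveQL also supports.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")

# id matches the excerpt; name and department are hypothetical columns.
conn.execute("CREATE TABLE Employee (id BIGINT, name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "Asha", "BIGDATA"), (2, "Ben", "WEB"), (3, "Chen", "BIGDATA")],
)

# Count employees per department; each group yields one output row.
counts = conn.execute(
    "SELECT department, COUNT(*) AS employees "
    "FROM Employee GROUP BY department ORDER BY department"
).fetchall()

print(counts)
```

Every column in the select list is either the grouping column or an aggregate, which is the form Hive expects for Group by queries.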
Select Query with Where clause in Hive
This post shows how to write simple 'Select' queries with the Where clause in Hive.

Hive Table
We have a table 'Employee' in Hive with the following schema.

0: jdbc:hive2://localhost:10000> desc Employee;
+-------------+------------+----------+--+
| col_name    | data_type  | comment  |
+-------------+------------+----------+--+
| id          | bigint     |          |
| name        | string …
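Row filtering with a Where clause can be sketched locally using SQLite in place of Hive. The `id` and `name` columns follow the schema above; the `department` column and the sample rows are assumptions added for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Hive (assumption).
conn = sqlite3.connect(":memory:")

# id and name follow the excerpt; department is a hypothetical column.
conn.execute("CREATE TABLE Employee (id BIGINT, name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "Asha", "BIGDATA"), (2, "Ben", "WEB"), (3, "Chen", "BIGDATA")],
)

# Keep only the rows whose department equals the filter value.
filtered = conn.execute(
    "SELECT id, name FROM Employee WHERE department = 'BIGDATA' ORDER BY id"
).fetchall()

print(filtered)
```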
Create, Use and Drop a Database in Hive
This post shows how to create, use and drop a database in Hive.

Create a Database

#List all the databases
0: jdbc:hive2://localhost:10000> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| default        |
+----------------+--+

#Create a new Database
0: jdbc:hive2://localhost:10000> create database mydb;

#List all the databases
0: jdbc:hive2://localhost:10000> show databases;…