Bucketing in Hive

Bucketing is a method of distributing data evenly across many files. The data is split into multiple buckets, and each record is placed into one of them based on some logic, usually a hashing algorithm. The bucketing feature of Hive can be used to distribute and organize table or partition data into multiple files such…

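Here is a minimal Python sketch of the hashing idea. It is not Hive's own code. It shows how rows of a table declared with CLUSTERED BY (id) INTO 4 BUCKETS could be assigned to bucket files. The sketch assumes a 32-bit INT bucketing column whose hash is the value itself, masked to be non-negative before the modulo. Hive's actual hash depends on the column type and the bucketing version, so treat the bucket numbers as illustrative. The sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, str]  # (id, name): hypothetical sample schema


def bucket_for(key: int, num_buckets: int) -> int:
    """Assumption: a 32-bit INT key hashes to itself (as Java's Integer.hashCode does).
    Masking with 0x7FFFFFFF keeps the hash non-negative before taking the modulo."""
    return (key & 0x7FFFFFFF) % num_buckets


def bucketize(rows: Iterable[Row], num_buckets: int) -> Dict[int, List[Row]]:
    """Group rows by the bucket (think: output file) they would be written to."""
    buckets: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        buckets[bucket_for(row[0], num_buckets)].append(row)
    return dict(buckets)


rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e"), (6, "f")]
for bucket, members in sorted(bucketize(rows, 4).items()):
    print(bucket, members)
```

With four buckets, ids 1 and 5 land in bucket 1, ids 2 and 6 in bucket 2, id 3 in bucket 3, and id 4 in bucket 0. Rows with the same key always map to the same bucket, which is what makes bucketed joins and sampling possible.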

Some Configuration Properties in Hive

We will see some of the configuration properties available in Hive.

Hive Warehouse Directory
hive.metastore.warehouse.dir: Location of the directory on HDFS used to store the Hive warehouse data.
Default Value: /user/hive/warehouse
0: jdbc:hive2://localhost:10000> show conf "hive.metastore.warehouse.dir";
+-----------------------+---------+-------------------------------------------------+--+
|        default        |  type   |                      desc                       |
+-----------------------+---------+-------------------------------------------------+--+
| /user/hive/warehouse  |…

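The default above applies only when nothing overrides it. This rough Python sketch assumes the override lives in a hive-site.xml file that uses the usual Hadoop configuration/property/name/value layout. It looks up a property and falls back to the documented default. The helper name get_property and the path /data/hive/warehouse are hypothetical, and other override mechanisms are not modeled.

```python
import xml.etree.ElementTree as ET
from typing import Optional

DEFAULT_WAREHOUSE_DIR = "/user/hive/warehouse"  # default value stated in the post


def get_property(xml_text: str, name: str, default: Optional[str] = None) -> Optional[str]:
    """Return the <value> of the <property> whose <name> matches, else `default`."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return default


# Hypothetical hive-site.xml that overrides the warehouse directory
site_xml = """<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/data/hive/warehouse</value>
  </property>
</configuration>"""

print(get_property(site_xml, "hive.metastore.warehouse.dir", DEFAULT_WAREHOUSE_DIR))
print(get_property("<configuration/>", "hive.metastore.warehouse.dir", DEFAULT_WAREHOUSE_DIR))
```

The first call prints the overridden path. The second uses an empty configuration and so falls back to /user/hive/warehouse.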

Export and Import a Hive Table/Partition

EXPORT
We use the EXPORT command to export the data of a table or partition to a specified output location. The command exports the metadata along with the data.
Export a table:
EXPORT TABLE employee TO '/home/hadoop/employee';
Export a partition:
EXPORT TABLE employee PARTITION (department='BIGDATA') TO '/home/hadoop/employee_bigdata';…

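Each EXPORT has a matching IMPORT that reads the exported data and metadata back in. This Python sketch only builds the statement strings. It follows the EXPORT TABLE ... TO and IMPORT TABLE ... FROM forms, with an optional PARTITION clause. The helper names and the target table employee_copy are hypothetical. Values are not escaped, so this is for illustration only and not for untrusted input.

```python
from typing import Dict, Optional


def _partition_clause(partition: Optional[Dict[str, str]]) -> str:
    """Render an optional PARTITION (col='value', ...) clause."""
    if not partition:
        return ""
    specs = ", ".join(f"{col}='{val}'" for col, val in partition.items())
    return f" PARTITION ({specs})"


def export_stmt(table: str, path: str, partition: Optional[Dict[str, str]] = None) -> str:
    """Hypothetical helper: build an EXPORT statement for a table or partition."""
    return f"EXPORT TABLE {table}{_partition_clause(partition)} TO '{path}';"


def import_stmt(table: str, path: str, partition: Optional[Dict[str, str]] = None) -> str:
    """Hypothetical helper: build the matching IMPORT statement."""
    return f"IMPORT TABLE {table}{_partition_clause(partition)} FROM '{path}';"


print(export_stmt("employee", "/home/hadoop/employee"))
print(export_stmt("employee", "/home/hadoop/employee_bigdata", {"department": "BIGDATA"}))
print(import_stmt("employee_copy", "/home/hadoop/employee"))
```

The second call reproduces the partition export shown above. The last call imports the exported table under a new name.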

What is Hive

Apache Hive is a data warehouse infrastructure for querying, analyzing and summarizing data stored in Hadoop’s HDFS. It provides an SQL-like language called HiveQL, applies schema on read, and implicitly converts queries into MapReduce, Tez or Spark jobs. Some of Hive's features: Different storage formats for data in…

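Schema on read means the files are stored as-is, and a table's schema is applied only when a query reads them. This Python sketch models that idea. It assumes a text file delimited by Ctrl-A (\x01), Hive's default field delimiter for text tables. Values that cannot be converted to the column type come back as None, where Hive would show NULL. Missing trailing fields also come back as None, and extra fields are ignored. The function and sample lines are hypothetical, and this is not how Hive's SerDe is implemented.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

FIELD_DELIM = "\x01"  # Ctrl-A: Hive's default field delimiter for text tables

Schema = Sequence[Tuple[str, Callable[[str], Any]]]


def read_with_schema(raw_lines: List[str], schema: Schema) -> List[Dict[str, Optional[Any]]]:
    """Apply a schema to raw delimited lines at read time (schema on read)."""
    rows = []
    for line in raw_lines:
        fields = line.split(FIELD_DELIM)
        row: Dict[str, Optional[Any]] = {}
        for i, (name, cast) in enumerate(schema):
            if i >= len(fields):
                row[name] = None  # missing trailing field reads as NULL
                continue
            try:
                row[name] = cast(fields[i])
            except ValueError:
                row[name] = None  # value that does not fit the column type reads as NULL
        rows.append(row)
    return rows


raw = ["1\x01Alice", "two\x01Bob", "3"]  # stored data is never validated on write
schema = [("id", int), ("name", str)]
for row in read_with_schema(raw, schema):
    print(row)
```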

Select Query with Group by clause in Hive

We will see how to write a SELECT query using the GROUP BY clause in Hive.

Hive Table
We have a table ‘Employee’ in Hive with the following schema and data.
0: jdbc:hive2://localhost:10000> desc Employee;
+-------------+------------+----------+--+
|  col_name   | data_type  | comment  |
+-------------+------------+----------+--+
| id          | bigint     |          |
|…

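The semantics of GROUP BY can be sketched outside Hive. Rows are grouped by the key column, and an aggregate is computed once per group. The Python below is a minimal model of that. The department column and the sample rows are hypothetical, because the excerpt shows only part of the table's schema.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def group_by(rows: List[Row], key: str, agg: Callable[[List[Row]], Any]) -> Dict[Any, Any]:
    """Collect rows by the value of `key`, then reduce each group with `agg`."""
    groups: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {k: agg(v) for k, v in groups.items()}


# Hypothetical sample data
employees = [
    {"id": 1, "name": "Alice", "department": "BIGDATA"},
    {"id": 2, "name": "Bob", "department": "BIGDATA"},
    {"id": 3, "name": "Carol", "department": "HR"},
    {"id": 4, "name": "Dan", "department": "BIGDATA"},
    {"id": 5, "name": "Eve", "department": "HR"},
]

# Models: SELECT department, COUNT(*) FROM Employee GROUP BY department;
print(group_by(employees, "department", len))
# Models: SELECT department, MAX(id) FROM Employee GROUP BY department;
print(group_by(employees, "department", lambda g: max(r["id"] for r in g)))
```

Each output has one entry per distinct department. This is why every non-aggregated column in a GROUP BY query's SELECT list must also appear in the GROUP BY clause.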

Select Query with Where clause in Hive

We will see how to write simple SELECT queries with a WHERE clause in Hive.

Hive Table
We have a table ‘Employee’ in Hive with the following schema.
0: jdbc:hive2://localhost:10000> desc Employee;
+-------------+------------+----------+--+
|  col_name   | data_type  | comment  |
+-------------+------------+----------+--+
| id          | bigint     |          |
| name        | string    …

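A WHERE clause keeps only the rows for which its condition is true, and the SELECT list then picks the columns. The following is a rough Python model of a query such as SELECT name FROM Employee WHERE id >= 2. It assumes the id and name columns shown in the schema, and the sample rows are hypothetical.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def select_where(rows: List[Row], columns: List[str], predicate: Callable[[Row], bool]) -> List[Row]:
    """Filter rows with `predicate` (the WHERE clause), then project `columns` (the SELECT list)."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


# Hypothetical sample data using the id and name columns
employees = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Carol"},
]

# Models: SELECT name FROM Employee WHERE id >= 2;
print(select_where(employees, ["name"], lambda r: r["id"] >= 2))
```

The sketch does not model NULLs. In Hive, a row whose id is NULL would not satisfy id >= 2 and would be filtered out.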