Some Configuration Properties in Hive

This post walks through some commonly used configuration properties in Hive, with their default values and how to view or change them.

Hive Warehouse Directory

hive.metastore.warehouse.dir

The HDFS directory in which Hive stores its warehouse data. The default database lives directly under this directory.
Default Value: /user/hive/warehouse

0: jdbc:hive2://localhost:10000> show conf "hive.metastore.warehouse.dir";
+-----------------------+---------+-------------------------------------------------+--+
|        default        |  type   |                      desc                       |
+-----------------------+---------+-------------------------------------------------+--+
| /user/hive/warehouse  | STRING  | location of default database for the warehouse  |
+-----------------------+---------+-------------------------------------------------+--+
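One way to see the effect of this property is to create a managed table and check where Hive places it. The following is a minimal sketch; the table name demo_events is hypothetical and only used for illustration, and the sketch assumes the table is created in the default database.

```sql
-- demo_events is a hypothetical table used only for this illustration
CREATE TABLE demo_events (id INT, name STRING);

-- The "Location" row in the output shows the table's directory on HDFS;
-- for a managed table in the default database it is expected to sit under
-- the directory configured in hive.metastore.warehouse.dir
DESCRIBE FORMATTED demo_events;
```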

Hive Execution Engine

hive.execution.engine

Sets the execution engine for Hive queries. The available options are mr (MapReduce), tez (Apache Tez) and spark (Apache Spark).
Default Value: mr
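The engine can be checked and changed for the current session with the set command. The sketch below assumes that Apache Tez is installed and configured on the cluster; if it is not, queries run after the switch are likely to fail.

```sql
-- View the current execution engine
SET hive.execution.engine;

-- Switch to Tez for this session only (assumes Tez is available on the cluster)
SET hive.execution.engine=tez;

-- Confirm the change
SET hive.execution.engine;
```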

Setting Number of Reducers

mapred.reduce.tasks

Sets the number of reduce tasks per job. If set to -1, Hive automatically determines the number of reducers for the job.
Default Value: -1

#View the current value
0: jdbc:hive2://localhost:10000> set mapred.reduce.tasks;
+-------------------------+--+
|           set           |
+-------------------------+--+
| mapred.reduce.tasks=-1  |
+-------------------------+--+

#Set value to 2
0: jdbc:hive2://localhost:10000> set mapred.reduce.tasks=2;

#View the modified value
0: jdbc:hive2://localhost:10000> set mapred.reduce.tasks;
+------------------------+--+
|          set           |
+------------------------+--+
| mapred.reduce.tasks=2  |
+------------------------+--+

Partitioning

hive.exec.dynamic.partition.mode

Values can be strict or nonstrict. In strict mode, the user must specify at least one static partition, which guards against accidentally overwriting all partitions. In nonstrict mode, all partitions are allowed to be dynamic.
Default Value: strict
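The difference between the two modes shows up in dynamic partition inserts. The following is a minimal sketch; the tables sales and staging_sales and their columns are hypothetical, and sales is assumed to be partitioned by (country, year).

```sql
-- Dynamic partitioning itself must be enabled
SET hive.exec.dynamic.partition=true;

-- Strict mode: at least one partition column must be given a static value.
-- Here country is static and year is resolved dynamically from the SELECT.
SET hive.exec.dynamic.partition.mode=strict;
INSERT OVERWRITE TABLE sales PARTITION (country='US', year)
SELECT id, amount, year
FROM staging_sales
WHERE country = 'US';

-- Nonstrict mode: every partition column may be dynamic.
-- The dynamic partition columns come last in the SELECT, in partition order.
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE sales PARTITION (country, year)
SELECT id, amount, country, year
FROM staging_sales;
```

In strict mode, the second insert would be rejected because it names no static partition.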

Parallel Execution

hive.exec.parallel

Values can be true or false. A Hive query is converted into a number of stages, typically MapReduce jobs. If this property is set to false, all of these jobs execute sequentially. If it is set to true, jobs that do not depend on each other can execute in parallel.
Default Value: false

hive.exec.parallel.thread.number

The maximum number of MapReduce jobs that can execute in parallel when hive.exec.parallel is set to true.
Default Value: 8
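A query whose branches do not depend on each other is a typical candidate for parallel execution. The following is a minimal sketch; the tables orders and customers are hypothetical, and whether the two branches actually run at the same time depends on the query plan and on available cluster resources.

```sql
-- Allow independent stages to run in parallel, at most 4 at a time
SET hive.exec.parallel=true;
SET hive.exec.parallel.thread.number=4;

-- The two branches of the UNION ALL read different (hypothetical) tables
-- and do not depend on each other, so their jobs are eligible to run in parallel
SELECT 'orders' AS source, COUNT(*) AS row_count FROM orders
UNION ALL
SELECT 'customers' AS source, COUNT(*) AS row_count FROM customers;
```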
