This section looks at some of the commonly used configuration properties available in Hive.
Hive Warehouse Directory
hive.metastore.warehouse.dir
Directory on HDFS in which Hive stores the warehouse data.
Default Value: /user/hive/warehouse
0: jdbc:hive2://localhost:10000> show conf "hive.metastore.warehouse.dir";
+-----------------------+---------+-------------------------------------------------+--+
|        default        |  type   |                      desc                       |
+-----------------------+---------+-------------------------------------------------+--+
| /user/hive/warehouse  | STRING  | location of default database for the warehouse  |
+-----------------------+---------+-------------------------------------------------+--+
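The warehouse location can also be read back from within a session. The statements below are a minimal sketch: demo_tbl is a hypothetical table name used only for illustration, and the paths reported depend on how the cluster is configured.

```sql
-- Print the value in effect for the current session
SET hive.metastore.warehouse.dir;

-- Show the HDFS location of the default database
DESCRIBE DATABASE default;

-- demo_tbl is a hypothetical managed table; with the default setting its
-- Location is expected to fall under /user/hive/warehouse
CREATE TABLE IF NOT EXISTS demo_tbl (id INT);
DESCRIBE FORMATTED demo_tbl;
```

The Location row in the DESCRIBE FORMATTED output is the place to look when checking where a table's data files are stored.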
Hive Execution Engine
hive.execution.engine
Sets the execution engine for Hive queries. The available options are mr (MapReduce), tez (Apache Tez) and spark (Apache Spark).
Default Value: mr
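The engine can be changed for the current session with a set command, in the same way as any other property. The sketch below assumes that Apache Tez is installed and configured on the cluster; if it is not, queries run after the switch are likely to fail.

```sql
-- View the engine currently in effect
SET hive.execution.engine;

-- Switch this session to Apache Tez (assumes Tez is available on the cluster)
SET hive.execution.engine=tez;

-- Switch back to MapReduce
SET hive.execution.engine=mr;
```

A value set this way applies only to the current session and does not change the configured default.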
Setting Number of Reducers
mapred.reduce.tasks
Sets the number of reduce tasks per job. If set to -1, Hive automatically determines the number of reducers for the job.
Default Value: -1
#View the current value
0: jdbc:hive2://localhost:10000> set mapred.reduce.tasks;
+-------------------------+--+
|           set           |
+-------------------------+--+
| mapred.reduce.tasks=-1  |
+-------------------------+--+

#Set value to 2
0: jdbc:hive2://localhost:10000> set mapred.reduce.tasks=2;

#View the modified value
0: jdbc:hive2://localhost:10000> set mapred.reduce.tasks;
+------------------------+--+
|          set           |
+------------------------+--+
| mapred.reduce.tasks=2  |
+------------------------+--+
Partitioning
hive.exec.dynamic.partition.mode
Values can be (strict/nonstrict). In strict mode, the user must specify at least one static partition, which guards against accidentally overwriting all partitions. In nonstrict mode, all partitions are allowed to be dynamic.
Default Value: strict
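The difference between the two modes is easiest to see in an insert statement. The following is a minimal sketch: sales_raw and sales_part are hypothetical tables created only for illustration, and hive.exec.dynamic.partition is set explicitly in case dynamic partitioning is not already enabled.

```sql
-- Hypothetical tables, for illustration only
CREATE TABLE IF NOT EXISTS sales_raw (id INT, amount DOUBLE, country STRING, state STRING);
CREATE TABLE IF NOT EXISTS sales_part (id INT, amount DOUBLE)
PARTITIONED BY (country STRING, state STRING);

-- Enable dynamic partitioning for this session
SET hive.exec.dynamic.partition=true;

-- strict mode: country is given statically, state is resolved dynamically
SET hive.exec.dynamic.partition.mode=strict;
INSERT OVERWRITE TABLE sales_part PARTITION (country='US', state)
SELECT id, amount, state
FROM sales_raw
WHERE country = 'US';

-- nonstrict mode: both partition columns are resolved dynamically
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE sales_part PARTITION (country, state)
SELECT id, amount, country, state
FROM sales_raw;
```

The dynamic partition columns come last in the select list, in the same order as in the PARTITION clause. Running the second insert while the mode is still strict should be rejected, because no static partition is specified.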
Parallel Execution
hive.exec.parallel
Values can be (true/false). A Hive query is converted into a number of stages, such as MapReduce jobs. If this property is set to false, all the jobs execute sequentially. If it is set to true, the jobs for stages that do not depend on each other can execute in parallel.
Default Value: false
hive.exec.parallel.thread.number
The maximum number of MapReduce jobs that can be executed in parallel when the hive.exec.parallel property is set to true.
Default Value: 8
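As an illustration, the two properties can be set together before running a query whose stages are independent of each other. This is a sketch only: web_orders and store_orders are hypothetical tables, the thread count of 4 is an arbitrary choice, and whether the jobs really run at the same time depends on the query plan and on the capacity available in the cluster.

```sql
-- Allow independent stages to run in parallel, at most 4 jobs at a time
SET hive.exec.parallel=true;
SET hive.exec.parallel.thread.number=4;

-- web_orders and store_orders are hypothetical tables; the two aggregations
-- do not depend on each other, so their jobs are candidates for parallel execution
SELECT 'web' AS src_name, COUNT(*) AS row_count FROM web_orders
UNION ALL
SELECT 'store' AS src_name, COUNT(*) AS row_count FROM store_orders;
```

Running EXPLAIN on the query shows the stage dependencies, which indicates which stages could run side by side.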