Set Up Hive 1.x

We will see how to set up Hive 1.x.

Download Hive

We will download Hive 1.2.1 from https://hive.apache.org/downloads.html

Hive distribution file to download: apache-hive-1.2.1-bin.tar.gz

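Extract the contents of the file. The command below unpacks the archive into an apache-hive-1.2.1-bin directory under the current directory; the rest of this guide assumes its contents end up in /home/hadoopUser/hive.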

tar -xvf apache-hive-1.2.1-bin.tar.gz
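
To place the files directly in the target directory instead of moving them afterwards, the extraction can be wrapped as in the following sketch. It assumes GNU tar (or another tar that supports --strip-components); extract_hive is a hypothetical helper name used only for illustration, not something shipped with Hive.

```shell
#!/usr/bin/env bash
# Sketch: unpack a Hive binary tarball straight into a target directory.
# extract_hive is a hypothetical helper, not part of the Hive distribution.
extract_hive() {
  local tarball="$1"   # e.g. apache-hive-1.2.1-bin.tar.gz
  local target="$2"    # e.g. /home/hadoopUser/hive

  # Create the target directory if it does not exist yet.
  mkdir -p "$target" || return 1

  # --strip-components=1 drops the archive's top-level directory
  # (apache-hive-1.2.1-bin/) so that bin/, conf/ and lib/ land
  # directly under the target directory.
  tar -xzf "$tarball" -C "$target" --strip-components=1
}

# Example call with the paths used in this guide:
# extract_hive apache-hive-1.2.1-bin.tar.gz /home/hadoopUser/hive
```

After a successful run, /home/hadoopUser/hive/bin and /home/hadoopUser/hive/conf should exist.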

Set Environment Variables

For Hive to work, either $HADOOP_HOME or $HADOOP_PREFIX must be set, or the hadoop executable must be on the PATH.

# Point to your Hadoop installation directory
export HADOOP_HOME=/home/hadoopUser/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
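
Before starting Hive, it can help to confirm that one of these conditions actually holds. The sketch below checks them in turn; hadoop_location is a hypothetical helper written for this guide, and it only mirrors the requirement described above rather than reproducing Hive's own launcher logic.

```shell
#!/usr/bin/env bash
# Sketch: report which hadoop executable would be found, checking
# HADOOP_HOME, then HADOOP_PREFIX, then the PATH.
# hadoop_location is a hypothetical helper, not part of Hive or Hadoop.
hadoop_location() {
  if [ -n "${HADOOP_HOME:-}" ] && [ -x "$HADOOP_HOME/bin/hadoop" ]; then
    echo "$HADOOP_HOME/bin/hadoop"
  elif [ -n "${HADOOP_PREFIX:-}" ] && [ -x "$HADOOP_PREFIX/bin/hadoop" ]; then
    echo "$HADOOP_PREFIX/bin/hadoop"
  elif command -v hadoop >/dev/null 2>&1; then
    # Fall back to whatever "hadoop" resolves to on the PATH.
    command -v hadoop
  else
    echo "hadoop not found: set HADOOP_HOME or add hadoop to the PATH" >&2
    return 1
  fi
}

# Example call:
# hadoop_location && echo "Hadoop is visible to Hive"
```

If the function prints a path, the environment is ready; if it returns a non-zero status, revisit the export lines above.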

That’s all we need to do to run Hive.
To learn about Hive metastore and its setup, continue reading.
To start using Hive with default metastore settings, jump to: Start Hiveserver2, Connect Through Beeline and Run Hive Queries

Hive Metastore

The Hive metastore stores the metadata for Hive tables and partitions in a relational database. By default, Hive uses an embedded Derby database as its metastore. In this mode, both the database and the metastore service run embedded in the main HiveServer process, and only one active user can be supported at a time. The default embedded metastore is therefore suitable only for testing and is not recommended for production use.

Default Metastore Absolute Path Setup

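By default, the metastore location is a relative path (metastore_db), so Hive creates or uses the metastore DB in whichever directory hiveserver2 is started from. Starting it from a different directory results in a new, empty metastore.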

Overriding this behavior takes a small configuration change. The Hive installation directory contains a 'conf' directory, which holds hive-default.xml.template. This file lists the default values of the configuration variables in a Hive distribution. To override any of these properties, create hive-site.xml under the 'conf' directory and add the properties you want to change.

This is the default setting in hive-default.xml.template:

<property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
      <description>JDBC connect string for a JDBC metastore</description>
</property>

Change the value as shown below and add the property to hive-site.xml, inside the file's root <configuration> element:

<property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:derby:;databaseName=/home/hadoopUser/metastore_db;create=true</value>
      <description>JDBC connect string for a JDBC metastore</description>
</property>
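
For a fresh installation with no hive-site.xml yet, the whole file can be generated in one step. The following is a minimal sketch under that assumption: write_hive_site is a hypothetical helper, it overwrites any existing hive-site.xml, and the example paths are the ones used in this guide.

```shell
#!/usr/bin/env bash
# Sketch: generate a minimal hive-site.xml that pins the embedded Derby
# metastore to an absolute path.
# write_hive_site is a hypothetical helper, not part of Hive.
# Caution: it overwrites an existing hive-site.xml in the given directory.
write_hive_site() {
  local conf_dir="$1"       # e.g. /home/hadoopUser/hive/conf
  local metastore_db="$2"   # absolute path, e.g. /home/hadoopUser/metastore_db

  mkdir -p "$conf_dir" || return 1

  # The property must sit inside the root <configuration> element.
  cat > "$conf_dir/hive-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=${metastore_db};create=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
</configuration>
EOF
}

# Example call with the paths used in this guide:
# write_hive_site /home/hadoopUser/hive/conf /home/hadoopUser/metastore_db
```

If hive-site.xml already holds other overrides, add the property by hand instead of regenerating the file.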

With this change, hiveserver2 always uses the metastore_db directory given in the configuration, regardless of the directory it is started from.

Keep in mind that the embedded metastore supports only one active user at a time and is not recommended for production use. To configure Hive to use a dedicated metastore on MySQL, see: Configure Hive Metastore on MySQL

 
