bigdata - My IT Learnings

Comparison of Storage formats in Hive - TEXTFILE vs ORC vs PARQUET

rajesh • April 4, 2016 • bigdata

We will compare the different storage formats available in Hive. The comparison is based on the size of the data on HDFS and the time taken to execute a simple query. Cluster summary: The performance is benchmarked on a 5-node Hadoop cluster. Each node is an 8-core, 8…

Sampling in Hive

rajesh • April 4, 2016 • bigdata

Sampling: Sampling is concerned with selecting a subset of data from a large dataset to run queries and verify results. The dataset may be too large to run queries on all of it. Therefore, in the development and testing phases, it is a good idea to run queries on…

Running Sampling Queries in Hive

rajesh • March 31, 2016 • bigdata

We will see how to run sampling queries in Hive. Hive Table: We have the following table, Employee, in Hive, bucketed by ID into 5 buckets: CREATE TABLE Employee( ID BIGINT, NAME STRING, AGE INT, SALARY BIGINT, DEPARTMENT STRING ) COMMENT 'This is Employee table stored as textfile clustered by…
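A bucket sample such as SELECT * FROM Employee TABLESAMPLE(BUCKET 1 OUT OF 5 ON ID) keeps only the rows whose ID hashes into the chosen bucket. The sketch below simulates that row selection in Python under a stated assumption: IDs are small non-negative BIGINTs, for which Hive's original (bucketing version 1) hash reduces to the ID itself, so the bucket is simply id % y. The sample rows and the helper names are hypothetical. Because the ON column here matches the table's CLUSTERED BY column and y equals its 5 buckets, Hive can, per its sampling documentation, read just the matching bucket instead of scanning the whole table.

```python
# Sketch of TABLESAMPLE(BUCKET x OUT OF y ON ID) on a table bucketed by ID.
# Assumption: IDs are small non-negative BIGINTs, for which Hive's original
# (bucketing version 1) hash reduces to the ID itself, so bucket = id % y.

def bucket_of(emp_id: int, num_buckets: int) -> int:
    """0-based bucket index for a small non-negative ID (assumed hash)."""
    return emp_id % num_buckets


def tablesample_bucket(rows, x: int, y: int, key: str = "id"):
    """Return the rows kept by TABLESAMPLE(BUCKET x OUT OF y ON key)."""
    if not 1 <= x <= y:
        raise ValueError("x must be between 1 and y")
    # HiveQL numbers buckets from 1, so bucket x is 0-based index x - 1.
    return [row for row in rows if bucket_of(row[key], y) == x - 1]


employees = [{"id": i, "name": f"emp{i}"} for i in range(1, 11)]
sample = tablesample_bucket(employees, x=1, y=5)
print([row["id"] for row in sample])  # → [5, 10]
```

Running all five buckets (x = 1..5) returns every row exactly once, which is why a single bucket gives a repeatable one-fifth sample.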

Complex data type in Hive: Map

rajesh • March 25, 2016 • bigdata

Map: a complex data type in Hive that stores key-value pairs. Values in a map are accessed using their keys. Create Table: When creating a table with a Map data type, we need to specify the ‘COLLECTION ITEMS TERMINATED BY’ character, which separates the different key-value pairs. ‘MAP KEYS…
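To make the two delimiters concrete, here is a minimal Python sketch of how a raw text-file MAP field splits into key-value pairs. It assumes a hypothetical column named skills declared with COLLECTION ITEMS TERMINATED BY ',' and MAP KEYS TERMINATED BY ':'; both delimiter choices are illustrative, not taken from the post. Looking up a key mirrors the HiveQL expression skills['python'].

```python
# Sketch: splitting a text-file MAP field, assuming the table was declared with
# COLLECTION ITEMS TERMINATED BY ',' and MAP KEYS TERMINATED BY ':'
# (both delimiters are hypothetical choices for this example).

def parse_hive_map(field: str, item_sep: str = ",", key_sep: str = ":") -> dict:
    """Split a raw MAP field into a dict of string keys and values."""
    result = {}
    if not field:
        return result
    for item in field.split(item_sep):           # one key-value pair per item
        key, _, value = item.partition(key_sep)  # split at the first key separator
        result[key] = value
    return result


skills = parse_hive_map("java:5,python:3,sql:4")
print(skills["python"])  # → 3
```

Values stay strings here; Hive would additionally cast them to the declared value type of the MAP column.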

Complex data type in Hive: Struct

rajesh • March 25, 2016 • bigdata

Struct: a complex data type in Hive that stores a set of fields of different data types. The fields of a struct are accessed using dot notation. Create Table: When creating a table with a Struct data type, we need to specify the ‘COLLECTION ITEMS TERMINATED BY’ character. This…
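A struct's delimited values are assigned to its fields in declaration order, and each field is then read by name. The Python sketch below models this with a namedtuple, assuming a hypothetical column address of type STRUCT<street:STRING, city:STRING, zip:INT> and a ',' collection-items delimiter. Reading address.city corresponds to the HiveQL dot notation address.city.

```python
from collections import namedtuple

# Hypothetical STRUCT<street:STRING, city:STRING, zip:INT> column named address,
# assuming COLLECTION ITEMS TERMINATED BY ',' in the table definition.
Address = namedtuple("Address", ["street", "city", "zip"])


def parse_hive_struct(field: str, item_sep: str = ",") -> Address:
    """Assign the delimited values, in declaration order, to the struct fields."""
    street, city, zip_code = field.split(item_sep)
    return Address(street=street, city=city, zip=int(zip_code))  # INT field converted


address = parse_hive_struct("12 Main St,Springfield,12345")
print(address.city)  # → Springfield
```

Unlike a map, the field names live in the table schema rather than in the data, so each row only stores the values.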

Complex data type in Hive: Array

rajesh • March 25, 2016 • bigdata

Array: a complex data type in Hive that stores an ordered collection of elements of the same type, accessible using a zero-based index. Create Table: When creating a table with an Array data type, we need to specify the ‘COLLECTION ITEMS TERMINATED BY’ character. This character will be used to specify different…
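The zero-based indexing can be illustrated with a short Python sketch. It assumes a hypothetical ARRAY<STRING> column declared with COLLECTION ITEMS TERMINATED BY ','. The element_at helper mimics arr[index] in HiveQL, including the assumption that an out-of-range index yields NULL (None here) rather than an error.

```python
# Sketch: reading a text-file ARRAY<STRING> column, assuming the table was
# declared with COLLECTION ITEMS TERMINATED BY ',' (a hypothetical choice).

def parse_hive_array(field: str, item_sep: str = ",") -> list:
    """Split a raw ARRAY field into its ordered elements."""
    return field.split(item_sep) if field else []


def element_at(arr: list, index: int):
    """Mimic HiveQL arr[index]: 0-based, None (NULL) when out of range (assumed)."""
    return arr[index] if 0 <= index < len(arr) else None


skills = parse_hive_array("java,python,sql")
print(element_at(skills, 0))  # → java
print(element_at(skills, 5))  # → None
```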

Basic Data types in Hive

rajesh • March 25, 2016 • bigdata

We will see the basic data types in Hive.
Numeric Types
TINYINT: 1-byte signed integer. Values range from -128 to 127.
SMALLINT: 2-byte signed integer. Values range from -32,768 to 32,767.
INT: 4-byte signed integer. Values range from -2,147,483,648 to 2,147,483,647.
BIGINT: 8-byte signed integer. Values range from -9,223,372,036,854,775,808 to…
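The integer ranges listed for TINYINT, SMALLINT, INT and BIGINT follow directly from their byte widths: an n-byte two's-complement signed integer covers -2^(8n-1) to 2^(8n-1) - 1. A small Python sketch that derives them:

```python
# Sketch: the ranges follow from two's-complement signed integers,
# where an n-byte type covers -2**(8n-1) .. 2**(8n-1) - 1.

INTEGER_TYPES = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}


def signed_range(num_bytes: int) -> tuple:
    """Return (min, max) for a signed two's-complement integer of num_bytes."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name, size in INTEGER_TYPES.items():
    low, high = signed_range(size)
    print(f"{name:8} {size} byte(s): {low:,} to {high:,}")
```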

Partitioning vs Bucketing in Hive

rajesh • March 23, 2016 • bigdata

We will see some of the differences between partitioning and bucketing in Hive. Partitioning: Partitioning divides a table into different partitions. Each partition is stored as a separate directory, and a partition is created for each unique value of the partition column. Hierarchical partitioning can be done by…
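The one-directory-per-value layout, and how hierarchical partitioning nests those directories, can be sketched in Python. Hive names a partition directory column=value under the table's directory. The table path /warehouse/employee and the partition columns country and department are hypothetical, and the sketch ignores the escaping Hive applies to special characters in partition values.

```python
from collections import defaultdict

# Sketch: mapping rows to partition directories. Hive names each partition
# directory column=value; the table path and columns here are hypothetical.

def partition_path(table_dir: str, row: dict, partition_cols: list) -> str:
    """Build the nested directory for a row, one level per partition column."""
    parts = [f"{col}={row[col]}" for col in partition_cols]
    return "/".join([table_dir] + parts)


def group_by_partition(table_dir, rows, partition_cols):
    """Group row ids by directory: one directory per distinct value combination."""
    layout = defaultdict(list)
    for row in rows:
        layout[partition_path(table_dir, row, partition_cols)].append(row["id"])
    return dict(layout)


rows = [
    {"id": 1, "country": "US", "department": "Sales"},
    {"id": 2, "country": "US", "department": "HR"},
    {"id": 3, "country": "IN", "department": "Sales"},
    {"id": 4, "country": "US", "department": "Sales"},
]
for path, ids in group_by_partition("/warehouse/employee", rows, ["country", "department"]).items():
    print(path, ids)
```

The number of directories grows with the number of distinct value combinations, which is why partitioning suits low-cardinality columns while bucketing fixes the count up front.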

Creating Bucketed and Sorted Table in Hive and Inserting Data

rajesh • March 23, 2016 • bigdata

Create Table: A bucketed and sorted table stores the data in different buckets, and the data in each bucket is sorted by the column specified in the SORTED BY clause when the table is created. To create a bucketed and sorted table, we need to use CLUSTERED BY (columns) SORTED…
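The key point is that sorting happens inside each bucket, not across the whole table. A minimal Python sketch of CLUSTERED BY (id) SORTED BY (salary) INTO n BUCKETS makes this visible. It assumes small non-negative integer ids, for which Hive's original (bucketing version 1) BIGINT hash reduces to id % n; the id and salary columns and the sample rows are hypothetical.

```python
# Sketch of CLUSTERED BY (id) SORTED BY (salary) INTO n BUCKETS.
# Assumption: ids are small non-negative integers, for which Hive's original
# (bucketing version 1) BIGINT hash reduces to id % n. Columns are hypothetical.

def bucket_and_sort(rows, num_buckets, bucket_col, sort_col):
    """Distribute rows into buckets, then sort each bucket by sort_col."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[row[bucket_col] % num_buckets].append(row)
    # Order is only guaranteed inside a bucket, not across buckets.
    return [sorted(bucket, key=lambda r: r[sort_col]) for bucket in buckets]


employees = [
    {"id": 1, "salary": 500},
    {"id": 3, "salary": 100},
    {"id": 2, "salary": 300},
    {"id": 5, "salary": 200},
]
result = bucket_and_sort(employees, 2, "id", "salary")
print([[row["id"] for row in bucket] for bucket in result])  # → [[2], [3, 5, 1]]
```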

Creating Bucketed Table in Hive and Inserting Data

rajesh • March 23, 2016 • bigdata

Create Table: To create a bucketed table, we need to use the CLUSTERED BY clause to define the bucketing columns and provide the number of buckets. The following query creates a table, Employee, bucketed on the ID column into 5 buckets: CREATE TABLE Employee( ID BIGINT, NAME STRING, AGE INT, SALARY…
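How does a row end up in one of the 5 buckets? The Python sketch below reproduces the assignment under one assumption: Hive's original (bucketing version 1) scheme, where a BIGINT key hashes like Java's Long.hashCode, (int)(value ^ (value >>> 32)), and the bucket is (hash & Integer.MAX_VALUE) % numBuckets. Newer Hive releases may use a Murmur3-based hash (bucketing version 2), so treat this as an illustration of the idea rather than the exact formula for every version. The function names are hypothetical.

```python
# Sketch of bucket assignment for CLUSTERED BY (ID) INTO 5 BUCKETS, assuming
# Hive's original (bucketing version 1) scheme: the BIGINT hash is
# (int)(value ^ (value >>> 32)), and the bucket is (hash & Integer.MAX_VALUE) % n.

def hive_v1_bigint_hash(value: int) -> int:
    """Java's (int)(v ^ (v >>> 32)) for a 64-bit long, as a signed 32-bit int."""
    v = value & 0xFFFFFFFFFFFFFFFF                 # view as unsigned 64-bit
    h = (v ^ (v >> 32)) & 0xFFFFFFFF               # logical shift, keep low 32 bits
    return h - (1 << 32) if h >= (1 << 31) else h  # reinterpret as signed int


def bucket_number(value: int, num_buckets: int) -> int:
    """0-based bucket index for a BIGINT key."""
    return (hive_v1_bigint_hash(value) & 0x7FFFFFFF) % num_buckets


print([bucket_number(i, 5) for i in [1, 2, 3, 7, 10, -2]])  # → [1, 2, 3, 2, 0, 1]
```

For small non-negative IDs the hash is the ID itself, so the bucket is just ID % 5; the masking only matters for negative or very large values.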
