My IT Learnings
Posts related to computer science, algorithms, software development, databases etc
 
Tag Archives: bigdata

Comparison of Storage formats in Hive - TEXTFILE vs ORC vs PARQUET

rajesh • April 4, 2016 • bigdata • Tags: bigdata, hive, hive orc format, hive parquet format, hive storage format comparisons, hive textfile format

We will compare the different storage formats available in Hive. The comparison is based on the size of the data on HDFS and the time taken to execute a simple query. Cluster summary: The performance is benchmarked on a 5-node Hadoop cluster. Each node is an 8-core, 8…
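As a rough sketch of how such a comparison could be set up, the same rows can be written once per storage format using `STORED AS`. The table names (`sales_text`, `sales_orc`, `sales_parquet`) and columns are hypothetical and are not taken from the post.

```sql
-- Hypothetical source table stored as plain delimited text.
CREATE TABLE sales_text (
  id     BIGINT,
  item   STRING,
  amount DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Copy the same rows into ORC and Parquet tables (CREATE TABLE AS SELECT).
CREATE TABLE sales_orc     STORED AS ORC     AS SELECT * FROM sales_text;
CREATE TABLE sales_parquet STORED AS PARQUET AS SELECT * FROM sales_text;

-- Run the same simple query against each table and compare the elapsed time.
SELECT item, SUM(amount) FROM sales_text    GROUP BY item;
SELECT item, SUM(amount) FROM sales_orc     GROUP BY item;
SELECT item, SUM(amount) FROM sales_parquet GROUP BY item;
```

The size of each table on HDFS can then be checked with `hdfs dfs -du -s -h` on the table's directory; the exact path depends on the warehouse location configured for the cluster.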

Continue reading

Sampling in Hive

rajesh • April 4, 2016 • bigdata • Tags: bigdata, hive, hive sampling

Sampling: Sampling is the selection of a subset of data from a large dataset in order to run queries and verify results. The dataset may be too large to run queries on the whole data. Therefore, in the development and testing phases, it is a good idea to run queries on…

Continue reading

Running Sampling Queries in Hive

rajesh • March 31, 2016 • bigdata • Tags: bigdata, hive, sampling queries

We will see how to run sampling queries in Hive. Hive Table: We have the following table, Employee, in Hive, bucketed by ID into 5 buckets: CREATE TABLE Employee (ID BIGINT, NAME STRING, AGE INT, SALARY BIGINT, DEPARTMENT STRING) COMMENT 'This is Employee table stored as textfile clustered by…
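For orientation, sampling queries against a table like this might look like the sketch below. It assumes the Employee table described above (bucketed by ID into 5 buckets) already exists and holds data; the specific bucket numbers and percentage are arbitrary choices for illustration.

```sql
-- Bucket sampling on the bucketing column: reads about one of the 5 buckets.
SELECT * FROM Employee TABLESAMPLE(BUCKET 1 OUT OF 5 ON ID);

-- Bucket sampling on rand(): rows are assigned to sample buckets at random
-- rather than by hashing a column, so the whole table is scanned.
SELECT * FROM Employee TABLESAMPLE(BUCKET 1 OUT OF 10 ON rand());

-- Block sampling: reads at least roughly 10 percent of the input data size.
SELECT * FROM Employee TABLESAMPLE(10 PERCENT);
```

Sampling on the same column and bucket count that the table was clustered by lets Hive read only the matching bucket file instead of scanning every row.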

Continue reading

Complex data type in Hive: Map

rajesh • March 25, 2016 • bigdata • Tags: bigdata, complex data type, hive, map

Map - a complex data type in Hive that stores key-value pairs. Values in a map are accessed using their keys. Create Table: While creating a table with the Map data type, we need to specify the 'COLLECTION ITEMS TERMINATED BY' character, which separates the key-value pairs from one another. 'MAP KEYS…
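A minimal sketch of such a table is shown here. The table name `student_marks`, its columns, and the chosen delimiter characters are assumptions made for illustration.

```sql
-- Hypothetical table: each student has a map of subject -> marks.
CREATE TABLE student_marks (
  id    BIGINT,
  name  STRING,
  marks MAP<STRING, INT>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','            -- separates columns
  COLLECTION ITEMS TERMINATED BY '#'  -- separates key-value pairs
  MAP KEYS TERMINATED BY ':'          -- separates a key from its value
STORED AS TEXTFILE;

-- A matching input line would look like:
--   1,Asha,math:90#physics:85

-- Look up a value by key; a key that is not present yields NULL.
SELECT name, marks['math'] AS math_marks
FROM student_marks;
```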

Continue reading

Complex data type in Hive: Struct

rajesh • March 25, 2016 • bigdata • Tags: bigdata, complex data type, hive, struct

Struct - a complex data type in Hive that stores a fixed set of named fields, which may have different data types. The fields of a struct are accessed using dot notation. Create Table: While creating a table with the Struct data type, we need to specify the 'COLLECTION ITEMS TERMINATED BY' character. This…
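The following sketch shows one way a struct column might be declared and queried. The table name `employee_address`, the field names, and the delimiters are hypothetical.

```sql
-- Hypothetical table with a struct column holding an address.
CREATE TABLE employee_address (
  id      BIGINT,
  name    STRING,
  address STRUCT<street:STRING, city:STRING, zip:INT>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','            -- separates columns
  COLLECTION ITEMS TERMINATED BY '#'  -- separates the struct's fields
STORED AS TEXTFILE;

-- A matching input line would look like:
--   1,Asha,12 Main St#Pune#411001

-- Struct fields are read with dot notation.
SELECT name, address.city, address.zip
FROM employee_address;
```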

Continue reading

Complex data type in Hive: Array

rajesh • March 25, 2016 • bigdata • Tags: array data type, bigdata, complex data type, data types, hive

Array - a complex data type in Hive that stores an ordered collection of elements of the same type, accessible using a 0-based index. Create Table: While creating a table with the Array data type, we need to specify the 'COLLECTION ITEMS TERMINATED BY' character. This character will be used to specify different…
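As an illustrative sketch, an array column could be declared and indexed as follows. The table name `employee_skills`, its columns, and the delimiters are assumptions.

```sql
-- Hypothetical table with an array column of skills.
CREATE TABLE employee_skills (
  id     BIGINT,
  name   STRING,
  skills ARRAY<STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','            -- separates columns
  COLLECTION ITEMS TERMINATED BY '#'  -- separates array elements
STORED AS TEXTFILE;

-- A matching input line would look like:
--   1,Asha,java#hive#sql

-- Index 0 is the first element; size() returns the number of elements.
SELECT name, skills[0] AS first_skill, size(skills) AS skill_count
FROM employee_skills;
```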

Continue reading

Basic Data types in Hive

rajesh • March 25, 2016 • bigdata • Tags: bigdata, data types, hive

We will see the basic data types in Hive. Numeric Types:
  • TINYINT: 1-byte signed integer. Values range from -128 to 127.
  • SMALLINT: 2-byte signed integer. Values range from -32,768 to 32,767.
  • INT: 4-byte signed integer. Values range from -2,147,483,648 to 2,147,483,647.
  • BIGINT: 8-byte signed integer. Values range from -9,223,372,036,854,775,808 to…
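A small sketch of how these integer types appear in practice is given below. The table name `numeric_demo` and its column names are hypothetical.

```sql
-- Hypothetical table using each of the integer types listed above.
CREATE TABLE numeric_demo (
  tiny_col  TINYINT,
  small_col SMALLINT,
  int_col   INT,
  big_col   BIGINT
);

-- Integer literals are INT by default; the postfixes Y, S and L mark
-- TINYINT, SMALLINT and BIGINT literals respectively.
-- (A SELECT without FROM assumes a reasonably recent Hive release.)
SELECT 100Y AS tiny_val, 100S AS small_val, 100 AS int_val, 100L AS big_val;
```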

Continue reading

Partitioning vs Bucketing in Hive

rajesh • March 23, 2016 • bigdata • Tags: bigdata, bucketing, hive, partitioning

We will see some of the differences between partitioning and bucketing in Hive. Partitioning: Partitioning divides a table into separate partitions. Each partition is stored as a separate directory. A partition is created for each unique value of the partition column. Hierarchical partitioning can be done by…
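The directory-per-partition behaviour can be sketched as below. The tables `employee_part` and `employee_staging`, their columns, and the partition values are hypothetical; `employee_staging` is assumed to contain `country` and `department` columns.

```sql
-- Hypothetical table partitioned by two columns. Partition columns are
-- declared in PARTITIONED BY, not in the regular column list.
CREATE TABLE employee_part (
  id     BIGINT,
  name   STRING,
  salary BIGINT
)
PARTITIONED BY (country STRING, department STRING)
STORED AS TEXTFILE;

-- Static partition insert: the rows land in a nested directory such as
--   <table dir>/country=IN/department=HR/
INSERT INTO TABLE employee_part PARTITION (country = 'IN', department = 'HR')
SELECT id, name, salary
FROM employee_staging
WHERE country = 'IN' AND department = 'HR';

-- List the partitions that now exist.
SHOW PARTITIONS employee_part;
```

A query that filters on `country` or `department` only needs to read the matching directories, which is the main benefit partitioning offers over a full table scan.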

Continue reading

Creating Bucketed and Sorted Table in Hive and Inserting Data

rajesh • March 23, 2016 • bigdata • Tags: bigdata, bucketing, hive, sorting

Create Table: A bucketed and sorted table stores its data in separate buckets, and the data within each bucket is sorted by the column specified in the SORTED BY clause when the table is created. To create a bucketed and sorted table, we need to use CLUSTERED BY (columns) SORTED…
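A possible shape for such a table is sketched here. The table names `employee_sorted` and `employee_staging`, the sort column, and the bucket count are assumptions for illustration.

```sql
-- On older Hive releases (1.x and earlier) this setting asks Hive to honour
-- the declared sort order on insert. Newer releases are understood to
-- enforce it automatically, so treat this line as version-dependent.
SET hive.enforce.sorting = true;

-- Hypothetical table: rows are hashed on ID into 5 buckets, and each
-- bucket is kept sorted by SALARY in descending order.
CREATE TABLE employee_sorted (
  ID         BIGINT,
  NAME       STRING,
  AGE        INT,
  SALARY     BIGINT,
  DEPARTMENT STRING
)
CLUSTERED BY (ID) SORTED BY (SALARY DESC) INTO 5 BUCKETS
STORED AS TEXTFILE;

-- Populate it from a hypothetical staging table with the same columns.
INSERT OVERWRITE TABLE employee_sorted
SELECT ID, NAME, AGE, SALARY, DEPARTMENT
FROM employee_staging;
```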

Continue reading

Creating Bucketed Table in Hive and Inserting Data

rajesh • March 23, 2016 • bigdata • Tags: bigdata, bucketing, hive

Create Table: To create a bucketed table, we need to use the CLUSTERED BY clause to define the columns for bucketing and provide the number of buckets. The following query creates a table Employee bucketed on the ID column into 5 buckets. CREATE TABLE Employee (ID BIGINT, NAME STRING, AGE INT, SALARY…
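For reference, a complete statement along these lines might look like the sketch below. Only the bucketing on ID into 5 buckets comes from the excerpt; the DEPARTMENT column, the row format, and the staging table `employee_staging` are assumptions.

```sql
-- On older Hive releases (1.x and earlier) this setting makes inserts
-- produce one file per bucket; treat it as version-dependent.
SET hive.enforce.bucketing = true;

-- Sketch of a table bucketed on ID into 5 buckets.
CREATE TABLE Employee (
  ID         BIGINT,
  NAME       STRING,
  AGE        INT,
  SALARY     BIGINT,
  DEPARTMENT STRING
)
CLUSTERED BY (ID) INTO 5 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Load through INSERT ... SELECT so that Hive assigns each row to a bucket.
INSERT OVERWRITE TABLE Employee
SELECT ID, NAME, AGE, SALARY, DEPARTMENT
FROM employee_staging;
```

Loading with a plain file copy would not redistribute rows into buckets, which is why the sketch goes through `INSERT ... SELECT`.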

Continue reading
