Array – a complex data type in Hive which can store an ordered collection of similar elements accessible using 0 based index.
Create Table
While creating a table with Array data type, we need to specify the ‘COLLECTION ITEMS TERMINATED BY’ character. This character will be used to specify different elements in an array. We will create a table containing an array<int> data type.
CREATE TABLE Array_test( id int, all_nums array<int> ) COMMENT 'This is a table stored as textfile' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY '$' STORED AS TEXTFILE;
Load Data
We have a file ‘/home/hadoop/array_test‘ on our HDFS.
The file contains 4 rows and each row contains an id and an array. Contents of the file:-
1,326362$3443$23432$875665$3443$43534$234$342 2,123$323$546$546$5476 3,435$345$678$122$98987 4,234$7234$65242$6272
Load the data into table:-
LOAD DATA INPATH '/home/hadoop/array_test' overwrite into table array_test;
Verify data
Now let us execute some queries on the array data type.
#Select the id and all_nums array for all the rows. 0: jdbc:hive2://localhost:10000> select id, all_nums from array_test; +-----+------------------------------------------------+--+ | id | all_nums | +-----+------------------------------------------------+--+ | 1 | [326362,3443,23432,875665,3443,43534,234,342] | | 2 | [123,323,546,546,5476] | | 3 | [435,345,678,122,98987] | | 4 | [234,7234,65242,6272] | +-----+------------------------------------------------+--+
#Select the id and 2nd array element from all the rows. the index is 0 based 0: jdbc:hive2://localhost:10000> select id, all_nums[1] as num from array_test; +-----+-------+--+ | id | num | +-----+-------+--+ | 1 | 3443 | | 2 | 323 | | 3 | 345 | | 4 | 7234 | +-----+-------+--+
#Select the id and 5th array element from all the rows. NULL is returned in case no #element is present at the given index. 0: jdbc:hive2://localhost:10000> select id, all_nums[4] as num from array_test; +-----+--------+--+ | id | num | +-----+--------+--+ | 1 | 3443 | | 2 | 5476 | | 3 | 98987 | | 4 | NULL | +-----+--------+--+