Complex data type in Hive: Array

Array – a complex data type in Hive which can store an ordered collection of similar elements accessible using 0 based index.

Create Table

While creating a table with Array data type, we need to specify the ‘COLLECTION ITEMS TERMINATED BY’ character. This character will be used to specify different elements in an array. We will create a table containing an array<int> data type.

CREATE TABLE Array_test(
         id int,       
         all_nums array<int> 
) COMMENT 'This is a table stored as textfile'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '$'
STORED AS TEXTFILE;

Load Data

We have a file ‘/home/hadoop/array_test‘ on our HDFS.
The file contains 4 rows and each row contains an id and an array. Contents of the file:-

1,326362$3443$23432$875665$3443$43534$234$342
2,123$323$546$546$5476
3,435$345$678$122$98987
4,234$7234$65242$6272

Load the data into table:-

LOAD DATA INPATH '/home/hadoop/array_test' overwrite into table array_test;

Verify data

Now let us execute some queries on the array data type.

#Select the id and all_nums array for all the rows.
0: jdbc:hive2://localhost:10000> select id, all_nums from array_test;
+-----+------------------------------------------------+--+
| id  |                    all_nums                    |
+-----+------------------------------------------------+--+
| 1   | [326362,3443,23432,875665,3443,43534,234,342]  |
| 2   | [123,323,546,546,5476]                         |
| 3   | [435,345,678,122,98987]                        |
| 4   | [234,7234,65242,6272]                          |
+-----+------------------------------------------------+--+
#Select the id and 2nd array element from all the rows. the index is 0 based
0: jdbc:hive2://localhost:10000> select id, all_nums[1] as num from array_test;
+-----+-------+--+
| id  |  num  |
+-----+-------+--+
| 1   | 3443  |
| 2   | 323   |
| 3   | 345   |
| 4   | 7234  |
+-----+-------+--+
#Select the id and 5th array element from all the rows. NULL is returned in case no 
#element is present at the given index.
0: jdbc:hive2://localhost:10000> select id, all_nums[4] as num from array_test;
+-----+--------+--+
| id  |  num   |
+-----+--------+--+
| 1   | 3443   |
| 2   | 5476   |
| 3   | 98987  |
| 4   | NULL   |
+-----+--------+--+

 

Leave a Reply

Your email address will not be published. Required fields are marked *