Saturday 8 February 2014

Hive Internal & External Table

A Hive table is a logical concept that’s physically comprised of a number of files in HDFS. Tables can either be

Hive Internal Table:

Internal table—If our data available into local file system then we should go for Hive internal table. Where Hive organizes them inside a warehouse directory, which is controlled by the hive.metastore.warehouse.dir property whose  default value is /user/hive/warehouse (in HDFS);

Note: In internal table the data and table is tightly coupled,  if we are trying to drop the table means both table, data and metadata droped.

**** Internal table with load is recommended.****

Example:


1. To create the internal table
    Hive>CREATE TABLE managed_table (dummy STRING);
         Row format delimited
         Fields terminated by ‘\t’;

2. Load the data into internal table
    Hive>LOAD DATA INPATH '/user/tom/data.txt' INTO table managed_table;


3. Display the content of the table

    Hive>select * from managed_table;

4. To drop the internal table
    Hive>DROP TABLE managed_table;

If you dropped the internal/managed table, including its metadata and its data, is deleted.


Hive External Table:

external table—If our data available into HDFS the we should go for Hive External table. in this case Hive doesn’t manage them.

Note: In External table the data and table is loosely  coupled, if we are trying to drop the External table, the table is droped data is available into HDFS.

**** External table with location is recommended.****

Internal tables are useful if you want Hive to manage the complete lifecycle of your data including the deletion, whereas external tables are useful when the files are being used outside of Hive.

Example:
 
1. Create External table
    Hive>CREATE EXTERNAL TABLE external_table (dummy STRING)
               Row format delimited
               Fields terminated by ‘\t’
               LOCATION '/user/tom/external_table';

2. If we are not specifying the location at the time of table creation, we can load the data manually
    Hive>LOAD DATA INPATH '/user/tom/data.txt' INTO TABLE external_table;


3. Display the content of the table
    Hive>select * from external_table;

4. To drop the internal table
    Hive>DROP TABLE external_table;

When you drop an external table, Hive will leave the data untouched and only delete the metadata


1 comment:

  1. great Hive interview tips, 99% in production, used external tables only. thanks to share

    ReplyDelete