Saturday, 8 February 2014

Hive Services

Cli -- The command line interface to Hive (the shell). This is the default service.

Hiveserver -- Runs Hive as a server exposing a Thrift service, enabling access from a range of clients written in different languages. Applications using the Thrift, JDBC, and ODBC connectors need a running Hive server to communicate with Hive. Set the HIVE_PORT environment variable to specify the port the server listens on (defaults to 10000).

Hwi -- The Hive Web Interface.

Jar -- The Hive equivalent of hadoop jar, a convenient way to run Java applications with both Hadoop and Hive classes on the classpath.

Metastore -- By default, the metastore runs in the same process as the Hive service. Using this service, it is possible to run the metastore as a standalone (remote) process. Set the METASTORE_PORT environment variable to specify the port the server listens on.
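As a rough illustration of how the services above are selected, here is a minimal Python sketch that only builds the command line and environment for a service, without running anything. It assumes the hive launcher is on the PATH and takes a service name through its --service option; the helper name build_launch and its parameters are made up for this example and are not part of Hive.

```python
import os

# Service names listed in this post, passed as `hive --service <name>`.
SERVICES = ("cli", "hiveserver", "hwi", "jar", "metastore")

# Environment variable that sets the listening port, per service (from the post).
PORT_ENV = {"hiveserver": "HIVE_PORT", "metastore": "METASTORE_PORT"}


def build_launch(service="cli", port=None, base_env=None):
    """Return (argv, env) for starting a Hive service; nothing is executed.

    Hypothetical helper for illustration only.
    """
    if service not in SERVICES:
        raise ValueError(f"unknown Hive service: {service!r}")
    # Copy the environment so the caller's mapping is never modified.
    env = dict(os.environ if base_env is None else base_env)
    if port is not None:
        if service not in PORT_ENV:
            raise ValueError(f"service {service!r} has no port variable")
        env[PORT_ENV[service]] = str(port)
    return ["hive", "--service", service], env


if __name__ == "__main__":
    argv, env = build_launch("hiveserver", port=10000, base_env={})
    print(" ".join(argv), env)
```

The returned pair could be handed to subprocess.Popen(argv, env=env) on a machine where Hive is installed; the sketch stops short of that so it can be read and run anywhere.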

Hive maintains metadata in a metastore, which is stored in a relational database. This metadata contains information about what tables exist, their columns, privileges, and more.

By default, Hive stores the metastore in Derby, an embedded Java relational database. Because it’s embedded, Derby can’t be shared between users, so it can’t be used in a multiuser environment where the metastore needs to be shared.

Hive can support multiple databases, which can be used to avoid table name collisions (two teams or users that have the same table name) and to allow separate databases for different users or products.

A Hive table is a logical concept that is physically made up of a number of files in HDFS.
Tables can be either

internal: Hive organizes them inside a warehouse directory, which is controlled by the hive.metastore.warehouse.dir property, whose default value is /user/hive/warehouse (in HDFS);

or

external: Hive doesn’t manage the underlying files.

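Internal tables are useful if you want Hive to manage the complete lifecycle of your data, including its deletion, whereas external tables are useful when the files are also used outside of Hive. Dropping an internal table deletes its files along with the metadata; dropping an external table removes only the metadata.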
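To make the warehouse layout concrete, here is a small Python sketch that computes where an internal table's files would normally live. The default warehouse path comes from the text above; the ".db" directory suffix for non-default databases is not mentioned in this post and reflects Hive's usual layout, and the function name managed_table_location is invented for this example.

```python
import posixpath

# Default value of hive.metastore.warehouse.dir, as given in the post.
DEFAULT_WAREHOUSE_DIR = "/user/hive/warehouse"


def managed_table_location(table, database="default",
                           warehouse_dir=DEFAULT_WAREHOUSE_DIR):
    """Return the HDFS directory expected for an internal (managed) table.

    Hypothetical helper: tables in the default database sit directly under
    the warehouse directory, other databases get a "<name>.db" directory.
    """
    if database == "default":
        return posixpath.join(warehouse_dir, table)
    return posixpath.join(warehouse_dir, database + ".db", table)


if __name__ == "__main__":
    print(managed_table_location("logs"))
    print(managed_table_location("logs", database="sales"))
```

An external table has no such computed location: its path is whatever the table definition points at, which is why Hive leaves those files alone.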
