Monday, 30 June 2014

Hive Architecture




Command line interface (CLI): The default and most common way of accessing Hive.
HiveServer: Runs Hive as a server exposing a Thrift service, enabling access from a range of clients written in different languages.
HWI: The Hive Web Interface, which lets users access Hive from a web browser.



Shell: The shell is the command line interface. It allows interactive queries, much like a MySQL shell connected to a database. Hive also supports web and JDBC clients.
The driver, compiler and execution engine together take HiveQL scripts and run them in the Hadoop environment.
Driver: The component that receives the queries. It implements the notion of session handles and provides execute and fetch APIs modeled on the JDBC/ODBC interfaces.
Compiler: The component that parses the query, performs semantic analysis on the different query blocks and query expressions, and eventually generates an execution plan with the help of the table and partition metadata looked up from the metastore.
Execution engine: The component that executes the plan created by the compiler. The plan is a DAG of stages. The execution engine manages the dependencies between these stages and executes each stage on the appropriate system component.
Metastore: The component that stores the system catalog, that is, the structural information about the various tables and partitions in the warehouse, including column and column type information, the serializers and deserializers (SerDes) needed to read and write the data, and the corresponding HDFS files where the data is stored.
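To make the interplay between the compiler and the metastore concrete, here is a toy Python sketch. It assumes a single-table query of the form `SELECT col, ... FROM table`, and every name in it (`TableInfo`, `ToyMetastore`, `compile_select`, the `page_views` table) is hypothetical; Hive's real metastore and compiler classes look nothing like this. It only illustrates the idea: during semantic analysis the compiler looks up table metadata in the metastore, rejects unknown tables or columns, and builds the plan from that metadata.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical, heavily simplified stand-ins for Hive components.
# None of these names are Hive's real classes or APIs.


@dataclass
class TableInfo:
    """Structural metadata the toy metastore keeps for one table."""
    columns: dict[str, str]   # column name -> column type
    serde: str                # SerDe used to read/write the data
    location: str             # HDFS directory holding the table's files
    partitions: list[str] = field(default_factory=list)


class ToyMetastore:
    """In-memory system catalog: table name -> TableInfo."""

    def __init__(self) -> None:
        self._tables: dict[str, TableInfo] = {}

    def create_table(self, name: str, info: TableInfo) -> None:
        self._tables[name] = info

    def get_table(self, name: str) -> TableInfo:
        if name not in self._tables:
            raise KeyError(f"table not found: {name}")
        return self._tables[name]


def compile_select(columns: list[str], table: str,
                   metastore: ToyMetastore) -> list[str]:
    """Toy compiler for 'SELECT <columns> FROM <table>'.

    Semantic analysis: look the table up in the metastore and check that
    every referenced column exists. Plan generation: return a list of
    human-readable steps built from that metadata.
    """
    info = metastore.get_table(table)          # metadata lookup
    unknown = [c for c in columns if c not in info.columns]
    if unknown:
        raise ValueError(f"invalid column reference(s): {unknown}")
    return [
        f"read files under {info.location} with {info.serde}",
        f"project {', '.join(columns)}",
    ]


if __name__ == "__main__":
    ms = ToyMetastore()
    ms.create_table("page_views", TableInfo(
        columns={"user_id": "BIGINT", "url": "STRING"},
        serde="LazySimpleSerDe",
        location="/user/hive/warehouse/page_views",
    ))
    print(compile_select(["url"], "page_views", ms))
```

Real Hive plans are trees of operators grouped into stages, but the point here is the same: the compiler cannot produce a plan without first consulting the metastore.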


         The compiler is invoked by the driver upon receiving a HiveQL statement. It translates the statement into a plan that consists of a DAG of MapReduce jobs.
         The driver submits the individual MapReduce jobs from the DAG to the execution engine in topological order, so that no job is submitted before the jobs it depends on. At the time of writing, Hive uses Hadoop MapReduce as its execution engine.
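Submitting jobs "in topological order" simply means that a job is handed to the execution engine only after every job it depends on. The following Python sketch computes such an order with Kahn's algorithm. The stage names and the `submit` callback are hypothetical, and Hive's actual driver code is different; in particular, this sketch submits one job at a time and does not handle job failures.

```python
from __future__ import annotations

from collections import deque
from typing import Callable


def topological_order(deps: dict[str, list[str]]) -> list[str]:
    """Kahn's algorithm. `deps` maps each stage to the stages it depends on.

    Ties between ready stages are broken alphabetically so the result is
    deterministic.
    """
    remaining = {stage: len(parents) for stage, parents in deps.items()}
    children: dict[str, list[str]] = {stage: [] for stage in deps}
    for stage, parents in deps.items():
        for parent in parents:
            children[parent].append(stage)

    # Start with the root stages, i.e. those with no unfinished dependencies.
    ready = deque(sorted(s for s, n in remaining.items() if n == 0))
    order: list[str] = []
    while ready:
        stage = ready.popleft()
        order.append(stage)
        for child in sorted(children[stage]):
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)

    if len(order) != len(deps):
        raise ValueError("plan contains a cycle, so it is not a DAG")
    return order


def submit_plan(deps: dict[str, list[str]],
                submit: Callable[[str], None]) -> None:
    """Hand each job to a (hypothetical) execution-engine callback in a valid order."""
    for stage in topological_order(deps):
        submit(stage)


if __name__ == "__main__":
    # Hypothetical plan: two scans feed a join, whose output is aggregated.
    plan = {
        "Stage-3": ["Stage-1", "Stage-2"],  # join
        "Stage-4": ["Stage-3"],             # aggregation
        "Stage-1": [],                      # scan table A
        "Stage-2": [],                      # scan table B
    }
    submit_plan(plan, lambda job: print("submitting", job))
```

Running it submits Stage-1, Stage-2, Stage-3 and Stage-4 in that order: the join waits for both scans, and the aggregation waits for the join.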