Thursday, 21 November 2013

Dancing with Sqoop

What is Sqoop?

Sqoop is a command-line interface application for transferring data between relational databases and Hadoop. In other words, you use Sqoop to import/export data between an RDBMS and Hadoop (HDFS).

You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.

Step 1: First, create a database in MySQL and switch to it:

mysql> create database <database name>;
mysql> use <database name>;

Step 2: Now grant privileges on the database we just created:

mysql> grant all privileges on *.* to '<database username>'@'%' identified by '<database password>';
mysql> flush privileges;
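
For example, with the credentials used throughout this post (username root, password training; substitute your own):

mysql> grant all privileges on *.* to 'root'@'%' identified by 'training';
mysql> flush privileges;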

Step 3: Now we need to create tables and insert values into them:

mysql> create table emp(id int,name varchar(20),sal float);
mysql> insert into emp values(101,'xxxx',1234.5);
mysql> insert into emp values(201,'yyyy',567.7);
mysql> select * from emp;
+------+------+--------+
| id   | name | sal    |
+------+------+--------+
|  101 | xxxx | 1234.5 |
|  201 | yyyy |  567.7 |
+------+------+--------+
mysql> create table empcity(city varchar(20),venue varchar(20),cityid int);
mysql> insert into empcity values('xxx','rrr',101);
mysql> insert into empcity values('xxxx','rrrx',106);

Step 4: Now we need to load these table values into HDFS on another system (PC).

Let us assume Hadoop, Sqoop, and the rest of the Hadoop ecosystem (CDH3/CDH4) are already installed.
First open the CDH3/CDH4 Hadoop cluster environment, then open a terminal and execute the following commands.

To display the list of databases in MySQL from the Hadoop system:

training@localhost$ sqoop list-databases --connect jdbc:mysql://<ip address of mysql installed system>:<portnumber(3306 default)> -username root -P

Ex: training@localhost$ sqoop list-databases --connect jdbc:mysql://192.168.1.11:3306 -username root -P

To display all the tables in a particular database from the Hadoop system:

training@localhost$ sqoop list-tables --connect jdbc:mysql://<ip address of mysql installed system>:<portnumber(3306 default)>/<database> -username <db username> -P

Ex: training@localhost$ sqoop list-tables --connect jdbc:mysql://192.168.1.11:3306/mydb -username root -P
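
If the connection succeeds, Sqoop prints one table name per line; with the tables created in Step 3, the output should look roughly like:

emp
empcity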

Import the data from MySQL to HDFS:

training@localhost$ sqoop import --connect jdbc:mysql://<ip address of mysql installed system>:<portnumber(3306 default)>/<database> --table emp --username root -P --split-by id --target-dir /user/training/emp -m 1

Ex: training@localhost$ sqoop import --connect jdbc:mysql://192.168.1.87/mydb --table emp --username root -P --split-by id --target-dir /user/training/emp
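
Once the job finishes, the rows land as comma-delimited text files under the target directory. A quick way to verify (part-m-00000 is the usual file name for a single-mapper import; it may differ):

training@localhost$ hadoop fs -cat /user/training/emp/part-m-00000
101,xxxx,1234.5
201,yyyy,567.7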

Import data into a Hive table from MySQL:

training@localhost$ sudo sqoop import --connect jdbc:mysql://<ip address of mysql installed system>:<portnumber(3306 default)>/<database name> --table <table name> --username <db username> --password <db password> --target-dir /user/training/hive/warehouse/<hive database>/<table name> -m 1

Ex: training@localhost$ sudo sqoop import --connect jdbc:mysql://192.168.3.79:3306/mydb --table emp --username root --password training --target-dir /user/training/hive/warehouse/myown.db/emp -m 1
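
Note that writing straight into the warehouse directory only copies the files; it does not register the table in the Hive metastore. Sqoop's --hive-import option creates the Hive table and loads the data in one step. A minimal sketch, assuming a Hive database named myown already exists:

training@localhost$ sqoop import --connect jdbc:mysql://192.168.3.79:3306/mydb --table emp --username root --password training --hive-import --hive-table myown.emp -m 1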

Importing a specific result set based on our requirements:

training@localhost$ sudo sqoop import --connect jdbc:mysql://<ip address of mysql installed system>:<portnumber(3306 default)>/<database name> --query "select * from emp where id=101 AND \$CONDITIONS" --username <username> --password <password> --target-dir /user/training/data -m 1

Ex: training@localhost$ sudo sqoop import --connect jdbc:mysql://192.168.3.79:3306/mydb --query "select * from emp where id=101 AND \$CONDITIONS" --username root --password training --target-dir /user/training/datadir -m 1
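
With -m 1, a single mapper runs the query as-is. To run a free-form query with more than one mapper, Sqoop requires --split-by so it can partition the rows among mappers; a sketch (the target directory name here is just illustrative):

training@localhost$ sudo sqoop import --connect jdbc:mysql://192.168.3.79:3306/mydb --query "select * from emp where \$CONDITIONS" --split-by id --username root --password training --target-dir /user/training/datadir2 -m 2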

Importing data into HDFS after performing a join:

training@localhost$ sudo sqoop import --connect jdbc:mysql://<ip address of mysql installed system>:<portnumber(3306 default)>/<database> --query "select emp.*,empcity.* from emp join empcity on (emp.id=empcity.cityid) where \$CONDITIONS" --username <dbusername> --password <password> --target-dir /user/training/datadir -m 1
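
Ex (using the emp and empcity tables from Step 3; the target directory name is just illustrative):

training@localhost$ sudo sqoop import --connect jdbc:mysql://192.168.3.79:3306/mydb --query "select emp.*,empcity.* from emp join empcity on (emp.id=empcity.cityid) where \$CONDITIONS" --username root --password training --target-dir /user/training/joindir -m 1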



Export data from HDFS to a MySQL table:

First of all we need to create a table in MySQL:

mysql> create table emp1(id int,name varchar(20),sal float);

Assuming we already have the data in HDFS, export it as follows:

training@localhost$ sudo sqoop export --connect jdbc:mysql://<ip address of mysql installed system>:<portnumber(3306 default)>/<database> --table <table name in mysql> --username <username> --password <password> --export-dir <hdfs directory> -m 1

Ex: training@localhost$ sudo sqoop export --connect jdbc:mysql://192.168.3.79:3306/mydb --table emp1 --username root --password training --export-dir /user/training/qqqq -m 1
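
Once the export completes, the rows can be checked back in MySQL:

mysql> select * from emp1;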

Export data with a specific delimiter:

training@localhost$ sudo sqoop export --connect jdbc:mysql://192.168.3.79:3306/<database name> --table <table name> --username <username> --password <password> --export-dir <hdfs directory> --input-fields-terminated-by '@' -m 1

Ex: training@localhost$ sudo sqoop export --connect jdbc:mysql://192.168.3.79:3306/mydb --table emp1 --username root --password training --export-dir /user/training/exp --input-fields-terminated-by '@' -m 1
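
--input-fields-terminated-by tells Sqoop how the fields in the HDFS files are already delimited, so it must match whatever delimiter the data was written with. If the data came from a Sqoop import, that means using a matching --fields-terminated-by on the import side; a sketch, reusing the emp table:

training@localhost$ sudo sqoop import --connect jdbc:mysql://192.168.3.79:3306/mydb --table emp --username root --password training --fields-terminated-by '@' --target-dir /user/training/exp -m 1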



