Sqoop: Importing data from MySQL into HDFS

Step 1: Install and start MySQL if you have not already done so

See the MySQL Installation Tutorial for instructions on how to install MySQL.

Step 2: Configure the MySQL Service and Connector

Download the mysql-connector-java-5.0.5.jar file and copy it to the $SQOOP_HOME/lib directory.

Step 3: Sqoop Installation

See the Sqoop Installation Tutorial for instructions on how to install Sqoop.

  • Database and table creation in MySQL

First, connect to MySQL:

$ mysql -u root -p

Enter password:

Create the database ‘testDb’ and make it the current database:

mysql> create database testDb;

mysql> use testDb;

Create the table ‘student’:

mysql> create table student(id integer,name char(20));

Add the following two records to the table:

mysql> insert into student values(1,'Archana');

mysql> insert into student values(2,'XYZ');

Exit from MySQL:

mysql> exit;

Sqoop import
1. Importing a table into HDFS

  • 1st way

Command for import

sqoop import --connect jdbc:mysql://localhost/databasename --username $USER_NAME --password $PASSWORD --table tablename -m 1

Execute the sqoop import

Here we are using the database ‘testDb’, username ‘root’, password ‘hadoop123’, and table ‘student’.

$ sqoop import --connect jdbc:mysql://localhost/testDb --username root --password hadoop123 --table student -m 1

------------------- NOTE -------------------

If you have not defined a primary key for your table, then you must pass the ‘-m 1’ option to the import.
Otherwise the import fails with the following error:
ERROR tool.ImportTool: Error during import: No primary key could be found for table student1. Please specify one with --split-by or perform a sequential import with '-m 1'.
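For background, when the table does have a numeric primary key (or a column named via --split-by), Sqoop queries that column's minimum and maximum and hands each mapper one slice of the range. The partitioning idea can be sketched with plain shell arithmetic; this is a rough illustration, not Sqoop's exact algorithm, and the min/max values here are hypothetical:

```shell
# Rough sketch of how a numeric split column is divided across mappers:
# each mapper effectively runs WHERE id >= lo AND id < hi for its slice.
MIN=1; MAX=100; MAPPERS=4                     # hypothetical bounds, as from SELECT MIN(id), MAX(id)
STEP=$(( (MAX - MIN + MAPPERS) / MAPPERS ))   # ceiling division so the slices cover the range
i=0
while [ "$i" -lt "$MAPPERS" ]; do
  lo=$(( MIN + i * STEP ))
  hi=$(( lo + STEP ))
  echo "mapper $i: id >= $lo AND id < $hi"
  i=$(( i + 1 ))
done
# mapper 0: id >= 1 AND id < 26 ... mapper 3: id >= 76 AND id < 101
```

With such a split column available, the import can run in parallel, e.g. with ‘--split-by id -m 4’ instead of ‘-m 1’.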
  • 2nd way

Create a config file $HOME/import.txt and add the following to it:

              import.txt

import
--connect
jdbc:mysql://localhost/testDb
--username
root
--password
hadoop123

Execute the sqoop import

sqoop --options-file /home/hduser/import.txt --table student -m 1
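The remaining options can live in the options file as well, shrinking the command line to just ‘sqoop --options-file $HOME/import.txt’. A sketch of such a file, assuming the same database and table as above (Sqoop expects one option or value per line):

```
import
--connect
jdbc:mysql://localhost/testDb
--username
root
--password
hadoop123
--table
student
-m
1
```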

Once the import is done, you can find student.jar, student.class, and student.java at the following location: /tmp/sqoop-hduser/compile/—-/student.jar

Files created in HDFS

$ hadoop dfs -ls -R student

Found 3 items

-rw-r--r--   1 hduser supergroup          0 2013-09-13 15:38 /user/hduser/student/_SUCCESS

drwxr-xr-x   - hduser supergroup          0 2013-09-13 15:38 /user/hduser/student/_logs

-rw-r--r--   1 hduser supergroup         16 2013-09-13 15:38 /user/hduser/student/part-m-00000

Data file contents

$ hadoop dfs -cat /user/hduser/student/part-m-00000

1,Archana
2,XYZ
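The part files are plain comma-separated text, so once a part file has been copied out of HDFS (for example with hadoop dfs -get), ordinary shell tools suffice for a quick sanity check. A small sketch that recreates the expected contents locally, purely for illustration:

```shell
# Recreate the expected import output locally, purely for illustration
printf '1,Archana\n2,XYZ\n' > student.csv

# The row count should match the number of rows inserted in MySQL
wc -l < student.csv            # 2

# Extract just the name column as a quick spot check
cut -d, -f2 student.csv        # Archana, then XYZ
```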

2. Importing all rows of a table in MySQL, but only specific columns

$ sqoop import --connect jdbc:mysql://localhost/testDb --username root --password hadoop123 --table student --columns "name" -m 1

Data file contents

$ hadoop dfs -cat  /user/hduser/student/part-m-00000

Archana
XYZ

3. Importing all columns, filtering rows with a where clause

$ sqoop import --connect jdbc:mysql://localhost/testDb --username root --password hadoop123 --table student --where "id>1" -m 1 --target-dir /user/hduser/ar

Data file contents

$ hadoop dfs -cat  /user/hduser/ar/part-m-00000
2,XYZ

------------------- NOTE -------------------

If you are facing a FileExistsException, give the output a different directory with “--target-dir /user/hduser/give_diff_name”. Note that the path in the error below points at the locally generated student.java; deleting that stale file, or moving code generation elsewhere with “--outdir”, also resolves it.
Error:
org.apache.commons.io.FileExistsException: Destination '/home/hduser/./student.java' already exists
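Another way to sidestep the clash on repeated runs is to derive a fresh target directory per run. A minimal sketch; the naming scheme here is just an illustration:

```shell
# Build a unique HDFS target directory per run to avoid collisions
TARGET_DIR="/user/hduser/student_$(date +%Y%m%d_%H%M%S)"
echo "importing into $TARGET_DIR"
# Then pass it to Sqoop (needs a running cluster, so shown as a comment):
# sqoop import --connect jdbc:mysql://localhost/testDb \
#   --username root --password hadoop123 \
#   --table student -m 1 --target-dir "$TARGET_DIR"
```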

Reference

http://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html

Note: The information provided here is to the best of my knowledge and experience. If any corrections are needed, please help me with your valuable suggestions, which are always welcome. :)


21 Responses to Sqoop: Importing data from MySQL into HDFS

  1. Thiruvengatam says:

    Thank you… It’s a really very nice post regarding Sqoop import and export. I have successfully executed it through following your post.

    If you have chance mean please share how to send the LOG files into HDFS using sqoop import.


  2. Hi Thiruvengatam, thanks for visiting. I hope you enjoyed the article 🙂


  3. Rahul Mane says:

    Hi Archana,

    It’s really helpful for beginners. Thanks for sharing; please keep posting such good posts.


  4. Venkat S says:

    Very helpful thank you for the post.


  5. ND says:

    Hi Archana,

    Wonderful blog..
    I have a query…I am getting below error while executing the sqoop import command:

    sqoop:000> sqoop import --connect jdbc:mysql://localhost/books --username root --password thanks --table authors --m 1;
    Exception has occurred during processing command
    Exception: org.codehaus.groovy.control.MultipleCompilationErrorsException Message: startup failed:
    groovysh_parse: 1: expecting EOF, found ‘import’ @ line 1, column 7.
    sqoop import --connect jdbc:mysql://localhost/books --username root --password thanks --table authors --m 1;
    ^

    1 error

    Could you let me know what could be the reason for the error?
    Thank you


  6. Dallas_user says:

    Hi,

    I am trying to import data from sqlserver into hdfs and getting the below error. I checked and see the jar file – usr/lib/sqoop/lib/jackson-mapper-asl-1.9.13.jar is present. Could you please help?

    14/09/22 20:34:19 ERROR tool.ImportTool: Encountered IOException running import job: java.io.FileNotFoundException: File does not exist: hdfs://master1:8020/usr/lib/sqoop/lib/jackson-mapper-asl-1.9.13.jar
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1128)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
    at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186)
    at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159)
    at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:247)
    at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:665)
    at org.apache.sqoop.manager.SQLServerManager.importTable(SQLServerManager.java:146)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:601)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:236)


  7. Namita says:

    Hi,

    I am trying to use Sqoop with Hadoop 2.2.0 on Linux Ubuntu 14.0 LTS
    I have used the Sqoop release sqoop-1.4.5.bin__hadoop-2.0.4-alpha.tar.gz
    but getting following error on executing sqoop import command

    14/11/24 11:53:27 INFO tool.CodeGenTool: Beginning code generation
    14/11/24 11:53:28 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
    14/11/24 11:53:28 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
    14/11/24 11:53:28 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/gslab/setup/hadoop-2.2.0/share/hadoop/mapreduce
    Note: /tmp/sqoop-gslab/compile/8874fa11f57ff37a2a3541bae93b3824/test.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    14/11/24 11:53:28 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-gslab/compile/8874fa11f57ff37a2a3541bae93b3824/test.jar
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/InputFormat
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)

    any idea what could be reason for this error


  8. balwan singh says:

    Hi,
    can somebody tell me using sqoop how to import a oracle table having all constraints (primary key+foreign keys+data types which are not there in hive but in oracle like date in oracle and timestamp in hive) into hive or hdfs


  9. Naved Khan says:

    When i use 1 way import command the it is giving this error..

    ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: No columns to generate for ClassWriter


  10. ronnie says:

    Thanks for the guide, why my sqoop imported file is not named as part-m-00000 but it is named to faac0e6b-5935-4ae3-ba4d-cbaa742623b1.txt


  11. Mahesh says:

    Good for beginner but not enough content


  12. Thank You Very Much. Great Article to explore Sqoop Commands for importing data from MySql to HDFS. I also found another good article of Importing commands to hive in http://www.geoinsyssoft.com/sqoop-commands/.


  13. Dheeraj says:

    how can we do using hive and hbase please let me know…..using sqoop.


  14. divya says:

    Thanks for simple explanation to beginners.really helpful…:)


  15. Venkat says:

    It’s very helpful. The below link also shows some information about Sqoop import from MySQL to HBase and the Hadoop ecosystem. http://www.geoinsyssoft.com/sqoop-import-hbase/ kindly visit the site.



  17. karthik says:

    hi all,
    i have installed sqoop and its running perfectly. i have checked this by “sqoop version” command. this showing sqoop 1.4.6 is running.

    After that when i tried to import MySQL table to HDFS by following command, it throws error as

    “Got error creating database manager: java.io.IOException: No manager for connect string”

    how can i resolve this. please share if you know the reason.
    thanks.


  18. Kaleem says:

    Hi All,

    My cluster has one namenode and one datanode on two separate computers.

    Sqoop import is giving the following error

    java.lang.Exception: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class student (which is Table Name) not found

    my Sqoop import command is

    sqoop import --connect "jdbc:mysql://localhost/testDB" --username sqoop --password sqoop --table tbl --m 1

    any help would be appreciated.

    Thanks
    Kaleem


  19. Hello Archana,

    This post on Sqoop is really very useful for beginners trying to learn the connectivity and fetching of specific records or tables from MySQL. Thanks for sharing, and waiting for the post on advanced Sqoop commands.

