Sqoop: Importing data from MySQL into HDFS

Step 1: Install and start MySQL if you have not already done so

See the MySQL Installation Tutorial for instructions on how to install MySQL.

Step 2: Configure the MySQL Service and Connector

Download the mysql-connector-java-5.0.5.jar file and copy it to the $SQOOP_HOME/lib directory.
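A minimal sketch of this step, assuming the connector jar was downloaded to the current directory and $SQOOP_HOME points at your Sqoop installation:

$ cp mysql-connector-java-5.0.5.jar $SQOOP_HOME/lib/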

Step 3: Sqoop Installation

See the Sqoop Installation Tutorial for instructions on how to install Sqoop.
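Once installed, a quick sanity check that Sqoop is on your PATH:

$ sqoop version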

  • Database and table creation in MySQL

First, connect to MySQL:

$ mysql -u root -p

Enter password:

Create the database ‘testDb’ and select it as the current database.

mysql> create database testDb;

mysql> use testDb;

Create the table ‘student’:

mysql> create table student(id integer,name char(20));

Add the following two records to the table:

mysql> insert into student values(1,'Archana');

mysql> insert into student values(2,'XYZ');
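Optionally, verify the inserted rows before leaving MySQL:

mysql> select * from student;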

Exit from MySQL:

mysql> exit;

Sqoop import
1. Importing a table into HDFS

  • 1st way

Command for import

sqoop import --connect jdbc:mysql://localhost/databasename --username $USER_NAME --password $PASSWORD --table tablename -m 1

Execute the sqoop import

Here we are using the database ‘testDb’, username ‘root’, password ‘hadoop123’, and table ‘student’.

$ sqoop import --connect jdbc:mysql://localhost/testDb --username root --password hadoop123 --table student -m 1
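Tip: passing the password in plain text leaves it in your shell history; Sqoop can instead prompt for it at runtime via the -P flag. A sketch of the same import:

$ sqoop import --connect jdbc:mysql://localhost/testDb --username root -P --table student -m 1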

--------------- NOTE ---------------

If you have not defined a primary key for your table, then you must pass the ‘-m 1’ option for the import.
Otherwise Sqoop fails with the following error:
ERROR tool.ImportTool: Error during import: No primary key could be found for table student1. Please specify one with --split-by or perform a sequential import with '-m 1'.
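Conversely, if the table has a column suitable for splitting, you can keep the import parallel by naming it with --split-by instead of forcing a single mapper; a sketch using the ‘id’ column:

$ sqoop import --connect jdbc:mysql://localhost/testDb --username root --password hadoop123 --table student --split-by id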
  • 2nd way

Create a config file $HOME/import.txt and add the following to it:

              import.txt

import
--connect
jdbc:mysql://localhost/testDb
--username
root
--password
hadoop123
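Incidentally, options files may also contain blank lines and comment lines beginning with the hash character, both of which Sqoop ignores, so the file above could be documented in place with a line such as:

# import options for the testDb student table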

Execute the sqoop import

sqoop --options-file /home/hduser/import.txt --table student -m 1

Once the import is done, you can find student.jar, student.class, and student.java under /tmp/sqoop-hduser/compile/—-/ (the dashes stand in for an auto-generated directory name).
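If you would rather have the generated files land somewhere predictable, Sqoop's --outdir and --bindir options control where the .java source and the compiled .class/.jar files are written; a sketch (the /tmp/sqoop-src and /tmp/sqoop-bin paths are illustrative):

$ sqoop import --connect jdbc:mysql://localhost/testDb --username root --password hadoop123 --table student -m 1 --outdir /tmp/sqoop-src --bindir /tmp/sqoop-bin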

Files created in HDFS

$ hadoop dfs -ls -R student

Found 3 items

-rw-r--r--   1 hduser supergroup          0 2013-09-13 15:38 /user/hduser/student/_SUCCESS

drwxr-xr-x   - hduser supergroup          0 2013-09-13 15:38 /user/hduser/student/_logs

-rw-r--r--   1 hduser supergroup         16 2013-09-13 15:38 /user/hduser/student/part-m-00000

Data file contents

$ hadoop dfs -cat /user/hduser/student/part-m-00000

1,Archana
2,XYZ
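To copy the imported file back to the local filesystem for a closer look, a minimal sketch (the local name ./student.csv is just an illustrative choice):

$ hadoop dfs -get /user/hduser/student/part-m-00000 ./student.csv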

2. Importing all rows of a MySQL table, but only specific columns

$ sqoop import --connect jdbc:mysql://localhost/testDb --username root --password hadoop123 --table student --columns "name" -m 1

Data file contents

$ hadoop dfs -cat  /user/hduser/student/part-m-00000

Archana
XYZ
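The --columns option takes a comma-separated list, so importing both columns explicitly would look like:

$ sqoop import --connect jdbc:mysql://localhost/testDb --username root --password hadoop123 --table student --columns "id,name" -m 1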

3. Importing all columns, filtering rows using a where clause

$ sqoop import --connect jdbc:mysql://localhost/testDb --username root --password hadoop123 --table student --where "id>1" -m 1 --target-dir /user/hduser/ar

Data file contents

$ hadoop dfs -cat  /user/hduser/ar/part-m-00000
2,XYZ
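For filters that go beyond a single --where expression, Sqoop also supports free-form query imports via --query; note that Sqoop requires the literal token $CONDITIONS in the WHERE clause (single-quote the query so the shell does not expand it) and that --target-dir is mandatory in this mode. A sketch (the target directory name is illustrative):

$ sqoop import --connect jdbc:mysql://localhost/testDb --username root --password hadoop123 --query 'SELECT id, name FROM student WHERE id > 1 AND $CONDITIONS' --target-dir /user/hduser/ar_query -m 1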

--------------- NOTE ---------------

If you are facing a FileExistsException, specify a different directory with “--target-dir /user/hduser/give_diff_name”.
Error:
org.apache.commons.io.FileExistsException: Destination '/home/hduser/./student.java' already exists

Reference

http://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html

Note: The information provided here is to the best of my knowledge and experience; if any modifications are to be made, please help me with your valuable suggestions, which are always welcome. :)

25 thoughts on “Sqoop: Importing data from MySQL into HDFS”

  1. Thiruvengatam says:

Thank you… it’s a really nice post regarding Sqoop import and export. I executed it successfully by following your post.

If you have a chance, please share how to send log files into HDFS using Sqoop import.

  2. ND says:

    Hi Archana,

    Wonderful blog..
I have a query… I am getting the below error while executing the sqoop import command:

sqoop:000> sqoop import --connect jdbc:mysql://localhost/books --username root --password thanks --table authors --m 1;
Exception has occurred during processing command
Exception: org.codehaus.groovy.control.MultipleCompilationErrorsException Message: startup failed:
groovysh_parse: 1: expecting EOF, found 'import' @ line 1, column 7.
sqoop import --connect jdbc:mysql://localhost/books --username root --password thanks --table authors --m 1;
    ^

    1 error

    Could you let me know what could be the reason for the error?
    Thank you

  3. Dallas_user says:

    Hi,

I am trying to import data from SQL Server into HDFS and getting the below error. I checked, and the jar file /usr/lib/sqoop/lib/jackson-mapper-asl-1.9.13.jar is present. Could you please help?

    14/09/22 20:34:19 ERROR tool.ImportTool: Encountered IOException running import job: java.io.FileNotFoundException: File does not exist: hdfs://master1:8020/usr/lib/sqoop/lib/jackson-mapper-asl-1.9.13.jar
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1128)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
    at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186)
    at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159)
    at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:247)
    at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:665)
    at org.apache.sqoop.manager.SQLServerManager.importTable(SQLServerManager.java:146)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:601)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

  4. Namita says:

    Hi,

I am trying to use Sqoop with Hadoop 2.2.0 on Ubuntu 14.04 LTS.
I have used the Sqoop release sqoop-1.4.5.bin__hadoop-2.0.4-alpha.tar.gz,
but I am getting the following error on executing the sqoop import command:

    14/11/24 11:53:27 INFO tool.CodeGenTool: Beginning code generation
    14/11/24 11:53:28 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
    14/11/24 11:53:28 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
    14/11/24 11:53:28 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/gslab/setup/hadoop-2.2.0/share/hadoop/mapreduce
    Note: /tmp/sqoop-gslab/compile/8874fa11f57ff37a2a3541bae93b3824/test.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    14/11/24 11:53:28 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-gslab/compile/8874fa11f57ff37a2a3541bae93b3824/test.jar
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/InputFormat
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)

Any idea what could be the reason for this error?

  5. balwan singh says:

    Hi,
Can somebody tell me how to use Sqoop to import an Oracle table with all its constraints (primary key, foreign keys, and data types that exist in Oracle but not in Hive, like DATE in Oracle versus TIMESTAMP in Hive) into Hive or HDFS?

  6. Naved Khan says:

When I use the 1st-way import command, it gives this error:

    ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: No columns to generate for ClassWriter

  7. ronnie says:

Thanks for the guide. Why is my Sqoop-imported file not named part-m-00000, but instead faac0e6b-5935-4ae3-ba4d-cbaa742623b1.txt?

  8. karthik says:

Hi all,
I have installed Sqoop and it's running perfectly; I checked this with the “sqoop version” command, which shows Sqoop 1.4.6 is running.

After that, when I tried to import a MySQL table to HDFS with the following command, it threw this error:

    “Got error creating database manager: java.io.IOException: No manager for connect string”

How can I resolve this? Please share if you know the reason.
Thanks.

  9. Kaleem says:

    Hi All,

    My cluster has one namenode and one datanode on two separate computers.

    Sqoop import is giving the following error

java.lang.Exception: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class student (which is the table name) not found

    my Sqoop import command is

sqoop import --connect "jdbc:mysql://localhost/testDB" --username sqoop --password sqoop --table tbl --m 1

    any help would be appreciated.

    Thanks
    Kaleem

  10. Hello Archana,

This post on Sqoop is really very useful for beginners trying to learn connectivity and fetching specific records or tables from MySQL. Thanks for sharing, and I am waiting for the post on advanced Sqoop commands.

  11. veeralakshmi says:

In Sqoop, at the last step, I have an error:
I can't see the data in HDFS.

    hadoop dfs -cat /user/hduser/ar/part-m-00000;
    DEPRECATED: Use of this script to execute hdfs command is deprecated.
    Instead use the hdfs command for it.

cat: `/user/hduser/ar/part-m-00000': No such file or directory

Please give me a solution for this.

  12. krishna says:

    ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/tmp already exists

Why am I getting this error while importing data?
