Step 1: Install and start MySQL if you have not already done so
See the MySQL Installation Tutorial for instructions on how to install MySQL.
Step 2: Configure the MySQL Service and Connector
Download the mysql-connector-java-5.0.5.jar file and copy it to the $SQOOP_HOME/lib directory.
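For example, assuming the jar has been downloaded to the current directory and $SQOOP_HOME points to your Sqoop installation, the copy could look like this (a minimal sketch; adjust the paths to your setup):
$ cp mysql-connector-java-5.0.5.jar $SQOOP_HOME/lib/       # copy the JDBC driver into Sqoop's lib directory
$ ls $SQOOP_HOME/lib/ | grep mysql-connector               # confirm the jar is in place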
Step 3: Sqoop Installation
See the Sqoop Installation Tutorial for instructions on how to install Sqoop.
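Once installed, a quick way to confirm that Sqoop is on your PATH is to print its version (the exact version string depends on your installation):
$ sqoop version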
- Database and table creation in MySQL
First, connect to MySQL:
$ mysql -u root -p
Enter password:
Create the database 'testDb' and use it as the current database.
mysql> create database testDb;
mysql> use testDb;
Create the table 'student':
mysql> create table student(id integer,name char(20));
Add the following 2 records to the table:
mysql> insert into student values(1,'Archana');
mysql> insert into student values(2,'XYZ');
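As an optional sanity check before exiting, you can confirm that both rows were inserted; the output should look roughly like this:
mysql> select * from student;
+------+---------+
| id   | name    |
+------+---------+
|    1 | Archana |
|    2 | XYZ     |
+------+---------+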
Exit from MySQL
mysql> exit;
Sqoop import
1. Importing a table into HDFS
- 1st way
Command for import
sqoop import --connect jdbc:mysql://localhost/databasename --username $USER_NAME --password $PASSWORD --table tablename -m 1
Execute the sqoop import
Here we are using the database 'testDb', username 'root', password 'hadoop123', and table 'student'.
$ sqoop import --connect jdbc:mysql://localhost/testDb --username root --password hadoop123 --table student -m 1
——————- NOTE——————–
If you have not defined a primary key for your table, then you have to give the '-m 1' option for the import.
Otherwise it gives the following error:
ERROR tool.ImportTool: Error during import: No primary key could be found for table student1. Please specify one with --split-by or perform a sequential import with '-m 1'.
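Alternatively, if the table has a column whose values split the data evenly (a numeric id is the usual choice), you can keep multiple mappers by naming that column with --split-by instead of forcing a single mapper. A sketch using the example table's id column:
$ sqoop import --connect jdbc:mysql://localhost/testDb --username root --password hadoop123 --table student --split-by id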
- 2nd Way
Create a config file $HOME/import.txt and add the following to it:
import.txt
import
--connect
jdbc:mysql://localhost/testDb
--username
root
--password
hadoop123
Execute the sqoop import
sqoop --options-file /home/hduser/import.txt --table student -m 1
Once the import is done, you can find student.jar, student.class, and student.java at the following location: /tmp/sqoop-hduser/compile/—-/student.jar
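If you would rather not have this generated code land under /tmp, Sqoop's code-generation arguments let you choose the directories yourself; --outdir controls where the generated .java source goes and --bindir controls where the compiled .class and .jar files go. A sketch (the two directory paths below are only examples, not part of the original setup):
$ sqoop import --connect jdbc:mysql://localhost/testDb --username root --password hadoop123 --table student -m 1 --outdir /home/hduser/sqoop_gen --bindir /home/hduser/sqoop_gen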
Files created in HDFS
$ hadoop dfs -ls -R student
Found 3 items
-rw-r--r--   1 hduser supergroup          0 2013-09-13 15:38 /user/hduser/student/_SUCCESS
drwxr-xr-x   - hduser supergroup          0 2013-09-13 15:38 /user/hduser/student/_logs
-rw-r--r--   1 hduser supergroup         16 2013-09-13 15:38 /user/hduser/student/part-m-00000
Data file contents
$ hadoop dfs -cat /user/hduser/student/part-m-00000
1,Archana
2,XYZ
2. Import all rows of a table in MySQL, but only specific columns of the table
$ sqoop import --connect jdbc:mysql://localhost/testDb --username root --password hadoop123 --table student --columns "name" -m 1
Data file contents
$ hadoop dfs -cat /user/hduser/student/part-m-00000
Archana
XYZ
3. Import all columns, filter rows using a where clause
$ sqoop import --connect jdbc:mysql://localhost/testDb --username root --password hadoop123 --table student --where "id>1" -m 1 --target-dir /user/hduser/ar
Data file contents
$ hadoop dfs -cat /user/hduser/ar/part-m-00000
2,XYZ
——————- NOTE——————–
If you are facing a FileExistsException, for example:
Error: org.apache.commons.io.FileExistsException: Destination '/home/hduser/./student.java' already exists
then use "--target-dir /user/hduser/give_diff_name".
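Another option is to clean up the leftovers from a previous run before re-importing; a sketch, assuming the same paths used in this tutorial (adjust to your environment; on older Hadoop releases the equivalent of the second command is hadoop dfs -rmr):
$ rm /home/hduser/student.java             # remove the stale generated source from the earlier run
$ hadoop fs -rm -r /user/hduser/student    # remove the existing HDFS output directory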
Reference
http://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html
Note: The information provided here is to the best of my knowledge and experience. If any modifications are needed, please help me with your valuable suggestions, which are always welcome.
Thank you… it's a really nice post on Sqoop import and export. I have successfully executed it by following your post.
If you get a chance, please share how to send log files into HDFS using Sqoop import.
Hi Thiruvengatam, thanks for visiting. I hope you enjoy the article 🙂
Hi Archana, your articles have a very clear step-by-step procedure. Thank you. Please keep sharing your knowledge to help me develop my skills.
Hi Archana,
It's really helpful for beginners. Thanks for sharing; please keep posting such good posts.
Very helpful, thank you for the post.
Hi Archana,
Wonderful blog!
I have a query… I am getting the below error while executing the sqoop import command:
sqoop:000> sqoop import --connect jdbc:mysql://localhost/books --username root --password thanks --table authors --m 1;
Exception has occurred during processing command
Exception: org.codehaus.groovy.control.MultipleCompilationErrorsException Message: startup failed:
groovysh_parse: 1: expecting EOF, found 'import' @ line 1, column 7.
sqoop import --connect jdbc:mysql://localhost/books --username root --password thanks --table authors --m 1;
^
1 error
Could you let me know what could be the reason for the error?
Thank you
Hi,
I am trying to import data from SQL Server into HDFS and am getting the below error. I checked, and the jar file /usr/lib/sqoop/lib/jackson-mapper-asl-1.9.13.jar is present. Could you please help?
14/09/22 20:34:19 ERROR tool.ImportTool: Encountered IOException running import job: java.io.FileNotFoundException: File does not exist: hdfs://master1:8020/usr/lib/sqoop/lib/jackson-mapper-asl-1.9.13.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1128)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:247)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:665)
at org.apache.sqoop.manager.SQLServerManager.importTable(SQLServerManager.java:146)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:601)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Hi,
I am trying to use Sqoop with Hadoop 2.2.0 on Ubuntu 14.04 LTS.
I have used the Sqoop release sqoop-1.4.5.bin__hadoop-2.0.4-alpha.tar.gz,
but I am getting the following error on executing the sqoop import command:
14/11/24 11:53:27 INFO tool.CodeGenTool: Beginning code generation
14/11/24 11:53:28 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
14/11/24 11:53:28 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
14/11/24 11:53:28 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/gslab/setup/hadoop-2.2.0/share/hadoop/mapreduce
Note: /tmp/sqoop-gslab/compile/8874fa11f57ff37a2a3541bae93b3824/test.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/11/24 11:53:28 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-gslab/compile/8874fa11f57ff37a2a3541bae93b3824/test.jar
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/InputFormat
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
Any idea what could be the reason for this error?
Hi,
Can somebody tell me how to use Sqoop to import an Oracle table with all its constraints (primary key, foreign keys, and data types that exist in Oracle but not in Hive, such as Oracle's DATE versus Hive's TIMESTAMP) into Hive or HDFS?
When I use the 1st-way import command, it gives this error:
ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: No columns to generate for ClassWriter
Thanks for the guide. Why is my Sqoop-imported file not named part-m-00000, but instead faac0e6b-5935-4ae3-ba4d-cbaa742623b1.txt?
Good for beginners, but not enough content.
Thank you very much. Great article for exploring Sqoop commands for importing data from MySQL to HDFS. I also found another good article on import commands for Hive at http://www.geoinsyssoft.com/sqoop-commands/.
How can we do this with Hive and HBase using Sqoop? Please let me know.
Thanks for the simple explanation for beginners. Really helpful… :)
It's very helpful. The link below also shows some information about Sqoop import from MySQL to HBase and the Hadoop ecosystem: http://www.geoinsyssoft.com/sqoop-import-hbase/ Kindly visit the site.
Hi all,
I have installed Sqoop and it is running perfectly; I checked this with the "sqoop version" command, which shows Sqoop 1.4.6 is running.
After that, when I tried to import a MySQL table to HDFS with the following command, it throws the error:
"Got error creating database manager: java.io.IOException: No manager for connect string"
How can I resolve this? Please share if you know the reason.
Thanks.
Karthik,
I guess you need the MySQL connector jar file in the sqoop/lib directory.
Kaleem
Hi All,
My cluster has one namenode and one datanode on two separate computers.
Sqoop import is giving the following error:
java.lang.Exception: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class student (which is the table name) not found
My Sqoop import command is:
sqoop import --connect "jdbc:mysql://localhost/testDB" --username sqoop --password sqoop --table tbl --m 1
Any help would be appreciated.
Thanks
Kaleem
Hello Archana,
This post on Sqoop is really useful for beginners trying to learn the connectivity and how to fetch specific records or tables from MySQL. Thanks for sharing; looking forward to the post on advanced Sqoop commands.
Super, neat and clean explanation.
Thank you so much!
Please upload the remaining Hadoop concepts, please!
At the last Sqoop step I have an error; I can't see the data in HDFS:
hadoop dfs -cat /user/hduser/ar/part-m-00000;
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
cat: `/user/hduser/ar/part-m-00000': No such file or directory
Please give me a solution for this.
ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/tmp already exists
Why am I getting this error while importing data?