We can use Sqoop to import the results of a SQL query from a relational database into HDFS.
Let us take the following MySQL table called student.
Here let us import only the rows where std_id = 103 into HDFS using the sqoop import command.
mysql> select * from student;
+--------+---------------+
| std_id | std_name |
+--------+---------------+
| 101 | javachainAnto |
| 102 | kumar |
| 103 | Sofia |
+--------+---------------+
3 rows in set (0.00 sec)
Let us import the row where std_id = 103 using sqoop import.
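Before running the full import, the query itself can be sanity-checked from the Sqoop side with the `sqoop eval` tool, which runs a statement against the database and prints the result to the console. This is a sketch using the same connection details as the import below:

```shell
# Run the SELECT against MySQL and print the result, without
# starting a MapReduce job -- useful for verifying the WHERE clause.
# -P prompts for the password instead of exposing it on the command line.
sqoop eval \
  --connect 'jdbc:mysql://localhost/StudentInfo' \
  --username root \
  -P \
  --query 'select * from student where std_id=103'
```

If the query prints the expected row here, the same statement (with `AND $CONDITIONS` appended) can be handed to sqoop import.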
sqoop import \
--connect 'jdbc:mysql://localhost/StudentInfo' \
--username root \
--password cloudera \
--query 'select * from student where std_id=103 AND $CONDITIONS' \
--split-by std_id \
--target-dir /user/cloudera/student
Let us run the above sqoop import and see the execution flow of the import with --query.
Please note that the --table parameter is not needed when we use --query.
If you are using a WHERE condition, make sure the query ends with AND $CONDITIONS, and wrap the whole query in single quotes (') or double quotes (").
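The choice of quotes matters: inside double quotes the shell would expand $CONDITIONS itself (to an empty string), so it must be escaped. A sketch of the same import written with double quotes:

```shell
# With the query in double quotes, escape the token as \$CONDITIONS
# so that Sqoop -- not the shell -- performs the substitution.
sqoop import \
  --connect 'jdbc:mysql://localhost/StudentInfo' \
  --username root \
  -P \
  --query "select * from student where std_id=103 AND \$CONDITIONS" \
  --split-by std_id \
  --target-dir /user/cloudera/student
```

With single quotes, a plain `$CONDITIONS` is passed through untouched, which is why the command above uses it unescaped.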
[cloudera@quickstart ~]$ sqoop import \
> --connect 'jdbc:mysql://localhost/StudentInfo' \
> --username root \
> --password cloudera \
> --query 'select * from student where std_id=103 AND $CONDITIONS' \
> --split-by std_id \
> --target-dir 'student';
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/07/20 19:07:12 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.4.0
15/07/20 19:07:12 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/07/20 19:07:13 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
15/07/20 19:07:13 INFO tool.CodeGenTool: Beginning code generation
15/07/20 19:07:13 INFO manager.SqlManager: Executing SQL statement: select * from student where std_id=103 AND (1 = 0)
15/07/20 19:07:13 INFO manager.SqlManager: Executing SQL statement: select * from student where std_id=103 AND (1 = 0)
15/07/20 19:07:13 INFO manager.SqlManager: Executing SQL statement: select * from student where std_id=103 AND (1 = 0)
15/07/20 19:07:13 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/e11bc6ff1f5bec392ea27817d7c4342d/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/07/20 19:07:16 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/e11bc6ff1f5bec392ea27817d7c4342d/QueryResult.jar
15/07/20 19:07:16 INFO mapreduce.ImportJobBase: Beginning query import.
15/07/20 19:07:16 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
15/07/20 19:07:17 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/07/20 19:07:18 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/07/20 19:07:18 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/07/20 19:07:21 INFO db.DBInputFormat: Using read commited transaction isolation
15/07/20 19:07:21 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(std_id), MAX(std_id) FROM (select * from student where std_id=103 AND (1 = 1) ) AS t1
15/07/20 19:07:21 INFO mapreduce.JobSubmitter: number of splits:1
15/07/20 19:07:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1437428820805_0004
15/07/20 19:07:22 INFO impl.YarnClientImpl: Submitted application application_1437428820805_0004
15/07/20 19:07:22 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1437428820805_0004/
15/07/20 19:07:22 INFO mapreduce.Job: Running job: job_1437428820805_0004
15/07/20 19:07:35 INFO mapreduce.Job: Job job_1437428820805_0004 running in uber mode : false
15/07/20 19:07:35 INFO mapreduce.Job: map 0% reduce 0%
15/07/20 19:07:46 INFO mapreduce.Job: map 100% reduce 0%
15/07/20 19:07:46 INFO mapreduce.Job: Job job_1437428820805_0004 completed successfully
15/07/20 19:07:46 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=135322
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=107
HDFS: Number of bytes written=10
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=8558
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=8558
Total vcore-seconds taken by all map tasks=8558
Total megabyte-seconds taken by all map tasks=8763392
Map-Reduce Framework
Map input records=1
Map output records=1
Input split bytes=107
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=109
CPU time spent (ms)=900
Physical memory (bytes) snapshot=120209408
Virtual memory (bytes) snapshot=1505222656
Total committed heap usage (bytes)=60751872
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=10
15/07/20 19:07:46 INFO mapreduce.ImportJobBase: Transferred 10 bytes in 28.1674 seconds (0.355 bytes/sec)
15/07/20 19:07:46 INFO mapreduce.ImportJobBase: Retrieved 1 records.
Let us verify the imported data in HDFS.
[cloudera@quickstart ~]$ hadoop fs -ls /user/cloudera/student/
Found 2 items
-rw-r--r-- 1 cloudera supergroup 0 2015-07-20 19:07 /user/cloudera/student/_SUCCESS
-rw-r--r-- 1 cloudera supergroup 10 2015-07-20 19:07 /user/cloudera/student/part-m-00000
[cloudera@quickstart ~]$ hadoop fs -cat /user/cloudera/student/part-m-00000
103,Sofia
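Note that the job above ran with a single split (the log shows "number of splits:1"), since only one row matches. When you know the query returns few rows, you can ask for one mapper explicitly with -m 1; Sqoop then skips the bounding-values query, and --split-by becomes optional. A sketch:

```shell
# Single-mapper variant of the same import: with -m 1 Sqoop does not
# need to compute MIN/MAX of a split column, so --split-by can be dropped.
sqoop import \
  --connect 'jdbc:mysql://localhost/StudentInfo' \
  --username root \
  -P \
  --query 'select * from student where std_id=103 AND $CONDITIONS' \
  -m 1 \
  --target-dir /user/cloudera/student
```

The output is the same single part-m-00000 file; for larger result sets, keep --split-by on an evenly distributed column so the work divides across mappers.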