- Before moving RDBMS table data from a relational database into Hadoop HDFS, we need to ensure that the table, along with its data, actually exists (a quick check is sketched below).
- Use SQOOP IMPORT to import the relational table data into HDFS.
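To confirm this from the command line, Sqoop's own list-tables and eval tools can query the source database directly. A minimal sketch, assuming the StudentInfo database and student table that are created later in this post:

# list the tables in StudentInfo; student should appear
sqoop list-tables \
--connect 'jdbc:mysql://localhost/StudentInfo' \
--username root -P

# confirm the table actually holds rows
sqoop eval \
--connect 'jdbc:mysql://localhost/StudentInfo' \
--username root -P \
--query 'SELECT COUNT(*) FROM student'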
For example, the import command looks like this:
sqoop import \
--connect 'jdbc:mysql://localhost/StudentInfo' \
--table student \
--username root \
--password cloudera \
--split-by std_id \
-m 1 \
--target-dir '/user/cloudera/student'
Practical Steps
Let us walk through this scenario with a practical example.
Step 1: Log in to the MySQL database:
[cloudera@quickstart ~]$ mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 18
Server version: 5.1.66 Source distribution
Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
Step 2: Create a database called StudentInfo.
mysql> create database StudentInfo
-> ;
Query OK, 1 row affected (0.05 sec)
Step 3: Use the newly created database.
mysql> use StudentInfo;
Database changed
Step 4: Create a table called student in MySQL.
mysql> create table student( std_id integer, std_name varchar(43));
Query OK, 0 rows affected (0.06 sec)
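Optionally, double-check the table structure before importing; the std_id column shown by MySQL's DESCRIBE command is the one we will later pass to --split-by. The output should look roughly like this:

mysql> describe student;
+----------+-------------+------+-----+---------+-------+
| Field    | Type        | Null | Key | Default | Extra |
+----------+-------------+------+-----+---------+-------+
| std_id   | int(11)     | YES  |     | NULL    |       |
| std_name | varchar(43) | YES  |     | NULL    |       |
+----------+-------------+------+-----+---------+-------+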
Step 5: Insert values into student using the MySQL INSERT command, then run a SELECT to verify that the values were inserted properly.
mysql> insert into student values (101,'javachainAnto'),
(102,'kumar'),(103,'Sofia');
mysql> select * from student;
+--------+---------------+
| std_id | std_name |
+--------+---------------+
| 101 | javachainAnto |
| 102 | kumar |
| 103 | Sofia |
+--------+---------------+
3 rows in set (0.00 sec)
Step 6: Use SQOOP IMPORT to import the above RDBMS table into HDFS.
Use the following SQOOP IMPORT command:
sqoop import \
--connect 'jdbc:mysql://localhost/StudentInfo' \
--table student \
--username root \
--password cloudera \
--split-by std_id \
-m 1 \
--target-dir '/user/cloudera/student'
You should see SQOOP import execution output similar to the listing below. (Note that this particular run was made without -m 1, so Sqoop divided the work using the value range of the --split-by column and launched three map tasks.)
[cloudera@quickstart ~]$ sqoop import --connect 'jdbc:mysql://localhost/StudentInfo' --table student --username root --password cloudera --target-dir '/user/cloudera/student' --split-by std_id;
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/07/20 15:31:51 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.4.0
15/07/20 15:31:51 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/07/20 15:31:52 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
15/07/20 15:31:52 INFO tool.CodeGenTool: Beginning code generation
15/07/20 15:31:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
15/07/20 15:31:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
15/07/20 15:31:52 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/e75f4f6477a8fd6d642b79f42757c3c9/student.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/07/20 15:31:55 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/e75f4f6477a8fd6d642b79f42757c3c9/student.jar
15/07/20 15:31:55 WARN manager.MySQLManager: It looks like you are importing from mysql.
15/07/20 15:31:55 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
15/07/20 15:31:55 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
15/07/20 15:31:55 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
15/07/20 15:31:55 INFO mapreduce.ImportJobBase: Beginning import of student
15/07/20 15:31:55 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
15/07/20 15:31:56 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/07/20 15:31:57 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/07/20 15:31:57 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/07/20 15:32:01 INFO db.DBInputFormat: Using read commited transaction isolation
15/07/20 15:32:01 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`std_id`), MAX(`std_id`) FROM `student`
15/07/20 15:32:01 INFO mapreduce.JobSubmitter: number of splits:3
15/07/20 15:32:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1437428820805_0001
15/07/20 15:32:02 INFO impl.YarnClientImpl: Submitted application application_1437428820805_0001
15/07/20 15:32:02 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1437428820805_0001/
15/07/20 15:32:02 INFO mapreduce.Job: Running job: job_1437428820805_0001
15/07/20 15:32:20 INFO mapreduce.Job: Job job_1437428820805_0001 running in uber mode : false
15/07/20 15:32:21 INFO mapreduce.Job: map 0% reduce 0%
15/07/20 15:32:53 INFO mapreduce.Job: map 100% reduce 0%
15/07/20 15:32:54 INFO mapreduce.Job: Job job_1437428820805_0001 completed successfully
15/07/20 15:32:54 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=405708
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=331
HDFS: Number of bytes written=38
HDFS: Number of read operations=12
HDFS: Number of large read operations=0
HDFS: Number of write operations=6
Job Counters
Launched map tasks=3
Other local map tasks=3
Total time spent by all maps in occupied slots (ms)=90633
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=90633
Total vcore-seconds taken by all map tasks=90633
Total megabyte-seconds taken by all map tasks=92808192
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=331
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=1470
CPU time spent (ms)=2880
Physical memory (bytes) snapshot=315695104
Virtual memory (bytes) snapshot=4515340288
Total committed heap usage (bytes)=182255616
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=38
15/07/20 15:32:54 INFO mapreduce.ImportJobBase: Transferred 38 bytes in 57.0157 seconds (0.6665 bytes/sec)
15/07/20 15:32:54 INFO mapreduce.ImportJobBase: Retrieved 3 records.
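As the WARN line near the top of the log points out, supplying --password on the command line is insecure. The -P flag makes Sqoop prompt for the password interactively instead; the same import can be run as:

sqoop import \
--connect 'jdbc:mysql://localhost/StudentInfo' \
--table student \
--username root \
-P \
--split-by std_id \
-m 1 \
--target-dir '/user/cloudera/student'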
Now let us check the imported files on the command shell:
[cloudera@quickstart ~]$ hadoop fs -ls /user/cloudera/student
Found 2 items
-rw-r--r-- 1 cloudera supergroup 0 2015-07-20 15:44 /user/cloudera/student/_SUCCESS
-rw-r--r-- 1 cloudera supergroup 38 2015-07-20 15:44 /user/cloudera/student/part-m-00000
[cloudera@quickstart ~]$ hadoop fs -cat /user/cloudera/student/part-m-00000
101,javachainAnto
102,kumar
103,Sofia
[cloudera@quickstart ~]$
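A practical note before re-running: sqoop import fails if the target directory already exists in HDFS, so remove it first:

hadoop fs -rm -r /user/cloudera/student

Recent Sqoop 1.4.x releases also provide a --delete-target-dir option on sqoop import that removes the directory for you; check your version's documentation before relying on it.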