- Before moving RDBMS table data from a relational database into Hadoop HDFS, we need to ensure that the table, along with its data, actually exists (a quick check is sketched below).
- Use SQOOP IMPORT to import the relational table data into HDFS.
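To confirm this from the command line, Sqoop's own list-tables and eval tools can query the source database directly. A minimal sketch, assuming the StudentInfo database and student table that are created later in this post:

# list the tables in StudentInfo; student should appear
sqoop list-tables \
--connect 'jdbc:mysql://localhost/StudentInfo' \
--username root -P

# confirm the table actually holds rows
sqoop eval \
--connect 'jdbc:mysql://localhost/StudentInfo' \
--username root -P \
--query 'SELECT COUNT(*) FROM student'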
For example, the import command looks like this:
sqoop import \
--connect 'jdbc:mysql://localhost/StudentInfo' \
--table student \
--username root \
--password cloudera \
--split-by std_id \
-m 1 \
--target-dir '/user/cloudera/student'
Practical Steps
Let us walk through this scenario with a practical example.
Step 1: Log in to the MySQL database:
[cloudera@quickstart ~]$ mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 18
Server version: 5.1.66 Source distribution
Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
Step 2: Create a database called StudentInfo.
mysql> create database StudentInfo
-> ;
Query OK, 1 row affected (0.05 sec)
Step 3: Use the newly created database.
mysql> use StudentInfo;
Database changed
Step 4: Create a table called student in MySQL.
mysql> create table student( std_id integer, std_name varchar(43));
Query OK, 0 rows affected (0.06 sec)
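Optionally, double-check the table structure before importing; the std_id column shown by MySQL's DESCRIBE command is the one we will later pass to --split-by. The output should look roughly like this:

mysql> describe student;
+----------+-------------+------+-----+---------+-------+
| Field    | Type        | Null | Key | Default | Extra |
+----------+-------------+------+-----+---------+-------+
| std_id   | int(11)     | YES  |     | NULL    |       |
| std_name | varchar(43) | YES  |     | NULL    |       |
+----------+-------------+------+-----+---------+-------+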
Step 5: Insert values into student using the MySQL INSERT command, then run a SELECT to verify that the values were inserted properly.
mysql> insert into student values (101,'javachainAnto'),
(102,'kumar'),(103,'Sofia');
mysql> select * from student;
+--------+---------------+
| std_id | std_name |
+--------+---------------+
| 101 | javachainAnto |
| 102 | kumar |
| 103 | Sofia |
+--------+---------------+
3 rows in set (0.00 sec)
Step 6: Use SQOOP IMPORT to import the above RDBMS table into HDFS.
Use the following SQOOP IMPORT command:
sqoop import \
--connect 'jdbc:mysql://localhost/StudentInfo' \
--table student \
--username root \
--password cloudera \
--split-by std_id \
-m 1 \
--target-dir '/user/cloudera/student'
You should see SQOOP import execution output similar to the listing below. (Note that this particular run was made without -m 1, so Sqoop divided the work using the value range of the --split-by column and launched three map tasks.)
[cloudera@quickstart ~]$ sqoop import --connect 'jdbc:mysql://localhost/StudentInfo' --table student --username root --password cloudera --target-dir '/user/cloudera/student' --split-by std_id;
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/07/20 15:31:51 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.4.0
15/07/20 15:31:51 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/07/20 15:31:52 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
15/07/20 15:31:52 INFO tool.CodeGenTool: Beginning code generation
15/07/20 15:31:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
15/07/20 15:31:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
15/07/20 15:31:52 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/e75f4f6477a8fd6d642b79f42757c3c9/student.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/07/20 15:31:55 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/e75f4f6477a8fd6d642b79f42757c3c9/student.jar
15/07/20 15:31:55 WARN manager.MySQLManager: It looks like you are importing from mysql.
15/07/20 15:31:55 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
15/07/20 15:31:55 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
15/07/20 15:31:55 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
15/07/20 15:31:55 INFO mapreduce.ImportJobBase: Beginning import of student
15/07/20 15:31:55 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
15/07/20 15:31:56 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/07/20 15:31:57 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/07/20 15:31:57 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/07/20 15:32:01 INFO db.DBInputFormat: Using read commited transaction isolation
15/07/20 15:32:01 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`std_id`), MAX(`std_id`) FROM `student`
15/07/20 15:32:01 INFO mapreduce.JobSubmitter: number of splits:3
15/07/20 15:32:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1437428820805_0001
15/07/20 15:32:02 INFO impl.YarnClientImpl: Submitted application application_1437428820805_0001
15/07/20 15:32:02 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1437428820805_0001/
15/07/20 15:32:02 INFO mapreduce.Job: Running job: job_1437428820805_0001
15/07/20 15:32:20 INFO mapreduce.Job: Job job_1437428820805_0001 running in uber mode : false
15/07/20 15:32:21 INFO mapreduce.Job: map 0% reduce 0%
15/07/20 15:32:53 INFO mapreduce.Job: map 100% reduce 0%
15/07/20 15:32:54 INFO mapreduce.Job: Job job_1437428820805_0001 completed successfully
15/07/20 15:32:54 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=405708
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=331
HDFS: Number of bytes written=38
HDFS: Number of read operations=12
HDFS: Number of large read operations=0
HDFS: Number of write operations=6
Job Counters
Launched map tasks=3
Other local map tasks=3
Total time spent by all maps in occupied slots (ms)=90633
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=90633
Total vcore-seconds taken by all map tasks=90633
Total megabyte-seconds taken by all map tasks=92808192
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=331
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=1470
CPU time spent (ms)=2880
Physical memory (bytes) snapshot=315695104
Virtual memory (bytes) snapshot=4515340288
Total committed heap usage (bytes)=182255616
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=38
15/07/20 15:32:54 INFO mapreduce.ImportJobBase: Transferred 38 bytes in 57.0157 seconds (0.6665 bytes/sec)
15/07/20 15:32:54 INFO mapreduce.ImportJobBase: Retrieved 3 records.
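As the WARN line near the top of the log points out, supplying --password on the command line is insecure. The -P flag makes Sqoop prompt for the password interactively instead; the same import can be run as:

sqoop import \
--connect 'jdbc:mysql://localhost/StudentInfo' \
--table student \
--username root \
-P \
--split-by std_id \
-m 1 \
--target-dir '/user/cloudera/student'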
Now let us check the imported files on the command shell:
[cloudera@quickstart ~]$ hadoop fs -ls /user/cloudera/student
Found 2 items
-rw-r--r-- 1 cloudera supergroup 0 2015-07-20 15:44 /user/cloudera/student/_SUCCESS
-rw-r--r-- 1 cloudera supergroup 38 2015-07-20 15:44 /user/cloudera/student/part-m-00000
[cloudera@quickstart ~]$ hadoop fs -cat /user/cloudera/student/part-m-00000
101,javachainAnto
102,kumar
103,Sofia
[cloudera@quickstart ~]$
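A practical note before re-running: sqoop import fails if the target directory already exists in HDFS, so remove it first:

hadoop fs -rm -r /user/cloudera/student

Recent Sqoop 1.4.x releases also provide a --delete-target-dir option on sqoop import that removes the directory for you; check your version's documentation before relying on it.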