Using Sqoop import, we can load data from a relational database into a Hive table in the following ways:
- If you want to load the data from an RDBMS into a new Hive table, it is very simple: just add the --hive-import option.
- If you already have a Hive table with data in it, add the --hive-overwrite option as well; it overwrites the existing table data (a minimal sketch follows this list).
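For the second case, the command looks the same as the worked example later in this post, with one extra flag. A minimal sketch, reusing the same connection details (adjust them for your environment):

sqoop import \
--connect 'jdbc:mysql://localhost/StudentInfo' \
--table 'student' \
--username root \
--password cloudera \
--split-by std_id \
--hive-import \
--hive-overwrite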
Let us practice this Sqoop import from an RDBMS to a Hive table by following the step-by-step instructions below.
Here is the MySQL table called student in the StudentInfo database.
mysql> use StudentInfo;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show tables;
+-----------------------+
| Tables_in_StudentInfo |
+-----------------------+
| student |
+-----------------------+
1 row in set (0.00 sec)
mysql> select * from student;
+--------+---------------+
| std_id | std_name |
+--------+---------------+
| 101 | javachainAnto |
| 102 | kumar |
| 103 | Sofia |
+--------+---------------+
3 rows in set (0.04 sec)
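If you want to reproduce this sample table yourself, something like the following should work. Note that the exact column types are an assumption, since the post does not show the original DDL:

mysql> CREATE DATABASE StudentInfo;
mysql> USE StudentInfo;
mysql> CREATE TABLE student (std_id INT PRIMARY KEY, std_name VARCHAR(50));
mysql> INSERT INTO student VALUES (101, 'javachainAnto'), (102, 'kumar'), (103, 'Sofia');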
Let us import the above MySQL table into Hive using the following Sqoop import command with the --hive-import option. The --split-by std_id option tells Sqoop which column to use when dividing the rows among the parallel map tasks.
sqoop import \
--connect 'jdbc:mysql://localhost/StudentInfo' \
--table 'student' \
--username root \
--password cloudera \
--split-by std_id \
--hive-import
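As the log below warns, passing --password on the command line is insecure. Sqoop's -P option prompts for the password at runtime instead, for example:

sqoop import --connect 'jdbc:mysql://localhost/StudentInfo' --table 'student' --username root -P --split-by std_id --hive-import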
[cloudera@quickstart ~]$ sqoop import --connect 'jdbc:mysql://localhost/StudentInfo' --table 'student' --username root --password cloudera --split-by std_id --hive-import
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/07/23 18:39:27 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.4.0
15/07/23 18:39:27 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/07/23 18:39:27 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
15/07/23 18:39:27 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
15/07/23 18:39:28 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
15/07/23 18:39:28 INFO tool.CodeGenTool: Beginning code generation
15/07/23 18:39:29 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
15/07/23 18:39:29 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
15/07/23 18:39:29 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/ea2f56451e6ded56cbd202f22ff8d898/student.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/07/23 18:39:32 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/ea2f56451e6ded56cbd202f22ff8d898/student.jar
15/07/23 18:39:32 WARN manager.MySQLManager: It looks like you are importing from mysql.
15/07/23 18:39:32 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
15/07/23 18:39:32 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
15/07/23 18:39:32 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
15/07/23 18:39:32 INFO mapreduce.ImportJobBase: Beginning import of student
15/07/23 18:39:32 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
15/07/23 18:39:32 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/07/23 18:39:34 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/07/23 18:39:34 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/07/23 18:39:36 INFO db.DBInputFormat: Using read commited transaction isolation
15/07/23 18:39:37 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`std_id`), MAX(`std_id`) FROM `student`
15/07/23 18:39:37 INFO mapreduce.JobSubmitter: number of splits:3
15/07/23 18:39:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1437695502077_0003
15/07/23 18:39:37 INFO impl.YarnClientImpl: Submitted application application_1437695502077_0003
15/07/23 18:39:38 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1437695502077_0003/
15/07/23 18:39:38 INFO mapreduce.Job: Running job: job_1437695502077_0003
15/07/23 18:39:48 INFO mapreduce.Job: Job job_1437695502077_0003 running in uber mode : false
15/07/23 18:39:48 INFO mapreduce.Job: map 0% reduce 0%
15/07/23 18:40:11 INFO mapreduce.Job: map 33% reduce 0%
15/07/23 18:40:12 INFO mapreduce.Job: map 100% reduce 0%
15/07/23 18:40:13 INFO mapreduce.Job: Job job_1437695502077_0003 completed successfully
15/07/23 18:40:13 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=405231
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=331
HDFS: Number of bytes written=38
HDFS: Number of read operations=12
HDFS: Number of large read operations=0
HDFS: Number of write operations=6
Job Counters
Launched map tasks=3
Other local map tasks=3
Total time spent by all maps in occupied slots (ms)=63579
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=63579
Total vcore-seconds taken by all map tasks=63579
Total megabyte-seconds taken by all map tasks=65104896
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=331
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=900
CPU time spent (ms)=3120
Physical memory (bytes) snapshot=310714368
Virtual memory (bytes) snapshot=4515667968
Total committed heap usage (bytes)=182255616
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=38
15/07/23 18:40:13 INFO mapreduce.ImportJobBase: Transferred 38 bytes in 39.352 seconds (0.9656 bytes/sec)
15/07/23 18:40:13 INFO mapreduce.ImportJobBase: Retrieved 3 records.
15/07/23 18:40:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
15/07/23 18:40:13 INFO hive.HiveImport: Loading uploaded data into Hive
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-1.1.0-cdh5.4.0.jar!/hive-log4j.properties
OK
Time taken: 0.869 seconds
Loading data to table default.student
chgrp: changing ownership of 'hdfs://quickstart.cloudera:8020/user/hive/warehouse/student/part-m-00000': User does not belong to hive
chgrp: changing ownership of 'hdfs://quickstart.cloudera:8020/user/hive/warehouse/student/part-m-00001': User does not belong to hive
chgrp: changing ownership of 'hdfs://quickstart.cloudera:8020/user/hive/warehouse/student/part-m-00002': User does not belong to hive
Table default.student stats: [numFiles=3, totalSize=38]
OK
Time taken: 1.141 seconds
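Once the job completes, the imported table lands in Hive's default database. A quick check from the Hive shell should return the same three rows we saw in MySQL:

hive> select * from student;
OK
101	javachainAnto
102	kumar
103	Sofia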