Using Sqoop import, we can load data from a relational database into a Hive table in the following ways:
- If you want to load the data from an RDBMS into a new Hive table, it is very simple: just add the --hive-import option.
- If you already have a Hive table with data in it, add the --hive-overwrite option as well; it overwrites the existing table data (a minimal sketch follows this list).
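For the second case, the command looks the same as the worked example later in this post, with one extra flag. A minimal sketch, reusing the same connection details (adjust them for your environment):

sqoop import \
--connect 'jdbc:mysql://localhost/StudentInfo' \
--table 'student' \
--username root \
--password cloudera \
--split-by std_id \
--hive-import \
--hive-overwrite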
Let us practice this Sqoop import from an RDBMS to a Hive table by following the step-by-step instructions below.
Here is the MySQL table called student in the StudentInfo database.
mysql> use StudentInfo;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show tables;
+-----------------------+
| Tables_in_StudentInfo |
+-----------------------+
| student |
+-----------------------+
1 row in set (0.00 sec)
mysql> select * from student;
+--------+---------------+
| std_id | std_name |
+--------+---------------+
| 101 | javachainAnto |
| 102 | kumar |
| 103 | Sofia |
+--------+---------------+
3 rows in set (0.04 sec)
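If you want to reproduce this sample table yourself, something like the following should work. Note that the exact column types are an assumption, since the post does not show the original DDL:

mysql> CREATE DATABASE StudentInfo;
mysql> USE StudentInfo;
mysql> CREATE TABLE student (std_id INT PRIMARY KEY, std_name VARCHAR(50));
mysql> INSERT INTO student VALUES (101, 'javachainAnto'), (102, 'kumar'), (103, 'Sofia');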
Let us import the above MySQL table into Hive using the following Sqoop import command with the --hive-import option. The --split-by std_id option tells Sqoop which column to use when dividing the rows among the parallel map tasks.
sqoop import \
--connect 'jdbc:mysql://localhost/StudentInfo' \
--table 'student' \
--username root \
--password cloudera \
--split-by std_id \
--hive-import
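As the log below warns, passing --password on the command line is insecure. Sqoop's -P option prompts for the password at runtime instead, for example:

sqoop import --connect 'jdbc:mysql://localhost/StudentInfo' --table 'student' --username root -P --split-by std_id --hive-import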
[cloudera@quickstart ~]$ sqoop import --connect 'jdbc:mysql://localhost/StudentInfo' --table 'student' --username root --password cloudera --split-by std_id --hive-import
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/07/23 18:39:27 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.4.0
15/07/23 18:39:27 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/07/23 18:39:27 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
15/07/23 18:39:27 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
15/07/23 18:39:28 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
15/07/23 18:39:28 INFO tool.CodeGenTool: Beginning code generation
15/07/23 18:39:29 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
15/07/23 18:39:29 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
15/07/23 18:39:29 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/ea2f56451e6ded56cbd202f22ff8d898/student.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/07/23 18:39:32 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/ea2f56451e6ded56cbd202f22ff8d898/student.jar
15/07/23 18:39:32 WARN manager.MySQLManager: It looks like you are importing from mysql.
15/07/23 18:39:32 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
15/07/23 18:39:32 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
15/07/23 18:39:32 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
15/07/23 18:39:32 INFO mapreduce.ImportJobBase: Beginning import of student
15/07/23 18:39:32 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
15/07/23 18:39:32 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/07/23 18:39:34 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/07/23 18:39:34 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/07/23 18:39:36 INFO db.DBInputFormat: Using read commited transaction isolation
15/07/23 18:39:37 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`std_id`), MAX(`std_id`) FROM `student`
15/07/23 18:39:37 INFO mapreduce.JobSubmitter: number of splits:3
15/07/23 18:39:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1437695502077_0003
15/07/23 18:39:37 INFO impl.YarnClientImpl: Submitted application application_1437695502077_0003
15/07/23 18:39:38 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1437695502077_0003/
15/07/23 18:39:38 INFO mapreduce.Job: Running job: job_1437695502077_0003
15/07/23 18:39:48 INFO mapreduce.Job: Job job_1437695502077_0003 running in uber mode : false
15/07/23 18:39:48 INFO mapreduce.Job: map 0% reduce 0%
15/07/23 18:40:11 INFO mapreduce.Job: map 33% reduce 0%
15/07/23 18:40:12 INFO mapreduce.Job: map 100% reduce 0%
15/07/23 18:40:13 INFO mapreduce.Job: Job job_1437695502077_0003 completed successfully
15/07/23 18:40:13 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=405231
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=331
HDFS: Number of bytes written=38
HDFS: Number of read operations=12
HDFS: Number of large read operations=0
HDFS: Number of write operations=6
Job Counters
Launched map tasks=3
Other local map tasks=3
Total time spent by all maps in occupied slots (ms)=63579
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=63579
Total vcore-seconds taken by all map tasks=63579
Total megabyte-seconds taken by all map tasks=65104896
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=331
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=900
CPU time spent (ms)=3120
Physical memory (bytes) snapshot=310714368
Virtual memory (bytes) snapshot=4515667968
Total committed heap usage (bytes)=182255616
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=38
15/07/23 18:40:13 INFO mapreduce.ImportJobBase: Transferred 38 bytes in 39.352 seconds (0.9656 bytes/sec)
15/07/23 18:40:13 INFO mapreduce.ImportJobBase: Retrieved 3 records.
15/07/23 18:40:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
15/07/23 18:40:13 INFO hive.HiveImport: Loading uploaded data into Hive
Logging initialized using configuration in jar:file:/usr/lib/hive/lib/hive-common-1.1.0-cdh5.4.0.jar!/hive-log4j.properties
OK
Time taken: 0.869 seconds
Loading data to table default.student
chgrp: changing ownership of 'hdfs://quickstart.cloudera:8020/user/hive/warehouse/student/part-m-00000': User does not belong to hive
chgrp: changing ownership of 'hdfs://quickstart.cloudera:8020/user/hive/warehouse/student/part-m-00001': User does not belong to hive
chgrp: changing ownership of 'hdfs://quickstart.cloudera:8020/user/hive/warehouse/student/part-m-00002': User does not belong to hive
Table default.student stats: [numFiles=3, totalSize=38]
OK
Time taken: 1.141 seconds
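Once the job completes, the imported table lands in Hive's default database. A quick check from the Hive shell should return the same three rows we saw in MySQL:

hive> select * from student;
OK
101	javachainAnto
102	kumar
103	Sofia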