Tuesday, February 14, 2017

Importing data from RDBMS into Hive using create-hive-table of sqoop

In Importing data from RDBMS into Hive I blogged about how to import data from an RDBMS into Hive using Sqoop. In that case the import command took care of both creating the table in Hive based on the RDBMS table and importing the data. But Sqoop can also be used to load data already stored in an HDFS text file into Hive. I wanted to try that out, so this time I created the contact table in Hive separately (using Sqoop's create-hive-table tool) and then used the contact table that I had exported into HDFS as a text file as input.
  1. First I used the sqoop import command to import the content of the contact table into HDFS as a text file. By default Sqoop uses , for separating columns and a newline for separating records.
    
    sqoop import --connect jdbc:mysql://macos/test --table contact -m 1
    
    After the import is done I can see the content of the text file by executing hdfs dfs -cat contact/part-m-00000
  2. After that you can use Sqoop to create a table in Hive based on the schema of the contact table in the RDBMS, by executing the following command
    
    sqoop create-hive-table --connect jdbc:mysql://macos/test --table contact --fields-terminated-by ','
    
  3. The last step is to use Hive to load the content of the contact text file into the contact table, by executing the following command.
    
    LOAD DATA INPATH 'contact' INTO TABLE contact;
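The whole flow hinges on the field separator lining up: Sqoop writes comma-separated columns into part-m-00000, and the --fields-terminated-by ',' flag tells Hive to split lines on the same character. As a quick local sanity check of that layout, no cluster needed (the file name contact_sample.txt and the sample rows below are made up for illustration):

```shell
# Hypothetical sample rows in the same layout Sqoop writes to part-m-00000:
# comma-separated columns, one record per line.
printf '1,Bob,bob@example.com\n2,Alice,alice@example.com\n' > contact_sample.txt

# Hive configured with fields terminated by ',' splits each line the same
# way cut does here; pull the second column to confirm the separator works.
cut -d',' -f2 contact_sample.txt
```

If a column value itself contained a comma, this splitting would break, which is why Sqoop also offers options such as --escaped-by and --enclosed-by on import.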
