Replicating Oracle data to Hive

See Hive Writer for information on storage types supported and limitations.

This example assumes the following Oracle source table:

create table employee (Employee_name varchar2(30), 
Employee_id number, 
CONSTRAINT employee_pk PRIMARY KEY (Employee_id));

and the following Hive target table:

CREATE TABLE employee (emp_name string, emp_id int)
CLUSTERED BY (emp_id) into 2 buckets 
STORED AS ORC TBLPROPERTIES ('transactional'='true');

The following application will load existing data from Oracle to Hive:

CREATE SOURCE OracleJDBCSource USING DatabaseReader (
  Username:'oracleuser',
  Password:'********',
  ConnectionURL:'jdbc:oracle:thin:@192.0.2.75:1521:orcl',
  Tables:'DEMO.EMPLOYEE',
  FetchSize:1
)
OUTPUT TO DataStream;

CREATE TARGET HiveTarget USING HiveWriter (
  ConnectionURL:'jdbc:hive2://192.0.2.76:10000',
  Username:'hiveuser',
  Password:'********',
  hadoopurl:'hdfs://192.0.2.76:9000/',
  Mode:'initialload',
  Tables:'DEMO.EMPLOYEE,employee'
)
INPUT FROM DataStream;
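
As a quick sanity check after the initial load finishes, you can compare row counts between the source and target tables. These queries are illustrative (run against Oracle and Hive respectively, for example via SQL*Plus and Beeline):

select count(*) from DEMO.EMPLOYEE;   -- on Oracle
SELECT COUNT(*) FROM employee;        -- on Hive

The two counts should match before you start the change-data-capture application below.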

Once initial load is complete, the following application will read new data and continuously replicate it to Hive:

CREATE SOURCE OracleCDCSource USING OracleReader (
  Username:'oracleuser',
  Password:'********',
  ConnectionURL:'192.0.2.75:1521:orcl',
  Tables:'DEMO.EMPLOYEE',
  FetchSize:1
)
OUTPUT TO DataStream;

CREATE TARGET HiveTarget USING HiveWriter (
  ConnectionURL:'jdbc:hive2://192.0.2.76:10000',
  Username:'hiveuser',
  Password:'********',
  hadoopurl:'hdfs://192.0.2.76:9000/',
  Mode:'incrementalload',
  Tables:'DEMO.EMPLOYEE,employee keycolumns(emp_id)'
)
INPUT FROM DataStream;
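
With the incremental application running, committed changes on the Oracle side are captured and replicated. For example, a hypothetical insert such as:

insert into DEMO.EMPLOYEE values ('Jane Doe', 101);
commit;

would appear as a new row in the Hive employee table once the change is read from the Oracle logs and applied by HiveWriter. The keycolumns(emp_id) clause in the Tables property tells HiveWriter which column identifies the target row, so subsequent UPDATE and DELETE operations on DEMO.EMPLOYEE can be matched to the correct row in Hive.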