Replicating Oracle data to Hive
See Hive Writer for information on supported storage types and limitations.
These examples assume the following Oracle source table:
CREATE TABLE employee (Employee_name varchar2(30), Employee_id number, CONSTRAINT employee_pk PRIMARY KEY (Employee_id));
and the following Hive target table:
CREATE TABLE employee (emp_name string, emp_id int) CLUSTERED BY (emp_id) INTO 2 BUCKETS STORED AS ORC TBLPROPERTIES ('transactional'='true');
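If you want sample data for the initial load, you could first insert a few rows into the Oracle source table. The names and IDs below are illustrative only:

```sql
-- Hypothetical sample rows, run against the Oracle source schema
INSERT INTO employee VALUES ('Alice Smith', 1);
INSERT INTO employee VALUES ('Bob Jones', 2);
COMMIT;
```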
The following application loads the existing data from Oracle into Hive:
CREATE SOURCE OracleJDBCSource USING DatabaseReader (
  Username: 'oracleuser',
  Password: '********',
  ConnectionURL: '192.0.2.75:1521:orcl',
  Tables: 'DEMO.EMPLOYEE',
  FetchSize: 1
)
OUTPUT TO DataStream;

CREATE TARGET HiveTarget USING HiveWriter (
  ConnectionURL: 'jdbc:hive2://192.0.2.76:10000',
  Username: 'hiveuser',
  Password: '********',
  hadoopurl: 'hdfs://192.0.2.76:9000/',
  Mode: 'initialload',
  Tables: 'DEMO.EMPLOYEE,employee'
)
INPUT FROM DataStream;
Once the initial load is complete, the following application will read new data and continuously replicate it to Hive:
CREATE SOURCE OracleCDCSource USING OracleReader (
  Username: 'oracleuser',
  Password: '********',
  ConnectionURL: '192.0.2.75:1521:orcl',
  Tables: 'DEMO.EMPLOYEE',
  FetchSize: 1
)
OUTPUT TO DataStream;

CREATE TARGET HiveTarget USING HiveWriter (
  ConnectionURL: 'jdbc:hive2://192.0.2.76:10000',
  Username: 'hiveuser',
  Password: '********',
  hadoopurl: 'hdfs://192.0.2.76:9000/',
  Mode: 'incrementalload',
  Tables: 'DEMO.EMPLOYEE,employee keycolumns(emp_id)'
)
INPUT FROM DataStream;
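As a minimal sanity check once both applications have run, you could compare row counts between source and target. These queries are a simple sketch; for a thorough check you would also compare column values:

```sql
-- In Hive (for example, via beeline): count should match the Oracle source
SELECT COUNT(*) FROM employee;

-- In Oracle:
SELECT COUNT(*) FROM DEMO.EMPLOYEE;
```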