Monday, June 11, 2012

How-to: using Talend Open Studio for Big Data, tutorial 2: reading Hive data

Once the data has been loaded into HDFS, let's create a Hive external table using the Hive shell.

First, make sure the data is there:

[screenshot: HDFS listing of the data files under /usr/talend]
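A quick way to check from the command line (a sketch; it assumes the files from tutorial 1 landed under /usr/talend, the same path the table below points at):

# list the files Talend loaded into HDFS
hadoop fs -ls /usr/talend

# peek at the first few lines of the data
hadoop fs -cat /usr/talend/* | head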

Then use the Hive shell to create an external table named customers:

CREATE EXTERNAL TABLE customers (id INT, name STRING, level STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;'
STORED AS TEXTFILE
LOCATION '/usr/talend';
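A note on this statement: the field delimiter is written as '\;' rather than ';' because the Hive CLI would otherwise treat the bare semicolon as a statement terminator, and LOCATION points at the HDFS directory the data was loaded into. The files there are expected to hold semicolon-delimited lines; a hypothetical sample matching the schema:

1;john;gold
2;jane;silver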

[screenshot: creating the customers table from the Hive shell]
In order for Talend to connect to Hive, we need to start a Hive Thrift server.

[screenshot: starting the Hive Thrift server]
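With the Hive releases of that era (HiveServer1), the Thrift service can be started from the shell roughly like this (a sketch; it assumes the hive launcher script is on your PATH):

# start the HiveServer1 Thrift service; it listens on port 10000 by default
hive --service hiveserver

# to use a different port, set HIVE_PORT first, e.g.:
HIVE_PORT=10001 hive --service hiveserver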

 

Now let's create a flow to read the data in Hive. In the tHiveRow component's settings, pick the Hive version and the Thrift server IP/port, then enter a Hive query.

[screenshot: tHiveRow settings with the version, Thrift host/port, and query]
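The exact query is not shown here; a simple one against the customers table would do, and it is worth sanity-checking it from the hive CLI before pasting it into the component:

# verify the query returns rows before wiring it into tHiveRow
hive -e "SELECT id, name, level FROM customers;"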

Once the component is in place, click Edit schema and add a single column with its type set to Object; we will then parse the result and map it to our own schema.

[screenshot: Edit schema dialog with one Object-type column]

Click the Advanced settings tab and enable "Parse query results", using the Object-type column we just created.

[screenshot: Advanced settings with "Parse query results" enabled]

Drag the tParseRecordSet component onto the workspace and connect the Main output of tHiveRow to it, then click Edit schema to do the mapping.

[screenshot: tParseRecordSet connected to the Main output of tHiveRow]

Then match the values:

[screenshot: mapping the record-set fields to the output columns]

The job now looks like the following:

[screenshot: job layout with tHiveRow and tParseRecordSet]

If we need to output the rows, just add a tLogRow component and connect it. The final layout:

[screenshot: final job layout ending in tLogRow]

Now run the job. The console tells you whether it has connected to the Hive server successfully, and if it failed, why.

[screenshot: Run console showing the connection status]
On the Hive server side, you can see it receive the query and compile it into MapReduce jobs on the fly:

[screenshot: Hive server log showing the query being compiled to MapReduce]
Once done, you can see the results in the Run console:

[screenshot: query results in the Run console]
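To double-check the output, you can run a count directly from the command line and compare it with the number of rows tLogRow printed (assuming the hive CLI talks to the same metastore):

# compare this count with the rows printed by tLogRow
hive -e "SELECT COUNT(*) FROM customers;"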

Once done, we can export the result to HBase to enable real-time application queries; check it here.

1 comment:

mna said...

Good tutorial!
I want to ask about "pick up the version, and the thrift server ip/port": where can I do that? I'm a Hadoop newbie. Thank you.

 