Monday, June 11, 2012

How-to: Using Talend Open Studio for Big Data, Tutorial 1

Big data is a hot topic right now, and Talend is very popular in the open source ETL community. Until recently, however, Talend did not offer much support for big data. That changed when Talend released a new open source product called Talend Open Studio for Big Data. You can download it and try it in your own environment; it has built-in support for big data. I will walk through several test cases here so you can get an impression of what the product offers.
image
1. From DB to HDFS

For the database side, I just use the row generator to simulate rows coming from a DB, with some simple logic that creates a series of rows with three columns: ID, name, and level.

image

Then drag and drop the HDFS output component (tHDFSOutput) onto the design surface and connect the main output of the row generator to it. For the HDFS component, just specify the name node address and the folder that will hold the file.
image
Then click Run to execute the job. It will create a file that contains all the rows we generated, in CSV format.
image
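
If you want a feel for what the job does without opening the studio, here is a rough plain-Java sketch of the same idea: generate rows with ID, name, and level, write them out as delimited text, and time the whole thing. This is not the code Talend generates. The class name `RowsToCsvSketch`, the `name_N` pattern, the level formula, and the comma separator are all assumptions for illustration, and a local temp file stands in for the HDFS folder.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class RowsToCsvSketch {

    // Build `count` delimited lines with three columns: ID, name, level.
    // The name pattern and the level formula are made up for illustration.
    public static List<String> generateRows(int count, String separator) {
        List<String> lines = new ArrayList<>();
        for (int id = 1; id <= count; id++) {
            String name = "name_" + id;
            int level = id % 5;
            lines.add(id + separator + name + separator + level);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();

        List<String> lines = generateRows(1000, ",");

        // A local temp file stands in for the HDFS target folder here.
        Path out = Files.createTempFile("rows", ".csv");
        Files.write(out, lines, StandardCharsets.UTF_8);

        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Wrote " + lines.size() + " rows to " + out
                + " in " + elapsedMs + " ms");
    }
}
```

In the real job the row generator and the HDFS output component take the place of `generateRows` and `Files.write`, and the studio reports the elapsed time for you.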

Remember to pick the Hadoop version that matches your environment. When the job is done, you can see how long it took to transfer the rows between the two systems.
image

Once the data has been exported from a traditional DB to HDFS, we can run a Hive query against it to get results. That is the next case to test.
