Monday, June 11, 2012

How-to: Using Talend Open Studio for Big Data, Tutorial 1

Big data is a hot topic right now, and Talend is very popular in the open source ETL community. Until recently, however, it offered little support for big data. Talend has now released a new open source product called Talend Open Studio for Big Data. You can download it and try it in your own environment; it has built-in support for big data. I will walk through several test cases here so you can get an impression of what the product offers.

In this first case, which moves rows from a database to HDFS, I just use the row generator to stand in for the database. It simulates the database rows with some simple logic that creates a series of rows with three columns: ID, name, and level.
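To make the idea concrete outside the Studio, here is a minimal Python sketch of what such a row generator produces. It is not the code Talend generates; the function name `generate_rows`, the `name_N` pattern, and the three level values are assumptions I made up for illustration.

```python
import random

# Hypothetical level values; the real job can use any expression here.
LEVELS = ["low", "medium", "high"]


def generate_rows(count, seed=0):
    """Return `count` simulated rows with the columns ID, name and level."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same rows
    rows = []
    for i in range(1, count + 1):
        rows.append({
            "ID": i,                       # sequential identifier
            "name": f"name_{i}",           # made-up name pattern
            "level": rng.choice(LEVELS),   # random pick from LEVELS
        })
    return rows


if __name__ == "__main__":
    for row in generate_rows(3):
        print(row)
```

Each row is a plain dictionary, so the same data can be handed to whatever writes the output file.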


Then drag and drop the HDFS output component onto the design surface and connect the main output of the row generator to it. For the HDFS component, just specify the name node address and the folder that will hold the file.
Then run the job. It creates a file for you that contains all the generated rows in CSV format.
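The two settings mentioned above (name node address and target folder) plus the CSV output can be sketched in plain Python. This is only an illustration under my own assumptions: the helper names `hdfs_uri` and `rows_to_csv`, the host `namenode`, port `8020`, the folder, and the file name are all hypothetical, and nothing here actually talks to a cluster.

```python
import csv
import io


def hdfs_uri(name_node, folder, file_name):
    """Join the name node address, folder and file name into one HDFS URI."""
    return f"{name_node.rstrip('/')}/{folder.strip('/')}/{file_name}"


def rows_to_csv(rows, delimiter=","):
    """Serialize rows (tuples of ID, name, level) to CSV text, one row per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Assumed values; replace them with your own cluster settings.
    target = hdfs_uri("hdfs://namenode:8020", "/user/talend/out", "rows.csv")
    content = rows_to_csv([(1, "name_1", "low"), (2, "name_2", "high")])
    print(target)
    print(content, end="")
```

The separator is a parameter because the field delimiter is configurable in most ETL outputs; a comma is used here only to match the CSV description.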

Remember to pick the version that matches your Hadoop environment. When the job is done, you can see how long it took to transfer the rows between the two systems.
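If you want to reproduce that kind of timing outside the Studio, a rough sketch looks like this. The names `timed_transfer` and `fake_transfer` are hypothetical, and `fake_transfer` only formats rows in memory as a stand-in for a real write to HDFS.

```python
import time


def timed_transfer(transfer, rows):
    """Run transfer(rows) and return (elapsed seconds, rows per second)."""
    start = time.perf_counter()
    transfer(rows)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very fast runs.
    rate = len(rows) / elapsed if elapsed > 0 else float("inf")
    return elapsed, rate


def fake_transfer(rows):
    """Hypothetical stand-in for the real transfer: format rows as CSV lines."""
    return [",".join(str(value) for value in row) for row in rows]


if __name__ == "__main__":
    sample = [(i, f"name_{i}", "low") for i in range(1, 10001)]
    elapsed, rate = timed_transfer(fake_transfer, sample)
    print(f"{len(sample)} rows in {elapsed:.4f} s ({rate:.0f} rows/s)")
```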

Once the data has been exported from a traditional database to HDFS, we can run a Hive query to get results. That is the next case to test.


