Big data has become a hot topic, and Talend is very popular in the open source ETL community. Until recently, however, Talend did not offer enough support for big data. Talend has now released a new open source product, Talend Open Studio for Big Data, with built-in support for big data. You can download it and try it in your own environment. I will walk through several test cases here so you can get an impression of what the product offers.
1. From DB to HDFS
For the database side, I just use the row generator (tRowGenerator) to simulate rows in a database, with some simple logic to create a series of rows with three columns: ID, name and level.
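To make the "simple logic" concrete, here is a minimal Java sketch of the kind of rows such a generator might emit. It is not Talend's generated code: the class `RowGenerator`, the `name_N` naming scheme and the level cycling through 1 to 3 are all hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the rows a row-generator component emits.
// Not Talend's generated code; the per-column logic below is assumed.
public class RowGenerator {

    // One generated row with the three columns used in this test: ID, name, level.
    public record Row(int id, String name, int level) {}

    // Builds `count` rows: sequential IDs starting at 1, a name derived
    // from the ID, and a level cycling through 1, 2, 3 (assumed logic).
    public static List<Row> generate(int count) {
        List<Row> rows = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            rows.add(new Row(i, "name_" + i, (i - 1) % 3 + 1));
        }
        return rows;
    }

    public static void main(String[] args) {
        for (Row r : generate(5)) {
            System.out.println(r);
        }
    }
}
```

In the Studio you would configure the equivalent per-column expressions in the component's editor instead of writing a loop by hand.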
Then drag and drop the HDFS output component (tHDFSOutput) onto the design surface and connect the main output of the row generator to it. For the HDFS component, just specify the NameNode address and the folder that will hold the file.
Then run the job. It creates a file containing all the rows we generated, in CSV format.
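The file the job produces is plain delimited text: one line per row, fields joined by a separator. The sketch below reproduces that shape in Java, assuming a comma separator and writing to a local temporary folder as a stand-in for the HDFS target folder; the class `DelimitedWriter` and its methods are hypothetical. It does not quote or escape fields, so a value containing the separator would break the format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch of the delimited output the job writes.
// Writes to the local file system as a stand-in for HDFS.
public class DelimitedWriter {

    // Joins each row's fields with the separator and ends each row with '\n'.
    // No quoting or escaping is done.
    public static String toDelimited(List<List<Object>> rows, String separator) {
        StringBuilder sb = new StringBuilder();
        for (List<Object> row : rows) {
            StringJoiner joiner = new StringJoiner(separator);
            for (Object field : row) {
                joiner.add(String.valueOf(field));
            }
            sb.append(joiner).append('\n');
        }
        return sb.toString();
    }

    // Creates the target folder if needed and writes all rows to one file.
    public static Path write(Path dir, String fileName,
                             List<List<Object>> rows, String separator) throws IOException {
        Files.createDirectories(dir);
        Path out = dir.resolve(fileName);
        Files.writeString(out, toDelimited(rows, separator), StandardCharsets.UTF_8);
        return out;
    }

    public static void main(String[] args) throws IOException {
        List<List<Object>> rows = List.of(
                List.<Object>of(1, "name_1", 1),
                List.<Object>of(2, "name_2", 2));
        Path out = write(Files.createTempDirectory("hdfs_out"), "out.csv", rows, ",");
        System.out.print(Files.readString(out));
    }
}
```

Running `main` prints the two rows as `1,name_1,1` and `2,name_2,2`, which is the same line-per-row layout you should see when you open the file the job created in HDFS.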
Once the data has been exported from the traditional database to HDFS, we can run Hive queries on it to get results; that is the next case to test.