
Wednesday, November 16, 2011

How to: Install and Test apache mahout on hadoop

Mahout and Hadoop are both essentially Java libraries, and Mahout uses Maven to build its source code and manage its dependencies.
So we need to make sure we have the following pieces ready.

  • JDK
  • Maven
  • Hadoop
  • Mahout

 

I will start from a fresh CentOS install and get everything ready step by step.

Install JDK
Go to the Oracle JDK download site, http://www.oracle.com/technetwork/java/javase/downloads/index.html. I still prefer Java 6 over 7. Pick one of the .bin links, download it, and run it directly. I will put Java under the /usr/lib/jdk6 folder.
Add the JDK's bin directory to PATH, and set the JAVA_HOME environment variable to /usr/lib/jdk6.
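As a minimal sketch, assuming the JDK ended up in /usr/lib/jdk6 as described above, the two exports could look like this. Putting them in a shell startup file such as ~/.bashrc is my assumption for making them persist; the post does not say where they go.

```shell
# Assumes the JDK lives in /usr/lib/jdk6, the location chosen above.
export JAVA_HOME=/usr/lib/jdk6
# Put the JDK's bin directory first so its java and jps are found.
export PATH="$JAVA_HOME/bin:$PATH"
```

Running java -version afterwards is a quick way to confirm that the right JDK is picked up.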

Install Maven
Download the binary package from http://maven.apache.org/download.html. Here I chose version 2.2.1, which I find more stable.

Extract the zip file and link it to /usr/lib/maven.
Then add /usr/lib/maven/bin to PATH. Now you can run mvn --version to make sure it works; at the very least it should print the version.
For settings such as a proxy, see http://maven.apache.org/download.html#Maven_Documentation
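The link-and-export step can be sketched as follows. link_maven is a hypothetical helper name, and the example directory name apache-maven-2.2.1 is an assumption about what the archive extracts to.

```shell
# Hypothetical helper: link an extracted Maven directory to a stable path.
link_maven() {
    src=$1                      # e.g. the extracted apache-maven-2.2.1 directory
    dest=${2:-/usr/lib/maven}   # stable location used in this post
    ln -sfn "$src" "$dest"
}

# mvn lives in the bin subdirectory of the linked location.
export PATH="/usr/lib/maven/bin:$PATH"
```

For example, link_maven /opt/apache-maven-2.2.1 (run as root) followed by mvn --version should print the Maven version.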

Install Hadoop
If you want to install Hadoop in fully distributed mode, see How to: install and config hadoop to run in fully distributed Mode, Centos.

Here we have just one VM, so we keep things simple for the Mahout testing. I will use the Cloudera distribution.

Download the repo file for CentOS 5, http://archive.cloudera.com/redhat/cdh/cloudera-cdh3.repo, and copy it to the yum repo directory.
Now search for hadoop and you will see all the components. We will use the hadoop-0.20-conf-pseudo one.
yum install hadoop-0.20-conf-pseudo
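Put together, the repo and install steps might look like the sketch below. It assumes wget is available and that the yum repo directory is /etc/yum.repos.d; install_cdh3_pseudo is a hypothetical function name, and it has to run as root.

```shell
# Run as root. Assumes wget is installed and yum reads /etc/yum.repos.d.
install_cdh3_pseudo() {
    # Fetch the Cloudera CDH3 repo definition into the yum repo directory.
    wget -O /etc/yum.repos.d/cloudera-cdh3.repo \
        http://archive.cloudera.com/redhat/cdh/cloudera-cdh3.repo
    # List the available Hadoop packages, then install the pseudo-distributed config.
    yum search hadoop
    yum install hadoop-0.20-conf-pseudo
}
```

Nothing happens until the function is called, so it is safe to paste into a shell first and run install_cdh3_pseudo once you are ready.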

Once that is done, go to the /usr/lib/hadoop/conf directory and change JAVA_HOME to /usr/lib/jdk6 in hadoop-env.sh.
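Editing the file by hand works fine. If you prefer to script it, here is one possible sketch; set_hadoop_java_home is a hypothetical helper, and it assumes the file sets JAVA_HOME with a plain (possibly commented-out) export line.

```shell
# Hypothetical helper: point hadoop-env.sh at a given JDK directory.
set_hadoop_java_home() {
    env_file=$1   # e.g. /usr/lib/hadoop/conf/hadoop-env.sh
    jdk_dir=$2    # e.g. /usr/lib/jdk6
    # Drop any existing JAVA_HOME export, commented out or not...
    sed '/^#\{0,1\} *export JAVA_HOME=/d' "$env_file" > "$env_file.tmp"
    # ...then append the new one and swap the file into place.
    echo "export JAVA_HOME=$jdk_dir" >> "$env_file.tmp"
    mv "$env_file.tmp" "$env_file"
}
```

Usage would be set_hadoop_java_home /usr/lib/hadoop/conf/hadoop-env.sh /usr/lib/jdk6, run as a user who can write to that directory.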

Then, as the hdfs user, format the namenode.

Next, start the daemons with the /etc/init.d/hadoop-* scripts and run jps. You should see all the Java processes there.
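The format and start-up steps above can be sketched like this. Both function names are hypothetical. The sketch assumes the packages created an hdfs user, that hadoop is on PATH, and that you run it as root.

```shell
# Format HDFS once, before the first start. Assumes an hdfs user exists
# and that hadoop is on PATH.
format_namenode() {
    sudo -u hdfs hadoop namenode -format
}

# Start every Hadoop init script found in a directory (default /etc/init.d).
start_hadoop_daemons() {
    init_dir=${1:-/etc/init.d}
    for svc in "$init_dir"/hadoop-*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -x "$svc" ] || continue
        "$svc" start
    done
}
```

Call format_namenode, then start_hadoop_daemons, and finally jps to check that the Java processes are up.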

Now we can run a simple test. Go to /usr/lib/hadoop and copy a file into HDFS.
Open a browser and go to http://localhost:50070. You should see that the file we just uploaded is there.
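A copy-and-list test along those lines might look like the sketch below. hdfs_smoke_test and the HADOOP_CMD override are hypothetical names; the target path at the HDFS root is my choice, and writing there assumes you run as the hdfs user.

```shell
# Hypothetical helper: copy one local file to the HDFS root and list it.
# HADOOP_CMD is an optional override in case the launcher has another name.
hdfs_smoke_test() {
    local_file=$1
    hadoop_cmd=${HADOOP_CMD:-hadoop}
    # Upload the file under its own name, then list the root directory.
    "$hadoop_cmd" fs -put "$local_file" "/$(basename "$local_file")"
    "$hadoop_cmd" fs -ls /
}
```

For example, hdfs_smoke_test /etc/hosts should leave a hosts entry that also shows up in the web UI's file browser.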
Now HDFS is ready. We can run a MapReduce job to make sure Hadoop as a whole is ready.

If there are no errors, we are all set and Hadoop is ready.
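Which job to run is a free choice. The sketch below uses the pi estimator from the bundled examples jar, which is my assumption rather than something the post specifies. The jar location under /usr/lib/hadoop and its exact file name are also assumptions, hence the glob. run_example_job and HADOOP_CMD are hypothetical names.

```shell
# Hypothetical helper: run the pi example from the first examples jar found.
run_example_job() {
    examples_dir=${1:-/usr/lib/hadoop}
    hadoop_cmd=${HADOOP_CMD:-hadoop}
    for jar in "$examples_dir"/hadoop*examples*.jar; do
        # Skip the unexpanded pattern when nothing matches.
        [ -f "$jar" ] || continue
        # pi takes the number of maps and the number of samples per map.
        "$hadoop_cmd" jar "$jar" pi 2 1000
        return
    done
    echo "no examples jar found under $examples_dir" >&2
    return 1
}
```

If the job finishes and prints an estimate of pi, both HDFS and MapReduce are working.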

Install Mahout

Download the source code. You can use svn to check out a copy of trunk:

svn co http://svn.apache.org/repos/asf/mahout/trunk

Then copy the checked-out code to /usr/lib/mahout.

Then run mvn install -DskipTests to compile the source. Maven works out the dependencies and fetches the jars for you automatically.
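The checkout and build can also be done in one go by checking out straight into /usr/lib/mahout, which skips the copy step. This is a sketch under the assumption that svn and mvn are on PATH and that you can write to /usr/lib; build_mahout is a hypothetical name.

```shell
# Hypothetical helper: check out Mahout trunk and build it without tests.
build_mahout() {
    dest=${1:-/usr/lib/mahout}
    svn co http://svn.apache.org/repos/asf/mahout/trunk "$dest" || return 1
    # Build in a subshell so the caller's working directory is unchanged.
    ( cd "$dest" && mvn install -DskipTests )
}
```

The -DskipTests flag still compiles the tests but does not run them, which shortens the build considerably.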


Downloading all the jars takes time, and your mileage may vary. On my machine it took several minutes.

Now add /usr/lib/mahout/bin to PATH so that we can run mahout from the shell.

If you can't execute mahout, give the script execute permission.

Running mahout with no arguments lists all the options for the different algorithms.
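A minimal sketch of those two steps, assuming Mahout sits in /usr/lib/mahout as above. Exporting MAHOUT_HOME is my addition for convenience, not something the post requires.

```shell
# Assumes the Mahout source tree lives in /usr/lib/mahout.
export MAHOUT_HOME=/usr/lib/mahout
export PATH="$MAHOUT_HOME/bin:$PATH"

# Make the launcher executable if the checkout did not preserve the bit.
if [ -f "$MAHOUT_HOME/bin/mahout" ]; then
    chmod +x "$MAHOUT_HOME/bin/mahout"
fi
```

After this, typing mahout in the same shell should print the list of available programs.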


Then go to the examples folder and run mvn compile.



Now you can run some of the examples, such as the one that classifies the newsgroups.



Here we didn't specify HADOOP_HOME, so it runs locally. The shell script downloads the data, prepares it, and then runs the classifier.
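The compile and run steps might be scripted as below. The script location examples/bin/classify-20newsgroups.sh is an assumption about the trunk layout, so check your checkout if the file is not there. run_20newsgroups is a hypothetical name.

```shell
# Hypothetical helper: compile the examples and run the 20 newsgroups demo.
run_20newsgroups() {
    mahout_home=${1:-/usr/lib/mahout}
    (
        cd "$mahout_home/examples" || exit 1
        mvn compile || exit 1
        # Leave HADOOP_HOME unset so the script runs locally, as in the post.
        unset HADOOP_HOME
        sh bin/classify-20newsgroups.sh
    )
}
```

Everything runs in a subshell, so the caller's working directory and HADOOP_HOME are left untouched.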



When it is done, it shows the confusion matrix for the test data.

121 comments:

  1. vry nice blog..............actually
    I have many queries....
    1. how to export /usr/lib/mahout/bin to PATH
    2. m done with installation by following ur blog...almost everything went fine bt getting error for the command "sh classify-20newsgroups.sh"
    sh: classify-20newsgroups.sh: No such file or directory

    m new to this so nt getting what is this.....

    3. is it necessary to run every command through "root"????

    thnx a lot :)

  2. i am getting the following error
    [INFO] Reactor Summary:
    [INFO]
    [INFO] Mahout Build Tools ................................ SUCCESS [3.160s]
    [INFO] Apache Mahout ..................................... SUCCESS [0.934s]
    [INFO] Mahout Math ....................................... SUCCESS [48.081s]
    [INFO] Mahout Core ....................................... SUCCESS [1:02.933s]
    [INFO] Mahout Integration ................................ SUCCESS [2:49.722s]
    [INFO] Mahout Examples ................................... SUCCESS [2:01.534s]
    [INFO] Mahout Release Package ............................ SUCCESS [0.092s]
    [INFO] Mahout Math/Scala wrappers ........................ FAILURE [2:03.532s]
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD FAILURE
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 8:50.657s
    [INFO] Finished at: Fri Sep 05 19:35:01 IST 2014
    [INFO] Final Memory: 55M/312M
    [INFO] ------------------------------------------------------------------------
    [ERROR] Failed to execute goal on project mahout-math-scala: Could not resolve dependencies for project org.apache.mahout:mahout-math-scala:jar:0.9: Could not transfer artifact org.scalatest:scalatest_2.9.2:jar:1.9.1 from/to central (http://repo.maven.apache.org/maven2): GET request of: org/scalatest/scalatest_2.9.2/1.9.1/scalatest_2.9.2-1.9.1.jar from central failed: Connection reset -> [Help 1]
    [ERROR]
    [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
    [ERROR] Re-run Maven using the -X switch to enable full debug logging.
    [ERROR]
    [ERROR] For more information about the errors and possible solutions, please read the following articles:
    [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
    [ERROR]
    [ERROR] After correcting the problems, you can resume the build with the command
    [ERROR] mvn -rf :mahout-math-scala

  3. Akila Siriweera

    Interesting RARE & worth article, works fine.
    Thanks for the creator.

    Note : http://svn.apache.org/repos/asf/mahout/trunk

    It needs to check out whole contained of the folder.
    Thumbs up....

  4. hi when i tried to run a example sh, the data is getting downloaded in the tmp/mahout-work-root/20news-all.
    But i am getting an error,
    put: `/tmp/mahout-work-root/20news-all': No such file or directory

    Kindly help me out

    thank u so much in advance
