You might get scared the first time you try to download and run Tika on Windows if, like me, you don't have much experience with SVN and Maven. Here is a quick tutorial that walks through the process.
1. Use a Subversion client to download the Tika source code.
There is a Windows shell extension for Subversion; download and install it on your Windows box. Then right-click a folder such as C:\temp\tika and click the SVN Checkout context menu item.
Enter the SVN source URL: http://svn.apache.org/repos/asf/tika/trunk
It may take a couple of seconds to download the source code. Click OK when it is done.
2. Download Maven, a build utility like MSBuild or Ant, and add the folder that contains mvn.bat to the Windows PATH.
After the path is set, you should be able to run “mvn” at the command prompt.
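If you are not sure whether the PATH change took effect, a small script can check it for you. This is only a sketch, assuming Python 3 is installed on the box; the function name missing_tools is my own invention and not part of Maven or Tika. On Windows, shutil.which also tries the extensions listed in PATHEXT, so a batch launcher such as mvn.bat should be picked up.

```python
import shutil


def missing_tools(tools=("java", "mvn")):
    """Return the names in `tools` that cannot be found on PATH."""
    # shutil.which returns None when no matching executable is found.
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("java and mvn are both on PATH")
```

If the script reports mvn as missing, open a new command prompt first, because a prompt that was already open does not see the updated PATH.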
3. Go to the downloaded Tika source folder, C:\temp\tika, and run “mvn install”.
The build will download the necessary components and compile the project. This may take a while.
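Once the build finishes, you need to find the jar it produced. The sketch below assumes the conventional Maven layout, where each module writes its output to its own target folder, so the application jar would sit under tika-app\target inside the source folder. That location and the helper name find_app_jars are assumptions for illustration; check your own build output if nothing is found.

```python
from pathlib import Path


def find_app_jars(source_root):
    """List tika-app jars under <source_root>/tika-app/target (assumed layout)."""
    target = Path(source_root) / "tika-app" / "target"
    if not target.is_dir():
        # The build has not run yet, or the layout differs from the assumption.
        return []
    # Sorted so the result is stable when more than one jar matches.
    return sorted(target.glob("tika-app-*.jar"))
```

For example, find_app_jars(r"C:\temp\tika") returns a list of matching paths, or an empty list if the build has not produced a jar in that folder.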
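4. Run the Tika app now.
Go to the folder that contains the built jar (with Maven's default layout, tika-app\target under the source folder) and run “java -jar tika-app-0.8-snapshot.jar -m a.txt”
to pull the metadata of a.txt,
or use -t yourpdf.pdf to extract the content of the PDF file.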
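If you want to call the Tika app from a script instead of typing the command each time, something like the following could work. It is a minimal sketch, assuming Python 3.7 or later and java on the PATH; tika_command and run_tika are hypothetical helper names, and only the two flags used above (-m and -t) are covered.

```python
import subprocess


def tika_command(jar, flag, document):
    """Build the argument list for one tika-app invocation."""
    if flag not in ("-m", "-t"):
        raise ValueError("this sketch only covers -m (metadata) and -t (text)")
    return ["java", "-jar", jar, flag, document]


def run_tika(jar, flag, document):
    """Run tika-app and return whatever it wrote to standard output."""
    result = subprocess.run(
        tika_command(jar, flag, document),
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the output to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, run_tika("tika-app-0.8-snapshot.jar", "-m", "a.txt") would return the metadata listing as a string. Passing the arguments as a list avoids quoting problems when a file name contains spaces.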