Kettle is Pentaho’s ETL tool, which is also called Pentaho Data Integration (PDI).
- This Kettle 4.4.0 installation was done on the following versions of Linux, Java, and Hadoop, respectively.
- Download the stable version of Kettle from here
- Change into the directory where the stable version was downloaded. By default, this is the “Downloads” directory.
- Extract the tar archive.
tar -xzf pdi-ce-4.4.0-stable.tar.gz
- Move data-integration to /bin/pdi-ce-4.4.0
mv data-integration/ /bin/pdi-ce-4.4.0
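- Create a symlink named data-integration that points to it (run this from /bin, because the link target is a relative path)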
ln -s pdi-ce-4.4.0 data-integration
- To run Spoon:
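A minimal sketch of launching Spoon, assuming the `spoon.sh` launcher script that ships in the PDI directory and the /bin/data-integration symlink created above. The `PDI_HOME` variable and the `launch_spoon` function name are only illustrative and are not part of PDI.

```shell
# Assumption: PDI_HOME points at the data-integration symlink created above.
PDI_HOME="${PDI_HOME:-/bin/data-integration}"

# Hypothetical helper: start Spoon from the PDI directory.
launch_spoon() {
    # Fail early with a readable message if the launcher is missing.
    if [ ! -x "$PDI_HOME/spoon.sh" ]; then
        echo "spoon.sh not found or not executable in $PDI_HOME" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    (cd "$PDI_HOME" && ./spoon.sh "$@")
}
```

Calling `launch_spoon` with no arguments should then bring up the Spoon GUI, provided a graphical session is available.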
Apache Hadoop 1.1.2 is not compatible with the Apache Hadoop 0.20.x line, and thus PDI doesn’t work with 1.1.2 out of the box. Follow the steps below to make it compatible 🙂
- Create a folder named “hadoop-112” in the hadoop-configurations directory [data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations].
- Copy the contents of the “hadoop-20” folder into the “hadoop-112” folder.
- Replace the following JARs in the client/ subfolder [data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/hadoop-112/lib/client] with the versions from the Apache Hadoop 1.1.2 distribution:
- Add the following JAR from the Hadoop 1.1.2 distribution to the client/ subfolder as well:
- Change the property in plugin.properties [data-integration/plugins/pentaho-big-data-plugin/] to point to the new folder:
- Start PDI
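The first two steps above (creating “hadoop-112” and copying “hadoop-20” into it) can be sketched as a small shell function. This is only a sketch: `PDI_HOME` and the function name `create_hadoop_config` are illustrative assumptions, and the JAR replacement and property change still have to be done by hand afterwards.

```shell
# Assumption: PDI_HOME is the PDI install directory (the symlink created earlier).
PDI_HOME="${PDI_HOME:-/bin/data-integration}"

# Hypothetical helper: clone the hadoop-20 configuration as hadoop-112.
create_hadoop_config() {
    conf_dir="$PDI_HOME/plugins/pentaho-big-data-plugin/hadoop-configurations"
    src="$conf_dir/hadoop-20"
    dst="$conf_dir/hadoop-112"
    # Stop if the stock configuration is not where we expect it.
    if [ ! -d "$src" ]; then
        echo "missing $src" >&2
        return 1
    fi
    mkdir -p "$dst"
    # "src/." copies the contents of hadoop-20, not the folder itself,
    # so the result is hadoop-112/lib/client rather than hadoop-112/hadoop-20/...
    cp -R "$src/." "$dst/"
}
```

Copying the contents rather than the folder keeps the lib/client path directly under hadoop-112, which is where the JARs are replaced in the later steps.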
Note: The information provided here is to the best of my knowledge and experience. If anything needs to be corrected, please let me know; your suggestions are always welcome.