Install Kettle 4.4.0 on Ubuntu 13.04


Kettle is Pentaho’s ETL tool, which is also called Pentaho Data Integration (PDI).

  • This Kettle 4.4.0 installation was done on Ubuntu 13.04 with the following versions of Java and Hadoop:


JAVA 1.7.0_25

HADOOP 1.1.2
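
Before installing, it can help to confirm which versions are actually on the machine. The snippet below is a minimal sketch, not part of the original procedure: report_version is a hypothetical helper, and it assumes lsb_release, java and hadoop are the command names on your system.

```shell
#!/bin/sh
# report_version is a hypothetical helper, not something shipped with Kettle.
# It prints the first line of a tool's version output, or a notice if the
# tool is not on the PATH.
report_version() {
    tool="$1"; shift
    if command -v "$tool" >/dev/null 2>&1; then
        # java prints its version on stderr, so merge stderr into stdout.
        "$tool" "$@" 2>&1 | head -n 1
    else
        echo "$tool: not found on PATH"
    fi
}

report_version lsb_release -ds   # this guide used Ubuntu 13.04
report_version java -version     # this guide used 1.7.0_25
report_version hadoop version    # this guide used 1.1.2
```

If any of the three reports "not found on PATH", install or configure that tool first.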

  • Download the stable version of Kettle (pdi-ce-4.4.0-stable.tar.gz) from here
  • Change into the directory where the archive was downloaded. By default this is the “Downloads” directory in your home folder

     cd ~/Downloads

  • Extract the tar archive.

    tar -xzf pdi-ce-4.4.0-stable.tar.gz

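  • Move data-integration to /bin/pdi-ce-4.4.0 (writing to /bin requires root privileges, so run this step and the symlink step as root or with sudo)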

    mv data-integration/ /bin/pdi-ce-4.4.0

  • Create a symlink

    cd /bin

    ln -s pdi-ce-4.4.0 data-integration

  • To run Spoon:

    cd /bin/data-integration

    ./spoon.sh
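
The extract, move and symlink steps above can be collected into one function. This is only a sketch under a few assumptions: install_pdi and its two arguments are hypothetical names, the archive is assumed to unpack to a top-level data-integration/ directory (as in the steps above), and the PREFIX argument stands in for /bin.

```shell
#!/bin/sh
# install_pdi ARCHIVE PREFIX  (hypothetical helper, not part of Kettle)
# Unpacks the PDI tarball, moves it to PREFIX/pdi-ce-4.4.0 and creates
# the PREFIX/data-integration symlink.
install_pdi() {
    archive="$1"
    prefix="$2"
    version_dir="pdi-ce-4.4.0"

    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi
    if [ -e "$prefix/$version_dir" ]; then
        echo "already installed: $prefix/$version_dir" >&2
        return 1
    fi

    # Unpack into a scratch directory so a failed run leaves PREFIX untouched.
    workdir=$(mktemp -d) || return 1
    tar -xzf "$archive" -C "$workdir" || return 1
    mkdir -p "$prefix" || return 1
    mv "$workdir/data-integration" "$prefix/$version_dir" || return 1
    rm -rf "$workdir"

    # -n replaces an existing data-integration link instead of following it.
    ln -sfn "$version_dir" "$prefix/data-integration"
}

# Example, run as root because /bin is not writable by normal users:
#   install_pdi ~/Downloads/pdi-ce-4.4.0-stable.tar.gz /bin
```

Keeping the versioned directory and pointing a symlink at it means a later upgrade only has to install a new directory and repoint the link.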


Apache Hadoop 1.1.2 is not compatible with the Apache Hadoop 0.20.x line, so PDI does not work with 1.1.2 out of the box. Follow the steps below to make it compatible 🙂

  • Create a folder “hadoop-112” in the hadoop-configurations directory [data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations].
  • Copy the contents of the “hadoop-20” folder into the “hadoop-112” folder.
  • Replace the following JARs in the client/ subfolder [data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/hadoop-112/lib/client] with the versions from the Apache Hadoop 1.1.2 distribution:
  1.  commons-codec-<version>.jar
  2.  hadoop-core-<version>.jar
  • Add the following JAR from the Hadoop 1.1.2 distribution to the client/ subfolder as well:


  • Change the property in [data-integration/plugins/pentaho-big-data-plugin/] to point to the new folder:


  • Start PDI
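
The folder and JAR steps above can be sketched as a small script. Treat it as an illustration under stated assumptions rather than the exact procedure: make_hadoop_config and set_property are hypothetical helpers, and the script assumes hadoop-core sits at the top level of the Hadoop 1.1.2 distribution with commons-codec under lib/. The property step in the text does not name the file or the key. In the PDI big data plugin as I know it these are plugin.properties and active.hadoop.configuration, but that is my assumption and not something the steps above state, so check your own installation.

```shell
#!/bin/sh
# make_hadoop_config PDI_HOME HADOOP_HOME  (hypothetical helper)
# Creates hadoop-112 from hadoop-20 and swaps in the Hadoop 1.1.2 JARs.
make_hadoop_config() {
    pdi_home="$1"
    hadoop_home="$2"
    configs="$pdi_home/plugins/pentaho-big-data-plugin/hadoop-configurations"
    client="$configs/hadoop-112/lib/client"

    if [ ! -d "$configs/hadoop-20" ]; then
        echo "missing $configs/hadoop-20" >&2
        return 1
    fi

    # Create the new folder and copy the contents of hadoop-20 into it.
    mkdir -p "$configs/hadoop-112" || return 1
    cp -r "$configs/hadoop-20/." "$configs/hadoop-112/" || return 1

    # Remove the bundled JARs, then copy in the 1.1.2 versions.
    rm -f "$client"/commons-codec-*.jar "$client"/hadoop-core-*.jar
    # Assumed layout of the Hadoop distribution; adjust if yours differs.
    cp "$hadoop_home"/hadoop-core-*.jar "$client"/ || return 1
    cp "$hadoop_home"/lib/commons-codec-*.jar "$client"/ || return 1
}

# set_property FILE KEY VALUE  (hypothetical helper)
# Rewrites an existing KEY=... line in a properties file; it does not add
# the key if it is missing.
set_property() {
    file="$1"; key="$2"; value="$3"
    tmp=$(mktemp) || return 1
    sed "s|^$key=.*|$key=$value|" "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (file name and key are assumptions, see the note above):
#   make_hadoop_config /bin/data-integration /path/to/hadoop-1.1.2
#   set_property /bin/data-integration/plugins/pentaho-big-data-plugin/plugin.properties \
#       active.hadoop.configuration hadoop-112
```

The original hadoop-20 folder is left untouched, so you can switch back by pointing the property at it again.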



Note: The information provided here is to the best of my knowledge and experience. If anything needs to be changed, please let me know; your suggestions are always welcome.