Install Kettle 4.4.0 on Ubuntu 13.04


Kettle is Pentaho’s ETL tool, which is also called Pentaho Data Integration (PDI).

  • This Kettle 4.4.0 installation was done on the following versions of Linux, Java and Hadoop:

UBUNTU 13.04

JAVA 1.7.0_25

HADOOP 1.1.2

  • Download the stable version of Kettle (pdi-ce-4.4.0-stable.tar.gz).
  • Change to the directory where the archive was downloaded. By default this is the “Downloads” directory in your home folder.

     cd ~/Downloads

  • Unzip the tar file.

    tar -xzf pdi-ce-4.4.0-stable.tar.gz

  • Move data-integration to /bin/pdi-ce-4.4.0 (writing to /bin requires root privileges, hence sudo)

    sudo mv data-integration/ /bin/pdi-ce-4.4.0

  • Create a symlink

    cd /bin

    sudo ln -s pdi-ce-4.4.0 data-integration

  • To run Spoon:

    cd /bin/data-integration

    sh spoon.sh
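
The extract, move and symlink steps above can be wrapped in one small function. This is only a sketch: the function name install_pdi, its three arguments and the use of a temporary working directory are my own assumptions, not part of the Kettle distribution.

```shell
# Sketch: unpack a PDI archive and keep a stable "data-integration" symlink.
# install_pdi and its arguments are illustrative names, not part of Kettle.
install_pdi() {
    tarball=$1      # e.g. ~/Downloads/pdi-ce-4.4.0-stable.tar.gz
    prefix=$2       # e.g. /bin, as in the steps above
    version=$3      # e.g. 4.4.0

    workdir=$(mktemp -d) || return 1
    # The archive is expected to contain a top-level data-integration/ directory.
    tar -xzf "$tarball" -C "$workdir" || return 1
    if [ ! -d "$workdir/data-integration" ]; then
        echo "unexpected archive layout" >&2
        return 1
    fi

    mv "$workdir/data-integration" "$prefix/pdi-ce-$version" || return 1
    # -n replaces an existing symlink instead of creating a link inside it.
    ln -sfn "pdi-ce-$version" "$prefix/data-integration" || return 1
    rmdir "$workdir"
}
```

When the prefix is /bin the function has to run as root, for example from a root shell: install_pdi ~/Downloads/pdi-ce-4.4.0-stable.tar.gz /bin 4.4.0. Because of the symlink, a later version can be installed next to the old one and switched by pointing data-integration at it.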


Apache Hadoop 1.1.2 is not compatible with the Apache Hadoop 0.20.x line, so PDI does not work with 1.1.2 out of the box. Follow these steps to make it compatible 🙂

  • Create a folder “hadoop-112” in the hadoop-configurations directory [data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations].
  • Copy the contents of the “hadoop-20” folder into the “hadoop-112” folder.
  • Replace the following JARs in the client/ subfolder [data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/hadoop-112/lib/client] with the versions from the Apache Hadoop 1.1.2 distribution:
  1.  commons-codec-<version>.jar
  2.  hadoop-core-<version>.jar
  • Add the following JAR from the Hadoop 1.1.2  distribution to the client/ subfolder as well:


  • Change the property in [data-integration/plugins/pentaho-big-data-plugin/] to point to the new folder:


  • Start PDI
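
The folder copy and the two JAR replacements can be scripted roughly as below. Treat it as a sketch under assumptions: make_hadoop112_config is a made-up name, and I am assuming the replacement JARs can be found somewhere under the Hadoop 1.1.2 directory by file name (the first match is used). It covers only the copy and the commons-codec and hadoop-core replacement, nothing else.

```shell
# Sketch: clone the hadoop-20 configuration and swap in Hadoop 1.1.2 JARs.
# make_hadoop112_config is an illustrative name, not a Pentaho tool.
make_hadoop112_config() {
    confs=$1        # .../pentaho-big-data-plugin/hadoop-configurations
    hadoop_home=$2  # root of the unpacked Hadoop 1.1.2 distribution

    if [ ! -d "$confs/hadoop-20" ]; then
        echo "hadoop-20 not found in $confs" >&2
        return 1
    fi
    if [ -e "$confs/hadoop-112" ]; then
        echo "hadoop-112 already exists in $confs" >&2
        return 1
    fi
    cp -r "$confs/hadoop-20" "$confs/hadoop-112" || return 1
    client="$confs/hadoop-112/lib/client"

    for name in commons-codec hadoop-core; do
        # Locate the replacement first so nothing is deleted if it is missing.
        new=$(find "$hadoop_home" -name "$name-*.jar" | head -n 1)
        if [ -z "$new" ]; then
            echo "no $name JAR under $hadoop_home" >&2
            return 1
        fi
        rm -f "$client/$name"-*.jar
        cp "$new" "$client/" || return 1
    done
}
```

The original hadoop-20 folder is left untouched, so it is easy to go back if the new configuration does not work.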



Note: The information provided here is to the best of my knowledge and experience. If anything needs to be corrected, your suggestions are always welcome.


Hive Installation On Ubuntu

This hive-0.10.0 installation was done on Ubuntu with the following versions of Java and Hadoop:


JAVA 1.7.0_25

HADOOP 1.1.2

I have hduser as a dedicated Hadoop system user, and Hadoop is installed in the /home/hduser/hadoop folder. Hive will be installed in the /usr/lib/hive folder.

  • Download the stable version of Hive (hive-0.10.0.tar.gz).

  • Change to the directory where the archive was downloaded. By default this is the “Downloads” directory.
$ cd ~/Downloads
  • Unzip the tar file.

[Switch to the root user with the command: su]

# tar xzf hive-0.10.0.tar.gz
  • Create the target directory
# mkdir /usr/lib/hive
  • Move hive-0.10.0 into the new directory
# mv hive-0.10.0 /usr/lib/hive/hive-0.10.0

[Return from root to hduser with the command: exit (or su hduser)]

  • Set the HIVE_HOME path in the .bashrc file

Open the .bashrc file with this command:

hduser@system_name:~$ gedit ~/.bashrc

Append the following two statements to the file:

export HIVE_HOME=/usr/lib/hive/hive-0.10.0

export PATH=$PATH:$HIVE_HOME/bin
  •  Open a new terminal (or run source ~/.bashrc) so the new variables take effect, then type hive at the command line to get the Hive shell.
$ hive

  • Now you can play with Hive 🙂
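
If you would rather not edit .bashrc by hand, the two export lines can be appended from the shell. The sketch below uses two helper functions of my own (append_once and add_hive_env are made-up names); it adds a line only when that exact line is not in the file yet, so running it twice does no harm.

```shell
# Append a line to a file only if that exact line is not already present.
append_once() {
    line=$1
    file=$2
    touch "$file"
    # -x: match whole lines, -F: fixed string (no regex), -q: no output
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Add the Hive variables to the given rc file (default: ~/.bashrc).
add_hive_env() {
    rc=${1:-$HOME/.bashrc}
    # Single quotes keep $PATH and $HIVE_HOME unexpanded in the file.
    append_once 'export HIVE_HOME=/usr/lib/hive/hive-0.10.0' "$rc"
    append_once 'export PATH=$PATH:$HIVE_HOME/bin' "$rc"
}
```

Call add_hive_env with no argument to update ~/.bashrc, then open a new terminal or run source ~/.bashrc.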

How to install MySQL on Ubuntu

This MySQL installation was done on Ubuntu.


  • First, refresh the package lists and upgrade the installed packages to their latest versions.

sudo apt-get update

sudo apt-get dist-upgrade

  • Install the MySQL server and client packages:

sudo apt-get install mysql-server mysql-client

This command also installs the mysql-client package, which is needed to log in to MySQL from the server machine itself.

During the installation, MySQL will ask you to set a root password.


  • You can now access your MySQL server like this:

mysql -u root -p


  •  Have fun using MySQL Server 🙂

What are the MySQL server and the MySQL client?

The mysql-server package installs the MySQL database server, which you interact with through a MySQL client. You can use the client to send commands to any MySQL server, whether on a remote computer or on your own.

The MySQL server persists the data and provides a query interface (SQL) for it. The MySQL client's purpose is to let you use that query interface.
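
To make the local versus remote point concrete, here is a small sketch that builds the client arguments for either case. The helper name mysql_client_args and the host db.example.com are placeholders of mine; -h, -u and -p are the standard mysql client options for host, user and password prompt.

```shell
# Build the argument string for the mysql client.
# An empty host means "the server on this machine".
mysql_client_args() {
    host=$1
    user=$2
    if [ -n "$host" ]; then
        printf '%s\n' "-h $host -u $user -p"
    else
        printf '%s\n' "-u $user -p"
    fi
}

# Local server, same as the command shown earlier:
#   mysql $(mysql_client_args "" root)
# Remote server (db.example.com is a placeholder host name):
#   mysql $(mysql_client_args db.example.com root)
```

In both cases the same client program is used; only the server it talks to changes.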

How to install Java in Ubuntu

This JDK 7 installation was done on Ubuntu.


  •  Download the 32-bit or 64-bit Linux “compressed binary file”. It has a “.tar.gz” file extension, i.e. “[java-version]-i586.tar.gz” for 32-bit and “[java-version]-x64.tar.gz” for 64-bit.
  • Once the download is complete, extract the file using the following command (shown here for the 32-bit archive):

sudo tar -xvf jdk-7u25-linux-i586.tar.gz

The JDK 7 package is extracted into the jdk1.7.0_25 directory under the current directory.
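
If you are not sure which of the two archives matches your machine, the output of uname -m can be mapped to the file name suffix. A minimal sketch, with function names (jdk_arch_suffix, jdk_archive_name) that are mine and a machine-name list that only covers common x86 values:

```shell
# Map a machine name (as printed by `uname -m`) to the JDK archive suffix.
jdk_arch_suffix() {
    case $1 in
        x86_64|amd64)        echo x64 ;;
        i386|i486|i586|i686) echo i586 ;;
        *) echo "unsupported machine type: $1" >&2; return 1 ;;
    esac
}

# Compose the archive name for a given JDK update label, e.g. 7u25.
jdk_archive_name() {
    suffix=$(jdk_arch_suffix "$2") || return 1
    echo "jdk-$1-linux-$suffix.tar.gz"
}
```

For example, jdk_archive_name 7u25 "$(uname -m)" prints the archive name to look for on the current machine.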

  • Now move the JDK 7 directory to /usr/lib/jvm

sudo mkdir -p /usr/lib/jvm

sudo mv jdk1.7.0_25/ /usr/lib/jvm/jdk1.7.0_25

  • Now run

sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jdk1.7.0_25/bin/java" 1

sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/lib/jvm/jdk1.7.0_25/bin/javac" 1

sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/lib/jvm/jdk1.7.0_25/bin/javaws" 1
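
The three commands differ only in the tool name, so they can be generated in a loop. The sketch below just prints the commands so they can be reviewed first; alternatives_commands is a name I made up, and the default priority of 1 is taken from the commands above.

```shell
# Print one update-alternatives command per JDK tool.
# Nothing is executed here; review the output, then pipe it to `sudo sh`.
alternatives_commands() {
    jdk_dir=$1              # e.g. /usr/lib/jvm/jdk1.7.0_25
    priority=${2:-1}
    for tool in java javac javaws; do
        printf 'update-alternatives --install "/usr/bin/%s" "%s" "%s/bin/%s" %s\n' \
            "$tool" "$tool" "$jdk_dir" "$tool" "$priority"
    done
}
```

To actually register the tools, run alternatives_commands /usr/lib/jvm/jdk1.7.0_25 | sudo sh.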

  • Run

sudo update-alternatives --config java

You will see output similar to the one below. Choose the number for jdk1.7.0_25.

There are 2 choices for the alternative java (providing /usr/bin/java).

  Selection    Path                                             Priority   Status

  0            /usr/lib/jvm/java-7-openjdk-i386/jre/bin/java    1071       auto mode

  1            /usr/lib/jvm/java-7-openjdk-i386/jre/bin/java    1071       manual mode

* 2            /usr/lib/jvm/jdk1.7.0_25/bin/java                1          manual mode

Press enter to keep the current choice[*], or type selection number: 2


  • Repeat the previous step for the following commands:

sudo update-alternatives --config javac

sudo update-alternatives --config javaws

  • Check the version of your new JDK 7 installation:

java -version

java version "1.7.0_25"

Java(TM) SE Runtime Environment (build 1.7.0_25-b15)

Java HotSpot(TM) Server VM (build 23.25-b01, mixed mode)
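
The version check can also be done from a script. One detail to know is that java -version writes to stderr, not stdout. The sketch below pulls the quoted version string out of that output; parse_java_version and java_version_is are illustrative names of mine.

```shell
# Extract the quoted version string from `java -version` style output on stdin.
parse_java_version() {
    sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed only if the first java on PATH reports the expected version.
java_version_is() {
    # java -version prints to stderr, so redirect it into the pipe.
    [ "$(java -version 2>&1 | parse_java_version)" = "$1" ]
}
```

For example, java_version_is 1.7.0_25 && echo "JDK 7u25 is active" confirms that the alternatives selection took effect.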

  • Set JAVA_HOME in Ubuntu.

Open the .bashrc file:

gedit ~/.bashrc

Now add the following to the end of the file.

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_25

export PATH=$PATH:$JAVA_HOME/bin

NOTE: If /usr/lib/jvm/jdk1.7.0_25 does not match the actual Java location in your environment, set JAVA_HOME to the directory where Java is installed on your machine.
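
If you are unsure where Java actually lives, one way to find a candidate for JAVA_HOME is to follow the /usr/bin/java symlink chain and strip the trailing /bin/java. This is a sketch with made-up function names; it relies on readlink -f, which is available on Ubuntu.

```shell
# Derive a JAVA_HOME candidate from the full path of a java binary
# by stripping the trailing /bin/java.
java_home_from_bin() {
    case $1 in
        */bin/java) printf '%s\n' "${1%/bin/java}" ;;
        *) echo "not a java binary path: $1" >&2; return 1 ;;
    esac
}

# Resolve the alternatives symlink chain behind the java found on PATH.
detect_java_home() {
    bin=$(command -v java) || return 1
    java_home_from_bin "$(readlink -f "$bin")"
}
```

With the setup above, detect_java_home should print /usr/lib/jvm/jdk1.7.0_25. For a path such as the OpenJDK one in the earlier listing (ending in jre/bin/java) the result is the jre directory, so check the printed path before putting it in .bashrc.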

Now, JDK 7 has been successfully installed on your Ubuntu 🙂

Installing Eclipse in Ubuntu

Before installing the Eclipse IDE, you need to check a few things.

First, check whether Java is installed by running the following command in your terminal:

hduser@archana:~$ java -version

If the command prints a Java version, Java is installed; otherwise you need to install Java first.
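
The same check can be scripted, for example at the top of a setup script. A minimal sketch (have_cmd and require_java are names I chose) using the shell built-in command -v:

```shell
# Report whether a command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Stop early with a clear message when Java is missing.
require_java() {
    if have_cmd java; then
        echo "java found at $(command -v java)"
    else
        echo "java not found; install a JDK first" >&2
        return 1
    fi
}
```

Calling require_java || exit 1 before the Eclipse steps avoids installing an IDE that cannot start.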

  • First download the Eclipse tar.gz package (eclipse-jee-kepler-R-linux-gtk.tar.gz).
  • Use the command below to extract the tar.gz package.

hduser@archana:~$ tar xzf eclipse-jee-kepler-R-linux-gtk.tar.gz


You can also right-click eclipse-jee-kepler-R-linux-gtk.tar.gz and choose the “Extract Here” option.

  • Move the extracted eclipse folder to /opt/.

hduser@archana:~$ sudo mv eclipse /opt/

  • Create a desktop file and place it in /usr/share/applications (this directory is root-owned, hence sudo)

hduser@archana:~$ sudo gedit /usr/share/applications/eclipse.desktop

and copy the following into the eclipse.desktop file

[Desktop Entry]






Comment=Integrated Development Environment
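
For illustration, a desktop entry for this setup could be written from the shell as sketched below. Only the [Desktop Entry] header and the Comment line come from this post; the other keys are standard Desktop Entry keys that I filled in as assumptions. The Exec path follows from moving eclipse to /opt/, and the Icon path assumes the extracted folder contains icon.xpm. write_eclipse_desktop is a made-up helper name.

```shell
# Write a minimal desktop entry to the given path.
# Every key except Comment is an assumption, not the author's original file.
write_eclipse_desktop() {
    target=$1   # e.g. /usr/share/applications/eclipse.desktop
    cat > "$target" <<'EOF'
[Desktop Entry]
Type=Application
Name=Eclipse
Comment=Integrated Development Environment
Exec=/opt/eclipse/eclipse
Icon=/opt/eclipse/icon.xpm
Terminal=false
Categories=Development;IDE;
EOF
}
```

Writing to /usr/share/applications needs root, so either run the function from a root shell or write to a temporary file first and move it into place with sudo mv.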




  •  Create a symlink in /usr/local/bin using

hduser@archana:~$ cd /usr/local/bin

hduser@archana:/usr/local/bin$ sudo ln -s /opt/eclipse/eclipse

  •   Now go to /usr/share/applications and find the eclipse.desktop file for launching Eclipse. You can drag this file to the launcher.
  • Start Eclipse