Pig Installation on Ubuntu

Pig Execution Modes

Pig has two execution modes:

  • Local Mode – To run Pig in local mode, you need access to a single machine; all files are installed and run using your local host and file system. Specify local mode using the -x flag (pig -x local).
  • MapReduce Mode – To run Pig in MapReduce mode, you need access to a Hadoop cluster and HDFS installation. MapReduce mode is the default mode; you can, but don’t need to, specify it using the -x flag (pig OR pig -x mapreduce).

The pig-0.11.1 installation is done on the below versions of Linux and Hadoop, respectively:

UBUNTU 13.04

HADOOP 1.1.2

I have hduser as a dedicated Hadoop system user. I have installed Hadoop in the /home/hduser/hadoop folder. Now I am going to install Pig in the /usr/lib/pig folder.

  • Download Pig from here.
  • Go to the directory where the stable version was downloaded. By default it is saved in the “Downloads” directory.
$ cd Downloads/
  • Unzip the tar file.
$ tar -xvf pig-0.11.1.tar.gz
  • Create directory
$ sudo mkdir /usr/lib/pig
  • Move pig-0.11.1 into /usr/lib/pig
$ sudo mv pig-0.11.1 /usr/lib/pig/
  • Set the PIG_HOME path in bashrc file

To open bashrc file use this command

$ gedit ~/.bashrc

In the .bashrc file, append the below two statements:

export PIG_HOME=/usr/lib/pig/pig-0.11.1
export PATH=$PATH:$PIG_HOME/bin
  • Reload the file with [ . ~/.bashrc ] (or open a new terminal) so the changes take effect.

Now let’s test the installation

On the command prompt type

$ pig -h

It shows the help related to Pig, and its various commands.

  • Starting Pig in local mode (a short sample grunt session follows below)
$ pig -x local
grunt>
  • Starting Pig in MapReduce mode
$ pig -x mapreduce

                        or

 $ pig
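As a quick smoke test of local mode, here is a minimal grunt session; /etc/passwd is used only because it exists on every Ubuntu machine, and the alias names are arbitrary:

grunt> users = LOAD '/etc/passwd' USING PigStorage(':') AS (name:chararray, pw:chararray, uid:int); -- split each line on ':'
grunt> first5 = LIMIT users 5;   -- keep only the first 5 records
grunt> DUMP first5;              -- runs the job and prints the result
grunt> QUIT;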

Reference:

http://pig.apache.org/docs/r0.10.0/start.html

Note: The information provided here is to the best of my knowledge and experience; if any modifications are to be made, please help me with your valuable suggestions, which are always welcome…. :)

Install MongoDB on Ubuntu

MongoDB is an open-source document database and the leading NoSQL database, written in C++.

This example uses MongoDB 2.4.6 running on Ubuntu 13.04; both the MongoDB client and the server console are run on localhost, on the same machine.

  • Download MongoDB from here.
  • Go to the directory where MongoDB was downloaded. By default it is saved in the “Downloads” directory.
$ cd Downloads/
  • Unzip the tar file.
$ tar xzf mongodb-linux-i686-2.4.6.tgz
  • Move mongodb-linux-i686-2.4.6 to mongodb
$ sudo mkdir /usr/lib/mongodb

$ sudo mv mongodb-linux-i686-2.4.6 /usr/lib/mongodb/

  • Before you start mongod for the first time, you will need to create the data directory. By default, mongod writes data to the /data/db/ directory. To create this directory, and set the appropriate permissions use the following commands:
# mkdir -p /data/db

# chmod 777 /data/*
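If you would rather not make /data world-writable, you can instead keep the data under your own home directory and point mongod at it with the --dbpath option when you start the server in the next step (the directory name below is just an example):

$ mkdir -p ~/mongodb-data
$ ./mongod --dbpath ~/mongodb-data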
 

1st command prompt: Start mongodb server

$ cd /usr/lib/mongodb/mongodb-linux-i686-2.4.6/bin/

$ ./mongod


2nd command prompt: Start the client

$ cd /usr/lib/mongodb/mongodb-linux-i686-2.4.6/bin/

$ ./mongo
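Once the client connects, a quick sanity check can be done from the shell; the database and collection names below are just illustrative:

> use testdb
> db.people.insert({ name: "hduser", role: "admin" })
> db.people.find()
> exit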


Reference:

http://docs.mongodb.org/manual/tutorial/install-mongodb-on-os-x/

http://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/

Note: The information provided here is to the best of my knowledge and experience; if any modifications are to be made, please help me with your valuable suggestions, which are always welcome…. :)

Mount and Unmount USB drive in Ubuntu

Mount the Drive

Step 1: Go to /media

$ cd /media

Step 2: Create the Mount Point

Now we need to create a mount point for the device; let’s say we want to call it “usb-drive”. You can call it whatever you want. Create the mount point:

$ sudo mkdir usb-drive
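If you are not sure which device node the USB stick was assigned (we assume /dev/sdb1 in the next step), you can list the block devices first:

$ lsblk

            Or

$ sudo fdisk -l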

Step 3: Mount the Drive

We can now mount the drive. Let’s say the device is /dev/sdb1, the filesystem is FAT16 or FAT32, and we want to mount it at /media/usb-drive (having already created the mount point):

 $ sudo mount -t vfat /dev/sdb1 /media/usb-drive

Step 4: Check the USB drive contents

 $ ls /media/usb-drive/

Unmount the Drive

$ sudo umount /dev/sdb1

            Or

$ sudo umount /media/usb-drive

—————————-*———————*——————–*———————————–

NOTE
Error: umount: /media/usb-drive: device is busy.
(In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))
Solution: It means that some process has a working directory or an open file handle underneath the mount point. The best thing to do is to close the file before unmounting.
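To see exactly which process is holding the mount point (as the hint above suggests), you can run, for example:

$ fuser -vm /media/usb-drive

            Or

$ lsof /media/usb-drive

Close (or kill) the offending process and then retry the umount command.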

Note: The information provided here is to the best of my knowledge and experience; if any modifications are to be made, please help me with your valuable suggestions, which are always welcome…. 🙂

Sqoop Installation

The Sqoop installation is done on the below versions of Linux and Hadoop:

UBUNTU 13.04

HADOOP 1.1.2

I have hduser as a dedicated Hadoop system user. I have installed Hadoop in the /home/hduser/hadoop folder. Now I am going to install Sqoop in the /usr/lib/sqoop folder.

  •  Download sqoop from here
  •  Go to the directory where Sqoop was downloaded. By default it is saved in the “Downloads” directory.
            $ cd Downloads/
  • Unzip the tar file.
            $ tar xzf sqoop-1.4.0-incubating.tar.gz
  • Move sqoop-1.4.0-incubating to sqoop
 	    $ sudo mv sqoop-1.4.0-incubating /usr/lib/sqoop
  • Set the SQOOP_HOME path in bashrc file

To open bashrc file use this command

 	    $ gedit ~/.bashrc 

In the .bashrc file, append the below two statements:

export SQOOP_HOME=/usr/lib/sqoop
export PATH=$PATH:$SQOOP_HOME/bin           
  • Test your installation by typing
            $ sqoop help
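Once sqoop help works and Hadoop is running, a typical import from MySQL into HDFS looks like the command below. The database testdb and table employees are placeholders, and the MySQL JDBC driver jar must be copied into $SQOOP_HOME/lib first:

            $ sqoop import --connect jdbc:mysql://localhost/testdb \
                  --username root -P \
                  --table employees --target-dir /user/hduser/employees -m 1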

Reference:

http://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html

 

Note: The information provided here is to the best of my knowledge and experience; if any modifications are to be made, please help me with your valuable suggestions, which are always welcome…. :)

Install TeamViewer on Ubuntu 13.04


  • Download the teamviewer_linux.deb package from the TeamViewer website.
  • Open a Terminal in the directory where it was downloaded and execute the following command:
$ sudo dpkg -i teamviewer_linux.deb
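If dpkg reports missing dependencies, they can usually be pulled in afterwards with:

$ sudo apt-get install -f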


Installing Pseudo-Distributed HBase on Ubuntu

HBase run modes: Standalone and Distributed

Standalone mode: By default HBase runs in standalone mode. In standalone mode, HBase does not use HDFS. 

Distributed mode: Distributed mode can be subdivided into pseudo-distributed, where all daemons run on a single node, and fully-distributed, where the daemons are spread across all the nodes in the cluster.

Hadoop version support matrix

                          HBase-0.92.x   HBase-0.94.x   HBase-0.95
Hadoop-0.20.205           S              X              X
Hadoop-0.22.x             S              X              X
Hadoop-1.0.0-1.0.2[a]     S              S              X
Hadoop-1.0.3+             S              S              S
Hadoop-1.1.x              NT             S              S
Hadoop-0.23.x             X              S              NT
Hadoop-2.x                X              S              S

[a] HBase requires hadoop 1.0.3 at a minimum; there is an issue where we cannot find KerberosUtil compiling against earlier versions of Hadoop.

Where

S = supported and tested,
X = not supported,
NT = it should run, but not tested enough.

Pseudo-Distributed Installation

The hbase-0.94.8 installation is done on the below versions of Linux, Java and Hadoop, respectively:

UBUNTU 13.04

JAVA 1.7.0_25

HADOOP 1.1.2

I have hduser as a dedicated Hadoop system user. I have installed Hadoop in the /home/hduser/hadoop folder. Now I am going to install HBase in the /usr/lib/hbase folder.

  • Download hbase<version>.tar.gz stable version from here
  • Go to the directory where the stable version was downloaded. By default it is saved in the “Downloads” directory.
$ cd Downloads/
  • Unzip the tar file.
$ tar -xvf hbase-0.94.8.tar.gz
  • Create directory
$ sudo mkdir /usr/lib/hbase
  • Move hbase-0.94.8 into /usr/lib/hbase
$ sudo mv hbase-0.94.8 /usr/lib/hbase/hbase-0.94.8
  • Open your hbase/conf/hbase-env.sh and modify these lines
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_25

export HBASE_REGIONSERVERS=/usr/lib/hbase/hbase-0.94.8/conf/regionservers

export HBASE_MANAGES_ZK=true
  • Set the HBASE_HOME path in bashrc file

To open bashrc file use this command

$ gedit ~/.bashrc

In the .bashrc file, append the below two statements:

export HBASE_HOME=/usr/lib/hbase/hbase-0.94.8

export PATH=$PATH:$HBASE_HOME/bin

 

  • Update hbase-site.xml in HBASE_HOME/conf folder with required properties.

    hbase-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/hduser/hbase/zookeeper</value>
  </property>
</configuration>
  • Now check the Hadoop version support matrix above. If your Hadoop version is not supported by your HBase version, you will get exceptions at startup. To fix this, simply copy hadoop-core-*.jar from your HADOOP_HOME and commons-collections-*.jar from the HADOOP_HOME/lib folder into your HBASE_HOME/lib folder.
  • Extra steps

In /etc/hosts there are two entries: 127.0.0.1 and 127.0.1.1. Change the second entry, 127.0.1.1, to 127.0.0.1; otherwise HBase gives the error: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing

  • To start HBase [ first start Hadoop ]
hduser@archana:~$ start-hbase.sh

localhost: starting zookeeper, logging to /usr/lib/hbase/hbase-0.94.8/bin/../logs/hbase-hduser-zookeeper-archana.out
 starting master, logging to /usr/lib/hbase/hbase-0.94.8/logs/hbase-hduser-master-archana.out
 localhost: starting regionserver, logging to /usr/lib/hbase/hbase-0.94.8/bin/../logs/hbase-hduser-regionserver-archana.out

The jps command lists all currently running Java processes:

hduser@archana:~$ jps

 4334 HQuorumPeer
 2882 SecondaryNameNode
 4867 Jps
 3207 TaskTracker
 2460 NameNode
 4671 HRegionServer
 4411 HMaster
 2977 JobTracker
 2668 DataNode

Hbase Shell

hduser@archana:~$ hbase shell

HBase Shell; enter 'help<RETURN>' for list of supported commands.
 Type "exit<RETURN>" to leave the HBase Shell
 Version 0.94.8, r1485407, Wed May 22 20:53:13 UTC 2013

hbase(main):001:0> create 't1','c1'
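A few more shell commands to confirm the table works; the row key, column and value below are arbitrary:

hbase(main):002:0> put 't1', 'row1', 'c1:greeting', 'hello'
hbase(main):003:0> scan 't1'
hbase(main):004:0> disable 't1'
hbase(main):005:0> drop 't1'
hbase(main):006:0> exit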
  • To stop HBase
HBASE_PATH$ bin/stop-hbase.sh

stopping hbase...............

To use the web interfaces

http://localhost:60010 for master
http://localhost:60030 for region server

  • Reference :

http://hbase.apache.org/book/standalone_dist.html

http://hbase.apache.org/book/standalone_dist.html#confirm

Note: The information provided here is to the best of my knowledge and experience; if any modifications are to be made, please help me with your valuable suggestions, which are always welcome…. :)

Installing Apache HBase on Ubuntu for Standalone Mode

Standalone HBase

By default HBase runs in standalone mode. In standalone mode, HBase does not use HDFS — it uses the local file system instead — and it runs all HBase daemons and a local ZooKeeper in the same JVM. ZooKeeper binds to a well-known port so clients may talk to HBase. HBase requires Java 6 or a newer version; if this is not the case, HBase will not start.

The hbase-0.94.8 installation is done on the below versions of Linux, Java and Hadoop, respectively:

UBUNTU 13.04

JAVA 1.7.0_25

HADOOP 1.1.2

I have hduser as a dedicated Hadoop system user. I have installed Hadoop in the /home/hduser/hadoop folder. Now I am going to install HBase in the /usr/lib/hbase folder.

  • Download hbase-0.94.8.tar.gz from here
  • Go to the directory where the stable version was downloaded. By default it is saved in the “Downloads” directory.
$ cd Downloads/
  • Unzip the tar file.
$ tar -xvf hbase-0.94.8.tar.gz
  • Create directory
$ sudo mkdir /usr/lib/hbase
  • Move hbase-0.94.8 into /usr/lib/hbase
$ sudo mv hbase-0.94.8 /usr/lib/hbase/hbase-0.94.8
  • Configuring HBase with java

Open your hbase/conf/hbase-env.sh and set the path to the java installed in your system

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_25
  • Set the HBASE_HOME path in bashrc file

To open bashrc file use this command

hduser@system_name:~$ gedit ~/.bashrc

In the .bashrc file, append the below two statements:

export HBASE_HOME=/usr/lib/hbase/hbase-0.94.8

export PATH=$PATH:$HBASE_HOME/bin
  •  At this point, you are ready to start HBase. But before starting it, you might want to edit conf/hbase-site.xml and set the directory you want HBase to write to, hbase.rootdir.
  •  By default, hbase.rootdir is set to /tmp/hbase-${user.name} which means you’ll lose all your data whenever your server reboots
  •  So set hbase.rootdir (and the ZooKeeper dataDir) in hbase-site.xml to paths under a directory where you want HBase to store its data, for example:
  •  hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/hduser/HBASE/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/hduser/HBASE/zookeeper</value>
  </property>
</configuration>
  • Extra steps

In /etc/hosts there are two entries: 127.0.0.1 and 127.0.1.1. Change the second entry, 127.0.1.1, to 127.0.0.1; otherwise HBase gives the error: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing

  • To start HBase [ in standalone mode there is no need to start Hadoop ]
HBASE_PATH$ bin/start-hbase.sh

HBASE_PATH$ bin/hbase shell
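A quick way to confirm the standalone instance is healthy from the shell (the exact output will vary):

hbase(main):001:0> status
hbase(main):002:0> list
hbase(main):003:0> exit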
  • To stop HBase
HBASE_PATH$ bin/stop-hbase.sh

stopping hbase...............
  • To use the web interfaces

http://localhost:60010 for master
http://localhost:60030 for region server

  • Reference :

http://archive.cloudera.com/cdh/3/hbase-0.90.1-cdh3u0/quickstart.html

http://archive.cloudera.com/cdh/3/hbase-0.90.1-cdh3u0/notsoquick.html

Note: The information provided here is to the best of my knowledge and experience; if any modifications are to be made, please help me with your valuable suggestions, which are always welcome…. :)

Install Kettle 4.4.0 on Ubuntu 13.04


Kettle is Pentaho’s ETL tool, which is also called Pentaho Data Integration (PDI).

  • The Kettle 4.4.0 installation is done on the below versions of Linux, Java and Hadoop, respectively:

UBUNTU 13.04

JAVA 1.7.0_25

HADOOP 1.1.2

  • Download kettle stable version from here
  • Go to the directory where the stable version was downloaded. By default it is saved in the “Downloads” directory.

     cd ~/Downloads

  • Unzip the tar file.

    tar -xzf pdi-ce-4.4.0-stable.tar.gz

  • Move data-integration to /bin/pdi-ce-4.4.0

    sudo mv data-integration/ /bin/pdi-ce-4.4.0

  • Create a symlink

    cd  /bin

    sudo ln -s pdi-ce-4.4.0 data-integration

  • To run Spoon:

    cd  /bin/data-integration

    ./spoon.sh

Apache Hadoop 1.1.2 is not compatible with the Apache Hadoop 0.20.x line, and thus PDI does not work with 1.1.2 out of the box. Follow the steps below to make it compatible 🙂

  • Create a folder “hadoop-112” in the hadoop-configurations directory [data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations].
  • Copy the contents of the “hadoop-20” folder into the “hadoop-112” folder (a command-line sketch of these steps follows after this list).
  • Replace the following JARs in the client/ subfolder [data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations/hadoop-112/lib/client] with the versions from the Apache Hadoop 1.1.2 distribution:
  1.  commons-codec-<version>.jar
  2.  hadoop-core-<version>.jar    
  • Add the following JAR from the Hadoop 1.1.2  distribution to the client/ subfolder as well:

       commons-configuration-<version>.jar

  • Change the property in plugins.properties [data-integration/plugins/pentaho-big-data-plugin/] to point to the new folder:

active.hadoop.configuration=hadoop-112

  • Start PDI

    ./spoon.sh
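For reference, here is a command-line sketch of the JAR swap described above, assuming PDI was unpacked to /bin/data-integration as above and Hadoop 1.1.2 is installed in /home/hduser/hadoop (adjust the paths to your own setup):

    cd /bin/data-integration/plugins/pentaho-big-data-plugin/hadoop-configurations
    sudo cp -r hadoop-20 hadoop-112
    cd hadoop-112/lib/client
    sudo rm hadoop-core-*.jar commons-codec-*.jar
    sudo cp /home/hduser/hadoop/hadoop-core-1.1.2.jar .
    sudo cp /home/hduser/hadoop/lib/commons-codec-*.jar .
    sudo cp /home/hduser/hadoop/lib/commons-configuration-*.jar .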

    Reference:

http://funpdi.blogspot.in/2013/03/pentaho-data-integration-44-and-hadoop.html

Note: The information provided here is to the best of my knowledge and experience; if any modifications are to be made, please help me with your valuable suggestions, which are always welcome….

Hive Installation On Ubuntu

The hive-0.10.0 installation is done on the below versions of Linux, Java and Hadoop, respectively:

UBUNTU 13.04

JAVA 1.7.0_25

HADOOP 1.1.2

I have hduser as a dedicated Hadoop system user. I have installed Hadoop in the /home/hduser/hadoop folder. Now I am going to install Hive in the /usr/lib/hive folder.

  • Download hive stable version from this link

http://mirror.tcpdiag.net/apache/hive/stable/

  • Go to the directory where the stable version was downloaded. By default it is saved in the “Downloads” directory.
$ cd ~/Downloads
  • Unzip the tar file.

[go to root user by using command: su ]

# tar xzf hive-0.10.0.tar.gz
  • Create directory
# mkdir /usr/lib/hive
  • move  hive-0.10.0 to hive
 # mv hive-0.10.0 /usr/lib/hive/hive-0.10.0

[Exit from root to hduser by using command: su hduser or exit ]

  • Set the HIVE_HOME path in bashrc file

To open bashrc file use this command

hduser@system_name:~$ gedit ~/.bashrc

In the .bashrc file, append the below two statements:

export HIVE_HOME=/usr/lib/hive/hive-0.10.0

export PATH=$PATH:$HIVE_HOME/bin
  •  Type hive on the command line and you should see the Hive shell.
$ hive

hive>
  • Now you can play with Hive 🙂
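As a first exercise, here is a minimal Hive session. Hadoop should be running, since Hive stores its tables in HDFS; the table name demo and its schema are purely illustrative:

hive> CREATE TABLE demo (id INT, name STRING)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> SHOW TABLES;
hive> DESCRIBE demo;
hive> DROP TABLE demo;
hive> quit;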

How to install MySQL on Ubuntu

The MySQL installation is done on the below version of Ubuntu:

UBUNTU 13.04

  • First of all, make sure your package management tools are up-to-date. Also make sure you install all the latest software available.

            sudo apt-get update

             sudo apt-get dist-upgrade

  • Install the MySQL server and client packages:

sudo apt-get install mysql-server mysql-client

The apt-get command will also install the mysql-client package, which is necessary to log in to MySQL from the server itself.

During the installation, MySQL will ask you to set a root password.


  • You can now access your MySQL server like this:

mysql -u root -p

 mysql>

  •  Have fun using MySQL Server 🙂
  • What are the mysql server and mysql client?

The mysql-server package installs the MySQL database server, which you can interact with using a MySQL client. You can use the mysql client to send commands to any MySQL server, on a remote computer or your own.

The MySQL server is used to persist the data and provide a query interface for it (SQL). The MySQL client's purpose is to allow you to use that query interface.
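For example, from the mysql client you can create a small database and query it; the names below are just illustrative:

mysql> CREATE DATABASE testdb;
mysql> USE testdb;
mysql> CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(50));
mysql> INSERT INTO users VALUES (1, 'hduser');
mysql> SELECT * FROM users;
mysql> DROP DATABASE testdb;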
