“Missing artifact jdk.tools:jdk.tools:jar:1.6”

Amal G Jose

While using Maven, you may encounter an error like
“Missing artifact jdk.tools:jdk.tools:jar:1.6”

This problem can be fixed by adding a jdk.tools dependency entry to your pom.xml file.
In that entry, replace ${JAVA_HOME} with the absolute path of JAVA_HOME.
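
The pom.xml lines from the original post are not reproduced in this excerpt. A commonly used form of this fix looks roughly like the sketch below. It assumes that tools.jar sits in the lib directory of your JDK, so treat the systemPath value as an assumption and adjust it to your installation.

```xml
<!-- Sketch of the usual fix; not necessarily the exact lines from the original post. -->
<dependency>
  <groupId>jdk.tools</groupId>
  <artifactId>jdk.tools</artifactId>
  <version>1.6</version>
  <scope>system</scope>
  <!-- Assumption: tools.jar lives under the JDK's lib directory -->
  <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
</dependency>
```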




Hortonworks Hadoop 2.0 Developer Certification

My Learning Notes on Big Data!!!

Hortonworks Certification Tips and Guidelines

I successfully completed this certification on Oct 24, 2014 with a passing score of 88%. I am sharing the experience I gained from this certification, along with all the required materials I went through while preparing for it. Please get some sandbox-level hands-on experience with these topics before you appear for the examination.

Read all the answers to a question very carefully before you select the correct one, because they are a little tricky. Also, 90 minutes is more than enough for this exam; you can easily complete it in 45 to 60 minutes. All the best!

Certification 1 – Hortonworks Certified Apache Hadoop Developer (Pig and Hive)

Exam Format

  • Registration Cost – $200/attempt (unlimited attempts are allowed)
  • The exam consists of approximately 50 multiple-choice questions. The exam is delivered in English.
  • You have to answer 38 questions (75%) correctly to get certified


How to write MapReduce program in Java with example

MapReduce program other than WordCount

Code Hadoop : Experience exemplified

Understanding the fundamentals of MapReduce
MapReduce is a framework designed for writing programs that process large volumes of structured and unstructured data in parallel across a cluster, in a reliable and fault-tolerant manner. The MapReduce concept is simple to understand for those who are familiar with distributed processing frameworks.

MapReduce is all about key-value pairs. I will try to explain key/value pairs by covering some similar concepts in the Java standard library. The java.util.Map interface is used for key-value pairs in Java.
For any Java Map object, its contents are a set of mappings from a given key of a specified type to a related value of a potentially different type.
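
As a small illustration of the java.util.Map idea described above, here is a minimal sketch in plain Java. It does not use Hadoop at all; the class name KeyValueSketch and the sample sentence are made up for this example. The key is a word (a String) and the value is its count (an Integer), which shows a key of one type mapped to a value of a different type.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical example class; plain Java, no Hadoop involved.
class KeyValueSketch {
    // Builds a map from each word (key, a String) to how often it occurs (value, an Integer).
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // TreeMap iterates keys in sorted order
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum); // insert 1, or add 1 to the existing value
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("big data big cluster");
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Each Map.Entry printed by main is one key-value pair, which is the same shape of data that a MapReduce job passes between its stages.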

In the context of Hadoop, we are referring to keys that are associated with values. This data in MapReduce is stored in such a way that the values can be sorted and rearranged (the shuffle and sort phase, in MapReduce terms) across a…


Excel InputFormat for Hadoop MapReduce


Code Hadoop : Experience exemplified

Excel Spreadsheet Input Format for Hadoop MapReduce
I wanted to read a Microsoft Excel spreadsheet using MapReduce, and found that I could not use Hadoop's text input format to fulfill my requirement. Hadoop does not understand Excel spreadsheets, so I ended up writing a custom input format to achieve this.
Hadoop works with different types of data formats, from flat text files to databases. An InputSplit is nothing more than a chunk of several blocks; it should be pretty rare for a block boundary to end up at the exact location of an end of line (EOL). Each such split is processed by a single map. Some of my records located around block boundaries will therefore be split across 2 different blocks.

  1. How can Hadoop guarantee that all lines from input files are completely read?
  2. How can Hadoop consolidate a line that starts on block B and that…


Hadoop: Implementing the Tool interface for MapReduce driver



Most people create their MapReduce job using driver code that is executed through its static main method. The downside of such an implementation is that most of your specific configuration (if any) is usually hardcoded. Should you need to modify some of your configuration properties on the fly (such as changing the number of reducers), you would have to modify your code, rebuild your jar file and redeploy your application. This can be avoided by implementing the Tool interface in your MapReduce driver code.

Hadoop Configuration

By implementing the Tool interface and extending the Configured class, you can easily set your Hadoop Configuration object via the GenericOptionsParser, and thus through the command line interface. This makes your code more portable (and slightly cleaner), as you no longer need to hardcode any specific configuration.

Let’s take a couple of examples with and without the use of the Tool interface.
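
The examples from the original post are not included in this excerpt. To give a feel for the mechanism, here is a simplified, self-contained stand-in for the idea behind GenericOptionsParser. It is not Hadoop code: the class GenericOptionsSketch and its fields are hypothetical, and it only shows how -D key=value options can be separated from the job's own arguments. The property mapred.reduce.tasks is used as a sample name for the number of reducers in Hadoop 1.x.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, NOT Hadoop's GenericOptionsParser: it only mimics the
// idea of pulling -D key=value pairs out of the command line into a configuration.
class GenericOptionsSketch {
    final Map<String, String> conf = new LinkedHashMap<>(); // parsed -D properties
    final List<String> remainingArgs = new ArrayList<>();   // arguments left for the job itself

    GenericOptionsSketch(String[] args) {
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            String pair = null;
            if (arg.equals("-D") && i + 1 < args.length) {
                pair = args[++i];        // form: -D key=value
            } else if (arg.startsWith("-D") && arg.length() > 2) {
                pair = arg.substring(2); // form: -Dkey=value
            }
            if (pair == null) {
                remainingArgs.add(arg);  // not a -D option: keep it for the job
            } else if (pair.contains("=")) {
                int eq = pair.indexOf('=');
                conf.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
            // A -D option without '=' is silently ignored in this sketch.
        }
    }

    public static void main(String[] args) {
        String[] demo = {"-D", "mapred.reduce.tasks=4", "input", "output"};
        GenericOptionsSketch parsed = new GenericOptionsSketch(demo);
        System.out.println("conf = " + parsed.conf);
        System.out.println("args = " + parsed.remainingArgs);
    }
}
```

In a real driver you would not write this parsing yourself. A class that extends Configured and implements Tool is typically launched through ToolRunner.run, which applies the generic options to the Configuration and hands the remaining arguments to the driver's run method.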


Installing Maven on Windows 7


Tools Used:

  1. JDK 1.6
  2. Maven 3.1.1
  3. Windows 7

Step 1: Install Java if not already installed

 Install Java and add/update the JAVA_HOME variable


Step 2: Download Maven

Choose a version and download the apache-maven-*-bin.zip file from the Apache Maven download page.

Step 3: Extract It

Extract the downloaded zip file into the desired location.

Step 4: Add MAVEN_HOME

Now, set the MAVEN_HOME variable to the folder where you extracted Maven, just as you did for the JAVA_HOME variable.


Step 5: Update Path variable

Append the path of Maven's bin folder (%MAVEN_HOME%\bin) to the Path variable value, so that you can run Maven commands from any directory.


Step 6: Verify installation

Open a command prompt, type mvn --version and press Enter.


If the command prints the Maven version details, your Apache Maven installation was successful.



SafeModeException: Name node is in safe mode


org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /user/...../test. Name node is in safe mode.

To force the namenode to leave safe mode, execute the following command:

$ bin/hadoop dfsadmin -safemode leave

-safemode isn’t a sub-command of hadoop fs; it belongs to hadoop dfsadmin.

Then run the following command to check HDFS for inconsistencies so that they can be sorted out. Note that fsck expects a path; / checks the whole file system.

$ hadoop fsck /




Namenode not starting

I was using Hadoop in pseudo-distributed mode and everything was working fine. But after I restarted my computer, I couldn’t start the Namenode. The only way I could start the Namenode was by formatting it, and I ended up losing the data in HDFS.

  • Make the following changes to start the Namenode

In conf/hdfs-site.xml, you should have a property like
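
The property block from the original post is missing at this point. Based on the property name discussed in this post, it would look something like the sketch below. The directory is a hypothetical placeholder, so substitute any persistent location outside /tmp.

```xml
<!-- Sketch only: the value is a hypothetical placeholder, not the path from the original post. -->
<property>
  <name>dfs.name.dir</name>
  <value>/path/to/persistent/dfs/name</value>
</property>
```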





The property “dfs.name.dir” allows you to control where Hadoop writes NameNode metadata. Giving it a directory other than /tmp makes sure the NameNode data isn’t deleted when you reboot.

Format the Namenode after you change it, then start Hadoop again

$ bin/hadoop namenode -format

$ bin/start-all.sh



Pig Installation on Ubuntu

Execution Modes

Pig has two execution modes :

  • Local Mode – To run Pig in local mode, you need access to a single machine; all files are installed and run using your local host and file system. Specify local mode using the -x flag (pig -x local).
  • MapReduce Mode – To run Pig in MapReduce mode, you need access to a Hadoop cluster and HDFS installation. MapReduce mode is the default mode; you can, but don’t need to, specify it using the -x flag (pig OR pig -x mapreduce).

This pig-0.11.1 installation was done on the following versions of Linux and Hadoop.


HADOOP 1.1.2

I have hduser as a dedicated Hadoop system user. I installed Hadoop in the /home/hduser/hadoop folder. Now I am going to install Pig in the /usr/lib/pig folder.

  • Download Pig from here.
  • Go to the directory where the stable version was downloaded. By default it is saved in the “Downloads” directory.
$ cd Downloads/
  • Extract the tar file.
$ tar -xvf pig-0.11.1.tar.gz
  • Create the directory
$ sudo mkdir /usr/lib/pig
  • Move pig-0.11.1 into pig (sudo is needed because the directory was created as root)
$ sudo mv pig-0.11.1 /usr/lib/pig/
  • Set the PIG_HOME path in the bashrc file

To open the bashrc file, use this command

$ gedit ~/.bashrc

 In the bashrc file, append the below 2 statements

export PIG_HOME=/usr/lib/pig/pig-0.11.1
export PATH=$PATH:$PIG_HOME/bin
  • Restart your computer or reload the file with [ . ~/.bashrc]

Now let’s test the installation

At the command prompt, type

$ pig -h

It shows the help related to Pig, and its various commands.

  • Starting Pig in local mode
 $ pig -x local
 grunt>
  • Starting Pig in MapReduce mode
 $ pig -x mapreduce

 or

 $ pig



Note: The information provided here is to the best of my knowledge and experience. If any modifications need to be made, please help me with your valuable suggestions, which are always welcome. :)

How to add RevolverMaps Widget to your Blog /Website


This widget displays all visitor locations as well as recent hits with city, state and country information live and in real time. A click on the enlarge button opens the live statistics page.

Go to the site http://www.revolvermaps.com/ and click on “Get Standard Version”.

Customise the look of your globe to suit your tastes by clicking on the round buttons to change the Globe, Dimensions, Colors and Advanced Settings.


Copy the code from step number 5 [Copy The Code Your Site…]

  • How to add a widget to WordPress.com?

Log in to your WordPress account

  1. Go to ‘My Blog’ – ‘Dashboard’ – ‘Appearance’ – ‘Widgets’
  2. Drag the Element ‘Text – Arbitrary text or HTML’ to the sidebar
  3. Copy the code from the RevolverMaps setup page to the big textbox, optionally add a title
  4. Click on save, you’re done.
  • How to add a widget to a blogger.com (blogspot.com) layout?

Log in to your Blogger account

  1. Choose your blog on the dashboard, click on ‘Layout’. You get an overview of the page elements on your blog.
  2. Click on one of the ‘Add a Gadget’ links, a pop-up opens
  3. Under ‘Basics’ click on ‘HTML/JavaScript’
  4. Paste the code you get at revolvermaps.com into ‘Content’, optionally add a title
  5. Click on ‘SAVE’
  6. Drag the new page element representing the widget to a position of your choice
  7. Click on ‘PREVIEW’, check if the widget fits into your layout. You may have to experiment a little in order to find appropriate size settings for the widget.
  8. Click on ‘SAVE’, you’re done
  • How to add a widget to a website?

Copy the code from the RevolverMaps setup page into your web page's HTML code.


Happy blogging 🙂
