“Missing artifact jdk.tools:jdk.tools:jar:1.6”

Amal G Jose

While using Maven, we may face an error like:
“Missing artifact jdk.tools:jdk.tools:jar:1.6”

This problem can be fixed by adding the lines below to your pom.xml file.
Replace ${JAVA_HOME} in the XML with the absolute path of JAVA_HOME.
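The lines in question are not shown in this excerpt; a typical version of this fix (hedged, since the exact snippet in the original post may differ) declares jdk.tools as a system-scoped dependency pointing at tools.jar:

<dependency>
    <!-- Workaround for the missing jdk.tools artifact; replace ${JAVA_HOME}
         with an absolute path if the variable does not resolve on your system -->
    <groupId>jdk.tools</groupId>
    <artifactId>jdk.tools</artifactId>
    <version>1.6</version>
    <scope>system</scope>
    <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
</dependency>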


View original post


Hortonworks Hadoop 2.0 Developer Certification

My Learning Notes on Big Data!!!

Hortonworks Certification Tips and Guidelines

I successfully completed this certification on Oct 24, 2014 with a passing score of 88%. I am sharing the experience I gained from this certification, and I have listed all the materials I went through for it. Please get some sandbox-level hands-on experience with these topics before you appear for the examination.

Read all the answer choices for a question very carefully before you select one, because they are a little tricky. Also, 90 minutes is more than enough for this exam, as you can easily complete it in 45 to 60 minutes. All the best!

Certification 1 – Hortonworks Certified Apache Hadoop Developer (Pig and Hive)

Exam Format

  • Registration Cost – $200/attempt (Unlimited attempts are allowed)
  • The exam consists of approximately 50 multiple-choice questions. The exam is delivered in English.
  • You have to answer 38 questions (75%) correctly to get certified

View original post 487 more words

How to write MapReduce program in Java with example

MapReduce program other than WordCount

Code Hadoop : Experience exemplified

Understanding fundamental of MapReduce
MapReduce is a framework designed for writing programs that process large volumes of structured and unstructured data in parallel across a cluster, in a reliable and fault-tolerant manner. The MapReduce concept is simple to understand for anyone who is familiar with distributed processing frameworks.

MapReduce is all about key-value pairs. I will try to explain key/value pairs by covering a similar concept in the Java standard library: the java.util.Map interface, which is used for key-value mappings in Java.
For any Java Map object, its contents are a set of mappings from a given key of a specified type to a related value of a potentially different type.
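As a quick illustration (a sketch of my own, not taken from the original post), the same key/value idea in plain Java looks like this:

import java.util.HashMap;
import java.util.Map;

public class KeyValueExample {
    public static void main(String[] args) {
        // Keys of one type (String) map to values of a possibly different type (Integer),
        // much like the intermediate key/value pairs a MapReduce job emits and groups.
        Map<String, Integer> wordCounts = new HashMap<String, Integer>();
        wordCounts.put("hadoop", 2);
        wordCounts.put("mapreduce", 5);
        System.out.println(wordCounts.get("hadoop")); // prints 2
    }
}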

In the context of Hadoop, we are referring to keys that are associated with values. Data in MapReduce is stored in such a way that the values can be sorted and rearranged (the shuffle and sort phase of MapReduce) across a…

View original post 1,314 more words

Excel InputFormat for Hadoop MapReduce


Code Hadoop : Experience exemplified

Excel Spreadsheet Input Format for Hadoop Map Reduce
I wanted to read a Microsoft Excel spreadsheet using MapReduce and found that I could not use Hadoop's Text input format to fulfill my requirement. Hadoop does not understand Excel spreadsheets, so I ended up writing a custom InputFormat to achieve this.
Hadoop works with many types of data formats, from flat text files to databases. An InputSplit is nothing more than a chunk of several blocks, and it is pretty rare for a block boundary to fall at the exact location of an end of line (EOL). Each such split is processed by a single map task, so some of my records located around block boundaries will therefore be split across two different blocks.

  1. How can Hadoop guarantee that all lines from the input files are completely read?
  2. How can Hadoop consolidate a line that starts in block B and that…

View original post 456 more words
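Since the excerpt ends before showing any code, here is a minimal sketch of my own (not the original post's code) of what such a custom input format can look like. It assumes the older .xls format with Apache POI on the classpath, treats the whole file as one unsplittable input, and emits one comma-joined Text value per spreadsheet row; all class and method names are illustrative.

import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;

public class ExcelInputFormat extends FileInputFormat<LongWritable, Text> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        // A binary .xls file cannot be cut at arbitrary block boundaries.
        return false;
    }

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(InputSplit split, TaskAttemptContext context) {
        return new ExcelRecordReader();
    }

    public static class ExcelRecordReader extends RecordReader<LongWritable, Text> {
        private InputStream in;
        private java.util.Iterator<Row> rows;
        private final LongWritable key = new LongWritable();
        private final Text value = new Text();
        private long rowNumber = 0;

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context) throws IOException {
            Path path = ((FileSplit) split).getPath();
            FileSystem fs = path.getFileSystem(context.getConfiguration());
            in = fs.open(path);
            // Parse the first sheet with Apache POI; the whole file is a single split.
            rows = new HSSFWorkbook(in).getSheetAt(0).iterator();
        }

        @Override
        public boolean nextKeyValue() {
            if (rows == null || !rows.hasNext()) {
                return false;
            }
            // Key = row number, value = the row's cells joined with commas.
            StringBuilder line = new StringBuilder();
            for (Cell cell : rows.next()) {
                if (line.length() > 0) {
                    line.append(',');
                }
                line.append(cell.toString());
            }
            key.set(rowNumber++);
            value.set(line.toString());
            return true;
        }

        @Override
        public LongWritable getCurrentKey() { return key; }

        @Override
        public Text getCurrentValue() { return value; }

        @Override
        public float getProgress() { return (rows != null && rows.hasNext()) ? 0.0f : 1.0f; }

        @Override
        public void close() throws IOException {
            if (in != null) {
                in.close();
            }
        }
    }
}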

Hadoop: Implementing the Tool interface for MapReduce driver



Most people create their MapReduce job using driver code that is executed through its static main method. The downside of such an implementation is that most of your specific configuration (if any) is usually hardcoded. Should you need to modify some of your configuration properties on the fly (such as changing the number of reducers), you would have to modify your code, rebuild your jar file and redeploy your application. This can be avoided by implementing the Tool interface in your MapReduce driver code.

Hadoop Configuration

By implementing the Tool interface and extending the Configured class, you can easily set your Hadoop Configuration object via the GenericOptionsParser, and thus through the command-line interface. This makes your code definitely more portable (and slightly cleaner), as you do not need to hardcode any specific configuration anymore.

Let’s take a couple of examples, with and without the use of the Tool interface.

View original post 357 more words
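The excerpted examples themselves are not included above, so here is a minimal sketch of the "with Tool" variant; the class name, job name and use of args[0]/args[1] as input and output paths are illustrative, not the original post's code.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Illustrative driver: ToolRunner runs GenericOptionsParser for you, so any
// -D key=value pairs from the command line end up in getConf().
public class MyJobDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "my-job");
        job.setJarByClass(MyJobDriver.class);
        // set mapper, reducer and output key/value classes here as usual
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new MyJobDriver(), args));
    }
}

With such a driver, a property like the number of reducers can be changed at submit time without touching the code, for example (jar name and paths are placeholders):

$ hadoop jar myjob.jar MyJobDriver -D mapreduce.job.reduces=4 /input /output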

Installing Maven on Windows 7


Tools Used:

  1. JDK 1.6
  2. Maven 3.1.1
  3. Windows 7

Step 1: Install Java if not already installed

Install Java and add/update the JAVA_HOME variable


Step 2: Download Maven

Choose a version and download the apache-maven-*-bin.zip file from the Apache Maven download page.

Step 3: Extract It

Extract the downloaded zip file to the desired location.

Step 4: Add MAVEN_HOME

Now, set the MAVEN_HOME variable just as you did for the JAVA_HOME variable.


Step 5: Update Path variable

Append the path up to the bin folder to the Path variable's value, so that you can run Maven commands from anywhere.
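For example, assuming Maven was extracted to C:\apache-maven-3.1.1 (a made-up location; use your own), the two variables would look roughly like this:

MAVEN_HOME = C:\apache-maven-3.1.1
Path = <existing entries>;%MAVEN_HOME%\bin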


Step 6: Verify installation

Open a command prompt, type mvn -version, and hit Enter.


If you see output similar to the above, your Apache Maven installation was successful.



SafeModeException: Name node is in safe mode


org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /user/...../test. Name node is in safe mode.

In order to force the namenode to leave safe mode, the following command should be executed:

$ bin/hadoop dfsadmin -safemode leave

Note that -safemode is not a sub-command of hadoop fs; it is a sub-command of hadoop dfsadmin.
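You can also check the current safe mode state before or after leaving it:

$ bin/hadoop dfsadmin -safemode get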

Then run the following command to check HDFS and report any inconsistencies:

$ hadoop fsck /