Excel InputFormat for Hadoop MapReduce


Code Hadoop : Experience exemplified

Excel Spreadsheet Input Format for Hadoop Map Reduce
I wanted to read a Microsoft Excel spreadsheet using MapReduce and found that Hadoop's TextInputFormat could not meet my requirement. Hadoop does not understand Excel spreadsheets out of the box, so I ended up writing a custom InputFormat to read them.
Hadoop works with many kinds of data, from flat text files to databases. An InputSplit is a logical chunk of the input, typically one block of the file, though it can span several, and each split is processed by a single map task. It is rare for a block boundary to fall exactly at an end of line (EOL), so some of my records located around block boundaries end up split across two different blocks.
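To make the boundary problem concrete, here is a small sketch in plain Java. It does not use the Hadoop API; the class and method names (`SplitBoundaryDemo`, `computeSplits`, `splitsStartingMidLine`) and the tiny 8-byte split size are assumptions chosen for illustration. It cuts a short text into fixed-size byte ranges and counts how many of them start in the middle of a line.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: these names are hypothetical and not part of the Hadoop API.
class SplitBoundaryDemo {

    // Cut a file of the given length into consecutive [start, end) byte ranges.
    // splitSize is assumed to be greater than zero.
    static List<int[]> computeSplits(int fileLength, int splitSize) {
        List<int[]> splits = new ArrayList<>();
        for (int start = 0; start < fileLength; start += splitSize) {
            splits.add(new int[] {start, Math.min(start + splitSize, fileLength)});
        }
        return splits;
    }

    // Count the splits whose first byte is not the first byte of a line.
    static int splitsStartingMidLine(byte[] data, int splitSize) {
        int count = 0;
        for (int[] split : computeSplits(data.length, splitSize)) {
            int start = split[0];
            if (start > 0 && data[start - 1] != '\n') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\ndelta\n".getBytes(StandardCharsets.US_ASCII);
        for (int[] split : computeSplits(data.length, 8)) {
            String raw = new String(data, split[0], split[1] - split[0], StandardCharsets.US_ASCII);
            System.out.println("[" + split[0] + ", " + split[1] + ") -> " + raw.replace("\n", "\\n"));
        }
        System.out.println("splits starting mid-line: " + splitsStartingMidLine(data, 8));
    }
}
```

With real block sizes the effect is the same, only at a much larger scale: the byte ranges are fixed by size, not by record boundaries.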

  1. How can Hadoop guarantee that all lines from the input files are read completely?
  2. How can Hadoop consolidate a line that starts on block B and that…
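One way to picture an answer for plain text input is the convention that, as far as I understand it, Hadoop's own line reader follows: a reader whose split does not start at byte 0 discards its first (possibly partial) line, and every reader keeps reading past the end of its split until the current line is complete. The sketch below simulates that rule in plain Java on an in-memory byte array. It is not Hadoop code, and `LineSplitReaderDemo`, `readSplit`, `readAll` and `nextLineStart` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a simulation of the "skip first line, read past the end" rule.
class LineSplitReaderDemo {

    // Index of the first byte after the next '\n' at or after 'from' (or data.length).
    static int nextLineStart(byte[] data, int from) {
        int i = from;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        return Math.min(i + 1, data.length);
    }

    // Lines owned by the split [start, end): every line that begins in (start, end],
    // plus the line at byte 0 for the first split.
    static List<String> readSplit(byte[] data, int start, int end) {
        List<String> lines = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            // The previous split's reader finishes this line, so skip it here.
            pos = nextLineStart(data, pos);
        }
        while (pos <= end && pos < data.length) {
            int next = nextLineStart(data, pos); // may run past 'end' to finish the line
            int lineEnd = (data[next - 1] == '\n') ? next - 1 : next;
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.US_ASCII));
            pos = next;
        }
        return lines;
    }

    // Read every split of the given size in turn and concatenate the results.
    static List<String> readAll(byte[] data, int splitSize) {
        List<String> all = new ArrayList<>();
        for (int start = 0; start < data.length; start += splitSize) {
            all.addAll(readSplit(data, start, Math.min(start + splitSize, data.length)));
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbravo\ncharlie\ndelta\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readAll(data, 8));
    }
}
```

Under this rule each line is read exactly once, whatever the split size. An Excel workbook is a binary file, not a sequence of lines, so this convention does not carry over directly; that is part of why a custom InputFormat is needed.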
