Friday, December 18, 2009

Hadoop Solution for the Hoppity problem

The latest in the Code Snippet series.

A highly contrived solution to the Hoppity problem, using Hadoop.

Why in the world did I do this? After all, Hadoop is a batch framework for working with large amounts of data in parallel, hardly the right tool to solve a trivial small-data problem on a single laptop.

I did it primarily to exercise my Hadoop skills, though even here it's a light workout. But I think Hadoop is well worth learning and practicing, so here goes!

The Setup
- You have to have Hadoop installed. For that task, I used Michael Noll's excellent Hadoop-on-Ubuntu blog entry.
- If the job fails to run after you follow Michael's instructions and the cause is a host naming problem, check that your hosts file has an entry for 'localhost'.

Run scripts
You'll probably want to use utility scripts for the following tasks:

# Compile
javac -cp /usr/local/hadoop/hadoop-0.20.1/hadoop-0.20.1-core.jar:/usr/local/hadoop/hadoop-0.20.1/lib/commons-cli-1.2.jar Hoppity.java
jar -cfv hoppity.jar *.class

# Make a directory
bin/hadoop dfs -mkdir /user/hadoop/HoppityInput

# Copy the file containing Hoppity input into the directory

bin/hadoop dfs -copyFromLocal numHops.txt /user/hadoop/HoppityInput

# Run
bin/hadoop jar hoppity.jar Hoppity /user/hadoop/HoppityInput /user/hadoop/HoppityOutput

# View your output

bin/hadoop dfs -ls /user/hadoop/HoppityOutput
bin/hadoop dfs -cat /user/hadoop/HoppityOutput/part-r-00000

# Delete the output directory before each of the inevitable re-runs as you learn (the job fails if it already exists)
bin/hadoop dfs -rmr /user/hadoop/HoppityOutput

Finally, the code


import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class Hoppity {

  // Mapper: reads one integer N per input line and emits the hops for 1..N
  // as a single '~'-delimited string under a fixed key.
  public static class HoppityMapper
      extends Mapper<Object, Text, Text, Text> {

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      int theValue = Integer.parseInt(value.toString().trim());
      StringBuilder result = new StringBuilder();
      for (int idx = 1; idx < (theValue + 1); idx++) {
        // Hoppity rules: divisible by 3 and 5 -> "Hop",
        // by 3 only -> "Hoppity", by 5 only -> "Hophop".
        // The '~' delimiter is what the reducer splits on.
        if (isModN(idx, 3) && isModN(idx, 5)) {
          result.append("Hop~");
        } else if (isModN(idx, 3)) {
          result.append("Hoppity~");
        } else if (isModN(idx, 5)) {
          result.append("Hophop~");
        }
      }
      Text resultKey = new Text("ResultKey");
      Text resultValue = new Text(result.toString());
      context.write(resultKey, resultValue);
    }

    // True when num is evenly divisible by mod.
    private boolean isModN(int num, int mod) {
      return (num % mod) == 0;
    }
  }

  // Reducer: splits each '~'-delimited string and writes one hop per line.
  public static class HoppityReducer
      extends Reducer<Text, Text, Text, Text> {

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      for (Text val : values) {
        String[] hops = val.toString().split("~");
        for (String hop : hops) {
          Text blankText = new Text();
          context.write(blankText, new Text(hop));
        }
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: Hoppity <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "hoppity");
    // Tell Hadoop which jar and which mapper/reducer classes to run.
    job.setJarByClass(Hoppity.class);
    job.setMapperClass(HoppityMapper.class);
    job.setReducerClass(HoppityReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
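
If you want to check what the job should print without starting Hadoop, the map and reduce steps can be mimicked in plain Java. This is only a rough sketch: the class HoppityLocal and its method names are made up for illustration, and it assumes the usual Hoppity rules ("Hop" for multiples of both 3 and 5, "Hoppity" for multiples of 3, "Hophop" for multiples of 5) together with the '~' delimiter that the reducer splits on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for local checking; not part of the Hadoop job.
class HoppityLocal {

    // Mirrors the mapper: build one '~'-delimited string of hops for 1..n.
    // The hop strings are assumed from the standard Hoppity rules.
    static String mapHops(int n) {
        StringBuilder result = new StringBuilder();
        for (int idx = 1; idx <= n; idx++) {
            if (idx % 3 == 0 && idx % 5 == 0) {
                result.append("Hop~");
            } else if (idx % 3 == 0) {
                result.append("Hoppity~");
            } else if (idx % 5 == 0) {
                result.append("Hophop~");
            }
        }
        return result.toString();
    }

    // Mirrors the reducer: split the delimited string into one hop per entry.
    static List<String> reduceHops(String delimited) {
        return new ArrayList<>(Arrays.asList(delimited.split("~")));
    }

    public static void main(String[] args) {
        // Defaults to 15 when no argument is given.
        int n = args.length > 0 ? Integer.parseInt(args[0].trim()) : 15;
        for (String hop : reduceHops(mapHops(n))) {
            System.out.println(hop);
        }
    }
}
```

Running it with 15 as the argument should list seven hops, ending in "Hop". One quirk it shares with the reducer: for an input below 3 the delimited string is empty, and splitting an empty string still yields a single empty entry, so one blank line is printed.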

Happy Coding!


Anonymous said...

Some parts of the code are not visible.

