Friday, December 18, 2009

Hadoop Solution for the Hoppity problem

The latest in the Code Snippet series.

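A highly contrived solution to the Hoppity problem, using Hadoop. As the code below implements it: given an integer N, for each integer from 1 to N print "Hoppity" if it is divisible by 3, "Hophop" if it is divisible by 5, and "Hop" if it is divisible by both. All other integers print nothing.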

Why in the world did I do this? After all, Hadoop is a batch framework for working with large amounts of data in parallel, hardly the right tool to solve a trivial small-data problem on a single laptop.

I did it primarily to exercise my Hadoop skills, though even here it's a light workout. But I think Hadoop is well worth learning and practicing, so here goes!

The Setup
- You have to have Hadoop installed. For that task, I used Michael Noll's excellent Hadoop-on-Ubuntu blog entry.
- If the job won't run after you follow Michael's instructions and the cause is a host naming problem, make sure your hosts file maps 127.0.0.1 to 'localhost'.
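One quick way to sanity-check the host naming point above is to ask the JVM how it resolves the name. This is only a minimal sketch: the class name `LocalhostCheck` and its helper are made up for illustration, and it uses nothing beyond the standard `java.net.InetAddress` class.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper (not part of Hadoop): reports how a host name resolves.
class LocalhostCheck {

    // True when the name resolves to a loopback address such as 127.0.0.1;
    // false when it resolves elsewhere or cannot be resolved at all.
    static boolean resolvesToLoopback(String host) {
        try {
            return InetAddress.getByName(host).isLoopbackAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to "localhost"; pass another name to check it instead.
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(host + " resolves to loopback: " + resolvesToLoopback(host));
    }
}
```

If this prints false for localhost, the hosts file is the first place to look.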

Run scripts
You'll probably want to use utility scripts for the following tasks:

# Compile
javac -cp /usr/local/hadoop/hadoop-0.20.1/hadoop-0.20.1-core.jar:/usr/local/hadoop/hadoop-0.20.1/lib/commons-cli-1.2.jar Hoppity.java
jar -cfv hoppity.jar *.class

# Make a directory
bin/hadoop dfs -mkdir /user/hadoop/HoppityInput

# Copy the file containing Hoppity input into the directory
bin/hadoop dfs -copyFromLocal numHops.txt /user/hadoop/HoppityInput

# Run
bin/hadoop jar hoppity.jar Hoppity /user/hadoop/HoppityInput /user/hadoop/HoppityOutput

# View your output
bin/hadoop dfs -ls /user/hadoop/HoppityOutput
bin/hadoop dfs -cat /user/hadoop/HoppityOutput/part-r-00000

# Destroy the output directory before the inevitable re-runs as you learn
# (the job fails if the output directory already exists)
bin/hadoop dfs -rmr /user/hadoop/HoppityOutput
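The scripts above assume a numHops.txt file already exists. Since the mapper parses each input line as a single integer, the file just needs one number per line. Here is a minimal sketch for generating it. The class name `MakeHoppityInput` is hypothetical, the value 15 is an arbitrary sample, and the code assumes Java 7 or later for `java.nio.file`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: writes the job's input file, one integer per line.
class MakeHoppityInput {

    // Turn each count into its own text line, matching what the mapper parses.
    static List<String> toLines(int... counts) {
        List<String> lines = new ArrayList<String>();
        for (int count : counts) {
            lines.add(Integer.toString(count));
        }
        return lines;
    }

    // Write the lines to the target file, replacing any existing content.
    static Path write(Path target, int... counts) throws IOException {
        return Files.write(target, toLines(counts), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // 15 is an arbitrary sample value for N.
        Path written = write(Paths.get("numHops.txt"), 15);
        System.out.println("Wrote " + written.toAbsolutePath());
    }
}
```

A single line is the safest input: with several lines, every mapper output shares the same key, and Hadoop does not guarantee the order in which the reducer sees those values.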

Finally, the code

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class Hoppity {

  public static class HoppityMapper
       extends Mapper<Object, Text, Text, Text> {

    // Each input line holds one integer N. Emit a single record whose value
    // is the '~'-delimited list of words for the integers 1..N.
    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      int theValue = Integer.parseInt(value.toString().trim());
      StringBuilder result = new StringBuilder();
      for (int idx = 1; idx < (theValue + 1); idx++) {
        if (isModN(idx, 3) && isModN(idx, 5)) {
          result.append("Hop~");
        } else if (isModN(idx, 3)) {
          result.append("Hoppity~");
        } else if (isModN(idx, 5)) {
          result.append("Hophop~");
        }
      }
      // A constant key sends every mapper output to the same reduce call.
      Text resultKey = new Text("ResultKey");
      Text resultValue = new Text(result.toString());
      context.write(resultKey, resultValue);
    }

    // True when num is evenly divisible by mod.
    private boolean isModN(int num, int mod) {
      return (num % mod) == 0;
    }
  }

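  public static class HoppityReducer
       extends Reducer<Text, Text, Text, Text> {

    // Split each '~'-delimited value back into words and write one word
    // per output record, under a blank key.
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      Text blankText = new Text();
      for (Text val : values) {
        for (String hop : val.toString().split("~")) {
          // "".split("~") yields one empty string, so skip it when N < 3.
          if (hop.isEmpty()) {
            continue;
          }
          context.write(blankText, new Text(hop));
        }
      }
    }
  }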

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Strip Hadoop's generic options; what remains is <in> <out>.
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: Hoppity <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "hoppity");
    job.setJarByClass(Hoppity.class);
    job.setMapperClass(HoppityMapper.class);
    // The combiner is optional: Hadoop may run it zero or more times.
    // Here it only pre-splits the values and blanks the key.
    job.setCombinerClass(HoppityReducer.class);
    job.setReducerClass(HoppityReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    // Exit 0 on success, 1 on failure.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
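
To see what the two stages do without starting a cluster, the logic can be sketched in plain Java. This is not the Hadoop job itself: `HoppityLocal`, `mapLine` and `reduceValue` are hypothetical names, the map step builds the same '~'-delimited string as HoppityMapper, and the reduce step splits it again as HoppityReducer does. It also skips empty pieces so that inputs below 3 produce no output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the job: same logic, no Hadoop dependency.
class HoppityLocal {

    // Map side: one '~'-terminated word for every qualifying integer in 1..n.
    static String mapLine(String line) {
        int n = Integer.parseInt(line.trim());
        StringBuilder result = new StringBuilder();
        for (int idx = 1; idx <= n; idx++) {
            if (idx % 3 == 0 && idx % 5 == 0) {
                result.append("Hop~");
            } else if (idx % 3 == 0) {
                result.append("Hoppity~");
            } else if (idx % 5 == 0) {
                result.append("Hophop~");
            }
        }
        return result.toString();
    }

    // Reduce side: split the delimited value back into individual words.
    static List<String> reduceValue(String value) {
        List<String> words = new ArrayList<String>();
        for (String hop : value.split("~")) {
            // Splitting an empty string yields one empty piece; drop it.
            if (!hop.isEmpty()) {
                words.add(hop);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // 15 is an arbitrary sample input line.
        for (String word : reduceValue(mapLine("15"))) {
            System.out.println(word);
        }
    }
}
```

For the sample input 15, this prints Hoppity, Hophop, Hoppity, Hoppity, Hophop, Hoppity, Hop, one per line.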


Happy Coding!
