Run Hadoop MapReduce jobs over Avro data, with map and reduce functions written in Java.

Avro data files do not contain key/value pairs as expected by Hadoop's MapReduce API, but rather just a sequence of values. Thus we provide here a layer on top of Hadoop's MapReduce API.

In all cases, input and output paths are set and jobs are submitted exactly as with standard Hadoop jobs.
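As a sketch of that shared boilerplate, using only standard Hadoop calls (the driver class name and command-line arguments are illustrative):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// Hypothetical driver: paths come from the command line, as in any
// standard Hadoop job; Avro-specific settings are applied elsewhere.
public class SubmitExample {
  public static void main(String[] args) throws Exception {
    JobConf job = new JobConf(SubmitExample.class);
    // Input and output paths are configured with the ordinary Hadoop calls.
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    JobClient.runJob(job); // submit and wait for completion
  }
}
```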

For jobs whose input and output are Avro data files, subclass {@link org.apache.avro.mapred.AvroMapper} and {@link org.apache.avro.mapred.AvroReducer}, and configure the job with {@link org.apache.avro.mapred.AvroJob}, including its input and output schemas.
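A minimal sketch of such a job, assuming a word-count over an Avro file of strings (the class names and the word-count logic are illustrative, not part of this package):

```java
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical word count: Avro strings in, Pair<string,long> records out.
public class AvroWordCount {

  public static class CountMapper extends AvroMapper<Utf8, Pair<Utf8, Long>> {
    @Override
    public void map(Utf8 line, AvroCollector<Pair<Utf8, Long>> collector,
                    Reporter reporter) throws IOException {
      for (String word : line.toString().split("\\s+"))
        collector.collect(new Pair<Utf8, Long>(new Utf8(word), 1L));
    }
  }

  public static class SumReducer extends AvroReducer<Utf8, Long, Pair<Utf8, Long>> {
    @Override
    public void reduce(Utf8 word, Iterable<Long> counts,
                       AvroCollector<Pair<Utf8, Long>> collector,
                       Reporter reporter) throws IOException {
      long sum = 0;
      for (long c : counts) sum += c;
      collector.collect(new Pair<Utf8, Long>(word, sum));
    }
  }

  public static void configure(JobConf job) {
    // Both sides of the job are Avro, so everything goes through AvroJob.
    AvroJob.setInputSchema(job, Schema.create(Schema.Type.STRING));
    AvroJob.setOutputSchema(job,
        Pair.getPairSchema(Schema.create(Schema.Type.STRING),
                           Schema.create(Schema.Type.LONG)));
    AvroJob.setMapperClass(job, CountMapper.class);
    AvroJob.setReducerClass(job, SumReducer.class);
  }
}
```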

For jobs whose input is an Avro data file and which use an {@link org.apache.avro.mapred.AvroMapper}, but whose reducer is a non-Avro {@link org.apache.hadoop.mapred.Reducer} and whose output is a non-Avro format, the mapper's output type must be a {@link org.apache.avro.mapred.Pair}; the reducer then receives the pair's key as an {@link org.apache.avro.mapred.AvroKey} and its value as an {@link org.apache.avro.mapred.AvroValue}.
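One hypothetical configuration of the reduce side (an {@code AvroMapper} emitting {@code Pair<Utf8, Long>} is assumed to be set separately via {@code AvroJob.setMapperClass}; the class names and summing logic are illustrative):

```java
import java.io.IOException;
import java.util.Iterator;
import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroValue;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;

// Hypothetical: Avro map output, plain-Hadoop reduce side and text output.
public class AvroInTextOut {

  public static class SumReducer extends MapReduceBase
      implements Reducer<AvroKey<Utf8>, AvroValue<Long>, Text, LongWritable> {
    @Override
    public void reduce(AvroKey<Utf8> key, Iterator<AvroValue<Long>> values,
                       OutputCollector<Text, LongWritable> out,
                       Reporter reporter) throws IOException {
      long sum = 0;
      while (values.hasNext()) sum += values.next().datum();
      out.collect(new Text(key.datum().toString()), new LongWritable(sum));
    }
  }

  public static void configure(JobConf job) {
    AvroJob.setInputSchema(job, Schema.create(Schema.Type.STRING));
    // The map-output schema must be a Pair schema so keys and values
    // can be shuffled separately.
    AvroJob.setMapOutputSchema(job,
        Pair.getPairSchema(Schema.create(Schema.Type.STRING),
                           Schema.create(Schema.Type.LONG)));
    // The reduce side is configured with the plain Hadoop API.
    job.setReducerClass(SumReducer.class);
    job.setOutputFormat(TextOutputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
  }
}
```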

For jobs whose input is a non-Avro data file and which use a non-Avro {@link org.apache.hadoop.mapred.Mapper}, but whose reducer is an {@link org.apache.avro.mapred.AvroReducer} and whose output is an Avro data file, the mapper must emit {@link org.apache.avro.mapred.AvroKey} keys and {@link org.apache.avro.mapred.AvroValue} values, and the map-output schema must be a {@link org.apache.avro.mapred.Pair} schema set with {@link org.apache.avro.mapred.AvroJob#setMapOutputSchema}.
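A sketch of such a job, assuming text input tokenized into words (again, class names and the counting logic are illustrative):

```java
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.AvroValue;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

// Hypothetical: text input, plain-Hadoop map side, Avro reduce side and output.
public class TextInAvroOut {

  public static class TokenMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, AvroKey<Utf8>, AvroValue<Long>> {
    @Override
    public void map(LongWritable offset, Text line,
                    OutputCollector<AvroKey<Utf8>, AvroValue<Long>> out,
                    Reporter reporter) throws IOException {
      for (String word : line.toString().split("\\s+"))
        out.collect(new AvroKey<Utf8>(new Utf8(word)), new AvroValue<Long>(1L));
    }
  }

  public static class SumReducer extends AvroReducer<Utf8, Long, Pair<Utf8, Long>> {
    @Override
    public void reduce(Utf8 word, Iterable<Long> counts,
                       AvroCollector<Pair<Utf8, Long>> collector,
                       Reporter reporter) throws IOException {
      long sum = 0;
      for (long c : counts) sum += c;
      collector.collect(new Pair<Utf8, Long>(word, sum));
    }
  }

  public static void configure(JobConf job) {
    // The map side is configured with the plain Hadoop API.
    job.setMapperClass(TokenMapper.class);
    job.setInputFormat(TextInputFormat.class);
    // The map-output and final output schemas go through AvroJob.
    AvroJob.setMapOutputSchema(job,
        Pair.getPairSchema(Schema.create(Schema.Type.STRING),
                           Schema.create(Schema.Type.LONG)));
    AvroJob.setReducerClass(job, SumReducer.class);
    AvroJob.setOutputSchema(job,
        Pair.getPairSchema(Schema.create(Schema.Type.STRING),
                           Schema.create(Schema.Type.LONG)));
  }
}
```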

For jobs whose input is a non-Avro data file and which use a non-Avro {@link org.apache.hadoop.mapred.Mapper} and no reducer, i.e., a map-only job, the mapper must emit {@link org.apache.avro.mapred.AvroWrapper} keys and {@link org.apache.hadoop.io.NullWritable} values, the number of reduce tasks must be set to zero, and the output schema must be set with {@link org.apache.avro.mapred.AvroJob#setOutputSchema}.
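A minimal sketch of such a map-only job, assuming each text line is written out as one Avro string datum (the class names are illustrative):

```java
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroWrapper;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;

// Hypothetical map-only job: text lines in, an Avro data file of strings out.
public class TextToAvroMapOnly {

  public static class LineMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, AvroWrapper<Utf8>, NullWritable> {
    @Override
    public void map(LongWritable offset, Text line,
                    OutputCollector<AvroWrapper<Utf8>, NullWritable> out,
                    Reporter reporter) throws IOException {
      // Only the wrapped datum is written; the value side is NullWritable.
      out.collect(new AvroWrapper<Utf8>(new Utf8(line.toString())),
                  NullWritable.get());
    }
  }

  public static void configure(JobConf job) {
    job.setMapperClass(LineMapper.class);
    job.setInputFormat(TextInputFormat.class);
    job.setNumReduceTasks(0); // map-only
    AvroJob.setOutputSchema(job, Schema.create(Schema.Type.STRING));
  }
}
```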