MapReduceFoundation: Typing of MapReduce

Summary

MapReduce is a framework for processing large distributed data sets. It is based on the combinators map and reduce of functional programming. The map combinator applies a unary function to each item of a collection and returns the collection of results. The reduce combinator applies an associative binary function successively to the items of a collection and returns the single combined result.
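As a minimal sketch of the two combinators, here is how they look with Python's built-in map and functools.reduce. The input list and the names lengths and total are illustrative assumptions, not part of the project.

```python
from functools import reduce
from operator import add

# Hypothetical input collection, chosen only for illustration.
words = ["map", "reduce", "job"]

# map: apply a unary function (len) to every item; one result per item.
lengths = list(map(len, words))

# reduce: fold the results into one value with an associative binary
# function (add), starting from the neutral element 0.
total = reduce(add, lengths, 0)
```

Because add is associative, the items could be combined in any grouping, which is what allows a distributed framework to reduce partial results on different machines.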

One frequent use case is to apply map and reduce in succession, first preparing a data set via map, and then extracting some information via reduce. For instance, the following MapReduce job counts the occurrences of each word in some given input data (on the left).

[Figure: MapReduce example]
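The word-count job can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the project's implementation: the input lines and the function names map_phase and reduce_phase are hypothetical, and everything runs in a single process instead of on a cluster.

```python
from collections import Counter
from functools import reduce

def map_phase(line: str) -> Counter:
    """Map one input line to a partial word count."""
    return Counter(line.lower().split())

def reduce_phase(left: Counter, right: Counter) -> Counter:
    """Merge two partial counts; Counter addition is associative."""
    return left + right

# Hypothetical input data, one string per line of text.
lines = ["the quick fox", "the lazy dog", "the fox"]

# First prepare the data via map, then extract the totals via reduce.
counts = reduce(reduce_phase, map(map_phase, lines), Counter())
```

Here counts["the"] is 3 and counts["fox"] is 2. The empty Counter serves as the neutral element, so the job also works on empty input.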

Both map and reduce exist in different guises and for different aggregation patterns. Companies with large data centers, such as Google, Yahoo, and Amazon, have made the combination of map and reduce practical for processing very large data sets, typically spread across a cluster of computers. Their MapReduce frameworks are implemented in imperative languages and, in the interest of efficiency, are quite complex. The price is a partial loss of structure, which increases the danger of programming errors going unrecognized, in particular type errors.
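To illustrate the kind of structure that typing can preserve, the following sketch gives a MapReduce driver a generic signature in which the mapper's output types must match the reducer's input types. It is an assumption-laden illustration, not the project's formulation: the function map_reduce, the key/value grouping step, and the helpers emit_words and sum_counts are all hypothetical, and Python checks these annotations only through an external static checker such as mypy, not at run time.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

A = TypeVar("A")  # input item
K = TypeVar("K")  # intermediate key
V = TypeVar("V")  # intermediate value
R = TypeVar("R")  # result per key

def map_reduce(
    mapper: Callable[[A], Iterable[Tuple[K, V]]],
    reducer: Callable[[K, List[V]], R],
    items: Iterable[A],
) -> Dict[K, R]:
    """Hypothetical driver: K and V tie the mapper's output to the reducer's input."""
    groups: Dict[K, List[V]] = defaultdict(list)
    for item in items:
        for key, value in mapper(item):
            groups[key].append(value)  # group intermediate values by key
    return {key: reducer(key, values) for key, values in groups.items()}

def emit_words(line: str) -> List[Tuple[str, int]]:
    # Emits (word, 1) pairs, so K = str and V = int.
    return [(word, 1) for word in line.split()]

def sum_counts(word: str, ones: List[int]) -> int:
    # Accepts K = str and List[V] = List[int], matching emit_words.
    return sum(ones)

result = map_reduce(emit_words, sum_counts, ["a b a", "b c"])
```

A reducer declared to take List[str] instead of List[int] would be flagged by a static checker before the job runs, whereas an untyped framework would surface the mismatch only during execution, if at all.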

The goal of the MapReduceFoundation project is to make MapReduce type-safe in an imperative setting and more efficient in a functional setting (where type safety is a given).

Funding

The MapReduceFoundation project is funded by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG).

Publications

2011


Contact

MapReduceFoundation is a research project at the University of Passau. The project members are: