MapReduceFoundation: Typing of MapReduce

Summary

MapReduce is a framework for processing large distributed data sets. It is based on the combinators map and reduce from functional programming. map applies a unary function to each item of a collection and returns the collection of results. reduce successively applies an associative binary function to the items of a collection and returns the single combined result.
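As a minimal illustration of the two combinators, the following Python sketch uses the built-in map and functools.reduce. The sample data and the squaring function are assumptions chosen for the example; they do not come from the project.

```python
from functools import reduce
from operator import add

# Hypothetical sample data, chosen only for illustration.
items = [1, 2, 3, 4]

# map: apply a unary function to every item and collect the results.
squares = list(map(lambda x: x * x, items))

# reduce: combine the items successively with an associative binary function.
# The third argument is the starting value, here the neutral element of add.
total = reduce(add, squares, 0)

print(squares)  # [1, 4, 9, 16]
print(total)    # 30
```

Because addition is associative, the order in which partial sums are grouped does not affect the result, which is what makes reduce suitable for distributed evaluation.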

One frequent use case is to apply map and reduce in succession: map first prepares the data set, and reduce then extracts some information from it. For instance, the following MapReduce job counts the occurrences of each word in the given input data (on the left).

MapReduce example
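The word-count job described above can be sketched in Python roughly as follows. This is a single-machine approximation, not the project's code: the function names (map_phase, shuffle, reduce_phase, word_count) and the input documents are hypothetical, and the grouping step stands in for what a real framework would do across a cluster.

```python
from collections import defaultdict
from functools import reduce
from operator import add


def map_phase(document):
    """Emit a (word, 1) pair for every word in the document."""
    return [(word, 1) for word in document.split()]


def shuffle(pairs):
    """Group the emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the counts of each word with an associative operator."""
    return {word: reduce(add, counts, 0) for word, counts in groups.items()}


def word_count(documents):
    """Run the map, shuffle, and reduce phases in sequence."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))


# Hypothetical input data, not the data shown in the original figure.
counts = word_count(["to be or not to be", "to map and to reduce"])
print(counts["to"])  # 4
```

The map phase needs no knowledge of other documents, and the reduce phase needs no knowledge of other words, so both can in principle run in parallel.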

Both map and reduce exist in different guises and for different aggregation patterns. Companies with large data centers, such as Google, Yahoo, and Amazon, have made the combination of map and reduce practical for processing very large data sets, usually spread across a cluster of computers. Their MapReduce frameworks are implemented in imperative languages and, in the interest of efficiency, are quite complex. The price is a partial loss of structure, which increases the danger of programming errors going unrecognized, in particular type errors.

The goal of the MapReduceFoundation project is to make MapReduce type-safe in an imperative setting and to provide a functional model in which optimizations can be formulated as program transformations based on the theory of list homomorphisms.
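To give an idea of why list homomorphisms matter here: a function h on lists is a list homomorphism if h(xs ++ ys) = h(xs) ⊙ h(ys) for some associative operator ⊙, and any such function can be written as a map followed by a reduce. The Python sketch below illustrates this property under stated assumptions; the helper names hom and hom_chunked and the sample data are hypothetical, and the code is not the project's formal model.

```python
from functools import reduce


def hom(f, op, unit, xs):
    """A list homomorphism written as reduce(op) after map(f)."""
    return reduce(op, map(f, xs), unit)


def hom_chunked(f, op, unit, xs, chunk_size):
    """Compute the same result per chunk, then combine the partial results.

    This is only valid if op is associative and unit is its neutral element.
    """
    chunks = [xs[i:i + chunk_size] for i in range(0, len(xs), chunk_size)]
    partials = [hom(f, op, unit, chunk) for chunk in chunks]
    return reduce(op, partials, unit)


def plus(a, b):
    """Associative operator with neutral element 0."""
    return a + b


# Hypothetical sample data: total number of characters in a list of words.
words = ["map", "reduce", "list", "homomorphism"]

whole = hom(len, plus, 0, words)
split = hom_chunked(len, plus, 0, words, chunk_size=2)

print(whole, split)  # 25 25
```

Replacing the whole-list computation by the chunked one is an example of the kind of program transformation that the homomorphism property justifies: each chunk can be processed on a different machine.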

The project has ended. Its results can be found in the technical report cited below.

Funding

The MapReduceFoundation project was funded by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG).

Publications (copyright notice)

2015


2013


2011


Copyright Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these publications may not be reposted without the explicit permission of the copyright holder.

Contact

MapReduceFoundation is a research project at the University of Passau. The project members are: