
MapReduce: Simplified Data Processing on Large Clusters

Over the past five years, the authors and many others at Google have implemented hundreds of special-purpose computations that process large amounts of raw data, such as crawled documents, web request logs, etc., to compute various kinds of derived data, such as inverted indices, various representations of the graph structure of web documents, summaries of the number of pages crawled per host, the set of most frequent queries in a given day, etc. Most such computations are conceptually straightforward. However, the input data is usually large, and the computations have to be distributed across hundreds or thousands of machines in order to finish in a reasonable amount of time. The issues of how to parallelize the computation, distribute the data, and handle failures conspire to obscure the original simple computation with large amounts of complex code to deal with these issues.
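To make concrete what "conceptually straightforward" means here, the canonical example from the paper is word counting: a map function emits a `(word, 1)` pair for every word, and a reduce function sums the counts for each word. The sketch below is a single-machine illustration of that programming model only; the function names and the in-memory shuffle are this sketch's own choices, not Google's distributed implementation, which handles the partitioning, scheduling, and fault tolerance that the paragraph above describes.

```python
from collections import defaultdict

def map_fn(doc_id, text):
    # Map phase: emit an intermediate (word, 1) pair per word.
    for word in text.split():
        yield (word, 1)

def reduce_fn(word, counts):
    # Reduce phase: combine all counts emitted for one word.
    return sum(counts)

def run_job(documents):
    # Shuffle step: group intermediate values by key.
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for word, count in map_fn(doc_id, text):
            grouped[word].append(count)
    # Apply the reduce function to each key's group of values.
    return {word: reduce_fn(word, counts)
            for word, counts in grouped.items()}
```

In the real system the same two user-written functions run unchanged, while the runtime splits the input across machines, re-executes failed tasks, and merges the reduced output.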