Paper Reading: MapReduce

Implementation

Execution Overview
  1. The MapReduce library in the user program splits the input files into M pieces (typically 16 MB to 64 MB each), then starts up many copies of the program on a cluster of machines.

  2. One copy of the program is the master; the rest are workers. The master assigns the M map tasks and R reduce tasks to idle workers.

  3. A map worker reads its input split, parses key/value pairs out of it, and passes each pair to the user's Map function. The intermediate key/value pairs that Map produces are buffered in memory.

  4. Periodically, the buffered pairs are written to the map worker's local disk, partitioned into R regions by the partitioning function. The locations of these buffered pairs are passed back to the master, which forwards them to the reduce workers.

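  5. A reduce worker uses RPC to read the buffered data from the map workers' local disks. Once it has read all of its intermediate data, it sorts the data by intermediate key so that all occurrences of the same key are grouped together. (Sorting is needed because many different keys can map to the same reduce task.)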

  6. The reduce worker iterates over the sorted data and, for each unique key, passes the key and the corresponding set of values to the user's Reduce function. The output is appended to a final output file for this reduce partition.

  7. When all map and reduce tasks have completed, the master wakes up the user program, and the MapReduce call in the user program returns to user code.
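
To make the data flow in steps 3 to 6 concrete, here is a minimal single-process sketch of a word count in Python. It is not how the paper's library is implemented: there is no master, no RPC, and no disk, and in-memory lists stand in for the R regions on each map worker's local disk. The names `map_fn`, `reduce_fn`, `partition`, and `run_mapreduce` are made up for this illustration, and using CRC32 modulo R as the partitioning function is just one assumed choice of hash.

```python
from itertools import groupby
from operator import itemgetter
import zlib


def map_fn(_name, text):
    """User-defined Map: emit (word, 1) for every word in the split."""
    for word in text.split():
        yield word.lower(), 1


def reduce_fn(_key, values):
    """User-defined Reduce: sum the counts for one word."""
    return sum(values)


def partition(key, r):
    """Partitioning function (assumed): a stable hash of the key modulo R."""
    return zlib.crc32(key.encode("utf-8")) % r


def run_mapreduce(splits, r):
    # Steps 3-4: each map task fills R regions, which stand in for the
    # partitioned files on a map worker's local disk.
    map_outputs = []
    for i, text in enumerate(splits):
        regions = [[] for _ in range(r)]
        for key, value in map_fn(f"split-{i}", text):
            regions[partition(key, r)].append((key, value))
        map_outputs.append(regions)

    # Steps 5-6: each reduce task pulls its region from every map task,
    # sorts by key, and calls Reduce once per unique key.
    outputs = []
    for j in range(r):
        pairs = [pair for regions in map_outputs for pair in regions[j]]
        pairs.sort(key=itemgetter(0))
        result = {}
        for key, group in groupby(pairs, key=itemgetter(0)):
            result[key] = reduce_fn(key, (value for _, value in group))
        outputs.append(result)
    return outputs  # one output "file" per reduce partition


if __name__ == "__main__":
    splits = ["the quick brown fox", "the lazy dog", "the fox"]  # M = 3
    for j, out in enumerate(run_mapreduce(splits, r=2)):
        print(j, out)
```

Because the same key always hashes to the same region, every occurrence of a word ends up in exactly one reduce partition, which is why sorting inside a single reduce task is enough to group all values for that key.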

Author: Yiwen Zhang
Link: http://bessss-zyw.github.io/2021/05/22/paper-mapreduce/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless stating additionally.