MIT6.824 2018 MapReduce Part II: Single-worker word count

阿新 • Published: 2020-12-26

Tags: algorithms

Part II: Single-worker word count

Now you will implement word count, a simple Map/Reduce example. Look in main/wc.go; you'll find empty mapF() and reduceF() functions. Your job is to insert code so that wc.go reports the number of occurrences of each word in its input. A word is any contiguous sequence of letters, as determined by unicode.IsLetter.

There are some input files with pathnames of the form pg-*.txt in ~/6.824/src/main, downloaded from Project Gutenberg. Here's how to run wc with the input files:

$ cd 6.824
$ export "GOPATH=$PWD"
$ cd "$GOPATH/src/main"
$ go run wc.go master sequential pg-*.txt
# command-line-arguments
./wc.go:14: missing return at end of function
./wc.go:21: missing return at end of function

The compilation fails because mapF() and reduceF() are not complete.

Review Section 2 of the MapReduce paper. Your mapF() and reduceF() functions will differ a bit from those in the paper's Section 2.1. Your mapF() will be passed the name of a file, as well as that file's contents; it should split the contents into words, and return a Go slice of mapreduce.KeyValue. While you can choose what to put in the keys and values for the mapF output, for word count it only makes sense to use words as the keys. Your reduceF() will be called once for each key, with a slice of all the values generated by mapF() for that key. It must return a string containing the total number of occurrences of the key.

A good read on Go strings is the Go Blog on strings.
You can use strings.FieldsFunc to split a string into components.
The strconv package (http://golang.org/pkg/strconv/) is handy to convert strings to integers etc.

You can test your solution using:

$ cd "$GOPATH/src/main"
$ time go run wc.go master sequential pg-*.txt
master: Starting Map/Reduce task wcseq
Merge: read mrtmp.wcseq-res-0
Merge: read mrtmp.wcseq-res-1
Merge: read mrtmp.wcseq-res-2
master: Map/Reduce task completed
2.59user 1.08system 0:02.81elapsed

The output will be in the file "mrtmp.wcseq". Your implementation is correct if the following command produces the output shown here:

$ sort -n -k2 mrtmp.wcseq | tail -10
that: 7871
it: 7987
in: 8415
was: 8578
a: 13382
of: 13536
I: 14296
to: 16079
and: 23612
the: 29748

You can remove the output file and all intermediate files with:

$ rm mrtmp.*

To make testing easy for you, run:

$ bash ./test-wc.sh

and it will report if your solution is correct or not. Let's read through this shell script:

#!/bin/bash
go run wc.go master sequential pg-*.txt
sort -n -k2 mrtmp.wcseq | tail -10 | diff - mr-testout.txt > diff.out
if [ -s diff.out ]
then
  echo "Failed test. Output should be as in mr-testout.txt. Your output differs as follows (from diff.out):" > /dev/stderr
  cat diff.out
else
  echo "Passed test" > /dev/stderr
fi

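This part mainly implements a single-threaded, sequential MapReduce. The map function splits the English text into words (check the Go API documentation for how to do this) and appends each word to a slice of key/value pairs with the value "1". The reduce function adds up the values for a given key to get the real count; since every value is "1", that sum equals the number of values. The two functions look like this: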

//
// The map function is called once for each file of input. The first
// argument is the name of the input file, and the second is the
// file's complete contents. You should ignore the input file name,
// and look only at the contents argument. The return value is a slice
// of key/value pairs.
//
func mapF(filename string, contents string) []mapreduce.KeyValue {
	// Your code here (Part II).
	// Needs "strings" and "unicode" added to the imports of wc.go.
	// Any rune that is not a letter separates words.
	notLetter := func(r rune) bool { return !unicode.IsLetter(r) }

	// Split contents into a slice of words.
	words := strings.FieldsFunc(contents, notLetter)

	// Emit one {word, "1"} pair per occurrence.
	kva := make([]mapreduce.KeyValue, 0, len(words))
	for _, w := range words {
		kva = append(kva, mapreduce.KeyValue{Key: w, Value: "1"})
	}
	return kva
}

//
// The reduce function is called once for each key generated by the
// map tasks, with a list of all the values created for that key by
// any map task.
//
func reduceF(key string, values []string) string {
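	// Your code here (Part II).
	// Needs "strconv" added to the imports of wc.go.
	// Every value emitted by mapF is "1", so the number of values for a key
	// is that key's total count.
	return strconv.Itoa(len(values))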
}
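
As a way to see the whole flow at once, the following is a rough in-memory sketch of map, group by key, and reduce. It is an illustration under stated assumptions, not the lab framework: KeyValue is a local stand-in for mapreduce.KeyValue, and mapWords, reduceSum, and countWords are hypothetical names. Unlike the solution above, reduceSum parses and adds the values instead of counting them, which would still be correct if a value were ever something other than "1".

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
	"unicode"
)

// KeyValue is a local stand-in for mapreduce.KeyValue so that this sketch
// compiles on its own.
type KeyValue struct {
	Key   string
	Value string
}

// mapWords emits one {word, "1"} pair per word in contents.
func mapWords(contents string) []KeyValue {
	notLetter := func(r rune) bool { return !unicode.IsLetter(r) }
	words := strings.FieldsFunc(contents, notLetter)
	kva := make([]KeyValue, 0, len(words))
	for _, w := range words {
		kva = append(kva, KeyValue{Key: w, Value: "1"})
	}
	return kva
}

// reduceSum adds up the numeric values for one key.
func reduceSum(key string, values []string) string {
	total := 0
	for _, v := range values {
		n, err := strconv.Atoi(v)
		if err != nil {
			continue // this sketch simply skips malformed values
		}
		total += n
	}
	return strconv.Itoa(total)
}

// countWords imitates the sequential flow in memory: map, group by key, reduce.
func countWords(contents string) map[string]string {
	grouped := make(map[string][]string)
	for _, kv := range mapWords(contents) {
		grouped[kv.Key] = append(grouped[kv.Key], kv.Value)
	}
	counts := make(map[string]string, len(grouped))
	for k, vs := range grouped {
		counts[k] = reduceSum(k, vs)
	}
	return counts
}

func main() {
	counts := countWords("the cat and the hat and the bat")

	// Sort the keys so the printed order is deterministic.
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s: %s\n", k, counts[k])
	}
}
```

Words are compared case-sensitively here, just as in the solution above, so "The" and "the" would be counted as different keys.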

That completes Part II.
