[Hadoop Basics] mapper and reducer

1. mapper

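#!/usr/bin/env python
import sys

# Read lines from standard input and emit one "word<TAB>1" record per word
for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        # Parenthesized print works under both Python 2 and Python 3
        print("%s\t%s" % (word, 1))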
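
To make the mapper's output format concrete, here is a minimal sketch that wraps the same logic in a function and runs it on a small in-memory input. The names `map_words` and `sample` are hypothetical and used only for illustration; the call to `sorted()` merely imitates locally the sort-by-key step that Hadoop Streaming performs between the mapper and the reducer.

```python
def map_words(lines):
    """Yield one 'word<TAB>1' record per whitespace-separated word."""
    for line in lines:
        for word in line.strip().split():
            yield "%s\t%s" % (word, 1)


# Hypothetical sample input standing in for sys.stdin
sample = ["foo bar foo", "  baz  "]
records = list(map_words(sample))

# Imitate the sort between map and reduce so equal words become adjacent
shuffled = sorted(records)

for record in shuffled:
    print(record)
```

Because equal words end up on adjacent lines after sorting, the reducer only has to compare each word with the previous one.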

2. reducer

#!/usr/bin/env python
import sys

current_word = None
current_count = 0

# Input arrives sorted by word, so equal words are on adjacent lines
for line in sys.stdin:
    line = line.strip()
    try:
        word, count = line.split('\t', 1)
        count = int(count)
    except ValueError:
        # Skip lines that lack a tab separator or whose count is not a number
        continue
    if current_word == word:
        current_count += count
    else:
        if current_word:
            print("%s\t%s" % (current_word, current_count))
        current_count = count
        current_word = word

# Don't forget to output the last word
if current_word is not None:
    print("%s\t%s" % (current_word, current_count))
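
The same reduction can also be written with `itertools.groupby`, which groups adjacent records that share a key. The following is only a sketch of that alternative, not the script above: `parse` and `reduce_counts` are hypothetical helper names, and the input is assumed to be sorted by word already, as it would be when coming from the sort step.

```python
from itertools import groupby
from operator import itemgetter


def parse(lines, separator="\t"):
    """Yield (word, count) pairs, skipping malformed lines."""
    for line in lines:
        word, sep, count = line.rstrip("\n").partition(separator)
        if not sep:
            continue  # no tab separator on this line
        try:
            yield word, int(count)
        except ValueError:
            continue  # count is not a number


def reduce_counts(lines):
    """Sum the counts of adjacent records that share the same word."""
    for word, group in groupby(parse(lines), key=itemgetter(0)):
        yield word, sum(count for _, count in group)


# Hypothetical, already sorted mapper output
sorted_records = ["bar\t1", "baz\t1", "foo\t1", "foo\t1"]
totals = list(reduce_counts(sorted_records))

for word, total in totals:
    print("%s\t%s" % (word, total))
```

`groupby` only merges neighbouring items, so this version depends on sorted input just as the loop-based reducer does.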

 

3. Command for granting execute permission

chmod +x filename
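
The execute bit is presumably needed so that the scripts can be launched directly through their `#!/usr/bin/env python` line. If the permission has to be set from Python rather than from the shell, a rough equivalent can be sketched with `os.chmod`; `make_executable` is a hypothetical helper name, and unlike a bare `chmod +x` it does not take the umask into account.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for user, group and others (roughly chmod a+x)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

It would be called once per script, with the path of the mapper or reducer file as the argument.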