Scraping information on top-rated agents from Anjuke (安居客)
# coding=utf-8
import urllib2
from bs4 import BeautifulSoup

# Append the results to a local text file
f = open('D:/python1/renwu.txt', 'a')

# Crawl listing pages 1 through 4
for i in range(1, 5):
    url = 'https://beijing.anjuke.com/tycoon/p' + str(i) + '/'
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    headers = {'User-Agent': user_agent}
    request = urllib2.Request(url, headers=headers)
    response = urllib2.urlopen(request)
    content = response.read().decode('utf-8')
    soup = BeautifulSoup(content, 'html.parser')
    # Each agent is listed in its own <div class="jjr-itemmod"> block
    items = soup.find_all('div', class_='jjr-itemmod')
    for a in items:
        # Join the text fragments, then remove spaces and newlines
        part1 = a.find('div', class_='jjr-info').get_text('', strip=True).encode('utf-8')
        part2 = part1.replace(' ', '')
        part3 = part2.replace('\n', '')
        print part3
        f.write(part3 + '\n')

f.close()
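
The script above is Python 2 (urllib2 and the print statement do not exist in Python 3). As a rough sketch, the URL-building, header and text-cleaning steps might look like this in Python 3. The helper names page_url, build_request and clean_text are hypothetical, introduced only for illustration, and nothing is actually downloaded here.

```python
import urllib.request

BASE_URL = 'https://beijing.anjuke.com/tycoon/p'
USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'


def page_url(page):
    """Return the listing URL for a page number (hypothetical helper)."""
    return BASE_URL + str(page) + '/'


def build_request(url):
    """Attach the same User-Agent header the script uses (hypothetical helper)."""
    return urllib.request.Request(url, headers={'User-Agent': USER_AGENT})


def clean_text(text):
    """Remove spaces and newlines, as the two replace() calls do."""
    return text.replace(' ', '').replace('\n', '')


# Build (but do not send) the requests for pages 1 through 4
requests = [build_request(page_url(i)) for i in range(1, 5)]
for req in requests:
    print(req.full_url)
```

Passing one of these objects to urllib.request.urlopen would perform the download. In Python 3 the response body is bytes, so the .decode('utf-8') step stays, but the later .encode('utf-8') is no longer needed if the output file is opened in text mode with encoding='utf-8'.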
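What I learned:
1. This task introduced me to a new module, bs4. For finding information in a page, it is more convenient and quicker to use than the re module.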
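
To make the comparison concrete, here is a small sketch that pulls the same text out of an HTML snippet once with bs4 and once with re. The SAMPLE_HTML string is made up to imitate the jjr-itemmod / jjr-info structure the script looks for (the real page markup may differ), and the two function names are hypothetical.

```python
import re
from bs4 import BeautifulSoup

# Made-up HTML imitating the structure the script looks for;
# the real page markup may differ.
SAMPLE_HTML = """
<div class="jjr-itemmod">
  <div class="jjr-info"><h3>Agent A</h3><p>Store 1</p></div>
</div>
<div class="jjr-itemmod">
  <div class="jjr-info"><h3>Agent B</h3><p>Store 2</p></div>
</div>
"""


def extract_with_bs4(html):
    """Navigate by tag and class, then let get_text() drop the markup."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for item in soup.find_all('div', class_='jjr-itemmod'):
        info = item.find('div', class_='jjr-info')
        results.append(info.get_text('', strip=True))
    return results


def extract_with_re(html):
    """Match each block with a pattern, then strip the inner tags by hand."""
    blocks = re.findall(r'<div class="jjr-info">(.*?)</div>', html, re.S)
    return [re.sub(r'<[^>]+>', '', block).strip() for block in blocks]


bs4_result = extract_with_bs4(SAMPLE_HTML)
re_result = extract_with_re(SAMPLE_HTML)
print(bs4_result)
print(re_result)
```

Both functions return the same list on this snippet, but the regex version depends on the exact attribute text and would break if the div gained another attribute or contained a nested div. The bs4 version only depends on the tag name and class, which is where the convenience comes from.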