Python Practice Exercise: Phone Number and E-mail Address Extractor
阿新 • Published: 2018-05-04
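Problem:
Say you have the boring task of finding every phone number and email address in a long web page or document. Scrolling through it by hand could take a long time. But if you had a program that searched the text on the clipboard for phone numbers and email addresses, you could simply press Ctrl-A to select all the text, press Ctrl-C to copy it to the clipboard, and then run the program. The program replaces the text on the clipboard with just the phone numbers and email addresses it found.
Test text: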
Skip to main content Home Search form Search GO! Topics Arduino Art & Design General Computing Hacking & Computer Security Hardware / DIY JavaScript Kids LEGO® LEGO® MINDSTORMS® Linux & BSD Skip to main content Home Search form Search GO! Catalog Media Write for Us About Us Topics Arduino Art & Design General Computing Hacking & Computer Security Hardware / DIY JavaScript Kids LEGO® LEGO® MINDSTORMS® Linux & BSD Manga Minecraft Programming Python Science & Math Scratch System Administration Early Access Gift Certificates Free ebook edition with every print book purchased from nostarch.com! Shopping cart 3 Items Total: $53.48 View cart Checkout Contact Us No Starch Press, Inc. 245 8th Street San Francisco, CA 94103 USA Phone: 800.420.7240 or +1 415.863.9900 (9 a.m. to 5 p.m., M-F, PST) Fax: +1 415.863.9950 Reach Us by Email General inquiries: [email protected] Media requests: [email protected] Academic requests: [email protected] (Please see this page for academic review requests) Help with your order: [email protected] Reach Us on Social Media Twitter Facebook Navigation My account Log out Manage your subscription preferences. About Us | ★ Jobs! ★ | Sales and Distribution | Rights | Media | Academic Requests | Conferences | Order FAQ | Contact Us | Write for Us | Privacy Copyright 2018 No Starch Press, Inc
Output after running:
Copied to clipboard:
800-420-7240
415-863-9900
415-863-9950
[email protected]
[email protected]
[email protected]
[email protected]
Hit any key to close this window...
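Approach
When you start a new project, it's tempting to dive straight into writing code. More often, though, it's better to step back and consider the bigger picture. I recommend first sketching a high-level plan of what the program needs to do. Don't think about the actual code yet; you can worry about that later.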
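1. Create one regex for phone numbers and one for email addresses.
2. Find all matches of both regexes in the clipboard text.
3. Copy the processed text back to the clipboard.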
Now let's write the program:
#! python3
# phoneAndEmail.py - Finds phone numbers and email addresses on the clipboard.

import re

import pyperclip

# Regex for phone numbers
phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?   # optional area code: 444 or (444)
    (\s|-|\.)?           # optional separator: whitespace, - or .
    (\d{3})              # first three digits
    (\s|-|\.)?           # optional separator: whitespace, - or .
    (\d{4})              # last four digits
    )''', re.VERBOSE)

# Regex for email addresses
emailRegex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+    # username
    @                    # @ symbol
    [a-zA-Z0-9.-]+       # domain name
    (\.[a-zA-Z]{2,4})    # dot-something
    )''', re.VERBOSE)

# Find matches in the clipboard text
text = str(pyperclip.paste())
matches = []
for groups in phoneRegex.findall(text):
    # groups[1] = area code, groups[3] = first 3 digits, groups[5] = last 4 digits
    phoneNum = '-'.join([groups[1], groups[3], groups[5]])
    matches.append(phoneNum)
for groups in emailRegex.findall(text):
    # groups[0] is the outer group, i.e. the whole address
    matches.append(groups[0])

# Copy the results back to the clipboard
if len(matches) > 0:
    pyperclip.copy('\n'.join(matches))
    print('Copied to clipboard:')
    print('\n'.join(matches))
else:
    print('No phone numbers or email addresses found.')
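To check the matching logic without touching the clipboard, the extraction step can be pulled out into a plain function and fed a string directly. This is a minimal sketch, not the original program: the helper name `extract_phones_and_emails` and the `sample` string are hypothetical, and the phone regex writes the parenthesized area code as `\(\d{3}\)` (a backslash before `d`) so that it actually matches digits.

```python
import re

# Phone pattern: optional area code, optional separators, 3 + 4 digits.
PHONE_RE = re.compile(r'''(
    (\d{3}|\(\d{3}\))?   # optional area code: 444 or (444)
    (\s|-|\.)?           # optional separator: whitespace, - or .
    (\d{3})              # first three digits
    (\s|-|\.)?           # optional separator
    (\d{4})              # last four digits
    )''', re.VERBOSE)

# Email pattern: username @ domain . top-level domain
EMAIL_RE = re.compile(r'''(
    [a-zA-Z0-9._%+-]+    # username
    @                    # @ symbol
    [a-zA-Z0-9.-]+       # domain name
    (\.[a-zA-Z]{2,4})    # dot-something
    )''', re.VERBOSE)


def extract_phones_and_emails(text):
    """Hypothetical helper: return normalized phone numbers, then emails, found in text."""
    matches = []
    for groups in PHONE_RE.findall(text):
        # groups[1] = area code, groups[3] = first 3 digits, groups[5] = last 4
        matches.append('-'.join([groups[1], groups[3], groups[5]]))
    for groups in EMAIL_RE.findall(text):
        # groups[0] is the outer group, i.e. the whole address
        matches.append(groups[0])
    return matches


sample = 'Phone: 800.420.7240 or +1 415.863.9900 Fax: +1 415.863.9950 Email: info@example.com'
print('\n'.join(extract_phones_and_emails(sample)))
```

Because the groups are joined as-is, a parenthesized area code keeps its parentheses (`(415) 863-9950` becomes `(415)-863-9950`), and a number without an area code gets a leading hyphen (`863-9950` becomes `-863-9950`).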
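Code analysis
re.VERBOSE is a flag that tells the regex engine to ignore whitespace and comments (from # to the end of the line) inside the pattern. "Verbose" means wordy: the flag lets you spread a regex over several lines and annotate each part with a comment, which makes it much easier to read. Whitespace inside a character class, or escaped with a backslash, is still matched literally.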
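A small illustration of the flag: the two patterns below are assumed examples (not from the program) and match exactly the same text, because in verbose mode the spaces, line breaks and `#` comments are dropped before matching.

```python
import re

# Verbose form: whitespace and # comments inside the pattern are ignored.
verbose = re.compile(r'''
    (\d{3})   # three digits
    -         # literal hyphen
    (\d{4})   # four digits
''', re.VERBOSE)

# Equivalent compact form written on one line.
compact = re.compile(r'(\d{3})-(\d{4})')

for pattern in (verbose, compact):
    print(pattern.search('call 555-0123 now').groups())  # ('555', '0123')
```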
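For more details on regular expressions, see: Python regular expressions
Another pitfall was groups: at first I didn't understand the difference between groups() and group().
group() returns the text matched by a group. For example: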
import re
a = "123abc456"
print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(0))  # 123abc456, the whole match
print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(1))  # 123
print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(2))  # abc
print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(3))  # 456
groups() returns a tuple containing the strings matched by all the groups, from group 1 up to the last group number.
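To see group() and groups() side by side, here is the same search as in the example, plus a short findall() call. The findall() pattern and input string are assumed examples; the point is that when a pattern contains groups, findall() returns one tuple per match holding groups 1 to n, the same contents groups() would give for that match.

```python
import re

a = "123abc456"
m = re.search("([0-9]*)([a-z]*)([0-9]*)", a)

print(m.group())      # same as group(0), the whole match: 123abc456
print(m.group(1, 3))  # several groups at once: ('123', '456')
print(m.groups())     # all groups 1..3 as a tuple: ('123', 'abc', '456')

# With groups in the pattern, findall() returns one tuple per match.
print(re.findall("([0-9]+)([a-z]+)", "12ab 34cd"))  # [('12', 'ab'), ('34', 'cd')]
```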
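In the line phoneNum = '-'.join([groups[1], groups[3], groups[5]]), groups is just the loop variable, not the groups() method, so don't mix them up. Because the phone regex contains groups, findall() returns one tuple per match, and each tuple has six elements, one per parenthesized group: index 0 is the outer group (the whole number), 1 the area code, 2 and 4 the separators, 3 the first three digits, and 5 the last four digits. That is why the last four digits are groups[5]; groups[6] would be out of range and raise an IndexError.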
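To confirm the tuple layout, the sketch below prints what findall() yields for a single number. The regex mirrors the structure of the program's phone pattern (one outer group wrapping five inner ones); the input string is an assumed example.

```python
import re

# Same structure as the phone regex: an outer group wrapping five inner groups.
phone_re = re.compile(r'''(
    (\d{3}|\(\d{3}\))?   # index 1: area code
    (\s|-|\.)?           # index 2: separator
    (\d{3})              # index 3: first three digits
    (\s|-|\.)?           # index 4: separator
    (\d{4})              # index 5: last four digits
    )''', re.VERBOSE)    # index 0 is the outer group, i.e. the whole number

groups = phone_re.findall('Phone: 800.420.7240')[0]
print(groups)       # ('800.420.7240', '800', '.', '420', '.', '7240')
print(len(groups))  # 6, so valid indices are 0 to 5
print('-'.join([groups[1], groups[3], groups[5]]))  # 800-420-7240
```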