
Python Practice Exercise: Phone Number and E-mail Address Extractor


Problem:

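Say you have the boring task of finding every phone number and email address in a long web page or document. Scrolling through it by hand could take a long time. But if you had a program that searched the text on your clipboard for phone numbers and email addresses, you could simply press Ctrl-A to select all the text, press Ctrl-C to copy it to the clipboard, and then run your program. It would replace the text on the clipboard with just the phone numbers and email addresses it found.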

Test text

Skip to main content
Home
Search form

Search

GO!
Catalog
Media
Write for Us
About Us
Topics
Arduino
Art & Design
General Computing
Hacking & Computer Security
Hardware / DIY
JavaScript
Kids
LEGO®
LEGO® MINDSTORMS®
Linux & BSD
Manga
Minecraft
Programming
Python
Science & Math
Scratch
System Administration
Early Access
Gift Certificates
Free ebook edition with every print book purchased from nostarch.com!
Shopping cart
3 Items    Total: $53.48
View cart Checkout
Contact Us

No Starch Press, Inc.
245 8th Street
San Francisco, CA 94103 USA
Phone: 800.420.7240 or +1 415.863.9900 (9 a.m. to 5 p.m., M-F, PST)
Fax: +1 415.863.9950

Reach Us by Email
General inquiries: [email protected]
Media requests: [email protected]
Academic requests: [email protected] (Please see this page for academic review requests)
Help with your order: [email protected]
Reach Us on Social Media
Twitter
Facebook
Navigation
My account
Log out
Manage your subscription preferences.


About Us  |  ★ Jobs! ★  |  Sales and Distribution  |  Rights  |  Media  |  Academic Requests  |  Conferences  |  Order FAQ  |  Contact Us  |  Write for Us  |  Privacy
Copyright 2018 No Starch Press, Inc

Output

Copied to clipboard:
800-420-7240
415-863-9900
415-863-9950 
[email protected]
[email protected]
[email protected]
[email protected]

Approach

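When you start a new project, it's tempting to dive straight into writing code. More often, though, it's better to step back and consider the bigger picture. I recommend first drafting a high-level plan of what the program needs to do. Don't think about the actual code yet; you can worry about that later.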
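1. Create a regex for phone numbers and a regex for email addresses.
2. Find all matches in the text on the clipboard.
3. Copy the results to the clipboard.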
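The three steps above can be sketched as a small program in which the matching step lives in its own function, so it can be tried on an ordinary string without touching the clipboard. This is a minimal sketch under that assumption: the function names extract_contacts() and run_on_clipboard() are hypothetical, and pyperclip is the same third-party clipboard module the program uses.

```python
import re

# Step 1: create the phone-number and email regexes.
PHONE_REGEX = re.compile(r'''(
    (\d{3}|\(\d{3}\))?   # optional area code: 444 or (444)
    (\s|-|\.)?           # optional separator
    (\d{3})              # first three digits
    (\s|-|\.)?           # optional separator
    (\d{4})              # last four digits
    )''', re.VERBOSE)

EMAIL_REGEX = re.compile(r'''(
    [a-zA-Z0-9._%+-]+    # username
    @
    [a-zA-Z0-9.-]+       # domain name
    (\.[a-zA-Z]{2,4})    # dot-something
    )''', re.VERBOSE)


def extract_contacts(text):
    """Step 2 (hypothetical helper): return phone numbers as xxx-xxx-xxxx, then emails."""
    matches = []
    for groups in PHONE_REGEX.findall(text):
        matches.append('-'.join([groups[1], groups[3], groups[5]]))
    for groups in EMAIL_REGEX.findall(text):
        matches.append(groups[0])
    return matches


def run_on_clipboard():
    """Steps 2 and 3 on the real clipboard (needs the third-party pyperclip module)."""
    import pyperclip  # imported here so extract_contacts() works without pyperclip
    matches = extract_contacts(str(pyperclip.paste()))
    if matches:
        pyperclip.copy('\n'.join(matches))
        print('Copied to clipboard:')
        print('\n'.join(matches))
    else:
        print('No phone numbers or email addresses found.')
```

For example, extract_contacts('Phone: 800.420.7240 or +1 415.863.9900') returns ['800-420-7240', '415-863-9900'], while run_on_clipboard() performs the full round trip. Keeping the clipboard access separate is just a design choice that makes the matching logic easy to test.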

Now let's write the program.

#! python3
# phoneAndEmail.py - Finds phone numbers and email addresses on the clipboard.

import re
import pyperclip

# Create the phone number regex
phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?   # optional area code: 444 or (444)
    (\s|-|\.)?           # optional separator: whitespace, - or .
    (\d{3})              # first three digits
    (\s|-|\.)?           # optional separator: whitespace, - or .
    (\d{4})              # last four digits
    )''', re.VERBOSE)

# Create the email regex
emailRegex = re.compile(r'''(
    [a-zA-Z0-9._%+-]+    # username
    @
    [a-zA-Z0-9.-]+       # domain name
    (\.[a-zA-Z]{2,4})    # dot-something
    )''', re.VERBOSE)

# Find matches in the clipboard text
text = str(pyperclip.paste())
matches = []
for groups in phoneRegex.findall(text):
    # groups[0] is the whole match; 1, 3 and 5 hold the area code,
    # the first three digits and the last four digits
    phoneNum = '-'.join([groups[1], groups[3], groups[5]])
    matches.append(phoneNum)
for groups in emailRegex.findall(text):
    matches.append(groups[0])  # the whole email address

# Copy the processed text to the clipboard
if matches:
    pyperclip.copy('\n'.join(matches))
    print('Copied to clipboard:')
    print('\n'.join(matches))
else:
    print('No phone numbers or email addresses found.')

Code analysis

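re.VERBOSE is a flag that makes the regex engine ignore whitespace and comments inside the pattern. "Verbose" means wordy: the flag lets you spread a pattern over several lines and annotate each part with # comments, which makes the regex far easier to read. Whitespace inside a character class, or preceded by a backslash, is still matched literally.
For more on regular expressions, see: Python Regular Expressions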
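To check that VERBOSE only changes the layout and not the meaning, this sketch compiles the phone pattern once in compact form and once in verbose form (both without the outer group, an assumption made to keep the tuples short) and compares what they capture on the fax number from the test text. It also shows the usual pitfall: a plain space in a verbose pattern is silently dropped.

```python
import re

# Compact form of the phone pattern: no comments, no layout whitespace.
compact = re.compile(r'(\d{3}|\(\d{3}\))?(\s|-|\.)?(\d{3})(\s|-|\.)?(\d{4})')

# The same pattern with re.VERBOSE: whitespace and # comments are ignored.
verbose = re.compile(r'''
    (\d{3}|\(\d{3}\))?   # optional area code
    (\s|-|\.)?           # optional separator
    (\d{3})              # first three digits
    (\s|-|\.)?           # optional separator
    (\d{4})              # last four digits
    ''', re.VERBOSE)

sample = 'Fax: +1 415.863.9950'
print(compact.search(sample).groups())   # ('415', '.', '863', '.', '9950')
print(verbose.search(sample).groups())   # ('415', '.', '863', '.', '9950')

# Pitfall: in VERBOSE mode a bare space is ignored, so 'a b' really means 'ab'.
print(re.search(r'a b', 'a b', re.VERBOSE))           # None
# Escape the space (or use \s or [ ]) to match it.
print(re.search(r'a\ b', 'a b', re.VERBOSE).group())  # a b
```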

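Another pitfall was groups: at first I didn't understand the difference between groups() and group().
group() returns the text captured by a given group. For example: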

import re
a = "123abc456"
m = re.search(r"([0-9]*)([a-z]*)([0-9]*)", a)
print(m.group(0))   # 123abc456, the entire match
print(m.group(1))   # 123
print(m.group(2))   # abc
print(m.group(3))   # 456

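groups() returns a tuple containing the strings of all the groups, from group 1 up to the last group number. Group 0 (the entire match) is not included.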
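A short sketch of groups() on the same example, plus a second, made-up pattern to show what happens to a group that doesn't take part in the match. The last line contrasts this with findall(), which the program relies on.

```python
import re

a = "123abc456"
m = re.search(r"([0-9]*)([a-z]*)([0-9]*)", a)

print(m.groups())      # ('123', 'abc', '456'): groups 1..3, group 0 excluded
print(m.group(1, 3))   # ('123', '456'): group() with several numbers also returns a tuple
print(m.group())       # 123abc456: same as group(0)

# Groups that did not participate in the match are None in groups(),
# unless a default value is supplied.
m2 = re.search(r"(\d+)(-)?(x)?", "42")
print(m2.groups())             # ('42', None, None)
print(m2.groups(default=''))   # ('42', '', '')

# findall() returns one tuple per match and uses '' instead of None.
print(re.findall(r"(\d+)(-)?(x)?", "42"))   # [('42', '', '')]
```

This is why, in the program, a phone number without an area code yields an empty string at groups[1] rather than None.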
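In the program's for loops, groups is just an ordinary variable, not the groups() method, so don't confuse the two. Each value is one tuple returned by findall(); because each whole pattern is wrapped in an outer pair of parentheses, index 0 holds the full match and the inner groups start at index 1.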
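To see what that loop variable actually holds, this sketch runs the phone pattern over two numbers from the test text and prints each tuple. It assumes the area-code alternative is written as \(\d{3}\); with one outer group plus five inner groups, each tuple has six entries, so the last four digits sit at index 5.

```python
import re

# One outer group wrapping five inner groups, as in phoneRegex.
phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?   # groups[1]: area code
    (\s|-|\.)?           # groups[2]: separator
    (\d{3})              # groups[3]: first three digits
    (\s|-|\.)?           # groups[4]: separator
    (\d{4})              # groups[5]: last four digits
    )''', re.VERBOSE)

text = 'Phone: 800.420.7240 or +1 415.863.9900'
for groups in phoneRegex.findall(text):
    # 'groups' is a plain tuple for one match, not Match.groups()
    print(len(groups), groups)
    print('-'.join([groups[1], groups[3], groups[5]]))
# 6 ('800.420.7240', '800', '.', '420', '.', '7240')
# 800-420-7240
# 6 ('415.863.9900', '415', '.', '863', '.', '9900')
# 415-863-9900
```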
