
python re.compile(?P)

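So regular expressions can match like this too... The GeeksQuiz site offers coding quizzes you can use to test how well you know a language. While working through the Python questions today I found something interesting: a regular expression can also be written like this: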
import re

sentence = 'cats are fast'
# Raw string so that \w reaches the regex engine unchanged
regex = re.compile(r'(?P<animal>\w+) (?P<verb>\w+) (?P<adjective>\w+)')
matched = regex.search(sentence)
print(matched.groupdict())

Output (on current Python 3; older versions may print the keys in a different order): {'animal': 'cats', 'verb': 'are', 'adjective': 'fast'}
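The same idea scales to text with many records. Below is a minimal sketch that uses `finditer` and `groupdict()` to turn each match into a dict. It assumes a made-up `key=value` format, one pair per line. The sample text, the group names `key`/`value` and the helper `parse_pairs` are all hypothetical and only serve as an illustration.

```python
import re

# Hypothetical sample input: one "key=value" pair per line.
TEXT = """\
host=example.org
port=8080
user=alice
"""

# Named groups make the pattern self-documenting; MULTILINE lets ^ and $
# match at the start and end of every line.
PAIR = re.compile(r'^(?P<key>\w+)=(?P<value>\S+)$', re.MULTILINE)

def parse_pairs(text):
    """Return a dict built from every key=value line in text (hypothetical helper)."""
    result = {}
    for m in PAIR.finditer(text):
        fields = m.groupdict()  # e.g. {'key': 'host', 'value': 'example.org'}
        result[fields['key']] = fields['value']
    return result

print(parse_pairs(TEXT))
# → {'host': 'example.org', 'port': '8080', 'user': 'alice'}
```

Because the groups are looked up by name, reordering or adding groups in the pattern does not break the loop body.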

The Python documentation explains it as follows:

The syntax for a named group is one of the Python-specific extensions: (?P<name>...). name is, obviously, the name of the group. Named groups also behave exactly like capturing groups, and additionally associate a name with a group. The match object methods that deal with capturing groups all accept either integers that refer to the group by number or strings that contain the desired group’s name. Named groups are still given numbers, so you can retrieve information about a group in two ways:

>>> p = re.compile(r'(?P<word>\b\w+\b)')
>>> m = p.search( '(((( Lots of punctuation )))' )
>>> m.group('word')
'Lots'
>>> m.group(1)
'Lots'
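Here are a few more ways the name and the number can stand in for each other, reusing the pattern from the documentation example. This is a small sketch. It assumes Python 3.6 or later, since that version added direct indexing of match objects (`m['word']`).

```python
import re

p = re.compile(r'(?P<word>\b\w+\b)')
m = p.search('(((( Lots of punctuation )))')

# The name and the number refer to the same group.
assert m.group('word') == m.group(1) == 'Lots'

# Since Python 3.6 a match object can also be indexed directly.
print(m['word'])            # → Lots

# Positional helpers such as span() accept the group name as well.
print(m.span('word'))       # → (5, 9)

# The compiled pattern records the name → number mapping.
print(dict(p.groupindex))   # → {'word': 1}
```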

The syntax for backreferences in an expression such as (...)\1 refers to the number of the group. There’s naturally a variant that uses the group name instead of the number. This is another Python extension: (?P=name) indicates that the contents of the group called name should again be matched at the current point. The regular expression for finding doubled words, (\b\w+)\s+\1 can also be written as (?P<word>\b\w+)\s+(?P=word):

>>> p = re.compile(r'(?P<word>\b\w+)\s+(?P=word)')
>>> p.search('Paris in the the spring').group()
'the the'
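A named group can also be referenced from the replacement string of `re.sub`, using the `\g<name>` syntax. The sketch below collapses doubled words. The helper name `collapse_doubles` is hypothetical. A trailing `\b` has been added to the documentation's pattern here so that text like "the theory" is not mistaken for a doubled word.

```python
import re

# Doubled-word pattern with an extra trailing \b (an addition to the HOWTO pattern).
DOUBLED = re.compile(r'\b(?P<word>\w+)\s+(?P=word)\b')

def collapse_doubles(text):
    """Replace each doubled word with a single copy (hypothetical helper)."""
    # \g<word> in the replacement inserts whatever the named group matched.
    return DOUBLED.sub(r'\g<word>', text)

print(collapse_doubles('Paris in the the spring'))  # → Paris in the spring
print(collapse_doubles('the theory'))               # → the theory
```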

Source (Python Regular Expression HOWTO): https://docs.python.org/2/howto/regex.html

Commonly used metacharacters

^ Matches the beginning of the string (of each line with re.MULTILINE)
$ Matches the end of the string (of each line with re.MULTILINE)
. Matches any character except a newline (unless re.DOTALL is set)
\s Matches any whitespace character
\S Matches any non-whitespace character
* Repeats the preceding item 0 or more times
*? Repeats the preceding item 0 or more times (non-greedy)
+ Repeats the preceding item 1 or more times
+? Repeats the preceding item 1 or more times (non-greedy)
( Indicates where string extraction is to start
) Indicates where string extraction is to end
\d Matches any decimal digit
\D Matches any non-digit character
\w Matches any alphanumeric character or underscore
\W Matches any character not matched by \w
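Here is a quick demonstration of several entries from the list above: greedy vs. non-greedy repetition, ^, \S, \d, \W and parentheses for extraction. The sample strings are made up for illustration only.

```python
import re

html = '<b>bold</b> and <i>italic</i>'

# * is greedy: .* grabs as much as possible before the final '>'.
print(re.search(r'<.*>', html).group())    # → <b>bold</b> and <i>italic</i>
# *? is non-greedy: .*? stops at the first '>' that lets the match succeed.
print(re.search(r'<.*?>', html).group())   # → <b>

line = 'From alice@example.org Sat Jan  5 09:14:16 2008'

# ^ anchors at the start; \S+ is a run of non-whitespace characters;
# the parentheses mark the part that findall returns.
print(re.findall(r'^From (\S+@\S+)', line))  # → ['alice@example.org']

# \d+ matches runs of decimal digits.
print(re.findall(r'\d+', line))              # → ['5', '09', '14', '16', '2008']

# \W matches anything that is not a letter, digit or underscore.
print(re.findall(r'\W', 'a_b-c d'))          # → ['-', ' ']
```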