Python re.compile and named groups: (?P<name>...)
import re

sentence = 'cats are fast'
# Raw string so that \w reaches the regex engine unchanged
regex = re.compile(r'(?P<animal>\w+) (?P<verb>\w+) (?P<adjective>\w+)')
matched = regex.search(sentence)
print(matched.groupdict())
output (Python 3.7+): {'animal': 'cats', 'verb': 'are', 'adjective': 'fast'}
The Python documentation describes it as follows:
The syntax for a named group is one of the Python-specific extensions: (?P<name>...). name is, obviously, the name of the group. Named groups also behave exactly like capturing groups, and additionally associate a name with a group. The match object methods that deal with capturing groups all accept either integers that refer to the group by number or strings that contain the desired group’s name. Named groups are still given numbers, so you can retrieve information about a group in two ways:
>>> p = re.compile(r'(?P<word>\b\w+\b)')
>>> m = p.search( '(((( Lots of punctuation )))' )
>>> m.group('word')
'Lots'
>>> m.group(1)
'Lots'
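As a minimal sketch of the point above, the same group can be reached by name or by number, not only through group() but also through position methods such as span(). The pattern and sample string are reused from the example; the variable names are illustrative only.

```python
import re

p = re.compile(r'(?P<word>\b\w+\b)')
m = p.search('(((( Lots of punctuation )))')

# The same group can be addressed by name or by number.
by_name = m.group('word')
by_number = m.group(1)

# Position methods accept the group name as well.
span_by_name = m.span('word')
span_by_number = m.span(1)

# The compiled pattern records which number each name maps to.
word_index = p.groupindex['word']

print(by_name, by_number)
print(span_by_name, span_by_number)
print(word_index)
```

Names tend to be the safer handle when a pattern is edited later, since adding another group in front shifts the numbers but not the names.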
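The syntax for backreferences in an expression such as (...)\1 refers to the number of the group. There’s naturally a variant that uses the group name instead of the number. This is another Python extension: (?P=name) indicates that the contents of the group called name should again be matched at the current point. The regular expression for finding doubled words, (\b\w+)\s+\1, can also be written as (?P<word>\b\w+)\s+(?P=word):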
>>> p = re.compile(r'(?P<word>\b\w+)\s+(?P=word)')
>>> p.search('Paris in the the spring').group()
'the the'
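A short sketch, reusing the documentation's doubled-word pattern, that checks whether the numbered and the named backreference find the same text. The sample sentence and the variable names are made up for illustration.

```python
import re

# Hypothetical sample text containing two doubled words.
text = 'Paris in the the spring is is lovely'

numbered = re.compile(r'(\b\w+)\s+\1')
named = re.compile(r'(?P<word>\b\w+)\s+(?P=word)')

# Full text of every doubled-word match, for each pattern.
numbered_hits = [m.group() for m in numbered.finditer(text)]
named_hits = [m.group() for m in named.finditer(text)]

# With the named pattern, the repeated word itself is available by name.
repeated_words = [m.group('word') for m in named.finditer(text)]

print(numbered_hits)
print(named_hits)
print(repeated_words)
```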
Notes compiled from the regular expression documentation: https://docs.python.org/2/howto/regex.html
Commonly used:
^    Matches the beginning of a line
$    Matches the end of the line
.    Matches any character (except a newline, by default)
\s   Matches any whitespace character
\S   Matches any non-whitespace character
*    Repeats a character 0 or more times
*?   Repeats a character 0 or more times (non-greedy)
+    Repeats a character one or more times
+?   Repeats a character one or more times (non-greedy)
(    Indicates where string extraction is to start
)    Indicates where string extraction is to end
\d   Matches any decimal digit
\D   Matches any non-digit character
\w   Matches any alphanumeric character or underscore
\W   Matches any character that is not alphanumeric or an underscore
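A few of the entries above can be tried out with a short sketch like the following. The sample strings (including the address alice@example.com) are invented for illustration and do not come from the documentation.

```python
import re

# Hypothetical sample line, made up for this sketch.
line = 'From: alice@example.com  Sent 42 items on 2024-01-05'

# ^ anchors at the start, \S+ takes a run of non-whitespace,
# and the parentheses mark the part to extract.
sender = re.findall(r'^From: (\S+)', line)

# \d+ matches runs of decimal digits.
numbers = re.findall(r'\d+', line)

# Greedy .* runs to the last '>', non-greedy .*? stops at the first.
html = '<b>bold</b>'
greedy = re.search(r'<.*>', html).group()
lazy = re.search(r'<.*?>', html).group()

# \s+ matches one or more whitespace characters; $ anchors at the end.
words = re.split(r'\s+', 'cats   are\tfast')
ends_with_digits = re.search(r'\d+$', line) is not None

print(sender, numbers)
print(greedy, lazy)
print(words, ends_with_digits)
```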