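Python: the string module's maketrans and translate functions
First, here are the official definitions of these two functions (from the Python 2 string module documentation):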
string.maketrans(from, to): Return a translation table suitable for passing to translate(), that will map each character in from into the character at the same position in to; from and to must have the same length.
string.translate(s, table[, deletechars]): Delete all characters from s that are in deletechars (if present), and then translate the characters using table, which must be a 256-character string giving the translation for each character value, indexed by its ordinal. If table is None, then only the character deletion step is performed.
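As a rough illustration of the two definitions above, here is a minimal sketch. It assumes Python 3, where `bytes.maketrans` and `bytes.translate` play the role of the Python 2 `string.maketrans` and `string.translate` quoted here; the sample data is made up for the example.

```python
# Python 3 sketch (assumption: bytes.maketrans / bytes.translate stand in
# for the Python 2 string.maketrans / string.translate quoted above).
table = bytes.maketrans(b'abc', b'xyz')      # a->x, b->y, c->z

text = b'aabbcc-123'
mapped = text.translate(table)               # translation only
stripped = text.translate(table, b'-')       # delete '-' first, then translate
only_delete = text.translate(None, b'123')   # table=None: deletion step only

print(mapped)
print(stripped)
print(only_delete)
```

Note that deletion happens before translation, which matches the order given in the definition of translate.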
The following code wraps these two functions:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import string

def translator(frm='', to='', delete='', keep=None):
    # A single replacement character is repeated to match the length of frm
    if len(to) == 1:
        to = to * len(frm)
    trans = string.maketrans(frm, to)
    if keep is not None:
        # Identity table: all 256 characters in ordinal order
        trans_all = string.maketrans('', '')
        # keep.translate(trans_all, delete): remove the characters to delete
        # from the characters to keep.
        # trans_all.translate(trans_all, ...): remove the kept characters from
        # the full table, i.e. take the complement of the kept characters.
        delete = trans_all.translate(trans_all, keep.translate(trans_all, delete))
    def translate(s):
        return s.translate(trans, delete)
    return translate

if __name__ == '__main__':
    # result: 12345678
    digits_only = translator(keep=string.digits)
    print digits_only('Eric chen: 1234-5678')
    # result: Eric chen: -
    no_digits = translator(delete=string.digits)
    print no_digits('Eric chen: 1234-5678')
    # result: Eric chen: ****-****
    digits_to_hash = translator(frm=string.digits, to='*')
    print digits_to_hash('Eric chen: 1234-5678')
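For readers on Python 3, the same closure-based wrapper can be sketched roughly as follows. This is an assumption-laden port, not the original code: it works on `bytes` rather than `str`, uses `bytes.maketrans`, and builds the "all characters" table with `bytes(range(256))`.

```python
import string

def translator(frm=b'', to=b'', delete=b'', keep=None):
    """Python 3 sketch of the wrapper above, operating on bytes (assumption)."""
    if len(to) == 1:
        to = to * len(frm)
    trans = bytes.maketrans(frm, to)
    if keep is not None:
        all_bytes = bytes(range(256))
        # Bytes to keep, minus any that were explicitly marked for deletion
        kept = keep.translate(None, delete)
        # Complement of the kept bytes: everything else gets deleted
        delete = all_bytes.translate(None, kept)
    def translate(s):
        return s.translate(trans, delete)
    return translate

digits = string.digits.encode('ascii')
sample = b'Eric chen: 1234-5678'

digits_only = translator(keep=digits)
no_digits = translator(delete=digits)
digits_to_star = translator(frm=digits, to=b'*')

print(digits_only(sample))
print(no_digits(sample))
print(digits_to_star(sample))
```

Returning a closure means the tables are built once and reused for every string passed in later.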
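When maketrans is called as string.maketrans('', ''), the resulting translation table is exactly a 256-character string t. Ignoring non-printable characters, the table reads “!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~”, which essentially corresponds to the ASCII table.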
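The identity table can be inspected directly. The sketch below assumes Python 3, where `bytes.maketrans(b'', b'')` is the counterpart of `string.maketrans('', '')`; the cutoff 33..126 for "printable" is a choice made for this example.

```python
# Python 3 sketch (assumption: bytes.maketrans mirrors string.maketrans).
identity = bytes.maketrans(b'', b'')

# The table has one entry per byte value, each mapped to itself.
print(len(identity))
print(identity == bytes(range(256)))

# Keep only the visible ASCII characters (codes 33..126) for display.
printable = bytes(b for b in identity if 33 <= b <= 126).decode('ascii')
print(printable)
```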
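In fact, the conversion is already done by the time maketrans returns. For example, after calling string.maketrans('ABCD', 'abcd'), the 256-character table (ignoring non-printable characters) reads “!"#$%&'()*+,-./0123456789:;<=>?@abcdEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~”: the positions that originally held “ABCD” now hold “abcd”.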
When you pass t as the first argument to the translate method, every character c in the original string is translated to the character t[ord(c)].
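The lookup rule t[ord(c)] can be checked by hand. This is a minimal sketch assuming Python 3 bytes, where iterating over a bytes object already yields ordinals, so no explicit ord() call is needed; the sample string is invented for the example.

```python
# Python 3 sketch (assumption: bytes stand in for Python 2 str).
table = bytes.maketrans(b'ABCD', b'abcd')

# Only the slots for 'A'..'D' (ordinals 65..68) differ from the identity table.
changed = [i for i in range(256) if table[i] != i]

s = b'ABCDEF abcdef'
by_translate = s.translate(table)
# Manual equivalent: index the table with each character's ordinal.
by_lookup = bytes(table[c] for c in s)

print(changed)
print(by_translate)
print(by_lookup == by_translate)
```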
For Unicode objects, the translate() method does not accept the optional deletechars argument. Instead, it returns a copy of the s where all characters have been mapped through the given translation table which must be a mapping of Unicode ordinals to Unicode ordinals, Unicode strings or None. Unmapped characters are left untouched. Characters mapped to None are deleted.
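The three kinds of mapping values mentioned in this description can be seen side by side in a small sketch. It assumes Python 3, where every `str` is a Unicode string and `str.translate` follows the same rules; the table contents are arbitrary examples.

```python
# Python 3 sketch (assumption: str.translate behaves like the Python 2
# unicode.translate described above).
table = {
    ord('a'): ord('A'),   # ordinal -> ordinal
    ord('b'): 'BB',       # ordinal -> string (may be longer than one character)
    ord('c'): None,       # ordinal -> None: the character is deleted
}

result = 'abcd'.translate(table)   # 'd' is unmapped and left untouched
print(result)
```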
The following code filters a unicode string:
# -*- coding: utf-8 -*-
class Keeper(object):
    def __init__(self, keep):
        # Store the ordinals of the characters to keep
        # (built-in set replaces the deprecated sets.Set)
        self.keep = set(map(ord, keep))
    def __getitem__(self, n):
        # translate() looks up each character's ordinal here;
        # returning None deletes the character
        if n not in self.keep:
            return None
        return unichr(n)
    def __call__(self, s):
        return unicode(s).translate(self)

makeFilter = Keeper

if __name__ == '__main__':
    # result: 人民
    just_people = makeFilter(u'人民')
    print just_people(u'中華人民共和國成立了')
    # Delete unicode characters
    # result: 中華共和國成立了!
    translate_table = dict((ord(char), None) for char in u'人民')
    print unicode(u'中華人民共和國成立了!').translate(translate_table)
    # Replace unicode characters
    # result: 中華**共和國成立了!
    translate_table = dict((ord(char), u'*') for char in u'人民')
    print unicode(u'中華人民共和國成立了!').translate(translate_table)
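The Keeper idea carries over to Python 3 with only small changes. The following is a sketch under that assumption: `chr` replaces `unichr`, and `str` replaces `unicode`. The object passed to translate only needs to support indexing by ordinal.

```python
class Keeper:
    """Python 3 sketch of the filter above (assumption: str is Unicode)."""
    def __init__(self, keep):
        # Ordinals of the characters that should survive filtering
        self.keep = set(map(ord, keep))

    def __getitem__(self, n):
        # Called by str.translate with each character's ordinal
        if n not in self.keep:
            return None      # None deletes the character
        return chr(n)

    def __call__(self, s):
        return s.translate(self)

just_people = Keeper('人民')
result = just_people('中華人民共和國成立了')
print(result)
```

Because the table is computed on demand in `__getitem__`, there is no need to enumerate every possible code point up front.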
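The translate method of a unicode string takes a single argument: a sequence or mapping, which is indexed by the code point of each character in the string. A character whose code point is not a key of the mapping (or not a valid index into the sequence) is copied unchanged. The value associated with each code point must be a Unicode ordinal or a unicode string (the replacement for that character), or None (meaning the character is to be deleted). Typically a dict or a list is passed to the translate method of a unicode string in order to translate or delete certain characters.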
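Since the examples so far all use mappings, here is a brief sketch of the sequence form. It assumes Python 3 `str.translate`, which accepts any object supporting indexing and treats an out-of-range index as "leave the character unchanged"; the 128-entry list is an arbitrary choice for the example.

```python
# Python 3 sketch (assumption: a plain list used as the translation table).
table = [chr(i) for i in range(128)]   # start from an identity table for ASCII
table[ord('a')] = 'A'                  # replace 'a' with 'A'
table[ord('-')] = None                 # delete '-'

# '人' has an ordinal beyond the end of the list, so it is copied unchanged.
result = 'a-b 人'.translate(table)
print(result)
```

A dict is usually the more economical choice when only a few characters are affected, since a list must be long enough to reach the highest ordinal being mapped.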