【python】re.error: bad character range
阿新 • Published: 2020-12-30
Python raised an error while I was splitting Chinese sentences:
  File "C:\Users\Admin\anaconda3\envs\NLP\lib\re.py", line 215, in split
    return _compile(pattern, flags).split(string, maxsplit)
  File "C:\Users\Admin\anaconda3\envs\NLP\lib\re.py", line 288, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Users\Admin\anaconda3\envs\NLP\lib\sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "C:\Users\Admin\anaconda3\envs\NLP\lib\sre_parse.py", line 924, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "C:\Users\Admin\anaconda3\envs\NLP\lib\sre_parse.py", line 420, in _parse_sub
    not nested and not items))
  File "C:\Users\Admin\anaconda3\envs\NLP\lib\sre_parse.py", line 574, in _parse
    raise source.error(msg, len(this) + 1 + len(that))
re.error: bad character range )- at position 15
The line that triggered the error:
txt_split = re.split(r'[,,.。!!;;::??、()- ]', txt_process.strip())
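According to a post I found, the delimiters in the set passed to re.split must be listed in ascending order of code point. More precisely, the restriction applies only to ranges: inside [...], an unescaped - between two characters defines a range, and the start of the range must not have a higher code point than its end. In my pattern, the ) followed by - and a space is parsed as a range from ) to the space character, which runs from a higher code point to a lower one.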
The order in my original code was:
print([ord(x) for x in ',,.。!!;;::??、()- '])
[65292, 44, 46, 12290, 65281, 33, 65307, 59, 65306, 58, 63, 65311, 12289, 65288, 65289, 45, 32]
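The last three values (65289, 45, 32) are the part the parser rejects: a hyphen sitting between a high code point and a lower one. The following is a minimal sketch that isolates just those three characters, assuming only the standard re module; the variable names are illustrative and not from the original code.

```python
import re

# Fullwidth ")" (U+FF09), then "-", then a space: the tail of the original class.
bad_class = "[\uff09- ]"

try:
    re.compile(bad_class)
    message = None
except re.error as exc:
    # The parser reads "\uff09- " as a range from 65289 down to 32 and refuses it.
    message = str(exc)

print(message)

# With the hyphen escaped, the same three characters are plain literals.
ok = re.compile("[\uff09\\- ]")
print(ok.split("a\uff09b-c d"))
```

Escaping the hyphen is enough to make this fragment compile, which suggests the order of the other delimiters is not what matters.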
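After reordering the delimiters by code point, the error went away. In the sorted class, ,-. is a valid range (44 to 46) that includes the hyphen (45), so - is still treated as a delimiter: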
txt_split = re.split(r'[ !,-.:;?、。!(),:;?]', txt_process.strip())
print([ord(x) for x in ' !,-.:;?、。!(),:;?'])
[32, 33, 44, 45, 46, 58, 59, 63, 12289, 12290, 65281, 65288, 65289, 65292, 65306, 65307, 65311]
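Sorting works here, but it depends on the hyphen happening to land inside a valid range. A possibly more robust alternative is to escape the delimiters so that - is never read as a range operator. The sketch below assumes the same delimiter set as above. The names delims and split_sentences are hypothetical, and dropping empty pieces is an extra choice the original code does not make.

```python
import re

# The delimiters in their original, unsorted order. Fullwidth forms are written
# as escapes so they cannot be confused with their ASCII look-alikes.
delims = "\uff0c,.\u3002\uff01!\uff1b;\uff1a:?\uff1f\u3001\uff08\uff09- "

# re.escape backslash-escapes the hyphen (and any other special character),
# so every character inside the class is a literal.
pattern = re.compile("[" + re.escape(delims) + "]")


def split_sentences(text):
    """Split on any delimiter and drop the empty pieces between adjacent delimiters."""
    return [part for part in pattern.split(text.strip()) if part]


# Sample input containing a fullwidth comma, a fullwidth "!", a hyphen and "。".
print(split_sentences("你好\uff0c世界\uff01這是-測試\u3002"))
```

Placing the hyphen first or last in a hand-written class, or writing it as \-, has the same effect without calling re.escape.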