[Python] re.error: bad character range

Tags: python, regular expressions

Splitting Chinese sentences with re.split in Python raised the following error:

  File "C:\Users\Admin\anaconda3\envs\NLP\lib\re.py", line 215, in split
    return _compile(pattern, flags).split(string, maxsplit)
  File "C:\Users\Admin\anaconda3\envs\NLP\lib\re.py", line 288, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Users\Admin\anaconda3\envs\NLP\lib\sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "C:\Users\Admin\anaconda3\envs\NLP\lib\sre_parse.py", line 924, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "C:\Users\Admin\anaconda3\envs\NLP\lib\sre_parse.py", line 420, in _parse_sub
    not nested and not items))
  File "C:\Users\Admin\anaconda3\envs\NLP\lib\sre_parse.py", line 574, in _parse
    raise source.error(msg, len(this) + 1 + len(that))
re.error: bad character range )-  at position 15

The line that triggered it:

txt_split = re.split(r'[,,.。!!;;::??、()- ]', txt_process.strip())

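I followed this fellow's post, which says that when splitting a string with re, the set of delimiters must be arranged in ascending order of their ASCII values. That is not quite the actual rule. Inside [...], an unescaped - between two characters defines a range, and the start of the range must not have a higher code point than its end. My pattern ends in )- followed by a space, which is parsed as a range from the full-width ) (65289) to the space (32), hence the error. Characters that are not part of a range may appear in any order.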

The code points in my original pattern were, in order:

print([ord(x) for x in ',,.。!!;;::??、()- '])
[65292, 44, 46, 12290, 65281, 33, 65307, 59, 65306, 58, 63, 65311, 12289, 65288, 65289, 45, 32]
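These code points suggest that the hyphen, not the overall ordering, is what the parser objects to: the last three entries (65289, 45, 32) are read as a range running from 65289 down to 32. The sketch below tests that reading on small patterns. The helper name compiles is made up for illustration, and the full-width characters are written as \u escapes so they cannot be confused with their ASCII look-alikes.

```python
import re

def compiles(pattern):
    """Return True if the pattern compiles, False if re raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# '\uff09' is the full-width right parenthesis (65289); ' ' is 32.
# Start > end, so this is a reversed range and should be rejected.
reversed_range = compiles('[\uff09- ]')

# Same two endpoints in ascending order: a valid (if very wide) range.
ascending_range = compiles('[ -\uff09]')

# No hyphen at all: the characters are deliberately unsorted
# (12290, 65292, 33, 63), yet there is no range to validate.
unsorted_no_hyphen = compiles('[\u3002\uff0c!?]')

print(reversed_range, ascending_range, unsorted_no_hyphen)
```

If the reading is right, only the first pattern fails to compile, which means sorting the whole set is sufficient to avoid the error but not necessary.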

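After reordering the delimiters, the error went away. Note that in the new pattern ,-. is still a range (44 to 46); it is valid this time and happens to cover the hyphen (45), so the hyphen is still matched: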

txt_split = re.split(r'[ !,-.:;?、。!(),:;?]', txt_process.strip())
print([ord(x) for x in ' !,-.:;?、。!(),:;?'])
[32, 33, 44, 45, 46, 58, 59, 63, 12289, 12290, 65281, 65288, 65289, 65292, 65306, 65307, 65311]
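Sorting works, but it relies on the hyphen landing between two characters that form a valid range. A less fragile option, sketched below, is to keep the delimiters in their original order and either escape the hyphen or move it to the end of the class, where it is taken literally. The sample sentence is a made-up stand-in, since the original txt_process is not shown.

```python
import re

# Hypothetical input; the real txt_process from the post is not available.
sample = '你好,世界!今天-天氣 不錯。'

# Option 1: escape the hyphen so it is always a literal character.
escaped = r'[,,.。!!;;::??、()\- ]'

# Option 2: put the hyphen last, right before the closing bracket.
hyphen_last = r'[,,.。!!;;::??、() -]'

parts_escaped = re.split(escaped, sample.strip())
parts_last = re.split(hyphen_last, sample.strip())

print(parts_escaped)
print(parts_last)
```

Both patterns split on the same set of delimiters, so the two lists should be identical. Because the sample ends with a delimiter, the last element is an empty string, which can be filtered out if it is not wanted.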