Python傳送Http請求時,提交中文或者符號中文編碼問題的解決方法
阿新 • • 發佈:2018-11-06
前言
博主最近在用python3比較強大的Django開發web的時候,發現一些url的編碼問題,在瀏覽器提交請求api時,如果url中包含漢子,就會被自動編碼掉。呈現的結果是 ==> %xx%xx%xx。如果出現3個百分號為一個原字元則為utf8編碼,如果2個百分號則為gb2312編碼。下面為大家演示編碼和解碼的程式碼。
編碼 from urllib.parse import quote text = quote(text, 'utf-8') 注:text為要進行編碼的字串 解碼 from urllib.parse import unquote text = unquote(text, 'utf-8') 原始碼 def unquote(string, encoding='utf-8', errors='replace'): """Replace %xx escapes by their single-character equivalent. The optional encoding and errors parameters specify how to decode percent-encoded sequences into Unicode characters, as accepted by the bytes.decode() method. By default, percent-encoded sequences are decoded with UTF-8, and invalid sequences are replaced by a placeholder character. unquote('abc%20def') -> 'abc def'. """ if '%' not in string: string.split return string if encoding is None: encoding = 'utf-8' if errors is None: errors = 'replace' bits = _asciire.split(string) res = [bits[0]] append = res.append for i in range(1, len(bits), 2): append(unquote_to_bytes(bits[i]).decode(encoding, errors)) append(bits[i + 1]) return ''.join(res) def quote(string, safe='/', encoding=None, errors=None): """quote('abc def') -> 'abc%20def' Each part of a URL, e.g. the path info, the query, etc., has a different set of reserved characters that must be quoted. RFC 2396 Uniform Resource Identifiers (URI): Generic Syntax lists the following reserved characters. reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | "," Each of these characters is reserved in some component of a URL, but not necessarily in all of them. By default, the quote function is intended for quoting the path section of a URL. Thus, it will not encode '/'. This character is reserved, but in typical usage the quote function is being called on a path where the existing slash characters are used as reserved characters. string and safe may be either str or bytes objects. encoding and errors must not be specified if string is a bytes object. The optional encoding and errors parameters specify how to deal with non-ASCII characters, as accepted by the str.encode method. By default, encoding='utf-8' (characters are encoded with UTF-8), and errors='strict' (unsupported characters raise a UnicodeEncodeError). """ if isinstance(string, str): if not string: return string if encoding is None: encoding = 'utf-8' if errors is None: errors = 'strict' string = string.encode(encoding, errors) else: if encoding is not None: raise TypeError("quote() doesn't support 'encoding' for bytes") if errors is not None: raise TypeError("quote() doesn't support 'errors' for bytes") return quote_from_bytes(string, safe)