Python Built-in Data Structures: Strings

阿新 • Published: 2022-05-03

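Today we look at Python's string data structure.

Recap

Let's first review mutable and immutable types:

Immutable data types: str, int, tuple
Mutable data types: dict, list

Strings, today's topic, are immutable.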
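
A minimal sketch of what immutability means in practice. The variable names are illustrative only: assigning to a character position fails, and "changing" a string really means building a new one.

```python
s = 'python'

# Item assignment on a str raises TypeError because strings are immutable.
try:
    s[0] = 'P'
    mutated = True
except TypeError:
    mutated = False

# Building a new string leaves the original object untouched.
t = 'P' + s[1:]

print(mutated)  # False
print(s, t)     # python Python
```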

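Python String Encoding

In Python 3, a string (str) is a sequence of Unicode code points, so it supports text in any language out of the box. In Python 2, str is a sequence of bytes.

For example: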

In[1]: print('含有中文的字串str')
含有中文的字串str

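For a single character, the built-in ord() function returns the character's integer code point, and the built-in chr() function converts a code point back to the corresponding character. For example: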

In[2]: ord('A')
Out[2]: 65

In[3]: ord('中')
Out[3]: 20013

In[4]: chr(97)
Out[4]: 'a'

In[5]: chr(25991)
Out[5]: '文'

If you know a character's integer code point, you can also write the character as a hexadecimal \u escape:

In[6]: '\u4e2d\u6587'
Out[6]: '中文'
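
The relationship between ord(), chr(), and \u escapes can be checked with a short round trip. This is only a sketch; the variable names are made up for illustration.

```python
text = '中文'

# ord() maps each character to its integer code point ...
code_points = [ord(ch) for ch in text]

# ... and chr() maps the integers back to characters.
restored = ''.join(chr(cp) for cp in code_points)

# The same code points in hexadecimal, the form used in \u escapes.
hex_points = [format(cp, '04x') for cp in code_points]

print(code_points)  # [20013, 25991]
print(hex_points)   # ['4e2d', '6587']
print(restored == '\u4e2d\u6587')  # True
```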

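That covers character encoding. Next come the commonly used string methods and how they are used day to day.

Common String Methods

The common string methods are:

Joining: join
Splitting: split, rsplit, splitlines, partition, rpartition
Changing case: capitalize, title, lower, upper, swapcase
Padding or stripping: center, ljust, rjust, zfill, strip, rstrip, lstrip
Searching and replacing: count, find, rfind, index, rindex, replace

Joining Strings (join)

join concatenates strings using a separator of your choice. An example: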

In[7]: lst = ['i', 'am', 'lavenliu']

In[8]: ' '.join(lst)
Out[8]: 'i am lavenliu'

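join is a string method. Its argument is an iterable whose items are strings, and the receiver (a space here) is used as the separator. Joining with a comma: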

In[9]: ','.join(lst)
Out[9]: 'i,am,lavenliu'

Besides join, two strings can also be concatenated with +:

In[10]: 'my' + ' name'
Out[10]: 'my name'
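
One detail worth a quick check: join only accepts strings. The sketch below (with an illustrative list called nums) shows the TypeError and the usual workaround of converting each element first.

```python
nums = [1, 2, 3]

# join() raises TypeError if any element is not a str.
try:
    ','.join(nums)
    join_failed = False
except TypeError:
    join_failed = True

# Convert the elements to str first, then join.
joined = ','.join(str(n) for n in nums)

print(join_failed)  # True
print(joined)       # 1,2,3
```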

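Splitting Strings (the split Family)

# split
>>> s = 'my name is lavenliu'
>>> s
'my name is lavenliu'
>>> s.split()                # no separator: split on runs of whitespace
['my', 'name', 'is', 'lavenliu']
>>> s.split(maxsplit=1)      # maxsplit limits the number of splits; the default -1 means no limit
['my', 'name is lavenliu']
>>> s.split('ls')            # separator not found: the whole string comes back in a list
['my name is lavenliu']
>>> s.split('is')            # use 'is' as the separator
['my name ', ' lavenliu']
>>> s.split(' ', 1)          # split on a space once, from the left
['my', 'name is lavenliu']
>>> s.split(' ', 2)          # split on a space twice, from the left
['my', 'name', 'is lavenliu']
>>> s.split(' ', -1)         # -1 is the default: split at every separator
['my', 'name', 'is', 'lavenliu']

# rsplit
>>> s.rsplit()
['my', 'name', 'is', 'lavenliu']
>>> s.rsplit(' ')
['my', 'name', 'is', 'lavenliu']
>>> s.rsplit(' ', 1)         # counts splits from the right, the mirror image of s.split(' ', 1)
['my name is', 'lavenliu']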
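
The difference between calling split with no separator and with an explicit space is easy to miss. A small sketch, using a sample string with repeated spaces chosen for illustration:

```python
s = 'my  name   is lavenliu'   # note the repeated spaces

# No separator: a run of whitespace counts as one delimiter,
# and leading/trailing whitespace is ignored.
by_whitespace = s.split()

# Explicit ' ': every single space is a delimiter,
# so consecutive spaces produce empty strings.
by_space = s.split(' ')

print(by_whitespace)  # ['my', 'name', 'is', 'lavenliu']
print(by_space)       # ['my', '', 'name', '', '', 'is', 'lavenliu']
```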

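Here is a simplified sketch of how split could be written (it only handles a single-character separator):

def split(s, sep, maxsplit=-1):
    ret = []
    tmp = []      # characters of the piece currently being collected
    count = 0     # number of splits performed so far
    for c in s:
        # Split at sep until the maxsplit limit is reached (negative means no limit).
        if c == sep and (maxsplit < 0 or count < maxsplit):
            ret.append(''.join(tmp))
            tmp.clear()
            count += 1
        else:
            tmp.append(c)
    # Whatever is left over forms the last piece.
    ret.append(''.join(tmp))
    return ret

A matching sketch of rsplit, which scans the string from the right:

def rsplit(s, sep, maxsplit=-1):
    ret = []
    tmp = []
    count = 0
    for c in reversed(s):
        if c == sep and (maxsplit < 0 or count < maxsplit):
            # tmp was collected right to left, so reverse it back.
            ret.append(''.join(reversed(tmp)))
            tmp.clear()
            count += 1
        else:
            tmp.append(c)
    ret.append(''.join(reversed(tmp)))
    # The pieces were found right to left; restore left-to-right order.
    return list(reversed(ret))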

The splitlines method:

In[12]: s = '''i am lavenliu
    ...: i love python'''

In[13]: print(s.splitlines())     # split by line; line breaks are dropped
['i am lavenliu', 'i love python']

In[14]: print(s.splitlines(True)) # split by line; line breaks are kept
['i am lavenliu\n', 'i love python']

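The partition method:

In[15]: s = 'i am lavenliu'

In[16]: s.partition(' ')
Out[16]: ('i', ' ', 'am lavenliu')

partition always returns a 3-tuple. It splits once, at the first occurrence of the separator, into a head and a tail, and returns (head, sep, tail). rpartition is the right-to-left version of partition. Another partition example: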

In[17]: cfg = 'mysql.connect = mysql://user:[email protected]:3306/test'

In[18]: print(cfg.partition('='))
('mysql.connect ', '=', ' mysql://user:[email protected]:3306/test')

In[19]: cfg = 'env = PATH=/usr/bin:$PATH'

In[20]: print(cfg.partition('='))
('env ', '=', ' PATH=/usr/bin:$PATH')

In[21]: print(''.partition('='))
('', '', '')

In[22]: print('='.partition('='))
('', '=', '')

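A sketch of how partition could be implemented:

def partition(s, sep):
    # Split at most once, at the first occurrence of sep.
    tmp = s.split(sep, maxsplit=1)
    if len(tmp) == 2:
        return tmp[0], sep, tmp[1]
    # sep was not found (this also covers the empty string):
    # like str.partition, return the string followed by two empty strings.
    return s, '', ''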
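
Because partition splits only at the first separator, it suits "key = value" lines whose value may itself contain '='. The helper below is a hypothetical sketch (parse_line is not part of any library) built on the built-in str.partition.

```python
def parse_line(line):
    """Split 'key = value' at the first '='; return None if there is no '='."""
    key, sep, value = line.partition('=')
    if not sep:  # str.partition returns (line, '', '') when '=' is missing
        return None
    return key.strip(), value.strip()


print(parse_line('env = PATH=/usr/bin:$PATH'))  # ('env', 'PATH=/usr/bin:$PATH')
print(parse_line('no separator here'))          # None
```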

Changing case:

In[23]: s = 'my name is laven'

In[24]: print(s.capitalize())
My name is laven

In[25]: print(s.title())
My Name Is Laven

In[26]: print(s.lower())
my name is laven

In[27]: print(s.upper())
MY NAME IS LAVEN

In[28]: print(s.upper().lower())
my name is laven

In[29]: print('Hello World'.casefold()) # a more aggressive lower(), intended for comparisons that ignore case
hello world

In[30]: print('Hello World'.swapcase())
hELLO wORLD

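In[31]: print('\t'.expandtabs(4))
     # the tab is expanded to four spaces (printed before this comment)

Case conversion is mostly used for comparisons: to ignore case, convert both sides to all upper case or all lower case before comparing.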
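
A short sketch of a case-insensitive comparison. The helper name same_text is made up for this example; it uses casefold(), which handles a few characters that lower() leaves alone, such as the German 'ß'.

```python
def same_text(a, b):
    """Compare two strings while ignoring case, using casefold()."""
    return a.casefold() == b.casefold()


print(same_text('Python', 'PYTHON'))   # True
print('Straße'.lower() == 'strasse')   # False: lower() keeps 'ß'
print(same_text('Straße', 'STRASSE'))  # True: casefold() maps 'ß' to 'ss'
```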

Padding:

>>> s = 'my name is laven'
>>> help(s.center) # pads with spaces by default; if the width is less than or equal to the string's length, the string is returned unchanged
>>> s.center(80)
'                                my name is laven                                '

>>> s.center(80, '#')
'################################my name is laven################################'
>>>

# ljust
>>> s.ljust(80)
'my name is laven                                                                '
>>>
>>> s.ljust(80, '*') # left-justify the string and pad on the right
'my name is laven****************************************************************'
>>>

# rjust
>>> s.rjust(80)
'                                                                my name is laven'

>>> s.rjust(80, '*')
'****************************************************************my name is laven'
# ljust and rjust control where the string is placed within the given width

# zfill
>>> s.zfill(80)
'0000000000000000000000000000000000000000000000000000000000000000my name is laven'
>>>

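Next, three very important string methods: strip, lstrip, and rstrip:

>>> s = '   haha hehe   \n \t'
>>> s.strip() # strip removes whitespace (or the given characters) from both ends only
'haha hehe'

# lstrip removes leading whitespace by default
>>> s.lstrip()
'haha hehe   \n \t'

# rstrip removes trailing whitespace by default
>>> s.rstrip()
'   haha hehe'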
# removing specific characters
>>> s = '##test##'
>>> s.strip('#')
'test'
>>> s.strip('*')
'##test##'

>>> s = '## test ##'
>>> s
'## test ##'
>>> s.strip('#')
' test '
>>> s.strip('#').strip()
'test'

As shown, strip can remove characters that you specify. Give it a try.
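
A point that often surprises people: the argument to strip is a set of characters, not a prefix or suffix string. A small sketch with a sample hostname chosen for illustration:

```python
url = 'www.example.com'

# Every leading/trailing character found in 'w.moc' is removed,
# regardless of the order in which the characters appear.
stripped = url.strip('w.moc')

# To remove one exact prefix, check with startswith() and slice.
prefix = 'www.'
without_prefix = url[len(prefix):] if url.startswith(prefix) else url

print(stripped)        # example
print(without_prefix)  # example.com
```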

The startswith and endswith methods test whether a string starts or ends with a given prefix or suffix; the result is a bool.

>>> s = '**test##'
>>> s.startswith('*')
True
>>> s.endswith('#')
True
>>> s.endswith('test')
False
>>> s.endswith('test', 0, 5) # start and end limit the check to s[start:end]; end is exclusive
False
>>> s.endswith('test', 0, 6)
True
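
Both methods also accept a tuple, meaning "any of these". A brief sketch with made-up file names:

```python
files = ['report.txt', 'photo.PNG', 'notes.md', 'logo.jpg']

# A tuple argument matches if the string ends with any of the suffixes.
# lower() makes the check ignore the case of the extension.
images = [f for f in files if f.lower().endswith(('.png', '.jpg'))]

print(images)  # ['photo.PNG', 'logo.jpg']
```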

Searching and replacing:

count
find
rfind
index
replace

# count
>>> s = '**test##'
>>> s.count('*')
2
>>> s.count('#')
2

# find searches from left to right
>>> s.find('t')
2
>>> s.find('test')
2
>>> s.rfind('test') # rfind is the right-to-left version of find
2
>>> s.rfind('t')
5

>>> s = 'i very very love python'
>>> s.find('very')
2
>>> s.find('very', 3)     # start: where the search begins
7
>>> s.find('very', 3, 10) # end: where the search stops (exclusive); no full match fits in s[3:10]
-1
>>> s.rfind('very')       # searches from the right, but the index still counts from the left
7
# index raises an error when the character or substring does not exist; find returns -1 instead
>>> s = '**test##'
>>> s.index('t')
2
>>> s.index('test')
2
>>> s.rfind('t')
5
>>> s.find('a')   # find returns -1 for a missing substring
-1
>>> s.index('a')  # index raises ValueError for a missing substring
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found

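# replace takes an optional count that limits the number of replacements
>>> s = 'abc123abc123'
>>> s.replace('abc', 'xyz')     # replaces every occurrence by default (count omitted)
'xyz123xyz123'
>>> s.replace('abc', 'xyz', 1)  # replace only the first occurrence
'xyz123abc123'
>>> s.replace('xxxx', '')       # nothing to replace: the string comes back unchanged
'abc123abc123'
>>> s.replace('abc', 'xyz', -1) # -1 means replace all
'xyz123xyz123'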

If find is still unclear, the built-in enumerate function shows the index of every character:

In[32]: s = 'i very very love python'

In[33]: print(list(enumerate(s)))
[(0, 'i'), (1, ' '), (2, 'v'), (3, 'e'), (4, 'r'), (5, 'y'), (6, ' '), (7, 'v'), (8, 'e'), (9, 'r'), (10, 'y'), (11, ' '), (12, 'l'), (13, 'o'), (14, 'v'), (15, 'e'), (16, ' '), (17, 'p'), (18, 'y'), (19, 't'), (20, 'h'), (21, 'o'), (22, 'n')]
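
Building on find and its start argument, every occurrence of a substring can be collected in a loop. find_all is a hypothetical helper written for this example, not a built-in method.

```python
def find_all(s, sub):
    """Return the start index of every non-overlapping occurrence of sub in s."""
    if not sub:
        raise ValueError('sub must not be empty')
    positions = []
    start = s.find(sub)
    while start != -1:
        positions.append(start)
        # Continue searching just past the match that was found.
        start = s.find(sub, start + len(sub))
    return positions


print(find_all('i very very love python', 'very'))  # [2, 7]
print(find_all('i very very love python', 'xyz'))   # []
```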

The string predicate methods are the ones whose names start with is; they are not used very often.
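
For reference, a few of the is* methods in action. Each one returns a bool; the sample strings are arbitrary.

```python
print('123'.isdigit())     # True
print('12.3'.isdigit())    # False: '.' is not a digit
print('abc'.isalpha())     # True
print('abc123'.isalnum())  # True
print('   '.isspace())     # True
print('hello'.islower())   # True
print('Hello'.istitle())   # True
```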

A handy little trick:

In[34]: line = 'url://http://lavenliu.cn'

In[35]: line.split(':', 1)
Out[35]: ['url', '//http://lavenliu.cn']

In[36]: key, value = line.split(':', 1)

In[37]: key
Out[37]: 'url'

In[38]: value
Out[38]: '//http://lavenliu.cn'

The splitlines method:

In[39]: text = '''I am laven
    ...: I am a man
    ...: I like Emacs'''

In[40]: text.splitlines()
Out[40]: ['I am laven', 'I am a man', 'I like Emacs']

In[41]: text.splitlines(True) # keep the line breaks
Out[41]: ['I am laven\n', 'I am a man\n', 'I like Emacs']

The partition method:

In[42]: s = 'my name is laven'

In[43]: s.partition(' ') # similar to s.split(' ', 1)
Out[43]: ('my', ' ', 'name is laven')

In[44]: # the key/value split of line = 'url://http://lavenliu.cn' above can also be written like this

In[45]: line.partition(':')
Out[45]: ('url', ':', '//http://lavenliu.cn')

In[46]: key, _, value = line.partition(':')

In[47]: key
Out[47]: 'url'

In[48]: value
Out[48]: '//http://lavenliu.cn'

In[49]: s.rpartition(' ')
Out[49]: ('my name is', ' ', 'laven')

Unpacking a string:

In[50]: s = 'my name is lavenliu'

In[51]: a, b, *mid, tail = s

In[52]: a
Out[52]: 'm'

In[53]: b
Out[53]: 'y'

In[54]: mid
Out[54]: 
[' ',
 'n',
 'a',
 'm',
 'e',
 ' ',
 'i',
 's',
 ' ',
 'l',
 'a',
 'v',
 'e',
 'n',
 'l',
 'i']

In[55]: tail
Out[55]: 'u'

Iterating over Strings

Strings are iterable objects too:

In[56]: s = 'hello world'

In[57]: for i in s:
    ...:     print(i)
    ...:     
h
e
l
l
o
 
w
o
r
l
d

In[58]:

Slicing and Indexing

In[1]: s = "use python do something"

In[2]: s[1], s[-1], s[1:6:2], s[1:], s[:-1], s[:]
Out[2]: 
('s',
 'g',
 's y',
 'se python do something',
 'use python do somethin',
 'use python do something')

In[3]: s[1:] # from index 1 to the end
Out[3]: 'se python do something'

In[4]: # reversed
In[5]: s[::-1]
Out[5]: 'gnihtemos od nohtyp esu'

In[7]: s[4]
Out[7]: 'p'
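
The s[::-1] reversal shown above makes a palindrome check very short. is_palindrome is an illustrative helper; it assumes that case and spaces should be ignored.

```python
def is_palindrome(text):
    """True if text reads the same in both directions, ignoring case and spaces."""
    cleaned = text.replace(' ', '').lower()
    return cleaned == cleaned[::-1]


print(is_palindrome('Never odd or even'))  # True
print(is_palindrome('python'))             # False
```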

String Formatting

String formatting is one way of assembling strings. For example:

In[8]: print(' '.join(['i', 'love', 'python']))
i love python

In[9]: print('i' + ' love ' + 'python')
i love python

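join and + give little control over the output format. The next two sections cover two ways of formatting strings: printf-style formatting and the format method.

printf-Style Formatting

First, the printf-style conversion types and their meanings; they are used in the examples that follow.

Conversion	Meaning
i	signed integer
d	signed integer
o	signed octal
x	hexadecimal (lowercase)
X	hexadecimal (uppercase)
e	scientific notation (lowercase)
E	scientific notation (uppercase)
f	floating-point number
F	floating-point number

The syntax is: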

template % tuple
>>> 'I am %s' % ('lavenliu',) # with a single value the tuple is optional: 'I am %s' % 'lavenliu' also works
'I am lavenliu'

template % dict
>>> 'I am %(name)s' % {'name': 'lavenliu'}
'I am lavenliu'
>>>
>>> 'I am %(name)s, my name is %(name)s' % {'name': 'lavenliu'}
'I am lavenliu, my name is lavenliu'
# When to use the dict form:
## 1. a value appears more than once
## 2. there are many values to format

A simple example:

>>> a = "this is %s %s" % ("my", 'apple')
>>> print(a)
this is my apple

The conversion types in action:

>>> '%i' % 18
'18'
>>> '%d' % 18
'18'
>>> '%ld' % 18 # the length modifier l is accepted for compatibility with C and is ignored
'18'
>>> '%o' % 18
'22'
>>> '%X' % 12
'C'
>>> '%x' % 12
'c'
>>> '%e' % 0.00000345
'3.450000e-06'
>>> '%E' % 0.00000345
'3.450000E-06'
>>> '%e' % 12
'1.200000e+01'
>>> '%f' % 0.00000345 # six decimal places by default: padded with zeros if shorter, rounded if longer
'0.000003'
>>> '%0.3f' % 0.00123 # show only three decimal places
'0.001'
>>> '%F' % 0.00000345
'0.000003'

# the difference between %r and %s
class A:
    def __str__(self):
        return 'I am A.__str__'

    def __repr__(self):
        return 'I am A.__repr__'

>>> a = A()
>>> '%s' % 123
'123'
>>> '%s' % a
'I am A.__str__'
>>> '%r' % a
'I am A.__repr__'
# str is meant for people to read; repr is meant for developers and debugging

# %a applies ascii(): like repr, but non-ASCII characters are escaped
In[10]: '%a' % '\n'
Out[10]: "'\\n'"

In[11]: '%a' % '大川淘气'
Out[11]: "'\\u5927\\u5ddd\\u6dd8\\u6c14'"

A TypeError is raised when a value does not match its conversion type. With %s, str() is called on the value implicitly.
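
To see the TypeError without crashing a script, the formatting can be wrapped in try/except. format_or_error is a hypothetical helper written only for this illustration.

```python
def format_or_error(template, values):
    """Apply printf-style formatting; describe the TypeError instead of raising it."""
    try:
        return template % values
    except TypeError as exc:
        return 'TypeError: {}'.format(exc)


print(format_or_error('%s', 123))           # 123 (str() is applied implicitly)
print(format_or_error('%d', 'abc'))         # a TypeError message: %d needs a number
print(format_or_error('%s %s', ('one',)))   # a TypeError message: too few values
```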

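format-Style String Formatting

The format syntax uses braces as placeholders. When format is called, its arguments replace the braces. format accepts a variable number of arguments.

'I am {}'.format('lavenliu')
'I am {}, my age is {}'.format('lavenliu', 23) # filled in order
# What if the arguments are not in order?
# Put a number in the placeholder to select the position of the format argument
'I am {1}, my age is {0}'.format(18, 'lavenliu')

# Put an identifier in the placeholder to use a keyword argument
'I am {name}, my age is {age}'.format(name='lavenliu', age=18)
# A placeholder may appear more than once
'I am {name}, my name is {name}'.format(name='lavenliu')
'I am {0}, my name is {0}'.format('lavenliu')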

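# A few interesting usages
## Numbered fields ({0}) can be mixed with keyword fields
## Automatic fields ({}) can be mixed with keyword fields
## Positional arguments must come before keyword arguments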
>>> '{1} {0} {name}'.format(1, 2, name='abc')
'2 1 abc'
>>> '{} {} {name}'.format(1, 2, name='abc')
'1 2 abc'

>>> '{0} {name} {1}'.format(1, 2, name='abc')
'1 abc 2'

>>> '{name} {} {}'.format(1, 2, name='abc')
'abc 1 2'

>>> '{} {name} {}'.format(1, 2, name='abc')
'1 abc 2'

# This raises an error
## Do not mix automatic ({}) and numbered ({1}) fields in one template
>>> '{} {1} {name}'.format(1, 2, name='abc')
ValueError: cannot switch from automatic field numbering to manual field specification

# Positional arguments must come before keyword arguments
>>> '{} {name} {}'.format(1, name='abc', 2)
  File "<stdin>", line 1
SyntaxError: positional argument follows keyword argument

A few small format examples:

In[12]: b = "this is {} {}" .format("my", "apple")

In[13]: print(b)
this is my apple

In[14]: ## a position number can go inside {}

In[15]: b = "this is {1} {0}" .format("apple", "my")

In[16]: print(b)
this is my apple

In[17]: ## a more advanced form: use names instead of numbers, so there is no position to miscount

In[18]: b = "this is {whose} {fruit}" .format(fruit="apple", whose="my")

In[19]: print(b)
this is my apple

Object attributes can be used in the template as well, which is very handy:

class A:
    def __init__(self):
        self.x = 1
        self.y = 2


>>> a = A()
>>> a.x
1
>>> a.y
2

>>> '{0.x} {0.y}'.format(a)
'1 2'

>>> '{instance.x}'.format(instance=a)
'1'

List items can be used too:

>>> lst = [1, 2, 3]
>>> '{0[0]}'.format(lst)
'1'
>>> '{lst[0]}'.format(lst=lst)
'1'

A few more examples, mainly about alignment when formatting:

# < left-align
>>> '{0:<80}'.format('lavenliu')
'lavenliu                                                                          '

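# > right-align
>>> '{0:>80}'.format('lavenliu')

# ^ center
'{0:^80}'.format('lavenliu')

# by default, strings are left-aligned
'{0:80}'.format('lavenliu')

# by default, numbers are right-aligned
'{0:80}'.format(10)

# numbers
'{0:d}'.format(10)   # '10' (decimal integer)
'{:n}'.format(1000)  # like d, but uses the current locale's number separators
'{:b}'.format(10)    # '1010' (binary)

# fields can be nested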
>>> '{0:^{width}}'.format('lavenliu', width=80)
'                                     lavenliu                                     '

>>> '{0:#^{width}}'.format('lavenliu', width=80)
'#####################################lavenliu#####################################'

>>> '{0:{fill}^{width}}'.format('lavenliu', width=80, fill='*')
'*************************************lavenliu*************************************'
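
Nested width fields are handy for lining up columns when the width is only known at run time. A minimal sketch with made-up data (rows, name_width, and the column layout are all illustrative):

```python
rows = [('apple', 3), ('banana', 12), ('kiwi', 150)]

# Width of the first column: the longest name.
name_width = max(len(name) for name, _ in rows)

# {0:<{w}} left-aligns the name in w columns; {1:>5} right-aligns the count in 5.
lines = ['{0:<{w}} | {1:>5}'.format(name, count, w=name_width)
         for name, count in rows]

for line in lines:
    print(line)
```

Every output line has the same length, so the separator and the numbers line up in columns.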

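printf-style formatting feels natural to people coming from other languages, C in particular, but it is not the recommended approach in Python. Prefer the built-in format method for formatting strings.

In addition, a dict can also be used to format a string: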

In[20]: a = "this is %(whose)s %(fruit)s" % {'whose': 'my', 'fruit': 'apple'}

In[21]: a
Out[21]: 'this is my apple'

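Summary

Strings are an immutable data type;
Strings support indexing, slicing, iteration, and similar operations;
Strings come with many built-in methods;
In Python 3, strings are Unicode by default.

Formatting Summary

If the placeholders and the arguments do not match, an exception is raised
{} fills in the positional arguments in order
{i} (a number) treats the positional arguments as a list args and uses args[i]; if i is not a valid index of args, IndexError is raised
{k} (a keyword) treats the keyword arguments as a dict kwargs and uses kwargs[k]; if k is not a key of kwargs, KeyError is raised
To print literal braces, use {{}}; to print something like {18}, use {{{}}} with 18 as the argument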
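
The rules above can be verified with a few lines. error_name is a hypothetical helper that reports which exception, if any, a format call raises.

```python
def error_name(template, *args, **kwargs):
    """Return the formatted string, or the name of the exception that format raised."""
    try:
        return template.format(*args, **kwargs)
    except (IndexError, KeyError) as exc:
        return type(exc).__name__


print('{{}}'.format())             # {}
print('{{{}}}'.format(18))         # {18}
print(error_name('{1}', 'only'))   # IndexError: there is no args[1]
print(error_name('{name}'))        # KeyError: there is no kwargs['name']
```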
