pandas Study Notes 15 - str functions (comprehensive, tcy)

阿新 • Published: 2018-12-15

str functions 2018/12/5

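This post summarizes 52 str functions, all of which I have tested myself. The simple, self-explanatory ones have no example; you can try them yourself from the summary table. The complex, harder-to-understand ones each come with an example, 12 examples in total. The notes in the table are written from my test results rather than translated literally from the documentation, because a literal translation is sometimes hard to understand.

1. Function table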

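No	String method	Like Python's str methods but skips NA; works on Series / Index
0	Usage: s.str.cat()	Returns a Series, an Index, or another type depending on the method
1	capitalize()	Capitalize the first character and lowercase the rest
2	cat([others, sep, na_rep, join])	Concatenate strings with a separator; returns a single str (no others) or an object shaped like the caller
3	center(width[, fillchar])	Pad both the left and right sides of each string with fillchar
4	contains('is')	Test whether each string contains the substring
5	count(pat[, flags])	Count occurrences of the pattern in each string
6	decode(encoding[, errors])	Decode bytes to str
7	encode(encoding[, errors])	Encode str to bytes
8	endswith(pat[, na])	Test whether each string ends with the given substring; returns True/False
9	extract(pat[, flags, expand])	Extract the first match of each capture group in regex pat; with a one-character group such as ([d-z]) the result is one character
10	extractall(pat[, flags])	Extract every match of the capture groups in regex pat; returns one row per match
11	find(sub[, start, end])	Lowest index of the substring within [start:end]; -1 if not found
12	findall(pat[, flags])	Find all matches of the regex; returns them as a list per element
13	get(i)	Extract the element at position i
14	get_dummies([sep])	Split each string on sep and return a DataFrame of dummy/indicator variables
15	index(sub[, start, end])	Lowest index of the substring within [start:end];
	-	raises an exception if any element of the Series lacks the substring
16	isalnum()	Check whether all characters are alphanumeric
17	isalpha()	Check whether all characters are alphabetic
18	isdecimal()	Check whether all characters are decimal digits
19	isdigit()	Check whether all characters are digits (also superscript digits such as ²)
20	islower()	Check whether all cased characters are lowercase
21	isnumeric()	Check whether all characters are numeric (also fractions such as ½)
22	isspace()	Check whether all characters are whitespace
23	istitle()	Check whether the string is titlecased
24	isupper()	Check whether all cased characters are uppercase
25	join(sep)	Join the elements of each list (or the characters of each string) with sep; a list containing any non-str element yields NaN
26	len()	Compute the length of each string
27	ljust(width[, fillchar])	Pad the right side of each string with fillchar
28	lower()	Convert to lowercase
29	lstrip([to_strip])	Strip whitespace (including newlines) or the given characters from the left
30	match(pat[, case, flags, na, …])	Test whether each string matches the regex, starting at the beginning of the string
31	normalize(form)	Return the Unicode normal form of each string
32	pad(width[, side, fillchar])	Pad each string to width on the chosen side (left, right or both)
33	partition([pat, expand])	Split at the first separator into 3 parts: before, separator, after
34	repeat(repeats)	Repeat each element the given number of times
35	replace(a,b)	Replace occurrences of a with b
36	rfind(sub[, start, end])	Highest index of the substring within [start:end]; -1 if not found
37	rindex(sub[, start, end])	Highest index of the substring in each element within [start:end];
	-	raises an exception if any element lacks the substring
38	rjust(width[, fillchar])	Pad the left side of each string with fillchar
39	rpartition([pat, expand])	Split at the last separator into 3 parts, separator included
40	rsplit([pat, n, expand])	Split each string on the separator, starting from the right
41	rstrip([to_strip])	Strip whitespace (including newlines) or the given characters from the right
42	slice([start, stop, step])	Take a slice of each string
43	slice_replace([start, stop, repl])	Replace a positional slice of each string with another value; fairly involved, see Example 12
44	split([pat, n, expand])	Split each string on a separator or substring
45	startswith('st')	Test whether each string starts with the given substring; returns True/False
46	strip([to_strip])	Strip whitespace (including newlines) or the given characters from both ends
47	swapcase()	Swap upper and lower case
48	title()	Convert to titlecase
49	translate(table[, deletechars])	Map every character of each string through a translation table
50	upper()	Convert to uppercase
51	wrap(width, **kwargs)	Wrap long strings at width; the result contains inserted newlines \n
52	zfill(width)	Pad the left side of each string with 0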
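Several of the simpler entries in the table are easy to mix up, so here is a minimal sketch that exercises a few of them side by side. The sample data (`s`, `codes`, `tags`) is made up for illustration and does not come from the original tests.

```python
import pandas as pd

# Hypothetical sample data; the None shows how missing values are skipped
s = pd.Series(["hello world", "PANDAS str", None])

cap = s.str.capitalize()   # first character upper, the rest lower
ttl = s.str.title()        # first letter of every word upper
upp = s.str.upper()        # every letter upper
lengths = s.str.len()      # missing values stay missing

# zfill pads with '0' on the left, center pads on both sides
codes = pd.Series(["7", "42", "123"])
padded = codes.str.zfill(3)
centered = codes.str.center(5, "*")

# get_dummies splits on sep and builds one indicator column per token
tags = pd.Series(["a|b", "b|c", "a"])
dummies = tags.str.get_dummies(sep="|")

print(cap.tolist(), ttl.tolist(), upp.tolist())
print(padded.tolist(), centered.tolist())
print(dummies)
```

Note that `capitalize()` and `upper()` give different results: only `upper()` converts every letter.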

2. Examples

import numpy as np
import pandas as pd

text="this is string example!"
s=pd.Series(text.split())

# 0 this
# 1 is
# 2 string
# 3 example!
# dtype: object

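# Example 1: index
# Every element must contain 's', otherwise a ValueError is raised.
# Here 'example!' has no 's', so this call raises.
s.str.index('s')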
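To make the difference between `find` and `index` from the table concrete, the sketch below runs both on the same Series. The variable names `pos` and `raised` are only for illustration.

```python
import pandas as pd

s = pd.Series("this is string example!".split())

# find() reports -1 for elements that lack the substring
pos = s.str.find("s")

# index() raises ValueError as soon as one element lacks the substring
try:
    s.str.index("s")
    raised = False
except ValueError:
    raised = True

print(pos.tolist(), raised)
```

If some elements may lack the substring, `find` is the safer choice, and the -1 entries can be filtered afterwards.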

# Example 2: translate with a character mapping table

intab='aeiou'
outtab='12345'
trantab=str.maketrans(intab,outtab)
# {97: 49, 101: 50, 105: 51, 111: 52, 117: 53}

s.str.translate(trantab)
# 0 th3s
# 1 3s
# 2 str3ng
# 3 2x1mpl2!
# dtype: object
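A translation table can also delete characters. The sketch below assumes the dict form of `str.maketrans`, where mapping a character to `None` removes it; the particular mapping is just an illustration.

```python
import pandas as pd

s = pd.Series("this is string example!".split())

# 'a' is replaced by '1'; 'e' maps to None and is therefore deleted
table = str.maketrans({"a": "1", "e": None})
result = s.str.translate(table)

print(result.tolist())
```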

# Example 3: split

s.str.split('i')  # split on 'i'

# 0 [th, s]
# 1 [, s]
# 2 [str, ng]
# 3 [example!]
# dtype: object
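The `n` and `expand` parameters listed for `split` and `rsplit` in the table are not used in the example above. A minimal sketch, using made-up underscore-separated strings:

```python
import pandas as pd

s = pd.Series(["a_b_c", "d_e", "f"])

# expand=True returns a DataFrame, one column per piece; short rows are padded with missing values
parts = s.str.split("_", expand=True)

# n limits the number of splits; split counts from the left, rsplit from the right
first = s.str.split("_", n=1)
last = s.str.rsplit("_", n=1)

print(parts)
print(first.tolist())
print(last.tolist())
```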

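# Example 4: extract the matching text. Note the parentheses: wrap the part you want to extract in a capture group.

s.str.extract("([d-z])")  # first match of the capture group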
# 0
# 0 t
# 1 i
# 2 s
# 3 e
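`extract` returns one column per capture group, so a pattern with several groups gives several columns. The sketch below uses named groups and a made-up Series; the group names `letter` and `digit` are illustrative.

```python
import pandas as pd

s = pd.Series(["a1", "b2", "c"])

# Named groups become column names; an element that does not match gives a row of NaN
df = s.str.extract(r"(?P<letter>[a-z])(?P<digit>\d)")

# With a single group, expand=False returns a Series instead of a DataFrame
one = s.str.extract(r"([a-z])", expand=False)

print(df)
print(one.tolist())
```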

# Example 5: extractall
s.str.extractall("([d-z])")  # every match of the capture group, one row per match

#          0
#   match
# 0 0      t
#   1      h
#   2      i
#   3      s
# 1 0      i
#   1      s
# 2 0      s
#   1      t
#   2      r
#   3      i
#   4      n
#   5      g
# 3 0      e
#   1      x
#   2      m
#   3      p
#   4      l
#   5      e

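# Example 6: unicodedata.normalize(form, unistr)

# Returns the normal form `form` of the Unicode string unistr.
# Valid values for form are "NFC", "NFKC", "NFD" and "NFKD".

# Normal form D (NFD), also called canonical decomposition, converts each character to its decomposed form.
# Normal form C (NFC) first applies canonical decomposition, then recomposes pre-combined characters.
# Normal form KD (NFKD) applies compatibility decomposition, i.e. replaces all compatibility characters with their equivalents.
# Normal form KC (NFKC) first applies compatibility decomposition, followed by canonical composition.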

import unicodedata
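unicodedata.normalize('NFD', '\u00C7').encode('utf-8')  # b'C\xcc\xa7': 'C' followed by the combining cedilla
unicodedata.normalize('NFD', '\u0043')  # 'C'
unicodedata.normalize('NFD', '\u0327')  # '\u0327', the combining cedilla on its own
unicodedata.normalize('NFD', '\u0043\u0327')  # 'Ç' as two code points, already decomposed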

unicodedata.normalize('NFD', '\u00C7').encode('ascii','ignore')#b'C'
b1=unicodedata.normalize('NFD', '\u00C7').encode('ascii','ignore')
b1.decode()# 'C'

title = u"Klüft skräms inför på fédéral électoral große"
unicodedata.normalize('NFKD', title).encode('ascii','ignore')
# b'Kluft skrams infor pa federal electoral groe' (bytes; 'ß' has no decomposition, so it is dropped)
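The calls above use the standard-library `unicodedata` module directly. On a Series the same operation is available as `s.str.normalize(form)`; the sketch below, with made-up sample strings, shows it together with the encode/decode trick for stripping accents.

```python
import pandas as pd

# 'Ça' with a precomposed Ç, and 'é' written as 'e' + combining acute accent
s = pd.Series(["\u00C7a", "e\u0301"])

nfd = s.str.normalize("NFD")   # decomposed: base letter + combining mark
nfc = s.str.normalize("NFC")   # composed: one code point per accented letter

nfd_len = nfd.str.len()
nfc_len = nfc.str.len()

# After NFD, dropping the non-ASCII code points removes the accents
ascii_only = nfd.str.encode("ascii", errors="ignore").str.decode("ascii")

print(nfd_len.tolist(), nfc_len.tolist(), ascii_only.tolist())
```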

# Example 7: encode / decode

v=s.str.encode('utf-8')
# 0 b'this'
# 1 b'is'
# 2 b'string'
# 3 b'example!'
# dtype: object

v.str.decode('utf-8')
# 0 this
# 1 is
# 2 string
# 3 example!
# dtype: object

# Example 8: findall
s.str.findall("[a-z]")

# 0 [t, h, i, s]
# 1 [i, s]
# 2 [s, t, r, i, n, g]
# 3 [e, x, a, m, p, l, e]
# dtype: object

# Example 9: match

s.str.match("[d-z]")
# 0 True
# 1 True
# 2 True
# 3 True
# dtype: bool
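Every element above returns True because `match` only tests the start of each string. The sketch below contrasts it with `contains`, which searches anywhere in the string; the variable names are only for illustration.

```python
import pandas as pd

s = pd.Series("this is string example!".split())

# match() is anchored at the beginning of each string
starts_with_s = s.str.match("s")

# contains() finds the pattern anywhere in the string
has_s = s.str.contains("s")

# regex=False treats the pattern as a literal substring
has_bang = s.str.contains("!", regex=False)

print(starts_with_s.tolist(), has_s.tolist(), has_bang.tolist())
```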

# Example 10: join

s.str.join('-')

# 0 t-h-i-s
# 1 i-s
# 2 s-t-r-i-n-g
# 3 e-x-a-m-p-l-e-!

s1 = pd.Series([['Tom', 'Bob', 'Jim'],[1.1, 2.2, 3.3],['s1', np.nan, 's2'],
['s3', 4.5, 's4'], ['s5', ['s6', 's7'], 's8']])
# 0 [Tom, Bob, Jim]
# 1 [1.1, 2.2, 3.3]
# 2 [s1, nan, s2]
# 3 [s3, 4.5, s4]
# 4 [s5, [s6, s7], s8]
# dtype: object

s1.str.join('-')
# 0 Tom-Bob-Jim
# 1 NaN
# 2 NaN
# 3 NaN
# 4 NaN

# Example 11: string concatenation with cat

s = pd.Series(['a', 'b', np.nan, 'd'])
s.str.cat(sep=' ', na_rep='?') # 'a b ? d'

s.str.cat(['A', 'B', 'C', 'D'], sep=',')
# 0 a,A
# 1 b,B
# 2 NaN
# 3 d,D
# dtype: object

s.str.cat(['A', 'B', 'C', 'D'], sep=',', na_rep='-')
# 0 a,A
# 1 b,B
# 2 -,C
# 3 d,D
# dtype: object

t = pd.Series(['D', 'A', 'B', 'C'], index=[4, 1, 2, 3])
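# join=None (the default in the pandas version tested here) concatenates by position and ignores the index;
# newer pandas versions align on the index by default (join='left'), so this call may need adjusting there
s.str.cat(t, join=None, na_rep='-')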
# 0 aD
# 1 bA
# 2 -B
# 3 dC

s.str.cat(t, join='left', na_rep='-')
# 0 a-
# 1 bA
# 2 -B
# 3 dC
# dtype: object

s.str.cat(t, join='outer', na_rep='-')
# 0 a-
# 1 bA
# 2 -B
# 3 dC
# 4 -D
# dtype: object

s.str.cat(t, join='inner', na_rep='-')
# 1 bA
# 2 -B
# 3 dC
# dtype: object

s.str.cat(t, join='right', na_rep='-')
# 4 -D
# 1 bA
# 2 -B
# 3 dC
# dtype: object

# Example 12: slice_replace

s = pd.Series(['a', 'ab', 'abc', 'abdc', 'abcde'])

# 0 a
# 1 ab
# 2 abc
# 3 abdc
# 4 abcde
# dtype: object

# Specify only start: everything from start to the end of the string is replaced with repl
s.str.slice_replace(1, repl='X')
# 0 aX
# 1 aX
# 2 aX
# 3 aX
# 4 aX
# dtype: object

# Specify only stop: everything from the beginning of the string up to stop is replaced with repl; the rest of the string is kept
s.str.slice_replace(stop=2, repl='X')
# 0 X
# 1 X
# 2 Xc
# 3 Xdc
# 4 Xcde
# dtype: object


# Specify both start and stop: the slice from start to stop is replaced with repl. Everything before start or after stop is kept as is.
s.str.slice_replace(start=1, stop=3, repl='X')
# 0 aX
# 1 aX
# 2 aX
# 3 aXc
# 4 aXde
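One way to read `slice_replace` is as "prefix + repl + suffix". The sketch below checks that reading against `slice` and `get` from the table on the same data; it is an illustration of the behaviour for this input, not a statement about how pandas implements it.

```python
import pandas as pd

s = pd.Series(["a", "ab", "abc", "abdc", "abcde"])

# slice() and the s.str[...] shorthand return the positional slice itself
sliced = s.str.slice(1, 3)
shorthand = s.str[1:3]

# get() returns a single position, NaN where the string is too short
second = s.str.get(1)

# slice_replace(1, 3, 'X') keeps s[:1], inserts 'X', then keeps s[3:]
replaced = s.str.slice_replace(1, 3, "X")
rebuilt = s.str[:1] + "X" + s.str[3:]

print(sliced.tolist(), second.tolist(), replaced.tolist())
```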
