The pandas Module, Part 1: The Series Type

阿新 • Published: 2020-09-07

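The pandas module

1. A very powerful Python package for data analysis.
2. Built on top of NumPy, so much of it will feel familiar if you already know NumPy.
3. pandas is a large part of why Python became a leading language for data analysis.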

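Main features

1. Two flexible and powerful data types:
    Series
    DataFrame
2. Integrated date and time handling
3. A rich set of mathematical operations (built on NumPy)
4. Flexible handling of missing data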

How to import

The conventional import statement for pandas:
import pandas as pd

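The Series data structure

A Series is a one-dimensional, array-like object made up of data and an associated set of labels (the index).
Structure of a Series
    labels on the left
    data on the right

This article covers four ways to create a Series

# Method 1
# Create a Series from a list
res = pd.Series([111,222,333,444,555])
res
# With no index given, pandas labels the data with integers starting at 0

# Method 2
# Specify a label for each element: the number of labels must match the number of values
res1 = pd.Series([111,222,333,444,555],index=['a','b','c','d','e'])
res1

# Method 3
# Pass a dict directly: the keys become the labels
res2 = pd.Series({
        'username':'abc',
        'password':123,
        'hobby':'read'
    })
res2

# Method 4
# Repeat a single scalar value for every label
pd.Series(0,index=['a','b','c'])
Output:
a    0
b    0
c    0
dtype: int64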
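
The note in Method 2 about matching counts can be checked directly. The snippet below is a minimal sketch; the names values, ok and mismatch_raised are made up for illustration and are not from the article.

```python
import pandas as pd

values = [111, 222, 333]

# Matching lengths: three values, three labels.
ok = pd.Series(values, index=['a', 'b', 'c'])

# Mismatched lengths: three values but only two labels.
try:
    pd.Series(values, index=['a', 'b'])
    mismatch_raised = False
except ValueError as exc:
    # pandas refuses to guess which value should lose its label
    mismatch_raised = True
    print('ValueError:', exc)
```

pandas raises the error when the Series is constructed, so a mismatch cannot go unnoticed later.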

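Handling missing data

isnull		tests for missing values; returns True where a value is missing
notnull		tests for missing values; returns True where a value is present
    # filter out missing values with boolean indexing
    obj[obj.notnull()]
dropna		drops missing data
fillna		fills in missing data
    # Like dropna, fillna leaves the original data unchanged by default; pass inplace=True to modify it directly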
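
The four methods above can be tried on a small Series. This is only a sketch: the sample data and all variable names except obj are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

obj = pd.Series([1.0, np.nan, 3.0, np.nan], index=['a', 'b', 'c', 'd'])

missing_mask = obj.isnull()     # True where the value is missing
present = obj[obj.notnull()]    # boolean indexing keeps 'a' and 'c'
dropped = obj.dropna()          # same rows as the line above
filled = obj.fillna(0)          # returns a new Series; obj is unchanged

# To change the data itself, work on a copy and pass inplace=True.
obj_copy = obj.copy()
obj_copy.fillna(0, inplace=True)

print(filled)
```

After this runs, obj still has its two missing values, while filled and obj_copy have zeros in their place.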

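Features of a Series

Most operations work the same way as in NumPy
1. Create a Series directly from an ndarray: Series(array)
    A one-dimensional NumPy array converts directly to a Series (the array must be one-dimensional)
    import numpy as np
    res = pd.Series(np.array([1,2,3,4,5,6]))
    res

2. Arithmetic with a scalar
    res = pd.Series([1,2,3,4,5])
    res * 2
    0     2
    1     4
    2     6
    3     8
    4    10
    dtype: int64

3. Arithmetic between two Series
    res * res
    0     1
    1     4
    2     9
    3    16
    4    25
    dtype: int64

    # The two Series below share no labels, so every result is NaN
    res1 = pd.Series([1,2,3,4],index=['a','b','c','d'])
    res * res1
    0   NaN
    1   NaN
    2   NaN
    3   NaN
    4   NaN
    a   NaN
    b   NaN
    c   NaN
    d   NaN
    dtype: float64

4. Universal functions such as abs
    res3 = pd.Series([-1,-2,-3,-4,5,6])
    res3.abs()
    0    1
    1    2
    2    3
    3    4
    4    5
    5    6
    dtype: int64

5. Boolean indexing
    res[res.notnull()]

6. Statistical functions

7. Create a Series from a dict: Series(dic)
    res4 = pd.Series({'username':'abc','password':123})
    res4

8. The in operator
    'username' in res4	# membership is tested against the labels
    True

    for i in res4:	# unlike a Python dict, iterating yields the data rather than the labels
        print(i)
    abc
    123

9. Key-based indexing and slicing

10. Other functions, and so on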
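
Items 6 and 9 are listed above without an example. The following is a minimal sketch of what they might look like; the data and variable names are assumptions, not taken from the article.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# Item 6: a few statistical functions
total = s.sum()       # sum of all values
average = s.mean()    # arithmetic mean
largest = s.max()     # maximum value

# Item 9: key-based indexing and slicing
by_label = s['b']            # single value looked up by label
label_slice = s['b':'c']     # label slices include both endpoints
pos_slice = s.iloc[1:3]      # positional slices exclude the stop position

print(total, average, largest)
print(label_slice)
```

A label slice includes its end label and a positional slice excludes its stop position, which is why both slices here return the same two rows.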

Boolean selectors

import numpy as np
import pandas as pd
mask = pd.Series([True,False,False,True,False])
price = pd.Series([321312,123,324,5654,645])

# Must know: index with a boolean Series to keep the positions where the mask is True
price[mask]
0    321312
3      5654
dtype: int64
    
# Good to know
price|mask
0    True
1    True
2    True
3    True
4    True
dtype: bool

price&mask
0    False
1    False
2    False
3    False
4    False
dtype: bool

# Must know: wrap each comparison in parentheses when combining conditions
(price > 100) & (price < 700)
0    False
1     True
2     True
3    False
4     True
dtype: bool

price[(price > 100) & (price < 700)]
1    123
2    324
4    645
dtype: int64
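
The parentheses matter because Python's & and | bind more tightly than comparison operators such as > and <. The sketch below extends the price example with | and ~ (negation); the names in_range, out_of_range and extreme are made up for illustration.

```python
import pandas as pd

price = pd.Series([321312, 123, 324, 5654, 645])

in_range = (price > 100) & (price < 700)    # both conditions hold
out_of_range = ~in_range                    # negate the mask
extreme = (price < 200) | (price > 5000)    # either condition holds

print(price[out_of_range].tolist())
print(price[extreme].tolist())
```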

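Index and labels

res = pd.Series({'a':111,'b':222,'c':333,'d':444,'e':555})
In general, a value can be read in two ways
# By position
    res[0]	# recent pandas versions deprecate integer positions inside [] on a labeled Series; res.iloc[0] is the explicit form
    111
# By label
    res['a']
    111

# Get all the labels
    res.index
    Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

# Give the index a name
    res.index.name = 'ZBC'
    res
    ZBC
    a    111
    b    222
    c    333
    d    444
    e    555
    dtype: int64

# Integer index
    sr = pd.Series(np.arange(6))
    sr
    0    0
    1    1
    2    2
    3    3
    4    4
    5    5
    dtype: int32

    res1 = sr[3:]
    res1
    3    3
    4    4
    5    5
    dtype: int32
    # Reading a value with []
    # res1[1]  # raises KeyError: with an integer index, [] looks up the label 1, but only labels 3, 4, 5 remain
    '''When reading values, use the dedicated accessors below so the intent is explicit'''
    # iloc selects by position
    # loc selects by label
    # res1.iloc[1]  # 4 (the second element)
    # res1.loc[3]  # 3
    '''Very important: be sure to remember this'''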
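
The difference between iloc and loc is easiest to see on a sliced Series whose labels no longer start at 0. This is a small sketch that reuses sr from above; tail, by_position, by_label and label_missing are names introduced only for this example.

```python
import numpy as np
import pandas as pd

sr = pd.Series(np.arange(6))
tail = sr.iloc[3:]              # the labels 3, 4, 5 remain

by_position = tail.iloc[1]      # second element, which holds the value 4
by_label = tail.loc[3]          # element labeled 3, which holds the value 3

# Label 1 was sliced away, so a label lookup for it fails.
try:
    tail.loc[1]
    label_missing = False
except KeyError:
    label_missing = True

print(by_position, by_label, label_missing)
```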

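Date types

# date_range generates dates at a fixed interval
    res1 = pd.date_range('2020-01-01','2020-06-01',freq='M')  # freq sets the interval; 'M' means month end (newer pandas releases prefer the spelling 'ME')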
    res1
    DatetimeIndex(['2020-01-31', '2020-02-29', '2020-03-31', '2020-04-30',
                   '2020-05-31'],
                  dtype='datetime64[ns]', freq='M')

# Dates can also be used as the labels of a Series
    res2 = pd.Series([111,222,333,444,555],index=res1)
    res2
    2020-01-31    111
    2020-02-29    222
    2020-03-31    333
    2020-04-30    444
    2020-05-31    555
    Freq: M, dtype: int64
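
Once a Series has dates as labels, loc accepts date strings. The sketch below is an illustration only: it uses freq='MS' (month start) instead of the article's month-end frequency because that alias is spelled the same across pandas versions, and the names dates, sales, march and first_quarter are assumptions.

```python
import pandas as pd

# Five month-start dates: 2020-01-01 through 2020-05-01
dates = pd.date_range('2020-01-01', periods=5, freq='MS')
sales = pd.Series([111, 222, 333, 444, 555], index=dates)

march = sales.loc['2020-03-01']                   # exact date gives one value
first_quarter = sales.loc['2020-01':'2020-03']    # label slice, both ends included

print(march)
print(first_quarter)
```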

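Data alignment

sr1 = pd.Series([12,23,34], index=['c','a','d'])
sr2 = pd.Series([11,20,10], index=['d','c','a',])
sr1 + sr2
Output:
a    33
c    32
d    45
dtype: int64

# Index alignment lets two Series be combined directly: values are matched by label, not by position
sr3 = pd.Series([11,20,10,14], index=['d','c','a','b'])
sr1 + sr3
Output:
a    33.0
b     NaN
c    32.0
d    45.0
dtype: float64
# sr1 and sr3 do not have the same labels: b exists only in sr3, so there is nothing to add it to and the result for b is NaN, a missing value
# NaN is a float (type(np.nan) returns float), so the dtype changes from int64 to float64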

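Data operations

# Read
    res.loc['a']

# Update
    res.iloc[2] = 100

# Add
# Option 1: pd.concat returns a new Series and leaves the original unchanged
# (older code used res.append(...), which was removed in pandas 2.0)
    pd.concat([res, pd.Series([66],index=['e'])])
# Option 2: assigning to a new label modifies the original directly
# (older code used res.set_value('f',999), which printed a deprecation warning and was removed in pandas 1.0)
    res.loc['f'] = 999

# Delete: the del keyword also modifies the original
    del res['f']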
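
Here is a minimal runnable sketch of the same four operations, assuming a recent pandas release in which Series.append and Series.set_value are no longer available. pd.concat and assignment through loc stand in for them; the variable names value and combined and the shortened sample data are assumptions.

```python
import pandas as pd

res = pd.Series({'a': 111, 'b': 222, 'c': 333})

value = res.loc['a']        # read by label
res.iloc[2] = 100           # update by position

# Add without touching the original: concat returns a new Series.
combined = pd.concat([res, pd.Series([66], index=['e'])])

# Add in place: assigning to a label that does not exist yet enlarges res.
res.loc['f'] = 999
had_f = 'f' in res

# Delete in place.
del res['f']

print(combined)
print(res)
```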

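Flexible arithmetic methods

"""
For arithmetic such as addition, subtraction, multiplication and division
you can use the operators directly
or the equivalent methods, which accept extra options
add
sub
div
mul
"""
sr1 = pd.Series([12,23,34], index=['c','a','d'])
sr2 = pd.Series([11,20,10,14], index=['d','c','a','b'])
sr1.add(sr2,fill_value=0)
Output:
a    33.0
b    14.0
c    32.0
d    45.0
dtype: float64
# The value missing from sr1 is treated as 0, so the result for label b is 14
'''
Handle missing values in the operands before computing,
so that gaps in the data do not distort the result
'''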
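
The other methods take fill_value in the same way. Below is a short sketch with sub and mul on the same two Series; the fill value 1 for multiplication is an illustrative choice, and the names plain, filled and product are made up.

```python
import pandas as pd

sr1 = pd.Series([12, 23, 34], index=['c', 'a', 'd'])
sr2 = pd.Series([11, 20, 10, 14], index=['d', 'c', 'a', 'b'])

plain = sr1 - sr2                       # 'b' has no partner in sr1, so it is NaN
filled = sr1.sub(sr2, fill_value=0)     # 'b' is computed as 0 - 14
product = sr1.mul(sr2, fill_value=1)    # 'b' is computed as 1 * 14

print(plain)
print(filled)
print(product)
```

Which fill value makes sense depends on the operation: 0 is neutral for addition and subtraction, and 1 is neutral for multiplication.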
