2 Series&Pandas

阿新 • Published: 2021-06-16

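Why learn pandas

NumPy can already help us process data, so what is the point of learning pandas?
- NumPy is built for numeric data. Data analysis also involves many other kinds of data, such as strings and time series, and pandas handles those well.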

What is pandas?

First, meet the two most commonly used classes in pandas:
- Series
- DataFrame

Series

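A Series is an object similar to a one-dimensional array. It consists of two parts:
- values: a set of data (an ndarray)
- index: the index labels associated with the data
Creating a Series
- from a list or a NumPy array
- from a dict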

from pandas import Series
s = Series(data=[1,2,3,'four'])
s

0       1
1       2
2       3
3    four
dtype: object

import numpy as np
s = Series(data=np.random.randint(0,100,size=(3,)))
s

0    53
1    24
2    35
dtype: int32

#index sets the explicit index (the labels)
s = Series(data=[1,2,3,'four'],index=['a','b','c','d'])
s

a       1
b       2
c       3
d    four
dtype: object

#Why use an explicit index?
# An explicit index makes a Series more readable

dic = {
    '語文':100,
    '數學':99,
    '理綜':250
}
s = Series(data=dic)
s

語文    100
數學     99
理綜    250
dtype: int64

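Indexing and slicing a Series

s.iloc[0] #by position (plain s[0] is deprecated in recent pandas when the index is not integer-based)
s.語文 #by label, as an attribute
s[0:2] #slice by position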

語文    100
數學     99
dtype: int64
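
To make the difference between label-based and position-based access explicit, here is a minimal sketch that rebuilds the score Series above. `.loc` and `.iloc` are standard pandas accessors; the variable names (`by_label`, `pos_slice`, and so on) are only illustrative. Note that a label slice includes its end label, while a positional slice excludes its stop position.

```python
from pandas import Series

s = Series(data={'語文': 100, '數學': 99, '理綜': 250})

# Label-based access: look values up by their index labels
by_label = s.loc['語文']
label_slice = s.loc['語文':'數學']   # both end labels are included

# Position-based access: look values up by integer position
by_position = s.iloc[0]
pos_slice = s.iloc[0:2]             # the stop position is excluded

print(by_label, by_position)
print(label_slice.equals(pos_slice))
```

Here both slices select the same two rows, which is why spelling out `.loc` or `.iloc` tends to be less ambiguous than bare `[]`.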

Common Series attributes
- shape
- size
- index
- values

s.shape
s.size
s.index #returns the index
s.values #returns the values
s.dtype #the element type

dtype('int64')

s = Series(data=[1,2,3,'four'],index=['a','b','c','d'])
s.dtype #dtype O means object; here the Series mixes ints and a string

dtype('O')
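
As a rough illustration of what the object dtype means, the sketch below compares a mixed Series with an all-integer one and then coerces the mixed one to numbers. `pd.to_numeric` with `errors='coerce'` is a standard pandas call; the variable names are only illustrative.

```python
import pandas as pd

mixed = pd.Series([1, 2, 3, 'four'])
numbers = pd.Series([1, 2, 3, 4])

print(mixed.dtype)    # object: elements are stored as generic Python objects
print(numbers.dtype)  # an integer dtype such as int64

# Each element of an object Series keeps its own Python type
element_types = [type(v).__name__ for v in mixed]
print(element_types)

# Coerce to numbers; values that cannot be parsed become NaN
coerced = pd.to_numeric(mixed, errors='coerce')
print(coerced.isnull().sum())
```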

Common Series methods
- head(),tail()
- unique()
- isnull(),notnull()
- add() sub() mul() div()

s = Series(data=np.random.randint(60,100,size=(10,)))
s.head(3) #show the first n elements

0    99
1    99
2    88
dtype: int64

s.tail(3) #show the last n elements

7    85
8    70
9    76
dtype: int64

s.unique() #remove duplicates

array([99, 88, 74, 72, 80, 63, 85, 70, 76])

s.isnull() #tests each element for null: True if it is null, False otherwise

0    False
1    False
2    False
3    False
4    False
5    False
6    False
7    False
8    False
9    False
dtype: bool

s.notnull()

0    True
1    True
2    True
3    True
4    True
5    True
6    True
7    True
8    True
9    True
dtype: bool

Arithmetic on Series
- Rule: elements whose index labels match are combined; a label present on only one side yields NaN

s1 = Series(data=[1,2,3],index=['a','b','c'])
s2 = Series(data=[1,2,3],index=['a','d','c'])
s = s1 + s2
s

a    2.0
b    NaN
c    6.0
d    NaN
dtype: float64

s.isnull()

a    False
b     True
c    False
d     True
dtype: bool
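
The `add()` method listed earlier follows the same alignment rule but can fill the gaps. The sketch below, which reuses `s1` and `s2` from above, shows two possible ways to deal with the NaN values: `fill_value` on `add()`, or filtering with `notnull()`. The names `filled` and `kept` are only illustrative.

```python
from pandas import Series

s1 = Series(data=[1, 2, 3], index=['a', 'b', 'c'])
s2 = Series(data=[1, 2, 3], index=['a', 'd', 'c'])

# Same alignment as s1 + s2, but a label missing on one side counts as 0
filled = s1.add(s2, fill_value=0)
print(filled)

# Alternatively, keep the NaN-producing sum and filter the gaps out afterwards
total = s1 + s2
kept = total[total.notnull()]   # boolean mask keeps only non-null entries
print(kept)
```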

DataFrame

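A DataFrame is a tabular data structure made up of multiple columns of data arranged in a fixed order. It was designed to extend the Series from one dimension to more. A DataFrame has both a row index and a column index.
- row index: index
- column index: columns
- values: values
Creating a DataFrame
- from an ndarray
- from a dict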

from pandas import DataFrame

df = DataFrame(data=[[1,2,3],[4,5,6]])
df

	0	1	2
0	1	2	3
1	4	5	6

df = DataFrame(data=np.random.randint(0,100,size=(6,4)))
df

	0	1	2	3
0	93	61	7	1
1	89	41	29	16
2	21	66	97	24
3	56	96	13	87
4	86	21	20	54
5	19	18	96	7

dic = {
    'name':['zhangsan','lisi','wanglaowu'],
    'salary':[1000,2000,3000]
}
df = DataFrame(data=dic,index=['a','b','c'])
df

	name	salary
a	zhangsan	1000
b	lisi	2000
c	wanglaowu	3000

DataFrame attributes
- values, columns, index, shape

df.values
df.columns
df.index
df.shape

(3, 2)
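
Only the last expression in a notebook cell is echoed, so the output above shows just the shape. A small sketch that prints each attribute of the same name/salary frame:

```python
from pandas import DataFrame

dic = {
    'name': ['zhangsan', 'lisi', 'wanglaowu'],
    'salary': [1000, 2000, 3000]
}
df = DataFrame(data=dic, index=['a', 'b', 'c'])

print(df.values)          # 2-D NumPy array holding the cell values
print(list(df.columns))   # column labels
print(list(df.index))     # row labels
print(df.shape)           # (number of rows, number of columns)
```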

============================================

Exercise 4:

Build a DataFrame named df from the exam score table below:

    張三  李四  
語文 150  0
數學 150  0
英語 150  0
理綜 300  0

============================================

dic = {
    '張三':[150,150,150,300],
    '李四':[0,0,0,0]
}
df = DataFrame(data=dic,index=['語文','數學','英語','理綜'])
df

	張三	李四
語文	150	0
數學	150	0
英語	150	0
理綜	300	0

DataFrame indexing
- index rows
- index columns
- index elements

df = DataFrame(data=np.random.randint(60,100,size=(8,4)),columns=['a','b','c','d'])
df

	a	b	c	d
0	75	69	79	67
1	98	65	96	79
2	71	82	91	92
3	73	60	89	69
4	70	74	64	79
5	85	76	65	68
6	81	62	89	76
7	69	94	95	92

df['a'] #select a single column; once explicit labels are set, [] and loc accept only those labels (iloc still works by position)

0    75
1    98
2    71
3    73
4    70
5    85
6    81
7    69
Name: a, dtype: int32

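df[['a','c']] #select multiple columns (the data is random; this and the following outputs come from a different run than the table above)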

	a	c
0	95	83
1	76	78
2	69	89
3	74	93
4	75	93
5	67	66
6	95	71
7	72	79

iloc:
- select rows by implicit (positional) index
loc:
- select rows by explicit (label) index

#select a single row
df.loc[0]

a    95
b    87
c    83
d    68
Name: 0, dtype: int64

#select multiple rows
df.iloc[[0,3,5]]

	a	b	c	d
0	95	87	83	68
3	74	77	93	82
5	67	98	66	85

#select a single element
df.iloc[0,2]
df.loc[0,'a']

#select multiple elements
df.iloc[[1,3,5],2]

1    78
3    93
5    66
Name: c, dtype: int64
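
With the default 0..n-1 row index, `loc` and `iloc` look interchangeable. The sketch below uses a hypothetical frame with string row labels and fixed (non-random) data, so the difference is visible and the results are reproducible.

```python
import numpy as np
from pandas import DataFrame

# Fixed data instead of random values, so every run gives the same result
df = DataFrame(data=np.arange(12).reshape(4, 3),
               index=['w', 'x', 'y', 'z'],
               columns=['a', 'b', 'c'])

row_by_label = df.loc['x']       # the row labelled 'x'
row_by_position = df.iloc[1]     # the second row, which is the same row here
cell = df.loc['y', 'b']          # element at row 'y', column 'b'

label_rows = df.loc['w':'y']     # label slice: 'y' is included
position_rows = df.iloc[0:2]     # positional slice: the stop is excluded

print(row_by_label.equals(row_by_position))
print(cell, label_rows.shape, position_rows.shape)
```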

Slicing a DataFrame
- slice rows
- slice columns

#slice rows
df[0:2]

	a	b	c	d
0	95	87	83	68
1	76	82	78	95

#slice columns
df.iloc[:,0:2]

	a	b
0	95	87
1	76	82
2	69	94
3	74	77
4	75	88
5	67	98
6	95	83
7	72	74

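Summary of df indexing and slicing
- Indexing:
  - df[col]: select a column
  - df.loc[index]: select a row
  - df.iloc[index,col]: select an element
- Slicing:
  - df[index1:index3]: slice rows
  - df.iloc[:,col1:col3]: slice columns
DataFrame arithmetic
- same as for Series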
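
For a DataFrame the alignment rule applies to both the row labels and the column labels. A minimal sketch with two small hypothetical frames (`df1` and `df2` are not from the original text):

```python
from pandas import DataFrame

df1 = DataFrame({'a': [1, 2], 'b': [3, 4]}, index=['x', 'y'])
df2 = DataFrame({'a': [10, 20], 'c': [30, 40]}, index=['x', 'z'])

# Only the cell at row 'x', column 'a' exists in both frames
total = df1 + df2
print(total)

# add() with fill_value treats a cell missing on one side as 0;
# cells missing on both sides still come out as NaN
filled = df1.add(df2, fill_value=0)
print(filled)
```

The result covers the union of the row and column labels, which is why most of `total` is NaN.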

============================================

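Exercise:

Suppose ddd holds the midterm exam scores and ddd2 the final exam scores. Create ddd2 however you like, add it to ddd, and compute the average of the midterm and final scores.
Suppose 張三 was caught cheating on the midterm math exam and the score must be recorded as 0. How would you do that?
李四 is rewarded for reporting the cheating with 100 extra points in every midterm subject. How would you do that?
Later the teacher finds that one question was wrong and, to placate the students, adds 10 points to every subject for every student. How would you do that?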

============================================

dic = {
    '張三':[150,150,150,150],
    '李四':[0,0,0,0]
}
df = DataFrame(data=dic,index=['語文','數學','英語','理綜'])
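qizhong = df #midterm scores
qimo = df.copy() #final scores: an independent copy, not a second name for df

(qizhong + qimo) / 2 #average of the midterm and final scores

	張三	李四
語文	150.0	0.0
數學	150.0	0.0
英語	150.0	0.0
理綜	150.0	0.0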

qizhong.loc['數學','張三'] = 0
qizhong #張三's math score is now 0

	張三	李四
語文	150	0
數學	0	0
英語	150	0
理綜	150	0

#add 100 to all of 李四's scores
qizhong['李四']+=100
qizhong

	張三	李四
語文	150	100
數學	0	100
英語	150	100
理綜	150	100

qizhong += 10
qizhong #every student's scores are raised by 10

	張三	李四
語文	160	110
數學	10	110
英語	160	110
理綜	160	110
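
One thing worth keeping in mind in this exercise: plain assignment does not copy a DataFrame, it only gives the same object a second name. The sketch below uses a shortened, hypothetical version of the score table to contrast an alias with `copy()`; the names `alias` and `snapshot` are only illustrative.

```python
from pandas import DataFrame

df = DataFrame({'張三': [150, 150], '李四': [0, 0]}, index=['語文', '數學'])

alias = df            # a second name for the same object
snapshot = df.copy()  # an independent copy of the data

df.loc['數學', '張三'] = 0

print(alias.loc['數學', '張三'])     # the alias sees the change
print(snapshot.loc['數學', '張三'])  # the copy still holds the old value
```

If the midterm and final tables are meant to change independently, one of them would need to be created with `copy()` or built from separate data.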

Converting to a datetime type
- pd.to_datetime(col)
Setting a column as the row index
- df.set_index()

dic = {
    'time':['2010-10-10','2011-11-20','2020-01-10'],
    'temp':[33,31,30]
}
df = DataFrame(data=dic)
df

	time	temp
0	2010-10-10	33
1	2011-11-20	31
2	2020-01-10	30

#check the dtype of the time column
df['time'].dtype

dtype('O')

import pandas as pd

#convert the time column to a datetime type
df['time'] = pd.to_datetime(df['time'])
df

	time	temp
0	2010-10-10	33
1	2011-11-20	31
2	2020-01-10	30

df['time']

0   2010-10-10
1   2011-11-20
2   2020-01-10
Name: time, dtype: datetime64[ns]

#use the time column as the row index of the source data
df.set_index('time',inplace=True)

df

	temp
time
2010-10-10	33
2011-11-20	31
2020-01-10	30
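
One reason to convert the column and use it as the index is that rows can then be selected by date. The following is a sketch of what that might look like on the same temperature data; it uses `set_index` without `inplace` purely as a stylistic choice, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'time': ['2010-10-10', '2011-11-20', '2020-01-10'],
    'temp': [33, 31, 30]
})
df['time'] = pd.to_datetime(df['time'])
df = df.set_index('time')

# The index is now a DatetimeIndex, so date parts are available directly
years = df.index.year.tolist()
print(years)

# Partial-string selection: all rows whose timestamp falls in 2010
rows_2010 = df.loc['2010']
print(rows_2010)

# Label slicing by date on the sorted index; both ends are included
early = df.loc['2010-01-01':'2011-12-31']
print(len(early))
```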
