The Last Part of numpy and a First Look at pandas

阿新 • Published: 2020-09-03

Today's Overview

Remaining numpy topics
The pandas module

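Today's Content in Detail

Binary functions

Addition                    add
Subtraction                 subtract
Multiplication              multiply
Division                    divide
Power (e.g. squaring)       power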
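
The binary functions listed above can be tried on two small arrays. This is a minimal sketch; the arrays `a` and `b` are made up for illustration, and each call behaves like the operator named in its comment.

```python
import numpy as np

# Two illustrative arrays (not from the original notes)
a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

total = np.add(a, b)          # element-wise, same as a + b
diff = np.subtract(b, a)      # same as b - a
prod = np.multiply(a, b)      # same as a * b
quot = np.divide(b, a)        # same as b / a, the result is always float
squares = np.power(a, 2)      # same as a ** 2

print(total, diff, prod, quot, squares)
```

Each function works element by element, so both arrays need the same shape (or shapes that numpy can broadcast together).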

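Mathematical and statistical methods

sum                 sum of the elements
cumsum              cumulative sum
mean                mean of all elements
std                 standard deviation
var                 variance
min                 smallest element
max                 largest element
argmin              index of the smallest element
argmax              index of the largest element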
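
A short sketch of these methods on one small array. The array `arr` is an arbitrary example; note that `var` and `std` use the population formula (divide by n) unless `ddof` is passed.

```python
import numpy as np

arr = np.array([3, 1, 4, 1, 5])   # illustrative data

total = arr.sum()          # 14
running = arr.cumsum()     # running total: 3, 4, 8, 9, 14
average = arr.mean()       # 2.8
variance = arr.var()       # mean of the squared deviations from the mean
spread = arr.std()         # square root of the variance
smallest = arr.min()
largest = arr.max()
where_min = arr.argmin()   # index of the first occurrence of the minimum
where_max = arr.argmax()   # index of the first occurrence of the maximum

print(total, running, average, variance, spread)
print(smallest, largest, where_min, where_max)
```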

Random numbers

np.random.rand(2,5) # random floats in [0, 1); the two arguments give a 2x5 array
array([[0.65863779, 0.9994306 , 0.35758039, 0.02292617, 0.70794499],
       [0.15469852, 0.97426284, 0.25622487, 0.20442957, 0.95286145]])
np.random.randint(1,10) # a random integer from 1 up to, but not including, 10
4
np.random.choice([111,222,333,444,555]) # pick one element at random from the given array
333
res = [1,2,3,4,5,6,7,8,9,10,'J','Q','K','A']
np.random.shuffle(res)  # shuffle the order in place
res
[1, 'Q', 'J', 6, 8, 'K', 2, 3, 'A', 7, 10, 4, 9, 5]
np.random.uniform([100,10,11,22]) # the list is taken as low and high defaults to 1.0, so each result lies between 1.0 and the matching element
array([83.53838005,  5.4824623 ,  4.85571734,  7.33774372])
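
Since the values differ on every run, it can be easier to check the shape and range of what each function returns. The sketch below assumes a fixed seed of 0 (an arbitrary choice, only there to make reruns repeatable) and uses made-up variable names.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so that reruns repeat the same draws

grid = np.random.rand(2, 5)                    # 2x5 array of floats in [0, 1)
dice = np.random.randint(1, 7, size=10)        # ten integers from 1 to 6 (7 is excluded)
pick = np.random.choice([111, 222, 333])       # one element of the list
cards = [1, 2, 3, 'J', 'Q', 'K']
np.random.shuffle(cards)                       # reorders the list in place, returns None
between = np.random.uniform(10, 20, size=3)    # three floats in [10, 20)

print(grid.shape)
print(dice.min(), dice.max())
print(pick, cards, between)
```

Passing both bounds to `uniform` explicitly, as here, makes the intended range easier to read than relying on the default `high=1.0`.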

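Meaning of special values

1. nan (Not a Number): not equal to any floating-point number, not even itself (nan != nan)
       In pandas it represents a missing value
---------------------------------------------

2. inf (infinity): greater than any finite floating-point number
---------------------------------------------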
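
The two special values can be checked directly. This is a small sketch with illustrative variable names; because nan never compares equal to anything, `np.isnan` is the usual way to test for it.

```python
import numpy as np

nan_equals_itself = np.nan == np.nan     # False: nan is not equal even to itself
nan_detected = np.isnan(np.nan)          # True: use isnan instead of ==
inf_is_larger = np.inf > 1.7e308         # True: inf exceeds any finite float
still_inf = (np.inf + 1) == np.inf       # True: adding to inf leaves it unchanged
inf_minus_inf = np.inf - np.inf          # nan: the result is undefined

print(nan_equals_itself, nan_detected, inf_is_larger, still_inf, inf_minus_inf)
```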

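The pandas module

1. A very powerful Python data analysis package
2. Built on top of numpy, so much of it will feel familiar
3. pandas helped make Python a leading choice for data analysis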

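Main features

1. Two flexible and powerful data types
   Series
   DataFrame
2. Integrated date and time handling
3. A rich set of mathematical operations (built on numpy)
4. Flexible handling of missing data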

The two main data structures

A Python dictionary
     stores key: value pairs
             the key describes the value
             the value is the actual data
     res = {
          'username':'jason',
          'password':123,
          'hobby':'read'
     }

Series

An array with labels, similar to a dictionary

DataFrame
```
Essentially similar to the data in an Excel spreadsheet
```
Both are built on top of numpy

DataFrame is the one used most often in practice, while Series is its building block: a DataFrame can be made up of N Series
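
To make the relationship concrete, here is a minimal sketch that builds a DataFrame from two Series and then pulls one column back out. The names `ages`, `hobbies` and the sample values are invented for illustration.

```python
import pandas as pd

# Two illustrative Series that share the same labels
ages = pd.Series([18, 19, 20], index=["tony", "yang", "bella"])
hobbies = pd.Series(["read", "run", "swim"], index=["tony", "yang", "bella"])

# Each Series becomes one column; the shared labels become the row index
df = pd.DataFrame({"age": ages, "hobby": hobbies})

age_column = df["age"]            # selecting one column gives back a Series
print(df)
print(type(age_column).__name__)
```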

Basic usage

Always import pandas first, using the conventional import statement
import pandas as pd

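Data structure: Series

A Series is a one-dimensional array-like object made up of data and the associated labels (the index)
This section covers four ways to create a Series

In[*] on the left      the cell is still loading or running
In[number]             the cell has finished loading or running
"""
When Python imports a module, the module is loaded only once
Importing the same module again does not reload it; Python reuses
the result of the first load
Python keeps every imported module in an in-memory cache (sys.modules)
      On import it checks that cache first and reuses the module if it is already there
The cache is discarded when the Python interpreter exits
"""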

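The import caching described above can be observed through `sys.modules`. The sketch below uses the standard-library `json` module purely as an example; any module would behave the same way.

```python
import sys

import json              # the first import loads the module and caches it
import json as json2     # a later import reuses the cached module object

cached = "json" in sys.modules                 # the cache is keyed by module name
same_object = json2 is sys.modules["json"]     # both names point at one module object

print(cached, same_object)
```

This is also why editing a module's source and importing it again inside the same Jupyter session does not pick up the change until the kernel is restarted.
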
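# Method 1
# Creating a Series
res = pd.Series([111,222,333,444,555])
res
0    111
1    222
2    333
3    444
4    555
dtype: int64 # by default the positions 0, 1, 2, ... are used as the labels

# Method 2
# Specify the labels explicitly: there must be exactly as many labels as elements
res1 = pd.Series([111,222,333,444,555],index=['a','b','c','d','e'])
res1
a    111
b    222
c    333
d    444
e    555
dtype: int64
# Method 3
# Pass a dictionary directly: the keys become the labels
res2 = pd.Series({'username':'jason','password':123,'hobby':'read'})
res2
username    jason
password      123
hobby        read
dtype: object
# Method 4
# Pass a scalar together with an index: the value is repeated for every label
pd.Series(0,index=['a','b','c'])
a    0
b    0
c    0
dtype: int64
'''
Structure of a Series
    labels on the left
    data on the right
'''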
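
The two halves of a Series can also be read separately. A minimal sketch, reusing the shape of `res1` from above with fewer elements:

```python
import pandas as pd

res1 = pd.Series([111, 222, 333], index=["a", "b", "c"])

labels = res1.index.tolist()   # the left-hand side: the labels
data = res1.to_numpy()         # the right-hand side: the data as a numpy array
one_value = res1["b"]          # look up a value by its label, like a dictionary

print(labels)
print(data)
print(one_value)
```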

Missing data

'''Warm-up'''
# Step 1: create a dictionary and build a Series object from it
st = {"tony":18,"yang":19,"bella":20,"cloud":21}
obj = pd.Series(st)
obj
tony     18
yang     19
bella    20
cloud    21
dtype: int64
--------------------------------------------

# Step 2
a = ['yang','tony','satan','cloud'] # the labels to use as the index; a list keeps their order fixed, unlike a set
-------------------------------------------

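# Step 3
obj1 = pd.Series(st,index=a)
obj1   # pass the variable a from step 2 as the index

yang     19.0
tony     18.0
satan     NaN
cloud    21.0
dtype: float64
# satan is not among the keys of st, so its value is missing (NaN); NaN is a float, which is why the dtype becomes float64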
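
Passing an index alongside a dictionary behaves like building the Series first and then reindexing it. The sketch below compares the two routes; the variable names `labels`, `from_index` and `from_reindex` are illustrative.

```python
import pandas as pd

st = {"tony": 18, "yang": 19, "bella": 20, "cloud": 21}
labels = ["yang", "tony", "satan", "cloud"]

from_index = pd.Series(st, index=labels)        # index given at construction
from_reindex = pd.Series(st).reindex(labels)    # build first, then select labels

# Labels missing from st become NaN; keys not listed in labels (bella) are dropped
print(from_index.equals(from_reindex))
print(from_index.dtype)
```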

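Handling special values

1.isnull       True where a value is missing
2.notnull      True where a value is present
3.dropna       drop the missing values
4.fillna       replace the missing values with a given value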

obj1.isnull()
yang     False
tony     False
satan     True
cloud    False
dtype: bool

obj1.notnull()
yang      True
tony      True
satan    False
cloud     True
dtype: bool
    
3. Filtering out missing values # boolean indexing
obj1[obj1.notnull()]
yang     19.0
tony     18.0
cloud    21.0
dtype: float64

obj1.dropna()   # does not modify the original data by default
yang     19.0
tony     18.0
cloud    21.0
dtype: float64
    
obj1.dropna(inplace=True)  # inplace defaults to False, which leaves the data itself unmodified
obj1
yang     19.0
tony     18.0
cloud    21.0
dtype: float64
    
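obj1.fillna(666)   # also leaves the original data unchanged by default; add inplace=True to modify it directly
yang     19.0
tony     18.0
cloud    21.0
dtype: float64
# obj1 was already modified by dropna(inplace=True) above, so there is no NaN left to fill here
# For comparison, obj1 as it was before that call; on this data fillna(666) would replace the NaN for satan with 666.0
yang     19.0
tony     18.0
satan     NaN
cloud    21.0
dtype: float64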

'''
Of the four methods above,
     methods 3 and 4 are used most often
'''
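
The four methods can be compared side by side on a fresh Series that still contains its missing value. This is a sketch with an illustrative Series named `scores`; none of the calls modify it, because `inplace` is left at its default.

```python
import numpy as np
import pandas as pd

scores = pd.Series([19.0, 18.0, np.nan, 21.0],
                   index=["yang", "tony", "satan", "cloud"])

missing = scores.isnull()      # True only for satan
present = scores.notnull()     # the opposite mask
dropped = scores.dropna()      # new Series without the satan row
filled = scores.fillna(666)    # new Series with the NaN replaced by 666.0

print(dropped)
print(filled)
print(scores.isnull().sum())   # the original still holds its one NaN
```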

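Series features

Operations are largely consistent with numpy
1. Creating a Series directly from an ndarray: Series(array)
         A Series can be built directly from a one-dimensional numpy array (it must be one-dimensional)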
         
        res = pd.Series(np.array([1,2,3,4,5,6]))
        res
        0    1
        1    2
        2    3
        3    4
        4    5
        5    6
        dtype: int32
            
        res1 = pd.Series(np.array([[1,2,3,4],[5,6,7,8]]))
        res1    # raises an error
        Exception                                 Traceback (most recent call last)
<ipython-input-30-e48b7149c3f9> in <module>
----> 1 res1 = pd.Series(np.array([[1,2,3,4],[5,6,7,8]]))
      2 res1

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    309                     data = data.copy()
    310             else:
--> 311                 data = sanitize_array(data, index, dtype, copy, raise_cast_failure=True)
    312 
    313                 data = SingleBlockManager(data, index, fastpath=True)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in sanitize_array(data, index, dtype, copy, raise_cast_failure)
    727     elif subarr.ndim > 1:
    728         if isinstance(data, np.ndarray):
--> 729             raise Exception("Data must be 1-dimensional")
    730         else:
    731             subarr = com.asarray_tuplesafe(data, dtype=dtype)

Exception: Data must be 1-dimensional

2. Arithmetic with a scalar
    res = pd.Series([11,22,33,44,55])
    res
    0    11
    1    22
    2    33
    3    44
    4    55
    dtype: int64
    res * 2
    0     22
    1     44
    2     66
    3     88
    4    110
    dtype: int64
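3. Arithmetic between two Series
    res * res
    0     121
    1     484
    2    1089
    3    1936
    4    3025
    dtype: int64

    res1 = pd.Series([1,2,3,4],index=['a','b','c','d'])
    res * res1   # values are matched by label; these two Series share no labels, so every result is NaN
    0   NaN
    1   NaN
    2   NaN
    3   NaN
    4   NaN
    a   NaN
    b   NaN
    c   NaN
    d   NaN
    dtype: float64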
4. The universal function abs
    res3 = pd.Series([-1,-2,-3,-4,5,6])
    res3.abs()
    0    1
    1    2
    2    3
    3    4
    4    5
    5    6
    dtype: int64
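5. Boolean indexing
6. Statistical functions
7. Creating a Series from a dictionary: Series(dic)
8. The in operator
    res4 = pd.Series({'username':'satan','password':987})
    res4
    username    satan
    password      987
    dtype: object
    'username' in res4   # in checks the labels
    True

    for i in res4: # unlike a Python dictionary, iterating yields the data rather than the labels
        print(i)
    satan
    987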
9. Label indexing and slicing
10. Other functions, etc.
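
Items 5, 6 and 9 in the list have no example in these notes, so here is a minimal sketch of each, plus one case of label alignment where the two Series do share some labels. The Series `s` and `other` are invented for illustration.

```python
import pandas as pd

s = pd.Series([11, 22, 33, 44, 55], index=["a", "b", "c", "d", "e"])

big = s[s > 30]            # boolean indexing: keep the rows where the condition is True
total = s.sum()            # statistical functions work as they do in numpy
average = s.mean()
one = s["b"]               # label indexing
part = s["b":"d"]          # label slicing includes the end label
by_pos = s.iloc[1:3]       # position slicing excludes the end position

# Alignment: only labels present in both Series produce a number
other = pd.Series([1, 2, 3], index=["a", "b", "z"])
aligned = s + other

print(big.tolist(), total, average, one)
print(part.tolist(), by_pos.tolist())
print(aligned)
```

The difference between `s["b":"d"]` and `s.iloc[1:3]` is easy to trip over: slicing by label keeps the end point, slicing by position does not.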

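Additional notes

# If Jupyter will not start properly after you install Anaconda on your machine
1. First check in a cmd window whether these commands run normally
    ipython
    jupyter notebook
2. The Anaconda application itself needs a fair amount of system resources
3. Your default browser may not be set up correctly

Type jupyter notebook in a cmd window to start a server without going through Anaconda
Copy the URL it prints into the browser's address bar by hand
and use it directly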