Python: Introductory Data Analysis Case Studies

阿新 • Published: 2020-09-20

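Contents:

I. Stock Analysis

II. Designing a Dual Moving-Average Strategy

III. pandas Data-Cleaning Examples

IV. Final Case Study: Introductory Data Analysis with Python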

=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=.=

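I. Stock Analysis

Requirements:

1. Use the tushare package to fetch a stock's historical price data.

2. List every date on which the stock closed more than 3% above its open.

3. List every date on which the stock opened more than 2% below the previous day's close.

4. Suppose that, starting on 1 January 2010, I buy one lot on the first trading day of every month and sell all holdings on the last trading day of every year. What is my profit through the end of 2019?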

import tushare as ts
import pandas as pd
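# Fetch the historical price data for one stock
df = ts.get_k_data(code='600703', start='2000-01-01')  # code: the stock code, as a string
df.head()  # show the first 5 rows

# Save the downloaded data to a local file
df.to_csv(r'F:\pycharm\data\sagd.csv')  # the to_xxx methods write df to disk; the raw string keeps the backslashes literal
df = pd.read_csv(r'F:\pycharm\data\sagd.csv')  # read the CSV back into df
df.head()  # show the first 5 rows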

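# The data read back needs some cleanup
# to_csv also wrote the row index, which comes back as an extra 'Unnamed: 0' column; drop it
df.drop(labels='Unnamed: 0', axis=1, inplace=True)
# In drop, axis=1 removes a column and axis=0 removes a row
df.head()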
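
The extra column appears because to_csv writes the row index by default. A minimal sketch of avoiding it at write time, using hypothetical sample rows and an in-memory buffer in place of the real file:

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the downloaded price data
sample = pd.DataFrame({
    "date": ["2020-01-02", "2020-01-03"],
    "open": [10.0, 10.5],
    "close": [10.4, 10.2],
})

# Default to_csv writes the row index as an unnamed first column
with_index = pd.read_csv(io.StringIO(sample.to_csv()))

# index=False leaves the row index out, so nothing needs dropping later
without_index = pd.read_csv(io.StringIO(sample.to_csv(index=False)))

print(list(with_index.columns))
print(list(without_index.columns))
```

With index=False the file contains only the data columns, so the drop step becomes unnecessary.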

# Check the data type of each column
df.info()

# Convert the date column to datetime
df['date'] = pd.to_datetime(df['date'])
df.info()

# Use the date column as the row index
df.set_index('date', inplace=True)
df.head()  # show the first 5 rows
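
The conversion and the re-indexing can also be done while reading the file. A small sketch, assuming a CSV with a date column like the one above (the CSV text here is made up for illustration):

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the saved file
csv_text = "date,open,close\n2020-01-02,10.0,10.4\n2020-01-03,10.5,10.2\n"

# parse_dates converts the column to datetime; index_col makes it the row index
prices = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

print(prices.index.dtype)
print(prices.loc["2020-01"])  # a datetime index allows selecting rows by month
```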

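# List every date on which the stock closed more than 3% above its open
(df['close'] - df['open']) / df['open'] > 0.03  # returns a boolean Series

# Tip: whenever a step of the analysis produces booleans, use them right away as a row indexer on the source data
# Indexing df with a boolean Series keeps the rows marked True and drops the rows marked False

df.loc[(df['close'] - df['open']) / df['open'] > 0.03]
# the rows marked True, i.e. the rows that meet the requirement

df.loc[(df['close'] - df['open']) / df['open'] > 0.03].index
# the dates of those rows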
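
To make the boolean-indexing step concrete, here is a small sketch on three made-up trading days (the prices are hypothetical, chosen so only the first day rises more than 3%):

```python
import pandas as pd

# Hypothetical prices for three trading days
toy = pd.DataFrame(
    {"open": [10.0, 10.0, 20.0], "close": [10.5, 10.2, 19.0]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
)

# Intraday change relative to the open: about 5%, 2% and -5%
intraday_change = (toy["close"] - toy["open"]) / toy["open"]

# Boolean mask, then the dates of the rows it marks True
rose_3pct = intraday_change > 0.03
rise_dates = toy.loc[rose_3pct].index
print(rise_dates)
```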

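# List every date on which the stock opened more than 2% below the previous day's close
(df['open'] - df['close'].shift(1)) / df['close'].shift(1) < -0.02
# shift(n) moves the whole column by n rows: positive n shifts down, negative n shifts up

# Use the boolean Series as a row indexer to pull out the rows marked True
df.loc[(df['open'] - df['close'].shift(1)) / df['close'].shift(1) < -0.02]
df.loc[(df['open'] - df['close'].shift(1)) / df['close'].shift(1) < -0.02].index  # the dates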
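
The effect of shift(1) is easiest to see on a tiny example. In this sketch (hypothetical prices), the second day opens 3% below the first day's close:

```python
import pandas as pd

# Hypothetical prices for three trading days
toy = pd.DataFrame(
    {"open": [10.0, 9.7, 10.0], "close": [10.0, 10.0, 10.1]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
)

# Previous day's close, aligned to today's row; the first row has no predecessor
prev_close = toy["close"].shift(1)

# Overnight gap relative to the previous close
gap = (toy["open"] - prev_close) / prev_close

# Comparisons against NaN are False, so the first row is never selected
gap_down_dates = toy.loc[gap < -0.02].index
print(gap_down_dates)
```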

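Suppose that, starting on 1 January 2010, I buy one lot on the first trading day of every month and sell all holdings on the last trading day of every year. What is my profit through the end of 2019?

Time span: 2010-2019

One lot: 100 shares

Buying: a full year buys 1,200 shares

Selling: a full year sells 1,200 shares

Trade price for both buying and selling: the opening price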

new_df = df.loc['2010-01':'2019-12']  # full years 2010-2019; string slicing like this requires a datetime row index
new_df.head()

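# Buying: find the row for the first trading day of each month (to get its opening price), i.e. the first row of every month
# resample('M') groups the rows by calendar month and first() keeps the first row of each group
df_monthly = new_df.resample('M').first()  # newer pandas releases spell this alias 'ME'
df_monthly.head()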

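# Total amount spent buying shares (100 shares per month)
cost = df_monthly['open'].sum() * 100
cost

# The price at which the shares are sold: the last trading day of each year
df_yearly = new_df.resample('A').last()  # newer pandas releases spell this alias 'YE'
df_yearly

# Cash received from selling: the 1,200 shares bought during each year
resv = df_yearly['open'].sum() * 1200
resv

# Total profit
resv - cost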
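
The calculation above assumes every year contributes exactly 1,200 shares. A hedged sketch of the same plan that counts the shares actually bought each year instead, on made-up data; it swaps resample for a plain groupby on year and month, so it does not depend on how the resample aliases are spelled:

```python
import pandas as pd

# Hypothetical data: every business day of 2018-2019, opening price rising 0.01 per day
days = pd.bdate_range("2018-01-01", "2019-12-31")
toy = pd.DataFrame({"open": [10 + 0.01 * i for i in range(len(days))]}, index=days)

# First trading day of each (year, month) and last trading day of each year
first_of_month = toy.groupby([toy.index.year, toy.index.month]).first()
last_of_year = toy.groupby(toy.index.year).last()

# Shares bought each year: 100 for every month that actually has data
shares_per_year = first_of_month.groupby(level=0).size() * 100

cost = first_of_month["open"].sum() * 100
proceeds = (last_of_year["open"] * shares_per_year).sum()
profit = proceeds - cost
print(shares_per_year.tolist(), round(profit, 2))
```

Counting the months per year keeps the result correct when the first or last year is incomplete.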

II. Designing a Dual Moving-Average Strategy

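Requirements:

1. Compute the stock's 5-day and 30-day moving averages.

2. Find every golden-cross and death-cross date.

3. Starting on 1 January 2010 with 100,000 yuan, buy as much as possible at each golden cross and sell everything at each death cross, then work out the return.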

df = pd.read_csv(r'F:\pycharm\data\sagd.csv').drop(labels='Unnamed: 0', axis=1)
df.head()

# Convert the date column to datetime and use it as the row index
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
df.head()

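# Compute the 5-day and 30-day moving averages of the closing price
ma5 = df['close'].rolling(5).mean()
ma30 = df['close'].rolling(30).mean()
ma5.head(10)  # show the first 10 rows; the first 4 are NaN because the window is not yet full

# Cut off the leading rows where ma30 is still NaN
ma5 = ma5.iloc[30:]
ma30 = ma30.iloc[30:]
df = df.iloc[30:]  # keep df aligned with the moving averages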
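
A rolling mean leaves the first window-1 rows empty, which is why the leading rows are cut above. A tiny sketch with a 3-row window on made-up closing prices:

```python
import pandas as pd

# Hypothetical closing prices
closes = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# The first two values are NaN: the 3-row window is not yet full there
ma3 = closes.rolling(3).mean()

# dropna() removes exactly the unfilled leading rows, whatever the window size
ma3_clean = ma3.dropna()
print(ma3.tolist())
print(ma3_clean.tolist())
```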
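# Define golden crosses and death crosses
s1 = ma5 < ma30  # short average below long average
s2 = ma5 > ma30  # short average above long average

# Death cross: ma5 is below ma30 today and was above it on the previous trading day
death_ex = s1 & s2.shift(1, fill_value=False)
df.loc[death_ex]  # rows where a death cross occurs
death_date = df.loc[death_ex].index
death_date

# Golden cross: ma5 is above ma30 today and was below it on the previous trading day
golden_ex = s2 & s1.shift(1, fill_value=False)
golden_date = df.loc[golden_ex].index  # golden-cross dates
golden_date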
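
The crossing logic can be checked on a short made-up pair of averages. In this sketch (hypothetical values), the short average moves above the long one on the third day and back below it on the fifth:

```python
import pandas as pd

idx = pd.date_range("2020-01-01", periods=6, freq="D")
short_ma = pd.Series([1.0, 2.0, 4.0, 5.0, 3.0, 1.0], index=idx)  # hypothetical short average
long_ma = pd.Series([3.0, 3.0, 3.0, 3.0, 4.0, 4.0], index=idx)   # hypothetical long average

above = short_ma > long_ma
below = short_ma < long_ma

# A cross needs today's state plus the opposite state on the previous row
golden = above & below.shift(1, fill_value=False)
death = below & above.shift(1, fill_value=False)

golden_dates = golden[golden].index
death_dates = death[death].index
print(golden_dates, death_dates)
```

Filling the shifted first row with False means the first row can never be reported as a cross, since it has no previous day to compare against.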

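Suppose that, starting on 1 January 2010 with initial capital of 100,000 yuan, I buy as much as I can at every golden cross and sell everything at every death cross. What is my rate of return as of today?
Analysis:
The trade price for both buying and selling is the opening price.
When to trade: buy on golden-cross dates, sell on death-cross dates.
Will there be shares left unsold at the end?
Yes, if the last signal is a golden cross, because those shares are bought and never sold. Their estimated value has to be counted in the total profit.
The leftover shares are valued at the last day's closing price.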

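from pandas import Series
s1 = Series(data=1, index=golden_date)  # 1 marks a golden cross
s2 = Series(data=0, index=death_date)  # 0 marks a death cross

s = pd.concat([s1, s2])  # Series.append was removed in pandas 2.0
s = s.sort_index()  # golden-cross and death-cross signals in date order
s.head()

s = s['2010':'2020']  # keep only the signals from 2010 onward
s.head()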

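first_money = 100000  # initial capital, never changes
money = first_money  # cash on hand: purchases are deducted from it and sale proceeds are added to it
hold = 0  # number of shares held (100 shares = 1 lot)
for time, signal in s.items():  # time is the signal date; signal is 1 (golden cross: buy) or 0 (death cross: sell)
    if signal == 1:
        # Golden cross: buy as many whole lots as the available cash allows
        p = df.loc[time, 'open']  # purchase price per share: that day's opening price
        hand_count = money // (p * 100)  # largest number of whole lots the cash can buy
        bought = hand_count * 100
        hold += bought
        money -= bought * p  # deduct the purchase cost from money
    else:
        # Death cross: sell everything held
        p_death = df.loc[time, 'open']  # sale price per share: that day's opening price
        money += p_death * hold  # add the sale proceeds to money
        hold = 0
# If the last signal was a golden cross, hold is still positive here
last_money = hold * df['close'].iloc[-1]  # value of the leftover shares at the last closing price
# Total profit
money + last_money - first_money
# Rate of return
(money + last_money - first_money) / first_money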
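
The same loop can be wrapped in a function and checked by hand on a few rows. This is only a sketch: the function name backtest, its parameters, and the prices and signals below are all hypothetical, and it ignores fees and taxes just as the walkthrough does.

```python
import pandas as pd


def backtest(signals, prices, capital, lot_size=100):
    """Replay buy (1) / sell (0) signals; return (final_value, rate_of_return)."""
    cash = capital
    shares = 0
    for date, signal in signals.items():
        price = prices.loc[date, "open"]  # trade at that day's opening price
        if signal == 1:
            lots = int(cash // (price * lot_size))  # whole lots only
            shares += lots * lot_size
            cash -= lots * lot_size * price
        else:
            cash += shares * price
            shares = 0
    # Shares still held are valued at the last closing price
    final_value = cash + shares * prices["close"].iloc[-1]
    return final_value, (final_value - capital) / capital


# Hypothetical prices and signals: buy, sell, then buy again and keep holding
idx = pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"])
prices = pd.DataFrame(
    {"open": [10.0, 11.0, 12.0, 8.0], "close": [10.5, 11.5, 11.0, 9.0]}, index=idx
)
signals = pd.Series([1, 0, 1], index=idx[[0, 2, 3]])

final_value, rate = backtest(signals, prices, capital=10_000)
print(final_value, rate)
```

Here 1,000 shares are bought at 10 and sold at 12, then 1,500 shares are bought at 8 and valued at the last close of 9, so the final value can be verified by hand.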