【Pandas-Cookbook】04: Grouping and Aggregation
阿新 • Published: 2019-02-12
# -*-coding:utf-8-*-
# by kevinelstri
# 2017.2.16
# ---------------------
# Chapter 4: Find out on which weekday people bike the most with groupby and aggregate
# ---------------------
import pandas as pd
import matplotlib.pyplot as plt
"""
4.1 Adding a 'weekday' column to our dataframe
"""
bikes = pd.read_csv('../data/bikes.csv', sep=';', encoding='latin1', index_col='Date', parse_dates=['Date'],
                    dayfirst=True)
print(bikes.head())
bikes['Berri 1'].plot()  # plot the 'Berri 1' counts as a line chart
# plt.show()
berri_bikes = bikes[['Berri 1']].copy()  # copy the 'Berri 1' column into its own single-column DataFrame
print(berri_bikes[:5])
print(berri_bikes.index)
print(berri_bikes.index.day)  # day of the month
print(berri_bikes.index.weekday)  # day of the week: Monday=0 ... Sunday=6
berri_bikes.loc[:, 'weekday'] = berri_bikes.index.weekday
print(berri_bikes[:5])
"""
4.2 Adding up the cyclists by weekday
"""
"""
Use the DataFrame .groupby() method to group the rows by weekday, then compute the sum for each group
"""
weekday_counts = berri_bikes.groupby('weekday').aggregate('sum')  # 'sum' as a string avoids the pandas warning about the builtin sum
print(weekday_counts)
weekday_counts.index = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
print(weekday_counts)
weekday_counts.plot(kind='bar')
# plt.show()
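"""
A minimal sketch of the same groupby-and-sum step on a tiny made-up dataset, so the
result can be checked by hand. The names `toy` and `toy_totals` and all of the values
are hypothetical; nothing here comes from bikes.csv.
"""

```python
import pandas as pd

# Hypothetical data: 14 consecutive days starting on Monday 2012-01-02,
# with made-up counts 1, 2, ..., 14.
dates = pd.date_range('2012-01-02', periods=14, freq='D')
toy = pd.DataFrame({'count': range(1, 15)}, index=dates)

# Label each row with its weekday number (Monday=0 ... Sunday=6).
toy['weekday'] = toy.index.weekday

# Group the rows by weekday and add up the counts within each group.
toy_totals = toy.groupby('weekday')['count'].sum()
print(toy_totals)
```

"""
Each weekday occurs twice in these 14 days, so each total is the sum of two counts
(Monday is 1 + 8, Sunday is 7 + 14).
"""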
"""
4.3 Putting it together
"""
"""
All of the code above, put together
"""
bikes = pd.read_csv('../data/bikes.csv', sep=';', encoding='latin1', index_col='Date', dayfirst=True,
                    parse_dates=['Date'])
berri_bikes = bikes[['Berri 1']].copy()
berri_bikes.loc[:, 'weekday'] = berri_bikes.index.weekday
weekday_counts = berri_bikes.groupby('weekday').aggregate('sum')
weekday_counts.index = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
weekday_counts.plot(kind='bar')
plt.show()
"""
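Analysis:
The core of this chapter is working with dates: label each row with its day of the week, group the rows by that label, and add each day's count to the total for its weekday
Steps:
1. Read the CSV data
2. Copy a single column into its own DataFrame
3. Label each row with its day of the week
4. Group by weekday, sum each group, and rename the index
5. Display the result as a bar chart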
"""