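Confusion about time formats in Python
Time formats in Python get confusing, especially once pandas and numpy are involved, and working with time-stamped data quickly becomes dizzying. Below, drawing on answers from Stack Overflow, I record the differences between the time objects in Python's standard datetime module, in numpy, and in pandas.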
The datetime standard library of Python
It has only four main objects:
- time - a time of day only, measured in hours, minutes, seconds, and microseconds
- date - only year, month, and day
- datetime - combines everything in date and time
- timedelta - a duration whose largest unit is days
import datetime
datetime.time(hour=1,minute=25,second=61,microsecond=6333)
Traceback (most recent call last):
  File "<ipython-input-2-8e2667fea8f6>", line 1, in <module>
    datetime.time(hour=1,minute=25,second=61,microsecond=6333)
ValueError: second must be in 0..59
datetime.time(hour=1,minute=25,second=22,microsecond=6333)
Out[3]: datetime.time(1, 25, 22, 6333)
datetime.date(year=2018,month=9,day=23)
Out[4]: datetime.date(2018, 9, 23)
datetime.datetime(year=2018,month=9,day=23,hour=20,minute=22,second=30,microsecond=3155)
Out[5]: datetime.datetime(2018, 9, 23, 20, 22, 30, 3155)
datetime.timedelta(days=3,minutes=55)
Out[6]: datetime.timedelta(3, 3300)
datetime.timedelta(days=3,minutes=55) + datetime.datetime(year=2018,month=9,day=23,hour=20,minute=22,second=30,microsecond=3155)
Out[7]: datetime.datetime(2018, 9, 26, 21, 17, 30, 3155)
datetime.date(2018,9,23)
Out[8]: datetime.date(2018, 9, 23)
datetime.date(2018,23,9)
Traceback (most recent call last):
  File "<ipython-input-40-258ea9b432d0>", line 1, in <module>
    datetime.date(2018,23,9)
ValueError: month must be in 1..12
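As you can see, I tried second>59 along the way, and that is not allowed. Also, if you pass the arguments in the default order (year, month, day, hour, minute, second), you can leave out the keywords year=, month=, and so on. Swapping the positions, as in datetime.date(2018,23,9), puts 23 in the month slot and raises an error.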
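To make the "largest unit is days" description of timedelta concrete, here is a small sketch. A timedelta stores only days, seconds, and microseconds, so other constructor arguments are folded into those three fields. The specific values below are my own illustration, not from the session above.

```python
import datetime

# weeks, hours and minutes are accepted by the constructor,
# but they are normalized into days / seconds / microseconds.
delta = datetime.timedelta(weeks=1, days=3, hours=2, minutes=55)

print(delta.days)             # whole days in the duration
print(delta.seconds)          # leftover seconds within the last partial day
print(delta.total_seconds())  # the whole duration expressed in seconds
```

This is why Out[6] above shows only a day count and a second count even though minutes were passed in.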
Numpy's datetime64 and timedelta64 objects
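Numpy does not have separate date and time objects. A single datetime64 object represents a moment in time. The datetime object in the datetime module has microsecond precision (10^-6 s), while numpy's datetime64 supports units down to attoseconds (10^-18 s). It is more flexible and accepts more kinds of input.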
import numpy as np
np.datetime64(5,'ns')
Out[9]: numpy.datetime64('1970-01-01T00:00:00.000000005')
np.datetime64('2018-09-23')
Out[10]: numpy.datetime64('2018-09-23')
np.datetime64('2018-9-23')
Traceback (most recent call last):
  File "<ipython-input-11-5f3797908da0>", line 1, in <module>
    np.datetime64('2018-9-23')
ValueError: Error parsing datetime string "2018-9-23" at position 5
np.datetime64('2018/09/23')
Traceback (most recent call last):
  File "<ipython-input-12-fbe5ac53716b>", line 1, in <module>
    np.datetime64('2018/09/23')
ValueError: Error parsing datetime string "2018/09/23" at position 4
np.datetime64('2018-09-23 05:00')
Out[13]: numpy.datetime64('2018-09-23T05:00')
np.timedelta64(5,'D')
Out[15]: numpy.timedelta64(5,'D')
np.datetime64('2018-09-23 05:00') - np.datetime64('2018-09-23 04:00:59')
Out[16]: numpy.timedelta64(3541,'s')
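This shows that datetime64 is quite strict about the time format. An integer input must come with a unit, and a string converted directly must follow the zero-padded form xxxx-xx-xx xx:xx:xx. For example, writing 2018-09-24 as 2018-9-24 fails.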
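If the input strings are not zero-padded, one possible workaround is to parse them with the standard library first and hand the result to numpy. This is only a sketch: the variable names are mine, and it assumes the string really is in year-month-day order.

```python
import datetime
import numpy as np

raw = "2018-9-23"  # not zero-padded; rejected by np.datetime64 in the session above

try:
    np.datetime64(raw)
    rejected = False
except ValueError:
    rejected = True

# strptime accepts the unpadded month, and a date object converts cleanly.
parsed = datetime.datetime.strptime(raw, "%Y-%m-%d").date()
as_dt64 = np.datetime64(parsed)

print(rejected)
print(as_dt64)
```

Going through a date object keeps the result at day resolution. Passing a full datetime instead would give a microsecond-resolution datetime64.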
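Timestamp and Timedelta in pandas
These two build on numpy's time types. A pandas Timestamp also represents a single moment in time. It is very similar to datetime but has more functionality, and you can construct one with pd.Timestamp or pd.to_datetime.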
import pandas as pd
pd.Timestamp(1234.1256537)#default ns
Out[19]: Timestamp('1970-01-01 00:00:00.000001234')
pd.Timestamp(1234.1256537, unit='h')#change units
Out[21]: Timestamp('1970-02-21 10:07:32.354399999')
pd.Timestamp('2018-9-23 5:00')
Out[22]: Timestamp('2018-09-23 05:00:00')
pd.to_datetime('2018-9-23 5:00')
Out[23]: Timestamp('2018-09-23 05:00:00')
pd.to_datetime(['2018-9-23 5:00','2018-9-23 15:00'])
Out[24]: DatetimeIndex(['2018-09-23 05:00:00', '2018-09-23 15:00:00'], dtype='datetime64[ns]', freq=None)
pd.to_datetime(['2018-9-23 5:00'])
Out[25]: DatetimeIndex(['2018-09-23 05:00:00'], dtype='datetime64[ns]', freq=None)
pd.to_datetime(['2018-9-23 5:00','2018-9-23 15:00'])[0]
Out[26]: Timestamp('2018-09-23 05:00:00')
a = pd.DataFrame([['2018-9-24 12:00',1,3],['2018-9-24 11:00',2,4],['2018-9-24 10:00',5,9]],columns=['date','num1','num2'])
a
Out[27]:
date num1 num2
0 2018-9-24 12:00 1 3
1 2018-9-24 11:00 2 4
2 2018-9-24 10:00 5 9
a.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
date 3 non-null object
num1 3 non-null int64
num2 3 non-null int64
dtypes: int64(2), object(1)
memory usage: 152.0+ bytes
b = a.date.apply(lambda x:pd.Timestamp(x))
b
Out[28]:
0 2018-09-24 12:00:00
1 2018-09-24 11:00:00
2 2018-09-24 10:00:00
Name: date, dtype: datetime64[ns]
b[0]
Out[29]: Timestamp('2018-09-24 12:00:00')
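This shows that pandas is not strict about the time format, since 2018-9-24 is accepted. But it also raises a point that puzzles me: for a Series, the output reports the dtype as datetime64[ns], yet each individual element is a Timestamp???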
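As far as I understand it, the two observations do not contradict each other. A pandas Series stores the whole column as a numpy datetime64 array, which is what the dtype reports. When a single element is pulled out, pandas wraps it in a Timestamp. The sketch below tries to show both views. It uses pd.to_datetime on the whole column instead of apply, which is my substitution and not what the session above did.

```python
import numpy as np
import pandas as pd

# Vectorized conversion of a column of strings.
s = pd.to_datetime(pd.Series(["2018-9-24 12:00", "2018-9-24 11:00"]))

print(s.dtype)            # a datetime64 dtype for the column as a whole
print(type(s[0]))         # element access returns a pandas Timestamp
print(type(s.values[0]))  # the underlying numpy array holds numpy.datetime64
```

So datetime64 describes the storage, and Timestamp is the scalar type pandas hands back for one value.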
Convert Python datetime to datetime64 and Timestamp
Both conversions are simple, as shown below.
dt = datetime.datetime(2018,9,24,13,39,40,34676)
dt
Out[59]: datetime.datetime(2018, 9, 24, 13, 39, 40, 34676)
np.datetime64(dt)
Out[60]: numpy.datetime64('2018-09-24T13:39:40.034676')
pd.Timestamp(dt)
Out[61]: Timestamp('2018-09-24 13:39:40.034676')
pd.to_datetime(dt)
Out[62]: Timestamp('2018-09-24 13:39:40.034676')
Convert datetime64 to datetime and Timestamp
The former is more cumbersome: first convert to a float, then to a datetime. The latter is easier: use pd.Timestamp or pd.to_datetime().
dt64 = np.datetime64('2017-10-24 05:34:00.136562')
dt64
Out[30]: numpy.datetime64('2017-10-24T05:34:00.136562')
unix_epoch = np.datetime64(0, 's')
one_second = np.timedelta64(1, 's')
seconds_since_epoch = (dt64 - unix_epoch) / one_second
seconds_since_epoch
Out[32]: 1508823240.1365621
datetime.datetime.utcfromtimestamp(seconds_since_epoch)
Out[33]: datetime.datetime(2017, 10, 24, 5, 34, 0, 136562)
pd.to_datetime(dt64)
Out[34]: Timestamp('2017-10-24 05:34:00.136562')
pd.Timestamp(dt64)
Out[35]: Timestamp('2017-10-24 05:34:00.136562')
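A shorter route from datetime64 to datetime that avoids the float step is numpy's own scalar conversion. This is a sketch, not the method used above. It relies on .item() returning a datetime.datetime for microsecond resolution, so the value is cast to microseconds first. A nanosecond value would come back as a plain integer.

```python
import datetime
import numpy as np

dt64 = np.datetime64("2017-10-24 05:34:00.136562")

# Cast to microsecond resolution, then unwrap to a Python datetime.
as_datetime = dt64.astype("datetime64[us]").item()

print(as_datetime)
```

It also sidesteps datetime.datetime.utcfromtimestamp, which is deprecated as of Python 3.12.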
Convert Timestamp to datetime and datetime64
This one is also fairly simple, as the code shows.
ts = pd.Timestamp('2018-9-24 10:22:46.3654')
ts.to_pydatetime()#python's datetime
Out[37]: datetime.datetime(2018, 9, 24, 10, 22, 46, 365400)
ts.to_datetime64()
Out[38]: numpy.datetime64('2018-09-24T10:22:46.365400000')
Can these types all be compared with each other??
dt64
Out[63]: numpy.datetime64('2017-10-24T05:34:00.136562')
dt
Out[64]: datetime.datetime(2018, 9, 24, 13, 39, 40, 34676)
ts
Out[65]: Timestamp('2018-09-24 10:22:46.365400')
dt64>dt
Out[66]: False
ts>dt
Out[67]: False
ts>dt64
Out[68]: True
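So when two values each print as a Timestamp on their own, why does comparing them raise an error saying that float and Timestamp cannot be compared??
I'm a bit confused.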
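The code that produced that error is not shown, so the following is only a guess at one way it can happen: one side is actually a float (for example a Unix time in seconds, or a NaN in a column) and not a Timestamp. The sketch converts everything to Timestamp before comparing and shows the float case failing. The seconds value is a hypothetical illustration.

```python
import datetime
import numpy as np
import pandas as pd

dt = datetime.datetime(2018, 9, 24, 13, 39, 40, 34676)
dt64 = np.datetime64("2017-10-24 05:34:00.136562")
ts = pd.Timestamp("2018-9-24 10:22:46.3654")

# Converting everything to Timestamp first keeps the comparison unambiguous.
latest = max(pd.Timestamp(dt), pd.Timestamp(dt64), ts)

# A float on one side (hypothetical Unix time in seconds) is not comparable.
seconds = 1508823240.0
try:
    ts > seconds
    comparable = True
except TypeError:
    comparable = False

# Converting the float with an explicit unit makes it a Timestamp again.
fixed = ts > pd.Timestamp(seconds, unit="s")

print(latest, comparable, fixed)
```

If the real data came from a DataFrame column, checking the column's dtype and looking for missing values would be a reasonable first step.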