Rookie Diary: 《Python資料分析與挖掘實戰》 (Python Data Analysis and Mining in Action), Lab 6-1, Lagrange Interpolation
阿新 • Published: 2018-11-10
Lab 6-1: Lagrange Interpolation
Problem: use Lagrange interpolation to fill in the empty cells of the table in missing_data.xls.
# p1, lab6
# Fill all of the null values with Lagrange interpolation
# Data file name is "missing_data.xls"

import pandas as pd
from scipy.interpolate import lagrange

# named base_dir because dir is a Python built-in that a variable called dir would shadow
base_dir = 'F:/Data Mining/codes/ch6/lab6_1'
# header=None indicates that the table has no header row
data = pd.read_excel(base_dir + '/data/missing_data.xls', header=None)


def lagrange_interpolate(s, n, k=5):
    # take up to k labels on each side of n; labels outside the table come back as NaN
    # (reindex is used because indexing with a list of missing labels raises KeyError in recent pandas)
    y = s.reindex(list(range(n - k, n)) + list(range(n + 1, n + 1 + k)))
    # y.notnull() returns a boolean Series, so this drops the null samples
    y = y[y.notnull()]
    # lagrange(x, w) in scipy.interpolate:
    #   x: array-like, x-coordinates of the sample points
    #   w: array-like, y-coordinates of the sample points
    #   returns a numpy poly1d object, the Lagrange interpolating polynomial
    #   WARNING (from the SciPy docs): this implementation is numerically unstable,
    #   do not expect to be able to use more than about 20 points
    # calling the poly1d object with n evaluates the polynomial at x = n
    return lagrange(y.index, list(y))(n)


for col in data.columns:
    for i in range(len(data)):
        # Series.isnull() returns a boolean Series
        # mistake I made once: leaving out [col] here, which yields a DataFrame instead of a Series
        if data[col].isnull()[i]:
            # data[col][i] can read the element, but assigning through it is chained indexing
            # and may write to a copy, so .loc is used for the write
            data.loc[i, col] = lagrange_interpolate(data[col], i)

# header=None and index=False write the table without header row and index column
data.to_excel(base_dir + '/data/result.xls', header=None, index=False)
Files: missing_data.xls (input), result.xls (output)
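To see what the interpolating polynomial actually computes, here is a minimal pure-Python sketch of the Lagrange formula, P(x) = sum over j of y_j * product over m != j of (x - x_m) / (x_j - x_m). The function name lagrange_eval and the toy data (points on y = x**2 with x = 2 missing) are my own illustration, not part of the lab code, which relies on scipy.interpolate.lagrange instead.

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial through (xs, ys) at x.

    Hypothetical helper for illustration only.
    """
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        # basis polynomial l_j(x): equals 1 at xj and 0 at every other sample point
        basis = 1.0
        for m, xm in enumerate(xs):
            if m != j:
                basis *= (x - xm) / (xj - xm)
        total += yj * basis
    return total


# Toy samples taken from y = x**2, with the point at x = 2 missing
xs = [0, 1, 3, 4]
ys = [0.0, 1.0, 9.0, 16.0]
filled = lagrange_eval(xs, ys, 2)
print(filled)  # approximately 4.0, since the samples lie on a parabola
```

Because the four samples lie exactly on a parabola, the cubic through them reduces to x**2 and the missing value comes back as 4 (up to floating-point rounding).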
What did I learn?
- Lagrange interpolation https://blog.csdn.net/xidiancoder/article/details/71244316
- Controlling the header and index when importing and exporting Excel tables
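pd.read_excel(header=None) says that the table being read has no header row; otherwise the first row of missing_data.xls is treated as the header.
df.to_excel(header=None, index=False) writes the table without header and index; otherwise result.xls gets a header row and shows the index in the leftmost column.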
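The effect of these parameters can be checked on a tiny in-memory table. This sketch is an assumption-laden stand-in: it uses read_csv and to_csv on a made-up two-row table so that it runs without an Excel engine installed; the header and index parameters play the same role in read_excel and to_excel.

```python
import io

import pandas as pd

raw = "1,2\n3,4\n"  # made-up table with no header row

# Default: the first row is consumed as column names, leaving one data row
with_header = pd.read_csv(io.StringIO(raw))
# header=None: every row is data, columns are numbered 0, 1, ...
no_header = pd.read_csv(io.StringIO(raw), header=None)

print(with_header.shape)  # (1, 2)
print(no_header.shape)    # (2, 2)

# Default export adds a header row and an index column
default_out = no_header.to_csv()
# header=False, index=False writes only the data
bare_out = no_header.to_csv(header=False, index=False)

print(default_out.splitlines())  # [',0,1', '0,1,2', '1,3,4']
print(bare_out.splitlines())     # ['1,2', '3,4']
```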
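- What isnull() and notnull() return
Both are methods of DataFrame and Series for testing null values, and each returns a boolean DataFrame or Series of the same shape as the caller. isnull() marks null positions as True and everything else as False; notnull() does the opposite.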
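A short sketch of both methods on made-up data, showing the boolean result and how it is used as a filter (the same pattern as y[y.notnull()] in the lab code):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

mask = s.isnull()        # boolean Series: True where the value is null
kept = s[s.notnull()]    # boolean indexing keeps only the non-null entries

print(mask.tolist())        # [False, True, False]
print(kept.index.tolist())  # [0, 2]

df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 2.0]})
df_mask = df.isnull()    # on a DataFrame the result is a boolean DataFrame

print(type(df_mask).__name__)    # DataFrame
print(df_mask.values.tolist())   # [[False, True], [True, False]]
```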
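- Locating elements in a DataFrame
data[column][index] locates the element in the column named column at the index label index. For assignment, data.loc[index, column] is the safer form, because chained indexing may end up writing to a temporary copy.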
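A minimal sketch on a made-up 2x2 table: reading through data[column][index], and writing through .loc, which addresses the cell in a single indexing step (row label first, then column label).

```python
import pandas as pd

# Made-up table with integer column names, as produced by header=None
data = pd.DataFrame({0: [10.0, 20.0], 1: [30.0, 40.0]})

# Read: column 1 first, then index label 0
value = data[1][0]
print(value)  # 30.0

# Write: one .loc call with (index label, column label)
data.loc[0, 1] = 99.0
print(data[1][0])  # 99.0
```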
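- Out-of-bounds indexes when extracting data
In the lagrange_interpolate() function above, the first line extracts the sample points, and clearly both (n-k) and (n+k) can fall outside the table. Debugging showed, however, that the positions for out-of-bounds indexes come back as null values, so the next statement, which drops nulls, removes them as well. This depends on the pandas version: recent releases raise KeyError when a list indexer contains missing labels, and Series.reindex gives the NaN-filling behaviour explicitly.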
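The out-of-bounds behaviour described above can be reproduced explicitly with Series.reindex. This is a sketch on made-up data with a small window (k = 2), assuming a recent pandas where indexing with a list that contains missing labels raises KeyError instead of returning nulls.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])
n, k = 1, 2  # interpolate at position 1 with up to 2 neighbours per side

# Neighbour labels around n; -1 does not exist in the Series
labels = list(range(n - k, n)) + list(range(n + 1, n + 1 + k))
print(labels)  # [-1, 0, 2, 3]

# reindex returns NaN for labels that are not in the index
window = s.reindex(labels)

# Dropping nulls removes the out-of-bounds entry
y = window[window.notnull()]
print(y.index.tolist())  # [0, 2, 3]
print(y.tolist())        # [1.0, 3.0, 4.0]
```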