pandas read_excel()和to_excel()函式解析

阿新 • • 發佈：2020-01-09

前言

資料分析時候，需要將資料進行載入和儲存，本文主要介紹和excel的互動。

read_excel()

載入函式為read_excel(),其具體引數如下。

read_excel(io,sheetname=0,header=0,skiprows=None,skip_footer=0,index_col=None,names=None,parse_cols=None,parse_dates=False,date_parser=None,na_values=None,thousands=None,convert_float=True,has_index_names=None,converters=None,dtype=None,true_values=None,false_values=None,engine=None,squeeze=False,**kwds)

常用引數解析：

io : string,path object ; excel 路徑。
sheetname : string,int,mixed list of strings/ints,or None,default 0 返回多表使用sheetname=[0,1],若sheetname=None是返回全表注意：int/string 返回的是dataframe，而none和list返回的是dict of dataframe
header : int,list of ints,default 0 指定列名行，預設0，即取第一行，資料為列名行以下的資料若資料不含列名，則設定 header = None

skiprows : list-like,Rows to skip at the beginning，省略指定行數的資料
skip_footer : int,default 0,省略從尾部數的int行資料
index_col : int,default None指定列為索引列，也可以使用u”strings”
names : array-like,default None,指定列的名字。

資料來源：

sheet1:
ID NUM-1  NUM-2  NUM-3
36901  142 168 661
36902  78 521 602
36903  144 600 521
36904  95 457 468
36905  69 596 695

sheet2：
ID NUM-1  NUM-2  NUM-3
36906  190 527 691
36907  101 403 470

(1)函式原型

basestation ="F://pythonBook_PyPDAM/data/test.xls"
data = pd.read_excel(basestation)
print data

輸出：是一個dataframe

   ID NUM-1 NUM-2 NUM-3
0 36901  142  168  661
1 36902   78  521  602
2 36903  144  600  521
3 36904   95  457  468
4 36905   69  596  695

(2) sheetname引數：返回多表使用sheetname=[0,若sheetname=None是返回全表注意：int/string 返回的是dataframe，而none和list返回的是dict of dataframe

data_1 = pd.read_excel(basestation,sheetname=[0,1])
print data_1
print type(data_1)

輸出：dict of dataframe

OrderedDict([(0,ID NUM-1 NUM-2 NUM-3
0 36901  142  168  661
1 36902   78  521  602
2 36903  144  600  521
3 36904   95  457  468
4 36905   69  596  695),(1,ID NUM-1 NUM-2 NUM-3
0 36906  190  527  691
1 36907  101  403  470)])

（3）header引數:指定列名行，預設0，即取第一行，資料為列名行以下的資料若資料不含列名，則設定 header = None ，注意這裡還有列名的一行。

data = pd.read_excel(basestation,header=None)
print data
輸出：
    0   1   2   3
0   ID NUM-1 NUM-2 NUM-3
1 36901  142  168  661
2 36902   78  521  602
3 36903  144  600  521
4 36904   95  457  468
5 36905   69  596  695

data = pd.read_excel(basestation,header=[3])
print data
輸出：
  36903 144  600  521 
0 36904   95  457  468
1 36905   69  596  695

（4） skiprows 引數：省略指定行數的資料

data = pd.read_excel(basestation,skiprows = [1])
print data
輸出:
   ID NUM-1 NUM-2 NUM-3
0 36902   78  521  602
1 36903  144  600  521
2 36904   95  457  468
3 36905   69  596  695

(5)skip_footer引數：省略從尾部數的int行的資料

data = pd.read_excel(basestation,skip_footer=3)
print data
輸出：
   ID NUM-1 NUM-2 NUM-3
0 36901  142  168  661
1 36902   78  521  602

（6）index_col引數：指定列為索引列，也可以使用u”strings”

data = pd.read_excel(basestation,index_col="NUM-3")
print data
輸出：
     ID NUM-1 NUM-2
NUM-3           
661  36901  142  168
602  36902   78  521
521  36903  144  600
468  36904   95  457
695  36905   69  596

（7）names引數：指定列的名字。

data = pd.read_excel(basestation,names=["a","b","c","e"])
print data
    a  b  c  e
0 36901 142 168 661
1 36902  78 521 602
2 36903 144 600 521
3 36904  95 457 468
4 36905  69 596 695

具體引數如下：

>>> print help(pandas.read_excel)
Help on function read_excel in module pandas.io.excel:

read_excel(io,**kwds)
Read an Excel table into a pandas DataFrame

Parameters
----------
io : string,path object (pathlib.Path or py._path.local.LocalPath),file-like object,pandas ExcelFile,or xlrd workbook.
The string could be a URL. Valid URL schemes include http,ftp,s3,and file. For file URLs,a host is expected. For instance,a local
file could be file://localhost/path/to/workbook.xlsx
sheetname : string,default 0

Strings are used for sheet names,Integers are used in zero-indexed
sheet positions.

Lists of strings/integers are used to request multiple sheets.

Specify None to get all sheets.

str|int -> DataFrame is returned.
list|None -> Dict of DataFrames is returned,with keys representing
sheets.

Available Cases

* Defaults to 0 -> 1st sheet as a DataFrame
* 1 -> 2nd sheet as a DataFrame
* "Sheet1" -> 1st sheet as a DataFrame
* [0,1,"Sheet5"] -> 1st,2nd & 5th sheet as a dictionary of DataFrames
* None -> All sheets as a dictionary of DataFrames

header : int,default 0
Row (0-indexed) to use for the column labels of the parsed
DataFrame. If a list of integers is passed those row positions will
be combined into a ``MultiIndex``
skiprows : list-like
Rows to skip at the beginning (0-indexed)
skip_footer : int,default 0
Rows at the end to skip (0-indexed)
index_col : int,default None
Column (0-indexed) to use as the row labels of the DataFrame.
Pass None if there is no such column. If a list is passed,those columns will be combined into a ``MultiIndex``. If a
subset of data is selected with ``parse_cols``,index_col
is based on the subset.
names : array-like,default None
List of column names to use. If file contains no header row,then you should explicitly pass header=None
converters : dict,default None
Dict of functions for converting values in certain columns. Keys can
either be integers or column labels,values are functions that take one
input argument,the Excel cell content,and return the transformed
content.
dtype : Type name or dict of column -> type,default None
Data type for data or columns. E.g. {'a': np.float64,'b': np.int32}
Use `object` to preserve data as stored in Excel and not interpret dtype.
If converters are specified,they will be applied INSTEAD
of dtype conversion.

.. versionadded:: 0.20.0

true_values : list,default None
Values to consider as True

.. versionadded:: 0.19.0

false_values : list,default None
Values to consider as False

.. versionadded:: 0.19.0

parse_cols : int or list,default None
* If None then parse all columns,* If int then indicates last column to be parsed
* If list of ints then indicates list of column numbers to be parsed
* If string then indicates comma separated list of Excel column letters and
column ranges (e.g. "A:E" or "A,C,E:F"). Ranges are inclusive of
both sides.
squeeze : boolean,default False
If the parsed data only contains one column then return a Series
na_values : scalar,str,list-like,or dict,default None
Additional strings to recognize as NA/NaN. If dict passed,specific
per-column NA values. By default the following values are interpreted
as NaN: '','#N/A','#N/A N/A','#NA','-1.#IND','-1.#QNAN','-NaN','-nan','1.#IND','1.#QNAN','N/A','NA','NULL','NaN','nan'.
thousands : str,default None
Thousands separator for parsing string columns to numeric. Note that
this parameter is only necessary for columns stored as TEXT in Excel,any numeric columns will automatically be parsed,regardless of display
format.
keep_default_na : bool,default True
If na_values are specified and keep_default_na is False the default NaN
values are overridden,otherwise they're appended to.
verbose : boolean,default False
Indicate number of NA values placed in non-numeric columns
engine: string,default None
If io is not a buffer or path,this must be set to identify io.
Acceptable values are None or xlrd
convert_float : boolean,default True
convert integral floats to int (i.e.,1.0 --> 1). If False,all numeric
data will be read in as floats: Excel stores all numbers as floats
internally
has_index_names : boolean,default None
DEPRECATED: for version 0.17+ index names will be automatically
inferred based on index_col. To read Excel output from 0.16.2 and
prior that had saved index names,use True.

Returns

to_excel()

儲存函式為pd.DataFrame.to_excel()，注意，必須是DataFrame寫入excel,即Write DataFrame to an excel sheet。其具體引數如下：

to_excel(self,excel_writer,sheet_name='Sheet1',na_rep='',float_format=None,columns=None,header=True,index=True,index_label=None,startrow=0,startcol=0,merge_cells=True,encoding=None,inf_rep='inf',verbose=True,freeze_panes=None)

常用引數解析

- excel_writer : string or ExcelWriter object File path or existing ExcelWriter目標路徑
- sheet_name : string,default ‘Sheet1' Name of sheet which will contain DataFrame,填充excel的第幾頁
- na_rep : string,default ”,Missing data representation 缺失值填充
- float_format : string,default None Format string for floating point numbers
- columns : sequence,optional，Columns to write 選擇輸出的的列。
- header : boolean or list of string,default True Write out column names. If a list of string is given it is assumed to be aliases for the column names
- index : boolean,default True，Write row names (index)
- index_label : string or sequence,default None， Column label for index column(s) if desired. If None is given,andheader and index are True,then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.
- startrow :upper left cell row to dump data frame
- startcol :upper left cell column to dump data frame
- engine : string,default None ，write engine to use - you can also set this via the options，io.excel.xlsx.writer,io.excel.xls.writer,andio.excel.xlsm.writer.
- merge_cells : boolean,default True Write MultiIndex and Hierarchical Rows as merged cells.
- encoding: string,default None encoding of the resulting excel file. Only necessary for xlwt,other writers support unicode natively.
- inf_rep : string,default ‘inf' Representation for infinity (there is no native representation for infinity in Excel)
- freeze_panes : tuple of integer (length 2),default None Specifies the one-based bottommost row and rightmost column that is to be frozen

資料來源：

  ID NUM-1  NUM-2  NUM-3
0  36901  142 168 661
1  36902  78 521 602
2  36903  144 600 521
3  36904  95 457 468
4  36905  69 596 695
5  36906  165 453 

載入資料：
basestation ="F://python/data/test.xls"
basestation_end ="F://python/data/test_end.xls"
data = pd.read_excel(basestation)

（1）引數excel_writer，輸出路徑。

data.to_excel(basestation_end)
輸出：
  ID NUM-1  NUM-2  NUM-3
0  36901  142 168 661
1  36902  78 521 602
2  36903  144 600 521
3  36904  95 457 468
4  36905  69 596 695
5  36906  165 453

（2）sheet_name，將資料儲存在excel的那個sheet頁面。

data.to_excel(basestation_end,sheet_name="sheet2")

（3）na_rep，缺失值填充

data.to_excel(basestation_end,na_rep="NULL")
輸出：
  ID NUM-1  NUM-2  NUM-3
0  36901  142 168 661
1  36902  78 521 602
2  36903  144 600 521
3  36904  95 457 468
4  36905  69 596 695
5  36906  165 453 NULL

(4) colums引數： sequence,optional，Columns to write 選擇輸出的的列。

data.to_excel(basestation_end,columns=["ID"])
輸出
  ID
0  36901
1  36902
2  36903
3  36904
4  36905
5  36906

（5）header 引數： boolean or list of string，預設為True,可以用list命名列的名字。header = False 則不輸出題頭。

data.to_excel(basestation_end,header=["a","d"])
輸出：
  a  b  c  d
0  36901  142 168 661
1  36902  78 521 602
2  36903  144 600 521
3  36904  95 457 468
4  36905  69 596 695
5  36906  165 453 


data.to_excel(basestation_end,header=False,columns=["ID"])
header = False 則不輸出題頭
輸出：
0  36901
1  36902
2  36903
3  36904
4  36905
5  36906

（6）index : boolean,default True Write row names (index)

預設為True，顯示index，當index=False 則不顯示行索引（名字）。

index_label : string or sequence,default None

設定索引列的列名。

data.to_excel(basestation_end,index=False)
輸出：
ID NUM-1  NUM-2  NUM-3
36901  142 168 661
36902  78 521 602
36903  144 600 521
36904  95 457 468
36905  69 596 695
36906  165 453 

data.to_excel(basestation_end,index_label=["f"])
輸出：
f  ID NUM-1  NUM-2  NUM-3
0  36901  142 168 661
1  36902  78 521 602
2  36903  144 600 521
3  36904  95 457 468
4  36905  69 596 695
5  36906  165 453

以上就是本文的全部內容，希望對大家的學習有所幫助，也希望大家多多支援我們。

pandas read_excel()和to_excel()函式解析

pandas read_excel()和to_excel()函式解析

python next()和iter()函式原理解析

pandas實現excel中的資料透視表和Vlookup函式功能程式碼

Python urlencode和unquote函式使用例項解析

16-Pandas資料分組的函式應用（apply()、agg()和transform()、applymap()）

Pandas中loc和iloc函式用法詳解（原始碼+例項）

Django Model層F，Q物件和聚合函式原理解析

Pandas中loc和iloc函式的用法

5-Pandas資料分組的函式應用（df.apply()、df.agg()和df.transform()、df.applymap()）

插曲：Kafka的生產者案例和消費者原理解析

Java變數常量宣告和定義原理解析

Java多執行緒併發程式設計和鎖原理解析

springmvc模式的上傳和下載實現解析

java基礎知識 super和this使用解析

細談Mysql的儲存過程和儲存函式

使用SQL語句統計資料時sum和count函式中使用if判斷條件的講解

oracle中的greatest 函式和 least函式示例程式碼

Orancle的SQL語句之多表查詢和組函式

Oracle中的translate函式和replace函式的用法詳解

Oracle生成不重複票號與LPAD,RPAD與NEXTVAL函式解析

pandas read_excel()和to_excel()函式解析

相關推薦