np.nan is an invalid document, expected byte or unicode string.

ValueError                                Traceback (most recent call last)
<ipython-input-12-1dc462ae8893> in <module>()
     15     print('cv prepared!')
     16     return df_x.astype(np.float64)
---> 17 df_test = get_feature(test_data,all_table,ready_cols,vec_col)
     18 df_train = get_feature(train_data,all_table,ready_cols,vec_col)

<ipython-input-12-1dc462ae8893> in get_feature(df, all_data, cols, vec_col)
      9     cv=CountVectorizer()
     10     for feature in vec_col:
---> 11         cv.fit(all_data[feature])
     12         df_a = cv.transform(df[feature])
     13         df_x = sparse.hstack((df_x, df_a))

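import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer

def get_feature(df, all_data, cols, vec_col):
    # Start from the already-numeric columns as an int64 array.
    df_x = np.int64(df[cols])
    cv = CountVectorizer()
    for feature in vec_col:
        # Fit the vocabulary on the full table, then encode this split.
        # cv.fit raises the ValueError above if the column contains NaN.
        cv.fit(all_data[feature])
        df_a = cv.transform(df[feature])
        df_x = sparse.hstack((df_x, df_a))
        print('Done Feature ' + str(feature))
    print('cv prepared!')
    return df_x.astype(np.float64)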

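Cause: my all_data contained NaN values. When loading the data I had called all_table.fillna(-1), expecting it to fill the missing values, but the values that were NaN in all_table did not change. After changing the call to all_table.fillna(-1), the code ran. Note that fillna returns a new DataFrame by default and leaves all_table itself untouched, so the fill only takes effect if the result is assigned back or inplace=True is passed.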
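The fillna behavior behind this error can be sketched as follows. This is a minimal illustration, not the original pipeline: the toy table and the column name `interest` are hypothetical, and the fill value is the string "-1" rather than the integer -1 used in the post, on the assumption that CountVectorizer should receive only string documents.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy table standing in for all_table; "interest" is an assumed column name.
all_table = pd.DataFrame(
    {"interest": ["apple banana", np.nan, "banana cherry"]}, dtype=object
)

# fillna returns a new DataFrame; calling it without using the result
# leaves all_table unchanged, so the NaN is still there afterwards.
all_table.fillna("-1")
nan_left = int(all_table["interest"].isna().sum())

# Assigning the result back is what actually removes the NaN.
# Assumption: a string fill value, so every document is a str.
filled = all_table.fillna("-1")
nan_after = int(filled["interest"].isna().sum())

# With no NaN left, CountVectorizer can fit on the column.
cv = CountVectorizer()
cv.fit(filled["interest"])
vocab = sorted(cv.vocabulary_)

print(nan_left, nan_after, vocab)
```

Checking `all_table[feature].isna().sum()` for each column in vec_col before calling cv.fit is a cheap way to confirm that the fill really took effect.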