A Dimensionality Reduction Example: Principal Component Analysis
阿新 • Published: 2018-12-25
Dataset source: https://www.kaggle.com/psparks/instacart-market-basket-analysis
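Approach: merge the four tables so that every purchased product is linked to a user and an aisle, build a user-by-aisle cross-tabulation of purchase counts, then apply PCA and keep enough principal components to explain 90% of the variance.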
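As a small illustration of the merge and cross-tabulation steps, here is a minimal sketch on toy data. The four DataFrames are hypothetical stand-ins that only mimic the column names used from the Instacart files; their values are invented for illustration.

```python
import pandas as pd

# Hypothetical toy tables mimicking the columns used from the Instacart CSV files.
prior = pd.DataFrame({"order_id": [1, 1, 2, 3], "product_id": [10, 11, 10, 12]})
products = pd.DataFrame({"product_id": [10, 11, 12], "aisle_id": [100, 101, 100]})
orders = pd.DataFrame({"order_id": [1, 2, 3], "user_id": [7, 7, 8]})
aisles = pd.DataFrame({"aisle_id": [100, 101], "aisle": ["fresh fruits", "yogurt"]})

# Chain the merges so each purchased product row carries its user and aisle name.
merged = (
    prior.merge(products, on="product_id")
         .merge(orders, on="order_id")
         .merge(aisles, on="aisle_id")
)

# Rows are users, columns are aisles, values count how many products were bought.
cross = pd.crosstab(merged["user_id"], merged["aisle"])
print(cross)
```

Each row of the resulting table describes one user by their purchase counts per aisle, which is the kind of feature matrix that PCA is then applied to.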
Example code:
import pandas as pd
from sklearn.decomposition import PCA


def main():
    '''
    Dimensionality reduction example: principal component analysis
    :return: None
    '''
    # Read the data
    prior = pd.read_csv("order_products__prior.csv")
    products = pd.read_csv("products.csv")
    orders = pd.read_csv("orders.csv")
    aisles = pd.read_csv("aisles.csv")

    # Merge the tables so that each row links a user to an aisle
    _mg = pd.merge(prior, products, on='product_id')
    _mg = pd.merge(_mg, orders, on='order_id')
    mt = pd.merge(_mg, aisles, on='aisle_id')
    # print(mt.head(10))

    # Cross-tabulation: rows are users, columns are aisles, values are purchase counts
    cross = pd.crosstab(mt['user_id'], mt['aisle'])
    # print(cross)

    # Keep enough principal components to explain 90% of the variance
    pca = PCA(n_components=0.9)
    data = pca.fit_transform(cross)
    print(data)
    print(data.shape)
    return None


if __name__ == '__main__':
    main()
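Output:
The printed shape shows that the data, which had one dimension per aisle column in the cross-tabulation, was reduced to 27 principal components.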
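To see how a fractional n_components decides how many dimensions are kept, the following minimal sketch runs PCA on synthetic data rather than the Instacart data. The array X, the random seed, and the noise level are assumptions made purely for illustration. After fitting, scikit-learn exposes the chosen number of components as n_components_ and the share of variance per component as explained_variance_ratio_.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data (assumption): 200 samples with 10 features that are
# linear mixtures of 3 latent factors plus a little noise.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 10))

# A float in (0, 1) asks PCA to keep the smallest number of components
# whose cumulative explained variance exceeds that fraction.
pca = PCA(n_components=0.9)
reduced = pca.fit_transform(X)

print(reduced.shape)                        # (n_samples, number of kept components)
print(pca.n_components_)                    # how many components were kept
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained
```

Applied to the cross-tabulation above, the same attributes would report how many components were kept and how much of the variance they retain.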