Part 6: Introduction to the Spark API
阿新 • Published: 2019-01-02
Spark machine learning: browsing the API

Official Spark API documentation:
    http://spark.apache.org/docs/1.6.2/api/java/index.html
    http://spark.apache.org/docs/2.2.0/api/java/index.html

1. RDD support is the foundation of Spark.
2. Look up the API according to what you need.

I. Spark functional modules

    Spark SQL
    Spark GraphX
    Spark Streaming
    Spark ML
    Spark MLlib

II. Commonly used machine learning APIs

ml: input is a DataFrame (the input comes from Spark SQL).
mllib: input is a plain RDD (the input comes from HDFS).

Example: given userId (user ID), productId (product ID) and a rating, recommend products to users. Collaborative filtering is used to find other products a user is likely to be interested in.
Common algorithm: ALS (alternating least squares), in org.apache.spark.ml.recommendation (class ALS).

Supervised classification: org.apache.spark.mllib.classification. The samples are labeled in advance.
Unsupervised classification (clustering): mllib.clustering, used in the same way, for example KMeans.
Decision trees: mllib.tree
Graph computation: org.apache.spark.graphx

org.apache.spark.sql: once our data has been imported into MySQL, this package is how we bring it into Spark, run machine learning on it for prediction and statistical analysis, and then write the results out to HDFS.

IV. API extensions

Data can be read from MySQL and Oracle:

    org.apache.spark.sql
    org.apache.spark.sql.api.java
    org.apache.spark.sql.expressions
    org.apache.spark.sql.hive
    org.apache.spark.sql.hive.execution
    org.apache.spark.sql.jdbc
    org.apache.spark.sql.sources
    org.apache.spark.sql.types
    org.apache.spark.sql.util

org.apache.spark.streaming is Spark's stream processing:

    org.apache.spark.streaming.flume
    org.apache.spark.streaming.kafka
    org.apache.spark.streaming.kinesis
    org.apache.spark.streaming.mqtt
    org.apache.spark.streaming.receiver
    org.apache.spark.streaming.scheduler
    org.apache.spark.streaming.twitter
    org.apache.spark.streaming.util
    org.apache.spark.streaming.zeromq

ml: input is a DataFrame (the input comes from Spark SQL):

    org.apache.spark.ml
    org.apache.spark.ml.attribute
    org.apache.spark.ml.classification
    org.apache.spark.ml.clustering
    org.apache.spark.ml.evaluation
    org.apache.spark.ml.feature
    org.apache.spark.ml.param
    org.apache.spark.ml.recommendation
    org.apache.spark.ml.regression
    org.apache.spark.ml.source.libsvm
    org.apache.spark.ml.tree
    org.apache.spark.ml.tuning
    org.apache.spark.ml.util

mllib: input is a plain RDD (the input comes from HDFS):

    org.apache.spark.mllib.classification
    org.apache.spark.mllib.clustering
    org.apache.spark.mllib.evaluation
    org.apache.spark.mllib.feature
    org.apache.spark.mllib.fpm
    org.apache.spark.mllib.linalg
    org.apache.spark.mllib.linalg.distributed
    org.apache.spark.mllib.optimization
    org.apache.spark.mllib.pmml
    org.apache.spark.mllib.random
    org.apache.spark.mllib.rdd
    org.apache.spark.mllib.recommendation
    org.apache.spark.mllib.regression
    org.apache.spark.mllib.stat
    org.apache.spark.mllib.stat.distribution
    org.apache.spark.mllib.stat.test
    org.apache.spark.mllib.tree
    org.apache.spark.mllib.tree.configuration
    org.apache.spark.mllib.tree.impurity
    org.apache.spark.mllib.tree.loss
    org.apache.spark.mllib.tree.model
    org.apache.spark.mllib.util
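
To make the collaborative filtering example (userId, productId, rating) more concrete, here is a minimal sketch of the idea behind ALS, written in plain Python with no Spark dependency. It is not Spark's implementation and does not call org.apache.spark.ml.recommendation.ALS. It assumes a rank-1 model (one number per user, one per product) with simple L2 regularization. The names als_rank1, predict, recommend and the RATINGS data are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical (userId, productId, rating) triples, made up for illustration.
RATINGS = [
    (1, 10, 1.0), (1, 20, 2.0), (1, 30, 2.5), (1, 40, 0.5),
    (2, 10, 2.0), (2, 20, 4.0),
    (3, 10, 1.5), (3, 30, 3.75), (3, 40, 0.75),
]


def als_rank1(ratings, num_iters=50, reg=0.01):
    """Fit rating(user, product) ~ u[user] * p[product] by alternating least squares.

    Rank-1 simplification: each user and each product gets a single factor,
    so every least-squares solve reduces to one division.
    """
    by_user = defaultdict(list)
    by_product = defaultdict(list)
    for user_id, product_id, rating in ratings:
        by_user[user_id].append((product_id, rating))
        by_product[product_id].append((user_id, rating))

    u = {user_id: 1.0 for user_id in by_user}
    p = {product_id: 1.0 for product_id in by_product}

    for _ in range(num_iters):
        # Hold product factors fixed; solve each user's regularized least squares.
        for user_id, rated in by_user.items():
            num = sum(r * p[pid] for pid, r in rated)
            den = reg + sum(p[pid] ** 2 for pid, _ in rated)
            u[user_id] = num / den
        # Hold user factors fixed; solve each product's regularized least squares.
        for product_id, rated in by_product.items():
            num = sum(r * u[uid] for uid, r in rated)
            den = reg + sum(u[uid] ** 2 for uid, _ in rated)
            p[product_id] = num / den
    return u, p


def predict(u, p, user_id, product_id):
    """Predicted rating is the product of the two learned factors."""
    return u[user_id] * p[product_id]


def recommend(u, p, ratings, user_id, top_n=1):
    """Rank the products the user has not rated by predicted rating."""
    seen = {pid for uid, pid, _ in ratings if uid == user_id}
    candidates = [pid for pid in p if pid not in seen]
    candidates.sort(key=lambda pid: predict(u, p, user_id, pid), reverse=True)
    return candidates[:top_n]


if __name__ == "__main__":
    u, p = als_rank1(RATINGS)
    print(recommend(u, p, RATINGS, user_id=2, top_n=1))
```

In this made-up data, user 2 has not rated products 30 and 40. Other users rate product 30 well above product 40, so the sketch should rank product 30 first for user 2. A real job would use the ALS class from the ml or mllib recommendation package with a higher rank and a DataFrame or RDD of ratings as input.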