The Five Main Properties of an RDD

The five properties, as listed in the comment on Spark's RDD class:

 * Internally, each RDD is characterized by five main properties:
 *
 *  - A list of partitions
 *  - A function for computing each split
 *  - A list of dependencies on other RDDs
 *  - Optionally, a Partitioner for key-value RDDs (e.g. to say that the RDD is hash-partitioned)
 *  - Optionally, a list of preferred locations to compute each split on (e.g. block locations for
 *    an HDFS file)

Property 1

A list of partitions

An RDD is fundamentally made up of partitions.
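To make the idea of partitions concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark code: the function name `split_into_partitions` and the slicing scheme are assumptions chosen for illustration.

```python
def split_into_partitions(data, num_partitions):
    """Split a list into num_partitions contiguous slices (toy model of RDD partitions)."""
    n = len(data)
    return [
        data[i * n // num_partitions:(i + 1) * n // num_partitions]
        for i in range(num_partitions)
    ]


# Ten elements spread over three partitions.
partitions = split_into_partitions(list(range(10)), 3)
print(partitions)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
```

Each inner list stands in for one partition; together they hold the whole dataset.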

Property 2

A function for computing each split

The same function is applied to compute every partition.
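A small plain-Python sketch of this idea, assuming a hypothetical per-partition function called `compute` that squares each element. In Spark the partitions would be processed by separate tasks; here they are simply processed in a loop.

```python
def compute(partition):
    """The per-partition function (hypothetical example): square every element."""
    return [x * x for x in partition]


partitions = [[1, 2], [3, 4], [5]]

# The same function runs once per partition, independently of the others.
results = [compute(p) for p in partitions]
print(results)  # [[1, 4], [9, 16], [25]]
```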

Property 3

A list of dependencies on other RDDs
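  • RDDs depend on one another:
    • RDDA ==> RDDB ==> RDDC ==> RDDD
  • Each RDD in this chain depends on the one before it. **If RDDC's data is lost, it can be recomputed from RDDB.** This is what makes an RDD resilient.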
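The recovery idea can be sketched with a toy class in plain Python. `ToyRDD` and its fields are hypothetical names invented for this illustration; Spark's real lineage tracking is considerably more involved. Each object remembers its parent and the function used to derive it, so lost data can be rebuilt on demand.

```python
class ToyRDD:
    """Hypothetical toy model: remembers its parent and how it was derived from it."""

    def __init__(self, name, parent=None, fn=None, source=None):
        self.name = name
        self.parent = parent  # dependency on another ToyRDD (None for the root)
        self.fn = fn          # function applied to each element of the parent
        self.source = source  # raw input data, only set on the root
        self.cached = None    # materialized data, which may be lost

    def collect(self):
        # Recompute from the dependency only if the data is missing.
        if self.cached is None:
            if self.parent is None:
                self.cached = list(self.source)
            else:
                self.cached = [self.fn(x) for x in self.parent.collect()]
        return self.cached


rdd_a = ToyRDD("RDDA", source=[1, 2, 3])
rdd_b = ToyRDD("RDDB", parent=rdd_a, fn=lambda x: x + 1)
rdd_c = ToyRDD("RDDC", parent=rdd_b, fn=lambda x: x * 10)
rdd_d = ToyRDD("RDDD", parent=rdd_c, fn=lambda x: x - 1)

before = rdd_d.collect()

# Simulate losing RDDC (and everything derived from it).
rdd_c.cached = None
rdd_d.cached = None

after = rdd_d.collect()  # RDDC is rebuilt from RDDB, then RDDD from RDDC
print(before, after)     # [19, 29, 39] [19, 29, 39]
```

Because RDDB's data is still available, only RDDC and RDDD are recomputed; RDDA and RDDB are left alone.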

Property 4

Optionally, a Partitioner for key-value RDDs (e.g. to say that the RDD is hash-partitioned)

For key-value RDDs, records are distributed across partitions by hashing the key by default.
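The following plain-Python sketch shows hash partitioning in general terms. It is not Spark's implementation: the helper names are hypothetical, and `zlib.crc32` is swapped in as the hash function only because Python's built-in `hash()` on strings changes between runs.

```python
import zlib


def hash_partition(key, num_partitions):
    """Map a string key to a partition index (assumption: crc32 as a stable hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_by_key(pairs, num_partitions):
    """Distribute (key, value) pairs into partitions by the hash of the key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[hash_partition(key, num_partitions)].append((key, value))
    return partitions


pairs = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
partitions = partition_by_key(pairs, 3)

for index, part in enumerate(partitions):
    print(index, part)
```

The useful property is that all records with the same key land in the same partition, so a per-key aggregation can run within one partition.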

Property 5

Optionally, a list of preferred locations to compute each split on (e.g. block locations for an HDFS file)

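This is data locality: a task should run on the node where its data lives, which gives the best performance because the data does not have to be moved over the network.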
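As a rough illustration, the sketch below assigns each partition to a host, preferring a host that already holds its data. The function `choose_host`, the node names, and the fallback rule are all assumptions made for this example; Spark's real scheduler is more elaborate than this.

```python
def choose_host(preferred_hosts, free_hosts):
    """Pick a preferred host if one is free; otherwise fall back to any free host."""
    for host in preferred_hosts:
        if host in free_hosts:
            return host
    # No preferred host is available: the data will have to travel.
    return sorted(free_hosts)[0]


# Hypothetical preferred locations per partition (e.g. where the blocks are stored).
preferred_locations = {
    0: ["node1", "node2"],
    1: ["node3"],
    2: ["node4"],  # node4 is not in the free set below
}
free_hosts = {"node1", "node2", "node3"}

assignment = {
    partition: choose_host(hosts, free_hosts)
    for partition, hosts in preferred_locations.items()
}
print(assignment)  # {0: 'node1', 1: 'node3', 2: 'node1'}
```

Partitions 0 and 1 run where their data is; partition 2 has no free preferred host, so it falls back to another node.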

Together, these five properties reflect what the name RDD (Resilient Distributed Dataset) stands for.