The Five Main Properties of an RDD
阿新 • Published: 2018-11-17
The five properties of an RDD
Internally, each RDD is characterized by five main properties:
- A list of partitions
- A function for computing each split
- A list of dependencies on other RDDs
- Optionally, a Partitioner for key-value RDDs (e.g. to say that the RDD is hash-partitioned)
- Optionally, a list of preferred locations to compute each split on (e.g. block locations for an HDFS file)
Property 1
A list of partitions
An RDD is fundamentally made up of partitions: the dataset is split into chunks that can be processed independently.
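To make the idea of "a list of partitions" concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: the function name `split_into_partitions` and the even, contiguous slicing scheme are assumptions chosen for illustration.

```python
def split_into_partitions(data, num_partitions):
    """Slice `data` into `num_partitions` contiguous, near-equal chunks.

    Hypothetical helper: it only illustrates that a dataset is
    represented as a list of partitions.
    """
    n = len(data)
    return [
        data[i * n // num_partitions:(i + 1) * n // num_partitions]
        for i in range(num_partitions)
    ]


partitions = split_into_partitions(list(range(10)), 3)
print(partitions)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
```

Every element ends up in exactly one partition, so the partitions together still represent the whole dataset.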
Property 2
A function for computing each split
The same function is applied to compute every partition (a "split" here is just a partition).
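The following sketch shows one function being applied to each partition independently. It is a plain-Python toy model under assumed names (`compute`, `partitions`), not Spark code; in a real cluster each call would run as a separate task.

```python
def compute(partition):
    """The single function applied to every split: square each element."""
    return [x * x for x in partition]


# Three partitions of one logical dataset (made-up sample data).
partitions = [[1, 2], [3, 4], [5]]

# The same function runs on every partition; no partition needs another's data.
result = [compute(p) for p in partitions]
print(result)  # [[1, 4], [9, 16], [25]]
```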
Property 3
A list of dependencies on other RDDs
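- RDDs depend on one another.
- RDDA ==> RDDB ==> RDDC ==> RDDD
- Each RDD in this chain is derived from the one before it. **If RDDC's data is lost, it can be recomputed from RDDB** by reapplying the transformation that produced it. This is what makes an RDD resilient.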
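The recovery idea in the list above can be sketched with a small toy class. This is plain Python and not Spark's implementation: `ToyRDD`, its fields, and the sample transformations are all hypothetical, and the sketch recomputes a whole dataset where a real system would work per partition.

```python
class ToyRDD:
    """Toy dataset that remembers its parent and how it was derived."""

    def __init__(self, name, parent=None, fn=None, source=None):
        self.name = name
        self.parent = parent   # dependency on the upstream dataset
        self.fn = fn           # transformation applied to each parent element
        self.source = source   # only the root holds source data
        self.cached = None     # materialized data, which may be lost

    def collect(self):
        # Rebuild from the lineage whenever the materialized data is missing.
        if self.cached is None:
            if self.parent is None:
                self.cached = list(self.source)
            else:
                self.cached = [self.fn(x) for x in self.parent.collect()]
        return self.cached


rdd_a = ToyRDD("RDDA", source=[1, 2, 3])
rdd_b = ToyRDD("RDDB", parent=rdd_a, fn=lambda x: x + 1)
rdd_c = ToyRDD("RDDC", parent=rdd_b, fn=lambda x: x * 10)

before = rdd_c.collect()   # [20, 30, 40]
rdd_c.cached = None        # simulate losing RDDC's data
after = rdd_c.collect()    # recomputed from RDDB
print(before == after)     # True
```

Because each dataset records its parent and its transformation, lost data never has to be restored from a copy; it can be derived again.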
Property 4
Optionally, a Partitioner for key-value RDDs (e.g. to say that the RDD is hash-partitioned)
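When a key-value RDD is partitioned, records are distributed by hashing the key by default, so all records with the same key land in the same partition.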
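A minimal sketch of hash partitioning in plain Python follows. The names `get_partition` and `partition_by_key` are hypothetical, and `zlib.crc32` is used only as a deterministic stand-in hash; it is an assumption for this example and not the hash that Spark applies to keys.

```python
import zlib


def get_partition(key, num_partitions):
    """Map a key to a partition index in [0, num_partitions)."""
    # crc32 is a stand-in hash so the result is stable across runs.
    h = zlib.crc32(str(key).encode("utf-8"))
    return h % num_partitions


def partition_by_key(pairs, num_partitions):
    """Distribute (key, value) pairs into partitions by hashing the key."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        parts[get_partition(key, num_partitions)].append((key, value))
    return parts


pairs = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
parts = partition_by_key(pairs, 3)
print(parts)
```

Both `("a", 1)` and `("a", 3)` end up in the same partition, which is what lets a per-key aggregation run without looking at other partitions.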
Property 5
Optionally, a list of preferred locations to compute each split on (e.g. block locations for an HDFS file)
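Data locality: a task should run on the node where its data lives. This gives the best performance, since the data does not have to cross the network.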
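The locality preference can be sketched as a tiny placement function. This is a plain-Python toy under stated assumptions: `pick_host`, the host names, and the "first match wins" policy are made up for illustration, and the sketch assumes at least one host is free. A real scheduler is considerably more involved.

```python
def pick_host(preferred_hosts, free_hosts):
    """Choose where to run a task for one partition.

    preferred_hosts: hosts that already store the partition's data.
    free_hosts: set of hosts that currently have capacity (non-empty).
    """
    # Prefer a free host that already holds the data.
    for host in preferred_hosts:
        if host in free_hosts:
            return host, "local"
    # Otherwise fall back to any free host and read the data remotely.
    return sorted(free_hosts)[0], "remote"


print(pick_host(["node1", "node2"], {"node2", "node3"}))  # ('node2', 'local')
print(pick_host(["node1"], {"node3", "node4"}))           # ('node3', 'remote')
```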
Together, these five properties reflect what the name RDD (Resilient Distributed Dataset) stands for.