[Spark, RDD, 1] Introduction to the Resilient Distributed Dataset (RDD)

scala> val rdd = sc.textFile("hdfs://yarn1:8020/hmbbs_logs/access_2013_05_31.log")
16/04/27 21:45:41 INFO MemoryStore: ensureFreeSpace(219256) called with curMem=0, maxMem=311387750
16/04/27 21:45:41 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 214.1 KB, free 296.8 MB)
rdd: org.apache.spark.rdd.RDD[String] = MappedRDD[4] at textFile at <console>:12

# Walk the dependency chain to find the underlying RDD
scala> val hadoopRDD = rdd.dependencies(0).rdd
hadoopRDD: org.apache.spark.rdd.RDD[_] = HadoopRDD[3] at textFile at <console>:12

# Number of partitions in hadoopRDD
scala> hadoopRDD.partitions.size
16/04/27 21:46:35 INFO FileInputFormat: Total input paths to process : 1
16/04/27 21:46:35 INFO NetworkTopology: Adding a new node: /default/192.168.1.64:50010
16/04/27 21:46:35 INFO NetworkTopology: Adding a new node: /default/192.168.1.63:50010
16/04/27 21:46:35 INFO NetworkTopology: Adding a new node: /default/192.168.1.62:50010
res7: Int = 2

# Return the servers that host the first partition
scala> hadoopRDD.preferredLocations(hadoopRDD.partitions(0))
res9: Seq[String] = WrappedArray(yarn4, yarn3, yarn2)
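The shell session above can also be written as a standalone program. The sketch below is a minimal, hedged example, not taken from the original post: it assumes a local-mode SparkContext and a hypothetical input path, and it uses the same public RDD API calls shown in the transcript (`dependencies`, `partitions`, `preferredLocations`).

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: inspect an RDD's lineage and partition placement.
// The app name, master, and file path are illustrative assumptions.
object RddLineageDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rdd-lineage-demo").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // textFile builds a HadoopRDD internally and maps it to RDD[String]
    val rdd = sc.textFile("/tmp/access.log") // hypothetical path

    // Walk one step down the dependency chain to the underlying RDD
    val parent = rdd.dependencies(0).rdd
    println(parent) // e.g. a HadoopRDD

    // Partition count, and the preferred (data-local) hosts per partition;
    // on a local file system the preferred-location list is typically empty
    println(rdd.partitions.length)
    rdd.partitions.foreach { p =>
      println(s"partition ${p.index}: ${rdd.preferredLocations(p)}")
    }

    sc.stop()
  }
}
```

On an HDFS-backed cluster, `preferredLocations` returns the DataNode hosts holding each block, which is what lets the scheduler place tasks next to their data, as in the `WrappedArray(yarn4, yarn3, yarn2)` result above.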