Translating the Spliterator Source Documentation

Preface

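Health is the foundation of everything: I was unwell for two weeks and am feeling better now.
If you are learning the JDK 8 Stream API, Spliterator (the splittable iterator) is something you have to take seriously.
Notes: each passage below gives my own rendering first (corrections are welcome), followed by the original Javadoc text. My own remarks are interspersed.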

Spliterator source documentation

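A Spliterator is an object for traversing and partitioning the elements of a source. That source can be, among others, an array, a collection, an I/O channel, or a generator function.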
An object for traversing and partitioning elements of a source.
The source of elements covered by a Spliterator could be, for example, an array, a Collection, an IO channel, or a generator function.

A Spliterator can visit elements one at a time (tryAdvance()) or sequentially in bulk (forEachRemaining()).
A Spliterator may traverse elements individually (tryAdvance()) or sequentially in bulk (forEachRemaining()).
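
My own addition, not part of the Javadoc: a minimal sketch of the two traversal styles, assuming a plain List as the source. The class name TraversalDemo and the helper traverse are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

class TraversalDemo {
    // Takes one element with tryAdvance(), then drains the rest in bulk.
    static List<String> traverse(List<String> source) {
        List<String> seen = new ArrayList<>();
        Spliterator<String> sp = source.spliterator();
        // tryAdvance() handles at most one element and reports whether one existed.
        boolean hadFirst = sp.tryAdvance(e -> seen.add("first:" + e));
        // forEachRemaining() handles everything that is left, in order.
        sp.forEachRemaining(e -> seen.add("rest:" + e));
        // The spliterator is now exhausted, so another tryAdvance() returns false.
        boolean hadMore = sp.tryAdvance(e -> seen.add("unexpected:" + e));
        seen.add("hadFirst=" + hadFirst + ",hadMore=" + hadMore);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(traverse(Arrays.asList("a", "b", "c")));
    }
}
```

For the three-element list above, the result is first:a, rest:b, rest:c, followed by hadFirst=true,hadMore=false.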

A Spliterator can also split off some of its elements (using trySplit()) as a separate Spliterator, for use in operations that may run in parallel.
A Spliterator may also partition off some of its elements (using trySplit()) as another Spliterator, to be used in possibly-parallel operations.
Operations on a Spliterator that cannot split, or that splits in a highly unbalanced or inefficient way, are unlikely to gain anything from parallelism.
Operations using a Spliterator that cannot split, or does so in a highly imbalanced or inefficient manner, are unlikely to benefit from parallelism.
Traversal and splitting use up elements, so each Spliterator is good for only one bulk computation.
Traversal and splitting exhaust elements; each Spliterator is useful for only a single bulk computation.
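
My own addition: a small sketch of trySplit(), assuming an ArrayList source. SplitDemo and splitOnce are illustrative names. The returned spliterator covers a prefix of the elements and the original keeps the remainder, so between them every element is visited exactly once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

class SplitDemo {
    // Splits once and returns the elements covered by each part, prefix first.
    static List<List<Integer>> splitOnce(List<Integer> source) {
        Spliterator<Integer> rest = source.spliterator();
        Spliterator<Integer> prefix = rest.trySplit(); // null when it cannot split
        List<Integer> left = new ArrayList<>();
        List<Integer> right = new ArrayList<>();
        if (prefix != null) {
            prefix.forEachRemaining(left::add);
        }
        rest.forEachRemaining(right::add);
        List<List<Integer>> parts = new ArrayList<>();
        parts.add(left);
        parts.add(right);
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 1; i <= 8; i++) {
            data.add(i);
        }
        System.out.println(splitOnce(data));
    }
}
```

Where exactly the split lands is up to the implementation. With OpenJDK's ArrayList I would expect it near the middle, but the code does not rely on that.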

A Spliterator also reports a set of characteristics (characteristics()) describing its structure, source, and elements. Eight are currently defined: ORDERED, DISTINCT, SORTED, SIZED, NONNULL, IMMUTABLE, CONCURRENT, and SUBSIZED.
A Spliterator also reports a set of characteristics() of its structure, source, and elements from among ORDERED, DISTINCT, SORTED, SIZED, NONNULL, IMMUTABLE, CONCURRENT, and SUBSIZED.
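Clients of a Spliterator can use the reported characteristics to control, specialize, or simplify computation. For example, a Spliterator for a Collection would report SIZED, one for a Set would report DISTINCT, and one for a SortedSet would also report SORTED.
Note: SORTED and ORDERED mean different things. SORTED says the elements follow a defined sort order (for example, sorted by age); ORDERED says the source has a defined encounter order (for example, an ArrayList).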
These may be employed by Spliterator clients to control, specialize or simplify computation. For example, a Spliterator for a Collection would report SIZED, a Spliterator for a Set would report DISTINCT, and a Spliterator for a SortedSet would also report SORTED.
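Characteristics are reported as a simple set of bits OR-ed together. Some characteristics also constrain method behavior: for example, if ORDERED is reported, traversal methods must follow the documented ordering.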
Characteristics are reported as a simple unioned bit set. Some characteristics additionally constrain method behavior; for example if ORDERED, traversal methods must conform to their documented ordering.
New characteristics may be defined in the future, so implementors should not assign meanings to values that are not listed.
New characteristics may be defined in the future, so implementors should not assign meanings to unlisted values.
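
My own addition: a sketch of reading the characteristic bits, assuming the standard JDK collections. CharacteristicsDemo and describe are illustrative names, and describe only checks four of the eight flags.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Spliterator;
import java.util.TreeSet;

class CharacteristicsDemo {
    // Names four of the characteristic bits if they are set on the spliterator.
    static String describe(Spliterator<?> sp) {
        int bits = sp.characteristics(); // a single int with the flags OR-ed together
        StringBuilder names = new StringBuilder();
        if ((bits & Spliterator.ORDERED) != 0) names.append("ORDERED ");
        if ((bits & Spliterator.DISTINCT) != 0) names.append("DISTINCT ");
        if ((bits & Spliterator.SORTED) != 0) names.append("SORTED ");
        if ((bits & Spliterator.SIZED) != 0) names.append("SIZED ");
        return names.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println("ArrayList: " + describe(new ArrayList<String>().spliterator()));
        System.out.println("HashSet:   " + describe(new HashSet<String>().spliterator()));
        System.out.println("TreeSet:   " + describe(new TreeSet<String>().spliterator()));
    }
}
```

The same check can be written as sp.hasCharacteristics(Spliterator.SORTED), which tests whether all of the given bits are set.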

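A Spliterator that reports neither IMMUTABLE nor CONCURRENT is expected to document its policy on two points: when the spliterator binds to the element source, and how structural interference with the source is detected after binding.
Note: structural interference means a structural change to the source, such as adding, replacing, or removing elements.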
A Spliterator that does not report IMMUTABLE or CONCURRENT is expected to have a documented policy concerning: when the spliterator binds to the element source; and detection of structural interference of the element source detected after binding.
A late-binding Spliterator binds to the source at the first traversal, first split, or first query for estimated size, rather than when the Spliterator is created.
A late-binding Spliterator binds to the source of elements at the point of first traversal, first split, or first query for estimated size, rather than at the time the Spliterator is created.
A Spliterator that is not late-binding binds to the source at construction or at the first invocation of any method.
A Spliterator that is not late-binding binds to the source of elements at the point of construction or first invocation of any method.
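Modifications made to the source before binding are reflected when the Spliterator is traversed. After binding, a Spliterator should, on a best-effort basis, throw ConcurrentModificationException if it detects structural interference. Spliterators that behave this way are called fail-fast.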
Modifications made to the source prior to binding are reflected when the Spliterator is traversed. After binding a Spliterator should, on a best-effort basis, throw ConcurrentModificationException if structural interference is detected. Spliterators that do this are called fail-fast.
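The bulk traversal method (forEachRemaining()) may optimize traversal by checking for structural interference once, after all elements have been traversed, instead of checking after each element and failing immediately.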
The bulk traversal method (forEachRemaining()) of a Spliterator may optimize traversal and check for structural interference after all elements have been traversed, rather than checking per-element and failing immediately.
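
My own addition: a sketch of late binding and fail-fast behavior, assuming ArrayList, which documents its spliterator as late-binding and fail-fast. BindingDemo and its two helpers are illustrative names. Because fail-fast detection is best effort, the second helper reports what happened rather than relying on the exception.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.Spliterator;

class BindingDemo {
    // Counts the elements seen when the list grows BEFORE the spliterator binds.
    static int countAfterEarlyAdd() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b"));
        Spliterator<String> sp = list.spliterator(); // created, not yet bound
        list.add("c");                               // change made before binding
        int[] count = {0};
        sp.forEachRemaining(e -> count[0]++);        // binds here, so "c" is included
        return count[0];
    }

    // Returns true if a change made AFTER binding is reported as an exception.
    static boolean failsFastAfterLateAdd() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b"));
        Spliterator<String> sp = list.spliterator();
        sp.tryAdvance(e -> { });                     // the first traversal binds the spliterator
        list.add("c");                               // structural interference after binding
        try {
            sp.forEachRemaining(e -> { });           // the bulk method may check only at the end
            return false;
        } catch (ConcurrentModificationException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("seen after early add: " + countAfterEarlyAdd());
        System.out.println("failed fast after late add: " + failsFastAfterLateAdd());
    }
}
```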

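A Spliterator can report an estimate of the number of remaining elements through estimateSize(). Ideally, as the SIZED characteristic indicates, this value is exactly the number of elements that a successful traversal would encounter.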
Spliterators can provide an estimate of the number of remaining elements via the estimateSize() method. Ideally, as reflected in characteristic SIZED, this value corresponds exactly to the number of elements that would be encountered in a successful traversal.
Even when the count is not known exactly, an estimate is still useful to operations on the source, for example when deciding whether to split further or to traverse the remaining elements sequentially.
However, even when not exactly known, an estimated value may still be useful to operations being performed on the source, such as helping to determine whether it is preferable to split further or traverse the remaining elements sequentially.
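
My own addition: a sketch contrasting an exact estimate with an unknown one. It assumes Arrays.asList as a SIZED source and uses Spliterators.spliteratorUnknownSize to wrap an Iterator whose size is not known. EstimateDemo and estimates are illustrative names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;

class EstimateDemo {
    // Returns {estimate before traversal, estimate after one element, estimate for unknown size}.
    static long[] estimates() {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);

        Spliterator<Integer> sized = list.spliterator();
        long before = sized.estimateSize();  // exact, because the spliterator is SIZED
        sized.tryAdvance(e -> { });          // consume one element
        long after = sized.estimateSize();   // one fewer element remains

        // Wrapping a bare Iterator: the size is unknown to the spliterator.
        Spliterator<Integer> unknown =
                Spliterators.spliteratorUnknownSize(list.iterator(), Spliterator.ORDERED);
        long unknownEstimate = unknown.estimateSize(); // Long.MAX_VALUE signals "unknown"

        return new long[] {before, after, unknownEstimate};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(estimates()));
    }
}
```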

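Despite their obvious usefulness in parallel algorithms, spliterators are not expected to be thread-safe. Instead, a parallel algorithm built on spliterators should make sure that each spliterator is used by only one thread at a time. This is usually easy to achieve through serial thread-confinement, which tends to follow naturally from parallel algorithms that work by recursive decomposition.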
Despite their obvious utility in parallel algorithms, spliterators are not expected to be thread-safe; instead, implementations of parallel algorithms using spliterators should ensure that the spliterator is only used by one thread at a time. This is generally easy to attain via serial thread-confinement, which often is a natural consequence of typical parallel algorithms that work by recursive decomposition.
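A thread that calls trySplit() may hand the returned Spliterator to another thread, which may in turn traverse it or split it further. If two or more threads operate on the same spliterator concurrently, the behavior of splitting and traversal is undefined. If the original thread hands a spliterator to another thread, the handoff should ideally happen before tryAdvance() has consumed any elements, because some guarantees (such as the accuracy of estimateSize() for SIZED spliterators) hold only before traversal begins.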
A thread calling trySplit() may hand over the returned Spliterator to another thread, which in turn may traverse or further split that Spliterator. The behaviour of splitting and traversal is undefined if two or more threads operate concurrently on the same spliterator. If the original thread hands a spliterator off to another thread for processing, it is best if that handoff occurs before any elements are consumed with tryAdvance(), as certain guarantees (such as the accuracy of estimateSize() for SIZED spliterators) are only valid before traversal has begun.
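
My own addition: a sketch of serial thread-confinement with a plain Thread, assuming the source list is not modified while the two threads run. HandoffDemo and sumWithHandoff are illustrative names. Each spliterator is touched by exactly one thread, and the prefix is handed over before any of its elements are consumed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.atomic.AtomicLong;

class HandoffDemo {
    // Sums the list using two threads, one spliterator per thread.
    static long sumWithHandoff(List<Integer> source) {
        Spliterator<Integer> rest = source.spliterator();
        Spliterator<Integer> prefix = rest.trySplit(); // may be null for tiny sources
        AtomicLong total = new AtomicLong();

        Thread worker = null;
        if (prefix != null) {
            // Hand the untouched prefix to another thread; this thread never uses it again.
            worker = new Thread(() -> prefix.forEachRemaining(v -> total.addAndGet(v)));
            worker.start();
        }
        // The calling thread keeps working on its own spliterator only.
        rest.forEachRemaining(v -> total.addAndGet(v));

        if (worker != null) {
            try {
                worker.join();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for worker", ex);
            }
        }
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println(sumWithHandoff(Arrays.asList(1, 2, 3, 4, 5, 6)));
    }
}
```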

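Primitive specializations of Spliterator are provided for int, long, and double values. In these subtypes, the default implementations of tryAdvance(java.util.function.Consumer) and forEachRemaining(java.util.function.Consumer) box primitive values into instances of the corresponding wrapper class. That boxing can cancel out any performance gained by using the primitive specializations.
Note: the subtypes meant here are the nested interfaces declared in Spliterator.java: OfPrimitive, OfInt, OfLong, and OfDouble.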
Primitive subtype specializations of Spliterator are provided for int, long, and double values. The subtype default implementations of tryAdvance(java.util.function.Consumer) and forEachRemaining(java.util.function.Consumer) box primitive values to instances of their corresponding wrapper class. Such boxing may undermine any performance advantages gained by using the primitive specializations.
To avoid boxing, use the corresponding primitive-based methods. For example, prefer Spliterator.OfInt.tryAdvance(java.util.function.IntConsumer) and Spliterator.OfInt.forEachRemaining(java.util.function.IntConsumer) over Spliterator.OfInt.tryAdvance(java.util.function.Consumer) and Spliterator.OfInt.forEachRemaining(java.util.function.Consumer).
To avoid boxing, the corresponding primitive-based methods should be used. For example, Spliterator.OfInt.tryAdvance(java.util.function.IntConsumer) and Spliterator.OfInt.forEachRemaining(java.util.function.IntConsumer) should be used in preference to Spliterator.OfInt.tryAdvance(java.util.function.Consumer) and Spliterator.OfInt.forEachRemaining(java.util.function.Consumer).
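Traversing primitive values with the boxing-based tryAdvance() and forEachRemaining() methods does not change the order in which the values, now boxed, are encountered.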
Traversal of primitive values using boxing-based methods tryAdvance() and forEachRemaining() does not affect the order in which the values, transformed to boxed values, are encountered.
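
My own addition: a sketch of the unboxed path, assuming an int[] source obtained through Arrays.spliterator. PrimitiveDemo and sum are illustrative names. Declaring the callback as an IntConsumer selects the primitive overloads, so no Integer objects are created.

```java
import java.util.Arrays;
import java.util.Spliterator;
import java.util.function.IntConsumer;

class PrimitiveDemo {
    // Sums an int array through Spliterator.OfInt without boxing.
    static int sum(int[] values) {
        Spliterator.OfInt sp = Arrays.spliterator(values);
        int[] total = {0};
        // Typed as IntConsumer, so the IntConsumer overloads below are chosen.
        IntConsumer add = v -> total[0] += v;
        sp.tryAdvance(add);        // first element, if there is one
        sp.forEachRemaining(add);  // all remaining elements
        return total[0];
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3, 4}));
    }
}
```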

API Note:
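Like Iterators, Spliterators are for traversing the elements of a source. The Spliterator API was designed to support efficient parallel traversal as well as sequential traversal, by supporting decomposition (splitting) in addition to single-element iteration. The protocol for accessing elements through a Spliterator is also designed to impose less per-element overhead than Iterator, and to avoid the race inherent in having separate hasNext() and next() methods.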
Spliterators, like Iterators, are for traversing the elements of a source. The Spliterator API was designed to support efficient parallel traversal in addition to sequential traversal, by supporting decomposition as well as single-element iteration. In addition, the protocol for accessing elements via a Spliterator is designed to impose smaller per-element overhead than Iterator, and to avoid the inherent race involved in having separate methods for hasNext() and next().

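For mutable sources, arbitrary and non-deterministic behavior may occur if the source is structurally interfered with (elements added, replaced, or removed) between the time the Spliterator binds to its data source and the end of traversal. With the java.util.stream framework, for example, such interference produces arbitrary, non-deterministic results.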
For mutable sources, arbitrary and non-deterministic behavior may occur if the source is structurally interfered with (elements added, replaced, or removed) between the time that the Spliterator binds to its data source and the end of traversal. For example, such interference will produce arbitrary, non-deterministic results when using the java.util.stream framework.

Structural interference with a source can be managed in the following ways (roughly in order of decreasing desirability):
Structural interference of a source can be managed in the following ways (in approximate order of decreasing desirability):

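  • The source cannot be structurally interfered with.
    For example, an instance of CopyOnWriteArrayList is an immutable source; a Spliterator created from it reports the IMMUTABLE characteristic.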
    The source cannot be structurally interfered with.
    For example, an instance of CopyOnWriteArrayList is an immutable source. A Spliterator created from the source reports a characteristic of IMMUTABLE.

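  • The source manages concurrent modifications itself.
    For example, the key set of a ConcurrentHashMap is a concurrent source; a Spliterator created from it reports the CONCURRENT characteristic.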
    The source manages concurrent modifications.
    For example, a key set of a ConcurrentHashMap is a concurrent source. A Spliterator created from the source reports a characteristic of CONCURRENT.

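  • The mutable source provides a late-binding, fail-fast Spliterator.
    Late binding narrows the window in which interference can affect the computation; fail-fast detects, on a best-effort basis, that structural interference has occurred after traversal began and throws ConcurrentModificationException. For example, ArrayList and many other non-concurrent collection classes in the JDK provide a late-binding, fail-fast spliterator.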
    The mutable source provides a late-binding and fail-fast Spliterator.
    Late binding narrows the window during which interference can affect the calculation; fail-fast detects, on a best-effort basis, that structural interference has occurred after traversal has commenced and throws ConcurrentModificationException. For example, ArrayList, and many other non-concurrent Collection classes in the JDK, provide a late-binding, fail-fast spliterator.

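  • The mutable source provides a Spliterator that is not late-binding but is fail-fast.
    The likelihood of a ConcurrentModificationException goes up, because the window of potential interference is larger.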
    The mutable source provides a non-late-binding but fail-fast Spliterator.
    The source increases the likelihood of throwing ConcurrentModificationException since the window of potential interference is larger.

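  • The mutable source provides a late-binding Spliterator that is not fail-fast.
    The source risks arbitrary, non-deterministic behavior once traversal has begun, because interference is not detected.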
    The mutable source provides a late-binding and non-fail-fast Spliterator.
    The source risks arbitrary, non-deterministic behavior after traversal has commenced since interference is not detected.

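  • The mutable source provides a Spliterator that is neither late-binding nor fail-fast.
    The risk of arbitrary, non-deterministic behavior is higher still, because undetected interference can occur at any time after construction.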
    The mutable source provides a non-late-binding and non-fail-fast Spliterator.
    The source increases the risk of arbitrary, non-deterministic behavior since non-detected interference may occur after construction.
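
My own addition: a sketch that asks a spliterator which of the policies above applies, assuming the standard JDK collections named in the list. InterferencePolicyDemo and policy are illustrative names, and the returned strings are my own wording.

```java
import java.util.ArrayList;
import java.util.Spliterator;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

class InterferencePolicyDemo {
    // Classifies a spliterator by the IMMUTABLE and CONCURRENT characteristics.
    static String policy(Spliterator<?> sp) {
        if (sp.hasCharacteristics(Spliterator.IMMUTABLE)) {
            return "IMMUTABLE";
        }
        if (sp.hasCharacteristics(Spliterator.CONCURRENT)) {
            return "CONCURRENT";
        }
        // Neither flag: the source should document its binding and fail-fast policy.
        return "neither";
    }

    public static void main(String[] args) {
        System.out.println(policy(new CopyOnWriteArrayList<String>().spliterator()));
        System.out.println(policy(new ConcurrentHashMap<String, Integer>().keySet().spliterator()));
        System.out.println(policy(new ArrayList<String>().spliterator()));
    }
}
```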

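Example. Here is a class (of no real use beyond illustration) that maintains an array in which the actual data sit at even positions and unrelated tag data sit at odd positions. Its Spliterator skips the tags.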
Example. Here is a class (not a very useful one, except for illustration) that maintains an array in which the actual data are held in even locations, and unrelated tag data are held in odd locations. Its Spliterator ignores the tags.

 import java.util.Spliterator;
 import java.util.function.Consumer;

 class TaggedArray<T> {
   private final Object[] elements; // immutable after construction
   TaggedArray(T[] data, Object[] tags) {
     int size = data.length;
     if (tags.length != size) throw new IllegalArgumentException();
     this.elements = new Object[2 * size];
     for (int i = 0, j = 0; i < size; ++i) {
       elements[j++] = data[i];
       elements[j++] = tags[i];
     }
   }

   public Spliterator<T> spliterator() {
     return new TaggedArraySpliterator<>(elements, 0, elements.length);
   }

   static class TaggedArraySpliterator<T> implements Spliterator<T> {
     private final Object[] array;
     private int origin; // current index, advanced on split or traversal
     private final int fence; // one past the greatest index

     TaggedArraySpliterator(Object[] array, int origin, int fence) {
       this.array = array; this.origin = origin; this.fence = fence;
     }

     public void forEachRemaining(Consumer<? super T> action) {
       for (; origin < fence; origin += 2)
         action.accept((T) array[origin]);
     }

     public boolean tryAdvance(Consumer<? super T> action) {
       if (origin < fence) {
         action.accept((T) array[origin]);
         origin += 2;
         return true;
       }
       else // cannot advance
         return false;
     }

     public Spliterator<T> trySplit() {
       int lo = origin; // divide range in half
       int mid = ((lo + fence) >>> 1) & ~1; // force midpoint to be even
       if (lo < mid) { // split out left half
         origin = mid; // reset this Spliterator's origin
         return new TaggedArraySpliterator<>(array, lo, mid);
       }
       else       // too small to split
         return null;
     }

     public long estimateSize() {
       return (long)((fence - origin) / 2);
     }

     public int characteristics() {
       return ORDERED | SIZED | IMMUTABLE | SUBSIZED;
     }
   }
 }

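To show how a parallel computation framework such as the java.util.stream package would use a Spliterator, here is one way to implement a matching parallel forEach. It illustrates the main usage idiom: keep splitting off subtasks until the estimated amount of work is small enough to run sequentially.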
As an example how a parallel computation framework, such as the java.util.stream package, would use Spliterator in a parallel computation, here is one way to implement an associated parallel forEach, that illustrates the primary usage idiom of splitting off subtasks until the estimated amount of work is small enough to perform sequentially.
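We assume here that the order in which subtasks are processed does not matter; different (forked) tasks may split further and process elements concurrently in an undetermined order. The example uses a CountedCompleter; similar usage applies to other parallel task constructions.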
Here we assume that the order of processing across subtasks doesn't matter; different (forked) tasks may further split and process elements concurrently in undetermined order. This example uses a CountedCompleter; similar usages apply to other parallel task constructions.

 static <T> void parEach(TaggedArray<T> a, Consumer<T> action) {
   Spliterator<T> s = a.spliterator();
   long targetBatchSize = s.estimateSize() / (ForkJoinPool.getCommonPoolParallelism() * 8);
   new ParEach<T>(null, s, action, targetBatchSize).invoke();
 }

 static class ParEach<T> extends CountedCompleter<Void> {
   final Spliterator<T> spliterator;
   final Consumer<T> action;
   final long targetBatchSize;

   ParEach(ParEach<T> parent, Spliterator<T> spliterator,
           Consumer<T> action, long targetBatchSize) {
     super(parent);
     this.spliterator = spliterator; this.action = action;
     this.targetBatchSize = targetBatchSize;
   }

   public void compute() {
     Spliterator<T> sub;
     while (spliterator.estimateSize() > targetBatchSize &&
            (sub = spliterator.trySplit()) != null) {
       addToPendingCount(1);
       new ParEach<>(this, sub, action, targetBatchSize).fork();
     }
     spliterator.forEachRemaining(action);
     propagateCompletion();
   }
 }
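
My own addition: a self-contained variant of the idiom above that can be compiled and run as is. It assumes a List as the source instead of TaggedArray. ParEachDemo and parallelSum are illustrative names, and the Math.max guards are my own precaution against a zero divisor or a zero batch size, not something the Javadoc example does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.CountedCompleter;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Consumer;

class ParEachDemo {
    static class ParEach<T> extends CountedCompleter<Void> {
        final Spliterator<T> spliterator;
        final Consumer<? super T> action;
        final long targetBatchSize;

        ParEach(ParEach<T> parent, Spliterator<T> spliterator,
                Consumer<? super T> action, long targetBatchSize) {
            super(parent);
            this.spliterator = spliterator;
            this.action = action;
            this.targetBatchSize = targetBatchSize;
        }

        @Override
        public void compute() {
            Spliterator<T> sub;
            // Keep forking prefixes while the remaining work is above the target size.
            while (spliterator.estimateSize() > targetBatchSize
                    && (sub = spliterator.trySplit()) != null) {
                addToPendingCount(1);
                new ParEach<T>(this, sub, action, targetBatchSize).fork();
            }
            // Small enough (or unsplittable): process the rest sequentially.
            spliterator.forEachRemaining(action);
            propagateCompletion();
        }
    }

    static <T> void parEach(List<T> source, Consumer<? super T> action) {
        Spliterator<T> s = source.spliterator();
        int parallelism = Math.max(1, ForkJoinPool.getCommonPoolParallelism());
        long targetBatchSize = Math.max(1L, s.estimateSize() / (parallelism * 8L));
        new ParEach<T>(null, s, action, targetBatchSize).invoke();
    }

    // Sums 1..n in parallel; the action must be safe to call from several threads.
    static long parallelSum(int n) {
        List<Integer> data = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            data.add(i);
        }
        LongAdder sum = new LongAdder();
        parEach(data, v -> sum.add(v));
        return sum.sum();
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(1000));
    }
}
```

Since the action runs on several threads in no particular order, the sketch accumulates into a LongAdder rather than a plain variable.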

Implementation Note:
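If the boolean system property org.openjdk.java.util.stream.tripwire is set to true, diagnostic warnings are reported whenever primitive values are boxed while operating on the primitive subtype specializations.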
If the boolean system property org.openjdk.java.util.stream.tripwire is set to true then diagnostic warnings are reported if boxing of primitive values occur when operating on primitive subtype specializations.