1. 程式人生 > >Storm Stream grouping

Storm Stream grouping

在Storm中, 開發者可以為上游spout/bolt發射出的tuples指定下游bolt的哪個/哪些task(s)來處理該tuples。這種指定在storm中叫做對stream的分組,即stream grouping,分組方式主要有以下6種

  • Shuffle Grouping 或 None Grouping
  • Fields Grouping
  • All Grouping
  • Global Grouping
  • LocalOrShuffle Grouping
  • Direct Grouping

1. Shuffle Grouping或None Grouping

1.1 定義

    Shuffle grouping: Tuples are randomly distributed across the bolt's tasks in a way such that each bolt is guaranteed to get an equal number of tuples.

    None grouping: This grouping specifies that you don't care how the stream is grouped. Currently, none groupings are equivalent to shuffle groupings. Eventually though, Storm will push down bolts with none groupings to execute in the same thread as the bolt or spout they subscribe from (when possible).

——官方文件

隨機分組,隨機派發stream裡面的tuple,下游每個bolt均衡接收到上游的tuple。

                                                                 

                                                                                                                                        (圖1)

2. Fields Grouping

2.1 定義

    The stream is partitioned by the fields specified in the grouping. For example, if the stream is grouped by the "user-id" field, tuples with the same "user-id" will always go to the same task, but tuples with different "user-id"'s may go to different tasks.

——官方文件

按欄位分組,比如按userid來分組,具有同樣userid的tuple會被分到相同的bolt,而不同的userid則被分配到不同的bolots。

                                                    

      (圖2)

3. All Grouping

3.1 定義

    The stream is replicated across all the bolt's tasks. Use this grouping with care.

——官方文件

廣播發送,對於每一個tuple,所有的bolts都會收到。

(圖3)

4. Global Grouping

4.1 定義

    The entire stream goes to a single one of the bolt's tasks. Specifically, it goes to the task with the lowest id.

——官方文件

全域性分組,所有tuple被分配到storm中的一個bolt的其中一個task。再具體一點就是分配給id值最低的那個task。

(圖4)

5. LocalOrShuffle Grouping

5.1 定義

    If the target bolt has one or more tasks in the same worker process, tuples will be shuffled to just those in-process tasks. Otherwise, this acts like a normal shuffle grouping.

——官方文件

如果下游bolt的某些task與上游spout/bolt的某些task執行在同一個worker程序中,那麼上游spout/bolt的這些task所發射的所有tuples均由下游bolt的同進程的tasks來處理;否則,這種分組方式等同於shuffle grouping。

(圖5)

6. Direct Grouping

6.1 定義

    This is a special kind of grouping. A stream grouped this way means that the producer of the tuple decides which task of the consumer will receive this tuple. Direct groupings can only be declared on streams that have been declared as direct streams. Tuples emitted to a direct stream must be emitted using one of the emitDirect methods. A bolt can get the task ids of its consumers by either using the provided TopologyContext or by keeping track of the output of the emit method in OutputCollector (which returns the task ids that the tuple was sent to).

——官方文件

直接分組,用這種分組意味著訊息的傳送者指定優訊息接收者的某個task處理這個訊息,只有被宣告為DirectStream的訊息流可以宣告這種分組方法。而且這種訊息tuple必須使用emitDirect方法來發射。訊息處理者可以通過TopologyContext來獲取處理它的訊息的taskid(OutputCollector.emit方法也會返回taskid)。

(圖6)