Splitting and aggregating batches in Storm Trident

阿新 • Published: 2018-11-22

Preface

This article looks at how batches are split and aggregated in Storm Trident.

Example

        TridentTopology topology = new TridentTopology();
        topology.newStream("spout1", spout)
                .partitionBy(new Fields("user"))
                .partitionAggregate(new Fields("user","score","batchId"),new OriginUserCountAggregator(),new Fields("result","aggBatchId"))
                .parallelismHint(3)
                .global()
                .aggregate(new Fields("result","aggBatchId"),new AggAgg(),new Fields("agg"))
                .each(new Fields("agg"),new PrintEachFunc(),new Fields())
        ;

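This topology ends up with three bolts: b-0, b-1, and b-2.
b-0 mainly runs partitionAggregate; its parallelismHint is 3.

b-1 mainly runs the CombinerAggregator's init (the log below also shows it combining the three partial results as they arrive); its parallelismHint is 1. Because its upstream bolt has 3 tasks, its TridentBoltExecutor has tracked.condition.expectedTaskReports set to 3, so it must wait for the aggregated data from all three tasks before it can finishBatch.
b-2 mainly runs the CombinerAggregator's combine and the each operation.
Overall, a batch leaves the spout, is split by partitionBy into 3 sub-batches at b-0, and is merged again at b-1, which calls finishBatch only after all 3 sub-batches have arrived. At b-2 the result of b-1's aggregation goes through the final aggregation.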
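The split-then-merge flow described above can be illustrated without Storm at all. The following is a minimal sketch, not the topology's real classes: BatchSplitMergeSketch and its methods are hypothetical names, and the hash-modulo placement in partitionFor is only an assumed stand-in for how partitionBy picks a target partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model (no Storm dependency) of one batch being split and re-merged.
class BatchSplitMergeSketch {

    // Assumed stand-in for partitionBy(new Fields("user")): hash the key onto a partition.
    static int partitionFor(String user, int numPartitions) {
        return Math.floorMod(user.hashCode(), numPartitions);
    }

    // Stand-in for partitionAggregate: each partition counts only the users routed to it.
    static List<Map<String, Integer>> splitAndCount(List<String> batch, int numPartitions) {
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partials.add(new TreeMap<>());
        }
        for (String user : batch) {
            partials.get(partitionFor(user, numPartitions)).merge(user, 1, Integer::sum);
        }
        return partials;
    }

    // Stand-in for global().aggregate(...): fold every partial result into one map.
    static Map<String, Integer> mergeAll(List<Map<String, Integer>> partials) {
        Map<String, Integer> merged = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((user, count) -> merged.merge(user, count, Integer::sum));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList("nickt1", "nickt2", "nickt3", "nickt1");
        List<Map<String, Integer>> partials = splitAndCount(batch, 3);
        System.out.println("per-partition counts: " + partials);
        System.out.println("merged for the batch: " + mergeAll(partials));
    }
}
```

Which partition a given user lands in depends on the hash, so only the merged result is stable across runs. That mirrors the point of the example topology: the per-task results differ, but the final aggregate covers the whole original batch.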

Log example

23:22:00.718 [Thread-49-spout-spout1-executor[11 11]] INFO  com.example.demo.trident.batch.DebugFixedBatchSpout - batchId:1,emit:[nickt1, 1]
23:22:00.718 [Thread-49-spout-spout1-executor[11 11]] INFO  com.example.demo.trident.batch.DebugFixedBatchSpout - batchId:1,emit:[nickt2, 1]
23:22:00.718 [Thread-49-spout-spout1-executor[11 11]] INFO  com.example.demo.trident.batch.DebugFixedBatchSpout - batchId:1,emit:[nickt3, 1]
23:22:00.720 [Thread-45-b-0-executor[8 8]] INFO  com.example.demo.trident.OriginUserCountAggregator - null init map, aggBatchId:1:0
23:22:00.720 [Thread-45-b-0-executor[8 8]] INFO  com.example.demo.trident.OriginUserCountAggregator - null aggregate batch:1,tuple:[nickt2, 1, 1]
23:22:00.720 [Thread-45-b-0-executor[8 8]] INFO  com.example.demo.trident.OriginUserCountAggregator - null complete agg batch:1:0,val:{1={nickt2=1}}
23:22:00.722 [Thread-22-b-0-executor[7 7]] INFO  com.example.demo.trident.OriginUserCountAggregator - null init map, aggBatchId:1:0
23:22:00.723 [Thread-29-b-0-executor[6 6]] INFO  com.example.demo.trident.OriginUserCountAggregator - null init map, aggBatchId:1:0
23:22:00.723 [Thread-22-b-0-executor[7 7]] INFO  com.example.demo.trident.OriginUserCountAggregator - null aggregate batch:1,tuple:[nickt1, 1, 1]
23:22:00.723 [Thread-29-b-0-executor[6 6]] INFO  com.example.demo.trident.OriginUserCountAggregator - null aggregate batch:1,tuple:[nickt3, 1, 1]
23:22:00.723 [Thread-22-b-0-executor[7 7]] INFO  com.example.demo.trident.OriginUserCountAggregator - null complete agg batch:1:0,val:{1={nickt1=1}}
23:22:00.723 [Thread-29-b-0-executor[6 6]] INFO  com.example.demo.trident.OriginUserCountAggregator - null complete agg batch:1:0,val:{1={nickt3=1}}
23:22:00.724 [Thread-36-b-1-executor[9 9]] INFO  com.example.demo.trident.AggAgg - zero called
23:22:00.724 [Thread-36-b-1-executor[9 9]] INFO  com.example.demo.trident.AggAgg - init tuple:[{1={nickt2=1}}, 1:0]
23:22:00.724 [Thread-36-b-1-executor[9 9]] INFO  com.example.demo.trident.AggAgg - combine val1:{},val2:{1={nickt2=1}}
23:22:00.726 [Thread-36-b-1-executor[9 9]] INFO  com.example.demo.trident.AggAgg - init tuple:[{1={nickt3=1}}, 1:0]
23:22:00.727 [Thread-36-b-1-executor[9 9]] INFO  com.example.demo.trident.AggAgg - combine val1:{1={nickt2=1}},val2:{1={nickt3=1}}
23:22:00.728 [Thread-36-b-1-executor[9 9]] INFO  com.example.demo.trident.AggAgg - init tuple:[{1={nickt1=1}}, 1:0]
23:22:00.728 [Thread-36-b-1-executor[9 9]] INFO  com.example.demo.trident.AggAgg - combine val1:{1={nickt3=1, nickt2=1}},val2:{1={nickt1=1}}
23:22:00.731 [Thread-31-b-2-executor[10 10]] INFO  com.example.demo.trident.AggAgg - zero called
23:22:00.731 [Thread-31-b-2-executor[10 10]] INFO  com.example.demo.trident.AggAgg - combine val1:{},val2:{1={nickt3=1, nickt2=1, nickt1=1}}
23:22:00.731 [Thread-31-b-2-executor[10 10]] INFO  com.example.demo.trident.PrintEachFunc - null each tuple:[{1={nickt3=1, nickt2=1, nickt1=1}}]

Note that Storm's thread names already include the bolt names, such as b-0, b-1, and b-2.

TridentBoltExecutor

storm-core-1.2.2-sources.jar!/org/apache/storm/trident/topology/TridentBoltExecutor.java

    public void execute(Tuple tuple) {
        if(TupleUtils.isTick(tuple)) {
            long now = System.currentTimeMillis();
            if(now - _lastRotate > _messageTimeoutMs) {
                _batches.rotate();
                _lastRotate = now;
            }
            return;
        }
        String batchGroup = _batchGroupIds.get(tuple.getSourceGlobalStreamId());
        if(batchGroup==null) {
            // this is so we can do things like have simple DRPC that doesn't need to use batch processing
            _coordCollector.setCurrBatch(null);
            _bolt.execute(null, tuple);
            _collector.ack(tuple);
            return;
        }
        IBatchID id = (IBatchID) tuple.getValue(0);
        //get transaction id
        //if it already exists and attempt id is greater than the attempt there
        
        
        TrackedBatch tracked = (TrackedBatch) _batches.get(id.getId());
//        if(_batches.size() > 10 && _context.getThisTaskIndex() == 0) {
//            System.out.println("Received in " + _context.getThisComponentId() + " " + _context.getThisTaskIndex()
//                    + " (" + _batches.size() + ")" +
//                    "\ntuple: " + tuple +
//                    "\nwith tracked " + tracked +
//                    "\nwith id " + id + 
//                    "\nwith group " + batchGroup
//                    + "\n");
//            
//        }
        //System.out.println("Num tracked: " + _batches.size() + " " + _context.getThisComponentId() + " " + _context.getThisTaskIndex());
        
        // this code here ensures that only one attempt is ever tracked for a batch, so when
        // failures happen you don't get an explosion in memory usage in the tasks
        if(tracked!=null) {
            if(id.getAttemptId() > tracked.attemptId) {
                _batches.remove(id.getId());
                tracked = null;
            } else if(id.getAttemptId() < tracked.attemptId) {
                // no reason to try to execute a previous attempt than we've already seen
                return;
            }
        }
        
        if(tracked==null) {
            tracked = new TrackedBatch(new BatchInfo(batchGroup, id, _bolt.initBatchState(batchGroup, id)), _coordConditions.get(batchGroup), id.getAttemptId());
            _batches.put(id.getId(), tracked);
        }
        _coordCollector.setCurrBatch(tracked);
        
        //System.out.println("TRACKED: " + tracked + " " + tuple);
        
        TupleType t = getTupleType(tuple, tracked);
        if(t==TupleType.COMMIT) {
            tracked.receivedCommit = true;
            checkFinish(tracked, tuple, t);
        } else if(t==TupleType.COORD) {
            int count = tuple.getInteger(1);
            tracked.reportedTasks++;
            tracked.expectedTupleCount+=count;
            checkFinish(tracked, tuple, t);
        } else {
            tracked.receivedTuples++;
            boolean success = true;
            try {
                _bolt.execute(tracked.info, tuple);
                if(tracked.condition.expectedTaskReports==0) {
                    success = finishBatch(tracked, tuple);
                }
            } catch(FailedException e) {
                failBatch(tracked, e);
            }
            if(success) {
                _collector.ack(tuple);                   
            } else {
                _collector.fail(tuple);
            }
        }
        _coordCollector.setCurrBatch(null);
    }

    private void failBatch(TrackedBatch tracked, FailedException e) {
        if(e!=null && e instanceof ReportedFailedException) {
            _collector.reportError(e);
        }
        tracked.failed = true;
        if(tracked.delayedAck!=null) {
            _collector.fail(tracked.delayedAck);
            tracked.delayedAck = null;
        }
    }

    private void checkFinish(TrackedBatch tracked, Tuple tuple, TupleType type) {
        if(tracked.failed) {
            failBatch(tracked);
            _collector.fail(tuple);
            return;
        }
        CoordCondition cond = tracked.condition;
        boolean delayed = tracked.delayedAck==null &&
                              (cond.commitStream!=null && type==TupleType.COMMIT
                               || cond.commitStream==null);
        if(delayed) {
            tracked.delayedAck = tuple;
        }
        boolean failed = false;
        if(tracked.receivedCommit && tracked.reportedTasks == cond.expectedTaskReports) {
            if(tracked.receivedTuples == tracked.expectedTupleCount) {
                finishBatch(tracked, tuple);                
            } else {
                //TODO: add logging that not all tuples were received
                failBatch(tracked);
                _collector.fail(tuple);
                failed = true;
            }
        }
        
        if(!delayed && !failed) {
            _collector.ack(tuple);
        }
        
    }



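When no TrackedBatch exists for the incoming batch, execute creates one, and creating it calls _bolt.initBatchState.
For a regular tuple, execute first calls _bolt.execute(tracked.info, tuple) and then acks the tuple through _collector. If _bolt.execute throws a FailedException, execute calls failBatch directly, which sets tracked.failed to true. Once all tuples of the batch have been sent and received, checkFinish runs, and if it finds tracked.failed set it calls _collector.fail.
There are two kinds of _bolt here: TridentSpoutExecutor and SubtopologyBolt. For TridentSpoutExecutor, tracked.condition.expectedTaskReports is 0, so every tuple received (in practice an instruction to emit a batch) triggers finishBatch immediately after _bolt.execute. For SubtopologyBolt, tracked.condition.expectedTaskReports is non-zero, so it has to wait for the final [id,count] instructions before checkFinish can complete the batch.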
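To make the [id,count] bookkeeping concrete, here is a simplified sketch of the condition checkFinish evaluates. It is a plain-Java model, not the Storm class: TrackedBatchSketch is a hypothetical name, and the commit stream, delayed acks, and attempt ids are deliberately left out.

```java
// Plain-Java model of the bookkeeping behind checkFinish; not the Storm class itself.
class TrackedBatchSketch {
    final int expectedTaskReports;   // number of upstream tasks that must report
    int reportedTasks = 0;           // [id,count] tuples received so far
    int expectedTupleCount = 0;      // sum of the counts carried by those tuples
    int receivedTuples = 0;          // data tuples actually received
    boolean finished = false;
    boolean failed = false;

    TrackedBatchSketch(int expectedTaskReports) {
        this.expectedTaskReports = expectedTaskReports;
    }

    // A regular data tuple from any upstream task.
    void onDataTuple() {
        receivedTuples++;
    }

    // A coordination tuple [id,count]: one upstream task says how many tuples it sent here.
    void onCoordTuple(int count) {
        reportedTasks++;
        expectedTupleCount += count;
        checkFinish();
    }

    // The batch finishes only when every upstream task has reported and the
    // tuple counts match; a count mismatch fails the batch instead.
    private void checkFinish() {
        if (reportedTasks == expectedTaskReports) {
            if (receivedTuples == expectedTupleCount) {
                finished = true;
            } else {
                failed = true;
            }
        }
    }

    public static void main(String[] args) {
        TrackedBatchSketch b1 = new TrackedBatchSketch(3); // b-1 has 3 upstream b-0 tasks
        for (int task = 0; task < 3; task++) {
            b1.onDataTuple();     // each b-0 task sends one partial result...
            b1.onCoordTuple(1);   // ...followed by its [id,count] with count = 1
            System.out.println("reports=" + b1.reportedTasks + " finished=" + b1.finished);
        }
    }
}
```

In this model the batch stays unfinished after the first and second report and finishes on the third, which is the behaviour the example topology relies on at b-1.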

TridentSpoutExecutor

storm-core-1.2.2-sources.jar!/org/apache/storm/trident/spout/TridentSpoutExecutor.java

    @Override
    public void prepare(Map conf, TopologyContext context, BatchOutputCollector collector) {
        _emitter = _spout.getEmitter(_txStateId, conf, context);
        _collector = new AddIdCollector(_streamName, collector);
    }

    @Override
    public void execute(BatchInfo info, Tuple input) {
        // there won't be a BatchInfo for the success stream
        TransactionAttempt attempt = (TransactionAttempt) input.getValue(0);
        if(input.getSourceStreamId().equals(MasterBatchCoordinator.COMMIT_STREAM_ID)) {
            if(attempt.equals(_activeBatches.get(attempt.getTransactionId()))) {
                ((ICommitterTridentSpout.Emitter) _emitter).commit(attempt);
                _activeBatches.remove(attempt.getTransactionId());
            } else {
                 throw new FailedException("Received commit for different transaction attempt");
            }
        } else if(input.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) {
            // valid to delete before what's been committed since 
            // those batches will never be accessed again
            _activeBatches.headMap(attempt.getTransactionId()).clear();
            _emitter.success(attempt);
        } else {            
            _collector.setBatch(info.batchId);
            _emitter.emitBatch(attempt, input.getValue(1), _collector);
            _activeBatches.put(attempt.getTransactionId(), attempt);
        }
    }

    @Override
    public void finishBatch(BatchInfo batchInfo) {
    }

    @Override
    public Object initBatchState(String batchGroup, Object batchId) {
        return null;
    }

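TridentSpoutExecutor uses an AddIdCollector, and both its initBatchState and finishBatch methods are no-ops.
execute handles three cases: COMMIT_STREAM_ID, SUCCESS_STREAM_ID, and regular streams.
A tuple from a regular stream is an instruction to emit a batch, so execute calls _emitter.emitBatch to emit that batch's tuples.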
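The three branches of execute all revolve around the sorted _activeBatches map. The sketch below models just that map with a TreeMap; ActiveBatchesSketch and its method names are hypothetical, and an attempt is reduced to a (txid, attemptId) pair for illustration.

```java
import java.util.TreeMap;

// Plain-Java model of how the spout executor tracks in-flight attempts; not Storm code.
class ActiveBatchesSketch {
    // txid -> attemptId of the attempt most recently emitted for that txid
    final TreeMap<Long, Integer> activeBatches = new TreeMap<>();

    // Regular stream: the batch is emitted and its attempt becomes the active one.
    void onEmit(long txid, int attemptId) {
        activeBatches.put(txid, attemptId);
    }

    // Commit stream: only the attempt currently recorded for this txid may commit.
    boolean onCommit(long txid, int attemptId) {
        Integer active = activeBatches.get(txid);
        if (active != null && active == attemptId) {
            activeBatches.remove(txid);
            return true;
        }
        return false; // the real executor throws FailedException in this case
    }

    // Success stream: entries strictly before this txid will never be needed again.
    void onSuccess(long txid) {
        activeBatches.headMap(txid).clear();
    }

    public static void main(String[] args) {
        ActiveBatchesSketch spout = new ActiveBatchesSketch();
        spout.onEmit(1L, 0);
        spout.onEmit(1L, 1); // replay of txid 1 with a newer attempt
        System.out.println("commit stale attempt: " + spout.onCommit(1L, 0));
        System.out.println("commit live attempt:  " + spout.onCommit(1L, 1));
        spout.onEmit(2L, 0);
        spout.onEmit(3L, 0);
        spout.onSuccess(3L);
        System.out.println("still active: " + spout.activeBatches.keySet());
    }
}
```

Using a sorted map is what makes the success branch cheap: headMap gives a live view of all earlier transactions, and clearing the view removes them from the underlying map.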

SubtopologyBolt

storm-core-1.2.2-sources.jar!/org/apache/storm/trident/planner/SubtopologyBolt.java

    @Override
    public Object initBatchState(String batchGroup, Object batchId) {
        ProcessorContext ret = new ProcessorContext(batchId, new Object[_nodes.size()]);
        for(TridentProcessor p: _myTopologicallyOrdered.get(batchGroup)) {
            p.startBatch(ret);
        }
        return ret;
    }

    @Override
    public void execute(BatchInfo batchInfo, Tuple tuple) {
        String sourceStream = tuple.getSourceStreamId();
        InitialReceiver ir = _roots.get(sourceStream);
        if(ir==null) {
            throw new RuntimeException("Received unexpected tuple " + tuple.toString());
        }
        ir.receive((ProcessorContext) batchInfo.state, tuple);
    }

    @Override
    public void finishBatch(BatchInfo batchInfo) {
        for(TridentProcessor p: _myTopologicallyOrdered.get(batchInfo.batchGroup)) {
            p.finishBatch((ProcessorContext) batchInfo.state);
        }
    }

    protected static class InitialReceiver {
        List<TridentProcessor> _receivers = new ArrayList<>();
        RootFactory _factory;
        ProjectionFactory _project;
        String _stream;
        
        public InitialReceiver(String stream, Fields allFields) {
            // TODO: don't want to project for non-batch bolts...???
            // how to distinguish "batch" streams from non-batch streams?
            _stream = stream;
            _factory = new RootFactory(allFields);
            List<String> projected = new ArrayList<>(allFields.toList());
            projected.remove(0);
            _project = new ProjectionFactory(_factory, new Fields(projected));
        }
        
        public void receive(ProcessorContext context, Tuple tuple) {
            TridentTuple t = _project.create(_factory.create(tuple));
            for(TridentProcessor r: _receivers) {
                r.execute(context, _stream, t);
            }            
        }
        
        public void addReceiver(TridentProcessor p) {
            _receivers.add(p);
        }
        
        public Factory getOutputFactory() {
            return _project;
        }
    }

Its initBatchState method creates a ProcessorContext and then calls startBatch on each TridentProcessor (for example AggregateProcessor or EachProcessor).
execute calls receive on the InitialReceiver, which in turn calls execute on each TridentProcessor (for example AggregateProcessor).
finishBatch calls finishBatch on each TridentProcessor (for example AggregateProcessor or EachProcessor).
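The startBatch / execute / finishBatch lifecycle can be sketched in a few lines of plain Java. Everything here is illustrative: BatchLifecycleSketch, Context, Processor, and CountProcessor are hypothetical names, tuples are reduced to strings, and every processor is treated as a root receiver rather than being wired into a processor graph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the per-batch lifecycle that the bolt drives; names are illustrative.
class BatchLifecycleSketch {

    // Per-batch context with one state slot per processor, similar in spirit to ProcessorContext.
    static class Context {
        final Object batchId;
        final Object[] state;
        Context(Object batchId, int slots) {
            this.batchId = batchId;
            this.state = new Object[slots];
        }
    }

    interface Processor {
        void startBatch(Context ctx);
        void execute(Context ctx, String tuple);
        void finishBatch(Context ctx);
    }

    // Counts tuples per batch, keeping the running total in its own state slot.
    static class CountProcessor implements Processor {
        final int slot;
        final List<String> emitted = new ArrayList<>();
        CountProcessor(int slot) { this.slot = slot; }
        public void startBatch(Context ctx) { ctx.state[slot] = 0; }
        public void execute(Context ctx, String tuple) { ctx.state[slot] = (Integer) ctx.state[slot] + 1; }
        public void finishBatch(Context ctx) { emitted.add(ctx.batchId + ":" + ctx.state[slot]); }
    }

    final List<? extends Processor> processors;

    BatchLifecycleSketch(List<? extends Processor> processors) {
        this.processors = processors;
    }

    // A fresh context per batch, then startBatch on every processor.
    Context initBatchState(Object batchId) {
        Context ctx = new Context(batchId, processors.size());
        for (Processor p : processors) p.startBatch(ctx);
        return ctx;
    }

    void execute(Context ctx, String tuple) {
        for (Processor p : processors) p.execute(ctx, tuple);
    }

    void finishBatch(Context ctx) {
        for (Processor p : processors) p.finishBatch(ctx);
    }

    public static void main(String[] args) {
        CountProcessor counter = new CountProcessor(0);
        BatchLifecycleSketch bolt = new BatchLifecycleSketch(Arrays.asList(counter));
        Context first = bolt.initBatchState(1);
        bolt.execute(first, "a");
        bolt.execute(first, "b");
        bolt.finishBatch(first);
        Context second = bolt.initBatchState(2);
        bolt.execute(second, "c");
        bolt.finishBatch(second);
        System.out.println(counter.emitted); // one "batchId:count" entry per finished batch
    }
}
```

Because the state lives in the per-batch context rather than in the processor, the count for batch 2 starts from zero and nothing leaks over from batch 1.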

WindowTridentProcessor

storm-core-1.2.2-sources.jar!/org/apache/storm/trident/windowing/WindowTridentProcessor.java

    @Override
    public void startBatch(ProcessorContext processorContext) {
        // initialize state for batch
        processorContext.state[tridentContext.getStateIndex()] = new ArrayList<TridentTuple>();
    }

    @Override
    public void execute(ProcessorContext processorContext, String streamId, TridentTuple tuple) {
        // add tuple to the batch state
        Object state = processorContext.state[tridentContext.getStateIndex()];
        ((List<TridentTuple>) state).add(projection.create(tuple));
    }

    @Override
    public void finishBatch(ProcessorContext processorContext) {

        Object batchId = processorContext.batchId;
        Object batchTxnId = getBatchTxnId(batchId);

        LOG.debug("Received finishBatch of : [{}] ", batchId);
        // get all the tuples in a batch and add it to trident-window-manager
        List<TridentTuple> tuples = (List<TridentTuple>) processorContext.state[tridentContext.getStateIndex()];
        tridentWindowManager.addTuplesBatch(batchId, tuples);

        List<Integer> pendingTriggerIds = null;
        List<String> triggerKeys = new ArrayList<>();
        Iterable<Object> triggerValues = null;

        if (retriedAttempt(batchId)) {
            pendingTriggerIds = (List<Integer>) windowStore.get(inprocessTriggerKey(batchTxnId));
            if (pendingTriggerIds != null) {
                for (Integer pendingTriggerId : pendingTriggerIds) {
                    triggerKeys.add(triggerKey(pendingTriggerId));
                }
                triggerValues = windowStore.get(triggerKeys);
            }
        }

        // if there are no trigger values in earlier attempts or this is a new batch, emit pending triggers.
        if(triggerValues == null) {
            pendingTriggerIds = new ArrayList<>();
            Queue<StoreBasedTridentWindowManager.TriggerResult> pendingTriggers = tridentWindowManager.getPendingTriggers();
            LOG.debug("pending triggers at batch: [{}] and triggers.size: [{}] ", batchId, pendingTriggers.size());
            try {
                Iterator<StoreBasedTridentWindowManager.TriggerResult> pendingTriggersIter = pendingTriggers.iterator();
                List<Object> values = new ArrayList<>();
                StoreBasedTridentWindowManager.TriggerResult triggerResult = null;
                while (pendingTriggersIter.hasNext()) {
                    triggerResult = pendingTriggersIter.next();
                    for (List<Object> aggregatedResult : triggerResult.result) {
                        String triggerKey = triggerKey(triggerResult.id);
                        triggerKeys.add(triggerKey);
                        values.add(aggregatedResult);
                        pendingTriggerIds.add(triggerResult.id);
                    }
                    pendingTriggersIter.remove();
                }
                triggerValues = values;
            } finally {
                // store inprocess triggers of a batch in store for batch retries for any failures
                if (!pendingTriggerIds.isEmpty()) {
                    windowStore.put(inprocessTriggerKey(batchTxnId), pendingTriggerIds);
                }
            }
        }

        collector.setContext(processorContext);
        int i = 0;
        for (Object resultValue : triggerValues) {
            collector.emit(new ConsList(new TriggerInfo(windowTaskId, pendingTriggerIds.get(i++)), (List<Object>) resultValue));
        }
        collector.setContext(null);
    }

WindowTridentProcessor's startBatch assigns a new list to processorContext.state[tridentContext.getStateIndex()].
execute stores each received tuple in processorContext.state[tridentContext.getStateIndex()].
finishBatch adds the data in processorContext.state[tridentContext.getStateIndex()] to the windowStore and to the windowManager's ConcurrentLinkedQueue.
The window's trigger takes the window data from the ConcurrentLinkedQueue and adds it to pendingTriggers; WindowTridentProcessor's finishBatch then removes the data from pendingTriggers and emits it through the FreshCollector.
Data emitted through the FreshCollector is received and handled by its TupleReceiver (for example ProjectedProcessor or PartitionPersistProcessor). PartitionPersistProcessor stores the data in state, while ProjectedProcessor extracts fields according to the window's outputFields and passes the data to downstream processors such as EachProcessor.
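The interplay between finishBatch and the window trigger is easier to see in a small model. The sketch below is an assumption-laden simplification: WindowDrainSketch is a hypothetical class, tuples are strings, the trigger is fired by hand, and the window behaves like a tumbling window that takes everything queued so far.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Plain-Java model of batches feeding a window while finishBatch drains trigger results.
class WindowDrainSketch {
    // Tuples handed to the window, waiting for the next trigger.
    final Queue<String> windowQueue = new ConcurrentLinkedQueue<>();
    // Window results computed by triggers, waiting for the next finishBatch.
    final Queue<List<String>> pendingTriggers = new ConcurrentLinkedQueue<>();

    // finishBatch: first hand the batch to the window, then emit whatever triggers are pending.
    List<List<String>> finishBatch(List<String> batchTuples) {
        windowQueue.addAll(batchTuples);
        List<List<String>> emitted = new ArrayList<>();
        List<String> result;
        while ((result = pendingTriggers.poll()) != null) {
            emitted.add(result);
        }
        return emitted; // empty when no trigger has fired since the previous finishBatch
    }

    // Window trigger, which runs on the window's own schedule rather than per batch.
    void fireTrigger() {
        List<String> window = new ArrayList<>();
        String tuple;
        while ((tuple = windowQueue.poll()) != null) {
            window.add(tuple);
        }
        pendingTriggers.add(window);
    }

    public static void main(String[] args) {
        WindowDrainSketch w = new WindowDrainSketch();
        System.out.println(w.finishBatch(Arrays.asList("a", "b"))); // nothing pending yet
        System.out.println(w.finishBatch(Arrays.asList("c")));      // still nothing pending
        w.fireTrigger();                                            // window covers two batches
        System.out.println(w.finishBatch(Arrays.asList("d")));      // emits the window result
    }
}
```

The early finishBatch calls return an empty list, so downstream code in this model has to cope with batches that carry no window output; the emitted window also spans tuples from more than one original batch.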

Summary

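A Trident spout emits one batch of data and then waits for downstream to finish processing it; completion is signalled batch by batch through finishBatch. Between bolts, the interval between tuple acks depends on how long each tuple takes to process (TridentBoltExecutor acks each tuple automatically once it has been processed). If overall processing takes too long, the topology's tuple processing times out and triggers the spout's fail operation, which replays that batchId. If the spout is transactional, the tuples belonging to that batchId are the same on replay.
A window operation breaks up the Trident spout's original batches. A batch's data first accumulates in the ProcessorContext state (WindowTridentProcessor resets this state on every startBatch). On finishBatch the data is copied to the windowStore and the windowManager's ConcurrentLinkedQueue, and then waits for the window's trigger to fire, compute the window data, and put it into pendingTriggers. When the bolt runs finishBatch, it removes the window data from pendingTriggers and hands it to the FreshCollector, which passes it to the downstream processors. Those downstream processors' startBatch and finishBatch calls follow the rhythm of the original spout batches; they are not triggered by the window.
Assuming data arrives continuously, the rate at which the spout sends batches depends on Config.TOPOLOGY_TRIDENT_BATCH_EMIT_INTERVAL_MILLIS (topology.trident.batch.emit.interval.millis, 500 by default in defaults.yaml). A window interval is usually larger than the default batch interval, so a window aggregates data from several batches. Also, because data is only added to the windowManager's ConcurrentLinkedQueue during finishBatch, pendingTriggers is still empty at that point. The first few finishBatch calls therefore usually get no data from the window, and the downstream processors have nothing to process; add null checks to avoid a NullPointerException.
When data goes through groupBy/partitionBy with a parallelism of 1, groupBy/partitionBy operates per batch. With a parallelism greater than 1, a batch emitted by the original spout is distributed across multiple partitions/tasks, so the original batch's data flow is split. Each task runs its own finishBatch after it has processed its data (tuples arrive in emit order, and the last one is [id,count], which acts as the end-of-batch instruction used to detect and trigger batch completion). It then sends the new batch's data downstream, followed by [id,count] once the new batch has been sent, and the downstream bolts perform their batch operations in turn. global sends all data to the same partition/task. batchGlobal behaves like global when parallelism is 1; when parallelism is greater than 1, it distributes data to different partitions/tasks by batchId.
aggregate is used to aggregate data and is usually combined with groupBy or partitionBy, which split the upstream batch again so that aggregation runs on the split batches. If parallelism is greater than 1, each task aggregates separately; to bring those partial results together afterwards, add global().aggregate(). As long as there is no window operation in between, the final aggregate still follows the original batch, because tracked.condition.expectedTaskReports in TridentBoltExecutor records how many upstream tasks the bolt must wait for to report [id,count]. On receiving [id,count], it first checks whether tracked.reportedTasks equals cond.expectedTaskReports, then whether tracked.receivedTuples equals tracked.expectedTupleCount; only when both hold can it finishBatch, complete the current batch, and emit [id,count] downstream. The expectedTaskReports check is what allows a batch that has been split across several tasks to be brought back together as the original batch. Note, however, that a window operation breaks up the Trident spout's original batches at the window stage.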
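The difference between global and batchGlobal described above comes down to how the target task is chosen. The following sketch only illustrates that idea: RepartitionSketch is a hypothetical class, task indexes start at 0, and the modulo on the transaction id is an assumed stand-in for whatever hashing Storm applies to the batch id.

```java
// Plain-Java model of target-task selection for global() and batchGlobal(); not Storm code.
class RepartitionSketch {

    // global(): every tuple of every batch goes to the same target task.
    static int globalTarget(long txid, int numTasks) {
        return 0;
    }

    // batchGlobal(): all tuples of one batch go to a single task, chosen per batch.
    // Assumption: the choice is modelled as txid modulo the number of tasks.
    static int batchGlobalTarget(long txid, int numTasks) {
        return (int) Math.floorMod(txid, (long) numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 3;
        for (long txid = 1; txid <= 4; txid++) {
            System.out.println("txid " + txid
                    + " global -> task " + globalTarget(txid, numTasks)
                    + ", batchGlobal -> task " + batchGlobalTarget(txid, numTasks));
        }
    }
}
```

With a single task both strategies pick task 0, which matches the observation that batchGlobal behaves like global when parallelism is 1. With several tasks, batchGlobal keeps each batch whole while spreading different batches across tasks.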
