A look at storm trident spout's _maxTransactionActive

阿新 • Published: 2018-11-22

Preface

This article looks at the _maxTransactionActive setting of the storm trident spout.

MasterBatchCoordinator

storm-core-1.2.2-sources.jar!/org/apache/storm/trident/topology/MasterBatchCoordinator.java


   TreeMap<Long, TransactionStatus> _activeTx = new TreeMap<Long, TransactionStatus>();

   public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        _throttler = new WindowedTimeThrottler((Number)conf.get(Config.TOPOLOGY_TRIDENT_BATCH_EMIT_INTERVAL_MILLIS), 1);
        for (String spoutId: _managedSpoutIds) {
            _states.add(TransactionalState.newCoordinatorState(conf, spoutId));
        }
        _currTransaction = getStoredCurrTransaction();

        _collector = collector;
        Number active = (Number) conf.get(Config.TOPOLOGY_MAX_SPOUT_PENDING);
        if (active==null) {
            _maxTransactionActive = 1;
        } else {
            _maxTransactionActive = active.intValue();
        }
        _attemptIds = getStoredCurrAttempts(_currTransaction, _maxTransactionActive);

        
        for(int i=0; i<_spouts.size(); i++) {
            String txId = _managedSpoutIds.get(i);
            _coordinators.add(_spouts.get(i).getCoordinator(txId, conf, context));
        }
        LOG.debug("Opened {}", this);
    }

   public void nextTuple() {
        sync();
    }

    private void sync() {
        // note that sometimes the tuples active may be less than max_spout_pending, e.g.
        // max_spout_pending = 3
        // tx 1, 2, 3 active, tx 2 is acked. there won't be a commit for tx 2 (because tx 1 isn't committed yet),
        // and there won't be a batch for tx 4 because there's max_spout_pending tx active
        TransactionStatus maybeCommit = _activeTx.get(_currTransaction);
        if(maybeCommit!=null && maybeCommit.status == AttemptStatus.PROCESSED) {
            maybeCommit.status = AttemptStatus.COMMITTING;
            _collector.emit(COMMIT_STREAM_ID, new Values(maybeCommit.attempt), maybeCommit.attempt);
            LOG.debug("Emitted on [stream = {}], [tx_status = {}], [{}]", COMMIT_STREAM_ID, maybeCommit, this);
        }
        
        if(_active) {
            if(_activeTx.size() < _maxTransactionActive) {
                Long curr = _currTransaction;
                for(int i=0; i<_maxTransactionActive; i++) {
                    if(!_activeTx.containsKey(curr) && isReady(curr)) {
                        // by using a monotonically increasing attempt id, downstream tasks
                        // can be memory efficient by clearing out state for old attempts
                        // as soon as they see a higher attempt id for a transaction
                        Integer attemptId = _attemptIds.get(curr);
                        if(attemptId==null) {
                            attemptId = 0;
                        } else {
                            attemptId++;
                        }
                        _attemptIds.put(curr, attemptId);
                        for(TransactionalState state: _states) {
                            state.setData(CURRENT_ATTEMPTS, _attemptIds);
                        }
                        
                        TransactionAttempt attempt = new TransactionAttempt(curr, attemptId);
                        final TransactionStatus newTransactionStatus = new TransactionStatus(attempt);
                        _activeTx.put(curr, newTransactionStatus);
                        _collector.emit(BATCH_STREAM_ID, new Values(attempt), attempt);
                        LOG.debug("Emitted on [stream = {}], [tx_attempt = {}], [tx_status = {}], [{}]", BATCH_STREAM_ID, attempt, newTransactionStatus, this);
                        _throttler.markEvent();
                    }
                    curr = nextTransactionId(curr);
                }
            }
        }
    }

    private static class TransactionStatus {
        TransactionAttempt attempt;
        AttemptStatus status;
        
        public TransactionStatus(TransactionAttempt attempt) {
            this.attempt = attempt;
            this.status = AttemptStatus.PROCESSING;
        }

        @Override
        public String toString() {
            return attempt.toString() + " <" + status.toString() + ">";
        }        
    }

    private static enum AttemptStatus {
        PROCESSING,
        PROCESSED,
        COMMITTING
    }

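MasterBatchCoordinator sets _maxTransactionActive in its open method from Config.TOPOLOGY_MAX_SPOUT_PENDING (topology.max.spout.pending). The default configuration file leaves this value as null, and in that case _maxTransactionActive is set to 1.
nextTuple (through sync) limits how many batches are processed at the same time: a new batch is emitted only while _activeTx holds fewer than _maxTransactionActive entries, so once the limit is reached a batch in _activeTx has to succeed or fail before the next batch can start.
_activeTx is a TreeMap keyed by transactionId whose values are TransactionStatus objects, each holding a TransactionAttempt and an AttemptStatus; AttemptStatus has three states: PROCESSING, PROCESSED and COMMITTING.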
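
The limit described above can be tried out without a cluster. The following is a minimal sketch, not the real MasterBatchCoordinator: the class MaxActiveSketch and its fields are hypothetical, the isReady check and attempt ids are left out, and transaction ids are assumed to advance by one. It only reproduces the null fallback from open and the emitting half of sync.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of how MasterBatchCoordinator bounds active batches.
final class MaxActiveSketch {
    static final String MAX_SPOUT_PENDING = "topology.max.spout.pending";

    final TreeMap<Long, String> activeTx = new TreeMap<>(); // txid -> status name
    final int maxTransactionActive;
    long currTransaction = 1L;

    MaxActiveSketch(Map<String, Object> conf) {
        maxTransactionActive = resolveMaxActive(conf);
    }

    // Same fallback as open(): a missing (null) value means one active batch.
    static int resolveMaxActive(Map<String, Object> conf) {
        Number active = (Number) conf.get(MAX_SPOUT_PENDING);
        return active == null ? 1 : active.intValue();
    }

    // Mirrors the emitting half of sync(); returns the txids started by this call.
    List<Long> sync() {
        List<Long> started = new ArrayList<>();
        if (activeTx.size() < maxTransactionActive) {
            long curr = currTransaction;
            for (int i = 0; i < maxTransactionActive; i++) {
                if (!activeTx.containsKey(curr)) {
                    activeTx.put(curr, "PROCESSING");
                    started.add(curr);
                }
                curr++; // assumption: the next transaction id is curr + 1
            }
        }
        return started;
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        System.out.println(new MaxActiveSketch(conf).sync()); // no setting: limit of 1

        conf.put(MAX_SPOUT_PENDING, 3);
        MaxActiveSketch three = new MaxActiveSketch(conf);
        System.out.println(three.sync()); // starts up to three batches
        System.out.println(three.sync()); // limit reached, nothing new starts
    }
}
```

With the setting at 3, the second call starts nothing because _activeTx is already full; an entry has to leave the map before another batch can begin.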

TridentSpoutCoordinator

storm-core-1.2.2-sources.jar!/org/apache/storm/trident/spout/TridentSpoutCoordinator.java

    RotatingTransactionalState _state;

    public void prepare(Map conf, TopologyContext context) {
        _coord = _spout.getCoordinator(_id, conf, context);
        _underlyingState = TransactionalState.newCoordinatorState(conf, _id);
        _state = new RotatingTransactionalState(_underlyingState, META_DIR);
    }

    public void execute(Tuple tuple, BasicOutputCollector collector) {
        TransactionAttempt attempt = (TransactionAttempt) tuple.getValue(0);

        if(tuple.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) {
            _state.cleanupBefore(attempt.getTransactionId());
            _coord.success(attempt.getTransactionId());
        } else {
            long txid = attempt.getTransactionId();
            Object prevMeta = _state.getPreviousState(txid);
            Object meta = _coord.initializeTransaction(txid, prevMeta, _state.getState(txid));
            _state.overrideState(txid, meta);
            collector.emit(MasterBatchCoordinator.BATCH_STREAM_ID, new Values(attempt, meta));
        }
                
    }

The execute method of TridentSpoutCoordinator stores and retrieves meta by txid, then emits Values(attempt, meta) to TridentBoltExecutor.
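
To make the per-txid bookkeeping concrete, here is a small in-memory sketch. It is a stand-in, not the real RotatingTransactionalState (which persists its data through TransactionalState): the class TxMetaStoreSketch is hypothetical, and so is the toy initializeTransaction, which treats meta as an offset that grows by 100 per transaction.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the per-txid meta handling in execute().
final class TxMetaStoreSketch {
    private final TreeMap<Long, Object> metaByTxid = new TreeMap<>();

    Object getState(long txid) {
        return metaByTxid.get(txid);
    }

    // Meta of the closest earlier transaction, or null if there is none.
    Object getPreviousState(long txid) {
        Map.Entry<Long, Object> prev = metaByTxid.lowerEntry(txid);
        return prev == null ? null : prev.getValue();
    }

    void overrideState(long txid, Object meta) {
        metaByTxid.put(txid, meta);
    }

    // Drops everything strictly before txid, as done on the success stream.
    void cleanupBefore(long txid) {
        metaByTxid.headMap(txid).clear();
    }

    int size() {
        return metaByTxid.size();
    }

    // Toy coordinator (assumption): reuse meta on replay, else advance by 100.
    static Object initializeTransaction(Object prevMeta, Object currMeta) {
        if (currMeta != null) {
            return currMeta;
        }
        long prevOffset = prevMeta == null ? 0L : (Long) prevMeta;
        return prevOffset + 100L;
    }

    public static void main(String[] args) {
        TxMetaStoreSketch state = new TxMetaStoreSketch();
        for (long txid = 1; txid <= 3; txid++) {
            Object meta = initializeTransaction(state.getPreviousState(txid), state.getState(txid));
            state.overrideState(txid, meta);
            System.out.println("txid " + txid + " -> meta " + meta);
        }
        state.cleanupBefore(3L);
        System.out.println("entries left: " + state.size());
    }
}
```

Because each transaction looks up its own entry and that of its predecessor, several transactions can be in flight without overwriting one another's meta.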

TridentBoltExecutor

storm-core-1.2.2-sources.jar!/org/apache/storm/trident/topology/TridentBoltExecutor.java


    RotatingMap<Object, TrackedBatch> _batches;


    public void execute(Tuple tuple) {
        if(TupleUtils.isTick(tuple)) {
            long now = System.currentTimeMillis();
            if(now - _lastRotate > _messageTimeoutMs) {
                _batches.rotate();
                _lastRotate = now;
            }
            return;
        }
        String batchGroup = _batchGroupIds.get(tuple.getSourceGlobalStreamId());
        if(batchGroup==null) {
            // this is so we can do things like have simple DRPC that doesn't need to use batch processing
            _coordCollector.setCurrBatch(null);
            _bolt.execute(null, tuple);
            _collector.ack(tuple);
            return;
        }
        IBatchID id = (IBatchID) tuple.getValue(0);
        //get transaction id
        //if it already exists and attempt id is greater than the attempt there
        
        
        TrackedBatch tracked = (TrackedBatch) _batches.get(id.getId());
//        if(_batches.size() > 10 && _context.getThisTaskIndex() == 0) {
//            System.out.println("Received in " + _context.getThisComponentId() + " " + _context.getThisTaskIndex()
//                    + " (" + _batches.size() + ")" +
//                    "\ntuple: " + tuple +
//                    "\nwith tracked " + tracked +
//                    "\nwith id " + id + 
//                    "\nwith group " + batchGroup
//                    + "\n");
//            
//        }
        //System.out.println("Num tracked: " + _batches.size() + " " + _context.getThisComponentId() + " " + _context.getThisTaskIndex());
        
        // this code here ensures that only one attempt is ever tracked for a batch, so when
        // failures happen you don't get an explosion in memory usage in the tasks
        if(tracked!=null) {
            if(id.getAttemptId() > tracked.attemptId) {
                _batches.remove(id.getId());
                tracked = null;
            } else if(id.getAttemptId() < tracked.attemptId) {
                // no reason to try to execute a previous attempt than we've already seen
                return;
            }
        }
        
        if(tracked==null) {
            tracked = new TrackedBatch(new BatchInfo(batchGroup, id, _bolt.initBatchState(batchGroup, id)), _coordConditions.get(batchGroup), id.getAttemptId());
            _batches.put(id.getId(), tracked);
        }
        _coordCollector.setCurrBatch(tracked);
        
        //System.out.println("TRACKED: " + tracked + " " + tuple);
        
        TupleType t = getTupleType(tuple, tracked);
        if(t==TupleType.COMMIT) {
            tracked.receivedCommit = true;
            checkFinish(tracked, tuple, t);
        } else if(t==TupleType.COORD) {
            int count = tuple.getInteger(1);
            tracked.reportedTasks++;
            tracked.expectedTupleCount+=count;
            checkFinish(tracked, tuple, t);
        } else {
            tracked.receivedTuples++;
            boolean success = true;
            try {
                _bolt.execute(tracked.info, tuple);
                if(tracked.condition.expectedTaskReports==0) {
                    success = finishBatch(tracked, tuple);
                }
            } catch(FailedException e) {
                failBatch(tracked, e);
            }
            if(success) {
                _collector.ack(tuple);                   
            } else {
                _collector.fail(tuple);
            }
        }
        _coordCollector.setCurrBatch(null);
    }

    public static class TrackedBatch {
        int attemptId;
        BatchInfo info;
        CoordCondition condition;
        int reportedTasks = 0;
        int expectedTupleCount = 0;
        int receivedTuples = 0;
        Map<Integer, Integer> taskEmittedTuples = new HashMap<>();
        boolean failed = false;
        boolean receivedCommit;
        Tuple delayedAck = null;
        
        public TrackedBatch(BatchInfo info, CoordCondition condition, int attemptId) {
            this.info = info;
            this.condition = condition;
            this.attemptId = attemptId;
            receivedCommit = condition.commitStream == null;
        }

        @Override
        public String toString() {
            return ToStringBuilder.reflectionToString(this);
        }        
    }

    private void checkFinish(TrackedBatch tracked, Tuple tuple, TupleType type) {
        if(tracked.failed) {
            failBatch(tracked);
            _collector.fail(tuple);
            return;
        }
        CoordCondition cond = tracked.condition;
        boolean delayed = tracked.delayedAck==null &&
                              (cond.commitStream!=null && type==TupleType.COMMIT
                               || cond.commitStream==null);
        if(delayed) {
            tracked.delayedAck = tuple;
        }
        boolean failed = false;
        if(tracked.receivedCommit && tracked.reportedTasks == cond.expectedTaskReports) {
            if(tracked.receivedTuples == tracked.expectedTupleCount) {
                finishBatch(tracked, tuple);                
            } else {
                //TODO: add logging that not all tuples were received
                failBatch(tracked);
                _collector.fail(tuple);
                failed = true;
            }
        }
        
        if(!delayed && !failed) {
            _collector.ack(tuple);
        }
        
    }

    private boolean finishBatch(TrackedBatch tracked, Tuple finishTuple) {
        boolean success = true;
        try {
            _bolt.finishBatch(tracked.info);
            String stream = COORD_STREAM(tracked.info.batchGroup);
            for(Integer task: tracked.condition.targetTasks) {
                _collector.emitDirect(task, stream, finishTuple, new Values(tracked.info.batchId, Utils.get(tracked.taskEmittedTuples, task, 0)));
            }
            if(tracked.delayedAck!=null) {
                _collector.ack(tracked.delayedAck);
                tracked.delayedAck = null;
            }
        } catch(FailedException e) {
            failBatch(tracked, e);
            success = false;
        }
        _batches.remove(tracked.info.batchId.getId());
        return success;
    }

    private void failBatch(TrackedBatch tracked, FailedException e) {
        if(e!=null && e instanceof ReportedFailedException) {
            _collector.reportError(e);
        }
        tracked.failed = true;
        if(tracked.delayedAck!=null) {
            _collector.fail(tracked.delayedAck);
            tracked.delayedAck = null;
        }
    }

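TridentBoltExecutor keeps batch information in a RotatingMap (_batches), with the txid as key and a TrackedBatch as value.
When it calls _bolt.execute(tracked.info, tuple) it passes along a BatchInfo whose state is the return value of _bolt.initBatchState(batchGroup, id); that call is made only when the TrackedBatch is first created, that is, when _batches has no entry for the txid yet.
checkFinish also makes its decision from the TrackedBatch that belongs to the batch; finishBatch calls _bolt.finishBatch(tracked.info), passing the BatchInfo along; failBatch likewise operates on the TrackedBatch of the batch.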
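
The rule that only one attempt per batch is tracked can be isolated in a few lines. This is a simplified sketch under stated assumptions: AttemptTrackingSketch and Tracked are hypothetical names, a plain HashMap replaces the RotatingMap, and the only per-batch data kept is a tuple counter.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the attempt check at the top of TridentBoltExecutor.execute().
final class AttemptTrackingSketch {
    static final class Tracked {
        final int attemptId;
        int receivedTuples = 0;

        Tracked(int attemptId) {
            this.attemptId = attemptId;
        }
    }

    private final Map<Long, Tracked> batches = new HashMap<>();

    // Returns false when the tuple belongs to an older attempt and is ignored.
    boolean onTuple(long txid, int attemptId) {
        Tracked tracked = batches.get(txid);
        if (tracked != null) {
            if (attemptId > tracked.attemptId) {
                batches.remove(txid); // newer attempt: discard the old state
                tracked = null;
            } else if (attemptId < tracked.attemptId) {
                return false;         // stale attempt: nothing to do
            }
        }
        if (tracked == null) {
            tracked = new Tracked(attemptId); // the point where initBatchState would run
            batches.put(txid, tracked);
        }
        tracked.receivedTuples++;
        return true;
    }

    int received(long txid) {
        return batches.get(txid).receivedTuples;
    }

    public static void main(String[] args) {
        AttemptTrackingSketch executor = new AttemptTrackingSketch();
        executor.onTuple(1L, 0);
        executor.onTuple(1L, 0);
        System.out.println("attempt 0 received: " + executor.received(1L));
        executor.onTuple(1L, 1); // a retry replaces the tracked state
        System.out.println("attempt 1 received: " + executor.received(1L));
        System.out.println("late attempt 0 accepted: " + executor.onTuple(1L, 0));
    }
}
```

Keying by txid and comparing attempt ids keeps memory bounded when a batch is retried, which is the concern raised in the source comment.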

BatchInfo

storm-core-1.2.2-sources.jar!/org/apache/storm/trident/topology/BatchInfo.java

public class BatchInfo {
    public IBatchID batchId;
    public Object state;
    public String batchGroup;
    
    public BatchInfo(String batchGroup, IBatchID batchId, Object state) {
        this.batchGroup = batchGroup;
        this.batchId = batchId;
        this.state = state;
    }
}

BatchInfo holds the batchId, the state and the batchGroup.

TridentSpoutExecutor

storm-core-1.2.2-sources.jar!/org/apache/storm/trident/spout/TridentSpoutExecutor.java

    public Object initBatchState(String batchGroup, Object batchId) {
        return null;
    }

    public void execute(BatchInfo info, Tuple input) {
        // there won't be a BatchInfo for the success stream
        TransactionAttempt attempt = (TransactionAttempt) input.getValue(0);
        if(input.getSourceStreamId().equals(MasterBatchCoordinator.COMMIT_STREAM_ID)) {
            if(attempt.equals(_activeBatches.get(attempt.getTransactionId()))) {
                ((ICommitterTridentSpout.Emitter) _emitter).commit(attempt);
                _activeBatches.remove(attempt.getTransactionId());
            } else {
                 throw new FailedException("Received commit for different transaction attempt");
            }
        } else if(input.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) {
            // valid to delete before what's been committed since 
            // those batches will never be accessed again
            _activeBatches.headMap(attempt.getTransactionId()).clear();
            _emitter.success(attempt);
        } else {            
            _collector.setBatch(info.batchId);
            _emitter.emitBatch(attempt, input.getValue(1), _collector);
            _activeBatches.put(attempt.getTransactionId(), attempt);
        }
    }

    public void finishBatch(BatchInfo batchInfo) {
    }

The execute method of TridentSpoutExecutor likewise uses the txid to keep the information of each batch apart.
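
A rough sketch of that txid-keyed bookkeeping follows. The names are hypothetical: ActiveBatchesSketch stores only the attempt id per txid, and an IllegalStateException stands in for the FailedException thrown by the real class.

```java
import java.util.TreeMap;

// Hypothetical model of the _activeBatches handling in TridentSpoutExecutor.execute().
final class ActiveBatchesSketch {
    // txid -> attempt id of the attempt that emitted the batch
    private final TreeMap<Long, Integer> activeBatches = new TreeMap<>();

    // Batch stream: remember which attempt emitted this txid.
    void onBatch(long txid, int attemptId) {
        activeBatches.put(txid, attemptId);
    }

    // Commit stream: only the attempt that emitted the batch may commit it.
    void onCommit(long txid, int attemptId) {
        Integer emitted = activeBatches.get(txid);
        if (emitted == null || emitted != attemptId) {
            throw new IllegalStateException("Received commit for different transaction attempt");
        }
        activeBatches.remove(txid);
    }

    // Success stream: entries before txid will never be accessed again.
    void onSuccess(long txid) {
        activeBatches.headMap(txid).clear();
    }

    int size() {
        return activeBatches.size();
    }

    public static void main(String[] args) {
        ActiveBatchesSketch spout = new ActiveBatchesSketch();
        spout.onBatch(1L, 0);
        spout.onBatch(2L, 1);
        spout.onBatch(3L, 0);
        spout.onCommit(2L, 1);
        spout.onSuccess(3L);
        System.out.println("active entries: " + spout.size());
    }
}
```

headMap excludes its bound, so the entry for the successful txid itself is removed only by a commit or by a later success.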

SubtopologyBolt

storm-core-1.2.2-sources.jar!/org/apache/storm/trident/planner/SubtopologyBolt.java

    public Object initBatchState(String batchGroup, Object batchId) {
        ProcessorContext ret = new ProcessorContext(batchId, new Object[_nodes.size()]);
        for(TridentProcessor p: _myTopologicallyOrdered.get(batchGroup)) {
            p.startBatch(ret);
        }
        return ret;
    }

    public void execute(BatchInfo batchInfo, Tuple tuple) {
        String sourceStream = tuple.getSourceStreamId();
        InitialReceiver ir = _roots.get(sourceStream);
        if(ir==null) {
            throw new RuntimeException("Received unexpected tuple " + tuple.toString());
        }
        ir.receive((ProcessorContext) batchInfo.state, tuple);
    }

    public void finishBatch(BatchInfo batchInfo) {
        for(TridentProcessor p: _myTopologicallyOrdered.get(batchInfo.batchGroup)) {
            p.finishBatch((ProcessorContext) batchInfo.state);
        }
    }

    protected static class InitialReceiver {
        List<TridentProcessor> _receivers = new ArrayList<>();
        RootFactory _factory;
        ProjectionFactory _project;
        String _stream;
        
        public InitialReceiver(String stream, Fields allFields) {
            // TODO: don't want to project for non-batch bolts...???
            // how to distinguish "batch" streams from non-batch streams?
            _stream = stream;
            _factory = new RootFactory(allFields);
            List<String> projected = new ArrayList<>(allFields.toList());
            projected.remove(0);
            _project = new ProjectionFactory(_factory, new Fields(projected));
        }
        
        public void receive(ProcessorContext context, Tuple tuple) {
            TridentTuple t = _project.create(_factory.create(tuple));
            for(TridentProcessor r: _receivers) {
                r.execute(context, _stream, t);
            }            
        }
        
        public void addReceiver(TridentProcessor p) {
            _receivers.add(p);
        }
        
        public Factory getOutputFactory() {
            return _project;
        }
    }

When SubtopologyBolt runs initBatchState, the ProcessorContext it creates carries the batchId, so when different batches run in parallel each one has its own ProcessorContext.
The execute method takes the ProcessorContext of the current batch (batchInfo.state) and hands it to the execute method of each TridentProcessor.
finishBatch works the same way and passes (ProcessorContext) batchInfo.state to TridentProcessor.finishBatch.

AggregateProcessor

storm-core-1.2.2-sources.jar!/org/apache/storm/trident/planner/processor/AggregateProcessor.java

    public void startBatch(ProcessorContext processorContext) {
        _collector.setContext(processorContext);
        processorContext.state[_context.getStateIndex()] = _agg.init(processorContext.batchId, _collector);
    }    

    public void execute(ProcessorContext processorContext, String streamId, TridentTuple tuple) {
        _collector.setContext(processorContext);
        _agg.aggregate(processorContext.state[_context.getStateIndex()], _projection.create(tuple), _collector);
    }
    
    public void finishBatch(ProcessorContext processorContext) {
        _collector.setContext(processorContext);
        _agg.complete(processorContext.state[_context.getStateIndex()], _collector);
    }

The startBatch, execute and finishBatch methods of AggregateProcessor all use the state of the ProcessorContext, and the ProcessorContext handed down from SubtopologyBolt is already separate for each batch.
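
The effect of keeping aggregation state inside a per-batch context can be shown with two interleaved batches. The sketch below is an illustration only: Context stands in for ProcessorContext, CountProcessor for an AggregateProcessor wrapping a counting aggregator, and both names are hypothetical.

```java
// Hypothetical model of per-batch aggregation state, as used by AggregateProcessor.
final class PerBatchStateSketch {
    // Stand-in for ProcessorContext: one state slot per processor node.
    static final class Context {
        final Object batchId;
        final Object[] state;

        Context(Object batchId, int numNodes) {
            this.batchId = batchId;
            this.state = new Object[numNodes];
        }
    }

    // Stand-in for AggregateProcessor with a counting aggregator.
    static final class CountProcessor {
        private final int stateIndex;

        CountProcessor(int stateIndex) {
            this.stateIndex = stateIndex;
        }

        void startBatch(Context ctx) {
            ctx.state[stateIndex] = new long[] {0L}; // init: fresh counter for this batch
        }

        void execute(Context ctx, String tuple) {
            ((long[]) ctx.state[stateIndex])[0]++;   // aggregate into this batch's slot
        }

        long finishBatch(Context ctx) {
            return ((long[]) ctx.state[stateIndex])[0]; // complete: read the result
        }
    }

    public static void main(String[] args) {
        CountProcessor counter = new CountProcessor(0);
        Context batch1 = new Context("tx-1", 1);
        Context batch2 = new Context("tx-2", 1);
        counter.startBatch(batch1);
        counter.startBatch(batch2);
        counter.execute(batch1, "a"); // tuples of both batches arrive interleaved
        counter.execute(batch2, "b");
        counter.execute(batch1, "c");
        System.out.println("tx-1 count: " + counter.finishBatch(batch1));
        System.out.println("tx-2 count: " + counter.finishBatch(batch2));
    }
}
```

The processor itself holds no counter; everything lives in the context passed in, which is why one processor instance can serve several active batches.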

EachProcessor

storm-core-1.2.2-sources.jar!/org/apache/storm/trident/planner/processor/EachProcessor.java

    public void prepare(Map conf, TopologyContext context, TridentContext tridentContext) {
        List<Factory> parents = tridentContext.getParentTupleFactories();
        if(parents.size()!=1) {
            throw new RuntimeException("Each operation can only have one parent");
        }
        _context = tridentContext;
        _collector = new AppendCollector(tridentContext);
        _projection = new ProjectionFactory(parents.get(0), _inputFields);
        _function.prepare(conf, new TridentOperationContext(context, _projection));
    }
    
    public void execute(ProcessorContext processorContext, String streamId, TridentTuple tuple) {
        _collector.setContext(processorContext, tuple);
        _function.execute(_projection.create(tuple), _collector);
    }

    public void startBatch(ProcessorContext processorContext) {
    }

    public void finishBatch(ProcessorContext processorContext) {
    }

EachProcessor instead sets the ProcessorContext on _collector and then passes _collector along when it calls _function.execute; the _collector here is an AppendCollector.

AppendCollector

storm-core-1.2.2-sources.jar!/org/apache/storm/trident/planner/processor/AppendCollector.java

public class AppendCollector implements TridentCollector {
    OperationOutputFactory _factory;
    TridentContext _triContext;
    TridentTuple tuple;
    ProcessorContext context;
    
    public AppendCollector(TridentContext context) {
        _triContext = context;
        _factory = new OperationOutputFactory(context.getParentTupleFactories().get(0), context.getSelfOutputFields());
    }
                
    public void setContext(ProcessorContext pc, TridentTuple t) {
        this.context = pc;
        this.tuple = t;
    }

    @Override
    public void emit(List<Object> values) {
        TridentTuple toEmit = _factory.create((TridentTupleView) tuple, values);
        for(TupleReceiver r: _triContext.getReceivers()) {
            r.execute(context, _triContext.getOutStreamId(), toEmit);
        }
    }

    @Override
    public void reportError(Throwable t) {
        _triContext.getDelegateCollector().reportError(t);
    } 
    
    public Factory getOutputFactory() {
        return _factory;
    }
}

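When _function.execute emits through the AppendCollector, the AppendCollector hands the tuples to its TupleReceivers, and the context it passes is the ProcessorContext set by EachProcessor, that is, the ProcessorContext of the batch in question. The execute method of a TupleReceiver may read or write that ProcessorContext, which is again per batch; AggregateProcessor, for example, stores its aggregation result in the processorContext.state of its own batch.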
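
The hand-off can be sketched with plain lists in place of TridentTuple. Everything here is a simplification with hypothetical names: AppendCollectorSketch and Receiver are not Storm types, and appending to a list only approximates what OperationOutputFactory does with the input tuple and the emitted values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of AppendCollector: append emitted values, forward with the batch context.
final class AppendCollectorSketch {
    // Stand-in for TupleReceiver.
    interface Receiver {
        void execute(Object context, List<Object> tuple);
    }

    private final List<Receiver> receivers = new ArrayList<>();
    private Object context;     // per-batch context set before each function call
    private List<Object> tuple; // input tuple the function is currently processing

    void addReceiver(Receiver receiver) {
        receivers.add(receiver);
    }

    void setContext(Object context, List<Object> tuple) {
        this.context = context;
        this.tuple = tuple;
    }

    // Appends the emitted values to the input tuple and forwards the result.
    void emit(List<Object> values) {
        List<Object> toEmit = new ArrayList<>(tuple);
        toEmit.addAll(values);
        for (Receiver receiver : receivers) {
            receiver.execute(context, toEmit);
        }
    }

    public static void main(String[] args) {
        AppendCollectorSketch collector = new AppendCollectorSketch();
        collector.addReceiver((ctx, t) -> System.out.println(ctx + " received " + t));
        collector.setContext("batch tx-1", Arrays.<Object>asList("hello"));
        collector.emit(Arrays.<Object>asList(5)); // e.g. a function that emits a length
    }
}
```

The receiver never has to ask which batch a tuple belongs to, since the context set for the current call travels with every emit.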

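Summary

storm trident uses [id,count] tuples to tell the downstream TridentBoltExecutor to finish a batch. When TridentBoltExecutor receives an [id,count] tuple and any required commit has arrived, it first checks whether tracked.reportedTasks equals cond.expectedTaskReports (this gathers the reports of all tasks when the upstream TridentBoltExecutor has a parallelism greater than 1); once they are equal it checks whether tracked.receivedTuples equals tracked.expectedTupleCount, and only if they match does it call finishBatch.
The _maxTransactionActive parameter of the storm trident spout is set from Config.TOPOLOGY_MAX_SPOUT_PENDING (topology.max.spout.pending); the default configuration file leaves it as null, in which case _maxTransactionActive is 1.
MasterBatchCoordinator limits how many batches are processed at the same time; once the limit is reached, a batch in _activeTx has to succeed or fail before the next batch can start. When several transactions are active in parallel, the downstream TridentBoltExecutor can still tell the batches apart, so nothing gets mixed up. For example, the ProcessorContext that SubtopologyBolt creates in initBatchState carries the batchId, so parallel batches each have their own ProcessorContext. Some of the TridentProcessors called by SubtopologyBolt use the ProcessorContext to store results; AggregateProcessor, for example, stores its aggregation result in the processorContext.state of its own batch.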
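
The [id,count] check from the first point can be written down as a small state machine. This is a sketch under assumptions, not the real checkFinish: FinishConditionSketch and Outcome are hypothetical names, the commit is assumed to have been received already (as when there is no commit stream), and acking is left out.

```java
// Hypothetical model of the finish condition evaluated when [id,count] tuples arrive.
final class FinishConditionSketch {
    enum Outcome { WAITING, FINISHED, FAILED }

    private final int expectedTaskReports; // number of upstream tasks that must report
    private int reportedTasks = 0;
    private int expectedTupleCount = 0;
    private int receivedTuples = 0;

    FinishConditionSketch(int expectedTaskReports) {
        this.expectedTaskReports = expectedTaskReports;
    }

    // A regular data tuple of the batch arrived.
    void onTuple() {
        receivedTuples++;
    }

    // An [id,count] tuple arrived: count is what that upstream task says it sent here.
    Outcome onCoord(int count) {
        reportedTasks++;
        expectedTupleCount += count;
        if (reportedTasks == expectedTaskReports) {
            return receivedTuples == expectedTupleCount ? Outcome.FINISHED : Outcome.FAILED;
        }
        return Outcome.WAITING;
    }

    public static void main(String[] args) {
        FinishConditionSketch batch = new FinishConditionSketch(2); // upstream parallelism 2
        batch.onTuple();
        batch.onTuple();
        batch.onTuple();
        System.out.println(batch.onCoord(1)); // first upstream task reports 1 tuple
        System.out.println(batch.onCoord(2)); // second upstream task reports 2 tuples
    }
}
```

A mismatch between the reported and the received counts fails the batch, which leads to a new attempt for the same txid.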
