
TensorRT Acceleration: Resource Roundup

Recently I have wanted to use TensorRT to accelerate some object detection models. I had read bits of material before, but it was scattered and a bookmark folder is inconvenient, so I am collecting the notes in this blog post, for reference only.

When the computational mode is FP16, TensorRT can accept input and output data in either FP32 or FP16.
You can use any of the combinations below for input and output:
• Input FP32, output FP32
• Input FP16, output FP32
• Input FP16, output FP16
• Input FP32, output FP16

setAllNetworkInputsToHalf(network);

static void setAllNetworkInputsToHalf(INetworkDefinition* network){
    // Mark every network input tensor as FP16 so the engine expects half-precision input buffers.
    for (int i = 0; i < network->getNbInputs(); i++)
        network->getInput(i)->setType(DataType::kHALF);
}
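
Setting the input tensor types only tells the engine what precision the bound buffers use; FP16 computation itself is enabled on the builder. Below is a minimal sketch, assuming a TensorRT release where IBuilder::setFp16Mode is available (early GIE-era releases call it setHalf2Mode, newer versions set BuilderFlag::kFP16 on IBuilderConfig instead); the helper name configureHalfPrecision is made up for this example:

#include "NvInfer.h"
using namespace nvinfer1;

static void configureHalfPrecision(IBuilder* builder, INetworkDefinition* network)
{
    // Make the engine consume and produce half-precision buffers.
    setAllNetworkInputsToHalf(network);
    for (int i = 0; i < network->getNbOutputs(); i++)
        network->getOutput(i)->setType(DataType::kHALF);

    // Allow FP16 kernels during engine building. Depending on the TensorRT release this call is
    // builder->setHalf2Mode(true), builder->setFp16Mode(true), or config->setFlag(BuilderFlag::kFP16).
    builder->setFp16Mode(true);
}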

Location of the samples on Jetson:
You can refer to our tensorRT sample which is located at ‘/usr/src/gie_samples/’.
A custom layer can be handled with the following approach.
For example,
Separate your network into: input -> networkA -> networkSelf -> networkB -> output

NetworkA and networkB can run inference directly via TensorRT.
NetworkSelf needs to be implemented via CUDA.

So, the flow will be:

IExecutionContext *contextA = engineA->createExecutionContext(); // create networkA
IExecutionContext *contextB = engineB->createExecutionContext(); // create networkB
<...>
contextA->enqueue(batchSize, buffersA, stream, nullptr); // inference networkA
myLayer(outputFromA, inputToB, stream);                   // inference networkSelf, your CUDA code is here!
contextB->enqueue(batchSize, buffersB, stream, nullptr); // inference networkB
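
The myLayer call above stands for whatever CUDA code implements networkSelf. A minimal sketch of such a bridging function, assuming the intermediate tensor is a flat FP32 buffer that already lives on the GPU; the ReLU kernel here is purely a stand-in for the real custom-layer computation:

#include <cuda_runtime.h>

// Stand-in kernel for the custom layer; replace with the real networkSelf computation.
__global__ void customLayerKernel(const float* in, float* out, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        out[idx] = in[idx] > 0.f ? in[idx] : 0.f;   // example: ReLU
}

// Bridges networkA's output buffer to networkB's input buffer on the shared stream,
// so the whole pipeline stays asynchronous on the device.
void myLayer(const float* outputFromA, float* inputToB, cudaStream_t stream)
{
    const int n = 1 << 20;              // element count of the intermediate tensor (placeholder)
    const int block = 256;
    const int grid = (n + block - 1) / block;
    customLayerKernel<<<grid, block, 0, stream>>>(outputFromA, inputToB, n);
}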