TensorRT Acceleration: Resource Roundup
阿新 • Published: 2018-12-22
Recently I have wanted to use TensorRT to accelerate some object detection models. I had read various materials before, but they were scattered and browser bookmarks are inconvenient, so I am collecting some notes in this blog post for reference.
In computational mode FP16, TensorRT can accept input or output data in either FP32 or FP16.
You can use any of the combinations below for input and output:
• Input FP32, output FP32
• Input FP16, output FP32
• Input FP16, output FP16
• Input FP32, output FP16
setAllNetworkInputsToHalf(network);   // call after the network definition has been built/parsed

// Helper that marks every network input tensor as FP16.
static void setAllNetworkInputsToHalf(INetworkDefinition* network)
{
    for (int i = 0; i < network->getNbInputs(); i++)
        network->getInput(i)->setType(DataType::kHALF);   // input buffers will be read as FP16
}
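Setting the tensor types only controls the format of the I/O buffers; the builder still has to be told to use FP16 kernels. A minimal sketch of that step, assuming the classic IBuilder API of that era (setFp16Mode / setHalf2Mode, platformHasFastFp16); the setAllNetworkOutputsToHalf helper is hypothetical and simply mirrors the input helper above:

// Hypothetical counterpart to setAllNetworkInputsToHalf: mark outputs as FP16.
static void setAllNetworkOutputsToHalf(INetworkDefinition* network)
{
    for (int i = 0; i < network->getNbOutputs(); i++)
        network->getOutput(i)->setType(DataType::kHALF);   // output buffers delivered as FP16
}

// Enable FP16 computation in the builder (buffer types above are separate).
if (builder->platformHasFastFp16())
    builder->setFp16Mode(true);   // on older GIE-era releases this was builder->setHalf2Mode(true)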
Location of the samples on Jetson:
You can refer to our TensorRT samples, which are located at ‘/usr/src/gie_samples/’.
Custom layers can be handled with the following approach:
For example,
Separate your network to: input -> networkA -> networkSelf -> networkB -> output
NetworkA and networkB can run inference directly via TensorRT.
NetworkSelf needs to be implemented via CUDA.
So, the flow will be:
IExecutionContext *contextA = engineA->createExecutionContext(); //create networkA
IExecutionContext *contextB = engineB->createExecutionContext(); //create networkB
<... >
contextA->enqueue(batchSize, buffersA, stream, nullptr); // inference networkA
myLayer(outputFromA, inputToB, stream);                  // inference networkSelf, your CUDA code is here!
contextB->enqueue(batchSize, buffersB, stream, nullptr); // inference networkB
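myLayer is only named in the snippet above; here is one possible shape for it, a hypothetical sketch where the custom layer is a simple elementwise ReLU. The kernel, the element count, and the pointer names are assumptions; the point is that the custom work is launched on the same CUDA stream as the two enqueue() calls, so execution stays ordered as networkA -> networkSelf -> networkB without extra synchronization.

// Hypothetical custom-layer kernel; replace the body with the real networkSelf logic.
__global__ void customReluKernel(const float* in, float* out, int count)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < count)
        out[idx] = in[idx] > 0.f ? in[idx] : 0.f;    // example elementwise op only
}

void myLayer(const float* outputFromA, float* inputToB, cudaStream_t stream)
{
    const int count = 1 << 20;                       // assumed element count; must match networkA's output size
    const int block = 256;
    const int grid  = (count + block - 1) / block;
    // Same stream as contextA/contextB enqueue() keeps the three stages in order.
    customReluKernel<<<grid, block, 0, stream>>>(outputFromA, inputToB, count);
}

In practice outputFromA and inputToB would point at the corresponding device bindings inside buffersA and buffersB, so no extra device-to-device copy is needed between the two engines.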