
ONNX Runtime Source Code Reading: How ml-Values Are Categorized for Memory Management

Source: the comment text in include/onnxruntime/core/framework/alloc_kind.h

In the inference pipeline, ONNX Runtime distinguishes the following categories of values (ml-Values) with respect to memory management:

  • inference inputs: allocated and freed by the caller; by default the runtime treats them as read-only (see the C++ sketch after this list)
  • inference outputs: allocated by the runtime, with ownership transferred to the caller
  • weights (constant tensors): allocated only once, and reused by all inference calls within one InferenceSession
  • tensor values: the lifetimes of these tensor values are determined statically, which enables memory reuse/sharing optimizations; the runtime allocates and frees them at the right time
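
To make these categories concrete, below is a minimal C++ sketch against the public ONNX Runtime C++ API (onnxruntime_cxx_api.h). It is not part of the header under discussion; the model path and the tensor names "input"/"output" are placeholders for illustration.

#include <onnxruntime_cxx_api.h>
#include <array>
#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "alloc-kind-demo");
  Ort::SessionOptions opts;
  // Weights (constant tensors): loaded once when the session is created,
  // then reused by every Run() call on this session.
  Ort::Session session(env, "model.onnx", opts);

  // Inference input: the buffer is allocated (and later freed) by the
  // caller; the runtime treats it as read-only. CreateTensor only wraps
  // the buffer, it does not copy it or take ownership.
  std::array<float, 4> input_data{1.f, 2.f, 3.f, 4.f};
  std::array<int64_t, 2> shape{1, 4};
  Ort::MemoryInfo mem_info =
      Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  Ort::Value input = Ort::Value::CreateTensor<float>(
      mem_info, input_data.data(), input_data.size(),
      shape.data(), shape.size());

  const char* input_names[] = {"input"};    // placeholder tensor names
  const char* output_names[] = {"output"};

  // Inference outputs: allocated by the runtime; ownership of the returned
  // Ort::Value objects transfers to the caller and is released when the
  // vector goes out of scope.
  std::vector<Ort::Value> outputs = session.Run(
      Ort::RunOptions{nullptr}, input_names, &input, 1, output_names, 1);

  float* out = outputs[0].GetTensorMutableData<float>();
  (void)out;  // use the result here
  return 0;
}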

The original English comment reads:

The ml-Values fall into the following categories with respect to their
memory management:

  • inference inputs: owned (allocated and freed) by caller, and is by
    default read-only by the runtime.
  • inference outputs: allocated by runtime, ownership transferred to
    caller. TODO: Make sure this semantics is clear in InferenceSession API.
  • weights (constant tensors): can be allocated once (statically), and
    reused by all inference calls within an InferenceSession.
  • tensor values: The lifetimes of these tensor-values are statically
    determined, which is used for memory reuse/sharing optimizations. The
    runtime allocates/frees these values at the right time (as determined
    by the static allocation plan). Note that this is simplified since we
    do not try to optimize for "slice" like ops, where we may be able to
    conditionally reuse memory/data in some cases but not others.
    Generalizing this is future work.
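
For reference, this comment sits alongside an AllocKind enum in the same header, which the allocation planner uses to record which category applies to each ml-Value. The sketch below reflects the enum as found in the version of alloc_kind.h read here; member names and values may differ across releases, and the mapping comments are my own reading of the categories above.

// alloc_kind.h (sketch; see the hedging note above)
enum class AllocKind {
  kNotSet = -1,
  kAllocate = 0,            // runtime allocates/frees per the static plan
  kReuse = 1,               // reuses the buffer of another ml-Value
  kPreExisting = 2,         // e.g. inference inputs owned by the caller
  kAllocateStatically = 3,  // e.g. weights, allocated once per session
  kAllocateOutput = 4,      // inference outputs handed over to the caller
  kShare = 5,               // shares ownership of a buffer with another value
  kAllocatedExternally = 6  // buffer provided from outside the runtime
};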