[Sharing] A First Look at Versal AIE: Standalone Example

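Engineers have recently started receiving VCK190 boards. The VCK190 carries Xilinx's 7nm AIE (AI Engine), which offers strong processing capability. This article describes how to run the Xilinx AIE examples and get familiar with the AIE development flow.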

This article first covers a standalone (bare-metal) example, which comes from the Vitis-Tutorials AIE a2z tutorial.

Preparation

License

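Before starting, check whether you have a VCK190 Production board or a VCK190 ES board. For a VCK190 Production board, use the VCK190 voucher to request a license on the Xilinx website. After the license is installed, the license status window should list the following items.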

AIEBuild
AIESim
MEBuild
MESim

For a VCK190 ES board, request the "Versal Tools Early Access" and "Versal Tools PDI Early Access" licenses in the Lounge, and enable ES devices in Vivado. Adding "enable_beta_device xcvc*" to the file Vivado/2020.2/scripts/init.tcl enables ES devices automatically.

Platform

Before development, prepare the platform. The VCK190 Production and VCK190 ES boards use different platforms. Download the matching platform from the links below and copy it into the directory "Xilinx/Vitis/2020.2/platforms/".
VCK190 Production Platform
VCK190 ES Platform

Once prepared, the directory structure looks similar to the following.

Common Images

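Xilinx now also provides Common Images, which contain the Linux boot files for the corresponding board, plus the compiler and sysroots (header files and application libraries). The Versal common image can be downloaded from Xilinx Download.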

Test environment

Host OS: Ubuntu 18.04
Vitis 2020.2
PetaLinux 2020.2
VCK190 Production

AIE Standalone Flow

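The AIE a2z example is a standalone (bare-metal) example: the Versal A72 does not run Linux. It is comprehensive, covering platform creation, AIE kernel creation, PL kernel creation, A72 application creation, and AIE kernel debugging. In Xilinx documentation, an AIE program is called a kernel; a PL design developed with HLS in Vitis is also called a kernel.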

Note that the "master" branch of Vitis Tutorials has included the AIE a2z example only since July 2021.

AIE a2z analysis

File list

AIE a2z contains the following files.

aie_adder:
│  description.json
│  details.rst
│  Makefile
│  qor.json
│  README.rst
│  system.cfg
│  utils.mk
│  xrt.ini
│  
├─data
│      golden.txt
│      input0.txt
│      input1.txt
│      
└─src
        aie_adder.cc
        aie_graph.cpp
        aie_graph.h
        aie_kernel.h
        host.cpp
        pl_mm2s.cpp
        pl_s2mm.cpp

aie_adder.cc

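aie_adder.cc defines the AIE kernel. It is the most important file and is needed both for simulation and for running on hardware.

The AIE kernel is simple, the equivalent of HelloWorld in C programming: it reads two vectors, adds them, and writes the result out.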

void aie_adder(input_stream_int32* in0, input_stream_int32* in1, output_stream_int32* out) {
    v4int32 a = readincr_v4(in0);
    v4int32 b = readincr_v4(in1);
    v4int32 c = operator+(a, b);
    writeincr_v4(out, c);
}
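
To make the kernel's behavior concrete, here is a minimal host-side sketch of what one invocation does. It is not AIE code: the `Stream` type and the helper `read4` are hypothetical stand-ins for the AIE stream ports and `readincr_v4`, chosen only so the logic can be compiled with an ordinary C++ compiler.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for an AIE stream port: a FIFO of 32-bit words.
using Stream = std::deque<int32_t>;

// Pops four words from the stream, mimicking a single readincr_v4 call.
std::array<int32_t, 4> read4(Stream& s) {
    assert(s.size() >= 4 && "stream underflow");
    std::array<int32_t, 4> v{};
    for (auto& x : v) {
        x = s.front();
        s.pop_front();
    }
    return v;
}

// One kernel invocation: read a 4-lane vector from each input,
// add lane by lane, and append the four sums to the output.
void aie_adder_model(Stream& in0, Stream& in1, Stream& out) {
    const auto a = read4(in0);
    const auto b = read4(in1);
    for (std::size_t i = 0; i < a.size(); ++i) {
        out.push_back(a[i] + b[i]);
    }
}
```

Calling `aie_adder_model` once with inputs {1, 2, 3, 4} and {10, 20, 30, 40} leaves {11, 22, 33, 44} in the output stream and consumes both inputs, which is the per-iteration behavior the real kernel is expected to show.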

aie_graph.cpp

aie_graph.cpp defines and controls the compute graph. In this example, it is used only for simulation.

#include "aie_graph.h"

PLIO* in0 = new PLIO("DataIn0", adf::plio_32_bits, "data/input0.txt");
PLIO* in1 = new PLIO("DataIn1", adf::plio_32_bits, "data/input1.txt");
PLIO* out = new PLIO("DataOut", adf::plio_32_bits, "data/output.txt");

// Hank: only for simulation??
simulation::platform<2, 1> platform(in0, in1, out);

simpleGraph addergraph;

connect<> net0(platform.src[0], addergraph.in0);
connect<> net1(platform.src[1], addergraph.in1);

connect<> net2(addergraph.out, platform.sink[0]);

#ifdef __AIESIM__
int main(int argc, char** argv) {
    addergraph.init();
    addergraph.run(4);
    addergraph.end();
    return 0;
}
#endif
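
Since the kernel reads one 4-lane vector per input on each iteration, `addergraph.run(4)` consumes 16 values from each input. The following sketch shows one way the expected output could be computed on the host for comparison. It assumes the data files hold one decimal 32-bit value per line, which is an assumption about the file layout rather than something stated in the listing; `parse_values` and `make_golden` are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Parses whitespace-separated decimal integers
// (assumed layout: one value per line).
std::vector<int32_t> parse_values(const std::string& text) {
    std::istringstream in(text);
    std::vector<int32_t> values;
    int32_t v = 0;
    while (in >> v) {
        values.push_back(v);
    }
    return values;
}

// Builds the expected output text: element-wise sums, one per line.
// Each graph iteration consumes four values from each input.
std::string make_golden(const std::string& input0,
                        const std::string& input1,
                        int iterations) {
    const auto a = parse_values(input0);
    const auto b = parse_values(input1);
    const std::size_t n = static_cast<std::size_t>(iterations) * 4;
    if (a.size() < n || b.size() < n) {
        throw std::runtime_error("not enough input samples");
    }
    std::ostringstream out;
    for (std::size_t i = 0; i < n; ++i) {
        out << (a[i] + b[i]) << '\n';
    }
    return out.str();
}
```

The contents of input0.txt and input1.txt would be passed in as strings, with `iterations` set to the value given to `run()`.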

aie_graph.h

aie_graph.h defines the compute graph and is needed both for simulation and for running on hardware.

#include <adf.h>
#include "aie_kernel.h"

using namespace adf;

class simpleGraph : public graph {
   private:
    kernel adder;

   public:
    port<input> in0, in1;
    port<output> out;

    simpleGraph() {
        adder = kernel::create(aie_adder);

        connect<stream>(in0, adder.in[0]);
        connect<stream>(in1, adder.in[1]);
        connect<stream>(adder.out[0], out);

        source(adder) = "aie_adder.cc";

        runtime<ratio>(adder) = 0.1;
    };
};

aie_kernel.h

aie_kernel.h is the simplest file. It only declares the prototype of aie_adder and is needed both for simulation and for running on hardware.

void aie_adder(input_stream_int32* in0, input_stream_int32* in1, output_stream_int32* out);

host.cpp

host.cpp allocates memory, loads the data, loads the xclbin, and runs the AIE kernel.

simpleGraph addergraph;

static std::vector<char> load_xclbin(xrtDeviceHandle device, const std::string& fnm) {

    // load bit stream
    std::ifstream stream(fnm);
    stream.seekg(0, stream.end);
    size_t size = stream.tellg();
    stream.seekg(0, stream.beg);

    std::vector<char> header(size);
    stream.read(header.data(), size);

    auto top = reinterpret_cast<const axlf*>(header.data());
    xrtDeviceLoadXclbin(device, top);

    return header;
}

int main(int argc, char** argv) {

    // Open xclbin
    auto dhdl = xrtDeviceOpen(0); // Open Device the local device
    auto xclbin = load_xclbin(dhdl, "krnl_adder.xclbin");
    auto top = reinterpret_cast<const axlf*>(xclbin.data());
    adf::registerXRT(dhdl, top->m_header.uuid);

    int DataInput0[sizeIn], DataInput1[sizeIn];
    for (int i = 0; i < sizeIn; i++) {
        DataInput0[i] = rand() % 100;
        DataInput1[i] = rand() % 100;
    }

    // input memory
    // Allocating the input size of sizeIn to MM2S
    // This is using low-level XRT call xclAllocBO to allocate the memory

    xrtBufferHandle in_bohdl0 = xrtBOAlloc(dhdl, sizeIn * sizeof(int), 0, 0);
    auto in_bomapped0 = reinterpret_cast<uint32_t*>(xrtBOMap(in_bohdl0));
    memcpy(in_bomapped0, DataInput0, sizeIn * sizeof(int));
    printf("Input memory virtual addr %p\n", static_cast<void*>(in_bomapped0));

    xrtBufferHandle in_bohdl1 = xrtBOAlloc(dhdl, sizeIn * sizeof(int), 0, 0);
    auto in_bomapped1 = reinterpret_cast<uint32_t*>(xrtBOMap(in_bohdl1));
    memcpy(in_bomapped1, DataInput1, sizeIn * sizeof(int));
    printf("Input memory virtual addr %p\n", static_cast<void*>(in_bomapped1));

    // output memory
    // Allocating the output size of sizeOut to S2MM
    // This is using low-level XRT call xclAllocBO to allocate the memory

    xrtBufferHandle out_bohdl = xrtBOAlloc(dhdl, sizeOut * sizeof(int), 0, 0);
    auto out_bomapped = reinterpret_cast<uint32_t*>(xrtBOMap(out_bohdl));
    memset(out_bomapped, 0xABCDEF00, sizeOut * sizeof(int));
    printf("Output memory virtual addr %p\n", static_cast<void*>(out_bomapped));

    // mm2s ip
    // Using the xrtPLKernelOpen function to manually control the PL Kernel
    // that is outside of the AI Engine graph

    xrtKernelHandle mm2s_khdl1 = xrtPLKernelOpen(dhdl, top->m_header.uuid, "pl_mm2s:{pl_mm2s_1}");
    // Need to provide the kernel handle, and the argument order of the kernel arguments
    // Here the in_bohdl is the input buffer, the nullptr is the streaming interface and must be null,
    // lastly, the size of the data. This info can be found in the kernel definition.
    xrtRunHandle mm2s_rhdl1 = xrtKernelRun(mm2s_khdl1, in_bohdl0, nullptr, sizeIn);
    printf("run pl_mm2s_1\n");

    xrtKernelHandle mm2s_khdl2 = xrtPLKernelOpen(dhdl, top->m_header.uuid, "pl_mm2s:{pl_mm2s_2}");
    xrtRunHandle mm2s_rhdl2 = xrtKernelRun(mm2s_khdl2, in_bohdl1, nullptr, sizeIn);
    printf("run pl_mm2s_2\n");

    // s2mm ip
    // Using the xrtPLKernelOpen function to manually control the PL Kernel
    // that is outside of the AI Engine graph

    xrtKernelHandle s2mm_khdl = xrtPLKernelOpen(dhdl, top->m_header.uuid, "pl_s2mm");
    // Need to provide the kernel handle, and the argument order of the kernel arguments
    // Here the out_bohdl is the output buffer, the nullptr is the streaming interface and must be null,
    // lastly, the size of the data. This info can be found in the kernel definition.
    xrtRunHandle s2mm_rhdl = xrtKernelRun(s2mm_khdl, out_bohdl, nullptr, sizeOut);
    printf("run pl_s2mm\n");

    // graph execution for AIE
    printf("graph init. This does nothing because CDO in boot PDI already configures AIE.\n");
    addergraph.init();

    printf("graph run\n");
    addergraph.run(N_ITER);

    addergraph.end();
    printf("graph end\n");

    // wait for mm2s done
    auto state = xrtRunWait(mm2s_rhdl1);
    std::cout << "mm2s_1 completed with status(" << state << ")\n";
    xrtRunClose(mm2s_rhdl1);
    xrtKernelClose(mm2s_khdl1);

    state = xrtRunWait(mm2s_rhdl2);
    std::cout << "mm2s_2 completed with status(" << state << ")\n";
    xrtRunClose(mm2s_rhdl2);
    xrtKernelClose(mm2s_khdl2);

    // wait for s2mm done
    state = xrtRunWait(s2mm_rhdl);
    std::cout << "s2mm completed with status(" << state << ")\n";
    xrtRunClose(s2mm_rhdl);
    xrtKernelClose(s2mm_khdl);

    // Comparing the execution data to the golden data

    // clean up XRT
    std::cout << "Releasing remaining XRT objects...\n";
    xrtBOFree(in_bohdl0);
    xrtBOFree(in_bohdl1);
    xrtBOFree(out_bohdl);
    xrtDeviceClose(dhdl);

    return errorCount;
}
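
The listing above leaves out the step that compares the results with the golden data, although `main` returns `errorCount`. A minimal sketch of such a check follows. The helper names `fill_words` and `count_mismatches` are hypothetical and not part of the original host.cpp. The sketch also fills the output buffer one 32-bit word at a time: `memset` converts its value argument to `unsigned char`, so `memset(out_bomapped, 0xABCDEF00, ...)` writes only the low byte (0x00) into every byte rather than the full pattern.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Fills a buffer with a full 32-bit sentinel pattern, word by word.
void fill_words(uint32_t* buf, std::size_t count, uint32_t pattern) {
    std::fill(buf, buf + count, pattern);
}

// Counts positions where the device output differs from in0[i] + in1[i].
int count_mismatches(const uint32_t* out,
                     const int* in0,
                     const int* in1,
                     std::size_t count) {
    int errors = 0;
    for (std::size_t i = 0; i < count; ++i) {
        // Add as unsigned so the comparison uses matching types.
        const uint32_t expected =
            static_cast<uint32_t>(in0[i]) + static_cast<uint32_t>(in1[i]);
        if (out[i] != expected) {
            ++errors;
        }
    }
    return errors;
}
```

In the host program, the result of `count_mismatches(out_bomapped, DataInput0, DataInput1, sizeOut)` could be stored in `errorCount` after the s2mm run has completed, assuming `sizeOut` equals `sizeIn`.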

pl_mm2s.cpp

pl_mm2s.cpp is a PL design written with HLS. It moves data from memory to the AIE kernel.

void pl_mm2s(ap_int<32>* mem, hls::stream<qdma_axis<32, 0, 0, 0> >& s, int size) {
data_mover:
    for (int i = 0; i < size; i++) {
        qdma_axis<32, 0, 0, 0> x;
        x.data = mem[i];
        x.keep_all();
        s.write(x);
    }
}

pl_s2mm.cpp

pl_s2mm.cpp is also a PL design written with HLS. It moves data from the AIE kernel to memory.

void pl_s2mm(ap_int<32>* mem, hls::stream<qdma_axis<32, 0, 0, 0> >& s, int size) {
data_mover:
    for (int i = 0; i < size; i++) {
        qdma_axis<32, 0, 0, 0> x = s.read();
        mem[i] = x.data;
    }
}
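
The two data movers can be modeled on the host to see how the `size` arguments relate. In the sketch below, `WordStream` is a hypothetical stand-in for `hls::stream`, and the functions are plain C++ rather than HLS code. Because a blocking stream read waits for data, the s2mm side presumably needs to be given a word count that matches what the AIE graph actually produces; the assertion marks the point where hardware would be expected to stall instead.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Hypothetical stand-in for hls::stream carrying 32-bit words.
using WordStream = std::queue<int32_t>;

// Models pl_mm2s: copies `size` words from memory into the stream.
void mm2s_model(const int32_t* mem, WordStream& s, int size) {
    for (int i = 0; i < size; ++i) {
        s.push(mem[i]);
    }
}

// Models pl_s2mm: copies `size` words from the stream into memory.
void s2mm_model(int32_t* mem, WordStream& s, int size) {
    for (int i = 0; i < size; ++i) {
        // A blocking hardware read would wait here if no data arrived.
        assert(!s.empty() && "stream empty: hardware would stall");
        mem[i] = s.front();
        s.pop();
    }
}
```

Connecting `mm2s_model` directly to `s2mm_model` acts as a loopback: the destination buffer ends up equal to the source buffer and the stream is left empty.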

Lessons learned

AIE a2z is quite polished and can generally be completed without trouble. The following problems may come up during the lab.

AXI Interrupt

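When the platform is created, the AXI interrupt controller (axi_intc) has no interrupt source connected. When Vitis builds the project, it connects the interrupt outputs of the HLS-designed IP blocks to the AXI interrupt controller (axi_intc). When the platform's Block Design is validated, Vivado reports the following message about the interrupt controller, warning that there is no interrupt source. It can be ignored.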

[BD 41-759] The input pins (listed below) are either not connected or do not have a source port, and they don't have a tie-off specified. These pins are tied-off to all 0's to avoid error in Implementation flow.
Please check your design and connect them as needed: 
/axi_intc_0/intr

sys_clk0

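Vivado also reports the following clock mismatch message for the input clock. When Vivado creates the Block Design, the default clock is 100 MHz, while the actual clock on the board is 200 MHz. Select sys_clk0_0 and change it to 200 MHz in its properties.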

[xilinx.com:ip:axi_noc:1.0-1] /ps_noc Clock frequency of the connected clock (/ps_noc/sys_clk0) is 100.000 MHz while "Input System Clock Frequency" is 200.000 MHz. Please either reconfigure the parameter "Input System Clock Period" of the axi_noc (in DDR Basic tab) or change frequency of the connected clock (CONFIG.FREQ_HZ) within the range of 199920031.987 to 200080032.013 Hz.

AIE license

If Vitis reports "AIE license not found" when building the project, request a license.

AIE license not found !
/opt/Xilinx/Vitis/2020.2/aietools/bin/aieir_be: line 96: kill: (-28000) - No such process
ERROR: [aiecompiler 77-753] This application has discovered an exceptional condition from which it cannot recover while executing the following command
  >> aieir_be --time-passes=0  --trace-plio-width=64  --pl-freq=0  --use-real-noc=true  --show-loggers=false  --high-performance=false  --kernel-address-location=false  --target=x86sim --swfifo-threshold=40  --single-mm2s-channel=false  --workdir=./Work  --exit-after=complete  --event-trace-config=  --test-iterations=-1  --stacksize=1024  --platform=/proj/hankf/vck190/vck190_aie_a2z/vitis/base_pfm_vck190_aie_a2z/export/base_pfm_vck190_aie_a2z/base_pfm_vck190_aie_a2z.xpfm  --event-trace-custom-config=  --disable-dma-cmd-alignment=false  --enable-ecc-scrubbing=false  --write-partitioned-file=true  --schemafile=AIEGraphSchema.json  --include="/opt/Xilinx/Vitis/2020.2/aietools/include" --include="/opt/Xilinx/Vitis_HLS/2020.2/include" --include="../" --include="../src" --include="../data" --include="../src/kernels" --device=  --write-unified-data=false  --fastmath=false  --event-trace-advanced-mapping=0  --log-level=1  --enable-reconfig=false  --aiesim-xrt-api=false  --gen-graph-cleanup=false  --use-canonical-net-names=false  --event-trace-port=plio --new-placer=true  --use-phy-shim=true  --xlopt=0  --pre-compile-kernels=false  --validate-only=false  --trace-aiesim-option=0  --aiearch=aie  --mapped-soln-udm=  --optimize-pktids=false  --no-init=false  --num-trace-streams=1  --aie-heat-map=false  --phydevice=  --exec-timed=0  --pl-auto-restart=false  --routed-soln-udm=  --enable-profiling=false  --disable-transform-merge-broadcast=false  --verbose=true  --use-async-rtp-locks=true  --repo-path=  --genArchive=false  --pl-axi-lite=false  --new-router=true  --aie-driver-v1=false  --logcfg-file=  --event-trace-bounding-box=  --enable-reconfig-dma-autostart=false  --heapsize=1024  --logical-arch=  --nodot-graph=false  --shim-constraints=  --disable-dma-autostart=false  --disable-transform-broadcast-split=true  -json ./Work/temp/project.json -sdf-graph /proj/hankf/vck190/vck190_aie_a2z/vitis/simple_application_vck190_aie_a2z/src/project.cpp.

Installing dot

Vitis uses the dot tool during compilation. If graphviz is not installed, the build fails with the error "sh: 1: dot: not found". On Ubuntu 18.04, with administrator privileges, the command "sudo apt install graphviz" installs dot.

DEBUG:MapperPartitioner: Adding Edge : Name=D_net2 SrcPort=i1_po0 DstPort=i3_pi0 EdgeType=mem
DEBUG:MapperPartitioner:Done--Add Double Buffer Edge SrcPort=i1_po0 DstPort=i3_pi0 type=mem Edge=net2:i1-(buf2)->i3
DEBUG:MapperPartitioner:Graph After Adding Double Edges
sh: 1: dot: not found

ERROR: [aiecompiler 77-753] This application has discovered an exceptional condition from which it cannot recover while executing the following command
  >> dot ./Work/reports/project.dot -Tpng -o ./Work/reports/project.png
.
Please check the output log for errors and fix those before you run the application.
/opt/Xilinx/Vitis/2020.2/aietools/bin/aieir_be: line 96: kill: (-44668) - No such process
ERROR: [aiecompiler 77-753] This application has discovered an exceptional condition from which it cannot recover while executing the following command
  >> aieir_be --time-passes=0  --trace-plio-width=64  --pl-freq=0  --use-real-noc=true  --show-loggers=false  --high-performance=false  --kernel-address-location=false  --target=hw --swfifo-threshold=40  --single-mm2s-channel=false  --workdir=./Work  --exit-after=complete  --event-trace-config=  --test-iterations=-1  --stacksize=1024  --platform=/proj/hankf/vck190/vck190_aie_a2z/vitis/base_pfm_vck190_aie_a2z/export/base_pfm_vck190_aie_a2z/base_pfm_vck190_aie_a2z.xpfm  --event-trace-custom-config=  --disable-dma-cmd-alignment=false  --enable-ecc-scrubbing=false  --write-partitioned-file=true  --schemafile=AIEGraphSchema.json  --include="/opt/Xilinx/Vitis/2020.2/aietools/include" --include="/opt/Xilinx/Vitis_HLS/2020.2/include" --include="../" --include="../src" --include="../data" --include="../src/kernels" --device=  --write-unified-data=false  --fastmath=false  --event-trace-advanced-mapping=0  --log-level=1  --enable-reconfig=false  --aiesim-xrt-api=false  --gen-graph-cleanup=false  --use-canonical-net-names=false  --event-trace-port=plio --new-placer=true  --use-phy-shim=true  --xlopt=0  --pre-compile-kernels=false  --validate-only=false  --trace-aiesim-option=0  --aiearch=aie  --mapped-soln-udm=  --optimize-pktids=false  --no-init=false  --num-trace-streams=1  --aie-heat-map=false  --phydevice=  --exec-timed=0  --pl-auto-restart=false  --routed-soln-udm=  --enable-profiling=false  --disable-transform-merge-broadcast=false  --verbose=true  --use-async-rtp-locks=true  --repo-path=  --genArchive=false  --pl-axi-lite=false  --new-router=true  --aie-driver-v1=false  --logcfg-file=  --event-trace-bounding-box=  --enable-reconfig-dma-autostart=false  --heapsize=1024  --logical-arch=  --nodot-graph=false  --shim-constraints=  --disable-dma-autostart=false  --disable-transform-broadcast-split=true  -json ./Work/temp/project.json -sdf-graph /proj/hankf/vck190/vck190_aie_a2z/vitis/simple_application_vck190_aie_a2z/src/project.cpp.

Software emulation

When running software emulation, select the AIE project, not the system project. Running software emulation with the system project selected produces the following error.

Error while launching program: 
The selected system project 'simple_application_system' contains applications (simple_application) that doesn't support launching software emulation.
The selected system project 'simple_application_system' contains applications (simple_application) that doesn't support launching software emulation.

Hardware emulation

Run software emulation first, then hardware emulation.
Running hardware emulation directly produces the following error.

Failed to start emulator on the project 'simple_application_system' using the build configuration 'Emulation-HW'.
Launch emulator script doesn't exist at location '/proj/hankf/vck190/vck190_aie_a2z_script_hw_prj/custom_pfm_vck190/vitis/simple_application_system/Emulation-HW/package/launch_hw_emu.sh'.

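Also, in Vitis, if you select the AIE project, build it, and then try to start hardware emulation, the menu may show no target. After building, select the system project again, then select the AIE project, and then start hardware emulation. The menu will then show the target.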

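No ap_int.h in the A72 software

The files mm2s.cpp and s2mm.cpp are meant for the HLS design and must not be added to the A72 software project. If they are added, the build fails with the error "ap_int.h: No such file or directory".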

aarch64-none-elf-g++ -Wall -O0 -g3 -I"/opt/Xilinx/Vitis/2020.2/aietools/include" 
-I"/proj/hankf/vck190/vck190_aie_a2z/vitis/simple_application_vck190_aie_a2z/src" 
-I"/../include" -c -fmessage-length=0 -MT"src/mm2s.o" -mcpu=cortex-a72 -I/proj/hankf/vck190/vck190_aie_a2z/vitis/vck190_aie_a2z_aie_output_platform/export/vck190_aie_a2z_aie_output_platform/sw/vck190_aie_a2z_aie_output_platform/standalone_domain/bspinclude/include -MMD -MP -MF"src/mm2s.d" -MT"src/mm2s.o" -o "src/mm2s.o" "../src/mm2s.cpp"
../src/mm2s.cpp:33:10: fatal error: ap_int.h: No such file or directory
   33 | #include <ap_int.h>
      |          ^~~~~~~~~~

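A72 software project cannot find simple(input_window, output_window)

To control the AIE kernel, the A72 software needs information about it. Therefore, add the file "Hardware/Work/ps/c_rts/aie_control.cpp", which is generated when the AIE project is built, to the A72 software project beforehand.

If you forget to add it, you may get the error "undefined reference to `simple(input_window, output_window)'".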

aarch64-none-elf-g++ -L/opt/Xilinx/Vitis/2020.2/aietools/lib/aarchnone64.o -mcpu=cortex-a72 -Wl,-T -Wl,../src/lscript.ld -L/proj/hankf/vck190/vck190_aie_a2z/vitis/vck190_aie_a2z_aie_output_platform/export/vck190_aie_a2z_aie_output_platform/sw/vck190_aie_a2z_aie_output_platform/standalone_domain/bsplib/lib -o "aie_a2z_vck190_a72_ctrl_app.elf"  ./src/main.o ./src/platform.o   -ladf_api -Wl,--start-group,-lxil,-lgcc,-lc,-lstdc++,--end-group
/opt/Xilinx/Vitis/2020.2/gnu/aarch64/lin/aarch64-none/x86_64-oesdk-linux/usr/bin/aarch64-xilinx-elf/aarch64-xilinx-elf-ld.real: ./src/main.o: in function `simpleGraph::simpleGraph()':
/proj/hankf/vck190/vck190_aie_a2z/vitis/simple_application_vck190_aie_a2z/src/project.h:17: undefined reference to `simple(input_window<cint16>*, output_window<cint16>*)'
/opt/Xilinx/Vitis/2020.2/gnu/aarch64/lin/aarch64-none/x86_64-oesdk-linux/usr/bin/aarch64-xilinx-elf/aarch64-xilinx-elf-ld.real: /proj/hankf/vck190/vck190_aie_a2z/vitis/simple_application_vck190_aie_a2z/src/project.h:17: undefined reference to `simple(input_window<cint16>*, output_window<cint16>*)'
/opt/Xilinx/Vitis/2020.2/gnu/aarch64/lin/aarch64-none/x86_64-oesdk-linux/usr/bin/aarch64-xilinx-elf/aarch64-xilinx-elf-ld.real: /proj/hankf/vck190/vck190_aie_a2z/vitis/simple_application_vck190_aie_a2z/src/project.h:18: undefined reference to `simple(input_window<cint16>*, output_window<cint16>*)'
makefile:48: recipe for target 'aie_a2z_vck190_a72_ctrl_app.elf' failed
/opt/Xilinx/Vitis/2020.2/gnu/aarch64/lin/aarch64-none/x86_64-oesdk-linux/usr/bin/aarch64-xilinx-elf/aarch64-xilinx-elf-ld.real: /proj/hankf/vck190/vck190_aie_a2z/vitis/simple_application_vck190_aie_a2z/src/project.h:18: undefined reference to `simple(input_window<cint16>*, output_window<cint16>*)'
collect2.real: error: ld returned 1 exit status
make: *** [aie_a2z_vck190_a72_ctrl_app.elf] Error 1

Package

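After building the A72 program, build the system project to package all modules together. At this point, following Step 3. Build the Full System in 04-ps_application_creation_run_all.md, add the packaging options "--package.ps_elf ../../A-to-Z_app/Debug/A-to-Z_app.elf,a72-0 --package.defer_aie_run".

Without these options, the build reports the error "no xclbin input is found".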

Package step cannot be performed since the platform has a VPP link generated XSA and no xclbin input is found. Please provide a valid xclbin location in system project package options
11:21:21 Build Finished (took 646ms)