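[Share] Versal AIE First Look -- Standalone Example
Engineers have gradually been getting their hands on the VCK190 board lately. The VCK190 carries Xilinx's 7nm AIE (AI Engine), which offers strong processing capability. This article shows how to run a Xilinx AIE example and get familiar with the AIE development flow.
It starts with the standalone (bare-metal) example, which comes from AIE a2z in Vitis-Tutorials.
Preparation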
License
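Before getting started, check whether your board is a VCK190 Production board or a VCK190 ES board. For a VCK190 Production board, use the VCK190 voucher to request a license on the Xilinx website. After the license is installed, the license status window lists the following features.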
AIEBuild
AIESim
MEBuild
MESim
For a VCK190 ES board, request the "Versal Tools Early Access" and "Versal Tools PDI Early Access" licenses in the Lounge, and enable ES devices in Vivado. Adding “enable_beta_device xcvc*” to the file Vivado/2020.2/scripts/init.tcl enables the ES devices automatically.
Platform
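A platform is needed before development can start. The VCK190 Production board and the VCK190 ES board use different platforms. Download the matching platform from the links below and copy it into the directory “Xilinx/Vitis/2020.2/platforms/”.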
VCK190 Production Platform
VCK190 ES Platform
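Once this is done, the directory structure should look similar to the following.
Common Images
Xilinx now also provides Common Images, which contain the Linux boot files for the corresponding board, plus the compiler and sysroots (header files and application libraries). The Versal common image can be downloaded from Xilinx Download.
Test environment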
Host OS: Ubuntu 18.04
Vitis 2020.2
PetaLinux 2020.2
VCK190 Production
AIE Standalone Flow
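The AIE a2z example is a standalone (bare-metal) example: the A72 cores of the Versal device do not run Linux. It is comprehensive and covers creating the platform, creating the AIE kernel, creating the PL kernel, creating the A72 application, and debugging the AIE kernel. In the Xilinx documentation, a program for the AIE is called a kernel; a PL design developed with HLS in Vitis is also called a kernel.
Note that the "master" branch of Vitis Tutorials only includes the AIE a2z example as of July 2021.
AIE a2z analysis
File list
AIE a2z contains the following files.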
aie_adder:
│ description.json
│ details.rst
│ Makefile
│ qor.json
│ README.rst
│ system.cfg
│ utils.mk
│ xrt.ini
│
├─data
│      golden.txt
│      input0.txt
│      input1.txt
│
└─src
        aie_adder.cc
        aie_graph.cpp
        aie_graph.h
        aie_kernel.h
        host.cpp
        pl_mm2s.cpp
        pl_s2mm.cpp
aie_adder.cc
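aie_adder.cc defines the AIE kernel. It is the most important file and is needed for both simulation and execution on hardware.
The AIE kernel is very simple, the AIE equivalent of a C "Hello World": it reads two vectors, adds them, and writes the result out.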
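void aie_adder(input_stream_int32* in0, input_stream_int32* in1, output_stream_int32* out) {
    // Read one vector of four 32-bit integers from each input stream
    v4int32 a = readincr_v4(in0);
    v4int32 b = readincr_v4(in1);
    // Add the two vectors lane by lane
    v4int32 c = operator+(a, b);
    // Write the four results to the output stream
    writeincr_v4(out, c);
}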
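To check the expected numbers without any AIE tools, the kernel's behavior can be modeled in plain C++ on the host. The sketch below is only an illustration: `Vec4`, `add_v4`, and `model_adder` are hypothetical names that do not exist in the tutorial, and `Vec4` is a stand-in for `v4int32`, not the real AIE type. It assumes each kernel invocation consumes exactly four samples per input, as the `readincr_v4` calls above suggest.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side stand-in for v4int32: four 32-bit lanes.
using Vec4 = std::array<std::int32_t, 4>;

// Models one kernel invocation: add two 4-lane vectors lane by lane.
// Assumes the sums fit in 32 bits (true for the small test values used here).
Vec4 add_v4(const Vec4& a, const Vec4& b) {
    Vec4 c{};
    for (std::size_t lane = 0; lane < c.size(); ++lane) {
        c[lane] = a[lane] + b[lane];
    }
    return c;
}

// Models repeated invocations over two sample streams.
// Samples that do not fill a complete group of four are ignored.
std::vector<std::int32_t> model_adder(const std::vector<std::int32_t>& in0,
                                      const std::vector<std::int32_t>& in1) {
    const std::size_t n = std::min(in0.size(), in1.size()) / 4 * 4;
    std::vector<std::int32_t> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; i += 4) {
        const Vec4 a{in0[i], in0[i + 1], in0[i + 2], in0[i + 3]};
        const Vec4 b{in1[i], in1[i + 1], in1[i + 2], in1[i + 3]};
        const Vec4 c = add_v4(a, b);
        out.insert(out.end(), c.begin(), c.end());
    }
    return out;
}
```

A model like this is handy for generating a golden file to compare against the simulator output, since it has no dependency on the Xilinx headers.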
aie_graph.cpp
aie_graph.cpp defines and controls the compute graph. In this example it is used only for simulation.
#include "aie_graph.h"
PLIO* in0 = new PLIO("DataIn0", adf::plio_32_bits, "data/input0.txt");
PLIO* in1 = new PLIO("DataIn1", adf::plio_32_bits, "data/input1.txt");
PLIO* out = new PLIO("DataOut", adf::plio_32_bits, "data/output.txt");
// Hank: only for simulation??
simulation::platform<2, 1> platform(in0, in1, out);
simpleGraph addergraph;
connect<> net0(platform.src[0], addergraph.in0);
connect<> net1(platform.src[1], addergraph.in1);
connect<> net2(addergraph.out, platform.sink[0]);
#ifdef __AIESIM__
int main(int argc, char** argv) {
    addergraph.init();
    addergraph.run(4);
    addergraph.end();
    return 0;
}
#endif
aie_graph.h
aie_graph.h defines the compute graph and is needed for both simulation and execution on hardware.
#include <adf.h>
#include "aie_kernel.h"
using namespace adf;
class simpleGraph : public graph {
   private:
    kernel adder;

   public:
    port<input> in0, in1;
    port<output> out;

    simpleGraph() {
        adder = kernel::create(aie_adder);
        connect<stream>(in0, adder.in[0]);
        connect<stream>(in1, adder.in[1]);
        connect<stream>(adder.out[0], out);
        source(adder) = "aie_adder.cc";
        runtime<ratio>(adder) = 0.1;
    };
};
aie_kernel.h
aie_kernel.h is the simplest file. It only declares the prototype of aie_adder and is needed for both simulation and execution on hardware.
void aie_adder(input_stream_int32* in0, input_stream_int32* in1, output_stream_int32* out);
host.cpp
host.cpp allocates memory, loads the data, loads the xclbin, and runs the AIE kernel.
simpleGraph addergraph;
static std::vector<char> load_xclbin(xrtDeviceHandle device, const std::string& fnm) {
    // load bit stream (binary mode, so the xclbin bytes are read unmodified)
    std::ifstream stream(fnm, std::ios::binary);
    stream.seekg(0, stream.end);
    size_t size = stream.tellg();
    stream.seekg(0, stream.beg);
    std::vector<char> header(size);
    stream.read(header.data(), size);
    auto top = reinterpret_cast<const axlf*>(header.data());
    xrtDeviceLoadXclbin(device, top);
    return header;
}
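Note that load_xclbin does not check whether the file could be opened or fully read, so a wrong path only shows up later as a confusing failure. A more defensive file read could look like the sketch below. `read_binary_file` is a hypothetical helper, not part of the example or of XRT, and it uses only the C++ standard library; the XRT call that consumes the buffer is left out.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads a whole file in binary mode, or throws
// std::runtime_error if the file cannot be opened or read completely.
std::vector<char> read_binary_file(const std::string& path) {
    // Open at the end of the file so tellg() reports the file size.
    std::ifstream stream(path, std::ios::binary | std::ios::ate);
    if (!stream) {
        throw std::runtime_error("cannot open " + path);
    }
    const std::streamsize size = stream.tellg();
    if (size < 0) {
        throw std::runtime_error("cannot determine size of " + path);
    }
    stream.seekg(0, std::ios::beg);

    std::vector<char> data(static_cast<std::size_t>(size));
    if (size > 0 && !stream.read(data.data(), size)) {
        throw std::runtime_error("short read on " + path);
    }
    return data;
}
```

With a helper like this, the returned buffer could be passed on in the same way the `header` vector is used in load_xclbin above.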
int main(int argc, char** argv) {
    // Open xclbin
    auto dhdl = xrtDeviceOpen(0); // Open the local device
    auto xclbin = load_xclbin(dhdl, "krnl_adder.xclbin");
    auto top = reinterpret_cast<const axlf*>(xclbin.data());
    adf::registerXRT(dhdl, top->m_header.uuid);
    int DataInput0[sizeIn], DataInput1[sizeIn];
    for (int i = 0; i < sizeIn; i++) {
        DataInput0[i] = rand() % 100;
        DataInput1[i] = rand() % 100;
    }
    // input memory
    // Allocating the input size of sizeIn to MM2S
    // This uses the XRT call xrtBOAlloc to allocate the memory
    xrtBufferHandle in_bohdl0 = xrtBOAlloc(dhdl, sizeIn * sizeof(int), 0, 0);
    auto in_bomapped0 = reinterpret_cast<uint32_t*>(xrtBOMap(in_bohdl0));
    memcpy(in_bomapped0, DataInput0, sizeIn * sizeof(int));
    printf("Input memory virtual addr %p\n", (void*)in_bomapped0);
    xrtBufferHandle in_bohdl1 = xrtBOAlloc(dhdl, sizeIn * sizeof(int), 0, 0);
    auto in_bomapped1 = reinterpret_cast<uint32_t*>(xrtBOMap(in_bohdl1));
    memcpy(in_bomapped1, DataInput1, sizeIn * sizeof(int));
    printf("Input memory virtual addr %p\n", (void*)in_bomapped1);
    // output memory
    // Allocating the output size of sizeOut to S2MM
    // This uses the XRT call xrtBOAlloc to allocate the memory
    xrtBufferHandle out_bohdl = xrtBOAlloc(dhdl, sizeOut * sizeof(int), 0, 0);
    auto out_bomapped = reinterpret_cast<uint32_t*>(xrtBOMap(out_bohdl));
    // Note: memset() uses only the low byte of its fill value,
    // so 0xABCDEF00 fills the buffer with 0x00 bytes.
    memset(out_bomapped, 0xABCDEF00, sizeOut * sizeof(int));
    printf("Output memory virtual addr %p\n", (void*)out_bomapped);
    // mm2s ip
    // Using the xrtPLKernelOpen function to manually control the PL Kernel
    // that is outside of the AI Engine graph
    xrtKernelHandle mm2s_khdl1 = xrtPLKernelOpen(dhdl, top->m_header.uuid, "pl_mm2s:{pl_mm2s_1}");
    // Need to provide the kernel handle, and the argument order of the kernel arguments
    // Here the in_bohdl is the input buffer, the nullptr is the streaming interface and must be null,
    // lastly, the size of the data. This info can be found in the kernel definition.
    xrtRunHandle mm2s_rhdl1 = xrtKernelRun(mm2s_khdl1, in_bohdl0, nullptr, sizeIn);
    printf("run pl_mm2s_1\n");
    xrtKernelHandle mm2s_khdl2 = xrtPLKernelOpen(dhdl, top->m_header.uuid, "pl_mm2s:{pl_mm2s_2}");
    xrtRunHandle mm2s_rhdl2 = xrtKernelRun(mm2s_khdl2, in_bohdl1, nullptr, sizeIn);
    printf("run pl_mm2s_2\n");
    // s2mm ip
    // Using the xrtPLKernelOpen function to manually control the PL Kernel
    // that is outside of the AI Engine graph
    xrtKernelHandle s2mm_khdl = xrtPLKernelOpen(dhdl, top->m_header.uuid, "pl_s2mm");
    // Need to provide the kernel handle, and the argument order of the kernel arguments
    // Here the out_bohdl is the output buffer, the nullptr is the streaming interface and must be null,
    // lastly, the size of the data. This info can be found in the kernel definition.
    xrtRunHandle s2mm_rhdl = xrtKernelRun(s2mm_khdl, out_bohdl, nullptr, sizeOut);
    printf("run pl_s2mm\n");
    // graph execution for AIE
    printf("graph init. This does nothing because CDO in boot PDI already configures AIE.\n");
    addergraph.init();
    printf("graph run\n");
    addergraph.run(N_ITER);
    addergraph.end();
    printf("graph end\n");
    // wait for mm2s done
    auto state = xrtRunWait(mm2s_rhdl1);
    std::cout << "mm2s_1 completed with status(" << state << ")\n";
    xrtRunClose(mm2s_rhdl1);
    xrtKernelClose(mm2s_khdl1);
    state = xrtRunWait(mm2s_rhdl2);
    std::cout << "mm2s_2 completed with status(" << state << ")\n";
    xrtRunClose(mm2s_rhdl2);
    xrtKernelClose(mm2s_khdl2);
    // wait for s2mm done
    state = xrtRunWait(s2mm_rhdl);
    std::cout << "s2mm completed with status(" << state << ")\n";
    xrtRunClose(s2mm_rhdl);
    xrtKernelClose(s2mm_khdl);
    // Comparing the execution data to the golden data
    // clean up XRT
    std::cout << "Releasing remaining XRT objects...\n";
    xrtBOFree(in_bohdl0);
    xrtBOFree(in_bohdl1);
    xrtBOFree(out_bohdl);
    xrtDeviceClose(dhdl);
    return errorCount;
}
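The listing above leaves out the step marked "Comparing the execution data to the golden data", although main returns errorCount. One way that check might look is sketched below. It is an assumption, not the example's actual code: `count_mismatches` is a hypothetical function, and the golden value is taken to be the element-wise sum of the two inputs, which matches what the aie_adder kernel computes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical check: counts positions where out[i] != in0[i] + in1[i].
// The sum is computed in unsigned arithmetic so it wraps instead of overflowing.
int count_mismatches(const int* in0, const int* in1,
                     const std::uint32_t* out, std::size_t n) {
    int errors = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint32_t expected =
            static_cast<std::uint32_t>(in0[i]) + static_cast<std::uint32_t>(in1[i]);
        if (out[i] != expected) {
            std::printf("mismatch at %zu: got %u, expected %u\n", i,
                        static_cast<unsigned>(out[i]),
                        static_cast<unsigned>(expected));
            ++errors;
        }
    }
    return errors;
}
```

In host.cpp this could be used as `errorCount = count_mismatches(DataInput0, DataInput1, out_bomapped, sizeOut);` once the s2mm run has completed, assuming sizeOut equals sizeIn.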
pl_mm2s.cpp
pl_mm2s.cpp is a PL design written in HLS. It moves data from memory to the AIE kernel.
void pl_mm2s(ap_int<32>* mem, hls::stream<qdma_axis<32, 0, 0, 0> >& s, int size) {
data_mover:
    for (int i = 0; i < size; i++) {
        qdma_axis<32, 0, 0, 0> x;
        x.data = mem[i];
        x.keep_all();
        s.write(x);
    }
}
pl_s2mm.cpp
pl_s2mm.cpp is also a PL design written in HLS. It moves data from the AIE kernel to memory.
void pl_s2mm(ap_int<32>* mem, hls::stream<qdma_axis<32, 0, 0, 0> >& s, int size) {
data_mover:
    for (int i = 0; i < size; i++) {
        qdma_axis<32, 0, 0, 0> x = s.read();
        mem[i] = x.data;
    }
}
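The two data movers are mirror images of each other: one turns a memory buffer into a stream of 32-bit beats, the other turns a stream back into a buffer. The plain C++ sketch below models that data flow without the HLS headers. Everything in it is hypothetical: `AxisBeat`, `Stream`, `model_mm2s`, and `model_s2mm` are illustrative names, and `std::queue` only stands in for `hls::stream` (it does not block the way hardware does).

```cpp
#include <cstdint>
#include <queue>

// Hypothetical stand-in for one 32-bit AXI4-Stream beat.
struct AxisBeat {
    std::int32_t data;
};

// Hypothetical stand-in for hls::stream; a simple FIFO.
using Stream = std::queue<AxisBeat>;

// Models pl_mm2s: push `size` words from memory into the stream.
void model_mm2s(const std::int32_t* mem, Stream& s, int size) {
    for (int i = 0; i < size; ++i) {
        s.push(AxisBeat{mem[i]});
    }
}

// Models pl_s2mm: pop `size` words from the stream into memory.
// Returns false if the stream runs dry first; real hardware would wait instead.
bool model_s2mm(std::int32_t* mem, Stream& s, int size) {
    for (int i = 0; i < size; ++i) {
        if (s.empty()) {
            return false;
        }
        mem[i] = s.front().data;
        s.pop();
    }
    return true;
}
```

Connecting `model_mm2s` directly to `model_s2mm` gives a loopback, which makes it easy to see why the `size` passed to both movers in host.cpp has to agree with the number of samples the graph produces.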
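Lessons learned
AIE a2z is quite complete, and it can generally be worked through without trouble. The following issues may come up along the way.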
AXI Interrupt
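When the platform is created, the AXI interrupt controller (axi_intc) has no interrupt source connected. When Vitis builds the project, it connects the interrupt outputs of the HLS-generated IP blocks to the AXI interrupt controller (axi_intc). When the platform's Block Design is validated, Vivado reports the following message about the interrupt controller, saying that it has no interrupt source. This message can be ignored.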
[BD 41-759] The input pins (listed below) are either not connected or do not have a source port, and they don't have a tie-off specified. These pins are tied-off to all 0's to avoid error in Implementation flow.
Please check your design and connect them as needed:
/axi_intc_0/intr
sys_clk0
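Vivado also reports the following clock mismatch message for the input clock. When Vivado creates the Block Design, the default clock is 100 MHz, whereas the actual clock on the board is 200 MHz. Select sys_clk0_0 and change it to 200 MHz in its properties.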
[xilinx.com:ip:axi_noc:1.0-1] /ps_noc Clock frequency of the connected clock (/ps_noc/sys_clk0) is 100.000 MHz while "Input System Clock Frequency" is 200.000 MHz. Please either reconfigure the parameter "Input System Clock Period" of the axi_noc (in DDR Basic tab) or change frequency of the connected clock (CONFIG.FREQ_HZ) within the range of 199920031.987 to 200080032.013 Hz.
AIE license
If Vitis reports “AIE license not found” when building the project, request a license.
AIE license not found !
/opt/Xilinx/Vitis/2020.2/aietools/bin/aieir_be: line 96: kill: (-28000) - No such process
ERROR: [aiecompiler 77-753] This application has discovered an exceptional condition from which it cannot recover while executing the following command
>> aieir_be --time-passes=0 --trace-plio-width=64 --pl-freq=0 --use-real-noc=true --show-loggers=false --high-performance=false --kernel-address-location=false --target=x86sim --swfifo-threshold=40 --single-mm2s-channel=false --workdir=./Work --exit-after=complete --event-trace-config= --test-iterations=-1 --stacksize=1024 --platform=/proj/hankf/vck190/vck190_aie_a2z/vitis/base_pfm_vck190_aie_a2z/export/base_pfm_vck190_aie_a2z/base_pfm_vck190_aie_a2z.xpfm --event-trace-custom-config= --disable-dma-cmd-alignment=false --enable-ecc-scrubbing=false --write-partitioned-file=true --schemafile=AIEGraphSchema.json --include="/opt/Xilinx/Vitis/2020.2/aietools/include" --include="/opt/Xilinx/Vitis_HLS/2020.2/include" --include="../" --include="../src" --include="../data" --include="../src/kernels" --device= --write-unified-data=false --fastmath=false --event-trace-advanced-mapping=0 --log-level=1 --enable-reconfig=false --aiesim-xrt-api=false --gen-graph-cleanup=false --use-canonical-net-names=false --event-trace-port=plio --new-placer=true --use-phy-shim=true --xlopt=0 --pre-compile-kernels=false --validate-only=false --trace-aiesim-option=0 --aiearch=aie --mapped-soln-udm= --optimize-pktids=false --no-init=false --num-trace-streams=1 --aie-heat-map=false --phydevice= --exec-timed=0 --pl-auto-restart=false --routed-soln-udm= --enable-profiling=false --disable-transform-merge-broadcast=false --verbose=true --use-async-rtp-locks=true --repo-path= --genArchive=false --pl-axi-lite=false --new-router=true --aie-driver-v1=false --logcfg-file= --event-trace-bounding-box= --enable-reconfig-dma-autostart=false --heapsize=1024 --logical-arch= --nodot-graph=false --shim-constraints= --disable-dma-autostart=false --disable-transform-broadcast-split=true -json ./Work/temp/project.json -sdf-graph /proj/hankf/vck190/vck190_aie_a2z/vitis/simple_application_vck190_aie_a2z/src/project.cpp.
Installing dot
Vitis uses the dot tool during the build. If graphviz is not installed, the build fails with the error "sh: 1: dot: not found". On Ubuntu 18.04, with administrator privileges, the command “sudo apt install graphviz” installs dot.
DEBUG:MapperPartitioner: Adding Edge : Name=D_net2 SrcPort=i1_po0 DstPort=i3_pi0 EdgeType=mem
DEBUG:MapperPartitioner:Done--Add Double Buffer Edge SrcPort=i1_po0 DstPort=i3_pi0 type=mem Edge=net2:i1-(buf2)->i3
DEBUG:MapperPartitioner:Graph After Adding Double Edges
sh: 1: dot: not found
ERROR: [aiecompiler 77-753] This application has discovered an exceptional condition from which it cannot recover while executing the following command
>> dot ./Work/reports/project.dot -Tpng -o ./Work/reports/project.png
.
Please check the output log for errors and fix those before you run the application.
/opt/Xilinx/Vitis/2020.2/aietools/bin/aieir_be: line 96: kill: (-44668) - No such process
ERROR: [aiecompiler 77-753] This application has discovered an exceptional condition from which it cannot recover while executing the following command
>> aieir_be --time-passes=0 --trace-plio-width=64 --pl-freq=0 --use-real-noc=true --show-loggers=false --high-performance=false --kernel-address-location=false --target=hw --swfifo-threshold=40 --single-mm2s-channel=false --workdir=./Work --exit-after=complete --event-trace-config= --test-iterations=-1 --stacksize=1024 --platform=/proj/hankf/vck190/vck190_aie_a2z/vitis/base_pfm_vck190_aie_a2z/export/base_pfm_vck190_aie_a2z/base_pfm_vck190_aie_a2z.xpfm --event-trace-custom-config= --disable-dma-cmd-alignment=false --enable-ecc-scrubbing=false --write-partitioned-file=true --schemafile=AIEGraphSchema.json --include="/opt/Xilinx/Vitis/2020.2/aietools/include" --include="/opt/Xilinx/Vitis_HLS/2020.2/include" --include="../" --include="../src" --include="../data" --include="../src/kernels" --device= --write-unified-data=false --fastmath=false --event-trace-advanced-mapping=0 --log-level=1 --enable-reconfig=false --aiesim-xrt-api=false --gen-graph-cleanup=false --use-canonical-net-names=false --event-trace-port=plio --new-placer=true --use-phy-shim=true --xlopt=0 --pre-compile-kernels=false --validate-only=false --trace-aiesim-option=0 --aiearch=aie --mapped-soln-udm= --optimize-pktids=false --no-init=false --num-trace-streams=1 --aie-heat-map=false --phydevice= --exec-timed=0 --pl-auto-restart=false --routed-soln-udm= --enable-profiling=false --disable-transform-merge-broadcast=false --verbose=true --use-async-rtp-locks=true --repo-path= --genArchive=false --pl-axi-lite=false --new-router=true --aie-driver-v1=false --logcfg-file= --event-trace-bounding-box= --enable-reconfig-dma-autostart=false --heapsize=1024 --logical-arch= --nodot-graph=false --shim-constraints= --disable-dma-autostart=false --disable-transform-broadcast-split=true -json ./Work/temp/project.json -sdf-graph /proj/hankf/vck190/vck190_aie_a2z/vitis/simple_application_vck190_aie_a2z/src/project.cpp.
Software emulation
When running software emulation, select the AIE project, not the system project. Running software emulation on the system project produces the following error.
Error while launching program:
The selected system project 'simple_application_system' contains applications (simple_application) that doesn't support launching software emulation.
Hardware emulation
Run software emulation first, then hardware emulation.
Running hardware emulation directly produces the following error.
Failed to start emulator on the project 'simple_application_system' using the build configuration 'Emulation-HW'.
Launch emulator script doesn't exist at location '/proj/hankf/vck190/vck190_aie_a2z_script_hw_prj/custom_pfm_vck190/vitis/simple_application_system/Emulation-HW/package/launch_hw_emu.sh'.
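In addition, if you select the AIE project in Vitis, build it, and then try to launch hardware emulation, the menu may show no target. After the build, select the system project again, then select the AIE project, and launch hardware emulation. The menu then lists the target.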
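The A72 software has no ap_int.h
The files mm2s.cpp and s2mm.cpp are meant for the HLS design and must not be added to the A72 software project. If they are added, the build fails with the error “ap_int.h: No such file or directory”.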
aarch64-none-elf-g++ -Wall -O0 -g3 -I"/opt/Xilinx/Vitis/2020.2/aietools/include"
-I"/proj/hankf/vck190/vck190_aie_a2z/vitis/simple_application_vck190_aie_a2z/src"
-I"/../include" -c -fmessage-length=0 -MT"src/mm2s.o" -mcpu=cortex-a72 -I/proj/hankf/vck190/vck190_aie_a2z/vitis/vck190_aie_a2z_aie_output_platform/export/vck190_aie_a2z_aie_output_platform/sw/vck190_aie_a2z_aie_output_platform/standalone_domain/bspinclude/include -MMD -MP -MF"src/mm2s.d" -MT"src/mm2s.o" -o "src/mm2s.o" "../src/mm2s.cpp"
../src/mm2s.cpp:33:10: fatal error: ap_int.h: No such file or directory
33 | #include <ap_int.h>
| ^~~~~~~~~~
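The A72 software project cannot find simple(input_window, output_window)
To control the AIE kernel, the A72 software needs information about it. So add the file “Hardware/Work/ps/c_rts/aie_control.cpp”, which is generated when the AIE project is built, to the A72 software project beforehand.
If you forget to add it, you may get the error “undefined reference to `simple(input_window<cint16>*, output_window<cint16>*)'”.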
aarch64-none-elf-g++ -L/opt/Xilinx/Vitis/2020.2/aietools/lib/aarchnone64.o -mcpu=cortex-a72 -Wl,-T -Wl,../src/lscript.ld -L/proj/hankf/vck190/vck190_aie_a2z/vitis/vck190_aie_a2z_aie_output_platform/export/vck190_aie_a2z_aie_output_platform/sw/vck190_aie_a2z_aie_output_platform/standalone_domain/bsplib/lib -o "aie_a2z_vck190_a72_ctrl_app.elf" ./src/main.o ./src/platform.o -ladf_api -Wl,--start-group,-lxil,-lgcc,-lc,-lstdc++,--end-group
/opt/Xilinx/Vitis/2020.2/gnu/aarch64/lin/aarch64-none/x86_64-oesdk-linux/usr/bin/aarch64-xilinx-elf/aarch64-xilinx-elf-ld.real: ./src/main.o: in function `simpleGraph::simpleGraph()':
/proj/hankf/vck190/vck190_aie_a2z/vitis/simple_application_vck190_aie_a2z/src/project.h:17: undefined reference to `simple(input_window<cint16>*, output_window<cint16>*)'
/opt/Xilinx/Vitis/2020.2/gnu/aarch64/lin/aarch64-none/x86_64-oesdk-linux/usr/bin/aarch64-xilinx-elf/aarch64-xilinx-elf-ld.real: /proj/hankf/vck190/vck190_aie_a2z/vitis/simple_application_vck190_aie_a2z/src/project.h:17: undefined reference to `simple(input_window<cint16>*, output_window<cint16>*)'
/opt/Xilinx/Vitis/2020.2/gnu/aarch64/lin/aarch64-none/x86_64-oesdk-linux/usr/bin/aarch64-xilinx-elf/aarch64-xilinx-elf-ld.real: /proj/hankf/vck190/vck190_aie_a2z/vitis/simple_application_vck190_aie_a2z/src/project.h:18: undefined reference to `simple(input_window<cint16>*, output_window<cint16>*)'
makefile:48: recipe for target 'aie_a2z_vck190_a72_ctrl_app.elf' failed
/opt/Xilinx/Vitis/2020.2/gnu/aarch64/lin/aarch64-none/x86_64-oesdk-linux/usr/bin/aarch64-xilinx-elf/aarch64-xilinx-elf-ld.real: /proj/hankf/vck190/vck190_aie_a2z/vitis/simple_application_vck190_aie_a2z/src/project.h:18: undefined reference to `simple(input_window<cint16>*, output_window<cint16>*)'
collect2.real: error: ld returned 1 exit status
make: *** [aie_a2z_vck190_a72_ctrl_app.elf] Error 1
Package
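After building the A72 program, build the system project to package all the modules together. At this point, follow Step 3, "Build the Full System", in 04-ps_application_creation_run_all.md and add the packaging options “--package.ps_elf ../../A-to-Z_app/Debug/A-to-Z_app.elf,a72-0 --package.defer_aie_run”.
If they are not added, the error “no xclbin input is found” is reported.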
Package step cannot be performed since the platform has a VPP link generated XSA and no xclbin input is found. Please provide a valid xclbin location in system project package options
11:21:21 Build Finished (took 646ms)