[Work Notes] Synchronization in UE4 Parallel Rendering - Sync between FParallelCommandListSet & FRHICommandListImmediate calls

UE4 rendering runs in one of two modes: 1. in the editor, drawing is synchronous; 2. in game, draws are dispatched in parallel through FParallelCommandListSet.

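Mesh rendering also falls into two categories: static meshes are drawn with TStaticMeshDrawList, and skinned meshes are drawn with DrawingPolicyFactory::DrawDynamicMesh. Both kinds of drawing are called whether the mode is asynchronous or synchronous. See DepthRendering.cpp for details.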

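In practice, dispatch is truly parallel only on APIs that support parallel commit, such as DX12/Vulkan/Metal. Otherwise GRHIThread is nullptr and all tasks are submitted in a blocking fashion at some later point; the execution order of the tasks is undefined, but they do not run concurrently.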

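Generally, each DrawList's DrawVisibleParallel creates its own task, and DrawDynamicMesh always runs in a task it creates itself. These tasks execute in an undefined order. Because there is a depth pre-pass or a depth buffer, out-of-order drawing is not a problem. Only translucent objects need to be drawn in order, so they use a single DrawList, which corresponds to a single task. Draws inside a task always run in sequence.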
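Why ordering matters only for translucency can be shown with a minimal standalone sketch in plain C++. This is not UE4 code: `Fragment`, `ResolveOpaque` and `ResolveTranslucent` are hypothetical names made up for illustration, and the model covers a single pixel only. It assumes a standard "nearest wins" depth test for opaque draws and simple "over" blending without depth writes for translucent draws.

```cpp
#include <vector>

// One fragment landing on a single pixel (illustrative type, not a UE4 type).
struct Fragment {
    float Depth;  // smaller = closer to the camera
    float Color;  // single gray channel, to keep the example short
    float Alpha;  // opacity, used only by the translucent path
};

// Opaque path: the depth test keeps the nearest fragment,
// so the submission order does not change the final color.
float ResolveOpaque(const std::vector<Fragment>& Fragments) {
    float BestDepth = 1.0f;   // far plane
    float PixelColor = 0.0f;  // clear color
    for (const Fragment& F : Fragments) {
        if (F.Depth < BestDepth) {
            BestDepth = F.Depth;
            PixelColor = F.Color;
        }
    }
    return PixelColor;
}

// Translucent path: each fragment is blended over the current color
// in submission order, so a different order gives a different result.
float ResolveTranslucent(const std::vector<Fragment>& Fragments) {
    float PixelColor = 0.0f;  // clear color
    for (const Fragment& F : Fragments) {
        PixelColor = F.Color * F.Alpha + PixelColor * (1.0f - F.Alpha);
    }
    return PixelColor;
}
```

Feeding the same two fragments in both orders gives the same value from `ResolveOpaque` but different values from `ResolveTranslucent`, which is the reason a single ordered task is used for translucency.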

The problem I ran into at work:

The relevant part of my custom pipeline is currently:

1. Draw objects A (parallel)

2. Copy scene color (immediate)

3. Draw objects B (parallel)

Step 2 must not start until step 1 has finished. Since I was using FParallelCommandListSet, I followed DeferredShadingRenderer.cpp and, after adding the calls to the command list, called ServiceLocalQueue() to synchronize.

Code:

    class FCustomPassDynamicDataThreadTask : public FRenderTask {...};
    class FCustomParallelCommandListSet : public FParallelCommandListSet {...};

    ...
    //Step 1
    FCustomParallelCommandListSet ParallelSet(View, RHICmdList, true, CVarRHICmdFlushRenderThreadTasks.GetValueOnRenderThread() == 0);
    //Render Static Mesh
    Scene->CustomDrawList.DrawVisibleParallel(ParallelSet.View.StaticMeshVisibilityMap, ParallelSet.View.StaticMeshBatchVisibility, ParallelSet);

    // Render dynamic mesh
    FRHICommandList* CmdList = ParallelSet.NewParallelCommandList();
    FGraphEventRef AnyThreadCompletionEvent = TGraphTask<FCustomPassDynamicDataThreadTask>::CreateTask(ParallelSet.GetPrereqs(), ENamedThreads::RenderThread)
        .ConstructAndDispatchWhenReady(*this, *CmdList, View, ParallelSet.DrawRenderState);
    ParallelSet.AddParallelCommandList(CmdList, AnyThreadCompletionEvent);

    ServiceLocalQueue();

    //Step 2
    RHICmdList.CopyToResolveTarget(...); // or CopySubTextureRegion
    ...

This did not synchronize the results: the SceneColor copied in step 2 did not contain the objects drawn in step 1.

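Reading the code more closely, I found that FParallelCommandListSet dispatches in its destructor; see, for example, the depth pre-pass in DepthRendering.cpp. My FCustomParallelCommandListSet was (partly) copied from it, even though its comments say not to copy-paste blindly, so it behaves the same way.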

DepthRendering.cpp:

    class FPrePassParallelCommandListSet : public FParallelCommandListSet
    {
    public:
        FPrePassParallelCommandListSet(const FViewInfo& InView, FRHICommandListImmediate& InParentCmdList, bool bInParallelExecute, bool bInCreateSceneContext)
            : FParallelCommandListSet(GET_STATID(STAT_CLP_Prepass), InView, InParentCmdList, bInParallelExecute, bInCreateSceneContext)
        {
            // Do not copy-paste. this is a very unusual FParallelCommandListSet because it is a prepass and we want to do some work after starting some tasks
        }

        virtual ~FPrePassParallelCommandListSet()
        {
            // Do not copy-paste. this is a very unusual FParallelCommandListSet because it is a prepass and we want to do some work after starting some tasks
            SetStateOnCommandList(ParentCmdList);
            Dispatch(true);
        }

        virtual void SetStateOnCommandList(FRHICommandList& CmdList) override
        {
            FParallelCommandListSet::SetStateOnCommandList(CmdList);
            FSceneRenderTargets::Get(CmdList).BeginRenderingPrePass(CmdList, false);
            SetupPrePassView(CmdList, View, DrawRenderState);
        }
    };

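In other words, synchronizing only makes sense after FCustomParallelCommandListSet has been destroyed; before that, no task has been dispatched, so there is nothing to sync. I changed the code as follows: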

    class FCustomPassDynamicDataThreadTask : public FRenderTask {...};
    class FCustomParallelCommandListSet : public FParallelCommandListSet {...};

    ...
    //Step 1
    //Note: the local scope is necessary because FCustomParallelCommandListSet dispatches in its destructor.
    {
        FCustomParallelCommandListSet ParallelSet(View, RHICmdList, true, CVarRHICmdFlushRenderThreadTasks.GetValueOnRenderThread() == 0);
        //Render Static Mesh
        Scene->CustomDrawList.DrawVisibleParallel(ParallelSet.View.StaticMeshVisibilityMap, ParallelSet.View.StaticMeshBatchVisibility, ParallelSet);

        // Render dynamic mesh
        FRHICommandList* CmdList = ParallelSet.NewParallelCommandList();
        FGraphEventRef AnyThreadCompletionEvent = TGraphTask<FCustomPassDynamicDataThreadTask>::CreateTask(ParallelSet.GetPrereqs(), ENamedThreads::RenderThread)
            .ConstructAndDispatchWhenReady(*this, *CmdList, View, ParallelSet.DrawRenderState);
        ParallelSet.AddParallelCommandList(CmdList, AnyThreadCompletionEvent);
    }

    ServiceLocalQueue();
    RHICmdList.ImmediateFlush(EImmediateFlushType::DispatchToRHIThread);

    //Step 2
    RHICmdList.CopyToResolveTarget(...); //or CopySubTextureRegion
    ...

Yes, simply adding a pair of braces solved half of the problem.
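The effect of the braces can be reproduced with a small standalone C++ sketch. It is a toy model, not the engine's implementation: `FToyParallelSet`, `RunWithoutScope` and `RunWithScope` are hypothetical names, and the only property carried over from the real class is the one observed above, namely that collected work is handed off in the destructor.

```cpp
#include <string>
#include <vector>

// Toy stand-in for a parallel command list set: commands are only collected
// while the object is alive and are handed off when it is destroyed.
class FToyParallelSet {
public:
    explicit FToyParallelSet(std::vector<std::string>& InSubmitted)
        : Submitted(InSubmitted) {}

    // Dispatch happens here, mirroring the destructor-based dispatch.
    ~FToyParallelSet() {
        for (const std::string& Command : Pending) {
            Submitted.push_back(Command);
        }
    }

    void Add(const std::string& Command) { Pending.push_back(Command); }

private:
    std::vector<std::string> Pending;     // recorded, not yet dispatched
    std::vector<std::string>& Submitted;  // what has actually been submitted
};

// No inner scope: the copy is issued while the set is still alive,
// so the draw is dispatched only when the function returns.
void RunWithoutScope(std::vector<std::string>& Submitted) {
    FToyParallelSet Set(Submitted);
    Set.Add("DrawA");
    Submitted.push_back("CopySceneColor");
}

// Inner scope: the set is destroyed (and dispatches) before the copy.
void RunWithScope(std::vector<std::string>& Submitted) {
    {
        FToyParallelSet Set(Submitted);
        Set.Add("DrawA");
    }
    Submitted.push_back("CopySceneColor");
}
```

In this model `RunWithoutScope` submits the copy before the draw, while `RunWithScope` submits the draw first, which matches the symptom of the copied SceneColor missing the step 1 objects.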

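In addition, when step 2 copies SceneColor, commands on another thread may not yet have been fully dispatched to the GPU. Without synchronization, the resulting SceneColor copy flickers when it is sampled.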

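However, even with ServiceLocalQueue(), the result was still incorrect. That leaves the purpose of ServiceLocalQueue() unclear to me: it does not wait for the tasks to finish. What is certain is that DrawVisibleParallel/DrawDynamicMesh run asynchronously, whereas the RHICmdList passed in is an FRHICommandListImmediate, which executes immediately, so the two paths definitely need to be synchronized.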

Since ServiceLocalQueue() does not wait or synchronize the way I expected, I tried adding the following after ServiceLocalQueue():

RHICmdList.ImmediateFlush(EImmediateFlushType::DispatchToRHIThread);

Only then was the result correct.
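The general idea of "recorded but not yet executed" work can be sketched in standalone C++. This is a single-threaded toy model with hypothetical names (`FToyTarget`, `FToyDeferredQueue`, `CopySceneColor`); it does not reproduce how UE4's RHI thread or task graph actually work, and only illustrates why an immediate read can see stale data unless pending work is flushed first.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Toy render target holding a single value instead of pixels.
struct FToyTarget {
    int SceneColor = 0;
};

// Toy deferred queue: commands are recorded as closures and
// touch the target only when Flush() is called.
class FToyDeferredQueue {
public:
    void Enqueue(std::function<void(FToyTarget&)> Command) {
        Commands.push_back(std::move(Command));
    }

    // Executes all recorded commands in order, then empties the queue.
    void Flush(FToyTarget& Target) {
        for (const auto& Command : Commands) {
            Command(Target);
        }
        Commands.clear();
    }

private:
    std::vector<std::function<void(FToyTarget&)>> Commands;
};

// Records a "draw" and then reads the target immediately.
// Returns the value the immediate copy observed.
int CopySceneColor(bool bFlushFirst) {
    FToyTarget Target;
    FToyDeferredQueue Queue;

    // "Draw objects A": recorded only, not yet executed.
    Queue.Enqueue([](FToyTarget& T) { T.SceneColor = 42; });

    if (bFlushFirst) {
        Queue.Flush(Target);  // make the draw visible before copying
    }

    const int Copy = Target.SceneColor;  // the immediate "copy"
    Queue.Flush(Target);                 // leftover work still runs later
    return Copy;
}
```

In this model `CopySceneColor(false)` observes the cleared value and `CopySceneColor(true)` observes the drawn value. Whether the real flush behaves for the same reason is an assumption on my part, not something the engine source confirms here.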

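Another approach is to put the whole sequence of DrawVisibleParallel/DrawDynamicMesh and CopyToResolveTarget calls into a single task. Execution inside a task is sequential, so no synchronization is needed, but with only one task there is no concurrency on GPUs that support parallel command submission. Also, each DrawVisibleParallel creates its own task, so all of these operations would have to be merged into one task. I have not tried this.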

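If CopyToResolveTarget is put into a separate task and run asynchronously, the result is also wrong. Although these tasks are not concurrent outside DX12/Vulkan/Metal and run one after another, the order in which they run is undefined.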

Whether concurrency plus synchronization costs more than a single task depends on the number of draw calls; it needs profiling to decide.
