Stratkit Fog Of War
The Fog of War module introduces the concept of Fog of War (FoW) and FoW removers. FoW covers the whole map to simulate limited visibility for the player.
The main Fog of War prefab is placed in the terrain scene. The size and scaling values of its plane matter, since they directly affect FoW in the game.
On the technical side, there are two types of removers: mesh removers and radius removers.
Mesh removers represent static objects (usually provinces) that reveal a map area based on their mesh. Every mesh FoW remover has to be tagged with MeshFogOfWarRemoverTag and contain a buffer of FogOfWarTriangles, which represent the mesh vertex positions in world space.
Radius removers are dynamic objects (usually armies) that reveal an area based on their position and radius. They don't need anything in the buffer, since the revealed area is computed around their position.
The FoW concept and FogOfWarShaderRefreshSystem are independent of provinces and armies; they only work with removers. This means any object on the map can carry one of the remover components and it should work. Other systems in the module handle adding and removing these remover components.
The main system (FogOfWarShaderRefreshSystem) runs only if FogOfWarDoRefreshTag is attached to the main Fog of War entity. Currently only one FoW entity is supported; all the components it needs are attached by the FogOfWarAuthoring script on the FoW plane prefab.
For questions about the internal structure of the compute shader, please contact Aleksandra Rutkowska.
The actual drawing can be described as remapping and replicating all excluded (revealed) FoW areas into render textures and applying them to the compute shader.
Fog of War design
- For each army on screen, we need to render two circles: an inner circle with full alpha and an outer circle with partial alpha.
- The circles can overlap, but their alpha shouldn't be additive.
- The circles follow the surface of whatever they touch, such as terrain, enemy armies, buildings, props, etc.
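The "overlap but not additive" rule can be illustrated with a minimal per-pixel sketch. It assumes flat 2D coordinates (it does not model the circles following the surface), an outer-circle alpha of 0.5 chosen only for illustration, and combines armies with `max()` instead of a sum, which is one simple way to keep overlaps non-additive.

```python
import math

# Hypothetical values: the design only says "full alpha" for the inner circle
# and "some alpha" for the outer one.
INNER_ALPHA = 1.0
OUTER_ALPHA = 0.5

def circle_alpha(px, py, cx, cy, inner_r, outer_r):
    """Reveal alpha that a single army contributes at pixel (px, py)."""
    d = math.hypot(px - cx, py - cy)
    if d <= inner_r:
        return INNER_ALPHA
    if d <= outer_r:
        return OUTER_ALPHA
    return 0.0

def reveal_alpha(px, py, armies):
    """Combine overlapping circles with max(), not a sum, so overlaps never stack.

    armies: list of (cx, cy, inner_radius, outer_radius).
    """
    return max((circle_alpha(px, py, *army) for army in armies), default=0.0)
```

For two armies whose outer circles overlap at a pixel, that pixel stays at the outer alpha (0.5) instead of summing to 1.0.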
Previous Implementation
```mermaid
flowchart LR
GatherArmies{Gather Army} --> ArmyArray
ArmyArray --> PixelShader{Pixel Shader}
ScreenWidth --> PixelShader
ScreenHeight --> PixelShader
```
The total cost is Army_Count x Screen_Width x Screen_Height. This does not scale to the late game.
- Profiling shows an 18% GPU cost in the early game on a Mac build
Binary Space Partitioning
GPU cost estimation for late game
- Army count is 200
- Screen width is 2000 (2556 for iPhone 15)
- Screen height is 1000 (1179 for iPhone 15)
- Total cost = 200 x 2000 x 1000 = 400 million
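The estimate above is a plain product, and the same formula gives the figure at the iPhone 15 resolution that the list mentions but does not compute. The function name is illustrative only.

```python
def naive_cost(army_count, screen_width, screen_height):
    """Per-frame work of the previous implementation: every pixel tests every army."""
    return army_count * screen_width * screen_height

late_game = naive_cost(200, 2000, 1000)   # desktop estimate from the list above
iphone_15 = naive_cost(200, 2556, 1179)   # same army count at iPhone 15 resolution
print(f"{late_game:,}")  # 400,000,000
print(f"{iphone_15:,}")  # 602,704,800
```

At iPhone 15 resolution, the late-game cost is roughly 603 million army-pixel tests per frame.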
Algorithm reasoning
- The screen size (in pixels) is significantly larger than the army count
- The screen should be divided into smaller parts
- Each part should be covered by only a small number of armies
Algorithm cost estimation
- The cost of 1-step BSP:
  - Army count: 100
  - Screen size: 1000x1000
  - Cost: (100 x 1000 x 1000) x 2 = 200 million
- The cost of 2-step BSP:
  - Army count: 50
  - Screen size: 1000x500
  - Cost: (50 x 1000 x 500) x 4 = 100 million
- The cost of 3-step BSP:
  - Cost: (25 x 500 x 500) x 8 = 50 million
- The cost is halved with every division step, so it drops exponentially with the number of steps (assuming the armies split evenly between the halves)
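The per-step estimates above follow one formula. This idealized sketch assumes, as the estimates do, that every split halves both the army count and the pixel count of a quad, so after k steps there are 2^k quads and the total is N x W x H / 2^k.

```python
def bsp_cost(army_count, width, height, steps):
    """Idealized cost after `steps` halvings, assuming no army straddles a split."""
    quads = 2 ** steps
    armies_per_quad = army_count / quads   # armies split evenly between halves
    pixels_per_quad = width * height / quads
    return armies_per_quad * pixels_per_quad * quads

for k in range(4):
    print(k, f"{bsp_cost(200, 2000, 1000, k):,.0f}")
# 0 400,000,000
# 1 200,000,000
# 2 100,000,000
# 3 50,000,000
```

In practice, an army whose circle crosses a split is counted in both halves, so the real cost stays above this curve.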
Algorithm steps
```mermaid
flowchart LR
GatherArmies{Gather Army} --> ArmyArray
ScreenWidth --> ArmyQuad
ScreenHeight --> ArmyQuad
ArmyArray --> ArmyQuad
ArmyQuad --> BSP{BSP}
BSP --> IsArmyLarge{Is Army Count\nlarger than X}
IsArmyLarge -- no --> AddQuadToList{Add to\nquad list}
IsArmyLarge -- yes --> DivideByHalf{Divide\nby half}
DivideByHalf --> ArmyQuadA
DivideByHalf --> ArmyQuadB
ArmyQuadA --> BSP
ArmyQuadB --> BSP
AddQuadToList --> QuadList
QuadList --> PixelShader{Pixel Shader}
```
```mermaid
flowchart LR
subgraph DivideByHalf
  DivisionPlane
  subgraph ForEachArmy
    Army -- check --> DivisionPlane
    DivisionPlane -- positive --> QuadA
    DivisionPlane -- negative --> QuadB
    DivisionPlane -- middle --> BothQuads
  end
end
```
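The two flowcharts can be read as the following recursive sketch. It works in 2D screen space with axis-aligned quads and treats each army as a view circle `(cx, cy, radius)`. `MAX_ARMIES` stands in for the unspecified "X". Splitting along the longer axis is an assumption about what "divide by half" means. `MAX_DEPTH` is a hypothetical safety limit, added because armies that straddle every split would otherwise recurse forever.

```python
from dataclasses import dataclass, field

MAX_ARMIES = 4   # hypothetical value for "X" in the flowchart
MAX_DEPTH = 16   # hypothetical guard: straddling armies never separate

@dataclass
class Quad:
    x0: float
    y0: float
    x1: float
    y1: float
    armies: list = field(default_factory=list)  # view circles (cx, cy, radius)

def divide_by_half(quad):
    """Split along the longer axis; QuadA is the positive side, QuadB the negative."""
    if quad.x1 - quad.x0 >= quad.y1 - quad.y0:
        axis, mid = 0, (quad.x0 + quad.x1) / 2
        quad_a = Quad(mid, quad.y0, quad.x1, quad.y1)
        quad_b = Quad(quad.x0, quad.y0, mid, quad.y1)
    else:
        axis, mid = 1, (quad.y0 + quad.y1) / 2
        quad_a = Quad(quad.x0, mid, quad.x1, quad.y1)
        quad_b = Quad(quad.x0, quad.y0, quad.x1, mid)
    for army in quad.armies:
        signed_distance, radius = army[axis] - mid, army[2]
        if signed_distance >= radius:        # positive -> QuadA
            quad_a.armies.append(army)
        elif signed_distance <= -radius:     # negative -> QuadB
            quad_b.armies.append(army)
        else:                                # middle -> both quads
            quad_a.armies.append(army)
            quad_b.armies.append(army)
    return quad_a, quad_b

def bsp(quad, result, depth=0):
    """Recursive BSP following the "Algorithm steps" flowchart."""
    if len(quad.armies) <= MAX_ARMIES or depth >= MAX_DEPTH:
        result.append(quad)                  # "Add to quad list"
        return
    quad_a, quad_b = divide_by_half(quad)
    bsp(quad_a, result, depth + 1)
    bsp(quad_b, result, depth + 1)
```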
Burstify and Jobify BSP
Challenges
- Burst-compiled code does not support recursion
- Nested containers are not allowed in jobs
- 3D geometric math
3D Geometric Math
- Check which side of the split each view circle is on
- Check the distance from the camera ray to the circle center against the split plane
- DOTS-enabled types were added:
  - DotsCamera
  - DotsRay
  - DotsPlane
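The document does not show the DotsCamera, DotsRay, or DotsPlane APIs, so this sketch uses plain tuples. It assumes the split plane passes through the camera and contains the camera rays through both ends of the screen-space split line. It also treats each view circle as a sphere of the same radius. Given those assumptions, the side test reduces to a signed distance compared against the radius, which produces the positive, negative, and middle cases from the flowchart.

```python
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def split_plane(camera_pos, ray_dir_a, ray_dir_b):
    """Plane through the camera containing both edge rays of the split line.

    Returned as (unit normal n, offset d); point p is on the plane when
    dot(n, p) + d == 0.
    """
    n = cross(ray_dir_a, ray_dir_b)
    length = math.sqrt(dot(n, n))
    n = (n[0] / length, n[1] / length, n[2] / length)
    return n, -dot(n, camera_pos)

def classify(plane, center, radius):
    """Which side of the split a view circle (treated as a sphere) lies on."""
    n, d = plane
    distance = dot(n, center) + d
    if distance >= radius:
        return "positive"
    if distance <= -radius:
        return "negative"
    return "middle"
```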
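Remove recursion
- Luckily, the BSP is tail-recursive, so the recursion can be replaced with a loop over two quad lists that are swapped after every pass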
```mermaid
flowchart LR
End{End}
ResultList
subgraph Loop
  IsInputEmpty{Is Input\nEmpty} -- yes --> End
  IsInputEmpty -- no --> InputQuadList
  InputQuadList --> BSP{BSP}
  BSP -- divide --> OutputQuadList
  BSP -- addTo --> ResultList
end
subgraph Swap
  InputQuadList2[InputQuadList]
  OutputQuadList2[OutputQuadList]
  InputQuadList2 --> OutputQuadList2
  OutputQuadList2 --> InputQuadList2
end
Loop --> Swap
Swap --> Loop
class InputQuadList,OutputQuadList,InputQuadList2,OutputQuadList2 important;
classDef important fill:red
```
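The loop-and-swap structure can be sketched generically, with the split test and the split itself passed in as functions. The callback names and the 1D example (intervals holding points) are hypothetical and keep the sketch short. The control flow mirrors the diagram: process the whole input list, then swap it with the output list, and stop when the input list is empty.

```python
def iterative_bsp(root, needs_split, divide):
    """Tail-recursive BSP rewritten as a loop over two swapped quad lists."""
    result = []
    input_quads = [root]
    output_quads = []
    while input_quads:                              # "Is Input Empty?"
        for quad in input_quads:
            if needs_split(quad):
                output_quads.extend(divide(quad))   # "divide"
            else:
                result.append(quad)                 # "addTo" ResultList
        # Swap: this pass's output becomes the next pass's input.
        input_quads, output_quads = output_quads, input_quads
        output_quads.clear()
    return result

# Example: 1D "quads" (lo, hi, points), split until each holds at most 2 points.
def needs_split(quad):
    return len(quad[2]) > 2

def divide(quad):
    lo, hi, points = quad
    mid = (lo + hi) / 2
    return ((lo, mid, [p for p in points if p < mid]),
            (mid, hi, [p for p in points if p >= mid]))

leaves = iterative_bsp((0, 8, [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5]),
                       needs_split, divide)
print([(lo, hi) for lo, hi, _ in leaves])
# [(0, 2.0), (2.0, 4.0), (4.0, 6.0), (6.0, 8)]
```

Each pass handles one BSP depth level. That is also what makes the next step easy: all quads in one pass are independent of each other.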
Parallelization
```mermaid
flowchart LR
subgraph Input
  InputQuadList
  Index
end
subgraph Output
  OutputQuadList
  ResultQuadList
end
OutParallelWriter --> OutputQuadList
ResultParallelWriter --> ResultQuadList
subgraph IJobParallelFor
  InputQuadList --> InputQuad
  Index --> InputQuad
  InputQuad --> BSP
  BSP -- divide --> OutParallelWriter
  BSP -- addTo --> ResultParallelWriter
end
Schedule --> IJobParallelFor --> Swap
Swap --> Schedule
```
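This is a Python analogy of the data flow above, not Unity code. Python threads are not Burst jobs, and the `ParallelWriter` class here is a hypothetical lock-based stand-in rather than Unity's type. Each call to `execute(index)` reads one input quad by index and appends either its halves or the quad itself through a thread-safe writer. Every pass is scheduled and completed before the lists are swapped.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class ParallelWriter:
    """Hypothetical stand-in: appends from many threads, order not guaranteed."""
    def __init__(self, target):
        self._target = target
        self._lock = threading.Lock()

    def add(self, item):
        with self._lock:
            self._target.append(item)

def parallel_bsp(root, needs_split, divide, workers=4):
    result = []
    input_quads = [root]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while input_quads:
            output_quads = []
            out_writer = ParallelWriter(output_quads)
            result_writer = ParallelWriter(result)

            def execute(index):  # body of the IJobParallelFor-style job
                quad = input_quads[index]
                if needs_split(quad):
                    for half in divide(quad):
                        out_writer.add(half)     # "divide"
                else:
                    result_writer.add(quad)      # "addTo"

            # "Schedule" and wait for every index before swapping.
            list(pool.map(execute, range(len(input_quads))))
            input_quads = output_quads           # Swap
    return result
```

Because appends from different threads interleave, the result order is not deterministic; only the set of leaf quads is.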
Working around nested containers
- Unfortunately, QuadList is a nested container: each quad holds its own ArmyList
- Nested containers can't be used in jobs
- A nested container can be flattened using a slice: each quad stores a StartIndex and Length into one shared ArmyList
```mermaid
flowchart LR
subgraph QuadList
  Viewport
  ArmyList
end
subgraph QuadList2[QuadList]
  Viewport2[Viewport]
  subgraph ArmySlice
    StartIndex
    Length
  end
end
QuadList -- refactor --> QuadList2
ArmySlice --> ArmyList2
ArmyList2[ArmyList]
```
- A slice doesn't support ParallelWriter
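Flattening can be sketched as follows. Every quad keeps its viewport plus a `start_index`/`length` pair into a single shared army list, so the layout contains no container inside a container. The names are hypothetical. An army that straddles a split is simply stored once per quad that contains it. This layout also suggests why ParallelWriter does not fit: a slice needs a contiguous, known range, while a ParallelWriter appends in arbitrary order.

```python
from dataclasses import dataclass

@dataclass
class FlatQuad:
    viewport: tuple   # (x0, y0, x1, y1)
    start_index: int  # first army of this quad in the shared army list
    length: int       # number of armies belonging to this quad

def flatten(nested):
    """nested: list of (viewport, [armies]) -> (list of FlatQuad, shared army list)."""
    quads, armies = [], []
    for viewport, quad_armies in nested:
        quads.append(FlatQuad(viewport, len(armies), len(quad_armies)))
        armies.extend(quad_armies)
    return quads, armies

def armies_of(quad, armies):
    """Equivalent of taking the quad's slice of the shared army list."""
    return armies[quad.start_index:quad.start_index + quad.length]
```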
Parallel write
- Use the NativeDisableParallelForRestriction attribute to allow parallel writes to a native container
- Each job must write to a different index of the container
```mermaid
flowchart LR
subgraph RaceCondition
  JobX -- write --> IndexX
  JobY -- write --> IndexX
  IndexY
end
subgraph ShouldDo
  JobA -- write --> IndexA
  JobB -- write --> IndexB
end
style RaceCondition fill:red
```
- A race condition:
  - Won't give compile errors
  - Won't crash
  - Can't be reliably debugged
  - Can only be avoided by theoretically checking every possible execution of the code
  - Is easier to manifest on larger data sets
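One way to make sure that different jobs write to different indices is to pre-size the output and give every job index fixed slots of its own. In this sketch, job `index` writes only slots `2*index` and `2*index + 1`. That slot assignment is an assumption for illustration, and Python threads only mimic the job system. Because no two jobs share a slot, this is the "ShouldDo" case from the diagram.

```python
from concurrent.futures import ThreadPoolExecutor

def split_interval(interval):
    lo, hi = interval
    mid = (lo + hi) / 2
    return (lo, mid), (mid, hi)

def parallel_split(intervals, workers=4):
    # Pre-sized output: job `index` owns exactly slots 2*index and 2*index + 1,
    # so no two jobs ever write the same slot.
    output = [None] * (2 * len(intervals))

    def execute(index):
        first, second = split_interval(intervals[index])
        output[2 * index] = first
        output[2 * index + 1] = second

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(execute, range(len(intervals))))
    return output
```

Unlike the ParallelWriter approach, the output order here is deterministic, because each position is determined by the job index.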
Parallel read and write
- Use an additional, scary attribute: NativeDisableContainerSafetyRestriction
- A race condition between reads and writes is possible because threads run at different speeds:
  - Each thread has a different workload
  - Each thread could be executed on a different type of physical core (e.g. performance vs. efficiency cores)
Further improvements
Early exit
- Once a certain recursion depth is reached, further splits give diminishing returns
- Exit early even if the number of armies is still large
- Use a shader variant that accepts a larger number of armies
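Early exit changes only the leaf test. In this sketch, both limits are hypothetical values. A quad stops splitting when it fits the standard shader, or when the depth limit is reached even though it still holds many armies. Quads in the second case are then drawn with the larger-capacity shader variant.

```python
MAX_DEPTH = 6            # hypothetical depth beyond which splitting stops paying off
MAX_ARMIES_PER_QUAD = 8  # hypothetical army capacity of the standard shader

def should_stop(depth, army_count):
    """Leaf test for a BSP with early exit."""
    return army_count <= MAX_ARMIES_PER_QUAD or depth >= MAX_DEPTH
```

The depth limit also guarantees termination when many armies overlap the same spot and straddle every split.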
Empty quad
- When a quad's army count is zero
- Use a shader that simply draws the fog pixels, without any per-army work
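Together with early exit, this results in a per-quad choice of shader. The variant names and capacities below are hypothetical. Empty quads get a plain fog fill, normal quads get the standard shader, and quads that exited early with many armies get the large variant.

```python
STANDARD_MAX_ARMIES = 8  # hypothetical capacity of the standard shader

def pick_shader(army_count):
    """Choose a (hypothetical) shader variant for one leaf quad."""
    if army_count == 0:
        return "fog_fill"   # no armies: draw fog, skip the per-army loop entirely
    if army_count <= STANDARD_MAX_ARMIES:
        return "standard"
    return "large"          # early-exit quads that still hold many armies
```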
Quads with a high unit count
- These quads are usually visually small
- Sort armies by distance to the quad center
- Discard the armies beyond the shader's maximum army count
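A minimal sketch of sorting and discarding, assuming armies are `(x, y, radius)` tuples in the same space as the quad center. Sorting by squared distance gives the same order as sorting by true distance and avoids a square root per army.

```python
def limit_armies(quad_center, armies, max_armies):
    """Keep the max_armies armies closest to the quad center; drop the rest."""
    cx, cy = quad_center
    by_distance = sorted(armies, key=lambda a: (a[0] - cx) ** 2 + (a[1] - cy) ** 2)
    return by_distance[:max_armies]
```

Because the quad is visually small, the armies discarded this way are the ones farthest from it and the least likely to affect its pixels.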
Move allocations out of the critical code path
- Allocator.TempJob is required by Jobs
- Allocator.TempJob is expensive
- We could use Allocator.Persistent instead
  - Even though it is more expensive, it is allocated only once
- Allocate with the largest supported size
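This is the Python analogue of allocating once with the largest supported size and reusing the buffer every frame. The class and the overflow policy (raise an error) are assumptions. A Unity version would hold a container allocated with Allocator.Persistent instead of a Python list.

```python
class PreallocatedQuadBuffer:
    """Allocated once with the largest supported size and reused every frame."""

    def __init__(self, capacity):
        self._items = [None] * capacity   # the only allocation
        self.count = 0

    def clear(self):
        self.count = 0                    # per-frame reset, no reallocation

    def add(self, item):
        if self.count >= len(self._items):
            raise OverflowError("exceeded the largest supported size")
        self._items[self.count] = item
        self.count += 1

    def items(self):
        return self._items[:self.count]
```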
Complete coverage
- The result quad list could be filtered
- If a quad in the list is entirely covered by an army's view circle, we could discard it and render nothing for it!
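This sketch assumes that "complete coverage" means one army's full-alpha inner circle contains the whole quad, so no fog remains there. Because a disc is convex, it contains an axis-aligned rectangle exactly when it contains all four corners. Quads and circles use hypothetical tuple layouts.

```python
import math

def circle_covers_quad(circle, quad):
    """True if circle (cx, cy, r) contains the whole quad (x0, y0, x1, y1)."""
    cx, cy, r = circle
    x0, y0, x1, y1 = quad
    corners = ((x0, y0), (x1, y0), (x0, y1), (x1, y1))
    return all(math.hypot(x - cx, y - cy) <= r for x, y in corners)

def filter_covered(quads_with_armies):
    """Drop quads fully revealed by any single inner circle: nothing to draw there."""
    return [(quad, armies) for quad, armies in quads_with_armies
            if not any(circle_covers_quad(a, quad) for a in armies)]
```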
Filter out armies outside the screen
- Check each army circle against the main viewport
- This helps when zooming in
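A minimal circle-versus-viewport check, assuming the army circles have already been projected into 2D screen space. The document's own BSP works with 3D camera rays, so a real version might test against the camera frustum instead. The test clamps the circle center into the rectangle and compares the squared distance to that nearest point with the squared radius.

```python
def circle_intersects_viewport(circle, viewport):
    """Closest-point test between circle (cx, cy, r) and rectangle (x0, y0, x1, y1)."""
    cx, cy, r = circle
    x0, y0, x1, y1 = viewport
    nearest_x = min(max(cx, x0), x1)
    nearest_y = min(max(cy, y0), y1)
    dx, dy = cx - nearest_x, cy - nearest_y
    return dx * dx + dy * dy <= r * r

def visible_armies(armies, viewport):
    """Drop armies whose view circle cannot touch the main viewport."""
    return [a for a in armies if circle_intersects_viewport(a, viewport)]
```

When zoomed in, most armies fall outside the viewport, so the BSP starts from a much shorter army list.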
Entity Graphics to batch draw commands
- TO BE CONTINUED