Update samples for DirectStorage 1.1
Added GpuDecompressionBenchmark, to measure performance and demonstrate the effects of
varying staging buffer sizes.

BulkLoadDemo (replaces the MiniEngine/ModelViewer sample) demonstrates loading
assets using GPU decompression.
damyanp committed Nov 7, 2022
1 parent 71f940a commit 714d15a
Showing 765 changed files with 6,377 additions and 36,157 deletions.
166 changes: 166 additions & 0 deletions .clang-format
@@ -0,0 +1,166 @@
---
Language: Cpp
# BasedOnStyle: Microsoft
AccessModifierOffset: -2
AlignAfterOpenBracket: Align
AlignConsecutiveMacros: None
AlignConsecutiveAssignments: None
AlignConsecutiveBitFields: None
AlignConsecutiveDeclarations: None
AlignEscapedNewlines: Right
AlignOperands: Align
AlignTrailingComments: true
AllowAllArgumentsOnNextLine: true
AllowAllConstructorInitializersOnNextLine: true
AllowAllParametersOfDeclarationOnNextLine: true
AllowShortEnumsOnASingleLine: false
AllowShortBlocksOnASingleLine: Never
AllowShortCaseLabelsOnASingleLine: false
AllowShortFunctionsOnASingleLine: None
AllowShortLambdasOnASingleLine: All
AllowShortIfStatementsOnASingleLine: Never
AllowShortLoopsOnASingleLine: false
AlwaysBreakAfterDefinitionReturnType: None
AlwaysBreakAfterReturnType: None
AlwaysBreakBeforeMultilineStrings: false
AlwaysBreakTemplateDeclarations: MultiLine
AttributeMacros:
- __capability
BinPackArguments: true
BinPackParameters: true
BraceWrapping:
AfterCaseLabel: false
AfterClass: true
AfterControlStatement: Always
AfterEnum: true
AfterFunction: true
AfterNamespace: true
AfterObjCDeclaration: true
AfterStruct: true
AfterUnion: false
AfterExternBlock: true
BeforeCatch: true
BeforeElse: true
BeforeLambdaBody: false
BeforeWhile: false
IndentBraces: false
SplitEmptyFunction: true
SplitEmptyRecord: true
SplitEmptyNamespace: true
BreakBeforeBinaryOperators: None
BreakBeforeConceptDeclarations: true
BreakBeforeBraces: Custom
BreakBeforeInheritanceComma: false
BreakInheritanceList: BeforeColon
BreakBeforeTernaryOperators: true
BreakConstructorInitializersBeforeComma: false
BreakConstructorInitializers: BeforeColon
BreakAfterJavaFieldAnnotations: false
BreakStringLiterals: true
ColumnLimit: 120
CommentPragmas: '^ IWYU pragma:'
CompactNamespaces: false
ConstructorInitializerAllOnOneLineOrOnePerLine: false
ConstructorInitializerIndentWidth: 4
ContinuationIndentWidth: 4
Cpp11BracedListStyle: true
DeriveLineEnding: true
DerivePointerAlignment: false
DisableFormat: false
EmptyLineBeforeAccessModifier: LogicalBlock
ExperimentalAutoDetectBinPacking: false
FixNamespaceComments: true
ForEachMacros:
- foreach
- Q_FOREACH
- BOOST_FOREACH
StatementAttributeLikeMacros:
- Q_EMIT
IncludeBlocks: Preserve
IncludeCategories:
- Regex: '^"(llvm|llvm-c|clang|clang-c)/'
Priority: 2
SortPriority: 0
CaseSensitive: false
- Regex: '^(<|"(gtest|gmock|isl|json)/)'
Priority: 3
SortPriority: 0
CaseSensitive: false
- Regex: '.*'
Priority: 1
SortPriority: 0
CaseSensitive: false
IncludeIsMainRegex: '(Test)?$'
IncludeIsMainSourceRegex: ''
IndentCaseLabels: false
IndentCaseBlocks: false
IndentGotoLabels: true
IndentPPDirectives: None
IndentExternBlock: AfterExternBlock
IndentRequires: false
IndentWidth: 4
IndentWrappedFunctionNames: false
InsertTrailingCommas: None
JavaScriptQuotes: Leave
JavaScriptWrapImports: true
KeepEmptyLinesAtTheStartOfBlocks: true
MacroBlockBegin: ''
MacroBlockEnd: ''
MaxEmptyLinesToKeep: 1
NamespaceIndentation: None
ObjCBinPackProtocolList: Auto
ObjCBlockIndentWidth: 2
ObjCBreakBeforeNestedBlockParam: true
ObjCSpaceAfterProperty: false
ObjCSpaceBeforeProtocolList: true
PenaltyBreakAssignment: 2
PenaltyBreakBeforeFirstCallParameter: 19
PenaltyBreakComment: 300
PenaltyBreakFirstLessLess: 120
PenaltyBreakString: 1000
PenaltyBreakTemplateDeclaration: 10
PenaltyExcessCharacter: 1000000
PenaltyReturnTypeOnItsOwnLine: 1000
PenaltyIndentedWhitespace: 0
PointerAlignment: Right
ReflowComments: true
SortIncludes: true
SortJavaStaticImport: Before
SortUsingDeclarations: true
SpaceAfterCStyleCast: false
SpaceAfterLogicalNot: false
SpaceAfterTemplateKeyword: true
SpaceBeforeAssignmentOperators: true
SpaceBeforeCaseColon: false
SpaceBeforeCpp11BracedList: false
SpaceBeforeCtorInitializerColon: true
SpaceBeforeInheritanceColon: true
SpaceBeforeParens: ControlStatements
SpaceAroundPointerQualifiers: Default
SpaceBeforeRangeBasedForLoopColon: true
SpaceInEmptyBlock: false
SpaceInEmptyParentheses: false
SpacesBeforeTrailingComments: 1
SpacesInAngles: false
SpacesInConditionalStatement: false
SpacesInContainerLiterals: true
SpacesInCStyleCastParentheses: false
SpacesInParentheses: false
SpacesInSquareBrackets: false
SpaceBeforeSquareBrackets: false
BitFieldColonSpacing: Both
Standard: Latest
StatementMacros:
- Q_UNUSED
- QT_REQUIRE_VERSION
TabWidth: 4
UseCRLF: false
UseTab: Never
WhitespaceSensitiveMacros:
- STRINGIZE
- PP_STRINGIZE
- BOOST_PP_STRINGIZE
- NS_SWIFT_NAME
- CF_SWIFT_NAME
...

2 changes: 1 addition & 1 deletion .gitattributes
@@ -10,4 +10,4 @@
*.sln eol=crlf

# Explicitly declare resource files as binary
*.bin binary
*.bin binary
175 changes: 165 additions & 10 deletions Docs/DeveloperGuidance.md
@@ -8,14 +8,45 @@ As with D3D12, the DirectStorage API is a "nano-COM" API. The following interfa
* IDStorageQueue - all operations are enqueued on a queue
* IDStorageStatusArray - an object that can store results of operations
* IDStorageCustomDecompressionQueue - if the game uses custom CPU decompression, then this is how DirectStorage communicates with the game.
* IDStorageCompressionCodec - an object that can be used to compress/decompress buffers using built-in compression formats.

### Creating instances of DirectStorage interfaces

DStorageGetFactory() can be used to return:
* IDStorageFactory
* IDStorageCustomDecompressionQueue / IDStorageCustomDecompressionQueue1

DStorageCreateCompressionCodec() can be used to return:
* IDStorageCompressionCodec

Example:

```cpp
IDStorageFactory* factory;
HRESULT hr = S_OK;
hr = DStorageGetFactory(IID_PPV_ARGS(&factory));

IDStorageCompressionCodec* compression;
constexpr uint32_t DEFAULT_THREAD_COUNT = 0;
hr = DStorageCreateCompressionCodec(DSTORAGE_COMPRESSION_FORMAT_GDEFLATE, DEFAULT_THREAD_COUNT, IID_PPV_ARGS(&compression));
```
Using cppwinrt:
```cpp
com_ptr<IDStorageFactory> factory;
check_hresult(DStorageGetFactory(IID_PPV_ARGS(factory.put())));

com_ptr<IDStorageCompressionCodec> compression;
constexpr uint32_t DEFAULT_THREAD_COUNT = 0;
check_hresult(DStorageCreateCompressionCodec(DSTORAGE_COMPRESSION_FORMAT_GDEFLATE, DEFAULT_THREAD_COUNT, IID_PPV_ARGS(compression.put())));
```
An IDStorageFactory instance can be retrieved using the DStorageGetFactory() function.
## Queues and Operations
An application enqueues operations onto an IDStorageQueue. There are three types of operations:
* "Request" - load some data from a file
An application enqueues operations onto an IDStorageQueue. There are four types of operations:
* "Request" - load some data from a file or memory
* "Signal" - signal an ID3D12Fence
* "Status" - record the status of the last batch
* "SetEvent" - sets an event object to a signaled state
Each enqueued operation takes up one slot in the queue, with the number of slots being specified at creation time. This slot remains in use until the operation has completed. If Enqueue is called when there are no free slots, it blocks until a slot becomes available.
@@ -48,16 +79,100 @@ As for texture regions, the data is expected to be laid out as described by GetC
This destination type is for populating a region of tiles in a tiled resource. This takes a resource and a region (in the form of a start coordinate and size). The data is expected to be arranged as suitable for passing to CopyTiles().
## Compressed Assets
The data that is read for each request must be possible to decompress in its entirety. This means that it is not, in general, possible to apply a compression algorithm over a complete file, but instead each section of the file must be compressed in isolation. See the [MiniEngine (with DirectStorage support)](../Samples/MiniEngine/DirectStorage/README.md) sample for details of how a file containing data to be read by multiple requests might be arranged.

## Staging Buffers and Copying

![Staging Buffer Diagram](stagingbufferdiagram.png)
The data that is read for each request must be possible to decompress in its entirety. This means that it is not, in general, possible to apply a compression algorithm over a complete file, but instead each section of the file must be compressed in isolation. See the [Bulk Load Demo](../Samples/BulkLoadDemo/README.md) sample for details of how a file containing data to be read by multiple requests might be arranged.
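One way to picture the per-section requirement above is a small table that records, for every independently compressed section, where it starts and how large it is before and after decompression. The sketch below is only an illustration: `ChunkEntry`, `ChunkSizes`, and `BuildChunkTable` are hypothetical names, and this is not the file format used by the Bulk Load Demo.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical table entry: one per independently compressed section.
struct ChunkEntry
{
    uint64_t fileOffset;       // where the compressed bytes start in the file
    uint32_t compressedSize;   // bytes to read from the file for this section
    uint32_t uncompressedSize; // bytes produced once the section is decompressed
};

// Hypothetical input: the sizes reported by the compressor for one section.
struct ChunkSizes
{
    uint32_t compressedSize;
    uint32_t uncompressedSize;
};

// Packs the sections back to back after a header of headerSize bytes and
// returns the table a loader could use to issue one request per section.
inline std::vector<ChunkEntry> BuildChunkTable(uint64_t headerSize, const std::vector<ChunkSizes>& chunks)
{
    std::vector<ChunkEntry> table;
    table.reserve(chunks.size());

    uint64_t offset = headerSize;
    for (const ChunkSizes& chunk : chunks)
    {
        table.push_back({offset, chunk.compressedSize, chunk.uncompressedSize});
        offset += chunk.compressedSize; // sections are stored without padding
    }
    return table;
}
```

Because each entry describes a section that was compressed on its own, any single entry can be loaded and decompressed without touching its neighbours.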
## Uncompressed Data Flow
![Staging Buffer Diagram](uncompressed.png)
DirectStorage maintains two staging buffers - one system memory buffer, and another in an upload heap. The above diagram shows the possible ways the data for a request can flow between staging buffers.
DirectStorage requires that the uncompressed data for a request can fit into a staging buffer. Requests that need to go through a staging buffer with an uncompressed size that is larger than the staging buffer size will fail. The staging buffer can be resized with the SetStagingBufferSize() method on the factory.
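The size rule above can be checked at content-build time. The following is a minimal sketch, assuming the uncompressed size of every request is known up front; `FitsInStagingBuffer` and `MinimumStagingBufferSize` are hypothetical helpers, not part of the DirectStorage API, and the result is only a lower bound that avoids failed requests, not a tuned value.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

constexpr uint32_t kMiB = 1024u * 1024u;

// True when a request with this uncompressed size can pass through a staging
// buffer of the given size.
inline bool FitsInStagingBuffer(uint64_t uncompressedSize, uint32_t stagingBufferSize)
{
    return uncompressedSize <= stagingBufferSize;
}

// Smallest whole-MiB size that holds the largest request, never below
// minimumSize. Returned as 64-bit so the rounding cannot overflow; the caller
// still has to confirm the value fits the 32-bit size the factory accepts.
inline uint64_t MinimumStagingBufferSize(const std::vector<uint32_t>& uncompressedSizes, uint32_t minimumSize)
{
    uint32_t largest = 0;
    for (uint32_t size : uncompressedSizes)
    {
        largest = std::max(largest, size);
    }

    const uint64_t rounded = (static_cast<uint64_t>(largest) + kMiB - 1) / kMiB * kMiB;
    return std::max<uint64_t>(minimumSize, rounded);
}
```

A build step could run this over all requests in a package and either raise the staging buffer size or split the offending assets into smaller requests.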
## Custom Compressed Data Flow
![Staging Buffer Diagram](customcompression.png)
Custom decompression always reads the compressed data into a system memory staging buffer which will be used as a source buffer by the title's supplied decompression logic. The decompressed result is placed into an upload heap if the destination is VRAM or directly into the caller's buffer for memory destinations.
## Compressed Data Flow
![Staging Buffer Diagram](compression.png)
DirectStorage supports decompressing built-in formats (for example, DSTORAGE_COMPRESSION_FORMAT_GDEFLATE) on the GPU, which frees up the CPU for other tasks.
DirectStorage maintains two additional staging buffers in VRAM to coordinate the GPU decompression workload. Each of these staging buffers is allocated to the size set via IDStorageFactory::SetStagingBufferSize(). The above diagram shows the possible ways the data for a request can flow between staging buffers.
* "Input Staging Buffer" - source buffer filled with compressed data.
* "Output Staging Buffer" - destination buffer for the resulting uncompressed data.
## Choosing a Staging Buffer size
Choosing a good staging buffer size is key to getting the best performance out of DirectStorage. A size that is too small can greatly reduce performance, because requests end up waiting for staging memory to become available before they can be processed. A size that is too large may take away from your application's rendering budget.
Setting a staging buffer size is done by calling:
```cpp
IDStorageFactory::SetStagingBufferSize(UINT32 size)
```

**Important:** Even though dstorage.h contains an enum called **DSTORAGE_STAGING_BUFFER_SIZE**, you are __not__ limited to 32MB! The enum only provides a small set of _common_ sizes.

**Note:** Performance diagrams like the ones shown below can be generated on your system by building and running the [GpuDecompressionBenchmark Sample](../Samples/GpuDecompressionBenchmark/README.md).

The following diagram shows how staging buffer sizes directly impact IO bandwidth.

![Staging Buffer Size vs Bandwidth](../Samples/GpuDecompressionBenchmark/stagingbuffersizevsbandwidth.png)

The following diagram shows how staging buffer sizes directly impact CPU usage.

![Staging Buffer Size vs Process Cycles](../Samples/GpuDecompressionBenchmark/stagingbuffersizevsprocesscycles.png)


## Getting Started Compressing Content for GPU Decompression
DirectStorage provides a compression codec interface, IDStorageCompressionCodec, which is used for general purpose compression/decompression and is obtained by calling DStorageCreateCompressionCodec().

IDStorageCompressionCodec has the following methods (see the dstorage.h header for more details):

```cpp
size_t CompressBufferBound(size_t uncompressedDataSize)

HRESULT CompressBuffer(
const void* uncompressedData,
size_t uncompressedDataSize,
DSTORAGE_COMPRESSION compressionSetting,
void* compressedBuffer,
size_t compressedBufferSize,
size_t* compressedDataSize)

HRESULT DecompressBuffer(
const void* compressedData,
size_t compressedDataSize,
void* uncompressedBuffer,
size_t uncompressedBufferSize,
size_t* uncompressedDataSize)
```
The following snippet is an example of how to compress content using IDStorageCompressionCodec with cppwinrt:
```cpp
com_ptr<IDStorageCompressionCodec> compression;
constexpr uint32_t DEFAULT_THREAD_COUNT = 0;
check_hresult(DStorageCreateCompressionCodec(DSTORAGE_COMPRESSION_FORMAT_GDEFLATE, DEFAULT_THREAD_COUNT, IID_PPV_ARGS(compression.put())));
// Read uncompressed content into a buffer
std::vector<uint8_t> uncompressedContent = ReadUncompressedContent(...);
// Allocate a compressed buffer by calling CompressBufferBound to get a sufficient size
auto bound = compression->CompressBufferBound(uncompressedContent.size());
std::vector<uint8_t> compressedContent;
compressedContent.resize(static_cast<size_t>(bound));
// Compress content
size_t compressedContentSize = 0;
check_hresult(compression->CompressBuffer(uncompressedContent.data(),
uncompressedContent.size(),
DSTORAGE_COMPRESSION_DEFAULT,
compressedContent.data(),
compressedContent.size(),
&compressedContentSize));
```

## Runtime Configuration
The default behavior of DirectStorage aims to provide the best performance on the system it is running on. However, there are cases where games may want to change this behavior.

@@ -69,14 +184,23 @@ A DSTORAGE_CONFIGURATION structure is passed to DStorageSetConfiguration(). The
### NumSubmitThreads
Submitting IO requests can sometimes take a long time. To enable the DirectStorage worker thread to do other work during this time, DirectStorage uses a separate submission thread. By default DirectStorage uses 1 submission thread. When running on Windows 10 it may be desirable to allow DirectStorage to use more submission threads to achieve a higher bandwidth/request count (at the cost of additional CPU time).

### NumBuiltInCpuDecompressionThreads
DirectStorage will always use CPU decompression for DSTORAGE_REQUEST_DESTINATION_MEMORY requests.

This setting specifies the maximum number of threads the runtime will use for built-in CPU decompression. Specifying 0 means to use the system's best guess at a good value.

Specifying DSTORAGE_DISABLE_BUILTIN_CPU_DECOMPRESSION means no decompression threads will be created and the title is fully responsible for performing the decompression. To do this, use IDStorageCustomDecompressionQueue1::GetRequests1() with the DSTORAGE_GET_REQUEST_FLAG_SELECT_BUILTIN or DSTORAGE_GET_REQUEST_FLAG_SELECT_ALL flag.

### ForceMappingLayer and DisableBypassIO
During development it may be useful to force DirectStorage to only make use of the Windows 10 I/O stack. Forcing the mapping layer to be used and toggling off BypassIO can be done to achieve this.
During development it may be useful to force DirectStorage to only make use of the Windows 10 I/O stack. Forcing the mapping layer to be used and toggling off BypassIO can be done to achieve this.

# Best Practices
The recommendations and best practices can be grouped into the following list of Do's and Don'ts when using DirectStorage.

## Do's
The pipeline model along with the use of notifications leads to several Do's for how to use DirectStorage.
* Choose a large enough staging buffer to ensure that you get the optimal IO bandwidth.
* Looking at your individual request sizes and the number of requests can help with choosing a good value.
* Submit as many requests at a time as you can to DirectStorage
* The only limit on the number of requests in flight is the size of the queues the title creates.
* Submit requests in batches
@@ -87,7 +211,9 @@ The pipeline model along with the use of notifications leads to several Do's for
* Size your queues correctly
* There is a significant penalty when a read is enqueued and the queue is full. The Enqueue(Request/Status/Signal) functions suspend the thread until a slot becomes available. The suspension could easily be several milliseconds.
* The recommendation is ~2x your expected maximum number of elements in the queue at a single point in time. This allows enough buffer space to handle possible variations in timing.
* Remember as soon as a read is completed its slot is available for a new request.
* Remember as soon as a request is completed its slot is available for a new request.
* If built-in formats requiring CPU decompression are being decompressed by your own job system, always use GetRequests1 to ensure that these requests get serviced.
* Specifying DSTORAGE_GET_REQUEST_FLAG_SELECT_ALL will allow your system to get both built-in and custom formats in a single call which could be more efficient.
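The ~2x sizing recommendation above can be expressed as a small helper. This is a sketch under stated assumptions: `SuggestQueueCapacity` is a hypothetical function, and the two bounds are assumed to mirror DSTORAGE_MIN_QUEUE_CAPACITY and DSTORAGE_MAX_QUEUE_CAPACITY, so check dstorage.h for the actual values before relying on them.

```cpp
#include <algorithm>
#include <cstdint>

// Assumed to mirror DSTORAGE_MIN_QUEUE_CAPACITY and DSTORAGE_MAX_QUEUE_CAPACITY;
// verify against dstorage.h.
constexpr uint32_t kMinQueueCapacity = 0x80;
constexpr uint32_t kMaxQueueCapacity = 0x2000;

// Doubles the expected peak number of in-flight operations to leave headroom
// for timing variation, then clamps the result to the assumed capacity range.
inline uint32_t SuggestQueueCapacity(uint32_t expectedPeakInFlight)
{
    const uint64_t doubled = 2ull * expectedPeakInFlight; // 64-bit to avoid overflow
    return static_cast<uint32_t>(std::clamp<uint64_t>(doubled, kMinQueueCapacity, kMaxQueueCapacity));
}
```

If the suggested value hits the upper bound, spreading requests across more than one queue is likely a better fit than accepting frequent blocking in Enqueue.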

## Don'ts
Win32 and DirectStorage have slightly different usage cases. Some patterns that were common in Win32 will negatively impact performance in DirectStorage.
@@ -111,4 +237,33 @@ Win32 and DirectStorage have slightly different usage cases. Some patterns that
* Win32 has a restriction that asynchronous requests be aligned on a 4-KiB boundary and be a multiple of 4-KiB in size.
* DirectStorage does not have a 4-KiB alignment or size restriction. This means you don't need to pad your data which just adds extra size to your package and internal buffers.
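To get a feel for what the padding costs, the sketch below compares the size of a package whose entries are each padded to an alignment boundary with the same entries packed tightly. `AlignUp`, `PaddedPackageSize`, and `PackedPackageSize` are hypothetical helpers for illustration only, and the entry sizes in any real package will differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Rounds value up to the next multiple of alignment.
inline uint64_t AlignUp(uint64_t value, uint64_t alignment)
{
    assert(alignment != 0);
    return (value + alignment - 1) / alignment * alignment;
}

// Total size when every entry is padded to the alignment (for example 4096
// to satisfy the Win32 restriction described above).
inline uint64_t PaddedPackageSize(const std::vector<uint64_t>& entrySizes, uint64_t alignment)
{
    uint64_t total = 0;
    for (uint64_t size : entrySizes)
    {
        total += AlignUp(size, alignment);
    }
    return total;
}

// Total size when entries are stored back to back with no padding.
inline uint64_t PackedPackageSize(const std::vector<uint64_t>& entrySizes)
{
    uint64_t total = 0;
    for (uint64_t size : entrySizes)
    {
        total += size;
    }
    return total;
}
```

The difference between the two totals is the space that padding would add to the package, which grows quickly when a package contains many small entries.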

# SDK Path

By default, DirectStorage expects `dstoragecore.dll` to be in the same folder as the game executable.

If you want to place the dll in another folder, you will need to specify the path to the dll by setting the symbol `DStorageSDKPath` to the path and exporting it in your exe.

For example, if your dll is stored in the folder `.\DirectStorage`, here are two methods you could use:

## Method A: __declspec(dllexport) keyword

You can export the constant in your code through the `__declspec(dllexport)` keyword:

```
extern "C" {
__declspec(dllexport) extern const char* DStorageSDKPath = u8".\\DirectStorage\\";
}
```

## Method B: Module-definition file

You can also export the constant via a `.def` file:
```
EXPORTS
DStorageSDKPath DATA PRIVATE
```

And then declare the constant in code:
```
extern "C" extern LPCSTR DStorageSDKPath = ".\\DirectStorage\\";
```
Binary file added Docs/compression.png
Binary file added Docs/customcompression.png