# Bringing HLSL ray tracing shaders to Metal

Convert an HLSL ray tracing pipeline to Metal IR using Metal shader converter.

## Overview  
This sample code project demonstrates a DXR 1.0 tier ray tracing pipeline containing ray generation, closest-hit, any-hit, and miss HLSL shaders, tracing rays against an instance acceleration structure that contains triangles.
The sample generates a Metal compute pipeline for tracing rays, and a Metal render pipeline to present the result to the screen.
It demonstrates how to set up the ray generation dispatch call and how to emulate DXR Shader Tables with Metal Intersection Function Buffer (IFB) support in Metal Shader Converter 3.0.

## Configure the sample code project
This project depends on **Metal Shader Converter 3.0** or later. It searches for header files in `/usr/local/include` and the dynamic library under `/usr/local/lib/`. It also relies on the **metal-shaderconverter** command line tool to convert DXIL to Metal IR. 

### Install the DirectXShaderCompiler (Optional)
If the DirectXShaderCompiler command line tool, `dxc`, is available in your path, the sample project recompiles the HLSL files to DXIL.

Follow these steps to clone and build `dxc` on your device:

```bash
git clone https://github.com/microsoft/DirectXShaderCompiler.git
cd DirectXShaderCompiler
git submodule update --init --recursive
mkdir build
cd build
cmake .. -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -C ../cmake/caches/PredefinedParams.cmake -DCMAKE_OSX_ARCHITECTURES="x86_64;arm64"
make -j8
```

After the build process completes, copy the `dxc` binary (built as `bin/dxc-3.7` in the commands below) and `lib/libdxcompiler.dylib` to `/usr/local/bin/` and `/usr/local/lib/` respectively, and create a `dxc` symlink if required.

```bash
sudo ditto bin/dxc-3.7 /usr/local/bin/
sudo ditto lib/libdxcompiler.dylib /usr/local/lib/
sudo ln -s /usr/local/bin/dxc-3.7 /usr/local/bin/dxc
```

## Build-time 
At build time, the project compiles all HLSL shaders to DXIL (DirectX intermediate language) and copies the results into the application bundle.
Only the DXIL bytecode intended for presenting the ray tracing result to the screen (i.e. rendering) is immediately converted to a MetalLib binary using the `metal-shaderconverter` command line tool.
See the following phases under the Xcode project's Build Phases:
* Compile HLSL to DXIL
* Convert DXIL to MetalLib

The ray tracing pipeline is converted at runtime.

## Runtime 
At runtime, the application converts the DXIL bytecode intended for ray tracing into Metal libraries, which are then used to create the compute pipeline state needed for ray tracing. All of this is managed in `Renderer.m`.
`Renderer.m` is organized as follows:
* Present render pipeline creation          `-[Renderer _createPresentRenderPipelineWithResources]`
* Ray tracing compute pipeline creation     `-[Renderer _createRaytracingComputePipelineWithResources]`
    * Defining the DXR pipeline
    * Creating the shader binding table
    * Creating the acceleration structures
* Metal shader converter resource binding   `-[Renderer _createShaderConverterRuntimeBindings]`
* Dispatch & Draw                           `-[Renderer drawInMTKView:]`

### Creating the Ray Tracing Pipeline
The sample introduces a helper class `DXRCompiler` which encapsulates the conversion and compilation of DXR shaders to Metal. 
From a description of a DXR pipeline, it produces a `DXRPipelineState` object that contains a Metal compute pipeline state for ray tracing. This `DXRPipelineState` object provides access to the resources that must be made resident when dispatching rays, as well as access to the shader identifiers required to emulate DXR Shader Tables.
The `DXRCompiler` can also, optionally, produce a `DXRPipelineReflection` object. It contains the Metal compute pipeline reflection and conversion information that maps each input DXR shader description to the output `MTLFunction` object used to create the pipeline.
```objc
@interface DXRPipelineState : NSObject
@property(nonatomic, readonly) id<MTLComputePipelineState> computePipelineState;
@property(nonatomic, readonly, nullable) id<MTLVisibleFunctionTable> visibleFunctionTable;
@property(nonatomic, readonly, nullable) id<MTLIntersectionFunctionTable> intersectionFunctionTable;
@property(nonatomic, readonly, nullable) NSArray<id<MTLFunctionHandle>> *intersectionFunctionBufferHandles;
- (uint64_t)shaderIdentifierForName:(NSString *)name;
@end
```

The availability of these resources depends on the Metal shader converter compilation modes. The default modes used by `DXRCompiler` are:
* `IRRayGenerationCompilationKernel` - converts the DXR ray generation shader directly to a Metal dispatch `kernel`.
* `IRIntersectionFunctionCompilationIntersectionFunctionBufferFunction` - converts DXR intersection and any-hit shaders into intersection function handles to be encoded into a Metal IFB.

Metal IFBs are supported only on `MTLGPUFamilyApple9` and later GPU devices. For older devices, `DXRCompiler` falls back to:
* `IRIntersectionFunctionCompilationVisibleFunction` - synthesizes an intersection function table and converts DXR intersection and any-hit shaders into visible functions to be encoded into a visible function table.

To experiment with these modes, see `-[DXRCompiler _defaultCompilationMode]`.
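
The fallback described above can be sketched in plain C. This is only an illustration: the enum and function names are hypothetical stand-ins rather than Metal shader converter API, and `deviceSupportsApple9` is assumed to hold the result of a device capability query such as `-[MTLDevice supportsFamily:]`.

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical stand-ins for the two intersection-function compilation modes
// named above; these are NOT the Metal shader converter enum values.
typedef enum {
    kSketchModeVisibleFunction,
    kSketchModeIntersectionFunctionBuffer
} SketchIntersectionMode;

// Prefer the IFB mode when the GPU supports it, otherwise fall back to
// visible functions. The capability flag is supplied by the caller.
static SketchIntersectionMode sketch_default_intersection_mode(bool deviceSupportsApple9)
{
    return deviceSupportsApple9 ? kSketchModeIntersectionFunctionBuffer
                                : kSketchModeVisibleFunction;
}
```

Keeping the decision in one small function makes it easy to force either mode while experimenting.
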
The details of the compilation process in `DXRCompiler.m` are encapsulated in `-[DXRCompiler newDXRPipelineWithDescriptor:reflection:error:]` which is broken down into:
* DXIL to MetalLib conversion                   `-[DXRCompiler _convertDXRShadersWithDescriptor:compilationMode:error:]`
* Metal compute pipeline creation               `-[DXRCompiler _createComputePipelineStateWithIntermediates:reflection:error:]` 
* Shader identifier mapping for shader tables   `-[DXRCompiler _createDXRPipelineStateWithComputePipelineState:intermediates:compilationMode:error:]`

In `Renderer.m`, the sample describes its DXR ray tracing pipeline as follows:
```objc
    // Describe the DXR pipeline options
    DXRPipelineOptions *options = [[DXRPipelineOptions alloc] init];
    options.maxAttributeSizeInBytes = 16;
    options.maxTraceRecursionDepth = 1;
    
    // Describe the DXR raytracing pipeline
    DXRShader *rayGenerationShader = [[DXRShader alloc] initWithName:@"MainRayGen" DXIL:DXIL];
    DXRShader *missShader = [[DXRShader alloc] initWithName:@"MainMiss" DXIL:DXIL];
    DXRHitgroup *hitGroup = [[DXRHitgroup alloc] init]; {
        hitGroup.anyHitShader = [[DXRShader alloc] initWithName:@"TriangleAnyHit" DXIL:DXIL];
        hitGroup.closestHitShader = [[DXRShader alloc] initWithName:@"TriangleClosestHit" DXIL:DXIL];
    }

    // Describe the pipeline
    DXRPipelineDescriptor *dxrDescriptor = [[DXRPipelineDescriptor alloc] init];
    dxrDescriptor.rayGenerationShader = rayGenerationShader;
    dxrDescriptor.hitGroups = @[hitGroup];
    dxrDescriptor.missShaders = @[missShader];
    dxrDescriptor.options = options;
    dxrDescriptor.globalRootSignatureDescriptor = globalRootSignatureDescriptor;
    dxrDescriptor.localRootSignatureDescriptor = localRootSignatureDescriptor;
```
Then with the DXR pipeline defined, the sample creates the DXR pipeline state with the `DXRCompiler`:
```objc
    DXRPipelineReflection *dxrPipelineReflection = nil;
    DXRPipelineState *dxrPipelineState = [_dxrCompiler newDXRPipelineWithDescriptor:dxrDescriptor reflection:&dxrPipelineReflection error:&error];
``` 

### Creating the Shader Binding Table
The DXR Shader Table for the scene is described as:
```c
    typedef struct ShaderRecord {
        IRShaderIdentifier shaderIdentifier;
    } ShaderRecord;

    typedef struct ShaderRecordWithData {
        IRShaderIdentifier shaderIdentifier;
        int32_t numHoles; // local root signature data
    } ShaderRecordTriangle;

    typedef struct ShaderTable {
        ShaderRecord rayGenRecord;
        ShaderRecord missRecord;
        ShaderRecordTriangle hitGroupRecords[kNumTriangles] __attribute__((aligned(64)));
    } ShaderTable;
```
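To see how this layout translates into byte offsets, the following sketch mirrors the structs with stand-in types and checks the placement of the hit group table at compile time. The 32-byte `SketchShaderIdentifier` and the triangle count of 3 are assumptions made for illustration; the real `IRShaderIdentifier` and `kNumTriangles` come from the Metal shader converter runtime header and the sample.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { kSketchNumTriangles = 3 }; // assumed count; the sample defines kNumTriangles

// Stand-in with an ASSUMED 32-byte size; not the real IRShaderIdentifier.
typedef struct { uint64_t words[4]; } SketchShaderIdentifier;

typedef struct { SketchShaderIdentifier shaderIdentifier; } SketchShaderRecord;

typedef struct {
    SketchShaderIdentifier shaderIdentifier;
    int32_t numHoles; // local root signature data
} SketchShaderRecordTriangle;

typedef struct {
    SketchShaderRecord rayGenRecord;
    SketchShaderRecord missRecord;
    SketchShaderRecordTriangle hitGroupRecords[kSketchNumTriangles] __attribute__((aligned(64)));
} SketchShaderTable;

// The attribute aligns the start of the hit group array within the table.
static_assert(offsetof(SketchShaderTable, hitGroupRecords) % 64 == 0,
              "hit group table must start on a 64-byte boundary");

// Byte offset of the hit group table from the start of the shader table.
static size_t sketch_hit_group_offset(void) { return offsetof(SketchShaderTable, hitGroupRecords); }

// Distance in bytes between consecutive hit group records.
static size_t sketch_hit_group_stride(void) { return sizeof(SketchShaderRecordTriangle); }
```

Under these assumed sizes the hit group table starts at byte 64 and its stride is `sizeof(SketchShaderRecordTriangle)`; the real values depend on the actual `IRShaderIdentifier` definition.
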
The sample first retrieves the required shader identifiers from the `DXRPipelineState` object:
```objc
    uint64_t rayGenShaderIdentifier = [dxrPipelineState shaderIdentifierForName:rayGenerationShader.uniqueName];
    uint64_t missShaderIdentifier = [dxrPipelineState shaderIdentifierForName:missShader.uniqueName];
    uint64_t anyHitShaderIdentifier = [dxrPipelineState shaderIdentifierForName:hitGroup.anyHitShader.uniqueName];
    uint64_t closestHitShaderIdentifier = [dxrPipelineState shaderIdentifierForName:hitGroup.closestHitShader.uniqueName];
```
Once all DXR shader identifiers are determined from their corresponding Metal functions, the sample generates a Shader Binding Table backed by an `MTLBuffer`.
The sample builds the shader records for the ray generation and miss shaders from an `IRShaderIdentifier`, Metal shader converter's representation of shader identifiers for Shader Binding Tables. For hit groups, the shader record also includes the local root signature data `numHoles`, which determines the number of holes in each rendered triangle instance in the scene.
```objc
    id<MTLBuffer> shaderTableBuffer = [_device newBufferWithLength:sizeof(ShaderTable) options:MTLResourceStorageModeShared];
    ShaderTable *shaderTable = (ShaderTable *)shaderTableBuffer.contents;
            
    // Ray generation shader record
    IRShaderIdentifierInit(&(shaderTable->rayGenRecord.shaderIdentifier), rayGenShaderIdentifier);
    
    // Miss shader record
    IRShaderIdentifierInit(&(shaderTable->missRecord.shaderIdentifier), missShaderIdentifier);
    
    // Hit group records
    for (int i = 0; i < kNumTriangles; ++i)
    {
        IRShaderIdentifierInitWithCustomIntersection(&(shaderTable->hitGroupRecords[i].shaderIdentifier), closestHitShaderIdentifier, anyHitShaderIdentifier);
        shaderTable->hitGroupRecords[i].numHoles = i + 1; // local root signature data
    }
```
As described above, with `IRIntersectionFunctionCompilationIntersectionFunctionBufferFunction`, the `anyHitShaderIdentifier` represents a Metal function handle encoded into a Metal IFB. The `shaderTableBuffer` hosts the IFB, and the start of each entry is required to be 64-byte aligned. This is why the hit group table is declared with the following alignment specification:
```c
    ShaderRecordTriangle hitGroupRecords[kNumTriangles] __attribute__((aligned(64)));
``` 
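When placing or sizing table entries by hand, a round-up helper makes the 64-byte requirement explicit. The helpers below are a hypothetical sketch and are not part of the sample or of the Metal shader converter runtime.

```c
#include <assert.h>
#include <stdint.h>

// Round `value` up to the next multiple of `alignment`.
// `alignment` must be a nonzero power of two (for example 64).
static uint64_t sketch_align_up(uint64_t value, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

// Nonzero when `address` already sits on an `alignment`-byte boundary.
static int sketch_is_aligned(uint64_t address, uint64_t alignment)
{
    return sketch_align_up(address, alignment) == address;
}
```

For example, `sketch_align_up(offset, 64)` gives the first offset at or after `offset` that satisfies the alignment, and `sketch_is_aligned` can guard a buffer address before it is written into a dispatch descriptor.
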

### Creating Acceleration Structures
The sample builds a Metal primitive acceleration structure that defines the geometry of each triangle. This acceleration structure is analogous to DXR's bottom-level acceleration structure (BLAS).
The sample then defines instances of this primitive acceleration structure (the BLAS) as the basis of a Metal instance acceleration structure, which is analogous to DXR's top-level acceleration structure (TLAS).
Note that with `IRIntersectionFunctionCompilationIntersectionFunctionBufferFunction`, the `MTLAccelerationStructureInstanceDescriptor.intersectionFunctionTableOffset` for each instance must be set to the instance index. The sample decides whether to do this by checking whether the `DXRPipelineState` object has a non-nil `intersectionFunctionBufferHandles` array.
```objc
    BOOL setInstanceIndex = (dxrPipelineState.intersectionFunctionBufferHandles != nil);
```
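As a rough illustration of how that flag might be applied, the sketch below fills the offset field for a list of instances. `SketchInstanceDescriptor` is a hypothetical stand-in for the single field of `MTLAccelerationStructureInstanceDescriptor` used here, and writing 0 when the flag is off is an assumption; the sample may use a different value in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for the one descriptor field this sketch touches.
typedef struct { uint32_t intersectionFunctionTableOffset; } SketchInstanceDescriptor;

// With the IFB mode, each instance's offset is its instance index.
// Otherwise this sketch writes 0 (an assumption, see the text above).
static void sketch_set_instance_offsets(SketchInstanceDescriptor *instances,
                                        size_t count,
                                        bool setInstanceIndex)
{
    assert(instances != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        instances[i].intersectionFunctionTableOffset = setInstanceIndex ? (uint32_t)i : 0u;
    }
}
```
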

### Metal Shader Converter resource binding
The sample ensures all pipeline generated resources, acceleration structures, and shader resources are available when the ray tracing pipeline is executed. 
`-[Renderer _createShaderConverterRuntimeBindings]` creates all the necessary resources required by the HLSL shaders and sets up the binding model expected by the Metal shader converter runtime.

For example, the sample binds the instance acceleration structure (TLAS) through an acceleration structure header (as required by Metal shader converter) that also provides each instance's contribution to the hit index.
```objc
    // SRV
    size_t asSRVBufferSize = sizeof(IRRaytracingAccelerationStructureGPUHeader) + (sizeof(uint32_t) * kNumTriangles);
    id<MTLBuffer> asSRVBuffer = [_device newBufferWithLength:asSRVBufferSize options:MTLResourceStorageModeShared];
    IRRaytracingAccelerationStructureGPUHeader *header = (IRRaytracingAccelerationStructureGPUHeader *)asSRVBuffer.contents;
    header->accelerationStructureID = _accelerationStructureTLAS.gpuResourceID._impl;
    header->addressOfInstanceContributions = asSRVBuffer.gpuAddress + sizeof(IRRaytracingAccelerationStructureGPUHeader);
    uint32_t *instanceContributions = (uint32_t *)(asSRVBuffer.contents + sizeof(IRRaytracingAccelerationStructureGPUHeader));
    for (uint32_t i = 0; i < kNumTriangles; ++i) { instanceContributions[i] = i; }
    _asSRVBuffer = asSRVBuffer;
```
The `_resultTexture` is the destination (UAV) of the ray tracing compute pass and, subsequently, the source texture (SRV) for the present render pass.
```objc
    // UAV 
    IRDescriptorTableEntry uavEntry;
    IRDescriptorTableSetTexture(&uavEntry, _resultTexture, 0, 0);
    id<MTLBuffer> uavTableBuffer = [_device newBufferWithBytes:&uavEntry length:sizeof(uavEntry) options:MTLResourceStorageModeShared];
    uavTableBuffer.label = @"UAV (Texture)";
    _textureUAVBuffer = uavTableBuffer;
```    
Both must be encoded into a top-level argument buffer, as described by the root signature and required by Metal shader converter:
```objc
    // Create top-level argument buffer for compute pipeline (ray tracing)
    uint64_t TLAB[2] = { _asSRVBuffer.gpuAddress, _textureUAVBuffer.gpuAddress };
    id<MTLBuffer> topLevelArgumentBuffer = [_device newBufferWithBytes:TLAB length:sizeof(TLAB) options:MTLResourceStorageModeShared];
    topLevelArgumentBuffer.label = @"TLAB (Compute)";
    _computeTLAB = topLevelArgumentBuffer;
```

### Dispatching Rays
At this point, the sample has all the resources needed to issue a ray dispatch.
The sample then creates a Metal compute command encoder, binds the ray tracing compute pipeline state, makes all the required resources resident, and dispatches a grid equal in size to the result texture.

First, the sample constructs an `IRDispatchRaysArgument` object, which describes the Shader Binding Table to the Metal shader converter runtime.
```objc
    IRDispatchRaysArgument dispatchRaysArg;
    {
        ShaderTable *shaderTable = (ShaderTable *)_shaderTable.contents;
        uint64_t shaderTableGPUAddress = _shaderTable.gpuAddress;
        
        dispatchRaysArg.VisibleFunctionTable = _visibleFunctionTable.gpuResourceID;
        dispatchRaysArg.IntersectionFunctionTable = _intersectionFunctionTable.gpuResourceID;
        dispatchRaysArg.GRS = _computeTLAB.gpuAddress;
        
        IRDispatchRaysDescriptor *dispatchRaysDesc = &dispatchRaysArg.DispatchRaysDesc;
        dispatchRaysDesc->RayGenerationShaderRecord.StartAddress = shaderTableGPUAddress;
        dispatchRaysDesc->RayGenerationShaderRecord.SizeInBytes = sizeof(ShaderRecord);
        dispatchRaysDesc->HitGroupTable.StartAddress = shaderTableGPUAddress + offsetof(ShaderTable, hitGroupRecords);
        dispatchRaysDesc->HitGroupTable.SizeInBytes = sizeof(shaderTable->hitGroupRecords);
        dispatchRaysDesc->HitGroupTable.StrideInBytes = sizeof(shaderTable->hitGroupRecords[0]);
        dispatchRaysDesc->MissShaderTable.StartAddress = shaderTableGPUAddress + offsetof(ShaderTable, missRecord);
        dispatchRaysDesc->MissShaderTable.SizeInBytes = sizeof(shaderTable->missRecord);
        dispatchRaysDesc->MissShaderTable.StrideInBytes = sizeof(shaderTable->missRecord);
        dispatchRaysDesc->CallableShaderTable.StartAddress = 0;
        dispatchRaysDesc->CallableShaderTable.SizeInBytes = 0;
        dispatchRaysDesc->CallableShaderTable.StrideInBytes = 0;
        dispatchRaysDesc->Width = (uint)threads.width;
        dispatchRaysDesc->Height = (uint)threads.height;
        dispatchRaysDesc->Depth = (uint)threads.depth;
    }
```
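The address arithmetic in the descriptor can be checked in isolation. The sketch below derives the three table ranges from a base GPU address using `offsetof` and `sizeof`. All `Sketch...` types are stand-ins with assumed layouts (a 32-byte identifier and 3 triangles); they are not the Metal shader converter runtime types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { kSketchNumTriangles = 3 }; // assumed count; the sample defines kNumTriangles

// Stand-ins with ASSUMED layouts; not the real runtime types.
typedef struct { uint64_t words[4]; } SketchShaderIdentifier;
typedef struct { SketchShaderIdentifier shaderIdentifier; } SketchShaderRecord;
typedef struct { SketchShaderIdentifier shaderIdentifier; int32_t numHoles; } SketchShaderRecordTriangle;
typedef struct {
    SketchShaderRecord rayGenRecord;
    SketchShaderRecord missRecord;
    SketchShaderRecordTriangle hitGroupRecords[kSketchNumTriangles] __attribute__((aligned(64)));
} SketchShaderTable;

typedef struct { uint64_t StartAddress, SizeInBytes; } SketchRange;
typedef struct { uint64_t StartAddress, SizeInBytes, StrideInBytes; } SketchRangeAndStride;
typedef struct {
    SketchRange RayGenerationShaderRecord;
    SketchRangeAndStride MissShaderTable;
    SketchRangeAndStride HitGroupTable;
} SketchDispatchRanges;

// Derive each table range from the GPU address of the shader table buffer.
static SketchDispatchRanges sketch_ranges_for_table(uint64_t tableGPUAddress)
{
    SketchDispatchRanges r;
    r.RayGenerationShaderRecord.StartAddress = tableGPUAddress + offsetof(SketchShaderTable, rayGenRecord);
    r.RayGenerationShaderRecord.SizeInBytes = sizeof(SketchShaderRecord);
    r.MissShaderTable.StartAddress = tableGPUAddress + offsetof(SketchShaderTable, missRecord);
    r.MissShaderTable.SizeInBytes = sizeof(SketchShaderRecord);
    r.MissShaderTable.StrideInBytes = sizeof(SketchShaderRecord);
    r.HitGroupTable.StartAddress = tableGPUAddress + offsetof(SketchShaderTable, hitGroupRecords);
    r.HitGroupTable.SizeInBytes = sizeof(((SketchShaderTable *)0)->hitGroupRecords);
    r.HitGroupTable.StrideInBytes = sizeof(SketchShaderRecordTriangle);
    // The table size must cover exactly one stride per hit group record.
    assert(r.HitGroupTable.SizeInBytes == r.HitGroupTable.StrideInBytes * kSketchNumTriangles);
    return r;
}
```

Deriving every range from the same struct definition keeps the CPU-side layout and the addresses handed to the runtime in agreement when a record changes size.
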
The sample then dispatches a grid equal in size to the result texture, taking care to make the appropriate resources resident.
```objc  
    id<MTLComputeCommandEncoder> computeEncoder = [commandBuffer computeCommandEncoder];
    [computeEncoder setComputePipelineState:_raytracingPipelineState];
    [computeEncoder useResource:_textureUAVBuffer usage:MTLResourceUsageRead];
    [computeEncoder useResource:_asSRVBuffer usage:MTLResourceUsageRead];
    [computeEncoder useResource:_computeTLAB usage:MTLResourceUsageRead];
    [computeEncoder useResource:_resultTexture usage:MTLResourceUsageWrite];
    [computeEncoder useResource:_accelerationStructureTLAS usage:MTLResourceUsageRead];
    [computeEncoder useResource:_shaderTable usage:MTLResourceUsageRead];
    [computeEncoder useResource:_visibleFunctionTable usage:MTLResourceUsageRead];
    if (_intersectionFunctionTable) {
        [computeEncoder useResource:_intersectionFunctionTable usage:MTLResourceUsageRead];
    }
    [computeEncoder setBytes:&dispatchRaysArg length:sizeof(dispatchRaysArg) atIndex:kIRRayDispatchArgumentsBindPoint];
    [computeEncoder dispatchThreads:threads threadsPerThreadgroup:threadsPerThreagroup];
    [computeEncoder endEncoding];
```
