Hello, I'm tracking down a bug where useResource doesn't seem to apply proper synchronization when a resource is produced by the render pass then consumed by the compute pass, but when I use MTLFence between the to signal and wait between the render/compute encoders, the artifact goes away.
The resource is created with MTLHazardTrackingModeTracked and useResource is called on the compute encoder after the render pass. Metal API Validation doesn't report any warnings/errors.
Am I misunderstanding the difference between the two APIs? I dug through the Metal documentation and it looks like useResource should handle synchronization given the resource has MTLHazardTrackingModeTracked but on the other hand, MTLFence should be used to ensure proper synchronization between command encoders. Can someone can clarify the difference between the two APIs and when to use them.
Metal
RSS for tagRender advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
Hi there,
Is it possible to customize the Metal Performance HUD on Apple TV, similar to how it can be done on iPhone & iPad?
Would like to see things like Compiled Shaders for my Apps on tvOS
.
I mean…I want to use defaults rather than launching apps via open with the saved environment variables.
This is pretty easy on iOS and other platforms. So what about in macOS?
I am trying to learn the new Metal Peformance Primitives APIs. I have added the MetalPeformancePrimitives framework and included the header in my shader code as per documentation
#include <MetalPeformancePrimitives/MetalPeformancePrimitives.h>
Unfortunately, Xcode complains that the header cannot be found. How do I include it properly?
I am using Xcode 26 on Tahoe. The MetalPeformancePrimitives framework is present on my machine and I can inspect the headers in the filesystem.
Topic:
Graphics & Games
SubTopic:
Metal
I work on a Qt/QML app that uses Esri Maps SDK for Qt and that is deployed to both Windows and iPads. With a recent iPad OS upgrade to 26.1, many iPad users are reporting the application freezing after panning and/or identifying features in the map. It runs fine for our Windows users.
I was able to reproduce this and grabbed the following error messages when the freeze happens:
IOGPUMetalError: Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)
IOGPUMetalError: Invalid Resource (00000009:kIOGPUCommandBufferCallbackErrorInvalidResource)
Environment:
Qt 6.5.4 (Qt for iOS)
Esri Maps SDK for Qt 200.3
iPadOS 26.1
Because it appears to be a Metal error, I tried using OpenGL (Qt offers a way to easily set hte target graphics api):
QQuickWindow::setGraphicsApi(QSGRendererInterface::GraphicsApi::OpenGL)
Which worked! No more freezing. But I'm seeing many posts that OpenGL has been deprecated by Apple.
I've seen posts that Apple deprecated OpenGL ES. But it seems to still be available with iPadOS 26.1. If so, will this fix (above) just cause problems with a future iPadOS update?
Any other suggestions to address this issue? Upgrading our version of Qt + Esri SDK to the latest version is not an option for us. We are in the process to upgrade the full application, but it is a year or two out. So, we just need a fix to buy us some time for now.
Appreciate any thoughts/insights....
I noticed that when the render command encoder adds no draw calls an apps memory usage seems to grow unboundedly. Using a super simple MTKView-based drawing with the following delegate (code at end).
If I add the simplest of draw calls, e.g., a single vertex, the app's memory usage is normal, around 100-ish MBs.
I am attaching a couple screenshot, one from Xcode and one from Instruments.
What's going on here? Is this an illegal program? If yes, why does it not crash, such as if the encode or command buffer weren't ended.
Or is there some race condition at play here due to the lack of draws?
class Renderer: NSObject, MTKViewDelegate {
var device: MTLDevice
var commandQueue: MTL4CommandQueue
var commandBuffer: MTL4CommandBuffer
var allocator: MTL4CommandAllocator
override init() {
guard let d = MTLCreateSystemDefaultDevice(),
let queue = d.makeMTL4CommandQueue(),
let cmdBuffer = d.makeCommandBuffer(),
let alloc = d.makeCommandAllocator()
else {
fatalError("unable to create metal 4 objects")
}
self.device = d
self.commandQueue = queue
self.commandBuffer = cmdBuffer
self.allocator = alloc
super.init()
}
func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {}
func draw(in view: MTKView) {
guard let drawable = view.currentDrawable else { return }
commandBuffer.beginCommandBuffer(allocator: allocator)
guard let descriptor = view.currentMTL4RenderPassDescriptor,
let encoder = commandBuffer.makeRenderCommandEncoder(
descriptor: descriptor
)
else {
fatalError("unable to create encoder")
}
encoder.endEncoding()
commandBuffer.endCommandBuffer()
commandQueue.waitForDrawable(drawable)
commandQueue.commit([commandBuffer])
commandQueue.signalDrawable(drawable)
drawable.present()
}
}
Topic:
Graphics & Games
SubTopic:
Metal
Unable to find intelgpu_kbl_gt2r0 slice or a compatible one in binary archive 'file:///System/Library/PrivateFrameworks/IconRendering.framework/Resources/binary.metallib'
available slices: applegpu_g13g, applegpu_g13s, applegpu_g13d, applegpu_g14g, applegpu_g14s, applegpu_g14d, applegpu_g15g, applegpu_g15s, applegpu_g15d, applegpu_g16g, applegpu_g16s, applegpu_g17g, applegpu_g15g, applegpu_g15s, applegpu_g15d, applegpu_g16s
Is it related to performance of applications in macOS 26.2 on Intel Macs?
I'm a newbee at Vulkan and Xcode.
I have my project on github https://github.com/flocela/OrangeSpider/
Whenever I run, two windows open instead of only one.
I added testing, which means I have an OrangeSpider.xctestplan in the OrangeSpider/TestsOrangeSpider/ folder.
This is my first time adding testing to an XCode project, so I think this may be where the problem is.
I also get this error message:
ViewBridge to RemoteViewService Terminated: Error Domain=com.apple.ViewBridge Code=18 "(null)" UserInfo={com.apple.ViewBridge.error.hint=this process disconnected remote view controller -- benign unless unexpected, com.apple.ViewBridge.error.description=NSViewBridgeErrorCanceled}
Topic:
Graphics & Games
SubTopic:
Metal
Hello everyone,
I must have missed something but why isn't there a depthAttachmentPixelFormat to the new Metal 4 MTL4RenderPipelineDescriptor, unlike the old MTLRenderPipelineDescriptor?
So how do you set the depth pixel format?
Thanks in advance!
I would love to use Background GPU Access to do some video processing in the background.
However the documentation of BGContinuedProcessingTaskRequest.Resources.gpu clearly states:
Not all devices support background GPU use. For more information, see Performing long-running tasks on iOS and iPadOS.
Is there a list available of currently released devices that do (or don't) support GPU background usage? That would help to understand what part of our user base can use this feature. (And what hardware we need to test this on as developers.)
For example it seems that it isn't supported on an iPad Pro M1 with the current iOS 26 beta. The simulators also seem to not support the background GPU resource. So would be great to understand what hardware is capable of using this feature!
Hi, developers,
I maintain a shipped app that uses string concatenation to construct Metal shader and compile on-device. Beta 4 seems disabled __asm keyword, resulting the compilation failure.
The error is:
v2/GEMMKernel.cpp:229: error: program_source:23:9: error: illegal string literal in 'asm'
__asm("air.simdgroup_async_copy_1d.p3i8.p1i8");
The relevant code is available at https://github.com/liuliu/ccv/blob/unstable/lib/nnc/mfa/v2/GEMMHeaders.cpp#L30 although any __asm will trip this.
Please give us guidance on whether this is a regression or this will be something enforced in 26 release. Personally, I would consider this as a bug given it won't impact anything "compiled" shaders.
Thanks for your patience reading this!
I have really enjoyed looking through the code and videos related to Metal 4. Currently, my interest is to update a ReSTIR Project and take advantage of more robust ways to refit acceleration Structures and more powerful ways to access resources.
I am working in Swift and have encountered a couple of puzzles:
What is the 'accepted' way to create a MTL4BufferRange to store indices and vertices?
How do I properly rewrite Swift code to build and compact an Acceleration Structure?
I do realize that this is all in Beta and will happily look through Code Samples this Fall. If other guidance is available earlier, that would be fabulous!
Thank you
I am developing a macOS terminal app, running on an M4 Pro, and using Metal.
I am not able use float8 or float16, both reporting Variable has incomplete type 'float16' (aka '__Reserved_Name__Do_not_use_float16').
Based on the system I should be able to use these. Either it is because it is also compiling to Intel, which they are not allowed, or something else. Either way I have not been able to figure out how to get past this.
IIs there a compiler setting I need to set to make this work? if so which one and what setting do I need? I only want to run this on M processes, on the latest version of OS so not interested in Intel version or backward compatibility.
Hello,
I recently watched the WWDC2025 session titled “Combine Metal 4 machine learning and graphics” (https://developer.apple.com/videos/play/wwdc2025/262/ ), and I’m very excited about the new Metal 4 features that integrate machine learning with graphics—such as neural ambient occlusion, shader-based ML inference, and the use of MTLTensor and MTL4MachineLearningCommandEncoder.
While the session includes helpful code snippets and a compelling debug demo (e.g., the neural ambient occlusion example), the implementation details are not fully shown, and I haven’t been able to find a complete, runnable sample project that demonstrates end-to-end integration of ML and rendering in Metal 4.
Would Apple be able to provide a full, working example—such as an Xcode project—that shows how to:
Export a model to an .mlpackage,
Convert it to an .mtlpackage,
Use MTL4MachineLearningCommandEncoder alongside render passes,
Or embed small neural networks directly in shaders using Shader ML?
Having such a sample would greatly help developers like me adopt these powerful new capabilities correctly and efficiently.
Thank you very much for your time and support!
Best regards,
I am integrating MetalFX FrameInterpolator into a custom Unity RenderGraph–based render pipeline (C++ native plugin + C# render passes), and I am hitting the following assertion at runtime:
/MetalFXDebugError.h:29: failed assertion `Color texture width mismatch from descriptor'
What makes this confusing is that all input/output textures have the correct width and height, and they exactly match the values specified in the MTLFXFrameInterpolatorDescriptor.
Setup
Input resolution: 1024 x 512
Output resolution: 2048 x 1024
MTLFXTemporalScaler is created first and then passed into MTLFXFrameInterpolator
The TemporalScaler and FrameInterpolator descriptors use the same input/output sizes and formats
All Metal textures:
Have no parentTexture
Are 2D textures
Match the descriptor sizes exactly (verified via logging)
Texture bindings at encode time
frameInterpolator.colorTexture = mtlTexColor; // 1024 x 512
frameInterpolator.prevColorTexture = mtlTexPrevColor; // 1024 x 512
frameInterpolator.motionTexture = mtlTexMotion; // 1024 x 512
frameInterpolator.depthTexture = mtlTexDepth; // 1024 x 512
frameInterpolator.uiTexture = mtlTexUI; // 2048 x 1024
frameInterpolator.outputTexture = mtlTexOutput; // 2048 x 1024
All widths/heights are logged and match:
Color : 1024 x 512 (input)
PrevColor : 1024 x 512 (input)
Motion : 1024 x 512 (input)
Depth : 1024 x 512 (input)
UI : 2048 x 1024 (output)
Output : 2048 x 1024 (output)
The TemporalScaler works correctly on its own.
The assertion only occurs when using FrameInterpolator.
Important detail about colorTexture
Originally, colorTexture was copied from BuiltinRenderTextureType.CurrentActive.
After reading that this might violate MetalFX semantics, I changed the pipeline so that:
colorTexture now comes from a dedicated private RenderGraph texture
It is not the backbuffer
It is not a drawable
It is not used as a final output
It is created before UI rendering
Despite this, the assertion still occurs.
Question
Can uiTexture for MTLFXFrameInterpolator legally come from a texture copied from BuiltinRenderTextureType.CurrentActive?
More generally:
Are there additional hidden constraints on colorTexture / prevColorTexture (such as Metal usage, storageMode, aliasing, or hazard tracking) that could cause this assertion, even when sizes match?
Does FrameInterpolator require colorTexture and prevColorTexture to be created in a very specific way (e.g. non-aliased, ShaderRead usage, identical Metal resource properties)?
Any clarification on the exact semantic requirements for colorTexture, prevColorTexture, or uiTexture in MetalFX FrameInterpolator would be greatly appreciated.
Hi,
Introducing Swift Concurrency to my Metal app has been a bit challenging as Swift Concurrency is limited by the cooperative thread pool.
GPU work is obviously not CPU bound and can block forward moving progress, especially when using waitUntilCompleted on the command buffer. For concurrent render work this has the potential of under utilizing the CPU and even creating dead locks.
My question is, what is the Metal's teams general recommendation when it comes to concurrency? It seems to me that Dispatch or OperationQueues are still the preferred way for Metal bound tasks in order to gain maximum performance?
To integrate with Swift Concurrency my idea is to use continuations that kick off render jobs via Dispatch or Queues? Would this be the best solution to bridge async tasks with Metal work?
Thanks!
In my project I need to do the following:
In runtime create metal Dynamic library from source.
In runtime create metal Executable library from source and Link it with my previous created Dynamic library.
Create compute pipeline using those two libraries created above.
But I get the following error at the third step:
Error Domain=AGXMetalG15X_M1 Code=2 "Undefined symbols:
_Z5noisev, referenced from: OnTheFlyKernel
" UserInfo={NSLocalizedDescription=Undefined symbols:
_Z5noisev, referenced from: OnTheFlyKernel
}
import Foundation
import Metal
class MetalShaderCompiler {
let device = MTLCreateSystemDefaultDevice()!
var pipeline: MTLComputePipelineState!
func compileDylib() -> MTLDynamicLibrary {
let source = """
#include <metal_stdlib>
using namespace metal;
half3 noise() {
return half3(1, 0, 1);
}
"""
let option = MTLCompileOptions()
option.libraryType = .dynamic
option.installName = "@executable_path/libFoundation.metallib"
let library = try! device.makeLibrary(source: source, options: option)
let dylib = try! device.makeDynamicLibrary(library: library)
return dylib
}
func compileExlib(dylib: MTLDynamicLibrary) -> MTLLibrary {
let source = """
#include <metal_stdlib>
using namespace metal;
extern half3 noise();
kernel void OnTheFlyKernel(texture2d<half, access::read> src [[texture(0)]],
texture2d<half, access::write> dst [[texture(1)]],
ushort2 gid [[thread_position_in_grid]]) {
half4 rgba = src.read(gid);
rgba.rgb += noise();
dst.write(rgba, gid);
}
"""
let option = MTLCompileOptions()
option.libraryType = .executable
option.libraries = [dylib]
let library = try! self.device.makeLibrary(source: source, options: option)
return library
}
func runtime() {
let dylib = self.compileDylib()
let exlib = self.compileExlib(dylib: dylib)
let pipelineDescriptor = MTLComputePipelineDescriptor()
pipelineDescriptor.computeFunction = exlib.makeFunction(name: "OnTheFlyKernel")
pipelineDescriptor.preloadedLibraries = [dylib]
pipeline = try! device.makeComputePipelineState(descriptor: pipelineDescriptor, options: .bindingInfo, reflection: nil)
}
}
I am trying to load some PNG data with MTKTextureLoader newTextureWithData,but the result shows wrong at the alpha area.
Here is the code. I have an image URL, after it downloads successfully, I try to use the data or UIImagePNGRepresentation (image), they all show wrong.
UIImage *tempImg = [UIImage imageWithData:data];
CGImageRef cgRef = tempImg.CGImage;
MTKTextureLoader *loader = [[MTKTextureLoader alloc] initWithDevice:device];
id<MTLTexture> temp1 = [loader newTextureWithData:data options:@{MTKTextureLoaderOptionSRGB: @(NO), MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead), MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)} error:nil];
NSData *tempData = UIImagePNGRepresentation(tempImg);
id<MTLTexture> temp2 = [loader newTextureWithData:tempData options:@{MTKTextureLoaderOptionSRGB: @(NO), MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead), MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)} error:nil];
id<MTLTexture> temp3 = [loader newTextureWithCGImage:cgRef options:@{MTKTextureLoaderOptionSRGB: @(NO), MTKTextureLoaderOptionTextureUsage: @(MTLTextureUsageShaderRead), MTKTextureLoaderOptionTextureCPUCacheMode: @(MTLCPUCacheModeWriteCombined)} error:nil];
}] resume];
I'm updating our app to support metal 4, but the metal 4 types don't seem to get recognized when targeting simulator. Is it known if metal 4 will be supported in the near future, or am I setting up the app wrong?
[CRITICAL] Metal API Memory Leak - Heap Memory Never Released to OS (CWE-400)
Security Classification
This issue constitutes a resource exhaustion vulnerability (CWE-400):
Aspect
Details
Type
Uncontrolled Resource Consumption
CWE
CWE-400
Vector
Local (any Metal application)
Impact
System instability, denial of service
User Control
None - no mitigation available
Recovery
Requires application restart
Summary
Metal heap allocations are never released back to macOS, even when the memory is entirely unused. This causes continuous, unbounded memory growth until system instability or crash. The issue affects any application using Metal API heap allocation.
This was discovered in Unreal Engine 5, but reproduces in a completely blank UE5 project with zero application code - confirming this is Metal framework behavior, not application-level.
Environment
OS: macOS Tahoe 26.2
Hardware: Apple Silicon M4 Max (also reproduced on M1, M2, M3)
API: Metal
Reproduction Steps
Run any Metal application that allocates and deallocates GPU buffers via Metal heaps
Open Activity Monitor and observe the application's memory usage
Let the application run idle (no user interaction required)
Observe memory growing continuously at ~1-2 MB per second
Memory never plateaus or stabilizes
Eventually system becomes unstable
For testing: Any Unreal Engine 5.4+ project on macOS will reproduce this. Even a blank project with no gameplay code exhibits the leak. (Tested on UE 5.7.1)
Observed Behavior
Memory Analysis
Using Unreal's memreport -full command, two reports taken 86 seconds apart:
Metric
Report 1 (183s)
Report 2 (269s)
Delta
Process Physical
4373.64 MB
4463.39 MB
+89.75 MB
Metal Heap Buffer
7168 MB
8192 MB
+1024 MB
Unused Heap
3453 MB
4477 MB
+1024 MB
Object Count
73,840
73,840
0 (no change)
Key Finding
Metal Heap grew by exactly 1 GB while "Unused Heap" also grew by 1 GB. This demonstrates:
Metal is allocating new heap blocks in ~1 GB increments
Previously allocated heap memory becomes "unused" but is never released
The unused memory accumulates indefinitely
No application-level objects are leaking (count remains constant)
Memory Growth Pattern
Continuous growth while idle (no user interaction)
Growth rate: approximately 1-2 MB per second
No plateau or stabilization occurs
Metal allocates new 1 GB heap blocks rather than reusing freed space
Eventually leads to system instability and crash
What is NOT Causing This
We verified the following are NOT the source:
Application objects - Object count remains constant
Application code - Blank project with no code reproduces the issue
Texture streaming - Disabling texture streaming had no effect
CPU garbage collection - Running GC has no effect (this is GPU memory)
Mitigations Attempted (None Worked)
setPurgeableState
Setting resources to purgeable state before release:
[buffer setPurgeableState:MTLPurgeableStateEmpty];
Result: Metal ignores this hint and does not reclaim heap memory.
Avoiding Heap Pooling
Forcing individual buffer allocations instead of heap-based pooling.
Result: Leak persists - Metal still manages underlying allocations.
Aggressive Buffer Compaction
Attempting to compact/defragment buffers within heaps every frame.
Result: Only moves data between existing heaps. Does NOT release heaps back to OS.
Reducing Pool Sizes
Minimizing all buffer pool sizes to force more frequent reuse.
Result: Slightly slows the leak rate but does not stop it.
Root Cause Analysis
How Metal Heap Allocation Appears to Work
Metal allocates GPU heap blocks in large chunks (~1 GB observed)
Application requests buffers from these heaps
When application releases buffers, memory becomes "unused" within the heap
Metal does NOT release heap blocks back to macOS, even when entirely unused
When fragmentation prevents reuse, Metal allocates new heap blocks
Result: Continuous memory growth with no upper bound
The Core Problem
There appears to be no Metal API to force heap memory release. The only way to reclaim this memory is to destroy the Metal device entirely, which requires restarting the application.
Expected Behavior
Metal should:
Release unused heaps - When a heap block is entirely unused, release it back to macOS
Respect purgeable hints - Honor setPurgeableState calls from applications
Compact allocations - Defragment heap allocations to reduce fragmentation
Provide control APIs - Allow applications to request heap compaction or release
Enforce limits - Have configurable maximum heap memory consumption
Security Implications
Local Denial of Service - Any Metal application can exhaust system memory, causing instability affecting all running applications
Memory Pressure Attack - Forces other applications to swap to disk, degrading system-wide performance
No Upper Bound - Memory consumption continues until system failure
Unmitigable - End users have no way to prevent or limit the leak
Affects All Metal Apps - Any application using Metal heaps is potentially affected
Impact
Applications become unstable after extended use
System-wide performance degrades as memory pressure increases
Users must periodically restart applications
Developers cannot work around this at the application level
Long-running applications (games, creative tools, servers) are particularly affected
Request
Investigate Metal heap memory management behavior
Implement heap release when blocks become entirely unused
Honor setPurgeableState hints from applications
Consider providing an API for applications to request heap compaction
Document any intended behavior or workarounds
Additional Notes
This issue has been observed across multiple Unreal Engine versions (5.4, 5.7) and multiple Apple Silicon generations (M1 through M4). The behavior is consistent and reproducible.
The Unreal Engine team has implemented various CVars to attempt mitigation (rhi.Metal.HeapBufferBytesToCompact, rhi.Metal.ResourcePurgeInPool, etc.) but none successfully address the issue because the root cause is at the Metal framework level.
Tested: January 2026
Platform: macOS Tahoe 26.2, Apple Silicon (M1/M2/M3/M4)