Explore the integration of media technologies within your app. Discuss working with audio, video, camera, and other media functionalities.

All subtopics
Posts under Media Technologies topic

Post

Replies

Boosts

Views

Activity

AVFoundationErrorDomain Code=-11819 in AVPlayer – Causes & Fixes?
We are encountering an issue where AVPlayer throws the error: Error Domain=AVFoundationErrorDomain Code=-11819 "Cannot Complete Action" > Underlying Error Domain[(null)]:Code[0]:Desc[(null)] This error seems to occur intermittently during video playback, especially after extended usage or when switching between different streams. We observe Error 11819 (AVFoundationErrorDomain) in Conviva platform that some of our users experience it but we couldn't reproduce it so far and we’re need support to determine the root cause and/or best practices to prevent it. Some questions we have: What typically triggers this error? Could it be related to memory/resource constraints, network instability, or backgrounding? Are there any recommended ways to handle or recover from this error gracefully? Any insights or guidance would be greatly appreciated. Thanks!
0
0
173
Apr ’25
How to synchronize the clock sources of two audio devices
I created a virtual audio device to capture system audio with a sample rate of 44.1 kHz. After capturing the audio, I forward it to the hardware sound card using AVAudioEngine, also with a sample rate of 44.1 kHz. However, due to the clock sources being unsynchronized, problems occur after a period of playback. How can I retrieve the clock source of the hardware device and set it for the virtual device?
2
0
299
May ’25
Add icon to DEXT based on AudioDriverKit
Dear Sirs, I'd like to add an icon to my audio driver based on AudioDriverKit. This icon should show up left of my audio device in the audio devices dialog. For an Audio Server Plugin I managed to do this using the property kAudioDevicePropertyIcon and CFBundleCopyResourceURL(...) but how would you do this with AudioDriverKit? Should I use IOUserAudioCustomProperty or IOUserAudioControl and how would I refer to the Bundle? Is there an example available somewhere? Thanks and best regards, Johannes
7
0
1.2k
Jul ’25
MusicKit developer token issue
I'm reaching out regarding a recurring issue I'm experiencing with MusicKit developer tokens. I'm using a valid .p8 private key to sign JWTs for Apple MusicKit integration. Each token I generate includes the appropriate claims (iss, iat, exp) and is signed with the ES256 algorithm, with an expiration date set approximately 6 months ahead. Everything works as expected immediately after generating the token. However, after a few days, the same JWT (still well within its expiration period) suddenly begins returning invalid/unauthorized responses when used in Postman and other API clients. Importantly: I did not delete or revoke the .p8 key during this time. I verified the JWT contains valid claims and a proper structure. The issue consistently resolves only when I create a new .p8 file and regenerate a fresh JWT with it—after which the cycle repeats. This issue occurs even when the environment and app identifiers remain unchanged. I would greatly appreciate it if you could help me understand: Why these tokens become invalid after a few days, despite having a long exp value and an unchanged key. Whether there's any automatic revocation or timeout policy on .p8 keys that could explain this behavior. If there's a better way to maintain long-lived developer tokens without requiring new .p8 key generation every few days. Thank you for your help and clarification on this issue. Best regards, Liad Altif
0
0
149
Jun ’25
How do I find the catalogID for songs in a apple music playlist? (MusicKit & Apple Music API)
Hello, How do I find the apple music catalogID for songs in a apple music playlist? Im building an iOS app that uses MusicKit/Apple Music API. For this example, you can assume that my iOS app simply allows users to upload their apple music playlists. And when users open a specific playlist, I want to find the catalogID for each song. Currently all the songs are returning song IDs in this format “i.PkdJvPXI2AJgm8”. I believe these are libraryIDs, not catalogIDs. I’d like to find a front end solution using MusicKit. But perhaps a back end solution using the Apple Music Rest API is required. Any recommendations would be appreciated!
1
0
121
Jun ’25
Memory leak when performing DetectHumanBodyPose3DRequest request
Hi, I'm developing an application for macos and ios that has to run DetectHumanBodyPose3DRequest model in real time for retrieving the 3d skeleton from the camera. I'm experiencing a memory leak every time the model is used (when i comment that line, the memory stays constant). After a minute it uses about 1GB of ram running with mac catalyst. I attached a minimal project that has this problem Code Camera View import SwiftUI import Combine import Vision struct CameraView: View { @StateObject private var viewModel = CameraViewModel() var body: some View { HStack { ZStack { GeometryReader { geometry in if let image = viewModel.currentFrame { Image(decorative: image, scale: 1) .resizable() .scaledToFill() .frame(width: geometry.size.width, height: geometry.size.height) .clipped() } else { ProgressView() } } } } } } class CameraViewModel: ObservableObject { @Published var currentFrame: CGImage? @Published var frameRate: Double = 0 @Published var currentVisionBodyPose: HumanBodyPose3DObservation? // Store current body pose @Published var currentImageSize: CGSize? // Store current image size private var cameraManager: CameraManager? private var humanBodyPose = HumanBodyPose3DDetector() private var lastClassificationTime = Date() private var frameCount = 0 private var lastFrameTime = Date() private let classificationThrottleInterval: TimeInterval = 1.0 private var lastPoseSendTime: Date = .distantPast init() { cameraManager = CameraManager() startPreview() startClassification() } private func startPreview() { Task { guard let previewStream = cameraManager?.previewStream else { return } for await frame in previewStream { let size = CGSize(width: frame.width, height: frame.height) Task { @MainActor in self.currentFrame = frame self.currentImageSize = size self.updateFrameRate() } } } } private func startClassification() { Task { guard let classificationStream = cameraManager?.classificationStream else { return } for await pixelBuffer in classificationStream { self.classifyFrame(pixelBuffer: pixelBuffer) } } } private func classifyFrame(pixelBuffer: CVPixelBuffer) { humanBodyPose.runHumanBodyPose3DRequestOnImage(pixelBuffer: pixelBuffer) { [weak self] observation in guard let self = self else { return } DispatchQueue.main.async { if let observation = observation { self.currentVisionBodyPose = observation print(observation) } else { self.currentVisionBodyPose = nil } } } } private func updateFrameRate() { frameCount += 1 let now = Date() let elapsed = now.timeIntervalSince(lastFrameTime) if elapsed >= 1.0 { frameRate = Double(frameCount) / elapsed frameCount = 0 lastFrameTime = now } } } HumanBodyPose3DDetector import Foundation import Vision class HumanBodyPose3DDetector: NSObject, ObservableObject { @Published var humanObservation: HumanBodyPose3DObservation? = nil private let queue = DispatchQueue(label: "humanbodypose.queue") private let request = DetectHumanBodyPose3DRequest() private struct SendablePixelBuffer: @unchecked Sendable { let buffer: CVPixelBuffer } public func runHumanBodyPose3DRequestOnImage(pixelBuffer: CVPixelBuffer, completion: @escaping (HumanBodyPose3DObservation?) -> Void) { let sendableBuffer = SendablePixelBuffer(buffer: pixelBuffer) queue.async { [weak self] in Task { [weak self, sendableBuffer] in do { guard let self = self else { return } let result = try await self.request.perform(on: sendableBuffer.buffer) //process result DispatchQueue.main.async { if result.isEmpty { completion(nil) } else { completion(result[0]) } } } catch { DispatchQueue.main.async { completion(nil) } } } } } }
1
0
164
Jun ’25
SFSpeechRecognizer throws User denied access to speech recognition
I have created an app where you can speak using SFSpeechRecognizer and it will recognize you speech into text, translate it and then return it back using speech synthesis. All locales for SFSpeechRecognizer and switching between them work fine when the app is in the foreground but after I turn off my screen(the app is still running I just turned off the screen) and try to create new recognitionTask it it receives this error inside the recognition task: User denied access to speech recognition. The weird thing about this is it only happens with some languages. The error happens with Croatian or Hungarian locale for speech recognition but doesn't with English or Spanish locale.
1
0
393
Mar ’25
ImageIO failed to encode HECIS in macOS 15.5
ImageIO encoding to HEICS fails in macOS 15.5. log writeImageAtIndex:1246: *** CMPhotoCompressionSessionAddImageToSequence: err = kCMPhotoError_UnsupportedOperation [-16994] (codec: 'hvc1') seems to be related with https://github.com/SDWebImage/SDWebImage/issues/3732 affected version iOS 18.4 (sim and device), macOS 15.5 unaffected version iOS 18.3 (sim and device), macOS 15.3
1
0
183
Jun ’25
Issue with Audio Sample Rate Conversion in Video Calls
Hey everyone, I'm encountering an issue with audio sample rate conversion that I'm hoping someone can help with. Here's the breakdown: Issue Description: I've installed a tap on an input device to convert audio to an optimal sample rate. There's a converter node added on top of this setup. The problem arises when joining Zoom or FaceTime calls—the converter gets deallocated from memory, causing the program to crash. Symptoms: The converter node is being deallocated during video calls. The program crashes entirely when this happens. Traditional methods of monitoring sample rate changes (tracking nominal or actual sample rates) aren't working as expected. The Big Challenge: I can't figure out how to properly monitor sample rate changes. Listeners set up to track these changes don't trigger when the device joins a Zoom or FaceTime call. Please, if anyone has experience with this or knows a solution, I'd really appreciate your help. Thanks in advance! ⁠
0
0
110
Apr ’25
Personas and Lighting in a visionOS 26 (or earlier) SharePlay Session
Use case: When SharePlay -ing a fully immersive 3D scene (e.g. a virtual stage), I would like to shine lights on specific Personas, so they show up brighter when someone in the scene is recording the feed (think a camera person in the scene wearing Vision Pro). Note: This spotlight effect only needs to render in the camera person's headset and does NOT need to be journaled or shared. Before I dive into this, my technical question: Can environmental and/or scene lighting affect Persona brightness in a SharePlay? If not, is there a way to programmatically make Personas "brighter" when recording? My screen recordings always seem to turn out darker than what's rendered in environment, and manually adjusting the contrast tends to blow out the details in a Persona's face (especially in visionOS 26).
0
0
116
Jun ’25
In Speech framework is SFTranscriptionSegment timing supposed to be off and speechRecognitionMetadata nil until isFinal?
I'm working in Swift/SwiftUI, running XCode 16.3 on macOS 15.4 and I've seen this when running in the iOS simulator and in a macOS app run from XCode. I've also seen this behaviour with 3 different audio files. Nothing in the documentation says that the speechRecognitionMetadata property on an SFSpeechRecognitionResult will be nil until isFinal, but that's the behaviour I'm seeing. I've stripped my class down to the following: private var isAuthed = false // I call this in a .task {} in my SwiftUI View public func requestSpeechRecognizerPermission() { SFSpeechRecognizer.requestAuthorization { authStatus in Task { self.isAuthed = authStatus == .authorized } } } public func transcribe(from url: URL) { guard isAuthed else { return } let locale = Locale(identifier: "en-US") let recognizer = SFSpeechRecognizer(locale: locale) let recognitionRequest = SFSpeechURLRecognitionRequest(url: url) // the behaviour occurs whether I set this to true or not, I recently set // it to true to see if it made a difference recognizer?.supportsOnDeviceRecognition = true recognitionRequest.shouldReportPartialResults = true recognitionRequest.addsPunctuation = true recognizer?.recognitionTask(with: recognitionRequest) { (result, error) in guard result != nil else { return } if result!.isFinal { //speechRecognitionMetadata is not nil } else { //speechRecognitionMetadata is nil } } } } Further, and this isn't documented either, the SFTranscriptionSegment values don't have correct timestamp and duration values until isFinal. The values aren't all zero, but they don't align with the timing in the audio and they change to accurate values when isFinal is true. The transcription otherwise "works", in that I get transcription text before isFinal and if I wait for isFinal the segments are correct and speechRecognitionMetadata is filled with values. The context here is I'm trying to generate a transcription that I can then highlight the spoken sections of as audio plays and I'm thinking I must be just trying to use the Speech framework in a way it does not work. I got my concept working if I pre-process the audio (i.e. run it through until isFinal and save the results I need to json), but being able to do even a rougher version of it 'on the fly' - which requires segments to have the right timestamp/duration before isFinal - is perhaps impossible?
1
0
164
Jul ’25
iOS 18 Picture-in-Picture dynamically changes UIScreen.main.bounds on iPhone 11 causing keyboard clipping and layout issues
When creating and activating AVPictureInPictureController on iPhone 11 running iOS 18.x, the main screen viewport unexpectedly changes from the native 375×812 logical points to 414×896 points (the size of a different device such as iPhone XR/11 Plus). Additionally, a separate UIWindow with zero CGRect is created. This causes UIKit layout calculations, especially related to the keyboard and safeAreaInsets, to malfunction and results in issues like keyboard clipping and general UI misalignment. The problem appears tied specifically to iOS 18+ behavior and does not manifest on earlier iOS versions or other devices consistently. Steps to Reproduce: Run the app on iPhone 11 with iOS 18.x (including the latest minor updates). Instantiate and start an AVPictureInPictureController during app runtime. Observe that UIScreen.main.bounds changes from 375×812 (native iPhone 11 size) to 414×896. Notice a secondary UIWindow appears with frame CGRectZero. Interact with text inputs that cause the keyboard to appear. The keyboard is clipped or partially obscured, causing UI usability issues. Layout elements dependent on UIScreen.main.bounds or safeAreaInsets behave incorrectly. Expected Behavior: UIScreen.main.bounds should not dynamically change upon PiP controller creation. The coordinate space and viewport should remain accurate to device native dimensions. The keyboard and UI layout should adjust properly with respect to the actual on-screen size and safe areas. Actual Behavior: UIScreen.main.bounds changes to 414×896 upon PiP controller creation on iPhone 11 iOS 18+. A zero-rect secondary UIWindow is created, impacting layout queries. Keyboard clipping and UI layout bugs occur. The app viewport and coordinate space are inconsistent with device hardware.
0
0
262
Jul ’25
WWDC25 Camera & Photos group lab summary (Part 1 of 3)
(Note: this is part 1 of a 3 part posting. See Part 2 or Part 3) At WWDC25 we launched a new type of Lab event for the developer community - Group Labs. A Group Lab is a panel Q&A designed for a large audience of developers. Group Labs are a unique opportunity for the community to submit questions directly to a panel of Apple engineers and designers. Here are the highlights from the WWDC25 Group Lab for Camera & Photos. WWDC25 Camera & Photos group lab ran for one hour at 6 PM PST on Tuesday June 10th, 2025 Introductory kick-off questions Question 1 Tell us a little about the new AVFoundation Capture APIs we've made available in the new iOS 26 developer preview? Cinematic Capture API (strong/weak focus, tracking focus)(scene monitoring)(simulated aperture)(dog/cat heads/groupIDs) Camera Controls and AirPod Stem Clicks Spatial Audio and Studio Quality AirPod Mics in Camera Lens Smudge Detection Exposure and Focus Rect of Interest Question 2 I built QR code scanning into my app, but on newer iPhones I have to hold the phone very far away from the QR code, otherwise the image is blurry and it doesn't scan. Why is this happening and how can I fix it? Every year, the cameras get better and better as we push the state of the art on iPhone photography and videography. This sometimes results in changes to the characteristics of the lenses. min focus distance newer phones have multiple lenses automatic switching behavior Use virtual device like the builtInDualWide or built in Triple, rather than just the builtInWide Set the videoZoomFactor to 2. You're done. Question 3 Last year, we saw some exciting new APIs introduced in AVFoundation in the health space. With Constant Color photography, developers can take pictures that have constant color regardless of ambient lighting. There are some further advancements this year. Davide, could you tell us about them? constant color photography is mean to remove the "tone mapping" applied to photograph captured with camera app, usually incldsuing artistic intent, and instead try to be a close as possible to the real color of the scene, regardless of the illumination constant color images could be captured in HEIF anf JPEG laste year. this year we are adding Support for the DICOM medical imaging photo format. It is a fomrat used by the health industry to store images related to medical subjects like MRI, skin problems, xray and so on. It's writable and also readable format on all OS26, supported through AVCapturePhotoOutput APIs and through the coregraphics api. for coregrapphics there is a new DICOM entry in the property dictionary which includes all the dicom availbale and defined propertie in a file. finder will also display all those in the info panel (Address why a developer would want to use it) - not for regualr picture taking apps. for those HEIF and JPEG are the preferred delivery format. use dicom if your app produces output that are health related, that you can also share with health providers or your doctors Main session developer questions Question 1 LiDAR vs. Dual Camera depth generation: Which resolution does the LiDAR sensor natively have (iPhone 16 Pro) and when to prefer LiDAR over Dual Camera? Both report formats with output resolutions (we don't advertise sensor resolution) Lidar vs Dual, etc: Lidar: Best for absolute depth, real world scale and computer vision Dual, etc: relative, disparity-based, less power, photo effects Also see: 2022 WWDC session "Discovery advancements in iOS camera capture: Depth, focus and multitasking" Question 2 Can true depth and lidar camera run at 60fps? Lidar can do 30fps (edited) Front true depth can do 60fps. Question 3 What’s the first class way to use PhotoKit to reimplement a high performance photo grid? We’ve been using a LazyVGrid and the photos caching manager, but are never able to hit the holy trinity (60hz, efficient memory footprint, minimal flashes of placeholder/empty cells) use the PHCachingImageManager to get media content delivered before you need to display it specify the size you need for grid sized display set the options PHVideoRequestOptionsDeliveryModeFastFormat, PHImageRequestOptionsDeliveryModeFastFormat and PHImageRequestOptionsResizeModeFast Question 4 For rending live preview of video stream, Is there performance overhead from using async and Swift UI for image updates vs UIViewRepresentable + AVCaptureVideoPreviewLayer.self? AVCaptureVideoPreviewLayer is the most efficient display path Use VDO + AVSampleBufferDisplayLayer if you need to modify the image data Swift UI image is optimized for static image content Question 5 Is there a way to configure the AVFoundation BuiltInLiDarDepthCamera mode to provide a depth map as accurate as ARKit at close range? The AVCaptureDepthDataOutput supports filtering that reduces noise and fills in invalid values. Consider using this for smoother depth maps Question 6 Pyramid-based photo editing in core image (such as adobe camera raw highlights and shadows)? First off you may want to look a the builtin filter called CIHighlightShadowAdjust Also the noise reduction in the CIRawFilter uses a pyramid-based algorithm. You can also write your own pyramid-based algorithms by taking an input image: down sample it by two multiply times using imageByApplyingAffineTransform apply additional CIKernels to each downsampled image as needed. use a custom CIKernel to combine the results. Question 7 Is the best way to integrate an in-app camera for a “non-camera” app UIImagePickerController? Yes, UIImagePickerController provides system-provided UI for capturing photos and movies. Question 8 Hello, my question is on Deferred Photo Processing? Say I have a photo capture app that adds a CIFilter to the capture. How can I take advantage of Deferred Photo Processing? Since I don’t know how to detect when the deferred captured photo is ready CIFilter can be called on the final at that point Photo will have to be re-inserted into the Photo library as adjustment Question 9 For shipping photo style assets in the app that need transparency what is the best format to use? JPEG2000? will moving to this save a lot of space comapred to PNG or other options? If you want lossless compression PNG is good and supports unpremutiplied alpha If you want lossy compression HEIF supports premutiplied or unpremutiplied alpha (Note: this is part 1 of a 3 part posting. See Part 2 or Part 3)
0
0
506
Jul ’25
Race issue - Custom AVAssetResourceLoaderDelegate cannot work with EXT-X-SESSION-KEY
TL;DR How to solve possible racing issue of EXT-X-SESSION-KEY request and encrypted media segment request? I'm having trouble using custom AVAssetResourceLoaderDelegate with video manifest containing VideoProtectionKey(VPK). My master manifest contains rendition manifest url and VPK url. When not using custom resource delegate, everything works fine. My custom resource delegate is implemented in way where it first append prefix to scheme of the master manifest url before creating the asset. And during handling master manifest, it puts back original scheme, make the request, modify the scheme for rendition manifest url in the response content by appending the same prefix again, so that rendition manifest request also goes into custom resource loader delegate. Same goes for VPK request. The AES-128 key is stored in memory within custom resource loader delegate object. So far so good. The VPK is requested before segment request. But the problem comes where the media segment requests happen. The media segment request url from rendition manifest goes into custom resource loader as well and those are encrypted. I can see segment request finish first then the related VPK requests kick in after a few seconds. The previous VPK value is cached in memory so it is not network causing the delay but some mechanism that I'm not aware of causing this. So could anyone tell me what would be the proper way of handling this situation? The native library is handling it well so I just want to know how. Thanks in advance!
2
0
158
Jun ’25
ffmpeg xcframework not working on Mac, but working correctly on iOS
I have an app (currently in development stage) which needs to use ffmpeg, so I tried searching how to embed ffmpeg in apple apps and found this article https://doc.qt.io/qt-6/qtmultimedia-building-ffmpeg-ios.html It is working correctly for iOS but not for macOS ( I have made changes macOS specific using chatgpt and traditional web searching) Drive link for the file and instructions which I'm following: https://drive.google.com/drive/folders/11wqlvb8SU2thMSfII4_Xm3Kc2fPSCZed?usp=share_link Please can someone from apple or in general help me to figure out what I'm doing wrong?
1
0
202
Jun ’25
Background recording app getting killed by watch dog.. how to avoid?
We have the necessary background recording entitlements, and for many users... do not run into any issues. However, there is a subset of users that routinely get recordings ending.. we have narrowed this down and believe it to be the work of the watch dog. First we removed the entire view hierarchy when app is backgrounded. There is just 'Text("Recording")' This got the CPU usage in profiler down to 0%. We saw massive improvements to recording success rate. We walked away assuming that was enough. However we are still seeing the same sort of crashes. All in the background. We're using Observation to drive audio state changes to a Live Activity. Are those Observations causing the problem? Why doesn't apple provide a better API to background audio? The internet is full of weird issues https://stackoverflow.com/questions/76010213/why-is-my-react-native-app-sometimes-terminated-in-the-background-while-tracking https://stackoverflow.com/questions/71656047/why-is-my-react-native-app-terminating-in-the-background-while-recording-ios-r https://github.com/expo/expo/issues/16807 This is such a terrible user experience. And we have very little visibility into what is happening and why. No where in apple documentation states that in order for background recording to work, the app can only be 'Text("Recording")' It does not outline a CPU or memory threshold. It just kills us.
2
0
426
Mar ’25
Best way to stream audio from file system
I am trying to stream audio from local filesystem. For that, I am trying to use an AVAssetResourceLoaderDelegate for an AVURLAsset. However, Content-Length is not known at the start. To overcome this, I tried several methods: Set content length as nil, in the AVAssetResourceLoadingContentInformationRequest Set content length to -1, in the ContentInformationRequest Both of these cause the AVPlayerItem to fail with an error. I also tried setting Content-Length as INT_MAX, and setting a renewalDate = Date(timeIntervalSinceNow: 5). However, that seems to be buggy. Even after updating the Content-Length to the correct value (e.g. X bytes) and finishing that loading request, the resource loader keeps getting requests with requestedOffset = X with dataRequest.requestsAllDataToEndOfResource = true. These requests keep coming indefinitely, and as a result it seems that the next item in the queue does not get played. Also, .AVPlayerItemDidPlayToEndTime notification does not get called. I wanted to check if this is an expected behavior or is there a bug in this implementation. Also, what is the recommended way to stream audio of unknown initial length from local file system? Thanks!
1
0
186
Mar ’25