Hi, i just wanna ask, Is it possible to run YOLOv3 on visionOS using the main camera to detect objects and show bounding boxes with labels in real-time? I’m wondering if camera access and custom models work for this, or if there’s a better way. Any tips?
General
RSS for tagExplore the power of machine learning within apps. Discuss integrating machine learning features, share best practices, and explore the possibilities for your app.
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
Hi, DataScannerViewController does't recognize currencies less than 1.00 (e.g. 0.59 USD, 0.99 EUR, etc.). Why? How to solve the problem?
This feature is not described in Apple documentation, is there a solution?
This is my code:
func makeUIViewController(context: Context) -> DataScannerViewController {
let dataScanner = DataScannerViewController(recognizedDataTypes: [ .text(textContentType: .currency)])
return dataScanner
}
Hi everyone! 👋
I'm working on a C++ project using TensorFlow Lite and was wondering if anyone has a prebuilt TensorFlow Lite C++ library (libtensorflowlite) for macOS (Apple Silicon M1/M2) that they’d be willing to share.
I’m looking specifically for the TensorFlow Lite C++ API — something that lets me use tflite::Interpreter, tflite::FlatBufferModel, etc. Building it from source using Bazel on macOS has been quite challenging and time-consuming, so a ready-to-use .dylib or .a build along with the required headers would be incredibly helpful.
TensorFlow Lite version: v2.18.0 preferred
Target: macOS arm64 (Apple Silicon)
What I need:
libtensorflowlite.dylib or .a
Corresponding headers (ideally organized in a clean include/ folder)
If you have one available or know where I can find a reliable prebuilt version, I’d be super grateful. Thanks in advance! 🙏
From tensorflow-metal example:
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: )
I know that Apple silicon uses UMA, and that memory copies are typical of CUDA, but wouldn't the GPU memory still be faster overall?
I have an iMac Pro with a Radeon Pro Vega 64 16 GB GPU and an Intel iMac with a Radeon Pro 5700 8 GB GPU.
But using tensorflow-metal is still WAY faster than using the CPUs. Thanks for that. I am surprised the 5700 is twice as fast as the Vega though.
When calling NLTagger.requestAssets with some languages, it hangs indefinitely both in the simulator and a device. This happens consistently for some languages like greek. An example call is NLTagger.requestAssets(for: .greek, tagScheme: .lemma). Other languages like french return immediately. I captured some logs from Console and found what looks like the repeated attempts to download the asset. I would expect the call to eventually terminate, either loading the asset or failing with an error.
Following WWDC24 video "Discover Swift enhancements in the Vision framework" recommendations (cfr video at 10'41"), I used the following code to perform multiple new iOS 18 `RecognizedTextRequest' in parallel.
Problem: if more than 2 request are run in parallel, the request will hang, leaving the app in a state where no more requests can be started. -> deadlock
I tried other ways to run the requests, but no matter the method employed, or what device I use: no more than 2 requests can ever be run in parallel.
func triggerDeadlock() {}
try await withThrowingTaskGroup(of: Void.self) { group in
// See: WWDC 2024 Discover Siwft enhancements in the Vision framework at 10:41
// ############## THIS IS KEY
let maxOCRTasks = 5 // On a real-device, if more than 2 RecognizeTextRequest are launched in parallel using tasks, the request hangs
// ############## THIS IS KEY
for idx in 0..<maxOCRTasks {
let url = ... // URL to some image
group.addTask {
// Perform OCR
let _ = await performOCRRequest(on: url: url)
}
}
var nextIndex = maxOCRTasks
for try await _ in group { // Wait for the result of the next child task that finished
if nextIndex < pageCount {
group.addTask {
let url = ... // URL to some image
// Perform OCR
let _ = await performOCRRequest(on: url: url)
}
nextIndex += 1
}
}
}
}
// MARK: - ASYNC/AWAIT version with iOS 18
@available(iOS 18, *)
func performOCRRequest(on url: URL) async throws -> [RecognizedText] {
// Create request
var request = RecognizeTextRequest() // Single request: no need for ImageRequestHandler
// Configure request
request.recognitionLevel = .accurate
request.automaticallyDetectsLanguage = true
request.usesLanguageCorrection = true
request.minimumTextHeightFraction = 0.016
// Perform request
let textObservations: [RecognizedTextObservation] = try await request.perform(on: url)
// Convert [RecognizedTextObservation] to [RecognizedText]
return textObservations.compactMap { observation in
observation.topCandidates(1).first
}
}
I also found this Swift forums post mentioning something very similar.
I also opened a feedback: FB17240843
How do I test the new RecognizeDocumentRequest API. Reference: https://www.youtube.com/watch?v=H-GCNsXdKzM
I am running Xcode Beta, however I only have one primary device that I cannot install beta software on.
Please provide a strategy for testing. Will simulator work?
The new capability is critical to my application, just what I need for structuring document scans and extraction.
Thank you.
I generate an array of random floats using the code shown below. However, I would like to do this with Double instead of Float. Are there any BNNS random number generators for double values, something like BNNSRandomFillUniformDouble? If not, is there a way I can convert BNNSNDArrayDescriptor from float to double?
import Accelerate
let n = 100_000_000
let result = Array<Float>(unsafeUninitializedCapacity: n) { buffer, initCount in
var descriptor = BNNSNDArrayDescriptor(data: buffer, shape: .vector(n))!
let randomGenerator = BNNSCreateRandomGenerator(BNNSRandomGeneratorMethodAES_CTR, nil)
BNNSRandomFillUniformFloat(randomGenerator, &descriptor, 0, 1)
initCount = n
}
Hey guys 👋
I’ve been thinking about a feature idea for iOS that could totally change the way we interact with apps like Twitter/X.
Imagine if we could define our own recommendation algorithm, and have an AI on the iPhone that replaces the suggested tweets in the feed with ones that match our personal interests — based on public tweets, and without hacking anything.
Kinda like a personalized "AI skin" over the app that curates content you actually care about. Feels like this would make content way more relevant and less algorithmically manipulative.
Would love to know what you all think — and if Apple could pull this off 🔥
Topic:
Machine Learning & AI
SubTopic:
General
Hello Apple Team,
Thank you for the recent Group Lab and for your continued work on advancing Xcode and developer tools.
I’d like to submit a feature request:
Are there any plans to introduce support for Agentic AI Mode (MCP protocol) in future versions of iOS or Xcode?
As developer tools evolve toward more intelligent and context-aware environments, the integration of agentic AI capabilities could significantly enhance productivity and unlock new creative workflows.
Looking forward to your consideration, and thank you again for the excellent session.
Best regards
Does CoreML object detection only support AABB (Axis-Aligned Bounding Boxes) or also OBB (Oriented Bounded Boxes)? If not, any way to do it using Apple frameworks?
Topic:
Machine Learning & AI
SubTopic:
General
I have a question. In China, long pressing a picture in the album can segment the target. Is this model a local model? Is there any information? Can developers use it?
Hey Devs,
I'm trying to create my own Real Time Text detection like this Apple project. https://developer.apple.com/documentation/vision/extracting-phone-numbers-from-text-in-images
I want to use the new iOS18 RecognizeTextRequest instead of the old VNRecognizeTextRequest in my SwiftUI project.
This is my delegate code with the camera setup. I removed region of interest for debugging but I'm trying to scan English words in books. The idea is to get one word in the ROI in the future. But I can't even get proper words so testing without ROI incase my math is wrong.
@Observable
class CameraManager: NSObject, AVCapturePhotoCaptureDelegate
...
override init() {
super.init()
setUpVisionRequest()
}
private func setUpVisionRequest() {
textRequest = RecognizeTextRequest(.revision3)
}
...
func setup() -> Bool {
captureSession.beginConfiguration()
guard
let captureDevice = AVCaptureDevice.default(
.builtInWideAngleCamera, for: .video, position: .back)
else {
return false
}
self.captureDevice = captureDevice
guard let deviceInput = try? AVCaptureDeviceInput(device: captureDevice)
else {
return false
}
/// Check whether the session can add input.
guard captureSession.canAddInput(deviceInput) else {
print("Unable to add device input to the capture session.")
return false
}
/// Add the input and output to session
captureSession.addInput(deviceInput)
/// Configure the video data output
videoDataOutput.setSampleBufferDelegate(
self, queue: videoDataOutputQueue)
if captureSession.canAddOutput(videoDataOutput) {
captureSession.addOutput(videoDataOutput)
videoDataOutput.connection(with: .video)?
.preferredVideoStabilizationMode = .off
} else {
return false
}
// Set zoom and autofocus to help focus on very small text
do {
try captureDevice.lockForConfiguration()
captureDevice.videoZoomFactor = 2
captureDevice.autoFocusRangeRestriction = .near
captureDevice.unlockForConfiguration()
} catch {
print("Could not set zoom level due to error: \(error)")
return false
}
captureSession.commitConfiguration()
// potential issue with background vs dispatchqueue ??
Task(priority: .background) {
captureSession.startRunning()
}
return true
}
}
// Issue here ???
extension CameraManager: AVCaptureVideoDataOutputSampleBufferDelegate {
func captureOutput(
_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer,
from connection: AVCaptureConnection
) {
guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
Task {
textRequest.recognitionLevel = .fast
textRequest.recognitionLanguages = [Locale.Language(identifier: "en-US")]
do {
let observations = try await textRequest.perform(on: pixelBuffer)
for observation in observations {
let recognizedText = observation.topCandidates(1).first
print("recognized text \(recognizedText)")
}
} catch {
print("Recognition error: \(error.localizedDescription)")
}
}
}
}
The results I get look like this ( full page of English from a any book)
recognized text Optional(RecognizedText(string: e bnUI W4, confidence: 0.5))
recognized text Optional(RecognizedText(string: ?'U, confidence: 0.3))
recognized text Optional(RecognizedText(string: traQt4, confidence: 0.3))
recognized text Optional(RecognizedText(string: li, confidence: 0.3))
recognized text Optional(RecognizedText(string: 15,1,#, confidence: 0.3))
recognized text Optional(RecognizedText(string: jllÈ, confidence: 0.3))
recognized text Optional(RecognizedText(string: vtrll, confidence: 0.3))
recognized text Optional(RecognizedText(string: 5,1,: 11, confidence: 0.5))
recognized text Optional(RecognizedText(string: 1141, confidence: 0.3))
recognized text Optional(RecognizedText(string: jllll ljiiilij41, confidence: 0.3))
recognized text Optional(RecognizedText(string: 2f4, confidence: 0.3))
recognized text Optional(RecognizedText(string: ktril, confidence: 0.3))
recognized text Optional(RecognizedText(string: ¥LLI, confidence: 0.3))
recognized text Optional(RecognizedText(string: 11[Itl,, confidence: 0.3))
recognized text Optional(RecognizedText(string: 'rtlÈ131, confidence: 0.3))
Even with ROI set to a specific rectangle Normalized to Vision, I get the same results with single characters returning gibberish.
Any help would be amazing thank you.
Am I using the buffer right ?
Am I using the new perform(on: CVPixelBuffer) right ?
Maybe I didn't set up my camera properly? I can provide code
Hi, recently i tried to fine-tune Gemma-2-2b mlx model on my macbook (24 GB UMA). The code started running, after few seconds i saw swap size reaching 50GB and ram around 23 GB and then it stopped. I ran the Gemma-2-2b (cuda) on colab, it ran and occupied 27 GB on A100 gpu and worked fine. Here i didn't experienced swap issue.
Now my question is if my UMA was more than 27 GB, i also would not have experienced swap disk issue.
Thanks.
Topic:
Machine Learning & AI
SubTopic:
General
It is vital for Apple to refine its OCR models to correctly distinguish between Khmer and Thai scripts. Incorrectly labeling Khmer text as Thai is more than a technical bug; it is a culturally insensitive error that impacts national identity, especially given the current geopolitical climate between Cambodia and Thailand. Implementing a more robust language-detection threshold would prevent these harmful misidentifications.
There is a significant logic flaw in the VNRecognizeTextRequest language detection when processing Khmer script. When the property automaticallyDetectsLanguage is set to true, the Vision framework frequently misidentifies Khmer characters as Thai.
While both scripts share historical roots, they are distinct languages with different alphabets. Currently, the model’s confidence threshold for distinguishing between these two scripts is too low, leading to incorrect OCR output in both developer-facing APIs and Apple’s native ecosystem (Preview, Live Text, and Photos).
import SwiftUI
import Vision
class TextExtractor {
func extractText(from data: Data, completion: @escaping (String) -> Void) {
let request = VNRecognizeTextRequest { (request, error) in
guard let observations = request.results as? [VNRecognizedTextObservation] else {
completion("No text found.")
return
}
let recognizedStrings = observations.compactMap { observation in
let str = observation.topCandidates(1).first?.string
return "{text: \(str!), confidence: \(observation.confidence)}"
}
completion(recognizedStrings.joined(separator: "\n"))
}
request.automaticallyDetectsLanguage = true // <-- This is the issue.
request.recognitionLevel = .accurate
let handler = VNImageRequestHandler(data: data, options: [:])
DispatchQueue.global(qos: .background).async {
do {
try handler.perform([request])
} catch {
completion("Failed to perform OCR: \(error.localizedDescription)")
}
}
}
}
Recognizing Khmer
Confidence Score is low for Khmer text. (The output is in Thai language with low confidence score)
Recognizing English
Confidence Score is high expected.
Recognizing Thai
Confidence Score is high as expected
Issues on Preview, Photos
Khmer text
Copied text
Kouk Pring Chroum Temple [19121 รอาสายสุกตีนานยารรีสใหิสรราภูชิตีนนสุฐตีย์ [รุก
เผือชิษาธอยกัตธ์ตายตราพาษชาณา ถวเชยาใบสราเบรถทีมูสินตราพาษชาณา ทีมูโษา เช็ก
อาษเชิษฐอารายสุกบดตพรธุรฯ ตากร"สุก"ผาตากรธกรธุกเยากสเผาพศฐตาสาย รัอรณาษ"ตีพย"
สเผาพกรกฐาภูชิสาเครๆผู:สุกรตีพาสเผาพสรอสายใผิตรรารตีพสๆ เดียอลายสุกตีน
ธาราชรติ ธิพรหณาะพูชุบละเาหLunet De Lajonquiere ผารูกรสาราพารผรผาสิตภพ ตารสิทูก ธิพิ
คุณที่นสายเระพบพเคเผาหนารเกะทรนภาษเราภุพเสารเราษทีเลิกสญาเราหรุฬารชสเกาก เรากุม
สงสอบานตรเราะากกต่ายภากายระตารุกเตียน
Recommended Solutions
1. Set a Threshold
Filter out the detected result where the threshold is less than or equal to 0.5, so that it would not output low quality text which can lead to the issue.
For example,
let recognizedStrings = observations.compactMap { observation in
if observation.confidence <= 0.5 {
return nil
}
let str = observation.topCandidates(1).first?.string
return "{text: \(str!), confidence: \(observation.confidence)}"
}
2. Add Khmer Language Support
This issue would never happen if the model has the capability to detect and recognize image with Khmer language.
Doc2Text GitHub: https://github.com/seanghay/Doc2Text-Swift
I have a series of shortcuts that I’ve written that use the “Use Model” action to do various things. For example, I have a shortcut “Clipboard Markdown to Notes” that takes the content of the clipboard, creates a new note in Notes, converts the markdown content to rich text, adds it to the note etc.
One key step is to analyze the markdown content with “Use Model” and generate a short descriptive title for the note.
I use the on-device model for this, but sometimes the content and prompt exceed the context window size and the action fails with an error message to that effect.
In that case, I’d like to either repeat the action using the Cloud model, or, if the error was a refusal, to prompt the user to enter a title to use.
I‘ve tried using an IF based on whether the response had any text in it, but that didn’t work. No matter what I’ve tried, I can’t seem to find a way to catch the error from Use Model, determine what the error was, and take appropriate action.
Is there a way to do this?
(And by the way, a huge ”thank you” to whoever had the idea of making AppIntents visible in Shortcuts and adding the Use Model action — has made a huge difference already, and it lets us see what Siri will be able to use as well.)
I’m trying to integrate Apple’s Translation framework in a Swift 6 project with Approachable Concurrency enabled.
I’m following the code here: https://developer.apple.com/documentation/translation/translating-text-within-your-app#Offer-a-custom-translation
And, specifically, inside the following code
.translationTask(configuration) { session in
do {
// Use the session the task provides to translate the text.
let response = try await session.translate(sourceText)
// Update the view with the translated result.
targetText = response.targetText
} catch {
// Handle any errors.
}
}
On the try await session.translate(…) line, the compiler complains that “Sending ‘session’ risks causing data races”.
Extended error message:
Sending main actor-isolated 'session' to @concurrent instance method 'translate' risks causing data races between @concurrent and main actor-isolated uses
I’ve downloaded Apple’s sample code (at the top of linked webpage), it compiles fine as-is on Xcode 26.4, but fails with the same error as soon as I switch the Swift Language Mode to Swift 6 in the project.
How can I fix this?
I'm implementing an LLM with Metal Performance Shader Graph, but encountered a very strange behavior, occasionally, the model will report an error message as this:
LLVM ERROR: SmallVector unable to grow. Requested capacity (9223372036854775808) is larger than maximum value for size type (4294967295)
and crash, the stack backtrace screenshot is attached. Note that 5th frame is
mlir::getIntValues<long long>
and 6th frame is
llvm::SmallVectorBase<unsigned int>::grow_pod
It looks like mlir mistakenly took a 64 bit value for a 32 bit type. Unfortunately, I could not found the source code of
mlir::getIntValues, maybe it's Apple's closed source fork of llvm for MPS implementation? Anyway, any opinion or suggestion on that?
Topic:
Machine Learning & AI
SubTopic:
General
In an under-development MacOS & iOS app, I need to identify various measurements from OCR'ed text: length, weight, counts per inch, area, percentage. The unit type (e.g. UnitLength) needs to be identified as well as the measurement's unit (e.g. .inches) in order to convert the measurement to the app's internal standard (e.g. centimetres), the value of which is stored the relevant CoreData entity.
The use of NLTagger and NLTokenizer is problematic because of the various representations of the measurements: e.g. "50g.", "50 g", "50 grams", "1 3/4 oz."
Currently, I use a bespoke algorithm based on String contains and step-wise evaluation of characters, which is reasonably accurate but requires frequent updating as further representations are detected.
I'm aware of the Python SpaCy model being capable of NER Measurement recognition, but am reluctant to incorporate a Python-based solution into a production app. (ref [https://developer.apple.com/forums/thread/30092])
My preference is for an open-source NER Measurement model that can be used as, or converted to, some form of a Swift compatible Machine Learning model. Does anyone know of such a model?
Incident Identifier: 4C22F586-71FB-4644-B823-A4B52D158057
CrashReporter Key: adc89b7506c09c2a6b3a9099cc85531bdaba9156
Hardware Model: Mac16,10
Process: PRISMLensCore [16561]
Path: /Applications/PRISMLens.app/Contents/Resources/app.asar.unpacked/node_modules/core-node/PRISMLensCore.app/PRISMLensCore
Identifier: com.prismlive.camstudio
Version: (null) ((null))
Code Type: ARM-64
Parent Process: ? [16560]
Date/Time: (null)
OS Version: macOS 15.4 (24E5228e)
Report Version: 104
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x00000000 at 0x0000000000000000
Crashed Thread: 34
Application Specific Information:
*** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '*** -[__NSArrayM insertObject:atIndex:]: object cannot be nil'
Thread 34 Crashed:
0 CoreFoundation 0x000000018ba4dde4 0x18b960000 + 974308 (__exceptionPreprocess + 164)
1 libobjc.A.dylib 0x000000018b512b60 0x18b4f8000 + 109408 (objc_exception_throw + 88)
2 CoreFoundation 0x000000018b97e69c 0x18b960000 + 124572 (-[__NSArrayM insertObject:atIndex:] + 1276)
3 Portrait 0x0000000257e16a94 0x257da3000 + 473748 (-[PTMSRResize addAdditionalOutput:] + 604)
4 Portrait 0x0000000257de91c0 0x257da3000 + 287168 (-[PTEffectRenderer initWithDescriptor:metalContext:useHighResNetwork:faceAttributesNetwork:humanDetections:prevTemporalState:asyncInitQueue:sharedResources:] + 6204)
5 Portrait 0x0000000257dab21c 0x257da3000 + 33308 (__33-[PTEffect updateEffectDelegate:]_block_invoke.241 + 164)
6 libdispatch.dylib 0x000000018b739b2c 0x18b738000 + 6956 (_dispatch_call_block_and_release + 32)
7 libdispatch.dylib 0x000000018b75385c 0x18b738000 + 112732 (_dispatch_client_callout + 16)
8 libdispatch.dylib 0x000000018b742350 0x18b738000 + 41808 (_dispatch_lane_serial_drain + 740)
9 libdispatch.dylib 0x000000018b742e2c 0x18b738000 + 44588 (_dispatch_lane_invoke + 388)
10 libdispatch.dylib 0x000000018b74d264 0x18b738000 + 86628 (_dispatch_root_queue_drain_deferred_wlh + 292)
11 libdispatch.dylib 0x000000018b74cae8 0x18b738000 + 84712 (_dispatch_workloop_worker_thread + 540)
12 libsystem_pthread.dylib 0x000000018b8ede64 0x18b8eb000 + 11876 (_pthread_wqthread + 292)
13 libsystem_pthread.dylib 0x000000018b8ecb74 0x18b8eb000 + 7028 (start_wqthread + 8)
Topic:
Machine Learning & AI
SubTopic:
General