Core Image for depth maps & segmentation masks: numeric fidelity issues when rendering CIImage to CVPixelBuffer (looking for Architecture suggestions)

Hello All,

I’m working on a computer-vision–heavy iOS application that uses the camera, LiDAR depth maps, and semantic segmentation to reason about the environment (object identification, localization, and measurement, not just visualization).

Current architecture

I initially built the image pipeline around CIImage as a unifying abstraction. It seemed like a good idea because:

  • CIImage integrates cleanly with Vision, ARKit, AVFoundation, Metal, Core Graphics, etc.
  • It provides a rich set of out-of-the-box transforms and filters.
  • It is immutable and thread-safe, which significantly simplified concurrency in a multi-queue pipeline.

The LiDAR depth maps, semantic segmentation masks, etc. were treated as CIImages, with conversion to CVPixelBuffer or MTLTexture only at the edges when required.

Problem

I’ve run into cases where Core Image transformations do not preserve numeric fidelity for non-visual data.

Example:

Rendering a CIImage-backed segmentation mask into a larger CVPixelBuffer can cause label values to change in predictable but incorrect ways.

This occurs even when:

  • using nearest-neighbor sampling
  • disabling color management (workingColorSpace / outputColorSpace = NSNull)
  • applying identity or simple affine transforms
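For reference, the render path I'm using looks roughly like this (simplified sketch; buffer creation, sizing, and error handling omitted):

```swift
import CoreImage
import CoreVideo

// Context with color management disabled — values should pass through untouched.
let context = CIContext(options: [
    .workingColorSpace: NSNull(),
    .outputColorSpace: NSNull()
])

func render(mask: CIImage, into target: CVPixelBuffer) {
    // Nearest-neighbor sampling to avoid blending neighboring label values.
    let expanded = mask
        .samplingNearest()
        .transformed(by: CGAffineTransform(scaleX: 2, y: 2)) // expand toward target size
    context.render(expanded, to: target)
    // Despite the settings above, label values read back from `target` can
    // still differ from the source when the render target is enlarged.
}
```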

I’ve confirmed via controlled tests that:

  • Metal → CVPixelBuffer paths preserve values correctly
  • CIImage → CVPixelBuffer paths can introduce value changes when resampling or expanding the render target

This makes CIImage unsafe as a source of numeric truth for segmentation masks and depth-based logic, even though it works well for visualization, and I should have realized this much sooner.

Direction I’m considering

I’m now considering refactoring toward more intent-based abstractions instead of a single image type, for example:

  • Visual images: CIImage (camera frames, overlays, debugging, UI)
  • Scalar fields: depth / confidence maps backed by CVPixelBuffer + Metal
  • Label maps: segmentation masks backed by integer-preserving buffers (no interpolation, no transforms)
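As a rough sketch of what I mean by intent-based types (all names here are placeholders, not an existing API):

```swift
import CoreImage
import CoreVideo

/// Visual data: safe to filter, color-manage, and resample freely.
struct VisualImage {
    let image: CIImage
}

/// Continuous scalar data (depth, confidence): float pixel buffers,
/// processed only via Metal or vImage, never through CI filters.
struct ScalarField {
    let buffer: CVPixelBuffer   // e.g. kCVPixelFormatType_DepthFloat32
}

/// Categorical data (segmentation labels): integer-preserving buffers;
/// the type itself exposes no interpolating or transforming operations.
struct LabelMap {
    let buffer: CVPixelBuffer   // e.g. kCVPixelFormatType_OneComponent8
}
```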

In this model, CIImage would still be used extensively — but primarily for visualization and perceptual processing, not as the container for numerically sensitive data.

Thread safety concern

One of the original advantages of CIImage, and my biggest incentive for adopting it, was that it is thread-safe by design.

For CVPixelBuffer / MTLTexture–backed data, I’m considering enforcing thread safety explicitly via:

  • Swift Concurrency (actor-owned data, explicit ownership)
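Something along these lines (a hypothetical sketch for a single-channel 8-bit label buffer):

```swift
import CoreVideo

/// Owns the mutable pixel-buffer state; all reads and writes are
/// serialized through the actor, so no explicit locks are needed.
actor LabelMapStore {
    private var buffer: CVPixelBuffer?

    func update(_ newBuffer: CVPixelBuffer) {
        buffer = newBuffer
    }

    /// Reads a single label value; returns nil if no buffer is set.
    /// Assumes a one-component 8-bit pixel format.
    func label(atX x: Int, y: Int) -> UInt8? {
        guard let buffer else { return nil }
        CVPixelBufferLockBaseAddress(buffer, .readOnly)
        defer { CVPixelBufferUnlockBaseAddress(buffer, .readOnly) }
        guard let base = CVPixelBufferGetBaseAddress(buffer) else { return nil }
        let bytesPerRow = CVPixelBufferGetBytesPerRow(buffer)
        return base.assumingMemoryBound(to: UInt8.self)[y * bytesPerRow + x]
    }
}
```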

Questions

For those who have experience with CV / AR / imaging-heavy iOS apps, I'd love to know the following:

  • Is this separation of image intent (visual vs numeric vs categorical) a reasonable architectural direction?
  • Do you generally keep CIImage at the heart of your pipeline, or push it to the edges (visualization only)?
  • How do you manage thread safety and ownership when working heavily with CVPixelBuffer and Metal? Do you use actor-based abstractions, GCD, or ad-hoc locking?
  • Are there any best practices or gotchas around using Core Image with depth maps or segmentation masks that I should be aware of?

I’d really appreciate any guidance or experience-based advice. I suspect I’ve hit a boundary of Core Image’s design, and I’m trying to refactor in a way that avoids too much immediate tech debt and remains robust and maintainable long-term.

Thank you in advance!

What is the pixel format of the CVPixelBuffer in question?

The problem might be this:

Core Image uses 16-bit float RGBA as the default working format. That means that, whenever it needs an intermediate buffer for the rendering, it will create a 4-channel 16-bit float surface to render into. This also means that your 1-channel unsigned integer values will automatically be mapped to float values in 0.0...1.0. That's probably where you lose precision.

There are a few options to circumvent this:

  1. You could set the workingFormat context option to .L8 or .R8. However, this means all intermediate buffers will have that format. If you want to mix processing of the segmentation mask with other images, this won't work. If you only want to process the mask separately, you can set up a separate CIContext with this option. Note, however, that most built-in CIFilters assume a floating-point working format and might not perform well with this format.
  2. You can process your segmentation map with Metal (as you suggested) as part of your CIFilter pipeline using a CIImageProcessorKernel. For the kernel, you can set the formatForInput(...) and the outputFormat to .R8. This should tell CI that it doesn't need to convert the segmentation mask before passing it to your processor kernel. In the process method, you can access the input's Metal texture and perform custom Metal processing with it, rendering into the output's texture (which is then also R8 format). This way, you won't lose any precision.
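For option 1, the dedicated context could be set up like this (sketch):

```swift
import CoreImage

// A separate context used only for label-map rendering; all intermediate
// buffers stay single-channel 8-bit, so values are never round-tripped
// through the default float RGBA working format.
let maskContext = CIContext(options: [
    .workingFormat: CIFormat.R8,
    .workingColorSpace: NSNull(),
    .outputColorSpace: NSNull()
])
```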

I think the second option is the best choice here as you get the best of both worlds (custom Metal processing + CI integration).
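A minimal skeleton of the second option might look like this (untested sketch; the actual Metal encoding inside `process` is app-specific and shown only as a placeholder):

```swift
import CoreImage
import Metal

/// Declares R8 on both sides of the kernel, so Core Image never converts
/// the segmentation mask to its float RGBA working format.
final class MaskProcessorKernel: CIImageProcessorKernel {
    override class func formatForInput(at input: Int32) -> CIFormat { .R8 }
    override class var outputFormat: CIFormat { .R8 }

    override class func process(with inputs: [CIImageProcessorInput]?,
                                arguments: [String: Any]?,
                                output: CIImageProcessorOutput) throws {
        guard let input = inputs?.first,
              let inTexture = input.metalTexture,
              let outTexture = output.metalTexture,
              let commandBuffer = output.metalCommandBuffer else { return }
        // Encode custom Metal work here, e.g. a compute pass that copies or
        // relabels inTexture into outTexture without any resampling.
        _ = (inTexture, outTexture, commandBuffer) // placeholder for real encoding
    }
}

// Usage: apply the kernel to the mask within an existing CI pipeline.
// let processed = try MaskProcessorKernel.apply(withExtent: mask.extent,
//                                              inputs: [mask],
//                                              arguments: nil)
```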

Tip: You can always use CI_PRINT_TREE to check the format of the intermediate buffers CI is using during rendering.
