AI in VisioForge .NET SDK
VisioForge AI support is implemented as ordinary Media Blocks. The same block classes can be used in a manual MediaBlocksPipeline, inserted into VideoCaptureCoreX, or inserted into MediaPlayerCoreX.
The AI packages do not replace the media engines. They add pass-through processing blocks: media continues downstream, optional overlays are drawn into the frame, and every block except BackgroundRemovalBlock raises its own event with recognition results.
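The pass-through contract can be modeled in a few lines. This is a language-neutral Python sketch, not the VisioForge API: Frame, PassThroughAIBlock, subscribe, and process are hypothetical names, the overlays list stands in for drawing into the video buffer, and the handler list stands in for a block event such as OnTextDetected.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Frame:
    """Stand-in for a video buffer; `overlays` models drawing into the frame."""
    index: int
    overlays: List[str] = field(default_factory=list)


class PassThroughAIBlock:
    """Hypothetical model of an AI block: analyze, optionally annotate, forward."""

    def __init__(self, analyze: Callable[[Frame], List[str]], draw_overlays: bool = True):
        self._analyze = analyze
        self._draw_overlays = draw_overlays
        self._handlers: List[Callable[[List[str]], Any]] = []

    def subscribe(self, handler: Callable[[List[str]], Any]) -> None:
        # Plays the role of a block event such as OnTextDetected.
        self._handlers.append(handler)

    def process(self, frame: Frame) -> Frame:
        results = self._analyze(frame)
        if self._draw_overlays:
            frame.overlays.extend(results)  # optional on-frame annotation
        for handler in self._handlers:
            handler(results)  # recognition results go to application code
        return frame  # the media itself continues downstream


# Usage: a fake "detector" that reports one label per frame.
block = PassThroughAIBlock(lambda f: [f"text@{f.index}"])
received: List[List[str]] = []
block.subscribe(received.append)
out = block.process(Frame(index=7))
print(out.overlays, received)
```

The key design point is that process returns the frame it was given: analysis results travel sideways through the event, never instead of the media.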
Why on-device AI
Every block on this page runs locally, in-process, on ONNX Runtime (video) or Whisper.net/GGML (speech) — there is no cloud API call, no per-request billing, and no network dependency at inference time. That matters for three common scenarios:
- Privacy and compliance — video and audio frames never leave the device, which simplifies GDPR/CCPA/BIPA reviews for camera and microphone applications (see the privacy note on face recognition specifically).
- Offline and edge deployments — kiosks, industrial cameras, vehicles, and field devices can run recognition without connectivity.
- Predictable cost and latency — throughput depends on the hardware you run on, not on a third-party API's rate limits or per-call pricing.
Each block accepts an OnnxExecutionProvider (Auto, CPU, CUDA, DirectML, CoreML) and a DeviceId, so the same code can run CPU-only in a CI pipeline and take advantage of an NVIDIA, DirectX 12, or Apple GPU on a deployed machine without a code change. Auto picks the best provider present in the loaded ONNX Runtime native build at run time.
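The idea behind Auto can be sketched as "walk a preference list and take the first provider this ONNX Runtime build reports". The provider identifier strings below are ONNX Runtime's real names, but the preference order and the fallback for an explicit provider that is missing are assumptions for illustration, not documented SDK behavior; resolve_provider is a hypothetical helper.

```python
from typing import Dict, List, Sequence

# ONNX Runtime execution-provider identifiers for the SDK's enum values.
ORT_NAMES: Dict[str, str] = {
    "CPU": "CPUExecutionProvider",
    "CUDA": "CUDAExecutionProvider",
    "DirectML": "DmlExecutionProvider",
    "CoreML": "CoreMLExecutionProvider",
}

# ASSUMPTION: preference order used for "Auto"; the SDK's real order may differ.
AUTO_ORDER: List[str] = ["CUDA", "DirectML", "CoreML", "CPU"]


def resolve_provider(requested: str, available: Sequence[str]) -> str:
    """Map a requested provider value to one present in the loaded runtime build."""
    if requested == "Auto":
        candidates: Sequence[str] = AUTO_ORDER
    elif requested in ORT_NAMES:
        # ASSUMPTION: an explicit provider that is missing falls back to CPU.
        candidates = [requested, "CPU"]
    else:
        raise ValueError(f"unknown provider: {requested}")
    for name in candidates:
        if ORT_NAMES[name] in available:
            return ORT_NAMES[name]
    return ORT_NAMES["CPU"]  # CPU is the last resort


print(resolve_provider("Auto", ["DmlExecutionProvider", "CPUExecutionProvider"]))
```

In a Python tool the available list would come from onnxruntime.get_available_providers(); a device index such as DeviceId corresponds to a provider option like the CUDA provider's device_id.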
Packages
| Package | Main purpose |
|---|---|
| VisioForge.DotNet.Core.AI | ONNX video AI: OCR, object detection, analytics, face recognition, license plates, background removal, and custom inference. |
| VisioForge.DotNet.Core.AI.Whisper | Local speech-to-text with Whisper ASR and Silero VAD. |
Both packages reference the core SDK types. Add the same native runtime packages that your host application already uses for Media Blocks, Video Capture X, or Media Player X.
Blocks
| Block | Media | Event | Typical use | Details |
|---|---|---|---|---|
| OcrBlock | Video | OnTextDetected | Recognize text regions with PaddleOCR models. | OCR |
| YOLOObjectDetectorBlock | Video | OnObjectsDetected | Run standalone object detection on each frame. | Object detection |
| ObjectAnalyticsBlock | Video | OnAnalyticsUpdated | Track objects over time, count line crossings, and monitor polygon zones. | Object analytics |
| FaceRecognitionBlock | Video | OnFacesIdentified | Detect faces and match them against an enrolled gallery. | Face recognition |
| LicensePlateRecognizerBlock | Video | OnPlateRecognized | Detect and read license plates. | License plate recognition |
| BackgroundRemovalBlock | Video | none | Replace, blur, or make the background transparent. | Background removal |
| OnnxInferenceBlock | Video | OnInference | Run a custom ONNX model and receive raw output tensors. | ONNX inference |
| SpeechToTextBlock | Audio | OnSpeechRecognized | Transcribe live or file audio with Whisper. | Speech-to-text |
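ObjectAnalyticsBlock's line crossings and polygon zones rest on two small geometric tests. The sketch below is not the SDK's implementation: it assumes you already have per-track centroids (for example from a tracker such as ByteTrack), and the names TripwireCounter, crossing_direction, and in_zone are hypothetical. It shows a segment-crossing test with a direction sign, and even-odd ray casting for zone membership.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def side(a: Point, b: Point, p: Point) -> float:
    """2D cross product: > 0 if p is left of a->b, < 0 if right, 0 if collinear."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def crossing_direction(a: Point, b: Point, prev: Point, curr: Point) -> int:
    """+1 or -1 if the move prev->curr crosses tripwire segment a-b, else 0."""
    s_prev, s_curr = side(a, b, prev), side(a, b, curr)
    if s_prev * s_curr >= 0:
        return 0  # stayed on one side (or only touched the line)
    if side(prev, curr, a) * side(prev, curr, b) > 0:
        return 0  # crossed the infinite line, but outside the tripwire segment
    return 1 if s_curr > 0 else -1


def in_zone(polygon: List[Point], p: Point) -> bool:
    """Even-odd ray casting: toggle on each edge a horizontal ray from p crosses."""
    x, y = p
    inside = False
    for i in range(len(polygon)):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # edge straddles the ray, so y2 != y1 here
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


class TripwireCounter:
    """Counts crossings per tracked object by comparing with its previous centroid."""

    def __init__(self, a: Point, b: Point):
        self.a, self.b = a, b
        self.last: Dict[int, Point] = {}
        self.forward = 0
        self.backward = 0

    def update(self, track_id: int, centroid: Point) -> None:
        prev = self.last.get(track_id)
        self.last[track_id] = centroid
        if prev is None:
            return  # first sighting: nothing to compare against
        d = crossing_direction(self.a, self.b, prev, centroid)
        if d > 0:
            self.forward += 1
        elif d < 0:
            self.backward += 1


# Track 1 crosses the wire; track 2 passes beside its end and is not counted.
wire = TripwireCounter((0, 5), (10, 5))
for track_id, pt in [(1, (5, 0)), (1, (5, 10)), (2, (15, 0)), (2, (15, 10))]:
    wire.update(track_id, pt)
print(wire.forward, wire.backward)
```

Counting per track ID rather than per detection is what turns per-frame boxes into crossing counts: without a tracker there is no "previous position" to compare against.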
Choosing the right AI block
- Need to read text in a frame (signage, documents, screens)? Use OcrBlock.
- Need to read a specific vehicle license plate, not general text? Use LicensePlateRecognizerBlock. It runs a dedicated plate detector plus a plate-specific OCR head, which is both more accurate and faster than pointing generic OCR at a whole scene.
- Need boxes and labels for objects, one frame at a time? Use YOLOObjectDetectorBlock.
- Need to count people or vehicles crossing a line, or track dwell time in a zone, rather than just per-frame boxes? Use ObjectAnalyticsBlock. It adds ByteTrack tracking, tripwires, and polygon zones on top of the same detector families.
- Need to know who is in frame, not just that a person is in frame? Use FaceRecognitionBlock.
- Need a virtual background, blur, or transparent output for a call or stream? Use BackgroundRemovalBlock.
- Have a custom ONNX model that isn't one of the built-in detector or matting families? Use OnnxInferenceBlock and handle the post-processing yourself.
- Need a transcript, live captions, or SRT/VTT subtitles from audio? Use SpeechToTextBlock.
Supported integration paths
Use a manual Media Blocks pipeline when you need full topology control:
```mermaid
flowchart LR
    Source["Source block"] --> AI["AI processing block"]
    AI --> Sink["Renderer or output block"]
    AI -. "event" .-> App["Application code"]
```

Use VideoCaptureCoreX when the application already uses the high-level capture engine and only needs to insert one or more AI blocks into the capture graph. Register video or audio blocks before StartAsync.
Use MediaPlayerCoreX when the application already uses the high-level playback engine. Register video or audio blocks before OpenAsync / PlayAsync.
Lifecycle rules
AI blocks must be registered before the engine builds the pipeline:
- VideoCaptureCoreX: add blocks before StartAsync.
- MediaPlayerCoreX: add blocks before OpenAsync / PlayAsync.
- Manual Media Blocks: connect the block before StartAsync.
After the pipeline starts, it owns every block instance wired into it and disposes those instances when the session stops. Create a fresh block instance for the next capture or playback session.
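The lifecycle rules can be modeled as a tiny state machine. FakeEngine and FakeAIBlock are hypothetical stand-ins, not SDK types: add_block plays the role of registering a block before StartAsync or OpenAsync, and stop models the pipeline disposing the blocks it owns, which is why a second session needs a new instance.

```python
from typing import List


class FakeAIBlock:
    """Hypothetical block that remembers whether a session has disposed it."""

    def __init__(self, name: str):
        self.name = name
        self.disposed = False

    def dispose(self) -> None:
        self.disposed = True


class FakeEngine:
    """Models the rules: register before start; the engine disposes blocks on stop."""

    def __init__(self) -> None:
        self._blocks: List[FakeAIBlock] = []
        self.running = False

    def add_block(self, block: FakeAIBlock) -> None:
        if self.running:
            raise RuntimeError("register AI blocks before the pipeline is built")
        if block.disposed:
            raise RuntimeError(f"{block.name} belongs to a finished session; create a new instance")
        self._blocks.append(block)

    def start(self) -> None:
        self.running = True  # a real engine builds the pipeline graph at this point

    def stop(self) -> None:
        for block in self._blocks:
            block.dispose()  # the pipeline owns wired blocks and releases them
        self._blocks.clear()
        self.running = False


# One session per block instance.
engine = FakeEngine()
ocr = FakeAIBlock("ocr")
engine.add_block(ocr)
engine.start()
engine.stop()
engine.add_block(FakeAIBlock("ocr"))  # fresh instance for the next session
```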
Block events are raised from pipeline or block worker threads. Keep handlers short and marshal UI updates to the UI dispatcher or main thread.
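Because handlers run on worker threads, a common pattern is to post results to a thread-safe queue and apply them on the UI thread, the same idea as Dispatcher.BeginInvoke in WPF or MainThread.BeginInvokeOnMainThread in .NET MAUI. This Python sketch models that hand-off with queue.Queue; UiMailbox, on_objects_detected, and fake_pipeline_worker are hypothetical names, and the worker thread merely stands in for a pipeline thread raising an event such as OnObjectsDetected.

```python
import queue
import threading
from typing import Any, Dict, List


class UiMailbox:
    """Thread-safe hand-off from pipeline worker threads to the UI thread."""

    def __init__(self) -> None:
        self._q: "queue.Queue[Any]" = queue.Queue()

    def post(self, item: Any) -> None:
        # Called from worker threads: cheap and non-blocking for an unbounded queue.
        self._q.put(item)

    def drain(self) -> List[Any]:
        # Called periodically on the UI/main thread (for example from a timer).
        items: List[Any] = []
        while True:
            try:
                items.append(self._q.get_nowait())
            except queue.Empty:
                return items


mailbox = UiMailbox()


def on_objects_detected(results: Dict[str, int]) -> None:
    # Event handler: runs on a worker thread, so it only hands the data off.
    mailbox.post(results)


def fake_pipeline_worker(frames: int) -> None:
    for i in range(frames):
        on_objects_detected({"frame": i, "count": i * 2})


worker = threading.Thread(target=fake_pipeline_worker, args=(3,))
worker.start()
worker.join()
for update in mailbox.drain():  # the UI thread applies updates in arrival order
    print(update)
```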
More detail
Video AI blocks (VisioForge.DotNet.Core.AI):
- OCR — text recognition
- Object detection
- Object analytics — tracking, tripwires, and polygon zones
- Face recognition
- License plate recognition (ANPR)
- Background removal (matting)
- Generic ONNX inference
Speech-to-text (VisioForge.DotNet.Core.AI.Whisper):
Engine integration:
Frequently Asked Questions
Do the AI blocks require an internet connection to run?
No. Inference is entirely local, using ONNX Runtime (video blocks) or Whisper.net/GGML (SpeechToTextBlock). No frame or audio sample is sent to a cloud service at inference time.
Which platforms do the AI blocks support?
The AI blocks run on the same platforms as Media Blocks pipelines, VideoCaptureCoreX, and MediaPlayerCoreX: Windows, macOS, Linux, Android, and iOS.
Do I need a GPU?
No. Every block defaults to OnnxExecutionProvider.Auto, which runs on the CPU when no GPU provider is available. Setting Provider to CUDA, DirectML, or CoreML accelerates inference when the corresponding GPU and ONNX Runtime build are present.
Where do I get the ONNX and Whisper model files?
Model weights are not shipped inside the VisioForge.DotNet.Core.AI / VisioForge.DotNet.Core.AI.Whisper NuGet packages. Your application supplies the .onnx / .bin files — point the block's settings at a local path. The SDK's own demos download the models they use from GitHub Releases on first run and cache them locally.
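The demos' download-on-first-run behavior can be approximated with a small cache helper. This is a sketch under assumptions, not the demos' code: ensure_model, CACHE_DIR, and MODEL_URL are hypothetical placeholders, and the fetch parameter lets you swap in your own downloader (or a fake one in tests).

```python
import os
import shutil
import tempfile
import urllib.request
from typing import Callable, Optional


def _http_download(url: str, dest: str) -> None:
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)


def ensure_model(path: str, url: str,
                 fetch: Optional[Callable[[str, str], None]] = None) -> str:
    """Return a local model path, downloading the file once if it is not cached."""
    if os.path.isfile(path) and os.path.getsize(path) > 0:
        return path  # cache hit: no network access at all
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    tmp = path + ".part"
    (fetch or _http_download)(url, tmp)
    os.replace(tmp, path)  # rename only after the download completed
    return path


# Hypothetical cache location and URL; substitute the model your app actually uses.
CACHE_DIR = os.path.join(tempfile.gettempdir(), "visioforge-models")
MODEL_URL = "https://example.com/models/detector.onnx"
# model_path = ensure_model(os.path.join(CACHE_DIR, "detector.onnx"), MODEL_URL)
```

Writing to a .part file and renaming afterwards means an interrupted download is never mistaken for a valid cached model on the next run.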
What license applies to the models the demos use?
It varies by model family and is independent of the SDK's own license — see the "Models and licensing" section on each block's page (OCR, object detection, face recognition, background removal). In short: PP-OCR, YOLOX, RT-DETR, YuNet, SFace, and the FastALPR ANPR models are Apache-2.0/MIT; stock Ultralytics YOLOv8 weights are AGPL-3.0 and need a commercial Ultralytics license in a closed-source product; Whisper GGML weights are MIT.
Can I run more than one AI block in the same pipeline?
Yes. Chain multiple video blocks (for example OcrBlock then BackgroundRemovalBlock) by connecting Output to Input in sequence, or register several video/audio blocks on VideoCaptureCoreX/MediaPlayerCoreX with Video_Processing_AddBlock/Audio_Processing_AddBlock. Each block adds its own inference cost to the pipeline, so measure end-to-end performance on your target hardware when combining several.
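To see how each chained block adds to per-frame cost, you can time stages individually. This sketch assumes stand-in callables in place of OcrBlock and BackgroundRemovalBlock (run_chain and report are hypothetical helpers) and measures serial latency on a single thread only, so it complements rather than replaces measuring the real pipeline on the target hardware.

```python
import statistics
import time
from typing import Any, Callable, Dict, List, Sequence, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def run_chain(stages: Sequence[Stage], frame: Any,
              timings: Dict[str, List[float]]) -> Any:
    """Push one frame through the stages in order, recording seconds per stage."""
    for name, process in stages:
        start = time.perf_counter()
        frame = process(frame)  # each stage's output feeds the next stage's input
        timings.setdefault(name, []).append(time.perf_counter() - start)
    return frame


def report(timings: Dict[str, List[float]]) -> Dict[str, float]:
    """Mean milliseconds per stage, plus their sum as a per-frame total."""
    means = {name: statistics.mean(samples) * 1000 for name, samples in timings.items()}
    means["total"] = sum(means.values())
    return means


# Stand-ins for two chained AI blocks; real blocks would process video frames.
stages: List[Stage] = [
    ("ocr", lambda f: f + ["ocr"]),
    ("background_removal", lambda f: f + ["bg"]),
]
timings: Dict[str, List[float]] = {}
for _ in range(100):
    out = run_chain(stages, [], timings)
print(out, report(timings))
```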