Computer Vision · AI & Automation

Software that sees.
Pixels to structure.

Computer vision that looks at an image and returns something you can act on — boxes around the objects, text lifted off the page, regions masked out, and clean structured fields. Specialised detectors find it, Claude multimodal reads it, and every release is measured against an accuracy target before it ships.

Images / day: 12,000
Detection accuracy: 97%
Manual review cut: −81%
Images processed: 120M
Median time-to-result: 0.4s
02 — Outcomes

Vision that earned trust.

A ledger of named systems where the line that moved was a real detection, a field read off a page, a defect caught — accuracy, review time, throughput, errors avoided. 6 of 30 shown · ledger updates as systems scale.

Northwind Docs
Document extraction · Invoices + forms
OCR plus Claude multimodal lifts line items, totals, and dates into clean JSON — keyed entry retired, exceptions routed to a human
−81% · Manual keying
Cobalt Manufacturing
Defect detection · Line inspection
A YOLO detector flags scratches and misalignments on the line in real time, boxing each defect with a confidence score the QA team trusts
97% · Defect recall
Vera Retail
Shelf analytics · Planogram audit
Shelf photos detected and counted per SKU, out-of-stock and misplacement flagged against the planogram — store walks replaced by a feed
3.4× · Audit coverage
Lumen Identity
ID verification · KYC onboarding
Document detection, MRZ and field OCR, and tamper checks gate sign-up — low-confidence captures abstain and ask for a re-take, never guess
−67% · Onboarding drop-off
Drift Insurance
Claims triage · Damage photos
Segment-and-assess on claim photos masks the damaged region, estimates severity, and routes the obvious cases straight to fast-track payout
−74% · Time-to-estimate
Forge Logistics
Yard & dock · Container reads
Camera-based container and license-plate detection plus OCR logs every move automatically, eval-checked weekly so accuracy drift is caught early
+58% · Reads automated
03 — The vision, live

The model doesn't guess.
It detects.

Point it at a scene and watch the pipeline run: a detector boxes every object it finds, the confidence threshold decides which survive, and the task toggle switches the job — detect objects, read text off the page, or segment regions. Turn on structure and the boxes become clean, typed fields you can store.

Vision stage · batch · 12,000 imgs/day
Detect → extract → structure
Legend: detection box · above threshold · below threshold
Detect every object in the scene and box it with a confidence score.
Confidence threshold: 0.45
Raise the threshold and noisy, low-confidence boxes drop away — precision climbs, but push too far and you miss real objects. That tradeoff is the whole job.
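The tradeoff can be sketched in a few lines of Python. This is an illustration only: the detections are toy data, each one already marked as matching a labelled object or not, and the helper name `precision_recall` is hypothetical. Returning a precision of 1.0 when nothing survives the threshold is a convention chosen here, not a rule.

```python
from typing import List, Tuple

# Each detection: (confidence score, whether it matched a real labelled object).
# Toy data for illustration; real values come from matching boxes to ground truth.
Detection = Tuple[float, bool]


def precision_recall(
    dets: List[Detection], n_ground_truth: int, threshold: float
) -> Tuple[float, float]:
    """Precision and recall of the detections that survive the threshold."""
    kept = [is_tp for conf, is_tp in dets if conf >= threshold]
    tp = sum(kept)
    # Convention assumed here: with nothing kept, precision is reported as 1.0.
    precision = tp / len(kept) if kept else 1.0
    recall = tp / n_ground_truth if n_ground_truth else 0.0
    return precision, recall


detections = [
    (0.95, True), (0.90, True), (0.80, True), (0.60, False),
    (0.55, True), (0.40, False), (0.30, False), (0.20, True),
]
N_OBJECTS = 5  # labelled objects in the toy scene

for t in (0.25, 0.45, 0.85):
    p, r = precision_recall(detections, N_OBJECTS, t)
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy scene, moving the threshold from 0.25 to 0.45 drops two false positives without losing a real object. Moving it to 0.85 leaves only correct boxes but misses three of the five objects.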
04 — Anatomy of the pipeline

Built like a pipeline,
not a prompt.

Quality hides in the stages between the pixels and the field — how you pre-process, which detector you run, whether you OCR or segment, how you turn boxes into typed structure. This is the room we work in: each stage measured, each model chosen for the job it's best at.

Vision pipeline · Northwind Document Extraction
Volume 12k imgs/day · Accuracy 97% · p95 0.4s
Stage | Model | Role | Signal
Ingest | S3 · ffmpeg frames | Pull images and video frames, normalise format and EXIF orientation | queued
Pre-process | OpenCV | Resize, denoise, deskew, colour-correct so the detector sees clean input | cleaned
Detect | YOLO · Detectron2 | Box every object with a class label and a confidence score | boxes
Segment | Segment Anything (SAM) | Mask precise regions when a box isn't enough: damage, fields, instances | masked
OCR | PaddleOCR · Tesseract | Read text out of detected regions: line items, codes, MRZ, labels | read
Extract | Claude Opus 4.8 · multimodal | Understand the crop and return typed, structured fields as JSON you can store | structured
Serve | ONNX · Triton · FastAPI | Run optimised models on GPU, batched, behind a low-latency API | 0.4s
Evaluate | mAP · Roboflow · LLM-as-judge | Score precision, recall & mAP on a frozen set every release | 97%
green: measured & in target
live: the stage running in the demo above
amber: watch · below accuracy target
05 — Ship to production

Scope the task.

Before a single model runs, we pin down what counts as a correct detection, where the edge cases hide, and what the output has to look like downstream. Then we build an evaluation set — labelled images we'll grade every release against.

/ Week 00 · Scope & eval set
Task: Detect, OCR, or segment, and exactly what the structured output must contain
Edge cases: Glare, occlusion, skew, low light, rare classes; the images that break naive models
Eval set: Labelled images with ground-truth boxes & fields, frozen for grading
Target: mAP ≥ 0.95, abstain & route to human when confidence is low

Capture & label.

Vision lives or dies on data. We gather real images from the real environment — the actual cameras, lighting, and angles — then label them carefully. Garbage labels train garbage detectors; this is where most vision projects quietly fail.

/ Week 01 · Data & labels
Capture: Real cameras · lighting · angles · the messy field
Label: Boxes · masks · fields · Roboflow / CVAT
Augment: Flip · crop · jitter · synthetic edge cases
Balance: Rare classes upsampled so they aren't ignored
Split: Train / val / frozen test · no leakage
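One common way to keep a split free of leakage is to assign whole capture groups, not individual frames, to a split. Near-duplicate frames from the same camera session then never straddle train and test. The sketch below assumes each image carries a group identifier; the names `assign_split`, the `cam…-session…` ids, and the 10% / 10% proportions are hypothetical.

```python
import hashlib
from collections import defaultdict


def assign_split(group_id: str, val_pct: int = 10, test_pct: int = 10) -> str:
    """Map a capture group (e.g. one camera session) to a split, deterministically."""
    digest = hashlib.sha256(group_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0..99 for this group
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + val_pct:
        return "val"
    return "train"


# Hypothetical image records: (filename, capture group).
images = [
    (f"cam{c}_s{s}_f{i}.jpg", f"cam{c}-session{s}")
    for c in range(3)
    for s in range(20)
    for i in range(4)
]

splits = defaultdict(list)
for filename, group in images:
    # Every frame inherits the split of its group, so sessions never straddle splits.
    splits[assign_split(group)].append(filename)

for name in ("train", "val", "test"):
    print(name, len(splits[name]))
```

Because the assignment depends only on the group id, re-running it on a larger dataset leaves existing images in the same split, which keeps the test set frozen as new data arrives.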

Detect & segment.

Pick the right detector for the job — YOLO for fast real-time boxes, Detectron2 for accuracy, Segment Anything when a box isn't precise enough. Fine-tune on your labels so it learns your classes, not a generic benchmark's.

/ Week 02 · Models
Detector · YOLO real-time / Detectron2 accuracy
Segmentation · Segment Anything for precise masks
Fine-tuned on your labelled classes
Confidence threshold tuned against the eval set
NMS & class balance verified on the frozen test split
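For readers unfamiliar with NMS (non-maximum suppression), here is a minimal, class-agnostic greedy version in plain Python. It is a teaching sketch, not the implementation inside YOLO or Detectron2, which ship their own optimised routines. The 0.5 IoU threshold is an assumed default.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Return indices of boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with a stronger kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: the near-duplicate second box is suppressed
```

Verifying NMS on the frozen test split mostly means checking that the IoU threshold neither merges distinct neighbouring objects nor leaves duplicate boxes on a single object.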

Read & structure.

Boxes and masks aren't the answer; structure is. OCR reads text out of the regions, then Claude multimodal understands the crop and returns typed, validated fields. When the image doesn't support a clean read, the pipeline returns an honest "low confidence, send to a human" instead of a guess.

/ Week 03 · Extract & ground
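A sketch of what "typed, validated fields or abstain" might look like in code. It assumes the upstream OCR and extraction step hands back a raw string and a confidence per field. The `InvoiceFields` schema, the `structure` function, and the 0.80 cut-off are all hypothetical, and no model call is shown.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal, InvalidOperation
from typing import Dict, Optional, Tuple

MIN_CONFIDENCE = 0.80  # assumed cut-off; in practice tuned against the eval set


@dataclass
class InvoiceFields:
    invoice_date: date
    total: Decimal


def structure(raw: Dict[str, Tuple[str, float]]) -> Tuple[Optional[InvoiceFields], str]:
    """Return (fields, "ok"), or (None, reason) when a human should review."""
    for name in ("invoice_date", "total"):
        if name not in raw:
            return None, f"missing field: {name}"
        if raw[name][1] < MIN_CONFIDENCE:
            return None, f"low confidence: {name}"
    try:
        # Parsing doubles as validation: a misread value fails here, not downstream.
        parsed_date = datetime.strptime(raw["invoice_date"][0], "%Y-%m-%d").date()
        total = Decimal(raw["total"][0])
    except (ValueError, InvalidOperation):
        return None, "unparseable value"
    return InvoiceFields(parsed_date, total), "ok"


clean = {"invoice_date": ("2024-03-15", 0.97), "total": ("1249.50", 0.93)}
blurry = {"invoice_date": ("2024-03-15", 0.97), "total": ("1249.50", 0.41)}
print(structure(clean))
print(structure(blurry))  # (None, 'low confidence: total')
```

The point of the design is that every exit path is explicit: a record is either fully typed or routed to review with a reason attached.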

Evaluate honestly.

Run the frozen eval set on every change and score it: precision, recall, mAP, field accuracy. No "looks good in the demo." A number that moves on the real test images, or the change doesn't ship.

/ Week 04 · Evaluate
mAP@0.5: 0.97 (boxes land on the right objects)
Recall: 94% (few real objects missed)
Field accuracy: 96% (extracted values match ground truth)
Abstain rate: 5% (routed to human rather than guessed)
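Two of these numbers, field accuracy and abstain rate, are simple enough to sketch, along with a release gate built on them. The code assumes field accuracy is computed over answered items only, which the page does not specify. The function names and the 0.95 and 0.10 limits are hypothetical.

```python
from typing import Dict, List, Optional

MIN_FIELD_ACCURACY = 0.95  # hypothetical floor for shipping
MAX_ABSTAIN_RATE = 0.10    # hypothetical ceiling for shipping


def field_metrics(predictions: List[Optional[str]], truth: List[str]) -> Dict[str, float]:
    """Score extracted values; None means the pipeline abstained on that item."""
    answered = [(p, t) for p, t in zip(predictions, truth) if p is not None]
    correct = sum(p == t for p, t in answered)
    return {
        # Assumption: accuracy is measured over the items the pipeline answered.
        "field_accuracy": correct / len(answered) if answered else 0.0,
        "abstain_rate": 1 - len(answered) / len(truth),
    }


def should_ship(metrics: Dict[str, float]) -> bool:
    """Gate a release on the frozen eval set's numbers."""
    return (
        metrics["field_accuracy"] >= MIN_FIELD_ACCURACY
        and metrics["abstain_rate"] <= MAX_ABSTAIN_RATE
    )


truth = [str(i) for i in range(20)]

candidate = list(truth)
candidate[3] = None  # one abstention, everything else correct

regressed = list(candidate)
regressed[7] = "wrong"
regressed[8] = "wrong"

for name, preds in (("candidate", candidate), ("regressed", regressed)):
    m = field_metrics(preds, truth)
    print(name, m, "ship" if should_ship(m) else "hold")
```

Here the candidate ships and the regressed build is held. Gating on abstain rate as well as accuracy matters because a model can otherwise look accurate simply by refusing to answer.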

Serve & monitor.

Go live with optimised models on GPU, batched for throughput, with logging on every frame. We watch accuracy as cameras and lighting drift, catch model decay before users do, and keep the system seeing clearly as the world in front of the lens changes.

/ Ongoing · Serve & monitor
ONNX · Triton serving
GPU batching
Drift alerts
PII · face blur guards
Confidence gating
Abstain on low score
Weekly eval run
Human-in-the-loop review
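As one illustration of a drift alert, the sketch below tracks a rolling mean of detection confidence and fires when it falls too far below a baseline. This is an assumed, deliberately simple signal. Confidence is only a proxy for accuracy, so it complements the weekly eval run on labelled images and does not replace it. The class name, window size, and allowed drop are hypothetical.

```python
from collections import deque
from statistics import mean


class ConfidenceDriftMonitor:
    """Alert when recent mean confidence falls well below a baseline."""

    def __init__(self, baseline_mean: float, window: int = 500, max_drop: float = 0.05):
        self.baseline_mean = baseline_mean
        self.max_drop = max_drop
        self.recent = deque(maxlen=window)  # oldest scores fall off automatically

    def observe(self, confidence: float) -> bool:
        """Record one detection confidence; return True if the alert should fire."""
        self.recent.append(confidence)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough data to judge yet
        return self.baseline_mean - mean(self.recent) > self.max_drop


monitor = ConfidenceDriftMonitor(baseline_mean=0.90, window=4, max_drop=0.05)

healthy = [monitor.observe(c) for c in (0.90, 0.91, 0.89, 0.90)]
degraded = [monitor.observe(c) for c in (0.70, 0.70, 0.70, 0.70)]
print(healthy)       # no alerts while confidence sits at the baseline
print(degraded[-1])  # True once the window is full of low-confidence frames
```

In practice one monitor per camera or site would make it easier to see which lens, light, or angle has changed.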
06 — Why it compounds

An evaluated model improves.

Every eval run feeds the next: missed objects become new training data, false positives tighten the threshold, hard scenes get labelled and added. Ship-and-forget vision decays as cameras, lighting, and the world drift. Evaluated and re-trained, accuracy compounds.

Chart: detection accuracy (mAP) on a frozen evaluation set, representative of a typical 12-month engagement.
Eval-driven by Stack Innovations: accuracy climbs as data and thresholds tighten
Ship-and-forget: plateaus, then decays as cameras and scenes drift from training
07 — Tools · honest kit

The kit, shown.

The models, runtimes, and tools we actually wire together to ingest, detect, segment, read, and structure. No mystery framework — just the kit that keeps detections accurate and fields clean.

Understanding: Claude Opus 4.8
Detection: YOLO
Segmentation: Segment Anything
Detection: Detectron2
Vision lib: OpenCV
OCR: PaddleOCR
Runtime: ONNX
Serving: Triton
Data & labels: Roboflow
Language: Python
Serving: FastAPI
Storage: S3
Start the build

Stop keying.
Start seeing. Your images.

A free vision audit to start — bring a folder of your real images and the result you need out of them, and we'll show you what a detection-to-structure pipeline would return, and where today's manual process falls down. A prototype, not a pitch.

Get a vision audit