Computer Vision · AI & Automation

Software that sees.
Pixels to structure.

Computer vision that looks at an image and returns something you can act on — boxes around the objects, text lifted off the page, regions masked out, and clean structured fields. Specialised detectors find it, Claude multimodal reads it, and every release is measured against an accuracy target before it ships.


/01 Drag the volume · watch detection sharpen

Live · detection & extraction, running

Accuracy 97%

◳ batch_2147.jpg · detect → extract → structure

Live counters: Objects detected · Accuracy (mAP) · Fields extracted · p95 latency

Images / day 12,000 imgs

Detection accuracy 97%

Manual review cut −81%

Images processed 120M

Median time-to-result 0.4s

02 — Outcomes

Vision that earned trust.

A ledger of named systems where the line that moved was a real detection, a field read off a page, a defect caught — accuracy, review time, throughput, errors avoided. 6 of 30 shown · ledger updates as systems scale.

Northwind Docs

Document extraction · Invoices + forms

OCR plus Claude multimodal lifts line items, totals, and dates into clean JSON — keyed entry retired, exceptions routed to a human

−81% Manual keying

Cobalt Manufacturing

Defect detection · Line inspection

A YOLO detector flags scratches and misalignments on the line in real time, boxing each defect with a confidence score the QA team trusts

97% Defect recall

Vera Retail

Shelf analytics · Planogram audit

Shelf photos detected and counted per SKU, out-of-stock and misplacement flagged against the planogram — store walks replaced by a feed

3.4× Audit coverage

Lumen Identity

ID verification · KYC onboarding

Document detection, MRZ and field OCR, and tamper checks gate sign-up — low-confidence captures abstain and ask for a re-take, never guess

−67% Onboarding drop-off

Drift Insurance

Claims triage · Damage photos

Segment-and-assess on claim photos masks the damaged region, estimates severity, and routes the obvious cases straight to fast-track payout

−74% Time-to-estimate

Forge Logistics

Yard & dock · Container reads

Camera-based container and license-plate detection plus OCR logs every move automatically, eval-checked weekly so accuracy drift is caught early

+58% Reads automated

03 — The vision, live

The model doesn't guess.
It detects.

Point it at a scene and watch the pipeline run: a detector boxes every object it finds, the confidence threshold decides which survive, and the task toggle switches the job — detect objects, read text off the page, or segment regions. Turn on structure and the boxes become clean, typed fields you can store.

Vision stage · batch · 12,000 imgs/day

Detect → extract → structure

Legend: detection box · above threshold · below threshold

◳ Detect every object in the scene and box it with a confidence score.

Confidence threshold 0.45

Extracted structure · Claude multimodal: Detections · Avg confidence · Fields

Raise the threshold and noisy, low-confidence boxes drop away — precision climbs, but push too far and you miss real objects. That tradeoff is the whole job.
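The tradeoff can be made concrete with a small sketch. It assumes every detection has already been matched against labelled ground truth and carries an `is_true_positive` flag; the `Detection` type, the `precision_recall` helper, and the sample scores are hypothetical and are not taken from the demo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    confidence: float       # detector score in [0, 1]
    is_true_positive: bool  # assumed to come from matching against labels


def precision_recall(detections, threshold, total_ground_truth):
    """Precision and recall of the detections kept at `threshold`."""
    kept = [d for d in detections if d.confidence >= threshold]
    true_positives = sum(d.is_true_positive for d in kept)
    # With nothing kept there are no false positives, so report precision 1.0.
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / total_ground_truth if total_ground_truth else 0.0
    return precision, recall


# Hypothetical scored detections for a scene with 5 labelled objects.
SAMPLE = [
    Detection(0.95, True), Detection(0.90, True), Detection(0.80, True),
    Detection(0.60, True), Detection(0.50, False), Detection(0.40, True),
    Detection(0.30, False), Detection(0.20, False),
]

if __name__ == "__main__":
    for threshold in (0.25, 0.45, 0.70, 0.92):
        p, r = precision_recall(SAMPLE, threshold, total_ground_truth=5)
        print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
```

On this sample, moving the threshold from 0.45 to 0.70 removes the one false positive but also drops a real object, which is the tradeoff described above in miniature.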

04 — Anatomy of the pipeline

Built like a pipeline,
not a prompt.

Quality hides in the stages between the pixels and the field — how you pre-process, which detector you run, whether you OCR or segment, how you turn boxes into typed structure. This is the room we work in: each stage measured, each model chosen for the job it's best at.

Vision pipeline · Northwind Document Extraction

Volume 12k imgs/day · Accuracy 97% · p95 0.4s

Stage | Model | Role | Signal
Ingest | S3 · ffmpeg frames | Pull images and video frames, normalise format and EXIF orientation | queued
Pre-process | OpenCV | Resize, denoise, deskew, colour-correct so the detector sees clean input | cleaned
Detect | YOLO · Detectron2 | Box every object with a class label and a confidence score | boxes
Segment | Segment Anything (SAM) | Mask precise regions when a box isn't enough: damage, fields, instances | masked
OCR | PaddleOCR · Tesseract | Read text out of detected regions: line items, codes, MRZ, labels | read
Extract | Claude Opus 4.8 · multimodal | Understand the crop and return typed, structured fields, as JSON you can store | structured
Serve | ONNX · Triton · FastAPI | Run optimised models on GPU, batched, behind a low-latency API | 0.4s
Evaluate | mAP · Roboflow · LLM-as-judge | Score precision, recall & mAP on a frozen set every release | 97%

green: measured & in target

live: the stage running in the demo above

amber: watch · below accuracy target

05 — Ship to production

Scope the task.

Before a single model runs, we pin down what counts as a correct detection, where the edge cases hide, and what the output has to look like downstream. Then we build an evaluation set — labelled images we'll grade every release against.

/ Week 00 · Scope & eval set

Task: Detect, OCR, or segment, and exactly what the structured output must contain

Edge cases: Glare, occlusion, skew, low light, rare classes (the images that break naive models)

Eval set: Labelled images with ground-truth boxes & fields, frozen for grading

Target: mAP ≥ 0.95, abstain & route to human when confidence is low
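As a minimal sketch of how that target and the abstain rule might be encoded: the 0.95 mAP target comes from the scope above, while the `ABSTAIN_BELOW` cut-off of 0.80, the `Prediction` type, and both function names are assumptions made for illustration.

```python
from dataclasses import dataclass

TARGET_MAP = 0.95     # release target stated in the scope
ABSTAIN_BELOW = 0.80  # hypothetical confidence cut-off, tuned per project


@dataclass(frozen=True)
class Prediction:
    image_id: str
    label: str
    confidence: float


def route(prediction, abstain_below=ABSTAIN_BELOW):
    """Return 'auto' for confident predictions, 'human_review' otherwise."""
    return "auto" if prediction.confidence >= abstain_below else "human_review"


def release_allowed(map_score, target=TARGET_MAP):
    """A release ships only if mAP on the frozen eval set meets the target."""
    return map_score >= target
```

Keeping both rules as plain functions makes them easy to run in CI against the frozen eval set, so the gate is applied the same way on every release.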

Capture & label.

Vision lives or dies on data. We gather real images from the real environment — the actual cameras, lighting, and angles — then label them carefully. Garbage labels train garbage detectors; this is where most vision projects quietly fail.

/ Week 01 · Data & labels

Capture: Real cameras · lighting · angles · the messy field

Label: Boxes · masks · fields · Roboflow / CVAT

Augment: Flip · crop · jitter · synthetic edge cases

Balance: Rare classes upsampled so they aren't ignored

Split: Train / val / frozen test · no leakage
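One common way to get a leakage-free split is to assign whole groups of related images (for example, all frames from one camera session) to a single split by hashing the group id. The sketch below assumes that approach; the `group_id` convention, the function names, and the 10% fractions are hypothetical.

```python
import hashlib


def assign_split(group_id, val_fraction=0.1, test_fraction=0.1):
    """Map a group id to 'train', 'val' or 'test' deterministically."""
    digest = hashlib.sha256(group_id.encode("utf-8")).digest()
    # First 8 bytes as an integer, scaled to the range [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    if bucket < test_fraction:
        return "test"
    if bucket < test_fraction + val_fraction:
        return "val"
    return "train"


def split_dataset(images):
    """`images` is a list of (image_path, group_id) pairs."""
    splits = {"train": [], "val": [], "test": []}
    for image_path, group_id in images:
        # Every image in a group lands in the same split, so near-duplicate
        # frames cannot appear in both training and test data.
        splits[assign_split(group_id)].append(image_path)
    return splits
```

Because the assignment depends only on the group id, images added later never move an existing group between splits, which keeps the test set frozen.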

Detect & segment.

Pick the right detector for the job — YOLO for fast real-time boxes, Detectron2 for accuracy, Segment Anything when a box isn't precise enough. Fine-tune on your labels so it learns your classes, not a generic benchmark's.

/ Week 02 · Models

Detector · YOLO real-time / Detectron2 accuracy

Segmentation · Segment Anything for precise masks

Fine-tuned on your labelled classes

Confidence threshold tuned against the eval set

NMS & class balance verified on the frozen test split
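For readers unfamiliar with the last two items, here is a minimal pure-Python sketch of confidence filtering followed by greedy non-maximum suppression (NMS). Production detectors ship their own optimised NMS; the `(x1, y1, x2, y2)` box format, the function names, and the default thresholds are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, score_threshold=0.45, iou_threshold=0.5):
    """Greedy NMS over (box, score) pairs belonging to one class."""
    # Drop low-confidence boxes first, then visit the rest best-first.
    candidates = sorted(
        (d for d in detections if d[1] >= score_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap a better box too much.
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept
```

Both thresholds interact: a lower `iou_threshold` removes more duplicates but can merge genuinely adjacent objects, so both are worth tuning against the eval set rather than left at defaults.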

Read & structure.

Boxes and masks aren't the answer — structure is. OCR reads text out of the regions, then Claude multimodal understands the crop and returns typed, validated fields. An honest "low confidence, send to a human" when the image doesn't support a clean read.

/ Week 03 · Extract & ground
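A sketch of what "typed, validated fields" could look like in practice, assuming the multimodal step has already returned a JSON string containing a self-reported `confidence`. The `InvoiceFields` schema, the field names, and the 0.8 cut-off are hypothetical; no model API is called here.

```python
import json
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class InvoiceFields:
    invoice_date: date
    total: Decimal
    currency: str


def parse_fields(raw_json, min_confidence=0.8):
    """Return InvoiceFields, or None to signal 'send to a human'."""
    try:
        payload = json.loads(raw_json)
        if float(payload["confidence"]) < min_confidence:
            return None  # abstain rather than guess
        total = Decimal(str(payload["total"]))
        currency = str(payload["currency"]).upper()
        if not total.is_finite() or total < 0 or len(currency) != 3:
            return None
        return InvoiceFields(
            invoice_date=date.fromisoformat(payload["invoice_date"]),
            total=total,
            currency=currency,
        )
    except (KeyError, TypeError, ValueError, InvalidOperation):
        # Malformed JSON, missing keys, or unparseable values all abstain.
        return None
```

Returning `None` for every failure mode keeps the caller's logic simple: anything that is not a fully typed record goes to the human review queue.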

Evaluate honestly.

Run the frozen eval set on every change and score it: precision, recall, mAP, field accuracy. No "looks good in the demo." A number that moves on the real test images, or the change doesn't ship.

/ Week 04 · Evaluate

mAP@0.5: 0.97 · boxes land on the right objects

Recall: 94% · few real objects missed

Field accuracy: 96% · extracted values match ground truth

Abstain rate: 5% · routed to human rather than guessed
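For context on the first number, the sketch below computes average precision at IoU 0.5 for a single class; mAP is the mean of this value across classes. It is a simplified illustration, assuming `(image_id, box, score)` predictions and a dict of ground-truth boxes per image, and it matches each prediction to the best still-unmatched label. Established evaluation tools differ in details, so treat the function names and conventions as assumptions.

```python
def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(predictions, ground_truth, iou_threshold=0.5):
    """AP for one class using all-point interpolation."""
    total_gt = sum(len(boxes) for boxes in ground_truth.values())
    if total_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    hits = []
    # Visit predictions from most to least confident.
    for image_id, box, _score in sorted(predictions, key=lambda p: p[2], reverse=True):
        best_iou, best_index = 0.0, -1
        for index, gt_box in enumerate(ground_truth.get(image_id, [])):
            overlap = box_iou(box, gt_box)
            if overlap > best_iou and not matched[image_id][index]:
                best_iou, best_index = overlap, index
        is_hit = best_iou >= iou_threshold
        if is_hit:
            matched[image_id][best_index] = True  # each label matches once
        hits.append(is_hit)
    precisions, recalls, true_positives = [], [], 0
    for rank, is_hit in enumerate(hits, start=1):
        true_positives += is_hit
        precisions.append(true_positives / rank)
        recalls.append(true_positives / total_gt)
    # Make precision non-increasing from right to left (the PR envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, previous_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap
```

Because the score depends on the ranking of every prediction, it can move even when the operating threshold does not, which is why it is reported alongside recall and abstain rate rather than instead of them.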

Serve & monitor.

Go live with optimised models on GPU, batched for throughput, with logging on every frame. We watch accuracy as cameras and lighting drift, catch model decay before users do, and keep the system seeing clearly as the world in front of the lens changes.

/ Ongoing · Serve & monitor

ONNX · Triton serving

GPU batching

Drift alerts

PII · face blur guards

Confidence gating

Abstain on low score

Weekly eval run

Human-in-the-loop review
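A drift alert can start out very simply. The sketch below assumes a rolling mean of detection confidence is compared with a baseline measured at release time; the class name, window size, and tolerance are hypothetical. Mean confidence is only a proxy, so an alert would normally trigger a labelled eval run rather than replace one.

```python
from collections import deque


class ConfidenceDriftMonitor:
    """Flags when rolling mean confidence falls well below a baseline."""

    def __init__(self, baseline_mean, window_size=500, max_drop=0.05):
        self.baseline_mean = baseline_mean
        self.max_drop = max_drop
        self.window = deque(maxlen=window_size)

    def observe(self, confidence):
        """Record one detection score; return True if an alert should fire."""
        self.window.append(confidence)
        if len(self.window) < self.window.maxlen:
            return False  # wait until the window is full
        rolling_mean = sum(self.window) / len(self.window)
        return self.baseline_mean - rolling_mean > self.max_drop
```

A typical use would be one monitor per camera or site, so a single dirty lens or changed light does not hide inside a fleet-wide average.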

06 — Why it compounds

An evaluated model improves.

Every eval run feeds the next: missed objects become new training data, false positives tighten the threshold, hard scenes get labelled and added. Ship-and-forget vision decays as cameras, lighting, and the world drift. Evaluated and re-trained, accuracy compounds.

Eval-driven by Stack Innovations — accuracy climbs as data and thresholds tighten

Ship-and-forget — plateaus, then decays as cameras and scenes drift from training

Representative of a typical 12-month engagement · detection accuracy (mAP) on a frozen evaluation set.

The kit, shown.

Stop keying.
Start seeing. Your images.
