BiQadx
AI Research · Q1 2024 · 14 min read

Deep Learning for Automated Image Anomaly Detection

Manual visual inspection of diagnostic disc runs — checking for micro-bubbles, reagent wicking failures, and pipetting artefacts — requires 45 seconds of trained technician time per disc. This research log documents training a convolutional neural network (ResNet-50 fine-tuned on 28,000 labelled disc images) to perform equivalent inspection in 280ms with 97.3% sensitivity for clinically significant artefacts.

BiQadx Core Engineering
  • 97.3% defect sensitivity (clinically significant artefacts)
  • 280 ms inference time (on-device, ARM Cortex-A72)
  • 28,000 training images (7 defect classes labelled)
◆ Engineering Process Flow
  1. DATA → 2. TRAIN → 3. EVALUATE → 4. DEPLOY → 5. MONITOR
◆ Key Findings
  • Model achieves >96% sensitivity on all 4 clinically significant defect classes — exceeds the 95% minimum sensitivity specification for result invalidation decisions
  • INT8 quantisation achieves 280ms inference on ARM Cortex-A72 with <1% F1 degradation vs. FP32 — real-time inspection adds no perceptible delay to the assay workflow
  • Grad-CAM validation confirmed model attends to physically correct image regions for all 7 defect classes — interpretability requirement satisfied for IEC 62304 SaMD documentation
01

Defect Taxonomy & Clinical Impact

Seven defect classes were defined through a root-cause analysis of 1,847 failed assay runs over 18 months:
  • Class 1 — Macro-bubbles (>2mm): complete chamber fill failure, 100% assay invalidity rate
  • Class 2 — Micro-bubbles (<2mm): partial fill, 34% invalid rate
  • Class 3 — Reagent wicking: insufficient fill, 78% invalid
  • Class 4 — Spillover contamination: inter-chamber, 91% cross-contamination
  • Class 5 — Particulate contamination: fibrin clot, debris, 45% invalid
  • Class 6 — Optical read-zone fogging: condensation, 62% OD shift >5%
  • Class 7 — Disc seating errors: disc lifted, 100% mechanical failure
Classes 1, 3, 4, and 7 were designated 'clinically significant' — mandatory result invalidation.
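The disposition rule implied by this taxonomy can be sketched in a few lines. The class IDs follow the numbering above; the `triage` helper and the disposition strings are illustrative names, not the production implementation:

```python
# Sketch of the result-disposition rule implied by the defect taxonomy.
# Class IDs match the taxonomy; function and string names are hypothetical.

CLINICALLY_SIGNIFICANT = {1, 3, 4, 7}  # macro-bubble, wicking, spillover, seating

CLASS_NAMES = {
    0: "normal",
    1: "macro_bubble",
    2: "micro_bubble",
    3: "reagent_wicking",
    4: "spillover_contamination",
    5: "particulate_contamination",
    6: "optical_zone_fogging",
    7: "disc_seating_error",
}

def triage(predicted_class: int) -> str:
    """Map a predicted defect class to an assay disposition."""
    if predicted_class == 0:
        return "accept"                # normal disc, no action
    if predicted_class in CLINICALLY_SIGNIFICANT:
        return "invalidate"            # mandatory result invalidation
    return "flag_for_review"           # moderate defect: human review
```

Keeping the class-to-disposition mapping in one table like this makes the invalidation policy auditable independently of the model weights.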

02

Dataset Construction & Labelling

28,414 disc images were captured using the EtherX's onboard 5MP CMOS camera (OV5640, 2592×1944 px) immediately after spin completion. Images were labelled by 3 independent biomedical scientists using a custom annotation tool (Labelbox). Inter-annotator agreement: Cohen's κ = 0.89 (almost perfect agreement). Class imbalance: normal discs (Class 0) represented 71% of the dataset; synthetic augmentation (random crop, rotation ±15°, Gaussian blur σ=0.5–2.0, brightness jitter ±20%) was applied to minority classes to achieve a 3:1 normal-to-defect ratio in the training set. Dataset split: 80% train, 10% validation, 10% held-out test.
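With a 71% normal-class majority, a naive random 80/10/10 split can leave the rarest defect classes under-represented in validation and test. A minimal sketch of a per-class (stratified) split — the text states the split proportions but not whether stratification was used, so that part is an assumption, and `stratified_split` is a hypothetical helper:

```python
import random
from collections import defaultdict

def stratified_split(labels, seed=0):
    """Split image indices 80/10/10 within each class.

    `labels` is one class ID per image. Returns (train, val, test)
    index lists. Illustrative helper, not the production pipeline.
    """
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)

    rng = random.Random(seed)          # fixed seed for a reproducible split
    train, val, test = [], [], []
    for lab, idxs in by_class.items():
        rng.shuffle(idxs)
        n_train = int(0.8 * len(idxs))
        n_val = int(0.1 * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]
    return train, val, test
```

Stratifying before augmentation also keeps augmented variants of one source disc from leaking across the train/test boundary.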

03

Model Architecture & Transfer Learning

ResNet-50 pretrained on ImageNet (PyTorch, torchvision) was fine-tuned for 7-class classification. The final 3 layers were replaced with: GlobalAveragePooling → Dropout(0.4) → Dense(256, ReLU) → Dense(7, Softmax). Training: Adam optimiser (lr=1e-4, cosine annealing), batch size=32, 40 epochs, mixed-precision (FP16) on an NVIDIA A100. Class-weighted cross-entropy loss was used, with clinically significant classes weighted ×3 to penalise false negatives. Best validation F1: 0.943. Grad-CAM heatmaps were generated for all defect predictions to confirm model attention on physically meaningful image regions — validated by the labelling scientists.
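Two small pieces of the training recipe are easy to make concrete: the ×3 class-weight vector for the weighted cross-entropy, and the cosine-annealed learning-rate schedule. A sketch under stated assumptions — the output count is parameterised because the text trains a 7-way head while the dataset also includes a normal Class 0, and both helper names are illustrative:

```python
import math

CLINICALLY_SIGNIFICANT = {1, 3, 4, 7}  # classes weighted x3 per the text

def class_weights(num_classes):
    """Per-class weights for the weighted cross-entropy loss:
    x3 on clinically significant classes, 1.0 elsewhere."""
    return [3.0 if c in CLINICALLY_SIGNIFICANT else 1.0
            for c in range(num_classes)]

def cosine_lr(step, total_steps, lr_max=1e-4, lr_min=0.0):
    """Cosine-annealed learning rate, decaying lr_max -> lr_min
    over total_steps (the text uses lr=1e-4 with cosine annealing)."""
    return lr_min + 0.5 * (lr_max - lr_min) * (
        1.0 + math.cos(math.pi * step / total_steps))
```

In PyTorch these would typically be passed as `nn.CrossEntropyLoss(weight=...)` and `CosineAnnealingLR`; the plain-Python forms above just make the arithmetic explicit.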

04

On-Device Deployment & Latency Optimisation

The model was optimised for on-device inference using ONNX Runtime with INT8 quantisation (post-training quantisation, <1% F1 degradation). On the EtherX ARM Cortex-A72 (4-core, 1.5 GHz with NEON SIMD): inference time 280ms per image vs. 45s manual inspection. Model size post-quantisation: 24.7 MB — fits within the 32 MB LPDDR4 allocation for AI inference. Calibration set for INT8 quantisation: 200 representative images covering all 7 classes. Edge cases (borderline micro-bubbles near the 2mm threshold) are escalated to a 'human review' queue rather than auto-rejected — affecting <3.2% of flagged images.
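The deployed model uses ONNX Runtime's post-training quantisation; the core arithmetic of symmetric per-tensor INT8 quantisation can be illustrated standalone. This is a sketch of the scale/round/clamp step only — `int8_quantise` is a hypothetical helper, not the ONNX Runtime API:

```python
def int8_quantise(values):
    """Symmetric per-tensor INT8 quantisation sketch.

    Derives one scale from the calibration range (here, the values
    themselves), maps floats into [-127, 127], and dequantises back so
    the rounding error can be inspected. Illustrative only.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs else 1.0   # one scale per tensor
    quantised = [max(-127, min(127, round(v / scale))) for v in values]
    dequantised = [q * scale for q in quantised]  # what inference "sees"
    return quantised, dequantised, scale
```

In the real pipeline the scale comes from the 200-image calibration set rather than the tensor being quantised, which is why calibration coverage of all 7 defect classes matters.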

CNN Defect Detection Performance by Class (Held-Out Test Set, n=2,841)
Defect Class                Sensitivity   Specificity   PPV      Clinical Significance
Macro-bubble (>2mm)         99.2%         99.8%         99.1%    CRITICAL
Micro-bubble (<2mm)         94.7%         97.3%         91.4%    MODERATE
Reagent wicking failure     98.1%         99.1%         97.6%    CRITICAL
Spillover contamination     96.8%         98.7%         95.2%    CRITICAL
Particulate contamination   91.3%         97.6%         89.1%    MODERATE
Optical zone fogging        93.6%         98.2%         92.8%    MODERATE
Disc seating error          99.7%         99.9%         99.6%    CRITICAL

ResNet-50 INT8 quantised. Inference on ARM Cortex-A72. Ground truth from 3-scientist consensus labelling (κ = 0.89). BiQadx Engineering Data.
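The per-class figures in the table are one-vs-rest metrics derived from confusion counts on the held-out set. A minimal sketch of that derivation — the counts in the usage example are hypothetical, not the actual test-set tallies:

```python
def binary_metrics(tp, fp, fn, tn):
    """One-vs-rest sensitivity, specificity and PPV for a single
    defect class from its confusion counts. Illustrative helper."""
    sensitivity = tp / (tp + fn)   # recall on the defect class
    specificity = tn / (tn + fp)   # true-negative rate
    ppv = tp / (tp + fp)           # precision of a positive call
    return sensitivity, specificity, ppv
```

Note that PPV, unlike sensitivity and specificity, depends on class prevalence — with rare defects, even a high-specificity detector produces proportionally more false positives, which is why PPV trails sensitivity for the minority classes above.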

Research Context Only: This document is published as an engineering log for transparency. All content describes R&D-phase investigations. No clinical diagnostic claims are made. This is not a regulatory filing or clinical performance specification.

Engineering Library · INS-011 / BiQadx © 2026
BiQadx content is R&D / prototype / pilot-stage. No clinical claims. For planning and technical understanding only. Not medical advice.