Defect Taxonomy & Clinical Impact
Seven defect classes were defined through a root-cause analysis of 1,847 failed assay runs over 18 months:

- Class 1 — Macro-bubbles (>2 mm): complete chamber fill failure; 100% assay invalidity rate
- Class 2 — Micro-bubbles (<2 mm, partial fill): 34% invalidity rate
- Class 3 — Reagent wicking (insufficient fill): 78% invalidity rate
- Class 4 — Spillover contamination (inter-chamber): 91% cross-contamination rate
- Class 5 — Particulate contamination (fibrin clots, debris): 45% invalidity rate
- Class 6 — Optical read-zone fogging (condensation): 62% show an OD shift >5%
- Class 7 — Disc seating errors (disc lifted): 100% mechanical failure rate

Classes 1, 3, 4, and 7 were designated 'clinically significant': any detection mandates result invalidation.
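The invalidation rule above reduces to a simple lookup. A minimal sketch (class IDs, rates, and the clinically significant set are taken from the taxonomy above; the names `INVALID_RATE` and `must_invalidate` are illustrative):

```python
# Assay invalidity rates per defect class, from the root-cause analysis above
# (the Class 6 figure is the fraction of runs showing an OD shift >5%).
INVALID_RATE = {1: 1.00, 2: 0.34, 3: 0.78, 4: 0.91, 5: 0.45, 6: 0.62, 7: 1.00}

# Classes designated clinically significant: macro-bubbles (1),
# reagent wicking (3), spillover contamination (4), disc seating errors (7).
CLINICALLY_SIGNIFICANT = {1, 3, 4, 7}

def must_invalidate(defect_class: int) -> bool:
    """Return True when the predicted defect class mandates result invalidation."""
    return defect_class in CLINICALLY_SIGNIFICANT
```

Keeping the gating rule as data rather than branching logic makes it auditable against the taxonomy and trivial to update if a class's clinical designation changes.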
Dataset Construction & Labelling
28,414 disc images were captured with the EtherX's onboard 5 MP CMOS camera (OV5640, 2592×1944 px) immediately after spin completion. Images were labelled by three independent biomedical scientists using a custom annotation tool (Labelbox). Inter-annotator agreement: Cohen's κ = 0.89 (almost perfect on the Landis–Koch scale). Class imbalance: normal discs (Class 0) made up 71% of the dataset; synthetic augmentation (random crop, rotation ±15°, Gaussian blur σ = 0.5–2.0, brightness jitter ±20%) was applied to the minority defect classes to bring the training set to a 3:1 normal-to-defect ratio. Dataset split: 80% train, 10% validation, 10% held-out test.
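The 80/10/10 split can be sketched in a few lines, assuming a flat list of (image_path, label) samples; the seed and helper name are illustrative, not taken from the pipeline:

```python
import random

def split_dataset(samples, seed=42):
    """Shuffle and partition samples into 80% train / 10% validation /
    10% held-out test, as described in the text. A fixed seed keeps the
    held-out set stable across training runs."""
    rng = random.Random(seed)
    shuffled = samples[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

In practice a split for imbalanced data like this would typically also be stratified per class, so that rare defect classes appear in all three partitions.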
Model Architecture & Transfer Learning
ResNet-50 pretrained on ImageNet (PyTorch, torchvision) was fine-tuned for 7-class classification. The final three layers were replaced with: global average pooling → Dropout(0.4) → Dense(256, ReLU) → Dense(7, Softmax). Training: Adam optimiser (lr = 1e-4 with cosine annealing), batch size 32, 40 epochs, mixed-precision (FP16) on an NVIDIA A100. A class-weighted cross-entropy loss weighted the clinically significant classes ×3 relative to the standard weight, penalising false negatives where they matter most. Best validation F1: 0.943. Grad-CAM heatmaps were generated for all defect predictions to confirm that model attention fell on physically meaningful image regions; the heatmaps were validated by the labelling scientists.
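The class-weighted loss can be illustrated in plain Python. This sketch mirrors what `torch.nn.CrossEntropyLoss(weight=...)` computes for a single sample; the numerically stable log-sum-exp is standard, and the ×3 factor follows the text:

```python
import math

def weighted_cross_entropy(logits, target, weights):
    """Weighted cross-entropy for one sample:
    -weights[target] * log_softmax(logits)[target].
    Clinically significant classes carry a 3x weight, so a missed
    detection there contributes three times the loss of a missed
    standard-weight class at the same predicted probability."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    log_prob = logits[target] - log_sum
    return -weights[target] * log_prob
```

With uniform logits over two classes the unweighted loss is log 2 ≈ 0.693; tripling the target class weight triples the loss, which is exactly the extra gradient pressure the ×3 weighting applies to false negatives on clinically significant classes.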
On-Device Deployment & Latency Optimisation
The model was optimised for on-device inference using ONNX Runtime with post-training INT8 quantisation (<1% F1 degradation). On the EtherX's ARM Cortex-A72 (4-core, 1.5 GHz, NEON SIMD), inference takes 280 ms per image versus ~45 s for manual inspection. Post-quantisation model size: 24.7 MB, within the 32 MB LPDDR4 allocation for AI inference. The INT8 calibration set comprised 200 representative images covering all 7 classes. Edge cases (borderline micro-bubbles near the 2 mm threshold) are escalated to a 'human review' queue rather than auto-rejected, affecting <3.2% of flagged images.
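The human-review escalation could be implemented as a band around the 2 mm class boundary. A sketch under stated assumptions: the 2 mm threshold comes from the taxonomy, but the ±0.25 mm band width and all names here are hypothetical, since the text does not specify how "borderline" is defined on-device:

```python
# The 2 mm macro/micro boundary is from the defect taxonomy; the width of
# the uncertainty band is an assumed placeholder, not a documented value.
THRESHOLD_MM = 2.0
BORDERLINE_MM = 0.25

def route_bubble(diameter_mm: float) -> str:
    """Route a detected bubble: auto-classify it as macro (Class 1) or
    micro (Class 2), or escalate to the human-review queue when its
    measured diameter falls inside the band around the threshold."""
    if abs(diameter_mm - THRESHOLD_MM) <= BORDERLINE_MM:
        return "human_review"
    return "macro_bubble" if diameter_mm > THRESHOLD_MM else "micro_bubble"
```

Routing borderline measurements to a queue instead of auto-rejecting trades a small review workload (<3.2% of flagged images per the text) for avoided invalidations of salvageable runs near the threshold.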
