Hybrid CNN-KAN Architectures for Ultra-Compact Person Detection on IoT Devices: A Systematic Evaluation of Quantization Strategies / Faggi, D., Kuznetsov, O., Mereu, S., Galdelli, A., Frontoni, E., Arnesano, M. - In: IEEE ACCESS. - ISSN 2169-3536. - (2026). [10.1109/ACCESS.2026.3693881]
Hybrid CNN-KAN Architectures for Ultra-Compact Person Detection on IoT Devices: A Systematic Evaluation of Quantization Strategies
GALDELLI, ALESSANDRO; FRONTONI, EMANUELE; ARNESANO, MARCO
2026-01-01
Abstract
Deploying visual person detection on resource-constrained Internet of Things (IoT) devices demands models that balance classification accuracy against severe memory and latency budgets. Kolmogorov–Arnold Networks (KANs), which replace fixed activation functions with learnable B-spline edges, have recently attracted interest as expressive alternatives to conventional multi-layer perceptrons, yet their suitability for embedded inference remains largely unexplored. In this work we couple a MobileNetV3 Small feature extractor with compact KAN classifiers of varying depth and evaluate the resulting hybrid architectures on the Visual Wake Words person-detection benchmark. We train 46 model variants spanning eight KAN topologies (including three ultra-compact heads with as few as two hidden neurons), six width multipliers, and three quantization strategies: native FP32, hybrid INT8 (backbone quantized, KAN in FP32), and full TFLite INT8. The hybrid INT8 pathway achieves up to 3.3× compression with accuracy losses below 0.25 percentage points, while TFLite INT8 conversion yields sub-megabyte models (0.31–1.55 MB) suitable for devices such as Seeed XIAO and ESP32-S3. We further find that KAN layers are stored as INT64 rather than INT8 inside TFLite flatbuffers, which limits compression to 1.9–2.9× instead of the expected ~4×. We address this limitation through an operator decomposition strategy that replaces non-standard operations (HardSigmoid, Gather) with equivalent compositions of TFLite-native operators, eliminating all INT64 and FP32 tensors from the converted model and enabling full XNNPACK delegation. Preliminary on-device measurements on an ESP32-S3 microcontroller and a Raspberry Pi 3 confirm deployment feasibility: the smallest model (0.26 MB) fits entirely within internal SRAM and achieves 83.1% accuracy on-device, matching the desktop result to within 0.07 percentage points once a per-channel quantization mismatch in the TFLite Micro runtime is resolved.
Our findings offer concrete deployment guidelines for practitioners integrating KAN-based classifiers into resource-constrained inference pipelines and identify quantization of spline-based layers as a tractable engineering challenge for the TinyML community.
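
To illustrate what operator decomposition means in practice, the following is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not state which compositions were used, so both rewrites here are assumptions: HardSigmoid is taken in its MobileNetV3 form, clip(x + 3, 0, 6) / 6, and expressed as an add, a ReLU6-style clamp, and a constant multiply, while Gather is expressed as a one-hot encoding followed by a matrix product. All function names are hypothetical.

```python
from typing import List


def hard_sigmoid(x: float) -> float:
    """Reference HardSigmoid (MobileNetV3 form), written piecewise."""
    if x <= -3.0:
        return 0.0
    if x >= 3.0:
        return 1.0
    return x / 6.0 + 0.5


def hard_sigmoid_decomposed(x: float) -> float:
    """Assumed decomposition: add, clamp to [0, 6], multiply by a constant."""
    shifted = x + 3.0                       # elementwise add
    clamped = min(max(shifted, 0.0), 6.0)   # ReLU6-style clamp
    return clamped * (1.0 / 6.0)            # multiply by constant 1/6


def gather(table: List[List[float]], indices: List[int]) -> List[List[float]]:
    """Reference Gather: pick rows of `table` by integer index."""
    return [table[i] for i in indices]


def gather_decomposed(table: List[List[float]], indices: List[int]) -> List[List[float]]:
    """Assumed decomposition: one-hot encode each index, then matrix-multiply."""
    n_rows, n_cols = len(table), len(table[0])
    out = []
    for i in indices:
        # One-hot vector selecting row i.
        one_hot = [1.0 if j == i else 0.0 for j in range(n_rows)]
        # Row vector times table: only the selected row contributes.
        out.append([sum(one_hot[j] * table[j][k] for j in range(n_rows))
                    for k in range(n_cols)])
    return out


if __name__ == "__main__":
    demo_table = [[0.5, -1.0], [2.0, 3.0], [4.0, 5.5]]
    print(hard_sigmoid_decomposed(0.75), hard_sigmoid(0.75))
    print(gather_decomposed(demo_table, [2, 0]), gather(demo_table, [2, 0]))
```

The one-hot form trades extra multiply-accumulate work for the absence of an integer index lookup, so it is only attractive when the lookup table is small, as it would be for a compact classifier head.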


