Hybrid CNN-KAN Architectures for Ultra-Compact Person Detection on IoT Devices: A Systematic Evaluation of Quantization Strategies / Faggi, D., Kuznetsov, O., Mereu, S., Galdelli, A., Frontoni, E., Arnesano, M. - In: IEEE ACCESS. - ISSN 2169-3536. - (2026). [10.1109/ACCESS.2026.3693881]

Hybrid CNN-KAN Architectures for Ultra-Compact Person Detection on IoT Devices: A Systematic Evaluation of Quantization Strategies

GALDELLI, ALESSANDRO;FRONTONI, EMANUELE;ARNESANO, MARCO
2026-01-01

Abstract

Deploying visual person detection on resource-constrained Internet of Things (IoT) devices demands models that balance classification accuracy against severe memory and latency budgets. Kolmogorov–Arnold Networks (KANs), which replace fixed activation functions with learnable B-spline edges, have recently attracted interest as expressive alternatives to conventional multi-layer perceptrons, yet their suitability for embedded inference remains largely unexplored. In this work we couple a MobileNetV3 Small feature extractor with compact KAN classifiers of varying depth and evaluate the resulting hybrid architectures on the Visual Wake Words person-detection benchmark. We train 46 model variants spanning eight KAN topologies (including three ultra-compact heads with as few as two hidden neurons), six width multipliers, and three quantization strategies: native FP32, hybrid INT8 (backbone quantized, KAN in FP32), and full TFLite INT8. The hybrid INT8 pathway achieves up to 3.3× compression with accuracy losses below 0.25 percentage points, while TFLite INT8 conversion yields sub-megabyte models (0.31–1.55 MB) suitable for devices such as Seeed XIAO and ESP32-S3. We further document a notable finding: KAN layers are stored as INT64 rather than INT8 inside TFLite flatbuffers, limiting compression to 1.9–2.9× instead of the expected ~4×. We address this limitation through an operator decomposition strategy that replaces non-standard operations (HardSigmoid, Gather) with equivalent compositions of TFLite-native operators, eliminating all INT64 and FP32 tensors from the converted model and enabling full XNNPACK delegation. Preliminary on-device measurements on an ESP32-S3 microcontroller and a Raspberry Pi 3 confirm deployment feasibility, with the smallest model (0.26 MB) fitting entirely within internal SRAM and achieving 83.1% accuracy on-device, matching the desktop result to within 0.07 percentage points after resolution of a per-channel quantization mismatch in the TFLite Micro runtime.
Our findings offer concrete deployment guidelines for practitioners integrating KAN-based classifiers into resource-constrained inference pipelines and identify quantization of spline-based layers as a tractable engineering challenge for the TinyML community.
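The operator decomposition mentioned in the abstract can be illustrated with a small NumPy sketch. The abstract does not state which compositions the authors used, so the two rewrites below are assumptions chosen for illustration: HardSigmoid in its MobileNetV3 form expressed as an add, a ReLU6 clip, and a multiply, and a table lookup (Gather) expressed as a one-hot matrix product. All function names are hypothetical.

```python
import numpy as np


def hard_sigmoid_reference(x):
    """Piecewise definition (MobileNetV3 form): 0 below -3, 1 above 3, linear between."""
    return np.where(x <= -3.0, 0.0, np.where(x >= 3.0, 1.0, x / 6.0 + 0.5))


def hard_sigmoid_decomposed(x):
    """Assumed decomposition: Add -> ReLU6 -> Mul, i.e. relu6(x + 3) / 6."""
    shifted = x + 3.0                     # Add a constant
    clipped = np.clip(shifted, 0.0, 6.0)  # ReLU6
    return clipped * (1.0 / 6.0)          # Multiply by a constant


def gather_reference(table, indices):
    """Plain row lookup, the behaviour of a Gather along axis 0."""
    return table[indices]


def gather_as_matmul(table, indices):
    """Assumed decomposition: build a one-hot selector and multiply it with the table."""
    rows = np.arange(table.shape[0])
    one_hot = (indices[:, None] == rows[None, :]).astype(table.dtype)
    return one_hot @ table


if __name__ == "__main__":
    x = np.linspace(-5.0, 5.0, 101)
    print(np.allclose(hard_sigmoid_reference(x), hard_sigmoid_decomposed(x)))

    rng = np.random.default_rng(0)
    table = rng.normal(size=(8, 4)).astype(np.float32)
    indices = np.array([0, 3, 7, 3])
    print(np.allclose(gather_reference(table, indices), gather_as_matmul(table, indices)))
```

Both rewrites reproduce the reference outputs on floating-point inputs. Whether a given converter then emits only INT8 tensors for them depends on the toolchain and is not something this sketch demonstrates.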
2026
Modeling, Quantization (signal), Accuracy, Internet of Things, Splines (mathematics), Training, Topology, Architecture, Computer architecture, Tensors, INT8 quantization, Kolmogorov–Arnold networks, MobileNetV3, model compression, person detection, TinyML, Visual Wake Words
Use this identifier to cite or link to this document: https://hdl.handle.net/11566/358181