Hybrid CNN-KAN Architectures for Ultra-Compact Person Detection on IoT Devices: A Systematic Evaluation of Quantization Strategies

IRIS

Deploying visual person detection on resource-constrained Internet of Things (IoT) devices demands models that balance classification accuracy against severe memory and latency budgets. Kolmogorov–Arnold Networks (KANs), which replace fixed activation functions with learnable B-spline edges, have recently attracted interest as expressive alternatives to conventional multi-layer perceptrons, yet their suitability for embedded inference remains largely unexplored. In this work we couple a MobileNetV3 Small feature extractor with compact KAN classifiers of varying depth and evaluate the resulting hybrid architectures on the Visual Wake Words person-detection benchmark. We train 46 model variants spanning eight KAN topologies (including three ultra-compact heads with as few as two hidden neurons), six width multipliers, and three quantization strategies: native FP32, hybrid INT8 (backbone quantized, KAN in FP32), and full TFLite INT8. The hybrid INT8 pathway achieves up to 3.3× compression with accuracy losses below 0.25 percentage points, while TFLite INT8 conversion yields sub-megabyte models (0.31–1.55 MB) suitable for devices such as the Seeed XIAO and ESP32-S3. We further document a notable finding: KAN layers are stored as INT64 rather than INT8 inside TFLite flatbuffers, which limits compression to 1.9–2.9× instead of the expected ∼4×. We address this limitation with an operator decomposition strategy that replaces non-standard operations (HardSigmoid, Gather) with equivalent compositions of TFLite-native operators, eliminating all INT64 and FP32 tensors from the converted model and enabling full XNNPACK delegation. Preliminary on-device measurements on an ESP32-S3 microcontroller and a Raspberry Pi 3 confirm deployment feasibility: the smallest model (0.26 MB) fits entirely within internal SRAM and achieves 83.1% accuracy on-device, matching the desktop result to within 0.07 pp once a per-channel quantization mismatch in the TFLite Micro runtime was resolved.
Our findings offer concrete deployment guidelines for practitioners integrating KAN-based classifiers into resource-constrained inference pipelines and identify quantization of spline-based layers as a tractable engineering challenge for the TinyML community.
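The operator decomposition mentioned above can be illustrated with a small numerical sketch. The abstract names the operators being replaced (HardSigmoid, Gather) but not the exact replacements, so the choices here are assumptions, not the authors' code: HardSigmoid is taken in its MobileNetV3 form, ReLU6(x + 3) / 6, and rewritten as an ADD, RELU6, MUL chain; Gather is rewritten as a one-hot mask followed by a multiply-and-sum, so no integer index tensor is needed. Plain Python scalars and lists stand in for tensors, and all function names are hypothetical.

```python
"""Sketch: rewriting HardSigmoid and Gather as compositions of simpler ops.

Assumptions (not taken from the paper): HardSigmoid follows the MobileNetV3
definition ReLU6(x + 3) / 6, and Gather is replaced by a one-hot
multiply-and-sum. Function names are hypothetical; lists stand in for tensors.
"""


def relu6(x: float) -> float:
    # Clamp to [0, 6], the behaviour of a RELU6 operator.
    return min(max(x, 0.0), 6.0)


def hard_sigmoid_reference(x: float) -> float:
    # Reference form: clip((x + 3) / 6, 0, 1).
    return min(max((x + 3.0) / 6.0, 0.0), 1.0)


def hard_sigmoid_decomposed(x: float) -> float:
    # ADD -> RELU6 -> MUL chain; every step is a plain elementwise op.
    return relu6(x + 3.0) * (1.0 / 6.0)


def gather_reference(table: list[float], index: int) -> float:
    # Direct indexed lookup: requires an integer index tensor in the graph.
    return table[index]


def gather_decomposed(table: list[float], index: int) -> float:
    # Build a one-hot mask by equality comparison, then take a dot product,
    # so the lookup becomes multiply-and-accumulate arithmetic.
    one_hot = [1.0 if j == index else 0.0 for j in range(len(table))]
    return sum(w * t for w, t in zip(one_hot, table))


if __name__ == "__main__":
    for x in (-4.0, -3.0, -1.5, 0.0, 1.5, 3.0, 4.0):
        assert abs(hard_sigmoid_decomposed(x) - hard_sigmoid_reference(x)) < 1e-12
    coeffs = [0.1, -0.4, 0.7, 0.25]
    for i in range(len(coeffs)):
        assert gather_decomposed(coeffs, i) == gather_reference(coeffs, i)
    print("decompositions match the reference operators")
```

The one-hot form trades a constant-time lookup for a dot product whose cost grows with the table length. That is cheap for small coefficient tables such as those in compact KAN heads, but it would be wasteful for large lookup tables.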

Faggi, D., Kuznetsov, O., Mereu, S., Galdelli, A., Frontoni, E., Arnesano, M. Hybrid CNN-KAN Architectures for Ultra-Compact Person Detection on IoT Devices: A Systematic Evaluation of Quantization Strategies. In: IEEE Access, ISSN 2169-3536, vol. 14 (2026), pp. 81696–81712. DOI: 10.1109/ACCESS.2026.3693881


FAGGI, DANIELE (first author); KUZNETSOV, OLEKSANDR; MEREU, STEFANO; GALDELLI, ALESSANDRO; FRONTONI, EMANUELE; ARNESANO, MARCO (last author)

2026-01-01



Year of publication: 2026
Journal: IEEE ACCESS
DOI: https://dx.doi.org/10.1109/ACCESS.2026.3693881
Keywords: Modeling, Quantization (signal), Accuracy, Internet of Things, Splines (mathematics), Training, Topology, Architecture, Computer architecture, Tensors, INT8 quantization, Kolmogorov–Arnold networks, MobileNetV3, model compression, person detection, TinyML, Visual Wake Words
Item type: 1.1 Journal article

Files in this item:

File	Size	Format
Faggi_Hybrid-CNN-KAN-Architectures-Ultra-Compact_2026.pdf (open access; type: publisher's version, published with the publisher's layout; license: Creative Commons)	2.12 MB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11566/358181
