We present ANFR, the first FL-native architecture that combines scaled weight standardization with channel attention to address data heterogeneity. Our approach reduces non-IID accuracy degradation by 40% while maintaining favorable privacy–utility trade-offs under differential privacy.
Transactions on Machine Learning Research (TMLR) 2025
Data heterogeneity across clients is a fundamental challenge in federated learning. Standard batch normalization layers, which compute statistics from each client's local batches, become unreliable when client data distributions differ significantly. This is particularly problematic in medical imaging, where imaging protocols, equipment, and patient populations vary across institutions.
We introduce ANFR (Adaptive Normalization-Free Feature Recalibration), an architecture designed specifically for federated learning scenarios. By combining scaled weight standardization with adaptive channel attention mechanisms, ANFR achieves robust performance across heterogeneous data distributions without relying on potentially unstable batch statistics.
FL-Native Architecture: First neural network architecture specifically designed for federated learning that eliminates batch normalization in favor of weight standardization.
Adaptive Channel Attention: A recalibration mechanism that dynamically adjusts feature importance based on local data characteristics while maintaining global model coherence.
Privacy Compatibility: Maintains strong performance under differential privacy constraints (ε=1), making it suitable for sensitive healthcare applications.
Extensive Validation: Evaluated across 4 datasets and 3 different aggregation methods, demonstrating consistent 40% reduction in non-IID accuracy degradation.
ANFR replaces standard batch normalization with scaled weight standardization, which normalizes the weights of each layer rather than the activations. This eliminates the dependency on batch-level statistics that can vary drastically across federated clients.
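As an illustrative sketch (not the paper's exact implementation), scaled weight standardization can be written as follows: each output channel's weights are shifted to zero mean and rescaled by the fan-in, so no batch statistics are ever needed. The kernel shape, `gain` parameter, and `eps` value are assumptions for the example.

```python
import numpy as np

def scaled_weight_standardization(weight, gain=None, eps=1e-5):
    """Standardize conv weights per output channel (illustrative sketch).

    weight: array of shape (out_channels, in_channels, kh, kw).
    Each output channel is shifted to zero mean and scaled by
    1/sqrt(fan_in * var), so activation variance is controlled at
    initialization without any dependence on batch-level statistics.
    """
    out_c = weight.shape[0]
    flat = weight.reshape(out_c, -1)          # (out_c, fan_in)
    fan_in = flat.shape[1]
    mean = flat.mean(axis=1, keepdims=True)
    var = flat.var(axis=1, keepdims=True)
    flat = (flat - mean) / np.sqrt(np.maximum(var * fan_in, eps))
    if gain is not None:                      # optional learnable per-channel gain
        flat = flat * gain.reshape(out_c, 1)
    return flat.reshape(weight.shape)

w = np.random.randn(8, 3, 3, 3)               # toy 3x3 conv kernel
ws = scaled_weight_standardization(w)
```

After standardization, each output channel has zero mean and variance 1/fan_in, regardless of how client data batches are distributed.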
The adaptive channel attention module learns to emphasize informative features and suppress noisy ones based on the global context of each sample. This provides client-adaptive behavior without requiring explicit communication of client-specific parameters.
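A minimal sketch of this kind of channel recalibration, written in squeeze-and-excitation style: pool each channel globally, pass the summary through a small bottleneck, and gate each channel with the result. The reduction ratio and weight shapes are illustrative assumptions, not the paper's exact module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """Recalibrate channels with a squeeze-and-excitation-style gate (sketch).

    x:  feature map of shape (batch, channels, h, w)
    w1: (channels, channels // r) reduction weights
    w2: (channels // r, channels) expansion weights
    """
    # Squeeze: global average pooling summarizes each channel per sample
    s = x.mean(axis=(2, 3))                       # (batch, channels)
    # Excite: bottleneck MLP produces per-channel gates in (0, 1)
    gates = sigmoid(np.maximum(s @ w1, 0) @ w2)   # (batch, channels)
    # Recalibrate: emphasize or suppress each channel by its gate
    return x * gates[:, :, None, None]

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 16, 8, 8))
w1 = rng.standard_normal((16, 4)) * 0.1           # reduction ratio r=4 (assumed)
w2 = rng.standard_normal((4, 16)) * 0.1
y = channel_attention(x, w1, w2)
```

Because the gates are computed from each sample's own global context, the behavior adapts to local data without communicating any client-specific parameters.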
We evaluate ANFR on four medical imaging and natural image classification datasets under varying degrees of non-IID data partitioning. Key findings include:
40% reduction in accuracy drop compared to standard architectures under severe non-IID conditions
Consistent improvements across FedAvg, FedProx, and SCAFFOLD aggregation methods
Maintains 95% of baseline accuracy under ε=1 differential privacy
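For context, a common way to produce label-skewed non-IID partitions of the kind used in such evaluations is Dirichlet allocation; the sketch below is a generic version under assumed parameters, not necessarily the paper's exact partitioning protocol.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with label-skewed non-IID-ness.

    For each class, client shares are drawn from Dirichlet(alpha).
    Smaller alpha -> more heterogeneous client label distributions.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    clients = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in zip(clients, np.split(idx, cuts)):
            client.extend(part.tolist())
    return clients

labels = np.repeat(np.arange(10), 100)   # toy dataset: 10 classes x 100 samples
parts = dirichlet_partition(labels, num_clients=5, alpha=0.5)
```

Sweeping `alpha` (e.g. from 100 down to 0.1) moves the partition from near-IID to severely skewed, which is how "varying degrees of non-IID" is typically operationalized.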
If you find this work useful, please cite our paper:
@article{siomos2025anfr,
  title={Addressing Data Heterogeneity in Federated Learning with Adaptive Normalization-Free Feature Recalibration},
  author={Siomos, Vasilis and Passerat-Palmbach, Jonathan and Tarroni, Giacomo},
  journal={Transactions on Machine Learning Research (TMLR)},
  year={2025}
}