FEDERATED LEARNING & DISTRIBUTED SYSTEMS
A Methodology and Tool for Automatic Workload Distribution: A Case Study on Federated Learning
The paper introduces a methodology and supporting tool to automatically distribute workloads in federated learning (FL) across heterogeneous edge devices, targeting performance optimization while preserving privacy. Instead of assigning uniform tasks to all clients—as classic FL often does—the approach adapts workloads to device capabilities and orchestrates training runs from within Jupyter notebooks.
Introduction and Core Innovation
The tool orchestrates FL training runs from within Jupyter notebooks, adapting workloads to the capabilities of heterogeneous edge devices rather than assigning uniform tasks to all clients. The system combines a notebook extension, Docker-based containerization, and a skeleton-based compiler to parallelize user code and manage distributed execution.
Related Work and Positioning
Positioned against prior work such as FedAvg, FedProx, and self-adaptive schemes like FedSAE, the authors argue that heterogeneous clients cause "stragglers" and motivate workload-aware orchestration. They also note complementary strands like cross-facility FL (XFFL) for large-scale training and survey privacy challenges (e.g., inference and membership attacks), which often demand techniques like differential privacy or homomorphic encryption—albeit with computational overhead that can exacerbate heterogeneity. This context sets the stage for their automation-first, usability-focused design.
Methodological Framework
Methodologically, the framework rests on three principles: (1) strict separation of concerns between ML logic ("what") and distributed orchestration ("how"); (2) metaprogramming/code generation that treats user functions as data and fills tested templates; and (3) a declarative Pythonic API implemented through decorators. Users annotate functions (model definition, data loader, local trainer), and the system captures sources, detects imports, and generates self-contained artifacts for master and clients. A code generator compiles these into executable scripts (e.g., master.py, client_script.py, common_utils.py).
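The decorator-driven capture step can be sketched as follows. This is a minimal illustration, not the paper's actual API: the decorator name `federated_role` and the registry structure are assumptions; the real tool captures sources, detects imports, and emits artifacts in a dedicated compile phase.

```python
import inspect

REGISTRY = {}  # role -> {"fn": function, "source": captured source text}

def federated_role(role):
    """Hypothetical decorator (illustrative name): records a user
    function and its source so a later compile step can splice the
    ML logic ("what") into orchestration templates ("how")."""
    def decorator(fn):
        try:
            source = inspect.getsource(fn)
        except OSError:  # source unavailable (e.g. interactive REPL)
            source = None
        REGISTRY[role] = {"fn": fn, "source": source}
        return fn  # the function stays usable locally, unchanged
    return decorator

@federated_role("trainer")
def local_train(model, data):
    # user-supplied ML logic, free of any distribution code
    return model
```

Because the decorator returns the original function untouched, the annotated code still runs sequentially in the notebook, which is what enables the separation-of-concerns principle.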
Runtime Architecture and Execution
At runtime, the master script parses a CSV "machinefile" of clients (hostnames, usernames, ports) and conducts each training round in four phases: distribution of code and weights, parallel execution on workers, aggregation of client updates, and cleanup. SSH underpins secure communications, while the compiler strips decorator scaffolding and injects user logic into templates by parsing and rewriting abstract syntax trees (with algorithms provided to extract imports and function bodies and inject them into template ASTs).
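The AST rewriting the compiler performs can be approximated with Python's `ast` module. The function names and the `USER_LOGIC` placeholder below are illustrative assumptions; the paper only specifies that imports and function bodies are extracted and injected into template ASTs.

```python
import ast

def extract_imports(source):
    """Collect top-level import statements from user source,
    mirroring the import-detection step."""
    tree = ast.parse(source)
    return [ast.unparse(node) for node in tree.body
            if isinstance(node, (ast.Import, ast.ImportFrom))]

def inject_function(template_src, user_fn_src, placeholder="USER_LOGIC"):
    """Replace the body of a placeholder function in a template AST
    with the body of the user's function, then emit source again."""
    template = ast.parse(template_src)
    user_fn = ast.parse(user_fn_src).body[0]  # the user's FunctionDef
    for node in ast.walk(template):
        if isinstance(node, ast.FunctionDef) and node.name == placeholder:
            node.body = user_fn.body
    return ast.unparse(template)
```

Working at the AST level, rather than on raw strings, lets the compiler strip decorator scaffolding reliably and guarantees the generated `master.py` and `client_script.py` are syntactically valid.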
Interactive Workflow and Extensions
The workflow is designed to be launched interactively from a Jupyter notebook cell via federated.compile(...), which triggers code generation and the distributed run. Beyond FL-specific decorators, the same compiler supports general-purpose ones—dockerize for environment capture and isolation, multi-processing for local parallelism, and map-reduce for large-scale data processing—so the approach can extend beyond the showcased FL use case.
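A general-purpose decorator of the kind described can be sketched as below. This is an illustrative stand-in, not the tool's implementation: the paper's multi-processing decorator generates process-based scripts, whereas this sketch uses a thread pool to stay portable and self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

def parallelize(workers=2):
    """Illustrative local-parallelism decorator: the wrapped
    sequential function is mapped over an iterable by a worker
    pool instead of a plain loop."""
    def decorator(fn):
        def wrapper(items):
            with ThreadPoolExecutor(max_workers=workers) as pool:
                # pool.map preserves input order in its results
                return list(pool.map(fn, items))
        return wrapper
    return decorator

@parallelize(workers=2)
def square(x):
    return x * x
```

The same decorate-then-compile pattern is what lets one compiler back FL orchestration, `dockerize`, and `map-reduce` alike: each decorator only declares intent, and code generation supplies the matching runtime scaffolding.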
Practical Case Study
A case study demonstrates federated training with FedAvg on EMNIST using TensorFlow. Three clients, each with distinct local datasets, train models inside Docker containers (e.g., workers with 2 CPUs and 2 GB RAM), while a containerized orchestrator coordinates rounds and aggregates weights. The setup preserves data locality and standardizes environments, illustrating how sequential user code—properly decorated—gets parallelized across nodes with minimal friction.
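The aggregation step of FedAvg used in the case study can be shown in miniature. The sketch below weights each client's parameters by its local dataset size, per the standard FedAvg rule; real systems operate on tensors rather than nested lists.

```python
def fedavg(client_weights, client_sizes):
    """FedAvg aggregation: for each parameter, take the average
    across clients weighted by local dataset size.
    client_weights: per-client list of layers, each a list of floats."""
    total = sum(client_sizes)
    aggregated = []
    for layer_versions in zip(*client_weights):   # same layer, all clients
        aggregated.append([
            sum(w * n for w, n in zip(param_versions, client_sizes)) / total
            for param_versions in zip(*layer_versions)  # same param, all clients
        ])
    return aggregated
```

In the paper's setup the orchestrator container runs this aggregation each round over the three workers' updates, so only weights, never the EMNIST shards themselves, leave the client containers.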
Privacy Considerations and Future Directions
On privacy, the system relies on the FL non-data-transfer principle, SSH-secured exchanges of model parameters, and container isolation. The authors acknowledge residual risks (e.g., leakage via model updates) and list future work: integrating stronger protections such as differential privacy and homomorphic encryption, and analyzing information leakage from model deltas to design countermeasures. Overall, the contribution is a practical, automation-centric path to adaptive workload distribution for FL in heterogeneous cloud-edge settings, integrated tightly with data scientists' existing notebook workflows.
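The differential-privacy direction listed as future work typically amounts to clipping each client's update and adding calibrated noise before it leaves the device. The sketch below shows that pattern; the parameters and function name are illustrative, not from the paper.

```python
import random

def privatize_update(delta, clip_norm=1.0, sigma=0.5, rng=None):
    """DP-style treatment of a client's model delta (a flat list of
    floats): clip to an L2 norm bound, then add Gaussian noise scaled
    by the bound, limiting what an observer can infer from the update."""
    rng = rng or random.Random(0)  # fixed seed only for reproducibility here
    norm = sum(d * d for d in delta) ** 0.5
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [d * scale for d in delta]
    return [d + rng.gauss(0.0, sigma * clip_norm) for d in clipped]
```

Such mechanisms add computation per round, which is exactly the overhead the authors note can worsen heterogeneity, and hence a natural fit for their workload-aware orchestration.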