The framework uses hydra-conf and leverages its hierarchical configuration feature. This allows configurations to be split across files and re-used: the final configuration is composed of several smaller fragments. More on that in the Composition section of this page.
Overview
There are two configuration locations relative to the root (`/`) of the repository: `/config` and `/edml/config`. They exist for organisational purposes: the former contains the final experiment configuration files, while the latter contains partial configurations that can be included in the experiment configuration files. The full configuration file, with all available keys and their general purpose, looks like this:
# Describes which dataset to load.
dataset:
  # The name of the dataset to use. This is used as a key inside an if-else branch to determine which method to call.
  name: cifar10
  # These are custom settings for the above dataset. The keys differ based on the implementation.
  average_setting: micro
  num_classes: 10
# Battery configuration. This field holds the energy consumption of a device based on various metrics.
battery:
  # The base consumption per second.
  deduction_per_second: 0
  # The consumption per megaflop of data processed.
  deduction_per_mflop: 0.001
  # The consumption per megabyte of data received over the network.
  deduction_per_mbyte_received: 0.001
  # The consumption per megabyte of data sent over the network.
  deduction_per_mbyte_sent: 0.01
# Points to the loss function to use during backpropagation. This can be a built-in PyTorch loss function or a custom one.
loss_fn:
  # `_target_` points to the class to instantiate.
  _target_: torch.nn.CrossEntropyLoss
# The `experiment` section contains experiment-specific parameters.
experiment:
  # The name of the experiment. Used in `wandb` as the project name, too.
  project: inda-ml-comparisons
  name: cifar100-effectiveness-adaptive-threshold-mechanism-none
  # The type of experiment. Only used for logging/wandb.
  job: train
  # Training parameters.
  batch_size: 64
  max_epochs: 1
  max_rounds: 200
  metrics: [ accuracy ]
  # Checkpoint saving and early stopping.
  save_weights: True
  server_model_save_path: "edml/models/weights/"
  client_model_save_path: "edml/models/weights/"
  # When enabled, early stopping uses the specified metric to check the model's improvement rate after each
  # training round. If the rate is too small, training is stopped completely.
  early_stopping: True
  early_stopping_patience: 200
  early_stopping_metric: accuracy
  # Dataset partitioning. If true, the dataset is split and each device gets its own data.
  partition: True
  # An optional list of fractions used to determine the ratio of the data split. Numbers have to be between 0 and 1.
  # The numbers should add up to 1. The length should be equal to the number of devices participating in the
  # experiment.
  fractions: !!null
  # An optional list of numbers describing a simulated latency per device. The length should be equal to the
  # number of devices participating in the experiment.
  latency: !!null
  # Useful for debugging purposes. Only loads a single batch from the dataset.
  load_single_batch_for_debugging: False
# The model provider is responsible for providing a client and server model. The class can be extended for custom implementations.
# This config field has to point to an implementation of `edml.models.provider.base.ModelProvider`.
model_provider:
  _target_: edml.models.provider.base.ModelProvider
  # Custom keys based on the model provider you use. In the case of `edml.models.provider.base.ModelProvider`, you can specify
  # the client and server model by pointing to two classes that inherit from `torch.nn.Module`.
  client:
    _target_: edml.models.mnist_models.ClientNet
  server:
    _target_: edml.models.mnist_models.ServerNet
# The optimizer to use during backpropagation.
optimizer: !!null
# The optional scheduler to use for the learning rate.
scheduler: !!null
# An optional initial seed. If not set, the seed is initialized randomly.
seed: default
# The device/network topology. Contains a list of devices, each with their unique ID, their IP address and their initial battery capacity.
topology:
  # A list of device definitions.
  devices: [
    # A device definition. All fields are mandatory.
    {
      device_id: "d0",
      address: "localhost:50051",
      battery_capacity: 45000,
    },
    {
      device_id: "d1",
      address: "localhost:50052",
      battery_capacity: 45000,
    },
    {
      device_id: "d2",
      address: "localhost:50053",
      battery_capacity: 45000,
    },
    {
      device_id: "d3",
      address: "localhost:50054",
      battery_capacity: 45000,
    },
    {
      device_id: "d4",
      address: "localhost:50055",
      battery_capacity: 45000,
    }
  ]
# The first n devices to use for the experiment. Defaults to the number of devices declared above.
# Sometimes it is useful to shrink the count for debugging purposes (see the example below this listing).
num_devices: ${len:${topology.devices}}
# The device's own unique ID. This is set inside the code and does not have to be set manually. It exists here so that `hydra`
# knows about the field and we can manipulate it at runtime.
own_device_id: "d0"
# The gRPC configuration. If not configured explicitly, sane default values will be used.
grpc: default
# The wandb configuration section.
wandb:
  # If true, the experiments get logged to wandb.
  enabled: true
  # The organisation or user to log the experiment run to.
  entity: <your-wandb-username-or-organisation-name>
  # The file from which the API key is read.
  wandb_key: wandb_key.txt
# If False, controllers will run devices in parallel. If True, they will run sequentially and their runtime is corrected
# to account for the parallelism in post-processing.
# Important note: in a limited-energy setting, the runtime (i.e. wall time) will not be accounted for correctly if parallelism is only simulated.
simulate_parallelism: False
# This key is used to group experiments together. Useful for grouping the experiments of a sweep together in `wandb`.
# You define config key paths.
group_by:
  # The grouping takes the `controller.name`, `controller.scheduler.name` and `controller.adaptive_threshold_fn.name` properties into account.
  - controller: [ name, scheduler: name, adaptive_threshold_fn: name ]
  # Additionally, the `model_provider.decoder.path` property is taken into account.
  - model_provider: [ decoder: path ]
# `group_name` is a custom resolver for the configuration that takes a value and formats it in a special way for later usage.
# The term `${group_name:${group_by}}` means: use the value of `group_by` to set up and configure the grouping. Normally, you would
# only change the `group_by` configuration value.
group: ${group_name:${group_by}}
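As noted in the comments above, shrinking an experiment is handy for quick debugging runs. A minimal sketch, assuming the key placement shown in the listing, would be to override the following values in your experiment configuration file:
# Use only the first two declared devices and only load a single batch from the dataset.
num_devices: 2
experiment:
  load_single_batch_for_debugging: True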
Composition
What `hydra` allows us to do is to split the various subfields into their own files. For example, we can create a file at `/config/dataset/cifar10.yaml` that contains the following:
name: cifar10
average_setting: micro
num_classes: 10
Inside our experiment configuration file, we then add the corresponding entry to the `defaults` list:
defaults:
  - dataset: cifar10
This makes `hydra` look into the `dataset` folder for a file named `cifar10.yaml`. If it is found, its content is added under the `dataset` key of the composed configuration. This allows the framework to provide a pre-defined set of common configurations that can simply be included in an experiment configuration file, reducing repetition and maintenance effort.
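Putting this together, a composed experiment configuration might look like the sketch below. The fragment names other than `dataset: cifar10` are hypothetical placeholders; check the `/config` and `/edml/config` folders for the fragments that actually exist:
# /config/my_experiment.yaml -- illustrative sketch only; the loss_fn fragment name is an assumption.
defaults:
  - dataset: cifar10        # pulls in the cifar10.yaml file shown above
  - loss_fn: cross_entropy  # hypothetical loss function fragment
  - _self_                  # let the values defined in this file take precedence

experiment:
  name: my-first-experiment
  batch_size: 64
  max_rounds: 200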
Pre-defined configurations
For a set of pre-defined configuration files, please take a look into the `/edml/config` folder. We provide configuration files for various datasets, loss functions, models, optimizers and more.
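As an illustration of what such a fragment can look like, a loss function fragment simply mirrors the `loss_fn` section of the full configuration shown above (the file name here is hypothetical; the shipped fragments may be named differently):
# /edml/config/loss_fn/cross_entropy.yaml -- hypothetical file name
_target_: torch.nn.CrossEntropyLoss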