Introduction¶
Distributed Data Parallel Wrapper (DDPW) is a lightweight wrapper for PyTorch users. It is written in Python 3.13.
DDPW enables writing compute-intensive tasks (such as training models) without worrying deeply about the underlying compute platform (CPU, Apple SoC, GPUs, or SLURM, via Submitit); the platform is instead specified simply as an argument. This considerably reduces the need to change code for each type of platform.
DDPW handles basic logistics such as spawning processes on GPUs/SLURM nodes and setting up inter-process communication, and it provides simple default utilities to move modules to devices and to obtain dataset samplers, allowing the user to focus on the main aspects of the task.
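For instance, switching platforms could then look like the following sketch. Only the "gpu" device name is taken from the usage examples below; the "cpu" and "slurm" names are assumptions for illustration.

from ddpw import Platform

# the same task can be reused; only the platform argument changes
cpu_platform   = Platform(device='cpu')              # assumed device name: run locally on CPU
gpu_platform   = Platform(device='gpu', n_gpus=4)    # as in the usage examples below
slurm_platform = Platform(device='slurm', n_gpus=4)  # assumed device name: submitted via Submitit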
Installation¶
DDPW is distributed on PyPI. The source code is available on GitHub and can also be built manually and installed as a package.
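A typical installation from PyPI would then be (assuming the distribution name matches the import name, ddpw):

pip install ddpw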
Target platforms
This wrapper is released for all architectures but is tested only on Linux arch-64 and Apple SoC.
Usage¶
As a decorator¶
from ddpw import Platform, wrapper

platform = Platform(device="gpu", n_cpus=32, ram=64, n_gpus=4, verbose=True)

@wrapper(platform)
def run(*args, **kwargs):
    # global and local ranks, and the process group, are available in
    # kwargs['global_rank'], kwargs['local_rank'], and kwargs['group']
    pass

if __name__ == '__main__':
    run()  # any arguments the task needs can be passed here
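Inside the task, the injected ranks can be used to select this process's device. The sketch below is an illustration using plain PyTorch (assuming one GPU per process) rather than DDPW's own module-moving utilities.

import torch
from ddpw import Platform, wrapper

platform = Platform(device='gpu', n_gpus=4)

@wrapper(platform)
def train(*args, **kwargs):
    local_rank = kwargs['local_rank']            # rank of this process on its node
    device = torch.device(f'cuda:{local_rank}')  # assumes one GPU per process
    model = torch.nn.Linear(16, 1).to(device)    # move the module to this process's device
    # ... training loop goes here ...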
As a callable¶
from ddpw import Platform, Wrapper

# some task
def run(*args, **kwargs):
    # global and local ranks, and the process group, are available in
    # kwargs['global_rank'], kwargs['local_rank'], and kwargs['group']
    pass

# platform (e.g., 4 GPUs)
platform = Platform(device='gpu', n_gpus=4)

# wrapper
wrapper = Wrapper(platform=platform)

# start; any additional arguments are forwarded to the task
wrapper.start(run)
Refer to the API documentation for more configuration options, or to the MNIST example for an illustration.