DISTRIBUTED#

Distributed utilities for parallel processing.

Supports both Distributed Data Parallel (DDP) and Data Parallel (DP) models.

Examples

>>> from utils.distributed import make_ddp, make_dp
>>> model = make_ddp(model)  # for DDP
>>> model = make_dp(model)   # for DP

Note:

  • DDP is not applicable to rehearsal methods (see make_ddp for more details).

  • When using DDP, you might need the wait_for_master function.

  • Synchronization before and after training is handled automatically.

Classes#

class utils.distributed.CustomDP(module, device_ids=None, output_device=None, dim=0)[source]#

Bases: DataParallel

Custom DataParallel class that forwards the attributes listed in intercept_names to the wrapped module, so they can be accessed without going through .module.

intercept_names#

List of attribute names to intercept.

Type:

list

intercept_names = ['classifier', 'num_classes', 'set_return_prerelu']#
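
The snippet below is a minimal sketch of the interception pattern: reads and writes of the names in intercept_names are forwarded to the wrapped module, so callers never need .module. It illustrates the idea and is not necessarily the exact implementation of CustomDP.

from torch.nn.parallel import DataParallel

class InterceptingDP(DataParallel):
    # Illustrative stand-in for CustomDP: accesses to the names below are
    # forwarded to the wrapped module instead of the DataParallel wrapper.
    intercept_names = ['classifier', 'num_classes', 'set_return_prerelu']

    def __getattr__(self, name):
        if name in self.intercept_names:
            return getattr(self.module, name)
        return super().__getattr__(name)

    def __setattr__(self, name, value):
        if name in self.intercept_names:
            setattr(self.module, name, value)
        else:
            super().__setattr__(name, value)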

Functions#

utils.distributed.make_ddp(model)[source]#

Create a DistributedDataParallel (DDP) model.

Note: DDP is not applicable to rehearsal methods (e.g., GEM, A-GEM, ER). This is because DDP breaks the rehearsal buffer, which would have to be synchronized across processes. Ad-hoc solutions are possible, but they are not implemented here.

Parameters:

model (Module) – The model to be wrapped with DDP.

Returns:

The DDP-wrapped model.

Return type:

DistributedDataParallel
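
A hedged usage sketch: the backbone below is a stand-in for any non-rehearsal model, and the sketch assumes it runs inside an already-initialized distributed process (for example, after setup has been called for the current rank).

import torch.nn as nn
from utils.distributed import make_ddp

def build_model() -> nn.Module:
    # Stand-in backbone; any non-rehearsal nn.Module works here.
    return nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Assumes the distributed environment for this rank is already configured.
model = make_ddp(build_model())  # returns the DDP-wrapped model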

utils.distributed.make_dp(model)[source]#

Create a DataParallel (DP) model.

Parameters:

model – The model to be wrapped with DP.

Returns:

The DP-wrapped model.
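
A hedged usage sketch for single-node, multi-GPU training; the toy model and the device handling are assumptions, since make_dp only documents that it wraps the model.

import torch
import torch.nn as nn
from utils.distributed import make_dp

# Stand-in backbone; any nn.Module works here.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))

if torch.cuda.device_count() > 1:
    model = make_dp(model)  # replicas across the visible GPUs
model = model.to('cuda' if torch.cuda.is_available() else 'cpu')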

utils.distributed.setup(rank, world_size)[source]#

Set up the distributed environment for parallel processing using Distributed Data Parallel (DDP).

Parameters:
  • rank (int) – The rank of the current process.

  • world_size (int) – The total number of processes.

Returns:

None

Return type:

None
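
A hedged sketch of calling setup once per process. Spawning the workers with torch.multiprocessing is an assumption made here for illustration and may be handled for you elsewhere in the framework.

import torch.multiprocessing as mp
from utils.distributed import setup

def worker(rank: int, world_size: int) -> None:
    setup(rank, world_size)  # configure the DDP environment for this process
    # ... build the model, wrap it with make_ddp, and train ...

if __name__ == '__main__':
    world_size = 2  # hypothetical number of processes/GPUs
    mp.spawn(worker, args=(world_size,), nprocs=world_size)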

utils.distributed.wait_for_master()[source]#

Wait for the master process to arrive at the barrier.

  • This is a blocking call.

  • The function is a no-op if the current process is the master (or DDP is not used).

Returns:

None

Return type:

None
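
A hedged usage sketch: every rank can call wait_for_master safely, and non-master ranks block until the master reaches the barrier. This is useful when the master performs one-time work; the file name below is purely illustrative.

import json
from utils.distributed import wait_for_master

# Safe on every rank: a no-op on the master, blocking on the other ranks.
wait_for_master()

# Hypothetical file prepared by the master; per the docs, the other ranks
# only get here once the master has reached the barrier.
with open('class_order.json') as f:
    class_order = json.load(f)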