DISTRIBUTED#

Distributed utilities for parallel processing.

Supports both Distributed Data Parallel (DDP) and Data Parallel (DP) models.

Examples

>>> from utils.distributed import make_ddp, make_dp
>>> model = make_ddp(model)  # for DDP
>>> model = make_dp(model)   # for DP

Note:

  • DDP is not applicable to rehearsal methods (see make_ddp for more details).

  • When using DDP, you might need the wait_for_master function.

  • Synchronization before and after training is handled automatically.

Classes#

class utils.distributed.CustomDP(module, device_ids=None, output_device=None, dim=0)[source]#

Bases: DataParallel

Custom DataParallel class that forwards the attributes listed in intercept_names to the wrapped module, so they can be accessed without going through .module.

intercept_names#

List of attribute names to intercept.

Type:

list

intercept_names = ['classifier', 'num_classes', 'set_return_prerelu']#
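
The snippet below is a minimal sketch of the interception pattern: reads and writes of the names in intercept_names are forwarded to the wrapped module, so callers never need .module. It illustrates the idea and is not necessarily the exact implementation of CustomDP.

from torch.nn.parallel import DataParallel

class InterceptingDP(DataParallel):
    # Illustrative stand-in for CustomDP: accesses to the names below are
    # forwarded to the wrapped module instead of the DataParallel wrapper.
    intercept_names = ['classifier', 'num_classes', 'set_return_prerelu']

    def __getattr__(self, name):
        if name in self.intercept_names:
            return getattr(self.module, name)
        return super().__getattr__(name)

    def __setattr__(self, name, value):
        if name in self.intercept_names:
            setattr(self.module, name, value)
        else:
            super().__setattr__(name, value)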

Functions#

utils.distributed.make_ddp(model)[source]#

Create a DistributedDataParallel (DDP) model.

Note: DDP is not applicable to rehearsal methods (e.g., GEM, A-GEM, ER). This is because DDP breaks the rehearsal buffer, which would have to be synchronized across processes. Ad-hoc solutions are possible, but they are not implemented here.

Parameters:

model (Module) – The model to be wrapped with DDP.

Returns:

The DDP-wrapped model.

Return type:

DistributedDataParallel
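
A hedged usage sketch: the backbone below is a stand-in for any non-rehearsal model, and the sketch assumes it runs inside an already-initialized distributed process (for example, after setup has been called for the current rank).

import torch.nn as nn
from utils.distributed import make_ddp

def build_model() -> nn.Module:
    # Stand-in backbone; any non-rehearsal nn.Module works here.
    return nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Assumes the distributed environment for this rank is already configured.
model = make_ddp(build_model())  # returns the DDP-wrapped model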

utils.distributed.make_dp(model)[source]#

Create a DataParallel (DP) model.

Parameters:

model – The model to be wrapped with DP.

Returns:

The DP-wrapped model.
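
A hedged usage sketch for single-node, multi-GPU training; the toy model and the device handling are assumptions, since make_dp only documents that it wraps the model.

import torch
import torch.nn as nn
from utils.distributed import make_dp

# Stand-in backbone; any nn.Module works here.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))

if torch.cuda.device_count() > 1:
    model = make_dp(model)  # replicas across the visible GPUs
model = model.to('cuda' if torch.cuda.is_available() else 'cpu')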

utils.distributed.setup(rank, world_size)[source]#

Set up the distributed environment for parallel processing using Distributed Data Parallel (DDP).

Parameters:
  • rank (int) – The rank of the current process.

  • world_size (int) – The total number of processes.

Returns:

None

Return type:

None
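
A hedged sketch of calling setup once per process. Spawning the workers with torch.multiprocessing is an assumption made here for illustration and may be handled for you elsewhere in the framework.

import torch.multiprocessing as mp
from utils.distributed import setup

def worker(rank: int, world_size: int) -> None:
    setup(rank, world_size)  # configure the DDP environment for this process
    # ... build the model, wrap it with make_ddp, and train ...

if __name__ == '__main__':
    world_size = 2  # hypothetical number of processes/GPUs
    mp.spawn(worker, args=(world_size,), nprocs=world_size)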

utils.distributed.wait_for_master()[source]#

Wait for the master process to arrive at the barrier.

  • This is a blocking call.

  • The function is a no-op if the current process is the master (or DDP is not used).

Returns:

None

Return type:

None
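
A hedged usage sketch: every rank can call wait_for_master safely, and non-master ranks block until the master reaches the barrier. This is useful when the master performs one-time work; the file name below is purely illustrative.

import json
from utils.distributed import wait_for_master

# Safe on every rank: a no-op on the master, blocking on the other ranks.
wait_for_master()

# Hypothetical file prepared by the master; per the docs, the other ranks
# only get here once the master has reached the barrier.
with open('class_order.json') as f:
    class_order = json.load(f)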