CHECKPOINTS

Functions

utils.checkpoints.can_save_and_exit(fn)

Wraps a function to catch KeyboardInterrupt exceptions and SIGINT signals.

If running in a Jupyter notebook, this will prevent the kernel from crashing when the user interrupts the execution of a cell and retain the current state.

If running in a script, this will:

- catch the KeyboardInterrupt and exit gracefully
- catch the SIGINT and save a checkpoint before exiting

This is useful for training scripts where you want to be able to stop the training process and save the current state of the model.

Parameters: fn (Callable) – the function to be wrapped
Returns: the wrapped function
Return type: Callable
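
The interrupt-handling pattern described above can be illustrated with a minimal sketch. This is not the implementation of can_save_and_exit: the decorator name save_on_interrupt and the save_fn callback are hypothetical, and the sketch only covers the KeyboardInterrupt path (which is how Python reports Ctrl+C by default).

```python
import functools


def save_on_interrupt(save_fn):
    """Hypothetical decorator factory: run `save_fn` if the wrapped call is interrupted."""

    def decorator(fn):
        @functools.wraps(fn)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except KeyboardInterrupt:
                # By default, Ctrl+C (SIGINT) surfaces in Python as KeyboardInterrupt.
                # Save the current state instead of letting the exception propagate.
                save_fn()
                return None

        return wrapper

    return decorator


saved = []  # stands in for a real checkpoint file


@save_on_interrupt(lambda: saved.append("checkpoint"))
def train():
    # Simulates the user pressing Ctrl+C in the middle of training.
    raise KeyboardInterrupt


result = train()
```

In this sketch the interrupted call returns None after the callback has run, so the caller (or a notebook cell) keeps whatever state was built up before the interruption.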

utils.checkpoints.mammoth_load_checkpoint(checkpoint_path, model=None, ignore_classifier=False, args=None, return_only_args=False)

Loads the keys from the given checkpoint.

- Handles DataParallel and DistributedDataParallel checkpoints.
- Handles checkpoints from previous versions of the code.
- Handles head initialization for LUCIR.

Parameters:

checkpoint_path (str) – the path to the checkpoint file or URL.
model (Module | None) – the model to be loaded. It can be None ONLY with return_only_args=True.
ignore_classifier (bool) – whether to ignore the classifier weights.
args (Namespace | None) – the current arguments. If provided, it will check if the loaded arguments match the current ones.
return_only_args (bool) – if True, only returns the loaded arguments and not the model.

Returns:

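the loaded arguments if return_only_args is True; otherwise the model with the checkpoint loaded, together with an optional dictionary (see the return type).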

Return type:

Namespace | Tuple[Module, Dict[str, float | int] | None]
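
As a rough illustration of what handling DataParallel checkpoints and ignoring the classifier can involve, the sketch below cleans the keys of a state dictionary. It is not the code used by mammoth_load_checkpoint: the helper name clean_state_dict and the "classifier." prefix are assumptions made for the example. The one fact it relies on is that DataParallel and DistributedDataParallel wrap the model in a `module` attribute, so the saved parameter names start with "module.".

```python
def clean_state_dict(state_dict, ignore_classifier=False, classifier_prefix="classifier."):
    """Hypothetical helper: normalize the keys of a checkpoint's state dictionary."""
    cleaned = {}
    for key, value in state_dict.items():
        # Strip the "module." prefix added by (Distributed)DataParallel wrappers.
        if key.startswith("module."):
            key = key[len("module."):]
        # Optionally skip the classifier weights (the prefix is an assumption).
        if ignore_classifier and key.startswith(classifier_prefix):
            continue
        cleaned[key] = value
    return cleaned


# Plain floats stand in for tensors to keep the example dependency-free.
wrapped = {"module.backbone.weight": 1.0, "module.classifier.weight": 2.0}
cleaned = clean_state_dict(wrapped, ignore_classifier=True)
kept_all = clean_state_dict(wrapped)
```

With real checkpoints, the cleaned dictionary would typically be passed to the model's load_state_dict, using strict=False when some keys (such as the classifier) were deliberately dropped.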

utils.checkpoints.save_mammoth_checkpoint(task, end_task, args, model, results=None, optimizer_st=None, scheduler_st=None, checkpoint_name=None)

Saves a checkpoint of the model for the given task. The checkpoint can be saved either as a single file, which requires weights_only=False in torch.load when loading, or with the weights stored separately, which can be loaded safely with weights_only=True.