# Curriculum Learning Support

Curriculum learning (training in phases where different parts of the model are active or learn at different speeds) requires two orthogonal controls:

- **which layers are trainable** (frozen / unfrozen)
- **how fast each layer learns** (per-layer LR coefficients)

torch_mentor keeps both as first-class Mentee state, persisted in every checkpoint. The optimizer is treated as a *derivative* of that state: it can be updated in-place when possible, or rebuilt from scratch when the structure changes.

---

## Design principles

**Progressive adoption with escape hatches.** Every control can be used independently. Freeze layers without touching LR coefficients. Set LR coefficients before the optimizer even exists. Rebuild the optimizer explicitly or let torch_mentor do it automatically.

**Source of truth lives in the model, not the optimizer.** `_frozen_modules` and `_lr_coefficients` are the authoritative state. The optimizer reflects them but is always reproducible from them via `create_train_objects`.

**Rebuild on phase change, update in-place for fine adjustments.** Changing which layers are trainable is a training-phase transition, where fresh Adam state for newly active layers is often desirable. Adjusting a coefficient by a small ratio is a fine adjustment, and the ratio update preserves accumulated scheduler decay.

---

## Layer inspection

Before freezing or assigning coefficients, inspect the available layer paths:

```python
print(model.layer_names)
# ['backbone', 'backbone.conv1', 'backbone.layer1', ..., 'head']
```

`layer_names` lists every parameter-bearing module in traversal order. These are the strings accepted by `freeze`, `unfreeze`, `set_lr_coefficient`, and `select_layers`.

Patterns are matched with `re.fullmatch`, so plain strings act as exact selectors and regex patterns select groups:

```python
model.select_layers(["backbone"])                # exact match
model.select_layers([r"backbone\.layer[34]"])    # regex
model.select_layers(["head", "backbone"])        # order follows layer_names
```

---

## Freezing and unfreezing

### Basic usage

`freeze` and `unfreeze` accept a single string or a list of strings / regex patterns:

```python
model.freeze("backbone")                  # freeze entire backbone
model.freeze([r"backbone\.layer[12]"])    # freeze first two layer groups
model.unfreeze("backbone.layer4")         # unfreeze one sub-layer
model.unfreeze(["backbone"])              # unfreeze whole subtree
```

Both methods return `self` for chaining:

```python
model.freeze("backbone").freeze("neck")
```

### State persistence

Frozen state is saved in every checkpoint and restored automatically by `resume` and `resume_training`, so no extra bookkeeping is required.

### Optimizer interaction

`freeze` sets `requires_grad=False` on the affected parameters. If an optimizer already exists, its param groups are left untouched: Adam skips parameters whose gradient is `None`, so the frozen groups become inert without any restructuring.

`unfreeze` sets `requires_grad=True`.
The optimizer interaction depends on when the layer was frozen relative to when the optimizer was built:

| Situation | Behaviour |
|---|---|
| Layer was frozen *after* the optimizer was built | param group already exists; parameters become live again; Adam initialises their state on the first gradient step |
| Layer was frozen *before* the optimizer was built | no param group exists; a rebuild is required |

```python
# Layer frozen after optimizer was built: fast path, no state loss
model.create_train_objects(lr=1e-3)
model.freeze("backbone")
model.unfreeze("backbone")    # fast path: group already exists

# Layer frozen before optimizer was built: rebuild required
model.freeze("backbone")
model.create_train_objects(lr=1e-3)    # optimizer built without backbone
model.unfreeze("backbone", reset_optimizer_if_needed=True)    # triggers rebuild
# raises RuntimeError if reset_optimizer_if_needed=False (the default)
```

### Typical fine-tuning workflow

```python
import torch.nn as nn
from torchvision.models import resnet50
from mentor import Mentee, Classifier


class MyResNet(Mentee):
    def __init__(self, num_classes=10):
        super().__init__()
        base = resnet50(weights="IMAGENET1K_V1")
        self.backbone = nn.Sequential(*list(base.children())[:-1])
        self.head = nn.Linear(2048, num_classes)
        self.trainer = Classifier()

    def forward(self, x):
        return self.head(self.backbone(x).flatten(1))


model = MyResNet(num_classes=10).to("cuda")

# Phase 1: train head only
model.freeze("backbone")
model.create_train_objects(lr=1e-3)
model.fit(train_data, val_data, epochs=5, checkpoint_path="phase1.pt")

# Phase 2: fine-tune everything with a lower global LR
model.unfreeze("backbone", reset_optimizer_if_needed=True)
model.create_train_objects(lr=1e-4)
model.fit(train_data, val_data, epochs=10, checkpoint_path="phase2.pt")
```

---

## Per-layer learning rate coefficients

`set_lr_coefficient` assigns a multiplier relative to the global learning rate. The effective LR for a layer is `global_lr × coefficient`.

```python
model.set_lr_coefficient(0.1, "backbone")            # 10x lower than global LR
model.set_lr_coefficient(0.0, "backbone.layer1")     # zero out one sub-layer
model.set_lr_coefficient(1.0, "backbone")            # restore default (removes entry)
model.set_lr_coefficient(0.1, [r"backbone\..*"])     # regex: all backbone sub-layers
```

Coefficients are stored in `_lr_coefficients` (a sparse dict; absent key means 1.0) and are persisted in every checkpoint.

### Ancestor inheritance

Setting a coefficient on a parent module propagates to its children's param groups. This means you can set one coefficient for `"backbone"` and every sub-layer (`backbone.layer1`, `backbone.layer1.conv1`, …) inherits it, without needing to enumerate each one:

```python
model.set_lr_coefficient(0.01, "backbone")           # all of backbone at 1% LR
model.set_lr_coefficient(0.1, "backbone.layer4")     # layer4 overrides to 10%
```

When `create_train_objects` is called, each param group resolves its coefficient from the most specific matching entry in `_lr_coefficients`.

### Setting before the optimizer exists

Coefficients can be set at any point, before or after `create_train_objects`:

```python
model = MyResNet()
model.set_lr_coefficient(0.01, "backbone")    # stored only; no optimizer yet
model.create_train_objects(lr=1e-3)           # optimizer built with 0.01 × 1e-3 for backbone
```

### Live in-place update

When the optimizer already has a dedicated param group for the target layer (built by `create_train_objects`), `set_lr_coefficient` updates the group's `lr` in-place:

```python
group["lr"] *= new_coefficient / old_coefficient
```

This preserves any LR decay the scheduler has already applied. The update also propagates to descendant groups: setting a coefficient for `"backbone"` updates `"backbone.layer1"`, `"backbone.layer1.conv1"`, and so on.

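To see why the ratio form keeps scheduler decay intact, the arithmetic can be traced with a plain dictionary standing in for one optimizer param group. This is only a sketch: `rescale_group_lr`, `GLOBAL_LR`, and `GAMMA` are hypothetical names chosen for illustration, not part of torch_mentor, and the two decay steps simply imitate a step-style scheduler.

```python
import math

GLOBAL_LR = 1e-3
GAMMA = 0.5  # hypothetical decay factor applied by a step-style scheduler


def rescale_group_lr(group, old_coefficient, new_coefficient):
    """Hypothetical helper: apply the ratio update to one param-group dict."""
    if old_coefficient == 0.0:
        # new / old is undefined, so an in-place update is impossible here.
        raise ValueError("ratio undefined for old coefficient 0.0")
    group["lr"] *= new_coefficient / old_coefficient
    return group


# The group starts at global_lr * coefficient.
old_coefficient = 0.1
group = {"name": "backbone", "lr": GLOBAL_LR * old_coefficient}

# Two scheduler steps each shrink the group's lr by GAMMA.
for _ in range(2):
    group["lr"] *= GAMMA

# Ratio update: rescale the decayed lr instead of overwriting it.
new_coefficient = 0.2
rescale_group_lr(group, old_coefficient, new_coefficient)

# The decay survives: lr matches global_lr * new_coefficient * GAMMA**2.
decayed_target = GLOBAL_LR * new_coefficient * GAMMA ** 2
print(math.isclose(group["lr"], decayed_target))

# Overwriting with global_lr * new_coefficient would have lost the decay.
naive_lr = GLOBAL_LR * new_coefficient
print(math.isclose(naive_lr, decayed_target))
```

The guard on a zero old coefficient reflects the one case where the ratio cannot be formed, so no in-place rescaling is possible.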

### Rebuild path

A rebuild via `create_train_objects` is needed when:

- the target layer has no dedicated param group (optimizer was built as a single flat group, or the layer was frozen when the optimizer was built), or
- the old coefficient was `0.0` and the new one is non-zero (ratio undefined).

With `reset_optimizer_if_needed=True` the rebuild is triggered automatically; with the default `False`, `set_lr_coefficient` raises `RuntimeError` instead.

```python
# Explicit rebuild (always safe)
model.set_lr_coefficient(0.1, "backbone")
model.create_train_objects(lr=1e-3)

# Automatic rebuild on demand
model.set_lr_coefficient(0.1, "backbone", reset_optimizer_if_needed=True)

# Raise instead of rebuild (default)
model.set_lr_coefficient(0.1, "backbone")    # RuntimeError if rebuild needed
```

### Layer-wise learning rate decay (LLRD)

A common transfer-learning technique applies exponentially decaying LRs from the output layer back to the input:

```python
layers = model.layer_names    # ordered from input to output
n = len(layers)
decay = 0.9

for i, layer in enumerate(layers):
    coeff = decay ** (n - 1 - i)    # highest LR at output, lowest at input
    model.set_lr_coefficient(coeff, layer)

model.create_train_objects(lr=1e-3)
```

### Coefficient = 0.0

Setting a coefficient to `0.0` zeroes the LR for that layer without freezing it (`requires_grad` is unchanged). Gradients still flow; the layer just receives no parameter update. This is occasionally useful to "pause" a layer temporarily while keeping it in the computational graph.

---

## Combining freeze and LR coefficients

The two systems are fully independent. `_frozen_modules` controls gradient flow; `_lr_coefficients` controls update magnitude. A frozen layer with a coefficient set will have the coefficient applied when it is unfrozen and the optimizer is rebuilt:

```python
model.set_lr_coefficient(0.01, "backbone")
model.freeze("backbone")
# ... train head only ...
model.unfreeze("backbone", reset_optimizer_if_needed=True)
# backbone is now trainable at 0.01 × global_lr, as stored in _lr_coefficients
```

---

## API reference

### `freeze(patterns, optimizer=None, reset_optimizer_if_needed=False)`

Freeze layers matched by *patterns* (`str` or `list[str]`). Updates `_frozen_modules` and sets `requires_grad=False`. Returns `self`.

### `unfreeze(patterns, optimizer=None, reset_optimizer_if_needed=False)`

Unfreeze layers matched by *patterns* (`str` or `list[str]`). Updates `_frozen_modules` and sets `requires_grad=True`. If the optimizer lacks a group for an unfrozen layer, raises `RuntimeError` unless `reset_optimizer_if_needed=True`. Returns `self`.

### `set_lr_coefficient(coefficient, patterns, optimizer=None, reset_optimizer_if_needed=False)`

Set the LR coefficient for layers matched by *patterns* (`str` or `list[str]`). Updates `_lr_coefficients`. Updates optimizer param groups in-place where possible; when that is not possible, it either triggers a rebuild or raises `RuntimeError`, as controlled by `reset_optimizer_if_needed`. Returns `self`.

### `select_layers(patterns)`

Return the layer paths from `layer_names` that match any pattern in *patterns*, deduplicated and sorted in module traversal order. Raises `ValueError` if a pattern matches nothing. Used internally by `freeze`, `unfreeze`, and `set_lr_coefficient`.

### `create_train_objects(lr, step_size, gamma, ...)`

Always reads `_frozen_modules` and `_lr_coefficients` to build one param group per non-frozen layer with `lr = global_lr × coefficient`. Calling this is the safe "full rebuild" path after any structural change.
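
As a rough illustration of the rebuild described for `create_train_objects`, the sketch below resolves each layer's coefficient from the most specific entry in a sparse dict and derives one `{name, lr}` spec per non-frozen layer. It is not torch_mentor's implementation: `resolve_coefficient` and `build_group_specs` are hypothetical helpers, the layer names are made up, and the sketch assumes that freezing a dotted path also freezes everything beneath it. It also ignores how actual parameters are assigned to groups.

```python
def resolve_coefficient(layer, lr_coefficients):
    """Return the coefficient of the most specific entry covering `layer`.

    Checks the layer's own path first, then each dotted ancestor, and
    falls back to 1.0 when nothing matches (absent key means 1.0).
    """
    path = layer
    while path:
        if path in lr_coefficients:
            return lr_coefficients[path]
        path = path.rpartition(".")[0]  # drop the last path component
    return 1.0


def build_group_specs(layer_names, frozen, lr_coefficients, global_lr):
    """Hypothetical: one {name, lr} spec per layer that is not frozen."""

    def is_frozen(layer):
        # Assumption: a frozen path covers itself and all of its descendants.
        return any(layer == f or layer.startswith(f + ".") for f in frozen)

    return [
        {"name": layer, "lr": global_lr * resolve_coefficient(layer, lr_coefficients)}
        for layer in layer_names
        if not is_frozen(layer)
    ]


# Made-up state mirroring the kind of values held in _frozen_modules
# and _lr_coefficients.
layers = ["backbone.layer1", "backbone.layer3", "backbone.layer4", "head"]
frozen = {"backbone.layer1"}
coefficients = {"backbone": 0.01, "backbone.layer4": 0.1}

specs = build_group_specs(layers, frozen, coefficients, global_lr=1e-3)
for spec in specs:
    print(spec["name"], spec["lr"])
```

Here `backbone.layer3` inherits the `backbone` entry, `backbone.layer4` uses its own override, `head` falls back to 1.0, and the frozen `backbone.layer1` gets no spec at all.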