Curriculum Learning Support

Curriculum learning — training in phases where different parts of the model are active or learn at different speeds — requires two orthogonal controls:

  • which layers are trainable (frozen / unfrozen)

  • how fast each layer learns (per-layer LR coefficients)

torch_mentor keeps both as first-class Mentee state, persisted in every checkpoint. The optimizer is treated as a derivative of that state: it can be updated in-place when possible, or rebuilt from scratch when the structure changes.


Design principles

Progressive adoption with escape hatches. Every control can be used independently. Freeze layers without touching LR coefficients. Set LR coefficients before the optimizer even exists. Rebuild the optimizer explicitly or let torch_mentor do it automatically.

Source of truth lives in the model, not the optimizer. _frozen_modules and _lr_coefficients are the authoritative state. The optimizer reflects them but is always reproducible from them via create_train_objects.

Rebuild on phase change, update in-place for fine adjustments. Changing which layers are trainable is a training-phase transition — fresh Adam state for newly active layers is often desirable. Adjusting a coefficient by a small ratio is a fine adjustment — the ratio update preserves accumulated scheduler decay.


Layer inspection

Before freezing or assigning coefficients, inspect the available layer paths:

print(model.layer_names)
# ['backbone', 'backbone.conv1', 'backbone.layer1', ..., 'head']

layer_names lists every parameter-bearing module in traversal order. These are the strings accepted by freeze, unfreeze, set_lr_coefficient, and select_layers.

Patterns are matched with re.fullmatch, so plain strings act as exact selectors and regex patterns select groups:

model.select_layers(["backbone"])               # exact match
model.select_layers([r"backbone\.layer[34]"])   # regex
model.select_layers(["head", "backbone"])       # order follows layer_names

Freezing and unfreezing

Basic usage

freeze and unfreeze accept a single string or a list of strings / regex patterns:

model.freeze("backbone")                     # freeze entire backbone
model.freeze([r"backbone\.layer[12]"])       # freeze first two layer groups
model.unfreeze("backbone.layer4")            # unfreeze one sub-layer
model.unfreeze(["backbone"])                 # unfreeze whole subtree

Both methods return self for chaining:

model.freeze("backbone").freeze("neck")

State persistence

Frozen state is saved in every checkpoint and restored automatically by resume and resume_training — no extra bookkeeping required.

Optimizer interaction

freeze sets requires_grad=False on the affected parameters. If an optimizer already exists, its param groups are left untouched: Adam skips parameters whose gradient is None, so the frozen groups become inert without any restructuring.

unfreeze sets requires_grad=True. The optimizer interaction depends on when the layer was frozen relative to when the optimizer was built:

Situation

Behaviour

Layer was frozen after the optimizer was built

param group already exists; parameters become live again; Adam initialises their state on the first gradient step

Layer was frozen before the optimizer was built

no param group exists; a rebuild is required

# Layer frozen after optimizer was built — fast path, no state loss
model.create_train_objects(lr=1e-3)
model.freeze("backbone")
model.unfreeze("backbone")              # fast path — group already exists

# Layer frozen before optimizer was built — rebuild required
model.freeze("backbone")
model.create_train_objects(lr=1e-3)    # optimizer built without backbone
model.unfreeze("backbone", reset_optimizer_if_needed=True)   # triggers rebuild
# or raise if reset_optimizer_if_needed=False (default)

Typical fine-tuning workflow

import torch.nn as nn
from torchvision.models import resnet50
from mentor import Mentee, Classifier

class MyResNet(Mentee):
    def __init__(self, num_classes=10):
        super().__init__()
        base = resnet50(weights="IMAGENET1K_V1")
        self.backbone = nn.Sequential(*list(base.children())[:-1])
        self.head = nn.Linear(2048, num_classes)
        self.trainer = Classifier()

    def forward(self, x):
        return self.head(self.backbone(x).flatten(1))

model = MyResNet(num_classes=10).to("cuda")

# Phase 1 — train head only
model.freeze("backbone")
model.create_train_objects(lr=1e-3)
model.fit(train_data, val_data, epochs=5, checkpoint_path="phase1.pt")

# Phase 2 — fine-tune everything with a lower global LR
model.unfreeze("backbone", reset_optimizer_if_needed=True)
model.create_train_objects(lr=1e-4)
model.fit(train_data, val_data, epochs=10, checkpoint_path="phase2.pt")

Per-layer learning rate coefficients

set_lr_coefficient assigns a multiplier relative to the global learning rate. The effective LR for a layer is global_lr x coefficient.

model.set_lr_coefficient(0.1, "backbone")          # 10x lower than global LR
model.set_lr_coefficient(0.0, "backbone.layer1")   # zero out one sub-layer
model.set_lr_coefficient(1.0, "backbone")          # restore default (removes entry)
model.set_lr_coefficient(0.1, [r"backbone\..*"])   # regex — all backbone sub-layers

Coefficients are stored in _lr_coefficients (a sparse dict; absent key means 1.0) and are persisted in every checkpoint.

Ancestor inheritance

Setting a coefficient on a parent module propagates to its children’s param groups. This means you can set one coefficient for "backbone" and every sub-layer (backbone.layer1, backbone.layer1.conv1, …) inherits it, without needing to enumerate each one:

model.set_lr_coefficient(0.01, "backbone")   # all of backbone at 1% LR
model.set_lr_coefficient(0.1,  "backbone.layer4")  # layer4 overrides to 10%

When create_train_objects is called, each param group resolves its coefficient from the most specific matching entry in _lr_coefficients.

Setting before the optimizer exists

Coefficients can be set at any point — before or after create_train_objects:

model = MyResNet()
model.set_lr_coefficient(0.01, "backbone")   # stored only; no optimizer yet
model.create_train_objects(lr=1e-3)          # optimizer built with 0.01 x 1e-3 for backbone

Live in-place update

When the optimizer already has a dedicated param group for the target layer (built by create_train_objects), set_lr_coefficient updates the group’s lr in-place:

group["lr"] *= new_coefficient / old_coefficient

This preserves any LR decay the scheduler has already applied. The update also propagates to descendant groups — setting a coefficient for "backbone" updates "backbone.layer1", "backbone.layer1.conv1", and so on.

Rebuild path

A rebuild via create_train_objects is triggered automatically when:

  • the target layer has no dedicated param group (optimizer was built as a single flat group, or the layer was frozen when the optimizer was built), or

  • the old coefficient was 0.0 and the new one is non-zero (ratio undefined).

# Explicit rebuild (always safe)
model.set_lr_coefficient(0.1, "backbone")
model.create_train_objects(lr=1e-3)

# Automatic rebuild on demand
model.set_lr_coefficient(0.1, "backbone", reset_optimizer_if_needed=True)

# Raise instead of rebuild (default)
model.set_lr_coefficient(0.1, "backbone")   # RuntimeError if rebuild needed

Layer-wise learning rate decay (LLRD)

A common transfer-learning technique applies exponentially decaying LRs from the output layer back to the input:

layers = model.layer_names          # ordered from input to output
n = len(layers)
decay = 0.9
for i, layer in enumerate(layers):
    coeff = decay ** (n - 1 - i)   # highest LR at output, lowest at input
    model.set_lr_coefficient(coeff, layer)

model.create_train_objects(lr=1e-3)

Coefficient = 0.0

Setting a coefficient to 0.0 zeroes the LR for that layer without freezing it (requires_grad is unchanged). Gradients still flow — the layer just receives no parameter update. This is occasionally useful to “pause” a layer temporarily while keeping it in the computational graph.


Combining freeze and LR coefficients

The two systems are fully independent. _frozen_modules controls gradient flow; _lr_coefficients controls update magnitude. A frozen layer with a coefficient set will have the coefficient applied when it is unfrozen and the optimizer is rebuilt:

model.set_lr_coefficient(0.01, "backbone")
model.freeze("backbone")
# ... train head only ...
model.unfreeze("backbone", reset_optimizer_if_needed=True)
# backbone is now trainable at 0.01 x global_lr, as stored in _lr_coefficients

API reference

freeze(patterns, optimizer=None, reset_optimizer_if_needed=False)

Freeze layers matched by patterns (str or list[str]). Updates _frozen_modules and sets requires_grad=False. Returns self.

unfreeze(patterns, optimizer=None, reset_optimizer_if_needed=False)

Unfreeze layers matched by patterns (str or list[str]). Updates _frozen_modules and sets requires_grad=True. If the optimizer lacks a group for an unfrozen layer, raises RuntimeError unless reset_optimizer_if_needed=True. Returns self.

set_lr_coefficient(coefficient, patterns, optimizer=None, reset_optimizer_if_needed=False)

Set LR coefficient for layers matched by patterns (str or list[str]). Updates _lr_coefficients. Updates optimizer param groups in-place where possible; triggers rebuild or raises when not possible (controlled by reset_optimizer_if_needed). Returns self.

select_layers(patterns)

Return the layer paths from layer_names that match any pattern in patterns, deduplicated and sorted in module traversal order. Raises ValueError if a pattern matches nothing. Used internally by freeze, unfreeze, and set_lr_coefficient.

create_train_objects(lr, step_size, gamma, ...)

Always reads _frozen_modules and _lr_coefficients to build one param group per non-frozen layer with lr = global_lr x coefficient. Calling this is the safe “full rebuild” path after any structural change.