This guide walks the training-side surface as one story, from the one-line zero-touch setup to custom scopes and marks. For a flat signature reference, jump to the Reference section. For inference instrumentation, see the Inference guide.
The happy path
One call, called once per process. Framework hooks do the work. ci.profile() opens a cirron.session root scope, starts a background flush thread, and registers clean-shutdown handlers.
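A minimal sketch of the zero-touch setup. The import alias (cirron as ci) is an assumption chosen to match the ci.* names used throughout this guide:

```python
import cirron as ci  # assumed import alias matching the ci.* names in this guide

ci.profile()  # once per process: opens the cirron.session root scope,
              # starts the background flush thread, registers shutdown handlers

# Then train exactly as you already do; the hooks in the table below produce the scopes.
```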
What the hooks produce
| Framework | How it’s triggered | What you get |
|---|---|---|
| Keras | model.fit() | epoch and batch scopes; logged metrics as marks |
| HuggingFace Trainer | trainer.train() | epoch and step scopes; end-of-epoch values as summary marks |
| PyTorch + DataLoader | for batch in loader: | data_load, forward, backward, optimizer_step, implicit step |
Distributed training
Every rank calls ci.profile(). The SDK reads RANK / LOCAL_RANK /
WORLD_SIZE from the environment and tags every span with the rank.
The platform merges views at query time.
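A per-rank sketch under a standard torchrun launch. Only the ci.profile() call is Cirron-specific; the backend choice and import alias are assumptions:

```python
# train.py, launched with e.g.:  torchrun --nproc_per_node=4 train.py
import os

import torch.distributed as dist
import cirron as ci  # assumed import alias

ci.profile()  # every rank calls it; RANK / LOCAL_RANK / WORLD_SIZE are read from the
              # environment and stamped onto each span so the platform can merge views

dist.init_process_group(backend="gloo")  # pick the backend that matches your setup
print(f"rank {os.environ.get('RANK', '0')} is profiling")
```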
See ci.profile for the full signature and
parameter table.
Custom loops
If your loop doesn’t fit the hook patterns (generator-based iteration, custom samplers, step counters without a DataLoader), wrap the iterables. They’re transparent: ci.epochs(range(20)) yields 0..19 exactly, while opening an indexed epoch scope around each iteration.
ci.batches() additionally measures DataLoader stall time (time
spent waiting for data vs. time spent on compute) when the iterable
is a torch.utils.data.DataLoader.
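A sketch of a bare PyTorch loop using both wrappers; ci.epochs and ci.batches come from this guide, the import alias is assumed, and everything else is plain PyTorch:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

import cirron as ci  # assumed import alias

ci.profile()

loader = DataLoader(TensorDataset(torch.randn(512, 8), torch.randn(512, 1)), batch_size=32)
model = torch.nn.Linear(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in ci.epochs(range(5)):        # yields 0..4, one indexed epoch scope per iteration
    for x, y in ci.batches(loader):      # DataLoader detected, so stall time is measured too
        loss = torch.nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
```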
See Loop wrappers.
Custom regions
Explicit scopes for regions the hooks and wrappers don’t cover: augmentation, beam search, custom schedulers, preprocessing passes. The epoch / batch / forward tree stays intact and your custom scope slots in at the right level. Max depth: 64.
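A sketch that assumes ci.scope is used as a named context manager; check the ci.scope reference for the exact signature:

```python
import time

import cirron as ci  # assumed import alias

ci.profile()

for epoch in ci.epochs(range(3)):
    # A region the framework hooks can't see; it nests under the open epoch scope.
    # The context-manager form of ci.scope is an assumption here.
    with ci.scope("augmentation"):
        time.sleep(0.01)  # stand-in for the real augmentation pass
```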
See ci.scope.
Values
Attach scalar values to the innermost open scope.
kind="point" (default): time-series values recorded while the span is open. Viewers render them as a chart.
kind="summary": a single canonical end-of-span value. Viewers render it as one value on the span.
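A sketch that assumes ci.mark takes a name, a value, and the kind= keyword described above; the exact signature lives in the ci.mark reference:

```python
import random

import cirron as ci  # assumed import alias

ci.profile()

for epoch in ci.epochs(range(3)):
    for _ in range(100):
        loss = random.random()          # stand-in for the real training step
        ci.mark("loss", loss)           # default kind="point": time series on the open scope
    ci.mark("val_accuracy", 0.91, kind="summary")  # one canonical value for the epoch scope
```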
See ci.mark.
Framework hooks
Hooks are installed automatically by ci.profile() when the framework
is importable. Each hook is wrapped in a top-level try/except, so a
hook that fails logs a warning and training continues.
Priority
When multiple frameworks are installed, hooks fire in priority order: the highest-level framework claims the semantic scope names (epoch, step) first. Lower-level frameworks yield on those names, so no semantic scope is duplicated; a HuggingFace Trainer running over a PyTorch DataLoader gets one epoch span per epoch, not two.
PyTorch
| Hook | Mechanism | Scope produced |
|---|---|---|
| Forward pass | nn.Module.__call__ pre/post hook | forward (with mode=train|eval) |
| Backward pass | Autograd hooks on Tensor.backward | backward |
| Optimizer | register_optimizer_step_pre/post_hook | optimizer_step |
| DataLoader | DataLoader.__iter__ / __next__ wrapping | data_load per batch |
| Step boundary | First __next__ after each optimizer_step | step wrapping the four above |
| CUDA time | torch.cuda.Event pairs per scope | gpu_ns attribute on spans |
Each iteration therefore produces data_load, forward, backward, and optimizer_step scopes, with a step span covering all of them.
Epoch-boundary detection uses DataLoader iterator exhaustion: a new
iterator begins a new epoch. Fallback: every N optimizer steps
(configurable, default 1000). When using ci.epochs(), the wrapper
handles it directly.
TensorFlow / Keras
A keras.callbacks.Callback is auto-registered by patching
Model.fit to inject it if not already present. Opens/closes scopes
on on_epoch_begin/end and on_train_batch_begin/end. Logged
metrics become marks automatically.
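There is nothing Keras-specific to write; a sketch showing that a plain model.fit() after ci.profile() is enough (import alias assumed, toy data for illustration):

```python
import numpy as np
import tensorflow as tf

import cirron as ci  # assumed import alias

ci.profile()  # patches Model.fit so the callback is injected automatically

model = tf.keras.Sequential([tf.keras.layers.Dense(16, activation="relu"),
                             tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

x = np.random.rand(256, 8).astype("float32")
y = np.random.rand(256, 1).astype("float32")
model.fit(x, y, epochs=3, batch_size=32)  # epoch/batch scopes open; loss and mae become marks
```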
HuggingFace transformers
A TrainerCallback is auto-registered by patching Trainer.__init__.
Opens scopes for on_train_begin, on_epoch_begin, on_step_begin.
End-of-epoch values are marked kind="summary". Torch hooks nest
correctly underneath HF’s step span.
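A sketch with a toy model and dataset. Because the hook patches Trainer.__init__, this sketch assumes ci.profile() should run before the Trainer is constructed; the import alias and the toy classes are illustration only:

```python
import torch
from torch.utils.data import Dataset
from transformers import Trainer, TrainingArguments

import cirron as ci  # assumed import alias

ci.profile()  # assumed to run before Trainer construction so the patched __init__ sees it

class ToyDataset(Dataset):
    def __len__(self):
        return 128
    def __getitem__(self, i):
        return {"x": torch.randn(8), "labels": torch.randn(1)}

class ToyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 1)
    def forward(self, x, labels=None):
        out = self.linear(x)
        return {"loss": torch.nn.functional.mse_loss(out, labels), "logits": out}

trainer = Trainer(model=ToyModel(),
                  args=TrainingArguments(output_dir="out", num_train_epochs=1),
                  train_dataset=ToyDataset())
trainer.train()  # epoch/step scopes from the callback; Torch hook scopes nest underneath
```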
scikit-learn
No auto-hook. Opt in by wrapping the estimator with ci.wrap (see the ci.wrap reference):
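A sketch that assumes ci.wrap takes the estimator and returns a profiled equivalent; the exact call shape is in the ci.wrap reference:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

import cirron as ci  # assumed import alias

ci.profile()

X, y = make_classification(n_samples=1_000, n_features=20)
clf = ci.wrap(RandomForestClassifier(n_estimators=50))  # assumed call shape
clf.fit(X, y)  # fit (and later predict/score) calls show up as scopes
```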
Snapshots
At each detected epoch boundary, the SDK captures per-tensor statistics for every parameter in the model being profiled. Three modes, controlled by ci.profile(snapshots=...).
| Mode | Cost per epoch boundary | What’s captured |
|---|---|---|
"stats" | ≤ 50 ms (typical model) | {mean, std, min, max, norm, histogram[16]} per tensor |
"sampled" | ≤ 200 ms on sampled steps | Stats + raw tensor values for random() < sample_rate epochs |
"full" | unbounded; debug-only | Stats + raw tensor values every epoch |
"sampled" and "full" modes, raw tensors are written as
safetensors blobs at ./.cirron/snapshots/<span_id>/weights.safetensors
(and gradients.safetensors when gradients are non-None).
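A sketch of turning on sampled snapshots. The snapshots= values come from the table above; sample_rate= is assumed here to be a sibling keyword controlling the "sampled" mode and should be checked against the ci.profile reference:

```python
import cirron as ci  # assumed import alias

# "stats" is cheap enough to leave on; "sampled" adds raw tensors for a fraction of epochs.
ci.profile(snapshots="sampled", sample_rate=0.05)  # sample_rate= is an assumption, not confirmed
```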
Model discovery
Keras and HuggingFace hooks discover the model from their callback kwargs. Bare PyTorch loops that don’t use ci.epochs() should register the model once with ci.watch(model) before training:
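A sketch of a bare loop registering the model by hand. ci.watch(model) is named in the text above; the import alias and the toy model are illustration only:

```python
import torch

import cirron as ci  # assumed import alias

ci.profile(snapshots="stats")

model = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1))
ci.watch(model)  # register once, before training, so snapshot capture knows which tensors to read

opt = torch.optim.SGD(model.parameters(), lr=0.01)
for epoch in range(3):                       # bare loop, no ci.epochs(): hence the explicit watch
    for _ in range(50):
        loss = model(torch.randn(32, 8)).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
```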
See ci.watch.
"full" mode is not recommended for models over 100M parameters.
At 7B+, even "sampled" is expensive, drop the sample_rate.Output sinks
By default, ci.profile() writes each batch as a JSON file under ./.cirron/spool/. The output= parameter swaps or fans out that local destination so you can stream traces alongside training output or run purely in-memory:
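A sketch of the only value spelled out in this guide, output="none"; the full set of accepted values is in the output= reference:

```python
import cirron as ci  # assumed import alias

# Default (no output= argument): spans are spooled as JSON batches under ./.cirron/spool/.
# output="none" suppresses that local mirror; other values are listed in the reference.
ci.profile(output="none")
```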
output="none"
inside a Cirron pipeline still ships traces to the platform over the
kernel event stream; only the local mirroring is suppressed.
See output= reference for the
full table.
In-process read-back
ci.trace() returns the current session’s scope tree without leaving
the process. Useful in notebooks (the cell renders the tree inline)
and for ad-hoc analysis (a flat DataFrame for quantiles and
group-bys):
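A sketch of in-process read-back. The bare ci.trace() call and ci.scope usage follow the assumptions made earlier in this guide; the option that yields the flat DataFrame is left to the ci.trace reference rather than guessed here:

```python
import time

import cirron as ci  # assumed import alias

ci.profile()

with ci.scope("preprocess"):   # any open scope shows up in the read-back
    time.sleep(0.1)

tree = ci.trace()   # current session's scope tree, no round trip to the platform
print(tree)         # in a notebook, the same object renders as an inline tree
```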
See the ci.trace reference.
Lifecycle
Three helpers for manual control. The atexit handler registered by ci.profile() calls them for you on process exit; reach for them only when you need deterministic behavior in tests or hot-reload scenarios.
Next
Inference guide
@ci.inference, LLM detection, config-driven capture.
Reference, ci.profile
Full signature and parameter table.
Reference, ci.trace
Read-back formats and filters.