This guide walks the training-side surface as one story, from the one-line zero-touch setup to custom scopes and marks. For a flat signature reference, jump to the Reference section. For inference instrumentation, see the Inference guide.

The happy path

One call, made once per process. Framework hooks do the rest.
import cirron as ci

ci.profile()
That’s the whole setup. The SDK detects installed frameworks and installs hooks for each. It opens the cirron.session root scope, starts a background flush thread, and registers clean-shutdown handlers.

What the hooks produce

Framework | How it's triggered | What you get
Keras | model.fit() | epoch and batch scopes; logged metrics as marks
HuggingFace Trainer | trainer.train() | epoch and step scopes; end-of-epoch values as summary marks
PyTorch + DataLoader | for batch in loader: | data_load, forward, backward, optimizer_step, implicit step
import cirron as ci
ci.profile()

model.fit(X, y, epochs=20)

Distributed training

Every rank calls ci.profile(). The SDK reads RANK / LOCAL_RANK / WORLD_SIZE from the environment and tags every span with the rank. The platform merges views at query time. See ci.profile for the full signature and parameter table.
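For example, a script launched with torchrun needs nothing beyond the same one-line setup on every rank. The sketch below assumes a CUDA/NCCL setup; the model is a placeholder and the training loop itself is elided:
# A minimal per-rank sketch for a torchrun launch, e.g.:
#   torchrun --nproc_per_node=4 train.py
# torchrun sets RANK / LOCAL_RANK / WORLD_SIZE, which ci.profile() reads.
import os

import torch
import torch.distributed as dist
import cirron as ci

ci.profile()  # one call per rank, exactly as in the single-process case

dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.parallel.DistributedDataParallel(
    torch.nn.Linear(128, 10).cuda(local_rank),  # placeholder model
    device_ids=[local_rank],
)
# ... build a DistributedSampler-backed DataLoader and train as usual; every
# span the hooks emit is tagged with this process's rank.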

Custom loops

If your loop doesn’t fit the hook patterns (generator-based iteration, custom samplers, step counters without a DataLoader), wrap the iterables. They’re transparent: ci.epochs(range(20)) yields 0..19 exactly while opening an indexed epoch scope around each iteration.
for epoch in ci.epochs(range(20)):
    for batch in ci.batches(loader):
        loss = train_step(batch)
        ci.mark("loss", loss.item())
ci.batches() additionally measures DataLoader stall time (time spent waiting for data vs. time spent on compute) when the iterable is a torch.utils.data.DataLoader. See Loop wrappers.

Custom regions

Explicit scopes for regions the hooks and wrappers don’t cover: augmentation, beam search, custom schedulers, preprocessing passes.
with ci.scope("augmentation"):
    batch = augment(batch)

with ci.scope("postprocess", variant="beam-search"):
    output = beam_search(logits)
Scopes nest arbitrarily under whatever scope is already open, so the hooks’ epoch / batch / forward tree stays intact and your custom scope slots in at the right level. Max depth: 64. See ci.scope.

Values

Attach scalar values to the innermost open scope.
ci.mark("loss", loss.item())                                # point (default)
ci.mark("grad_norm", compute_grad_norm(model))
ci.mark("learning_rate", scheduler.get_last_lr()[0])
ci.mark("epoch_accuracy", val_acc, kind="summary")          # canonical epoch value
Two kinds:
  • kind="point" (default): time-series values recorded while the span is open. Viewers render as a chart.
  • kind="summary": a single canonical end-of-span value. Viewers render as one value on the span.
See ci.mark.

Framework hooks

Hooks are installed automatically by ci.profile() when the framework is importable. Each hook is wrapped in a top-level try/except, so a hook that fails logs a warning and training continues.

Priority

When multiple frameworks are installed, hooks fire in priority order:
transformers  >  tensorflow  >  torch
Higher-level frameworks claim ownership of the semantic scopes (epoch, step) first. Lower-level frameworks yield on those names so no semantic scope is duplicated; HuggingFace Trainer running over a PyTorch DataLoader gets one epoch span per epoch, not two.

PyTorch

Hook | Mechanism | Scope produced
Forward pass | nn.Module.__call__ pre/post hook | forward (with mode=train|eval)
Backward pass | Autograd hooks on Tensor.backward | backward
Optimizer | register_optimizer_step_pre/post_hook | optimizer_step
DataLoader | DataLoader.__iter__ / __next__ wrapping | data_load per batch
Step boundary | First __next__ after each optimizer_step | step wrapping the four above
CUDA time | torch.cuda.Event pairs per scope | gpu_ns attribute on spans
Gradient accumulation (multiple forward/backward pairs between optimizer steps) produces a single step span covering all of them. Epoch-boundary detection uses DataLoader iterator exhaustion: a new iterator begins a new epoch. The fallback is a fixed interval of every N optimizer steps (configurable, default 1000). When you use ci.epochs(), the wrapper marks epoch boundaries directly.
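As an illustration, a toy accumulation loop like the sketch below (model, loader, and accumulation factor are arbitrary placeholders) produces one step span per optimizer.step() call, with four forward/backward pairs nested under each:
import torch
import cirron as ci

ci.profile()

model = torch.nn.Linear(16, 1)                       # toy model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loader = torch.utils.data.DataLoader(torch.randn(256, 16), batch_size=8)
ACCUM_STEPS = 4                                      # forward/backward pairs per optimizer step

for epoch in ci.epochs(range(2)):
    for i, batch in enumerate(ci.batches(loader)):
        loss = model(batch).pow(2).mean() / ACCUM_STEPS
        loss.backward()                              # one forward/backward pair per batch
        if (i + 1) % ACCUM_STEPS == 0:
            optimizer.step()                         # one step span covers the pairs above
            optimizer.zero_grad()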

TensorFlow / Keras

A keras.callbacks.Callback is auto-registered by patching Model.fit to inject it if not already present. Opens/closes scopes on on_epoch_begin/end and on_train_batch_begin/end. Logged metrics become marks automatically.

HuggingFace transformers

A TrainerCallback is auto-registered by patching Trainer.__init__. Opens scopes for on_train_begin, on_epoch_begin, on_step_begin. End-of-epoch values are marked kind="summary". Torch hooks nest correctly underneath HF’s step span.
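A minimal sketch of the setup order, assuming a small sequence-classification checkpoint and an already tokenized train_dataset (both placeholders); the point is only that ci.profile() runs before the Trainer is constructed, so the patched Trainer.__init__ can inject the callback:
import cirron as ci
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

ci.profile()  # patch Trainer.__init__ before any Trainer is constructed

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3),
    train_dataset=train_dataset,  # placeholder: your tokenized dataset
)
trainer.train()  # epoch / step scopes open automatically; epoch-end metrics land as summary marks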

scikit-learn

No auto-hook. Opt in by wrapping the estimator:
from sklearn.ensemble import RandomForestClassifier
import cirron as ci

model = ci.wrap(RandomForestClassifier(n_estimators=100))
model.fit(X, y)      # opens a scope around fit, delegates everything else
See ci.wrap.

Snapshots

At each detected epoch boundary, the SDK captures per-tensor statistics for every parameter in the model being profiled. Three modes, controlled by ci.profile(snapshots=...).
Mode | Cost per epoch boundary | What's captured
"stats" | ≤ 50 ms (typical model) | {mean, std, min, max, norm, histogram[16]} per tensor
"sampled" | ≤ 200 ms on sampled steps | Stats + raw tensor values for random() < sample_rate epochs
"full" | unbounded; debug-only | Stats + raw tensor values every epoch
In "sampled" and "full" modes, raw tensors are written as safetensors blobs at ./.cirron/snapshots/<span_id>/weights.safetensors (and gradients.safetensors when gradients are non-None).

Model discovery

Keras and HuggingFace hooks discover the model from their callback kwargs. Bare PyTorch loops that don’t use ci.epochs() should register the model once with ci.watch(model) before training:
import cirron as ci

ci.profile(snapshots="stats")
ci.watch(model)

for epoch in range(20):
    ...
See ci.watch.
"full" mode is not recommended for models over 100M parameters. At 7B+, even "sampled" is expensive, drop the sample_rate.

Output sinks

By default ci.profile() writes each batch as a JSON file under ./.cirron/spool/. The output= parameter swaps that local destination or fans it out, so you can stream traces alongside training output or run purely in-memory:
ci.profile(output="stdout")            # live [cirron] lines per closed span
ci.profile(output="log")               # cirron.trace logger at INFO
ci.profile(output=["spool", "log"])    # disk + log
ci.profile(output="none")              # nothing written; pair with ci.trace()
Sinks are independent of the platform transport. output="none" inside a Cirron pipeline still ships traces to the platform over the kernel event stream; only the local mirroring is suppressed. See output= reference for the full table.
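Because the "log" sink goes through the standard cirron.trace logger at INFO, it composes with ordinary logging configuration. A minimal sketch that mirrors spans into a file alongside the spool (the file name and format string are arbitrary):
import logging

import cirron as ci

handler = logging.FileHandler("training-trace.log")
handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
logging.getLogger("cirron.trace").addHandler(handler)
logging.getLogger("cirron.trace").setLevel(logging.INFO)

ci.profile(output=["spool", "log"])  # spool to disk and mirror each closed span into the log file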

In-process read-back

ci.trace() returns the current session’s scope tree without leaving the process. Useful in notebooks (the cell renders the tree inline) and for ad-hoc analysis (a flat DataFrame for quantiles and group-bys):
ci.trace()                          # text tree to stdout (or notebook value)
ci.trace(format="dict")             # nested dict
ci.trace(format="json")             # JSON string
ci.trace(format="df")               # pandas DataFrame, one row per span
ci.trace(name="epoch")              # filter by scope name
ci.trace(last=10)                   # 10 most recently closed spans
Works with or without an active profiler. When no profiler is attached, the call is purely in-memory and never writes a spool file as a side effect (safe on read-only filesystems). See ci.trace reference.
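As an example of the ad-hoc analysis path, the sketch below computes per-scope timing quantiles from the DataFrame. The column names used here ("name", "duration_ms") are illustrative assumptions, not a documented schema; see the ci.trace reference for the real columns:
# Per-scope duration quantiles from the read-back DataFrame. Column names are
# illustrative assumptions ("name", "duration_ms"), not a documented schema.
import cirron as ci

df = ci.trace(format="df")
steps = df[df["name"] == "step"]
print("p50 / p95 step time (ms):",
      steps["duration_ms"].quantile(0.5),
      steps["duration_ms"].quantile(0.95))
print(df.groupby("name")["duration_ms"].describe())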

Lifecycle

Three helpers for manual control. The atexit handler registered by ci.profile() calls them for you on process exit; reach for them only when you need deterministic behavior in tests or hot-reload scenarios.
ci.flush()       # synchronously drain scope + mark buffers to spool
ci.health()      # dict: enabled, drop counts, hook handles, transport, spool usage
ci.shutdown()    # close root scope, flush, stop flush thread, clear singleton
See Lifecycle reference.
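For deterministic teardown in tests, one pattern is a fixture that flushes and shuts down explicitly instead of waiting for atexit. A minimal pytest sketch; the output="none" choice simply keeps tests from writing spool files:
import cirron as ci
import pytest


@pytest.fixture
def profiler():
    ci.profile(output="none")   # no local spool writes during tests
    yield
    ci.flush()                  # drain scope + mark buffers synchronously
    ci.shutdown()               # close root scope, stop the flush thread, clear the singleton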

Next

Inference guide

@ci.inference, LLM detection, config-driven capture.

Reference: ci.profile

Full signature and parameter table.

Reference: ci.trace

Read-back formats and filters.