Documentation Index

Fetch the complete documentation index at: https://docs.cirron.com/llms.txt

Use this file to discover all available pages before exploring further.

Scope tree

The core data model is a scope tree. A scope is a named span with a start time, end time, optional index, optional attributes, a parent pointer, and a list of marks. Scopes nest; the innermost open scope on the current thread is the target for ci.mark().

Scopes are thread-local. Parallel DataLoader workers, distributed training ranks, and async inference handlers each get their own scope tree, tagged with a worker/rank identifier. The platform reconstructs cross-thread and cross-rank views at query time.

Inside the training loop, the canonical shape is:
cirron.session
  epoch[n]
    step[n]
      data_load
      forward
      backward
      optimizer_step
Epoch spans are siblings of each other under the session, never nested. Max depth is 64; scopes beyond that are dropped with a warning.

Marks

ci.mark(name, value, kind="point" | "summary", **attrs) attaches a named value to the innermost open scope on the current thread. When no scope is open, it attaches to the cirron.session root opened by ci.profile(). Two kinds of marks:
  • kind="point" (default): a time-series data point recorded while the span is open (per-step loss, per-batch accuracy). Viewers render these as a time series.
  • kind="summary": a canonical end-of-span value (epoch-final loss, run-level accuracy). Viewers render these as a single value attached to the span.
Values are coerced: numerics become float64, strings are capped at 256 bytes, and bools stay bools. Complex types (tensors, arrays) should use snapshots, not marks. Marks live in a lock-free per-thread ring buffer (default capacity: 64k entries). When the buffer is full, the oldest mark is dropped and a drop counter is incremented; drop counts surface in ci.health() and the dashboard.
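
The coercion rules and the drop-oldest ring buffer can be sketched as follows. The coerce function and MarkRing class are illustrative names, not the SDK's real internals:

```python
from collections import deque

RING_CAPACITY = 64 * 1024  # default per-thread mark buffer size

def coerce(value):
    """Documented coercion rules: bool stays bool, numerics become
    float, strings are capped at 256 bytes."""
    if isinstance(value, bool):          # check bool before int: bool is an int subclass
        return value
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        return value.encode()[:256].decode(errors="ignore")
    raise TypeError("complex values (tensors, arrays) belong in snapshots, not marks")

class MarkRing:
    """Drop-oldest ring buffer with a drop counter (as surfaced by ci.health())."""
    def __init__(self, capacity=RING_CAPACITY):
        self.buf = deque(maxlen=capacity)
        self.dropped = 0

    def append(self, name, value, kind="point", **attrs):
        if len(self.buf) == self.buf.maxlen:
            self.dropped += 1            # the oldest entry is about to be evicted
        self.buf.append((name, coerce(value), kind, attrs))
```

A bounded deque gives the drop-oldest behavior for free; the only extra bookkeeping is counting evictions before each append.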

Framework hook priority

When ci.profile() detects multiple frameworks, hooks install in a fixed priority order:
transformers  >  tensorflow  >  torch
Higher-level frameworks claim ownership of the semantic scopes (epoch, step) before lower-level ones decide whether to open their own. This prevents duplicate epoch spans when, for example, HF Trainer drives a torch DataLoader: transformers claims epoch and step via its callback, and torch yields on those scopes while still producing data_load / forward / backward / optimizer_step children. scikit-learn is opt-in: there is no auto-hook. Use ci.wrap(estimator) explicitly.
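
The ownership rule reduces to a first-claim-wins pass over the priority list. A minimal sketch (resolve_owners is a hypothetical helper, not SDK API):

```python
# Fixed priority: higher-level frameworks claim semantic scopes first.
HOOK_PRIORITY = ["transformers", "tensorflow", "torch"]
SEMANTIC_SCOPES = ("epoch", "step")

def resolve_owners(detected):
    """Assign each semantic scope to the highest-priority detected framework.
    Lower-priority frameworks yield on those scopes but still emit their own
    children (data_load, forward, backward, optimizer_step)."""
    owners = {}
    for fw in HOOK_PRIORITY:
        if fw in detected:
            for scope in SEMANTIC_SCOPES:
                owners.setdefault(scope, fw)
    return owners
```

Note that a framework absent from the priority list (like scikit-learn) never claims anything automatically, matching the opt-in ci.wrap(estimator) behavior.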

Transport selection

The SDK picks a transport automatically based on the environment, with graceful degradation at every step. Your code is never blocked on I/O or the network.
Environment                                          Transport
CIRRON_RUN_ID set (platform pipeline / deployment)   Kernel / runtime event stream
API key configured, no run context                   HTTPS POST /v1/traces
No credentials                                       File-only (local spool)
All three paths write to ./.cirron/spool/ first. Anything that can’t reach the platform stays on disk until the next flush or until you run cirron spool flush later.
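
The selection precedence can be expressed as a small decision function. This is a sketch of the documented behavior, not SDK code; select_transport and its parameters are illustrative:

```python
import os

def select_transport(env=os.environ, has_api_key=False):
    """Pick a transport per the documented precedence. All paths spool to
    ./.cirron/spool/ first; this only decides the upstream destination."""
    if env.get("CIRRON_RUN_ID"):
        return "event-stream"   # platform pipeline / deployment
    if has_api_key:
        return "https"          # POST /v1/traces
    return "spool-only"         # no credentials: local files only
```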

Local spool

The local spool is public API. Third-party tools and the Cirron platform ingestion worker both consume the same files.
./.cirron/
  spool/<created_ns>-<batch_id>.json
  snapshots/<span_id>/weights.safetensors
  snapshots/<span_id>/gradients.safetensors
The full schema, including the spans[], marks[], and snapshots[] record layouts, is documented on the Schemas page. It’s stable within a major SDK version. Every batch carries an sdk_version field so readers can branch on it.
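
Since the spool is public API, third-party readers can recover batch metadata straight from the filename. A sketch, assuming only the <created_ns>-<batch_id>.json naming shown above (parse_spool_name is a hypothetical helper, and the values in the example are made up):

```python
from pathlib import Path

def parse_spool_name(path):
    """Split a spool batch filename of the form <created_ns>-<batch_id>.json
    into its creation timestamp (nanoseconds) and batch id."""
    created_ns, _, batch_id = Path(path).stem.partition("-")
    return int(created_ns), batch_id
```

Partitioning on the first hyphen keeps the helper correct even if a batch id itself contains hyphens; remember to branch on each batch's sdk_version field before parsing record bodies.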

Overhead

The SDK’s hot path is synchronous and lock-free. Batching, file I/O, and network send all run on a background flush thread, so your training or serving code only pays for the scope push/pop or mark append itself. Everything else is off-thread. Observed per-call cost:
Operation                          x86_64    arm64
ci.scope open / close              ~4.4 μs   ~2.7 μs
ci.mark                            ~3.7 μs   ~2.4 μs
ci.epochs / ci.batches iteration   ~4.8 μs   ~2.8 μs
Snapshot cost scales with the number of parameter tensors in your model, not with a fixed per-call figure. Each tensor pays one reduction pass for mean/std/min/max plus one histogram bucketing pass. On GPU these run as device-side kernels and complete in single-digit milliseconds for typical model sizes. On CPU the work is memory-bandwidth-bound across every parameter tensor and takes noticeably longer.

Snapshot mode is opt-out: pass snapshots=None to ci.profile() if you don't want per-epoch weight/gradient stats. SDK overhead is tracked and surfaced as a mark inside every scope, so you can see the instrumentation tax in your own traces.
The numbers above are medians over 1M iterations per primitive, benchmarked on Python 3.13 with a released torch build. x86_64 numbers were taken on a 2-vCPU cloud VM; arm64 numbers on an Apple Silicon device.
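
The two per-tensor passes a snapshot pays (one reduction, one histogram bucketing) look roughly like this in plain Python; the real work runs as device-side kernels, and tensor_stats is an illustrative name:

```python
import statistics

def tensor_stats(values, bins=8):
    """Pass 1: reduce to mean/std/min/max. Pass 2: bucket into a histogram.
    One call per parameter tensor, so total cost scales with tensor count."""
    lo, hi = min(values), max(values)
    stats = {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
        "min": lo,
        "max": hi,
    }
    width = (hi - lo) / bins or 1.0          # guard against constant tensors
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    return stats, hist
```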

Error handling

The SDK never crashes the user's process. This is a load-bearing rule. Every hook, flush, and ingest call is wrapped in a top-level exception handler. Exceptions are logged at WARNING and counted; they never propagate into your training or serving code.

The flush thread is supervised. If it dies, a new one respawns with backoff; three deaths within 60 seconds degrade the SDK to spool-only mode (traces write to disk, no network) until the process restarts. If the local spool fills its disk budget (default cap: 1 GB, configurable via Cirron(spool_max_bytes=...)), the oldest batch files are dropped and a drop counter is incremented.
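
The three-deaths-in-60-seconds degrade rule amounts to sliding-window death tracking. A minimal sketch (FlushSupervisor is an illustrative name; the clock injection exists only to make the logic testable):

```python
import time

DEATH_WINDOW_S = 60.0
MAX_DEATHS = 3

class FlushSupervisor:
    """Track flush-thread deaths; after three within 60 seconds, degrade
    to spool-only mode until the process restarts."""
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.deaths = []
        self.spool_only = False

    def record_death(self):
        now = self.clock()
        # Keep only deaths inside the sliding window, then add this one.
        self.deaths = [t for t in self.deaths if now - t < DEATH_WINDOW_S]
        self.deaths.append(now)
        if len(self.deaths) >= MAX_DEATHS:
            self.spool_only = True   # traces write to disk, no network
        return self.spool_only
```

Deaths spaced further than a minute apart never accumulate, so a rare transient failure still gets a fresh respawn instead of permanent degradation.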

Next

Profiling

Training instrumentation surface: profile, scope, mark, epochs, batches, framework hooks, snapshots.

Schemas

Spool JSON layout, safetensors snapshot layout, and the platform wire schemas.