Two surfaces are documented here:

- The local spool format: what the SDK writes to `./.cirron/`. This is a public API, stable within a major SDK version, consumed by the Cirron ingestion worker and by any third-party tool.
- The platform wire schemas: what ends up in the Cirron database after ingestion. Useful when you're writing queries, building a custom consumer, or exporting to your own storage.
## Directory layout

```
./.cirron/
  spool/
    <created_ns>-<batch_id>.json   # one batch per file
  snapshots/
    <span_id>/
      weights.safetensors          # sampled / full mode only
      gradients.safetensors        # when gradients are non-None
```
- `<created_ns>`: wall-clock time the batch was sealed, in nanoseconds since the Unix epoch, zero-padded to 20 digits. Filenames sort lexicographically in chronological order; the flush thread relies on this for oldest-first eviction when the spool cap is exceeded.
- `<batch_id>`: 32-char lowercase hex (a UUID4 without dashes).
- Files are written via a `.json.tmp` → `os.replace()` handoff, so a reader that opens a `*.json` file always sees a complete batch (sketched below).
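A producer other than the SDK (say, a replay tool that re-seals batches) can reproduce the same handoff in a few lines. This is a minimal sketch, assuming POSIX rename semantics on the spool filesystem; the `seal_batch` helper is illustrative, not part of the SDK:

```python
import json
import os
import time
import uuid
from pathlib import Path

def seal_batch(spool: Path, batch: dict) -> Path:
    """Write a batch atomically: .json.tmp first, then os.replace()."""
    created_ns = time.time_ns()
    batch_id = uuid.uuid4().hex                 # 32-char lowercase hex
    name = f"{created_ns:020d}-{batch_id}"      # zero-padded so filenames sort chronologically
    tmp = spool / f"{name}.json.tmp"
    final = spool / f"{name}.json"
    tmp.write_text(json.dumps(batch))
    os.replace(tmp, final)                      # atomic on POSIX; readers never see a partial file
    return final
```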
## Batch JSON

```
{
  "schema_version": 1,
  "sdk_version": "0.x.y",
  "batch_id": "abcdef...",
  "created_ns": 1234567890000000000,
  "spans": [ ... ],
  "marks": [ ... ],
  "snapshots": [ ... ]
}
```
### spans[]

```
{
  "id": "hex32",
  "name": "epoch",
  "parent_id": "hex32 | null",
  "index": 0,
  "start_ns": 0,
  "end_ns": 0,
  "cpu_ns": null,
  "gpu_ns": null,
  "memory_peak_bytes": null,
  "thread_id": 140000000,
  "pid": 12345,
  "rank": 0,
  "attrs": { "key": "value" },
  "mark_ids": ["hex32", "..."]
}
```
`cpu_ns`, `gpu_ns`, and `memory_peak_bytes` default to null. `gpu_ns` is set by torch CUDA event pairs when profiling a CUDA forward/backward pass. `cpu_ns` and `memory_peak_bytes` are reserved and not populated today. `mark_ids` holds the IDs of every mark attached to this span.
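Since `parent_id` links spans into a tree, reconstructing the hierarchy from one batch is straightforward. A minimal sketch (the `walk` helper is illustrative; note that a span's parent may have been sealed in an earlier batch, so orphans are treated as roots):

```python
from collections import defaultdict

def walk(batch: dict) -> None:
    """Print one batch's span tree with wall-clock durations."""
    spans = {s["id"]: s for s in batch["spans"]}
    children = defaultdict(list)
    roots = []
    for s in batch["spans"]:
        parent = s["parent_id"]
        if parent is None or parent not in spans:
            roots.append(s)  # session root, or parent sealed in an earlier batch
        else:
            children[parent].append(s)

    def visit(span, depth=0):
        dur_ms = (span["end_ns"] - span["start_ns"]) / 1e6
        print(f'{"  " * depth}{span["name"]} {dur_ms:.2f} ms')
        for child in sorted(children[span["id"]], key=lambda c: c["start_ns"]):
            visit(child, depth + 1)

    for root in sorted(roots, key=lambda s: s["start_ns"]):
        visit(root)
```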
### marks[]

```
{
  "id": "hex32",
  "span_id": "hex32 | \"root\"",
  "name": "loss",
  "value_type": "float | int | string | bool",
  "value": 0.5,
  "attrs": { "step": 10 },
  "ts_ns": 0,
  "kind": "point | summary"
}
```
A mark attaches to the innermost open scope on the producing thread. When no scope is open, it attaches to the `cirron.session` root. Marks emitted before `ci.profile()` was called (or after `shutdown()`) use the legacy `"root"` sentinel instead of a real span ID.
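Consumers joining marks back to spans should handle that sentinel explicitly. A minimal sketch that collects a loss curve from one batch (the `loss_curve` helper is illustrative):

```python
def loss_curve(batch: dict) -> list[tuple[int, float]]:
    """Collect (ts_ns, value) pairs for 'loss' marks, skipping legacy root marks."""
    points = []
    for mark in batch["marks"]:
        if mark["span_id"] == "root":
            continue  # emitted outside any profiled scope; no real span to join on
        if mark["name"] == "loss" and mark["value_type"] == "float":
            points.append((mark["ts_ns"], mark["value"]))
    return sorted(points)
```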
### snapshots[]

```
{
  "id": "hex32",
  "span_id": "hex32",
  "tensor_name": "layer1.0.conv1.weight",
  "shape": [64, 3, 7, 7],
  "dtype": "float32",
  "mode": "stats",
  "stats": {
    "mean": 0.0,
    "std": 0.0,
    "min": 0.0,
    "max": 0.0,
    "norm": 0.0,
    "histogram": {
      "bins": ["... 17 floats ..."],
      "counts": ["... 16 ints ..."]
    }
  },
  "blob_uri": null,
  "ts_ns": 0,
  "attrs": {}
}
```
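The 17 values in `bins` against 16 `counts` read naturally as bin edges plus per-bin counts. A minimal sketch of unpacking them (this edge interpretation is an assumption implied by the 17/16 split):

```python
import numpy as np

def unpack_histogram(stats: dict) -> tuple[np.ndarray, np.ndarray]:
    """Return (edges, counts): 17 edges bounding 16 bins, per the assumed layout."""
    hist = stats["histogram"]
    edges = np.asarray(hist["bins"], dtype=np.float64)    # length 17
    counts = np.asarray(hist["counts"], dtype=np.int64)   # length 16
    assert len(edges) == len(counts) + 1
    return edges, counts
```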
`mode` values:

- `"stats"`: inline statistics only; `blob_uri` is null. The default.
- `"sampled"`: stats plus a safetensors blob on epoch boundaries where `random() < sample_rate`. Records that lose the roll stay `mode="stats"` with `blob_uri=null`.
- `"full"`: stats plus a blob every epoch. Debug-only; not recommended for 100M+ parameter models.
Sampled and full modes write one safetensors file per (span, kind): `./.cirron/snapshots/<span_id>/weights.safetensors` for weights and `gradients.safetensors` for gradients. Every record for that span shares the same `blob_uri`; `tensor_name` is used verbatim as the key inside the container, so consumers can call `container[record["tensor_name"]]` directly with no sanitization.

Gradient records use `tensor_name = "<param>.grad"` (e.g. `"layer1.0.conv1.weight.grad"`) and only appear when the gradient was non-None at capture time.
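The naming convention makes pairing a parameter with its gradient a plain string operation. A minimal sketch (the pairing helper is illustrative, assuming the records you pass in belong to one span):

```python
def pair_weight_and_grad(snapshots: list[dict]) -> dict[str, dict[str, dict]]:
    """Map parameter name -> {'weight': record, 'grad': record}."""
    pairs: dict[str, dict[str, dict]] = {}
    for snap in snapshots:
        name = snap["tensor_name"]
        if name.endswith(".grad"):
            pairs.setdefault(name[: -len(".grad")], {})["grad"] = snap
        else:
            pairs.setdefault(name, {})["weight"] = snap
    return pairs
```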
## Canonical scope shape

```
cirron.session
  epoch[n]
    step[n]
      data_load
      forward
      backward
      optimizer_step
```
Epoch spans are siblings under the session, never nested. When multiple framework hooks coexist (e.g. HF Trainer over a PyTorch DataLoader), only the highest-priority hook owns `epoch` and `step` (transformers > tensorflow > torch); the others yield on those names so no semantic scope is duplicated.
Operations executed before the training loop runs (warmup forwards, sanity checks, optimizer construction) have `parent_id == session_id`, not an epoch. No epoch exists yet; this is correct behavior, not a bug.

For inference, the top-level scope per call is `request` instead of `epoch`.
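A consumer can verify this shape cheaply. A minimal sketch that checks epoch spans are direct children of the session root (the checker itself is illustrative):

```python
def check_epochs_are_session_children(batch: dict) -> None:
    """Raise if any epoch span in this batch is nested under a non-session scope."""
    spans = {s["id"]: s for s in batch["spans"]}
    for s in batch["spans"]:
        if s["name"] != "epoch":
            continue
        parent = spans.get(s["parent_id"])
        # The parent may live in an earlier batch; only check what we can see.
        if parent is not None and parent["name"] != "cirron.session":
            raise ValueError(f"epoch span nested under {parent['name']!r}")
```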
## Reading the spool

```python
import json
from pathlib import Path

from safetensors import safe_open

# Filenames sort lexicographically, i.e. oldest batch first.
for batch_file in sorted(Path("./.cirron/spool").glob("*.json")):
    batch = json.loads(batch_file.read_text())
    for span in batch["spans"]:
        print(span["name"], span["end_ns"] - span["start_ns"])
    for snap in batch["snapshots"]:
        if snap["blob_uri"] is None:
            continue  # stats-only record, no blob to open
        path = snap["blob_uri"].removeprefix("file://")
        with safe_open(path, framework="pt") as f:
            # tensor_name is the verbatim key inside the container
            tensor = f.get_tensor(snap["tensor_name"])
```
## Forward compatibility

Readers must tolerate unknown top-level keys and unknown per-span / per-mark fields, so minor SDK bumps can add optional metadata. Removing or renaming existing fields, or changing their types, requires a `schema_version` bump and follows SemVer.
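In practice that means gating on `schema_version` and ignoring keys you don't know. A minimal sketch of a tolerant loader (the constant and helper are illustrative):

```python
SUPPORTED_SCHEMA = 1  # bump alongside the documented schema_version

def load_batch(raw: dict) -> dict:
    version = raw.get("schema_version")
    if version != SUPPORTED_SCHEMA:
        raise ValueError(f"unsupported schema_version: {version}")
    # Pick out only the fields we understand; unknown keys are ignored,
    # so a minor SDK bump that adds metadata won't break this reader.
    return {
        "batch_id": raw["batch_id"],
        "created_ns": raw["created_ns"],
        "spans": raw.get("spans", []),
        "marks": raw.get("marks", []),
        "snapshots": raw.get("snapshots", []),
    }
```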
## Wire format: POST /v1/traces

When the HTTP transport is active (external runs with an API key), the SDK batches spans / marks / snapshots into the same JSON shape documented above and posts it to `POST /v1/traces` on the Cirron platform API. The body wraps the batch like this:

```
POST /v1/traces
Authorization: Bearer <api_key>
Content-Type: application/json
Content-Encoding: gzip
X-Cirron-SDK-Version: 0.x.y

{
  "schema_version": 1,
  "sdk_version": "0.x.y",
  "batch_id": "abcdef...",
  "created_ns": 1234567890000000000,
  "spans": [ ... ],
  "marks": [ ... ],
  "snapshots": [ ... ]   # metadata only; blobs upload separately
}
```
Successful submissions return `202 Accepted` with the batch ID. Submission is idempotent by `batch_id` (24-hour server-side dedupe window), so retrying the same batch after a timeout is safe. Rate-limited responses return `429` with a `Retry-After` header, which the SDK respects via exponential backoff.
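A minimal client submission then looks roughly like this. This is a sketch using the `requests` library; the API host default and the retry policy are assumptions, not the SDK's actual transport:

```python
import gzip
import json
import time

import requests

def post_batch(batch: dict, api_key: str,
               base_url: str = "https://api.cirron.com") -> None:  # assumed host; adjust for self-hosted
    body = gzip.compress(json.dumps(batch).encode())
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "Content-Encoding": "gzip",
        "X-Cirron-SDK-Version": batch["sdk_version"],
    }
    for attempt in range(5):
        resp = requests.post(f"{base_url}/v1/traces", data=body, headers=headers)
        if resp.status_code == 202:
            return  # accepted; retrying the same batch_id later would be deduped anyway
        if resp.status_code == 429:
            # Honor Retry-After (assuming delta-seconds form), else back off exponentially.
            wait = float(resp.headers.get("Retry-After", 2 ** attempt))
            time.sleep(wait)
            continue
        resp.raise_for_status()
    raise RuntimeError("giving up after repeated 429s")
```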
For self-hosted installs, this is the full wire contract: a custom
ingestion worker that accepts the above payload is sufficient to
consume SDK traffic.
After ingestion, traces land in the tables below. Field names are camelCase (Prisma conventions); the SDK sends snake_case and the ingestion worker maps it.
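The mapping is mechanical. A minimal sketch of what an ingestion worker might apply (the helpers are illustrative, not the platform's actual code):

```python
def snake_to_camel(key: str) -> str:
    """start_ns -> startNs, memory_peak_bytes -> memoryPeakBytes."""
    head, *rest = key.split("_")
    return head + "".join(part.capitalize() for part in rest)

def map_record(record: dict) -> dict:
    """Rename a snake_case SDK record to camelCase column names."""
    return {snake_to_camel(k): v for k, v in record.items()}
```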
### TraceSpan

| Field | Type | Notes |
|---|---|---|
| id | string | cuid |
| traceId | string | Root scope ID for the process session |
| parentSpanId | string? | null for root |
| name | string | Scope name (epoch, step, forward, request, …) |
| index | int? | Scope index (epoch number, batch number) |
| attrs | json? | Arbitrary user attributes |
| startNs | bigint | Wall time, nanoseconds |
| endNs | bigint | Wall time, nanoseconds |
| cpuNs | bigint? | CPU time |
| gpuNs | bigint? | GPU time; null when CUDA unavailable |
| memoryPeakBytes | bigint? | RSS peak during span |
| threadId | bigint? | |
| rank | int | Distributed-training rank (default 0) |
| workspaceId | string | Resource link |
| pipelineId | string? | Resource link |
| runId | string | Resource link |
| deploymentId | string? | Resource link (inference) |
| modelId | string? | Resource link |

Indexes: `(workspaceId, runId, startNs)`, `(workspaceId, pipelineId, startNs)`, `(workspaceId, deploymentId, startNs)`, `(traceId, parentSpanId)`.
### TraceMark

| Field | Type | Notes |
|---|---|---|
| id | string | cuid |
| spanId | string | Owning span |
| name | string | Mark name (loss, grad_norm, …) |
| valueType | string | "float" \| "int" \| "string" \| "bool" |
| valueFloat | float? | Populated when valueType="float" |
| valueInt | bigint? | Populated when valueType="int" |
| valueString | string? | 256-byte cap |
| valueBool | bool? | |
| attrs | json? | |
| tsNs | bigint | Wall time |
| kind | string | "point" (default) \| "summary" |
### TraceSnapshot

| Field | Type | Notes |
|---|---|---|
| id | string | cuid |
| spanId | string | Owning span (typically an epoch) |
| tensorName | string | e.g. "layer1.0.conv1.weight" |
| shape | json | e.g. [768, 3072] |
| dtype | string | e.g. "float32" |
| mode | string | "stats" \| "sampled" \| "full" |
| stats | json? | {mean, std, min, max, norm, histogram} for stats-bearing records |
| blobUri | string? | S3 URI for sampled / full; null for pure stats records |
## Snapshot object-storage layout

```
s3://<bucket>/traces/<workspace_id>/<run_id>/<span_id>/<snapshot_id>.<ext>
```

Self-hosted deployments point at MinIO or on-prem S3-compatible storage using the same path scheme.
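A minimal sketch of building the object key from its parts (the helper is illustrative, and `<ext>` is left as a parameter since the documented path does not pin it down):

```python
def snapshot_key(workspace_id: str, run_id: str, span_id: str,
                 snapshot_id: str, ext: str) -> str:
    """Object key under the bucket, matching the documented path scheme."""
    return f"traces/{workspace_id}/{run_id}/{span_id}/{snapshot_id}.{ext}"
```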