
Two surfaces are documented here:
  1. The local spool format: what the SDK writes to ./.cirron/. Public API, stable within a major SDK version, consumed by the Cirron ingestion worker and by any third-party tool.
  2. The platform wire schemas: what ends up in the Cirron database after ingestion. Useful when you’re writing queries, building a custom consumer, or exporting to your own storage.

Local spool format (v1)

Directory layout

./.cirron/
  spool/
    <created_ns>-<batch_id>.json       # one batch per file
  snapshots/
    <span_id>/
      weights.safetensors              # sampled / full mode only
      gradients.safetensors            # when gradients are non-None
  • <created_ns>: wall-clock time the batch was sealed, nanoseconds since Unix epoch, zero-padded to 20 digits. Filenames sort lexicographically in chronological order; the flush thread uses this for oldest-first eviction when the spool cap is exceeded.
  • <batch_id>: 32-char lowercase hex (UUID4 without dashes).
  • Files are written to a *.json.tmp file and moved into place with os.replace(), so a reader that opens a *.json file always sees a complete batch.
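
The same handoff is easy to reproduce in a custom writer. A minimal sketch, assuming a fully built batch dict; seal_batch is a hypothetical helper name, not SDK API:

import json
import os
import time
import uuid
from pathlib import Path

SPOOL = Path("./.cirron/spool")

def seal_batch(batch: dict) -> Path:
    """Write one batch file atomically: tmp write, then os.replace()."""
    created_ns = time.time_ns()
    batch_id = uuid.uuid4().hex                 # 32-char lowercase hex
    # Zero-padding created_ns to 20 digits makes lexicographic order
    # match chronological order, which oldest-first eviction relies on.
    name = f"{created_ns:020d}-{batch_id}.json"
    SPOOL.mkdir(parents=True, exist_ok=True)
    tmp = SPOOL / (name + ".tmp")
    tmp.write_text(json.dumps(batch))
    os.replace(tmp, SPOOL / name)               # readers never see a partial file
    return SPOOL / name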

Batch JSON

{
  "schema_version": 1,
  "sdk_version": "0.x.y",
  "batch_id": "abcdef...",
  "created_ns": 1234567890000000000,
  "spans": [ ... ],
  "marks": [ ... ],
  "snapshots": [ ... ]
}

spans[]

{
  "id": "hex32",
  "name": "epoch",
  "parent_id": "hex32 | null",
  "index": 0,
  "start_ns": 0,
  "end_ns": 0,
  "cpu_ns": null,
  "gpu_ns": null,
  "memory_peak_bytes": null,
  "thread_id": 140000000,
  "pid": 12345,
  "rank": 0,
  "attrs": { "key": "value" },
  "mark_ids": ["hex32", "..."]
}
cpu_ns, gpu_ns, and memory_peak_bytes default to null. gpu_ns is set by torch CUDA event pairs when profiling a CUDA forward/backward pass. cpu_ns and memory_peak_bytes are reserved and not populated today. mark_ids holds the IDs of every mark attached to this span.
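
Because parent_id links every span to its enclosing scope, a consumer can rebuild the scope tree per batch. A minimal sketch over the span dicts above (it assumes the root span arrives with parent_id null; spans whose parents were sealed into an earlier batch surface as extra roots):

from collections import defaultdict

def build_tree(spans):
    """Group spans by parent_id; roots land under the None key."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_id"]].append(span)
    for siblings in children.values():
        siblings.sort(key=lambda s: s["start_ns"])   # chronological order
    return children

def walk(children, parent_id=None, depth=0):
    """Print the scope tree with wall-clock durations."""
    for span in children.get(parent_id, []):
        dur_ms = (span["end_ns"] - span["start_ns"]) / 1e6
        print("  " * depth + f"{span['name']} {dur_ms:.1f} ms")
        walk(children, span["id"], depth + 1)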

marks[]

{
  "id": "hex32",
  "span_id": "hex32 | \"root\"",
  "name": "loss",
  "value_type": "float | int | string | bool",
  "value": 0.5,
  "attrs": { "step": 10 },
  "ts_ns": 0,
  "kind": "point | summary"
}
A mark attaches to the innermost open scope on the producing thread. When no scope is open, it attaches to the cirron.session root. Marks emitted before ci.profile() was called (or after shutdown()) use the legacy "root" sentinel instead of a real span ID.
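
Joining marks back to their spans is a plain index on span_id; only the "root" sentinel needs special-casing. A sketch, assuming a batch dict loaded as in the reading example further down:

from collections import defaultdict

def marks_by_span(batch):
    """Index marks by owning span ID."""
    index = defaultdict(list)
    for mark in batch["marks"]:
        index[mark["span_id"]].append(mark)
    return index

index = marks_by_span(batch)
for span in batch["spans"]:
    for mark in index.get(span["id"], []):
        print(span["name"], mark["name"], mark["value"])
orphans = index.get("root", [])   # marks emitted outside any profiled scope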

snapshots[]

{
  "id": "hex32",
  "span_id": "hex32",
  "tensor_name": "layer1.0.conv1.weight",
  "shape": [64, 3, 7, 7],
  "dtype": "float32",
  "mode": "stats",
  "stats": {
    "mean": 0.0,
    "std": 0.0,
    "min": 0.0,
    "max": 0.0,
    "norm": 0.0,
    "histogram": {
      "bins":   ["... 17 floats ..."],
      "counts": ["... 16 ints ..."]
    }
  },
  "blob_uri": null,
  "ts_ns": 0,
  "attrs": {}
}
mode values:
  • "stats": inline statistics only. blob_uri is null. Default.
  • "sampled": stats + a safetensors blob on random() < sample_rate epoch boundaries. Records that lose the roll stay mode="stats" with blob_uri=null.
  • "full": stats + blob every epoch. Debug-only; not recommended for 100M+ parameter models.
Sampled and full write one safetensors file per (span, kind): ./.cirron/snapshots/<span_id>/weights.safetensors for weights and gradients.safetensors for gradients. Every record for that span shares the same blob_uri; tensor_name is used verbatim as the key inside the container, so consumers can call container[record["tensor_name"]] directly with no sanitization. Gradient records use tensor_name = "<param>.grad" (e.g. "layer1.0.conv1.weight.grad") and only appear when the gradient was non-None at capture time.
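
For example, pulling every captured gradient for one span out of its shared containers; a sketch, assuming blob_uri values are file:// URIs as written by the local spool and that records is the list of snapshot dicts for a single span:

from safetensors import safe_open

def load_gradients(records):
    """Load all gradient snapshots for one span, grouped by blob."""
    grads = {}
    by_uri = {}
    for rec in records:
        if rec["blob_uri"] and rec["tensor_name"].endswith(".grad"):
            by_uri.setdefault(rec["blob_uri"], []).append(rec)
    for uri, recs in by_uri.items():
        with safe_open(uri.removeprefix("file://"), framework="pt") as f:
            for rec in recs:
                # tensor_name is the verbatim container key; no sanitization
                grads[rec["tensor_name"]] = f.get_tensor(rec["tensor_name"])
    return grads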

Canonical scope shape

cirron.session
  epoch[n]
    step[n]
      data_load
      forward
      backward
      optimizer_step
Epoch spans are siblings under the session, never nested. When multiple framework hooks coexist (e.g. HF Trainer over a PyTorch DataLoader), only the highest-priority hook owns epoch and step (transformers > tensorflow > torch); the others yield on those names so no semantic scope is duplicated. Operations executed before the training loop runs (warmup forwards, sanity checks, optimizer construction) have parent_id == session_id rather than an epoch: no epoch exists yet, so this is correct behavior, not a bug. For inference, the top-level scope per call is request instead of epoch.
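
A consumer that only wants per-epoch work can filter on the parent link directly. A sketch over the span dicts above (looking the session span up by the name cirron.session is an assumption for illustration):

def split_top_level(spans):
    """Separate epoch scopes from pre-loop work directly under the session."""
    session = next(s for s in spans if s["name"] == "cirron.session")
    top = [s for s in spans if s["parent_id"] == session["id"]]
    epochs  = [s for s in top if s["name"] == "epoch"]
    preloop = [s for s in top if s["name"] not in ("epoch", "request")]
    return epochs, preloop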

Reading the spool

import json
from pathlib import Path
from safetensors import safe_open

# Zero-padded created_ns prefixes make lexicographic sort chronological.
for batch_file in sorted(Path("./.cirron/spool").glob("*.json")):
    batch = json.loads(batch_file.read_text())

    for span in batch["spans"]:
        # wall-clock duration in nanoseconds
        print(span["name"], span["end_ns"] - span["start_ns"])

    for snap in batch["snapshots"]:
        if snap["blob_uri"] is None:
            continue  # stats-only record; nothing on disk
        path = snap["blob_uri"].removeprefix("file://")
        with safe_open(path, framework="pt") as f:
            # tensor_name is the verbatim key inside the container
            tensor = f.get_tensor(snap["tensor_name"])

Forward compatibility

Readers must tolerate unknown top-level keys and unknown per-span / per-mark fields, so minor SDK bumps can add optional metadata. Removing or renaming existing fields, or changing their types, requires a schema_version bump and follows SemVer.
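
In practice that means gating on schema_version, reading known fields (with defaults where sensible), and never enumerating keys. A sketch of a tolerant reader:

def read_batch(raw):
    """Reject only major schema changes; silently ignore unknown fields."""
    if raw.get("schema_version") != 1:
        raise ValueError(f"unsupported schema_version: {raw.get('schema_version')}")
    out = []
    for s in raw.get("spans", []):
        out.append({
            "id": s["id"],
            "name": s["name"],
            "duration_ns": s["end_ns"] - s["start_ns"],
            "rank": s.get("rank", 0),   # default rather than KeyError
        })
    return out                           # unknown keys were simply never read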

Wire format: POST /v1/traces

When the HTTP transport is active (external runs with an API key), the SDK batches spans / marks / snapshots into the same JSON shape documented above and sends it to POST /v1/traces on the Cirron platform API. The full request looks like this:
POST /v1/traces
  Authorization: Bearer <api_key>
  Content-Type: application/json
  Content-Encoding: gzip
  X-Cirron-SDK-Version: 0.x.y

  {
    "schema_version": 1,
    "sdk_version": "0.x.y",
    "batch_id": "abcdef...",
    "created_ns": 1234567890000000000,
    "spans": [ ... ],
    "marks": [ ... ],
    "snapshots": [ ... ]       # metadata only; blobs upload separately
  }
Successful submissions return 202 Accepted with the batch ID. Submission is idempotent by batch_id (24-hour server-side dedupe window), so retrying the same batch after a timeout is safe. Rate-limited responses return 429 with a Retry-After header, which the SDK respects via exponential backoff. For self-hosted installs, this is the full wire contract: a custom ingestion worker that accepts the above payload is sufficient to consume SDK traffic.
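
That makes a minimal custom producer small as well. A sketch using the requests library (the base URL and retry constants are illustrative assumptions):

import gzip
import json
import time

import requests

def post_batch(batch, api_key, base_url="https://api.cirron.com"):
    body = gzip.compress(json.dumps(batch).encode("utf-8"))
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "Content-Encoding": "gzip",
        "X-Cirron-SDK-Version": batch["sdk_version"],
    }
    backoff = 1.0
    for _ in range(5):
        resp = requests.post(f"{base_url}/v1/traces", data=body,
                             headers=headers, timeout=30)
        if resp.status_code == 202:
            return                        # accepted; dedupe makes retries safe
        if resp.status_code == 429:
            # honor Retry-After when present, else back off exponentially
            time.sleep(float(resp.headers.get("Retry-After", backoff)))
            backoff *= 2
            continue
        resp.raise_for_status()
    raise RuntimeError("giving up after repeated 429s")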

Platform wire schemas

After ingestion, traces land in these tables. Field names are camelCase (Prisma conventions); the SDK sends snake_case and the ingestion worker maps it.
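
The rename is mostly mechanical camel-casing, plus a small table for fields whose names differ outright (parent_id -> parentSpanId is the one visible in the schemas below). A sketch of that step; per-table logic such as splitting mark values into typed columns is elided:

RENAMES = {"parent_id": "parentSpanId"}

def to_camel(name):
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)

def map_fields(record):
    """snake_case SDK keys -> camelCase platform columns."""
    return {to_camel(RENAMES.get(k, k)): v for k, v in record.items()}

map_fields({"parent_id": None, "start_ns": 0, "memory_peak_bytes": None})
# -> {"parentSpanId": None, "startNs": 0, "memoryPeakBytes": None}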

TraceSpan

Field             Type      Notes
id                string    cuid
traceId           string    Root scope ID for the process session
parentSpanId      string?   null for root
name              string    Scope name (epoch, step, forward, request, …)
index             int?      Scope index (epoch number, batch number)
attrs             json?     Arbitrary user attributes
startNs           bigint    Wall time, nanoseconds
endNs             bigint    Wall time, nanoseconds
cpuNs             bigint?   CPU time
gpuNs             bigint?   GPU time; null when CUDA unavailable
memoryPeakBytes   bigint?   RSS peak during span
threadId          bigint?
rank              int       Distributed-training rank (default 0)
workspaceId       string    Resource link
pipelineId        string?   Resource link
runId             string    Resource link
deploymentId      string?   Resource link (inference)
modelId           string?   Resource link
Indexes: (workspaceId, runId, startNs), (workspaceId, pipelineId, startNs), (workspaceId, deploymentId, startNs), (traceId, parentSpanId).
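
The (workspaceId, runId, startNs) index lines up with the most common access path: all spans for a run in time order. A query sketch, assuming direct Postgres access and Prisma's default naming (model name as table name, quoted camelCase columns); both are assumptions about a specific deployment:

import psycopg2

conn = psycopg2.connect("dbname=cirron")
cur = conn.cursor()
cur.execute(
    """
    SELECT name, ("endNs" - "startNs") / 1e9 AS seconds
    FROM "TraceSpan"
    WHERE "workspaceId" = %s AND "runId" = %s
    ORDER BY "startNs"
    """,
    (workspace_id, run_id),
)
for name, seconds in cur.fetchall():
    print(name, seconds)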

TraceMark

Field         Type      Notes
id            string    cuid
spanId        string    Owning span
name          string    Mark name (loss, grad_norm, …)
valueType     string    "float" | "int" | "string" | "bool"
valueFloat    float?    Populated when valueType="float"
valueInt      bigint?   Populated when valueType="int"
valueString   string?   256-byte cap
valueBool     bool?
attrs         json?
tsNs          bigint    Wall time
kind          string    "point" (default) | "summary"

TraceSnapshot

Field        Type      Notes
id           string    cuid
spanId       string    Owning span (typically an epoch)
tensorName   string    e.g. "layer1.0.conv1.weight"
shape        json      e.g. [768, 3072]
dtype        string    e.g. "float32"
mode         string    "stats" | "sampled" | "full"
stats        json?     {mean, std, min, max, norm, histogram} for stats-bearing records
blobUri      string?   S3 URI for sampled / full; null for pure stats records

Snapshot object-storage layout

s3://<bucket>/traces/<workspace_id>/<run_id>/<span_id>/<snapshot_id>.<ext>
Self-hosted deployments point at MinIO or on-prem S3-compatible storage using the same path scheme.