@ci.inference wraps a serving function with profiling. Each call opens a request scope and attributes latency and cost to the deployment record the platform already has. For LLMs, it also automatically captures token counts, time-to-first-token, and tokens/second.
The basics
- Opens a request scope with an auto-generated request ID.
- Invokes your function.
- Closes the scope. Per-request latency, scope tree, and marks are attributed to the deployment.
Per-request isolation
Every request gets its own scope tree via contextvars.ContextVar.
Concurrent requests never contaminate each other’s scopes or marks,
regardless of whether the runtime uses threads, asyncio, or both.
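The mechanism is ordinary contextvars behavior; a standalone illustration (not SDK code) of why concurrent requests cannot see each other's state:

```python
import asyncio
import contextvars

# Each asyncio task copies the context at creation, so a value set in
# one request's handler is invisible to every other in-flight request.
current_scope: contextvars.ContextVar[str] = contextvars.ContextVar("current_scope")

async def handle(request_id: str) -> str:
    current_scope.set(f"scope-{request_id}")
    await asyncio.sleep(0)       # yield to other concurrent requests
    return current_scope.get()   # still this request's own value

async def main() -> None:
    print(await asyncio.gather(*(handle(str(i)) for i in range(3))))
    # -> ['scope-0', 'scope-1', 'scope-2']

asyncio.run(main())
```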
Config-driven capture
Pass a config= dict to toggle optional capture logic at runtime,
without redeploying code.
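For example (a hedged sketch; the flag names are illustrative, not documented options):

```python
import cirron as ci  # assumed import alias

# Hypothetical toggles passed to the decorator; check your SDK version
# for the capture options it actually supports.
@ci.inference(config={"capture_llm_metrics": True, "sample_rate": 0.1})
def handle(payload: dict) -> dict:
    ...
```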
ci.env() reads from the deployment’s environment variables. On the
dashboard’s deployment config panel, edit the CONFIG env var (or
whichever key you chose) and hit apply. The platform triggers a rolling
restart of the deployment’s containers with the new value, and the
next call to ci.env("CONFIG") returns it.
See ci.env in Configuration for the
JSON auto-parsing rules.
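Putting the two together, a sketch of sourcing the config dict from the deployment's CONFIG env var (assuming ci.env returns the JSON-parsed value, per the Configuration page):

```python
import cirron as ci  # assumed import alias

# If CONFIG is set on the dashboard to '{"capture_llm_metrics": true}',
# ci.env("CONFIG") yields the parsed dict after the rolling restart.
@ci.inference(config=ci.env("CONFIG"))
def handle(payload: dict) -> dict:
    ...
```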
Automatic LLM detection
When the wrapped function calls an OpenAI-compatible client or HuggingFace generate, the SDK captures LLM-shaped metrics with no extra code:
- OpenAI-compatible responses: if the return value has usage.prompt_tokens/usage.completion_tokens, they’re marked on the request scope.
- HuggingFace generate: input_ids length and output length are captured.
- Streaming responses: the time between scope open and first yield is marked as time-to-first-token; tokens/second is computed across the stream.
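A sketch of the first case, assuming the ci import alias; the model name is illustrative:

```python
import cirron as ci          # assumed import alias
from openai import OpenAI    # any OpenAI-compatible client works the same way

client = OpenAI()

@ci.inference
def complete(prompt: str):
    # The returned response carries usage.prompt_tokens and
    # usage.completion_tokens, so token counts are marked on the
    # request scope without any extra instrumentation.
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
```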
Detection is wrapped in a try/except; if it fails, your function still returns normally.
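The pattern, not the SDK's actual internals (scope.mark is a stand-in name used only for illustration):

```python
def _mark_llm_metrics(scope, result):
    """Best-effort extraction: detection failures never affect the caller."""
    try:
        usage = getattr(result, "usage", None)
        if usage is not None:
            scope.mark("prompt_tokens", usage.prompt_tokens)          # hypothetical API
            scope.mark("completion_tokens", usage.completion_tokens)  # hypothetical API
    except Exception:
        pass  # swallow detection errors; the wrapped function's return value is untouched
```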
Lifecycle and deployment context
When the SDK is running inside a Cirron deployment, ci.profile() is
typically called at module import time, before the serving framework
starts accepting traffic. The deployment’s runtime injects the
CIRRON_DEPLOYMENT_ID, CIRRON_WORKSPACE_ID, and any
CIRRON_SECRET_* env vars your function reads via ci.secret().
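A sketch of reading that context inside a deployed module (the secret name is illustrative; see Configuration for how ci.secret maps onto CIRRON_SECRET_* variables):

```python
import os
import cirron as ci  # assumed import alias

# Injected by the deployment runtime before the serving framework starts.
deployment_id = os.environ.get("CIRRON_DEPLOYMENT_ID")
workspace_id = os.environ.get("CIRRON_WORKSPACE_ID")

# Credentials come through ci.secret(); the key name here is hypothetical.
api_key = ci.secret("OPENAI_API_KEY")
```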
Running standalone (no deployment record), @ci.inference still
produces local traces: the request scope lands at
./.cirron/spool/ like any other scope, just without deployment
attribution.
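A sketch of that standalone path (assumed ci import alias):

```python
import cirron as ci  # assumed import alias

@ci.inference
def handle(payload: dict) -> dict:
    return {"ok": True}

if __name__ == "__main__":
    # No deployment record: the request scope is still spooled locally
    # under ./.cirron/spool/, just without deployment attribution.
    handle({"text": "hello"})
```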
Next
Configuration
ci.env, ci.secret, and the Cirron class: what you’ll use to source config and credentials in a deployment.
Profiling
Training-side instrumentation if your deployment also trains or fine-tunes inline.