Decorator that wraps a serving function with profiling. Each call opens a
request scope with per-request `ContextVar` isolation and attributes
latency and cost to the deployment record the platform already has.
For LLMs, it also automatically captures token counts,
time-to-first-token, and tokens/second.
Signature
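A minimal sketch of the signature, inferred from the parameter table below; the bare/parameterized dual form and the return annotation are assumptions:

```python
from typing import Callable, Optional

def inference(
    fn: Optional[Callable] = None,    # set implicitly when used as bare @ci.inference
    *,
    config: Optional[dict] = None,    # runtime feature flags read via config.get()
) -> Callable:
    ...
```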
Parameters
| Name | Type | Default | Purpose |
|---|---|---|---|
| `fn` | `Callable?` | `None` | Implicit: set when used as bare `@ci.inference` |
| `config` | `dict?` | `None` | Runtime feature flags the wrapped function reads via `config.get()` |
Behavior
On each call the decorator:
- Allocates an auto-generated request ID (UUID4).
- Opens a `request` scope bound to that ID via `contextvars.ContextVar`.
- Invokes the wrapped function.
- Closes the scope. Per-request latency, the nested scope tree, and any marks emitted during the call attribute to the deployment.
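In plain code, the per-call behavior is roughly equivalent to the sketch below; names are illustrative, and the real decorator binds the request ID to the scope internally via `ContextVar`:

```python
import uuid
import ci

def serve(payload: dict) -> dict:       # stand-in for the wrapped function
    return payload

def handler(payload: dict) -> dict:
    request_id = uuid.uuid4()            # 1. auto-generated request ID
    with ci.scope("request"):            # 2. request scope (the real decorator binds
        result = serve(payload)          #    request_id to it via ContextVar)
        return result                    # 3. invoke the wrapped function
    # 4. on exit the scope closes; latency, nested scopes, and marks
    #    attribute to the deployment
```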
Concurrency
Per-request isolation via `ContextVar` means concurrent requests never
contaminate each other’s scopes or marks, regardless of whether the
runtime uses threads, asyncio, or both.
Works with:
- FastAPI / Starlette (async)
- Flask (threaded)
- ASGI servers directly
- Plain function calls in synchronous code
Examples
Basic
Inside an `@ci.inference` function, the full scope and mark surface is
available: `ci.scope` opens nested spans inside the auto-generated
request scope, and `ci.mark` attaches per-request values.
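A minimal sketch; the function name, payload shape, and the stand-in model logic are illustrative:

```python
import ci

@ci.inference
def serve(payload: dict) -> dict:
    with ci.scope("preprocess"):               # nested span inside the request scope
        text = payload.get("text", "")

    with ci.scope("model"):
        score = float(len(text))               # stand-in for a real model call

    ci.mark("score", score)                    # per-request value
    return {"score": score}
```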
Async
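A sketch assuming the decorator wraps coroutine functions the same way (the Concurrency section above covers asyncio); helper logic is illustrative:

```python
import asyncio
import ci

@ci.inference
async def serve(payload: dict) -> dict:
    with ci.scope("preprocess"):
        text = payload.get("text", "")

    with ci.scope("model"):
        await asyncio.sleep(0)                 # stand-in for an async model call
        score = float(len(text))

    ci.mark("score", score)
    return {"score": score}

async def main():
    # Concurrent calls stay isolated: each gets its own request scope.
    await asyncio.gather(serve({"text": "a"}), serve({"text": "bb"}))

asyncio.run(main())
```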
Config-driven capture
Set `CONFIG` on the deployment config panel in the dashboard and hit
apply; the platform triggers a rolling restart, and the next call to
`ci.env("CONFIG")` returns the new value.
FastAPI
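A sketch of the FastAPI pattern; the route path, app wiring, and stand-in model logic are illustrative:

```python
import ci
from fastapi import FastAPI

app = FastAPI()

@ci.inference
async def run_inference(payload: dict) -> dict:
    with ci.scope("model"):
        score = float(len(payload.get("text", "")))   # stand-in for a real model
    ci.mark("score", score)
    return {"score": score}

@app.post("/predict")
async def predict(payload: dict):
    # Each HTTP request gets its own request scope from the decorator.
    return await run_inference(payload)
```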
LLM auto-detection
Wrapped calls that hit an OpenAI-compatible client or HuggingFace `generate()` are detected automatically:
- OpenAI-shaped responses: if the return value has `usage.prompt_tokens`/`usage.completion_tokens` (the shape the `openai>=1.0` Python client returns), they're marked on the request scope.
- HuggingFace `generate`: calls into `transformers.GenerationMixin.generate` are detected; `input_ids` length and output length are marked.
- Streaming responses: when the wrapped function returns an iterator or async iterator of chunks, the time between scope open and first yield is marked as time-to-first-token; tokens/second is computed across the stream.
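A sketch of a call shape the detector recognizes, assuming the `openai>=1.0` client; the model name is illustrative:

```python
import ci
from openai import OpenAI

client = OpenAI()

@ci.inference
def chat(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",                     # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    # resp.usage.prompt_tokens / resp.usage.completion_tokens are the
    # OpenAI-shaped fields that get marked on the request scope.
    return resp.choices[0].message.content
```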
Auto-detection covers `openai>=1.0` clients and `transformers.generate`.
Custom streaming wrappers, other LLM SDKs, and hand-rolled
SSE/WebSocket clients may not be detected; fall back to explicit
`ci.mark("tokens", n)` calls when the auto-detection doesn't fire.
Detection is best-effort and wrapped in try/except; if it fails,
the wrapped function still returns normally.
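When auto-detection doesn't fire, explicit marks still work. A hedged sketch for a custom streaming client; the stream source and every mark name other than `"tokens"` are invented:

```python
import time
import ci

def fake_stream(prompt: str):
    # Stand-in for a custom streaming client that auto-detection misses.
    for word in prompt.split():
        yield word

@ci.inference
def stream_reply(prompt: str):
    start = time.monotonic()
    n_tokens = 0
    for i, token in enumerate(fake_stream(prompt)):
        if i == 0:
            ci.mark("ttft_s", time.monotonic() - start)   # invented mark name
        n_tokens += 1
        yield token
    ci.mark("tokens", n_tokens)                           # explicit count, per the docs
    ci.mark("tokens_per_s", n_tokens / max(time.monotonic() - start, 1e-9))  # invented
```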
Standalone use
Without a deployment record (running outside a Cirron deployment),
`@ci.inference` still produces local traces: the request scope
lands at `./.cirron/spool/` like any other scope, just without
deployment attribution.
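A minimal standalone sketch; the spool path comes from the paragraph above, everything else is illustrative:

```python
import ci

@ci.inference
def echo(payload: dict) -> dict:
    return payload

if __name__ == "__main__":
    echo({"msg": "hello"})   # writes a local request scope under ./.cirron/spool/
```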
Related
Inference guide
Narrative walk-through including FastAPI and Flask examples.
ci.scope
The `with ci.scope("preprocess"):` blocks the examples use.
ci.mark
Attach per-request values (tokens, scores, latencies).
ci.env
How `CONFIG` flows in from the deployment's env vars.