ci.load() is the single entry point for data access. One function,
flat kwargs, local-first by default. Nothing hits the network unless
you explicitly opt in via source="platform" or a scheme in the
source string.
Signature
A scheme in the name string (s3://, gs://, postgres://, …)
always overrides the source= kwarg. Without a scheme and with
the default source="local", ci.load() probes the local filesystem
and never calls the platform.
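The precedence rule can be sketched in a few lines. This is an illustration only, not the SDK's code: `resolve_backend` is a hypothetical helper, and detecting a scheme by looking for `://` in the name is an assumption.

```python
def resolve_backend(name: str, source: str = "local") -> str:
    """Return the backend a name would resolve to (hypothetical helper)."""
    scheme, sep, _rest = name.partition("://")
    if sep and scheme:
        # A scheme in the name string always wins over source=.
        return scheme.lower()
    # No scheme: fall back to the source= kwarg (default "local").
    return source


print(resolve_backend("s3://bucket/train.parquet", source="platform"))
print(resolve_backend("train.parquet"))
print(resolve_backend("train.parquet", source="platform"))
```

Under these assumptions the first call resolves to `s3` even though `source="platform"` was passed, and the second stays local.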
Where the data comes from
Filtering and selection
match= and ext= work on any filesystem-backed source (local, S3,
GCS, Azure, file://).
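As a rough picture of what this kind of selection does, the sketch below filters a list of paths. It assumes match= behaves like a glob on the file name and ext= like a file extension; this page does not spell out either, and `select_paths` is a hypothetical helper, not part of the SDK.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def select_paths(paths, match=None, ext=None):
    """Keep paths whose file name fits the glob and the extension (hypothetical)."""
    selected = []
    for path in paths:
        p = PurePosixPath(path)
        # Assumption: match= is a case-sensitive glob on the file name.
        if match is not None and not fnmatchcase(p.name, match):
            continue
        # Assumption: ext= is compared with or without a leading dot.
        if ext is not None and p.suffix.lstrip(".") != ext.lstrip("."):
            continue
        selected.append(path)
    return selected


paths = ["logs/2024-01.csv", "logs/2024-02.parquet", "logs/readme.txt"]
print(select_paths(paths, match="2024-*", ext="parquet"))
```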
where= is passed through to SQL sources unescaped: it’s your query,
against your data. Bound result size with LIMIT when you can.
Transforms at load time
Use @ci.map when the transform is vectorizable against pandas /
polars; use plain callables for per-row work.
How the switch is made: the @ci.map decorator sets a
_cirron_batch_map=True attribute on the callable. ci.load()
checks for that attribute: present means the whole frame is passed
in one call, absent means rows are iterated. That’s the entire
mechanism; decorate or don’t.
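The attribute check described above can be mimicked without the SDK. In this sketch `batch_map` stands in for @ci.map and `apply_transform` for the dispatch inside ci.load(); both names are hypothetical, and a list of dicts replaces the real frame.

```python
def batch_map(fn):
    """Stand-in for @ci.map: mark the callable as batch-capable."""
    fn._cirron_batch_map = True
    return fn


def apply_transform(rows, fn):
    """Dispatch on the marker attribute, as the text describes."""
    if getattr(fn, "_cirron_batch_map", False):
        # Marked: hand over the whole batch in one call.
        return fn(rows)
    # Unmarked: call the function once per row.
    return [fn(row) for row in rows]


@batch_map
def add_total(rows):
    # Receives every row at once.
    return [{**r, "total": r["a"] + r["b"]} for r in rows]


def double_a(row):
    # Receives a single row.
    return {**row, "a": row["a"] * 2}


rows = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
print(apply_transform(rows, add_total))
print(apply_transform(rows, double_a))
```

The two functions differ only in what they receive, which is why the decorator alone is enough to switch modes.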
Return types
| as_= | Returns | Requires |
|---|---|---|
| "pandas" | pandas.DataFrame | cirron-sdk[pandas] |
| "polars" | polars.DataFrame or LazyFrame | cirron-sdk[polars] |
| "iter" | Iterator[dict] in batch_size batches | nothing extra |
| "tensor" | torch.Tensor or tf.Tensor (auto-detected) | framework installed |
| "hf" | datasets.Dataset | cirron-sdk[hf] |
If the backend for the default return type is not installed and as_= is not specified,
ci.load() raises CirronDependencyError with an install hint.
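A dependency check of this kind can be approximated with the standard library. The sketch below is not the SDK's implementation: `DependencyError` stands in for CirronDependencyError, the importable module names are assumptions, the hint wording is illustrative, and "tensor" is left out for brevity.

```python
import importlib.util


class DependencyError(ImportError):
    """Stand-in for CirronDependencyError."""


# Extras come from the table above; the module names are assumptions.
BACKENDS = {
    "pandas": ("pandas", "cirron-sdk[pandas]"),
    "polars": ("polars", "cirron-sdk[polars]"),
    "hf": ("datasets", "cirron-sdk[hf]"),
}


def require_backend(as_):
    """Raise with an install hint if the backend module cannot be found."""
    if as_ not in BACKENDS:
        return  # e.g. "iter" needs nothing extra
    module, extra = BACKENDS[as_]
    if importlib.util.find_spec(module) is None:
        raise DependencyError(
            f"as_={as_!r} needs the {module!r} package; install {extra}"
        )


for kind in ("iter", "pandas"):
    try:
        require_backend(kind)
        print(kind, "is available")
    except DependencyError as exc:
        print(exc)
```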
Lazy loading
lazy=True returns a LazyHandle with .collect(). Useful for
large datasets that will be filtered or projected further before
materialization.
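The idea behind a lazy handle is that nothing is read until .collect() is called. The class below is a minimal stand-in for illustration, not the SDK's LazyHandle; `LazyHandleSketch` and `read_rows` are hypothetical names.

```python
class LazyHandleSketch:
    """Hold a loader and run it only when collect() is called."""

    def __init__(self, loader):
        self._loader = loader

    def collect(self):
        # Materialization happens here, not at construction time.
        return self._loader()


calls = []


def read_rows():
    calls.append("read")
    return [{"id": 1}, {"id": 2}]


handle = LazyHandleSketch(read_rows)
print(len(calls))  # 0: creating the handle read nothing
rows = handle.collect()
print(len(calls), rows)
```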
Size guardrails
Before downloading anything, ci.load() sums the matched bytes
across all sources (for multi-source calls) and applies a
three-tier policy on the total:
| Size | Behavior |
|---|---|
| < 1 GB | Silent |
| < 10 GB | WARNING log with narrowing hints (use match=, etc.) |
| ≥ 10 GB | Raises CirronDataSizeError unless confirm_large=True |
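The three tiers in the table can be expressed as a small function. This is a sketch under stated assumptions: 1 GB is taken as 10**9 bytes, `DataSizeError` stands in for CirronDataSizeError, and `check_total_size` is a hypothetical helper. What is logged for a confirmed large load is not specified here, so the sketch simply reports it as confirmed.

```python
import logging

GB = 10**9  # assumption: decimal gigabytes
log = logging.getLogger("size_guard_sketch")


class DataSizeError(Exception):
    """Stand-in for CirronDataSizeError."""


def check_total_size(sizes, confirm_large=False, warn_at=1 * GB, max_bytes=10 * GB):
    """Sum matched bytes across sources and apply the three-tier policy."""
    total = sum(sizes)
    if total >= max_bytes:
        if not confirm_large:
            raise DataSizeError(
                f"{total} bytes matched; pass confirm_large=True to proceed"
            )
        return "confirmed"
    if total >= warn_at:
        log.warning("%d bytes matched; consider narrowing with match=", total)
        return "warn"
    return "silent"


print(check_total_size([400_000_000, 500_000_000]))
print(check_total_size([6 * GB]))
print(check_total_size([8 * GB, 4 * GB], confirm_large=True))
```

Because the policy applies to the sum, two sources of 8 GB and 4 GB cross the top tier together even though neither does alone.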
The limit (load_max_bytes) is set on the Cirron instance.
For SQL sources, use LIMIT to bound results.
Credential resolution for SQL sources
Credentials resolve in this order, first match wins:
- URI inline: postgres://user:pass@host/db
- Platform integrations: GET /api/integrations/resolve with a scoped, short-lived token (requires a configured Cirron integration for that host)
- ci.secret("<scheme>-<host>"): platform-mounted secret
- Driver env var: PGPASSWORD / MYSQL_PWD / SNOWFLAKE_PASSWORD / DATABRICKS_TOKEN
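The first-match-wins order above can be modeled as a short chain of lookups. The sketch is illustrative only: `resolve_password` is a hypothetical helper, the integration and secret lookups are injected as plain callables instead of real platform calls, and the scheme names mapped to each environment variable are assumptions.

```python
import os
from urllib.parse import urlsplit

# Assumption: which URI scheme maps to which driver env var.
ENV_VARS = {
    "postgres": "PGPASSWORD",
    "mysql": "MYSQL_PWD",
    "snowflake": "SNOWFLAKE_PASSWORD",
    "databricks": "DATABRICKS_TOKEN",
}


def resolve_password(uri, integration_lookup=None, secret_lookup=None, environ=None):
    """Return (origin, value) for the first source that yields a credential."""
    environ = os.environ if environ is None else environ
    parts = urlsplit(uri)
    # 1. Inline in the URI.
    if parts.password:
        return "uri", parts.password
    # 2. Platform integration for this host (stubbed as a callable).
    if integration_lookup is not None:
        token = integration_lookup(parts.hostname)
        if token:
            return "integration", token
    # 3. Platform-mounted secret named "<scheme>-<host>" (stubbed as a callable).
    if secret_lookup is not None:
        value = secret_lookup(f"{parts.scheme}-{parts.hostname}")
        if value:
            return "secret", value
    # 4. Driver environment variable.
    var = ENV_VARS.get(parts.scheme)
    if var and environ.get(var):
        return "env", environ[var]
    return None


secrets = {"postgres-host": "from-secret"}
env = {"PGPASSWORD": "from-env"}
print(resolve_password("postgres://user:pass@host/db", environ=env))
print(resolve_password("postgres://host/db", secret_lookup=secrets.get, environ=env))
print(resolve_password("postgres://host/db", environ=env))
```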
Not-yet-shipped
search= / top_k= accept input today for API stability but raise
the stdlib NotImplementedError (not a Cirron-specific exception)
until the platform vector-index feature ships. The docs will update
when it does.
Errors
| Exception | When |
|---|---|
| CirronDependencyError | as_= requires a backend that isn’t installed (pandas, polars, hf) |
| CirronDataSizeError | Matched bytes ≥ load_max_bytes and confirm_large=False |
| CirronDatasetNotFound | source="platform" and the registered name doesn’t exist |
| CirronPlatformRequired | source="platform" but credentials or network are unavailable |
Next
Configuration
The Cirron class, ci.env, ci.secret, and where credentials come from.
ci.load reference
Full signature and parameter table.