Requirements
AI Workloads is a platform by Carbon Labs that automatically meters carbon emissions from AI inference workloads and retires verified carbon credits against them. Users connect their AI provider accounts (OpenAI, Anthropic, OpenRouter), and the system polls usage APIs, converts token consumption into CO2 estimates using empirically grounded methodology, and generates cryptographically signed retirement receipts with full chain of custody.
The platform serves three audiences: individual developers wanting simple offset tooling, AI-native startups needing CSRD-ready compliance artifacts, and inference providers wanting to embed carbon-neutral inference as a premium tier.
The free tier provides standalone value as a usage analytics dashboard (cached/uncached token splits, per-model breakdowns, CSV/JSON export). Paid tiers add automated credit retirement, signed receipts, and audit packs.
Tech stack: Python 3.12+ / FastAPI / SQLAlchemy 2.0 + Alembic / asyncpg on Supabase Postgres / ARQ on Redis / Clerk auth / Pydantic v2 / httpx async / WeasyPrint / ECS Fargate.
1. Provider Connection Management
As a developer, I want to register my AI provider API keys securely so that the platform can poll my usage data without exposing credentials.
- WHEN a user submits an API key for OpenAI, Anthropic, or OpenRouter THEN the system SHALL validate the key against the provider's API before storing it.
- WHEN a key passes validation THEN the system SHALL encrypt it and store the reference in AWS Secrets Manager, persisting only the Secrets Manager ARN in the database.
- IF a key validation fails THEN the system SHALL return a descriptive error and SHALL NOT store the key.
- WHEN a user lists their connections THEN the system SHALL return connection metadata (provider, status, last_polled_at) without ever exposing the API key.
- WHEN a user deletes a connection THEN the system SHALL schedule the Secrets Manager secret for deletion and deactivate the associated workload.
- WHEN a user triggers a manual sync on a connection THEN the system SHALL rate-limit manual syncs to at most 1 per connection per 5 minutes.
- WHEN a connection encounters a permanent auth failure (401/403) THEN the system SHALL set status to "error" with a descriptive message and stop polling until the user re-authenticates.
- WHEN a connection encounters a transient failure (429/5xx) THEN the system SHALL apply exponential backoff (base 30s, max 15min, jitter) and retain "active" status.
- THE system SHALL support multiple provider connections per organization.
- THE system SHALL track connection status as one of: validating, active, error, disabled.
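The transient-failure backoff above (base 30s, max 15min, jitter) can be sketched as follows. This is a minimal illustration, not the mandated implementation: the function name and the full-jitter strategy (a uniform draw between zero and the capped exponential value) are assumptions, since the spec only fixes the base, cap, and presence of jitter.

```python
import random

BASE_DELAY_S = 30        # base 30s per Requirement 1
MAX_DELAY_S = 15 * 60    # cap at 15 minutes

def backoff_delay(attempt: int) -> float:
    """Exponential backoff with full jitter for transient poll failures.

    attempt is 0-based: attempt 0 -> up to 30s, attempt 1 -> up to 60s, ...
    Full jitter spreads retries so many connections failing at the same
    moment (e.g., a provider-wide 429) do not retry in lockstep.
    """
    capped = min(BASE_DELAY_S * (2 ** attempt), MAX_DELAY_S)
    return random.uniform(0, capped)
```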
1B. Workload & Project Assignment
As a developer, I want polled usage data automatically organized into projects and workloads so that I can see per-project emissions without manual tagging.
- WHEN a provider connection is created with an optional project_id THEN the system SHALL create a default workload under that project for the connection.
- WHEN a provider connection is created without a project_id THEN the system SHALL create or reuse a "Default" project for the organization and assign a default workload under it.
- WHEN telemetry events are polled THEN the system SHALL assign them to the default workload for that connection.
- WHEN a user creates a project THEN the system SHALL allow re-mapping existing connections to the new project, which creates a new workload and moves future events.
- THE system SHALL allow a single connection to feed multiple workloads (one per project it's mapped to), with the user choosing which project new events route to.
- WHEN a user views project breakdowns THEN the system SHALL aggregate only the telemetry events belonging to workloads under that project.
- THE system SHALL NOT require manual event tagging; all assignment happens at the connection-to-project mapping level.
2. Telemetry Ingestion
As a developer, I want my AI usage data to be automatically ingested and deduplicated so that I get accurate, complete records without manual effort.
- WHEN the hourly cron fires THEN the system SHALL enqueue a poll job for every connection with status "active".
- WHEN polling a provider THEN the system SHALL use the connection's sync_cursor for incremental data retrieval.
- WHEN telemetry events are ingested THEN the system SHALL generate an idempotency_hash using SHA-256 of provider + org_id + model + bucket_start_hour as the coarse dedup key, and upsert (last-write-wins on token counts) rather than reject on collision.
- WHEN ingesting from Anthropic THEN the system SHALL map the three-way token split: input_tokens -> uncached_input, cache_creation_input_tokens -> cache_creation, cache_read_input_tokens -> cached_input.
- WHEN ingesting from OpenAI or OpenRouter THEN the system SHALL treat all input tokens as uncached (conservative default).
- THE system SHALL record event_timestamp (provider-reported) separately from sync_timestamp (ingestion time).
- WHEN the daily reconciliation job runs (03:00 UTC) THEN the system SHALL re-poll T-24h through T-0 and upsert any revised data.
- THE system SHALL enforce partial immutability on telemetry events via PostgreSQL triggers: the idempotency_hash, event_timestamp, and model fields SHALL be protected from UPDATE, and all rows SHALL be protected from DELETE. Token counts and raw_payload MAY be updated via upsert during reconciliation.
- WHEN a poll returns has_more=true THEN the system SHALL re-enqueue itself for the same connection to drain remaining data.
- WHEN poll results are ingested THEN the system SHALL immediately enqueue emissions calculation jobs for each new/updated event.
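The dedup key and the Anthropic token split above can be sketched as below. The spec names the four hash inputs but not their serialization, so the "|" separator, the bucket format, and the function names are illustrative assumptions:

```python
import hashlib

def idempotency_hash(provider: str, org_id: str, model: str,
                     bucket_start_hour: str) -> str:
    """Coarse dedup key: SHA-256 over provider + org_id + model + hour bucket.

    Colliding rows are upserted (last-write-wins on token counts), not rejected.
    """
    key = f"{provider}|{org_id}|{model}|{bucket_start_hour}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

def map_anthropic_tokens(usage: dict) -> dict:
    """Three-way input-token split for Anthropic usage payloads."""
    return {
        "uncached_input": usage.get("input_tokens", 0),
        "cache_creation": usage.get("cache_creation_input_tokens", 0),
        "cached_input": usage.get("cache_read_input_tokens", 0),
    }
```

For OpenAI and OpenRouter, all input tokens would land in uncached_input per the conservative default above.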
3. Emissions Calculation Engine
As a developer, I want my token usage automatically converted to CO2 estimates with transparent methodology so that I can trust and cite the numbers.
- WHEN a telemetry event is processed THEN the system SHALL calculate emissions using the pipeline: tokens -> energy (joules) -> kWh -> CO2 (kg).
- THE system SHALL separate calculation into prefill energy, decode energy, and cached-read energy phases.
- THE system SHALL apply a PUE multiplier: 1.3x for known hyperscalers (OpenAI, Anthropic, Google), 1.55x for unknown providers.
- THE system SHALL use versioned Carbon Factors stored in the database, with model-to-tier mapping via fnmatch glob patterns (small/medium/large/reasoning tiers).
- WHEN a model is not recognized by any glob pattern THEN the system SHALL fall back to the "medium" tier (conservative default).
- THE system SHALL store the full calculation breakdown (tier, per-phase energy, PUE, grid intensity, CO2, bounds) as JSON alongside computed scalar fields.
- THE system SHALL compute uncertainty bounds using configurable lower/upper multipliers from the active Carbon Factors version.
- THE system SHALL use EPA eGRID2023 U.S. national average (~350 gCO2/kWh) for v1 grid intensity.
- WHEN a new Carbon Factors version is published THEN newly ingested events SHALL use the new version while existing calculations SHALL remain unchanged.
- THE system SHALL expose the full methodology (sources, assumptions, factors version, uncertainty basis) via a public API endpoint and dashboard page.
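The tier mapping and the tail of the calculation pipeline (energy -> kWh -> CO2 with PUE and grid intensity) can be sketched as follows. The glob patterns and the per-phase energy factors are placeholders; in production both come from the versioned Carbon Factors rows:

```python
from fnmatch import fnmatch

# Placeholder patterns; real model-to-tier globs live in the Carbon Factors table.
TIER_PATTERNS = {
    "small": ["*haiku*", "*mini*"],
    "large": ["*opus*", "gpt-4*"],
}

PUE_HYPERSCALER = 1.3   # OpenAI, Anthropic, Google
PUE_UNKNOWN = 1.55      # unknown providers
GRID_G_PER_KWH = 350.0  # EPA eGRID2023 U.S. national average (Requirement 3)

def model_tier(model: str) -> str:
    """Resolve a model name to a tier via fnmatch globs, defaulting to medium."""
    for tier, patterns in TIER_PATTERNS.items():
        if any(fnmatch(model, p) for p in patterns):
            return tier
    return "medium"  # conservative fallback for unrecognized models

def co2_kg(energy_joules: float, pue: float) -> float:
    """Pipeline tail: joules -> kWh -> kg CO2, applying PUE and grid intensity."""
    kwh = (energy_joules * pue) / 3_600_000  # 3.6e6 J per kWh
    return kwh * GRID_G_PER_KWH / 1000       # g -> kg
```

The per-phase split (prefill, decode, cached-read) would feed separate energy_joules terms into co2_kg, each recorded in the stored calculation breakdown.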
4. Credit Retirement & Receipt Engine
As a paying customer, I want cryptographically signed receipts proving that real carbon credits were retired against my emissions so that I can show auditors a full chain of custody.
- WHEN a billing period closes for a paid-tier organization THEN the system SHALL retire credits from internal inventory matching the period's total CO2.
- EACH retirement SHALL reference real credit serial numbers from the credit registry (ACR, Verra, etc.).
- WHEN generating a receipt THEN the system SHALL create a canonical JSON payload, compute SHA-256 hash, and sign with Ed25519 via PyNaCl.
- THE system SHALL assign each receipt a serial number in the format CL-YYYYMM-XXXXX using a PostgreSQL sequence.
- WHEN a receipt is created THEN the system SHALL generate a branded PDF via WeasyPrint + Jinja2 templates and upload to S3.
- THE system SHALL expose a public verification endpoint (no auth) that returns the receipt metadata, signature, and public key so any third party can verify independently.
- THE signing private key SHALL be stored in AWS Secrets Manager with annual rotation and key versioning.
- WHEN verifying a receipt THEN the system SHALL validate the Ed25519 signature against the stored payload_hash and return a boolean valid/invalid result.
- THE system SHALL generate monthly Audit Packs (zip of receipts, calculations, retirement confirmations, methodology version) and upload to S3.
- THE system SHALL NOT generate receipts until the T+48h reconciliation window has closed for the billing period, ensuring late-arriving telemetry is captured before finalizing.
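The receipt payload hashing and serial format above can be sketched with the standard library alone. The canonical-JSON settings (sorted keys, compact separators) are an assumption, since the spec only says "canonical JSON payload"; the Ed25519 signature over the hash (via PyNaCl) is omitted here to keep the sketch dependency-free:

```python
import hashlib
import json

def canonical_payload_hash(payload: dict) -> str:
    """SHA-256 over a canonical JSON rendering: sorted keys, no extra whitespace.

    In production the Ed25519 private key (AWS Secrets Manager, annually
    rotated) signs this hash via PyNaCl; verification needs only the public
    key exposed by the /public verification endpoint.
    """
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def receipt_serial(year: int, month: int, seq: int) -> str:
    """CL-YYYYMM-XXXXX, with seq drawn from a PostgreSQL sequence in production."""
    return f"CL-{year:04d}{month:02d}-{seq:05d}"
```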
5. Billing System
As a user, I want a clear free-to-paid upgrade path so that I can start with free analytics and upgrade when I'm ready to offset.
- THE system SHALL support five tiers: Free ($0), Starter ($29/mo, 10M tokens), Growth ($99/mo, 100M tokens), Scale ($299/mo, 1B tokens), Enterprise (custom).
- WHEN a user signs up THEN the system SHALL create a Clerk organization and default to the Free tier with no payment method required.
- WHEN a user upgrades to a paid tier THEN the system SHALL create a Stripe subscription and require a valid payment method.
- THE system SHALL scope billing periods to calendar months using event_timestamp boundaries (not sync_timestamp).
- WHEN Stripe fires invoice.payment_succeeded THEN the system SHALL transition the billing period to "closing" and enqueue the credit retirement + receipt generation job (respecting the T+48h reconciliation window per Requirement 4.10).
- WHEN Stripe fires invoice.payment_failed THEN the system SHALL set the billing period to "failed" and notify the user.
- IF late-arriving telemetry falls into a closed billing period beyond the T+48h window THEN the system SHALL apply a "prior period adjustment" line item in the next billing period.
- THE Free tier SHALL provide: provider connections, per-model/per-project breakdowns, cached/uncached token splits, CO2 estimation, CSV/JSON export.
- THE Free tier SHALL NOT generate receipts, retire credits, or produce audit packs.
- THE system SHALL handle Stripe webhook signature verification and reject unsigned/invalid payloads with HTTP 400.
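The calendar-month scoping by event_timestamp above can be sketched as follows; the half-open interval convention [start, end) is an assumption the spec does not pin down:

```python
from datetime import datetime, timezone

def billing_period(event_timestamp: datetime) -> tuple[datetime, datetime]:
    """Calendar-month period bounds in UTC, keyed by event_timestamp.

    Returns a half-open interval [start, end) so adjacent months never
    double-count an event on the boundary.
    """
    ts = event_timestamp.astimezone(timezone.utc)
    start = ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if start.month == 12:
        end = start.replace(year=start.year + 1, month=1)
    else:
        end = start.replace(month=start.month + 1)
    return start, end
```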
6. Web Dashboard
As a user, I want a web dashboard that gives me better AI usage analytics than native provider dashboards so that the free tier alone is worth using.
- THE dashboard SHALL be a Next.js 14+ App Router application consuming only the documented FastAPI REST API.
- THE dashboard SHALL contain zero business logic; every number displayed comes from an API response.
- WHEN a user views the organization overview THEN the dashboard SHALL display: total CO2e, connected providers, active projects, and current billing period summary.
- WHEN a user views a project detail page THEN the dashboard SHALL display: per-model breakdown, daily emissions chart, cached vs. uncached token split.
- THE dashboard SHALL provide CSV and JSON export for usage and emissions data.
- THE dashboard SHALL display receipts with view/download (JSON + PDF) and verification status.
- THE dashboard SHALL include an in-app methodology page with source citations and uncertainty range explanations.
- WHEN a free-tier user views offset-related features THEN the dashboard SHALL display an "Upgrade to offset" CTA.
- THE dashboard SHALL use Clerk React SDK for authentication and TanStack Query for data fetching.
- THE dashboard SHALL handle loading, empty, and error states for all data views.
7. Background Job System
As a platform operator, I want reliable background jobs for polling, reconciliation, and billing so that the system runs autonomously without manual intervention.
- THE system SHALL use ARQ (async Redis queue) for all background job processing.
- THE system SHALL define the following cron jobs: hourly provider polling, daily T+24h reconciliation (03:00 UTC), monthly audit pack generation (3rd of month, 06:00 UTC).
- WHEN a job fails with a transient error THEN the system SHALL retry with exponential backoff (max 3 retries for most jobs, max 2 for reconciliation).
- WHEN a job fails permanently THEN the system SHALL log the error with structured context (job_name, connection_id, error_type, traceback) and set appropriate entity status.
- THE system SHALL distinguish between transient failures (429, 503, timeout) and permanent failures (401, 403, 404) in retry logic.
- THE worker configuration SHALL enforce job_timeout=300s default, with reconciliation at 900s.
- WHEN the queue depth exceeds a configurable threshold THEN the system SHALL log a warning for monitoring.
- THE system SHALL expose a /health endpoint that checks: API server, database connectivity, Redis connectivity, and last successful poll timestamp.
- THE system SHALL use structured logging (structlog) with correlation IDs for request tracing across API and background jobs.
- THE system SHALL support horizontal worker scaling by running multiple ARQ worker ECS tasks keyed to queue depth.
8. API Design & Security
As a developer consuming the API, I want a well-documented, secure REST API so that I can integrate confidently.
- THE API SHALL be versioned via URL prefix (/v1/...) with auto-generated OpenAPI 3.1 spec from Pydantic models.
- ALL endpoints (except /public/receipts/verify/{serial_number} and /health) SHALL require a Clerk JWT Bearer token verified via JWKS.
- ALL database queries SHALL be scoped by org_id extracted from the auth dependency; no cross-organization access.
- THE API SHALL use pagination on all list endpoints (page-based with page and page_size parameters; max page_size of 200 for telemetry events, 100 for receipts).
- THE API SHALL enforce rate limiting via slowapi: 100 req/s free tier, 500 req/s paid tiers.
- THE API SHALL use a standard error response format: { "detail": "descriptive error message" } for simple errors, with additional fields (e.g., upgrade_url) where contextually useful.
- THE API SHALL serve over HTTPS with HSTS headers, CORS restricted to the dashboard domain.
- THE API SHALL never log, return, or include API keys in error messages or raw_payload fields.
- THE system SHALL use Pydantic v2 SecretStr for API key input and never serialize keys in responses.
- THE API SHALL expose read-only Carbon Factors endpoints (current, by version, list all versions).
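The pagination rules above can be sketched as a small normalizer; the resource keys and the fallback cap of 100 for resources the spec does not enumerate are assumptions:

```python
# Per-resource page_size caps from Requirement 8; 100 is an assumed default
# for resources the spec does not call out explicitly.
MAX_PAGE_SIZE = {"telemetry_events": 200, "receipts": 100}

def clamp_page_params(resource: str, page: int, page_size: int) -> tuple[int, int]:
    """Normalize page-based pagination inputs to valid, capped values."""
    page = max(page, 1)
    cap = MAX_PAGE_SIZE.get(resource, 100)
    page_size = min(max(page_size, 1), cap)
    return page, page_size
```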
9. Data Model & Persistence
As a platform operator, I want a well-structured, auditable database schema so that data integrity is maintained and migrations are safe.
- THE system SHALL use SQLAlchemy 2.0 async ORM with asyncpg driver on Supabase-managed PostgreSQL.
- THE system SHALL use Alembic for versioned, reviewable database migrations.
- THE system SHALL enforce partial telemetry_events immutability via PostgreSQL BEFORE UPDATE triggers (protecting idempotency_hash, event_timestamp, model) and BEFORE DELETE triggers. Token counts and raw_payload are updatable for reconciliation upserts.
- THE system SHALL use PostgreSQL JSONB for flexible fields (raw_payload on TelemetryEvent, credit_serial_numbers on CarbonReceipt).
- THE system SHALL use PostgreSQL ARRAY(String) for model_patterns on CarbonFactors.
- THE system SHALL define composite indexes on (workload_id, event_timestamp) and (event_timestamp) for telemetry query performance.
- THE system SHALL define unique constraints on: (org_id, name) for projects, (org_id, period_start) for billing periods, idempotency_hash for telemetry events, serial_number for receipts.
- THE system SHALL use UUID primary keys generated at the application layer.
- THE system SHALL store all timestamps as timezone-aware UTC using DateTime(timezone=True).
- THE system SHALL define a data retention strategy: hot data (current + 2 months in PostgreSQL), warm data (3-24 months archived to S3 Parquet), cold data (24+ months in Glacier).
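The retention strategy above can be sketched as a tiering function. Treating the 24-month boundary as cold (the spec's "3-24 months" warm and "24+ months" cold ranges overlap at 24) is a judgment call of this sketch:

```python
def retention_tier(age_months: int) -> str:
    """Map event age to a storage tier per the Requirement 9 retention strategy."""
    if age_months <= 2:
        return "hot"   # PostgreSQL: current + 2 months
    if age_months < 24:
        return "warm"  # S3 Parquet archive
    return "cold"      # Glacier
```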
10. Infrastructure & Deployment
As a platform operator, I want a reproducible, scalable deployment so that the system handles growth without manual scaling.
- THE system SHALL deploy on AWS ECS Fargate with three service types: API (min 2 tasks), Worker (min 1, auto-scales on queue depth), Dashboard (min 2 tasks).
- THE system SHALL use an Application Load Balancer with TLS termination for the API.
- THE system SHALL maintain three environments: dev (local Docker Compose), staging (ECS + test Stripe), prod (ECS + live Stripe).
- THE system SHALL provide a Dockerfile with WeasyPrint system dependencies (Pango, Cairo, GDK-Pixbuf).
- THE system SHALL provide a docker-compose.yml for local development with API, Redis, and PostgreSQL services.
- THE system SHALL store PDFs and Audit Pack zips in S3 with appropriate bucket policies.
- THE system SHALL use ElastiCache (Redis) for ARQ job queue in staging/prod.
- THE system SHALL use pydantic-settings for environment variable management with .env file support for local dev.
11. Phase 2 - SDK, MCP & CLI Ecosystem
As a developer, I want SDK and CLI tools so that I can integrate carbon tracking into my workflows without using the dashboard.
- WHEN Phase 1 acceptance criteria are met and OpenAPI spec is published THEN Phase 2 development SHALL begin.
- THE system SHALL provide a TypeScript SDK (@carbonlabs/sdk) with types auto-generated from the OpenAPI spec.
- THE system SHALL provide a Python SDK (carbonlabs on PyPI) as a thin async wrapper over the REST API.
- THE system SHALL provide an MCP server with tools: estimate_emissions, get_summary, list_receipts, verify_receipt, check_status.
- THE system SHALL provide a CLI tool with commands: auth login, connections add, status, receipts list, receipts verify, export.
- ALL SDK/MCP/CLI tools SHALL consume only the documented REST API; no direct database access.
- THE SDK SHALL add less than 50ms of overhead to API call latency.