How it works

CostMyAI Methodology Center

CostMyAI estimates AI infrastructure costs using published provider pricing, empirically derived token models, and use-case workload assumptions. When the Analyzer evaluates a model switch, every verdict is grounded in a named public benchmark. Where no such benchmark exists, we refuse to make a claim.

The three-band verdict system

VERIFIED: Verified savings: The alternative model scores at least 5 points above the threshold on a named public benchmark that matches your task type. We name the benchmark, show the dollar saving, and certify it in the headline.
EQUIVALENT: Within margin: The saving is real, but the quality evidence falls short of certification. Either the quality gap is narrow (within 5 points), which is below the confidence threshold, or benchmark data exists for a related task type but not this exact one. Shown with an amber flag. Not in the certified total.
REFUSED: No claim made: No benchmark data for this workload type. We say so plainly and exclude the saving from the certified total. That refusal is the product working.
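
The band rules above can be sketched in a few lines of Python. This is an illustrative sketch, not CostMyAI's actual implementation: the names Evidence, classify, and certified_total are hypothetical, the threshold is assumed to be supplied per workload, and the handling of a model that falls more than 5 points short of the threshold is an assumption, since the band definitions do not cover that case.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional, Tuple

MARGIN_POINTS = 5.0  # the 5-point margin named in the band definitions


class Verdict(Enum):
    VERIFIED = "Verified savings"
    EQUIVALENT = "Within margin"
    REFUSED = "No claim made"


@dataclass
class Evidence:
    """Hypothetical record of the benchmark evidence for one model switch."""
    benchmark: str          # named public benchmark, e.g. "GPQA"
    score: float            # alternative model's score on that benchmark
    threshold: float        # quality bar for the workload (assumed to be given)
    exact_task_match: bool  # False when the benchmark covers only a related task type


def classify(evidence: Optional[Evidence]) -> Verdict:
    # No benchmark data for this workload type: make no claim.
    if evidence is None:
        return Verdict.REFUSED
    margin = evidence.score - evidence.threshold
    if margin < -MARGIN_POINTS:
        # Assumption: the source does not define this case, so the sketch
        # conservatively declines to make a claim.
        return Verdict.REFUSED
    if evidence.exact_task_match and margin >= MARGIN_POINTS:
        return Verdict.VERIFIED
    # Narrow gap, or data only for a related task type.
    return Verdict.EQUIVALENT


def certified_total(switches: Iterable[Tuple[float, Optional[Evidence]]]) -> float:
    """Sum dollar savings for VERIFIED switches only; other bands are excluded."""
    return sum(saving for saving, ev in switches if classify(ev) is Verdict.VERIFIED)
```

Keeping the certified total as a separate function mirrors the rule that only VERIFIED savings reach the headline figure, while EQUIVALENT and REFUSED savings are shown or withheld without being counted.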

Benchmark taxonomy by task type

Coding: AA Coding Index (Artificial Analysis, real-world coding) - High confidence
Agentic / tool use: TAU2 (tool-use and agent task completion) - High confidence
Reasoning: GPQA (graduate-level science reasoning) - High confidence
Extraction / classification: IFBench (instruction-following accuracy) - High confidence
RAG / long-context retrieval: LCR (long-context reasoning accuracy) - Medium confidence
General generation / chat: General intelligence index (composite) - Low confidence
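
One way to picture the taxonomy above is as a lookup table from task type to benchmark and confidence level. The sketch below is illustrative only: the dictionary keys, the BenchmarkEntry type, and the benchmark_for helper are hypothetical names, not part of CostMyAI. A task type with no entry returns None, which is the situation that leads to a REFUSED verdict.

```python
from typing import NamedTuple, Optional


class BenchmarkEntry(NamedTuple):
    benchmark: str
    confidence: str  # "high", "medium", or "low"


# Keys are hypothetical short labels for the task types listed above.
TAXONOMY = {
    "coding": BenchmarkEntry("AA Coding Index", "high"),
    "agentic": BenchmarkEntry("TAU2", "high"),
    "reasoning": BenchmarkEntry("GPQA", "high"),
    "extraction": BenchmarkEntry("IFBench", "high"),
    "rag": BenchmarkEntry("LCR", "medium"),
    "general": BenchmarkEntry("General intelligence index", "low"),
}


def benchmark_for(task_type: str) -> Optional[BenchmarkEntry]:
    """Return the benchmark for a task type, or None when none is mapped."""
    return TAXONOMY.get(task_type.strip().lower())
```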

How token costs are calculated

Most LLM APIs bill by the token (one token is roughly 0.75 English words). CostMyAI multiplies token counts by each model's published per-million-token price, applied separately to input and output tokens. Output tokens typically cost 2-5x as much as input tokens.
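
The arithmetic can be sketched as follows. This is a minimal illustration, not CostMyAI's code: the function names are hypothetical, and the prices in the example ($3 per million input tokens, $15 per million output tokens) are made-up figures, not any provider's list price.

```python
WORDS_PER_TOKEN = 0.75  # rough rule of thumb from the text above


def words_to_tokens(words: int) -> int:
    """Estimate a token count from a word count."""
    return round(words / WORDS_PER_TOKEN)


def token_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Dollar cost, pricing input and output tokens separately."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Hypothetical workload: 2M input tokens and 500k output tokens per month.
monthly = token_cost(2_000_000, 500_000, 3.0, 15.0)
print(f"${monthly:.2f}")  # $13.50
```

In this made-up example the output side accounts for more than half of the bill despite being a quarter of the input volume, which is why the two prices are applied separately.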

Different providers' tokenizers split the same text into different numbers of tokens, so the recommendation engine applies cross-provider tokenizer inflation factors (5-12% variance). Headline prices always reflect list price, so comparisons stay like-for-like.
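
A rough sketch of how an inflation factor might sit alongside the list-price figure is shown below. Everything here is an assumption made for illustration: the function names, the 8% factor, the prices, and the choice to scale input and output token counts by the same factor are not taken from CostMyAI's engine.

```python
def list_cost(tokens_in: int, tokens_out: int, in_price: float, out_price: float) -> float:
    """Headline cost at list price; prices are dollars per million tokens."""
    return (tokens_in * in_price + tokens_out * out_price) / 1_000_000


def adjusted_cost(
    tokens_in: int,
    tokens_out: int,
    in_price: float,
    out_price: float,
    inflation: float,
) -> float:
    """Cost after scaling token counts by a tokenizer inflation factor.

    Assumption: the same factor applies to input and output tokens.
    An inflation of 0.08 means the target tokenizer is expected to
    produce 8% more tokens for the same text.
    """
    scale = 1.0 + inflation
    return list_cost(round(tokens_in * scale), round(tokens_out * scale), in_price, out_price)


# Hypothetical figures: the headline stays at list price, while the
# adjusted figure is what a recommendation step would compare.
headline = list_cost(1_000_000, 200_000, 2.0, 8.0)
adjusted = adjusted_cost(1_000_000, 200_000, 2.0, 8.0, inflation=0.08)
```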
