How it works
CostMyAI Methodology Center
CostMyAI estimates AI infrastructure costs using published provider pricing, empirically derived token models, and use-case workload assumptions. When the Analyzer evaluates a model switch, every verdict is grounded in a named public benchmark. Where no suitable benchmark exists, we refuse to make a claim.
The three-band verdict system
- VERIFIED: Verified savings
- The alternative model scores at least 5 points above the threshold on a named public benchmark that matches your task type. We name the benchmark, show the dollar saving, and certify it in the headline.
- EQUIVALENT: Within margin
- The saving is real, but the quality evidence falls short of the confidence threshold: either the quality gap is narrow (within 5 points), or benchmark data exists only for a related task type and not this exact one. Shown with an amber flag and excluded from the certified total.
- REFUSED: No claim made
- No benchmark data exists for this workload type. We say so plainly and exclude the saving from the certified total. That refusal is the product working as intended.
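The three bands can be read as a small decision rule. The following is a minimal sketch, not CostMyAI's actual implementation: the names `classify`, `certified_total`, and `exact_task_match` are hypothetical, and the handling of an alternative that trails the threshold by more than 5 points is an assumption, since the text above does not cover that case.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    VERIFIED = "Verified savings"
    EQUIVALENT = "Within margin"
    REFUSED = "No claim made"


MARGIN_POINTS = 5.0  # the 5-point margin stated in the text


def classify(alt_score: Optional[float],
             threshold: Optional[float],
             exact_task_match: bool) -> Verdict:
    """Assign a band to a proposed model switch (illustrative rules only)."""
    # No benchmark data at all: refuse to make a claim.
    if alt_score is None or threshold is None:
        return Verdict.REFUSED
    # Data exists only for a related task type: amber flag.
    if not exact_task_match:
        return Verdict.EQUIVALENT
    margin = alt_score - threshold
    if margin >= MARGIN_POINTS:
        return Verdict.VERIFIED
    if margin >= -MARGIN_POINTS:
        return Verdict.EQUIVALENT
    # ASSUMPTION: the text does not say what happens when the alternative
    # trails by more than the margin; this sketch makes no claim there.
    return Verdict.REFUSED


def certified_total(results: list[tuple[Verdict, float]]) -> float:
    """Sum dollar savings for VERIFIED verdicts only."""
    return sum(saving for verdict, saving in results
               if verdict is Verdict.VERIFIED)
```

Only VERIFIED savings reach the total here, which mirrors the rule that EQUIVALENT and REFUSED savings stay out of the certified figure.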
Benchmark taxonomy by task type
- Coding: AA Coding Index (Artificial Analysis, real-world coding) - High confidence
- Agentic / tool use: TAU2 (tool-use and agent task completion) - High confidence
- Reasoning: GPQA (graduate-level science reasoning) - High confidence
- Extraction / classification: IFBench (instruction-following accuracy) - High confidence
- RAG / long-context retrieval: LCR (long-context reasoning accuracy) - Medium confidence
- General generation / chat: General intelligence index (composite) - Low confidence
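The taxonomy above is essentially a lookup from task type to benchmark and confidence level. A minimal sketch follows, assuming hypothetical short keys such as `"coding"` and `"rag"`; the benchmark names and confidence levels are taken from the list, while the key names and the `benchmark_for` helper are illustrative only.

```python
from typing import NamedTuple, Optional


class BenchmarkRef(NamedTuple):
    benchmark: str
    confidence: str  # "high", "medium", or "low"


# Keys are hypothetical identifiers; values come from the taxonomy list.
TAXONOMY = {
    "coding": BenchmarkRef("AA Coding Index", "high"),
    "agentic": BenchmarkRef("TAU2", "high"),
    "reasoning": BenchmarkRef("GPQA", "high"),
    "extraction": BenchmarkRef("IFBench", "high"),
    "rag": BenchmarkRef("LCR", "medium"),
    "general": BenchmarkRef("General intelligence index", "low"),
}


def benchmark_for(task_type: str) -> Optional[BenchmarkRef]:
    """Return the benchmark for a task type, or None if it is not covered."""
    return TAXONOMY.get(task_type.strip().lower())
```

A `None` result corresponds to a workload type with no benchmark data, which is the situation the REFUSED band describes.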
How token costs are calculated
Most LLM APIs bill by the token, and one token is roughly 0.75 English words. CostMyAI multiplies token counts by each model's published per-million-token price, applied separately to input and output tokens. Output tokens typically cost 2-5x as much as input tokens.
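The arithmetic can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical and the prices ($3.00 and $15.00 per million tokens) are made-up figures, not any provider's actual rates.

```python
WORDS_PER_TOKEN = 0.75          # rule of thumb from the text
TOKENS_PER_MILLION = 1_000_000


def estimate_tokens(word_count: int) -> int:
    """Approximate token count from a word count."""
    return round(word_count / WORDS_PER_TOKEN)


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Price input and output tokens separately, then add them."""
    input_cost = input_tokens / TOKENS_PER_MILLION * input_price_per_m
    output_cost = output_tokens / TOKENS_PER_MILLION * output_price_per_m
    return input_cost + output_cost


# Hypothetical prices: $3.00 per 1M input tokens, $15.00 per 1M output tokens.
cost = request_cost(2_000_000, 500_000, 3.00, 15.00)
print(f"${cost:.2f}")  # prints $13.50
```

In this example the output side accounts for $7.50 of the $13.50 even though it is a quarter of the input volume, which is why input and output are priced separately.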
Different providers' tokenizers split the same text into different numbers of tokens, so the recommendation engine applies cross-provider tokenizer inflation factors (5-12% variance). Headline prices always reflect list price, so comparisons stay like-for-like.
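One way to picture the split between headline and recommendation figures is shown below. This is a sketch under a loud assumption: the text does not say how the factors are applied, so the code assumes a simple multiplicative adjustment, and both `compare_costs` and the 8% factor are hypothetical.

```python
def compare_costs(list_cost: float, inflation_factor: float) -> tuple[float, float]:
    """Return (headline_cost, adjusted_cost) for one model.

    ASSUMPTION: the factor is applied multiplicatively, e.g. 0.08 means the
    target tokenizer produces about 8% more tokens for the same text.
    """
    if inflation_factor < 0:
        raise ValueError("inflation_factor must be non-negative")
    headline = list_cost                            # list price, unadjusted
    adjusted = list_cost * (1.0 + inflation_factor)  # recommendation engine only
    return headline, adjusted


headline, adjusted = compare_costs(100.0, 0.08)  # hypothetical 8% factor
```

The headline value is returned unchanged, matching the statement that headline prices always reflect list price.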