
CostMyAI Methodology Center

CostMyAI estimates AI infrastructure costs using published provider pricing, empirically derived token models, and use-case workload assumptions. When the Analyzer evaluates a model switch, every verdict is either grounded in a named public benchmark or withheld: if no benchmark applies, we refuse to make a claim.

The three-band verdict system

VERIFIED: Verified savings
The alternative model scores at least 5 points above the confidence threshold on a named public benchmark that matches your task type. We name the benchmark, show the dollar saving, and include it in the certified headline total.
EQUIVALENT: Within margin
The saving is real, but the quality gap is narrow (within 5 points) and does not clear the confidence threshold, or benchmark data exists only for a related task type rather than this exact one. Shown with an amber flag and excluded from the certified total.
REFUSED: No claim made
No benchmark data for this workload type. We say so plainly and exclude the saving from the certified total. That refusal is the product working.
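A minimal Python sketch of how the three bands could be assigned, assuming each benchmark has a confidence threshold (the 65.0 used in the examples is hypothetical) and reading "within 5 points" as a score between 5 points below and 5 points above that threshold. `Evidence`, `classify`, `certified_total`, and the `NOT_RECOMMENDED` label are illustrative names, not CostMyAI's implementation; the page does not say how an alternative that trails by more than 5 points is reported, so the sketch gives that case its own hypothetical label.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional, Tuple

# Margin in benchmark points, taken from the "5 points" rule above.
MARGIN = 5.0


class Verdict(Enum):
    VERIFIED = "VERIFIED"
    EQUIVALENT = "EQUIVALENT"
    REFUSED = "REFUSED"
    # Hypothetical label: the page does not describe this case.
    NOT_RECOMMENDED = "NOT_RECOMMENDED"


@dataclass
class Evidence:
    score: Optional[float]   # alternative's benchmark score; None means no data
    threshold: float         # assumed confidence threshold for that benchmark
    exact_task_match: bool   # False if the benchmark covers only a related task type


def classify(ev: Evidence) -> Verdict:
    """Assign a verdict band under the assumptions described in the lead-in."""
    if ev.score is None:
        # No benchmark data for this workload: refuse to make a claim.
        return Verdict.REFUSED
    margin = ev.score - ev.threshold
    if ev.exact_task_match and margin >= MARGIN:
        return Verdict.VERIFIED
    if margin >= -MARGIN:
        # Narrow gap around the threshold, or related-task evidence only.
        return Verdict.EQUIVALENT
    return Verdict.NOT_RECOMMENDED


def certified_total(items: Iterable[Tuple[Verdict, float]]) -> float:
    """Only VERIFIED savings count toward the certified headline total."""
    return sum(saving for verdict, saving in items if verdict is Verdict.VERIFIED)
```

For example, a score of 72.0 against a threshold of 65.0 on an exact-match benchmark is VERIFIED, while the same score on a related-task benchmark is capped at EQUIVALENT. Keeping `certified_total` separate from `classify` means amber and refused savings can still be displayed without ever entering the headline figure.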

Benchmark taxonomy by task type

  • Coding: AA Coding Index (Artificial Analysis, real-world coding) - High confidence
  • Agentic / tool use: TAU2 (tool-use and agent task completion) - High confidence
  • Reasoning: GPQA (graduate-level science reasoning) - High confidence
  • Extraction / classification: IFBench (instruction-following accuracy) - High confidence
  • RAG / long-context retrieval: LCR (long-context reasoning accuracy) - Medium confidence
  • General generation / chat: General intelligence index (composite) - Low confidence
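The taxonomy above can be held as a simple lookup table. This sketch assumes short task-type keys such as `"coding"` and `"rag"` (hypothetical; the page does not specify identifiers) and treats any unmapped workload as having no benchmark, which is the REFUSED case.

```python
from typing import NamedTuple, Optional


class BenchmarkRef(NamedTuple):
    benchmark: str
    measures: str
    confidence: str


# Benchmarks and confidence levels from the list above; the keys are illustrative.
TAXONOMY = {
    "coding": BenchmarkRef("AA Coding Index", "real-world coding (Artificial Analysis)", "high"),
    "agentic": BenchmarkRef("TAU2", "tool-use and agent task completion", "high"),
    "reasoning": BenchmarkRef("GPQA", "graduate-level science reasoning", "high"),
    "extraction": BenchmarkRef("IFBench", "instruction-following accuracy", "high"),
    "rag": BenchmarkRef("LCR", "long-context reasoning accuracy", "medium"),
    "chat": BenchmarkRef("General intelligence index", "composite", "low"),
}


def benchmark_for(task_type: str) -> Optional[BenchmarkRef]:
    """Return the mapped benchmark, or None for an unmapped workload (a REFUSED case)."""
    return TAXONOMY.get(task_type.strip().lower())
```

Returning `None` rather than falling back to the composite index keeps an unmapped workload from silently borrowing the low-confidence "chat" benchmark.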

How token costs are calculated

Most AI APIs bill by the token (roughly 0.75 English words per token). CostMyAI multiplies input and output token counts separately by each model's published per-million-token price and adds the two. Output tokens typically cost 2-5x as much as input tokens.
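The calculation is a weighted sum of the two token counts. In this sketch the function names are hypothetical, and the prices ($3.00 per million input tokens, $15.00 per million output tokens) are illustrative values chosen to show a 5x output premium, not any provider's actual list price.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost: each token count times its own per-million list price."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def estimate_tokens(words: int, words_per_token: float = 0.75) -> int:
    """Rough token estimate from a word count (~0.75 words per token rule of thumb)."""
    return round(words / words_per_token)
```

For a workload of 2,000,000 input and 500,000 output tokens, `request_cost(2_000_000, 500_000, 3.00, 15.00)` gives $6.00 + $7.50 = $13.50: the smaller output volume still accounts for more than half of the bill.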

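Different providers' tokenizers split the same text into different numbers of tokens (typically a 5-12% difference), so the recommendation engine applies cross-provider tokenizer inflation factors when comparing models. Headline prices always reflect list price, so comparisons stay like-for-like.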
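One way to keep the headline at list price while letting recommendations account for tokenizer differences is to compute both figures from the same token counts. This is a sketch under stated assumptions: `inflation` is a hypothetical per-model factor (0.08 means the alternative's tokenizer emits about 8% more tokens for the same text; a negative value means fewer), and it is applied equally to input and output tokens.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    headline_cost: float        # list price applied to the measured token counts
    recommendation_cost: float  # same cost, scaled for the alternative's tokenizer


def quote_alternative(input_tokens: int, output_tokens: int,
                      input_price_per_m: float, output_price_per_m: float,
                      inflation: float) -> Quote:
    """Return list-price and tokenizer-adjusted costs for an alternative model.

    `inflation` is an assumed factor, e.g. 0.08 for ~8% more tokens.
    """
    list_cost = (input_tokens * input_price_per_m
                 + output_tokens * output_price_per_m) / 1_000_000
    return Quote(headline_cost=list_cost,
                 recommendation_cost=list_cost * (1.0 + inflation))
```

With the illustrative prices above, `quote_alternative(2_000_000, 500_000, 3.00, 15.00, 0.08)` reports a headline of $13.50 and a recommendation cost of about $14.58.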
