Pricing

Fireworks AI pricing context

Human-reviewed pricing summary paired with DevTune’s public AI search visibility benchmark.

Reviewed pricing summary

  • Fireworks AI uses a pay-as-you-go model across three main surfaces.
  • Serverless inference is billed per million tokens, starting at $0.10/M for models under 4B parameters; cached input tokens and batch inference are both available at 50% off standard serverless rates.
  • On-demand dedicated GPU deployments are billed per second against hourly list rates: $2.90/hr for A100 80GB, $6.00/hr for H100/H200, and $9.00/hr for B200.
  • Fine-tuning is billed per million training tokens, starting at $0.50/M for models up to 16B parameters, with LoRA fine-tuned models served at base-model inference prices.
  • Audio transcription (Whisper) ranges from $0.0009 to $0.0015 per audio minute.
  • New accounts receive $1 in free starter credits.
  • Enterprise plans with reserved capacity and SLAs require contacting sales.
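The billing units above (per million tokens, per second of GPU time, a flat 50% discount for cached or batch tokens) can be combined into a rough cost estimate. The sketch below is a minimal illustration and assumes that billing is strictly linear in tokens and seconds. It also assumes the entry-level rates listed in this summary; larger models cost more per token and are not covered here. The function names are hypothetical and are not part of any Fireworks API or official calculator.

```python
# Hypothetical cost estimator. Rates are the starting prices from the summary
# above and may be out of date; verify against Fireworks' current pricing page.
SERVERLESS_SMALL_PER_M = 0.10   # USD per 1M tokens, models under 4B parameters
DISCOUNT_FACTOR = 0.5           # cached input tokens / batch inference: 50% off
FINE_TUNE_SMALL_PER_M = 0.50    # USD per 1M training tokens, models up to 16B
GPU_HOURLY = {"A100 80GB": 2.90, "H100/H200": 6.00, "B200": 9.00}  # USD/hour


def serverless_cost(tokens: int, rate_per_m: float = SERVERLESS_SMALL_PER_M,
                    discounted: bool = False) -> float:
    """Serverless inference: tokens are billed per million at rate_per_m."""
    cost = tokens / 1_000_000 * rate_per_m
    # Assumption: the 50% discount applies uniformly to the discounted tokens.
    return cost * DISCOUNT_FACTOR if discounted else cost


def gpu_cost(gpu: str, seconds: int) -> float:
    """Dedicated GPU: hourly list rate prorated to the second."""
    return GPU_HOURLY[gpu] * seconds / 3600


def fine_tune_cost(training_tokens: int,
                   rate_per_m: float = FINE_TUNE_SMALL_PER_M) -> float:
    """Fine-tuning: training tokens billed per million at rate_per_m."""
    return training_tokens / 1_000_000 * rate_per_m


if __name__ == "__main__":
    print(f"20M serverless tokens:     ${serverless_cost(20_000_000):.2f}")
    print(f"20M batch/cached tokens:   ${serverless_cost(20_000_000, discounted=True):.2f}")
    print(f"90 min on one H100/H200:   ${gpu_cost('H100/H200', 90 * 60):.2f}")
    print(f"10M fine-tuning tokens:    ${fine_tune_cost(10_000_000):.2f}")
```

Under these assumptions, 20M serverless tokens on a small model cost about $2.00, or $1.00 with the batch or cache discount. Ninety minutes on one H100/H200 costs about $9.00, and 10M training tokens cost about $5.00. Per-second proration means a short job is not rounded up to a full hour.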

Benchmark context

Ranked #11 of 13 in AI/ML Infrastructure & LLM Tools, with 0.0% AI search visibility.

Sources and verification

Pricing changes often. Treat this page as evaluation context and verify contract terms, usage limits, and add-ons against the vendor’s current materials before making a buying decision.