Pricing
Fireworks AI pricing context
Human-reviewed pricing summary paired with DevTune’s public AI search visibility benchmark.
Reviewed pricing summary
- Fireworks AI uses a pay-as-you-go model across three main surfaces.
- Serverless inference is billed per million tokens, starting at $0.10/M for models under 4B parameters; cached input tokens and batch inference are both available at 50% off standard serverless rates.
- On-demand dedicated GPU deployments are billed per second, at hourly rates of $2.90 for A100 80GB, $6.00 for H100/H200, and $9.00 for B200.
- Fine-tuning is billed per million training tokens, starting at $0.50/M for models up to 16B parameters, with LoRA fine-tuned models served at base-model inference prices.
- Audio transcription (Whisper) is priced at $0.0009 to $0.0015 per audio minute.
- New accounts receive $1 in free starter credits.
- Enterprise plans with reserved capacity and SLAs require contacting sales.
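As a rough illustration of how the rates above combine, here is a minimal Python sketch of a cost estimate. It is not an official calculator. The function names are hypothetical, and it assumes that the $0.10/M starting rate applies equally to input and output tokens, that the 50% cached-input discount is a straight multiplier, and that per-second GPU billing is the hourly rate prorated over 3,600 seconds with no minimum charge. None of these assumptions is confirmed by the summary.

```python
# Rates copied from the summary above; verify against the vendor's current pricing.
SERVERLESS_RATE_PER_M = 0.10  # USD per 1M tokens, starting rate for models under 4B parameters
CACHED_DISCOUNT = 0.50        # cached input tokens are listed at 50% off

# The summary lists H100/H200 together at $6.00/hr; they are split here for lookup.
GPU_HOURLY_RATES = {
    "A100 80GB": 2.90,
    "H100": 6.00,
    "H200": 6.00,
    "B200": 9.00,
}


def serverless_cost(uncached_tokens: int, cached_tokens: int = 0,
                    rate_per_m: float = SERVERLESS_RATE_PER_M) -> float:
    """Estimate serverless spend in USD (assumes one flat per-token rate)."""
    full_price = uncached_tokens / 1_000_000 * rate_per_m
    discounted = cached_tokens / 1_000_000 * rate_per_m * (1 - CACHED_DISCOUNT)
    return full_price + discounted


def gpu_cost(gpu: str, seconds: float) -> float:
    """Estimate dedicated GPU spend in USD (assumes simple per-second proration)."""
    return GPU_HOURLY_RATES[gpu] / 3600 * seconds


if __name__ == "__main__":
    # 2M uncached tokens plus 1M cached tokens on a sub-4B model.
    print(f"serverless: ${serverless_cost(2_000_000, 1_000_000):.2f}")
    # 30 minutes on a single H100.
    print(f"dedicated:  ${gpu_cost('H100', 30 * 60):.2f}")
```

Under these assumptions the first estimate is $0.20 for the uncached tokens plus $0.05 for the cached ones, and the second is half of the $6.00 hourly rate. Larger models, output-token pricing, and enterprise terms would change the figures, so swap in current rates before relying on any estimate.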
Benchmark context
- Rank: #11 of 13 in AI/ML Infrastructure & LLM Tools
- AI search visibility: 0.0%
Sources and verification
Pricing changes often. Treat this page as evaluation context and verify contract terms, usage limits, and add-ons against the vendor’s current materials before making a buying decision.