Pricing

Fireworks AI pricing context

Human-reviewed pricing summary paired with DevTune’s public AI search visibility benchmark.

Fireworks AI uses a pay-as-you-go model across three main surfaces: serverless inference, on-demand dedicated GPU deployments, and fine-tuning. Serverless inference is billed per million tokens, starting at $0.10/M for models under 4B parameters; cached input tokens and batch inference are each billed at 50% off the standard serverless rate. On-demand dedicated GPU deployments are metered per second at hourly rates of $2.90 for A100 80GB, $6.00 for H100/H200, and $9.00 for B200. Fine-tuning is billed per million training tokens, starting at $0.50/M for models up to 16B parameters, and LoRA fine-tuned models are served at base-model inference prices. Audio transcription (Whisper) costs $0.0009 to $0.0015 per audio minute. New accounts receive $1 in free starter credits. Enterprise plans with reserved capacity and SLAs require contacting sales.
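To make the arithmetic concrete, here is a rough cost-estimation sketch in Python that uses only the starting rates quoted above. The function names and the dictionary keys are illustrative and not part of any Fireworks AI SDK. The sketch also assumes a single flat per-token rate for input and output and does not model whether the cached-input and batch discounts stack, since the summary does not say. Actual per-model rates may differ.

```python
# Starting rates copied from the summary above (USD). These are assumptions
# for illustration; check the current price list before budgeting.
SERVERLESS_USD_PER_M_TOKENS = 0.10   # models under 4B parameters
DISCOUNT_FACTOR = 0.50               # cached input tokens or batch inference: 50% off
FINE_TUNE_USD_PER_M_TOKENS = 0.50    # models up to 16B parameters
GPU_USD_PER_HOUR = {                 # hypothetical keys, rates from the summary
    "A100-80GB": 2.90,
    "H100": 6.00,
    "H200": 6.00,
    "B200": 9.00,
}


def serverless_cost(full_price_tokens, discounted_tokens=0,
                    rate_per_m=SERVERLESS_USD_PER_M_TOKENS):
    """Estimate serverless cost in USD.

    full_price_tokens: tokens billed at the standard rate.
    discounted_tokens: tokens billed at 50% off (cached input or batch).
    """
    full = full_price_tokens / 1_000_000 * rate_per_m
    discounted = discounted_tokens / 1_000_000 * rate_per_m * DISCOUNT_FACTOR
    return full + discounted


def gpu_cost(gpu, seconds):
    """Estimate on-demand GPU cost in USD, metered per second at the hourly rate."""
    return GPU_USD_PER_HOUR[gpu] / 3600 * seconds


def fine_tune_cost(training_tokens, rate_per_m=FINE_TUNE_USD_PER_M_TOKENS):
    """Estimate fine-tuning cost in USD, billed per million training tokens."""
    return training_tokens / 1_000_000 * rate_per_m


if __name__ == "__main__":
    print(f"serverless: ${serverless_cost(10_000_000, 2_000_000):.2f}")
    print(f"H100, 90 min: ${gpu_cost('H100', 90 * 60):.2f}")
    print(f"fine-tune, 20M tokens: ${fine_tune_cost(20_000_000):.2f}")
```

Under these rates, 90 minutes on an H100 comes to $9.00 and 20 million training tokens come to $10.00, so the $1 in starter credits covers about 10 million full-price serverless tokens on the smallest model tier.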