Fireworks AI pricing context

Human-reviewed pricing summary paired with DevTune’s public AI search visibility benchmark.

Reviewed pricing summary

Fireworks AI uses a pay-as-you-go model across three main surfaces.
Serverless inference is billed per million tokens, starting at $0.10/M for models under 4B parameters; cached input tokens and batch inference are both available at 50% off standard serverless rates.
On-demand dedicated GPU deployments are billed per second, at hourly rates of $2.90 for A100 80GB, $6.00 for H100/H200, and $9.00 for B200.
Fine-tuning is billed per million training tokens, starting at $0.50/M for models up to 16B parameters, with LoRA fine-tuned models served at base-model inference prices.
Audio transcription (Whisper) is priced at $0.0009 to $0.0015 per audio minute.
New accounts receive $1 in free starter credits.
Enterprise plans with reserved capacity and SLAs require contacting sales.
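
To make the arithmetic behind these rates concrete, here is a minimal Python sketch of a cost estimate. The rates are copied from the summary above and may be out of date. The function names are hypothetical, and the sketch assumes a single per-token rate for input and output, that the 50% cached-input discount applies to that same rate, and that per-second GPU billing prorates the hourly rate linearly. None of this is confirmed vendor billing logic.

```python
from decimal import Decimal

# Rates copied from the summary above; verify against the vendor's current pricing.
SERVERLESS_RATE_PER_M = Decimal("0.10")  # USD per 1M tokens, models under 4B parameters
CACHED_DISCOUNT = Decimal("0.5")         # cached input tokens billed at 50% off (assumed on same rate)
GPU_HOURLY_RATES = {                     # USD per hour, billed per second
    "A100 80GB": Decimal("2.90"),
    "H100": Decimal("6.00"),
    "H200": Decimal("6.00"),
    "B200": Decimal("9.00"),
}


def serverless_cost(total_tokens, cached_input_tokens=0,
                    rate_per_m=SERVERLESS_RATE_PER_M):
    """Estimate serverless cost in USD (hypothetical helper).

    cached_input_tokens is the part of total_tokens billed at the discount.
    """
    if not 0 <= cached_input_tokens <= total_tokens:
        raise ValueError("cached_input_tokens must be between 0 and total_tokens")
    full_price_tokens = total_tokens - cached_input_tokens
    billable = (Decimal(full_price_tokens)
                + Decimal(cached_input_tokens) * CACHED_DISCOUNT)
    return billable * rate_per_m / Decimal(1_000_000)


def dedicated_gpu_cost(gpu, seconds):
    """Estimate on-demand GPU cost in USD, prorating the hourly rate per second."""
    return GPU_HOURLY_RATES[gpu] * Decimal(seconds) / Decimal(3600)
```

Under these assumptions, `dedicated_gpu_cost("A100 80GB", 1800)` comes to $1.45 for thirty minutes, and `serverless_cost(10_000_000, cached_input_tokens=4_000_000)` comes to $0.80. `Decimal` is used instead of floats so that fractions of a cent are not lost to rounding.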

Benchmark context

#11 of 13 in AI/ML Infrastructure & LLM Tools

AI search visibility: 0.0%

Sources and verification

Pricing changes often. Treat this page as evaluation context and verify contract terms, usage limits, and add-ons against the vendor’s current materials before making a buying decision.

Sources: fireworks.ai, docs.fireworks.ai, businesswire.com, sacra.com