Pricing

Fireworks AI pricing context

Human-reviewed pricing summary paired with DevTune’s public AI search visibility benchmark.

Reviewed pricing summary

  • Fireworks AI uses a usage-based, pay-as-you-go model with no required subscription.
  • Serverless inference starts at $0.10/1M tokens for models under 4B parameters, $0.20/1M for 4B–16B, $0.90/1M for models over 16B, and model-specific rates for frontier models (e.g., DeepSeek V3 family at $0.56 input/$1.68 output per 1M tokens).
  • Batch inference is priced at 50% of serverless rates; cached input tokens are billed at 50% of the standard rate.
  • On-demand GPU deployments are billed per second at hourly-equivalent rates: H100 and H200 at $7/hr, B200 at $10/hr, and B300 at $12/hr.
  • Fine-tuning via LoRA SFT starts at $0.50/1M training tokens for models up to 16B parameters; full-parameter SFT from $1.00/1M.
  • Reinforcement fine-tuning is billed at the same per-GPU-second rate as on-demand deployment.
  • New accounts receive $1 in free starter credits.
  • Enterprise pricing is available via direct contract.
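The tiered rates and discounts above can be turned into a rough cost estimate. The sketch below is illustrative only: the rates and the 50% batch/cache discounts are taken from this summary, while the function names, tier labels, and structure are assumptions, not a vendor API.

```python
# Illustrative cost estimator using the rates quoted in the summary above.
# Tier names and function names are hypothetical; verify current rates with the vendor.

SERVERLESS_RATES = {  # USD per 1M tokens, by model size tier
    "under_4b": 0.10,
    "4b_to_16b": 0.20,
    "over_16b": 0.90,
}

def serverless_cost(tokens: int, tier: str, *, batch: bool = False,
                    cached_fraction: float = 0.0) -> float:
    """Estimate serverless inference cost in USD.

    batch: batch inference is priced at 50% of serverless rates.
    cached_fraction: share of tokens served from cache, billed at 50%.
    """
    rate = SERVERLESS_RATES[tier] / 1_000_000  # USD per token
    cached = tokens * cached_fraction
    uncached = tokens - cached
    cost = uncached * rate + cached * rate * 0.5
    if batch:
        cost *= 0.5
    return cost

def on_demand_cost(seconds: float, hourly_rate: float) -> float:
    """Per-second GPU billing at an hourly-equivalent rate."""
    return seconds * hourly_rate / 3600

# 10M tokens on a 7B-class model, no caching: 10 x $0.20 = $2.00
print(round(serverless_cost(10_000_000, "4b_to_16b"), 2))   # 2.0
# 90 minutes on an H100 at $7/hr: 1.5 x $7 = $10.50
print(round(on_demand_cost(90 * 60, 7.0), 2))               # 10.5
```

For example, running the same 10M-token job as a batch workload would halve the estimate to $1.00, per the 50% batch discount noted above.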

Benchmark context

Ranked #8 of 10 in LLM Inference & Serverless GPU

AI search visibility: 0.0%

Sources and verification

Pricing changes often. Treat this page as evaluation context and verify contract terms, usage limits, and add-ons against the vendor’s current materials before making a buying decision.