Pricing

Together AI pricing context

Human-reviewed pricing summary paired with DevTune’s public AI search visibility benchmark.

Reviewed pricing summary

  • Together AI uses a pay-as-you-go model with three primary pricing tiers.
  • Serverless inference is charged per million tokens, with separate input and output rates varying by model; prices range from approximately $0.05 to $7.00 per million tokens.
  • Batch inference is priced at 50% of real-time API rates for most models, with support for up to 30B enqueued tokens.
  • Fine-tuning is billed per million tokens processed during training, varying by model size and method (LoRA vs. full fine-tuning, SFT vs. DPO).
  • GPU clusters are available on a pay-as-you-go hourly basis (approximately $3.49/hr for H100, $4.19/hr for H200, $7.49/hr for B200) or as reserved capacity with commitment discounts for periods over 6 days.
  • Dedicated model inference endpoints are billed per minute of usage.
  • Managed storage and sandbox environments carry additional fees.
  • Enterprise and AI Factory deployments require custom pricing via sales.
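Since nearly everything above bills per million tokens or per hour, the arithmetic can be sketched in a few lines. The rates used here are illustrative placeholders only, not quoted prices; the 50% batch discount mirrors the bullet above, and real per-model rates must come from the vendor's current price sheet.

```python
# Hedged sketch of pay-as-you-go cost arithmetic.
# All rates are hypothetical examples, not Together AI's actual prices.

def serverless_cost(input_tokens: int, output_tokens: int,
                    input_rate: float, output_rate: float) -> float:
    """Cost in USD; rates are USD per million tokens (input and output billed separately)."""
    return (input_tokens / 1e6) * input_rate + (output_tokens / 1e6) * output_rate

def batch_cost(input_tokens: int, output_tokens: int,
               input_rate: float, output_rate: float,
               discount: float = 0.5) -> float:
    """Batch inference priced as a fraction of real-time rates (50% for most models)."""
    return discount * serverless_cost(input_tokens, output_tokens,
                                      input_rate, output_rate)

def gpu_cluster_cost(hours: float, hourly_rate: float) -> float:
    """Pay-as-you-go GPU cluster cost at an hourly rate."""
    return hours * hourly_rate

# Example: 10M input + 2M output tokens at a hypothetical $0.88 per million each.
rt = serverless_cost(10_000_000, 2_000_000, 0.88, 0.88)
bt = batch_cost(10_000_000, 2_000_000, 0.88, 0.88)
print(f"real-time: ${rt:.2f}, batch: ${bt:.2f}")  # batch is half of real-time
```

The same per-unit pattern extends to the other line items (per-minute dedicated endpoints, hourly GPU rates); only the unit and rate change.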

Benchmark context

Together AI ranks #2 of 10 in the LLM Inference & Serverless GPU category, with 6.7% AI search visibility.

Sources and verification

Pricing changes often. Treat this page as evaluation context and verify contract terms, usage limits, and add-ons against the vendor’s current materials before making a buying decision.