Pricing

Together AI pricing context

Human-reviewed pricing summary paired with DevTune’s public AI search visibility benchmark.

Reviewed pricing summary

  • Together AI uses pay-per-use pricing across all product lines.
  • Serverless inference is billed per million tokens: small models start at approximately $0.03–$0.10/1M input tokens; large reasoning models such as DeepSeek R1 cost up to $3.00/1M input and $7.00/1M output tokens.
  • Batch inference is priced at up to 50% below serverless rates.
  • Fine-tuning starts at $0.48/1M tokens for LoRA SFT on models up to 16B parameters and rises to model-specific rates for specialized models (e.g., $10/1M tokens for DeepSeek-class models).
  • GPU clusters are available on-demand (H100 at $3.49/hr, H200 at $4.19/hr, B200 at $7.49/hr), with reserved discounts for commitments of one week or longer (e.g., H100 from $2.55/hr on a 4–6 month commitment).
  • Dedicated inference instances start at $3.99/hr (H100).
  • Sandbox compute is billed at $0.0446/vCPU/hr and $0.0149/GiB RAM/hr.
  • Pricing for the enterprise tier, AI Factory, and GB200/GB300 clusters requires direct sales contact.
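
As a rough illustration of how the listed rates combine, the sketch below estimates costs for a made-up workload. The rates are copied from the summary above and may be out of date. The function names, token counts, and instance sizes are hypothetical and are not part of any Together AI API. The batch figure assumes the full 50% discount, which the summary states only as an upper bound.

```python
# Rates copied from the reviewed summary (USD); verify before relying on them.
R1_INPUT_PER_M = 3.00        # DeepSeek R1, per 1M input tokens (quoted as "up to")
R1_OUTPUT_PER_M = 7.00       # DeepSeek R1, per 1M output tokens (quoted as "up to")
BATCH_MAX_DISCOUNT = 0.50    # batch is "up to 50%" cheaper, so this is a best case
H100_ON_DEMAND_PER_HR = 3.49
SANDBOX_VCPU_PER_HR = 0.0446
SANDBOX_GIB_PER_HR = 0.0149


def token_cost(input_tokens, output_tokens, input_rate, output_rate, discount=0.0):
    """Cost of a token-billed workload; rates are per 1M tokens."""
    base = (input_tokens / 1e6) * input_rate + (output_tokens / 1e6) * output_rate
    return base * (1.0 - discount)


def gpu_cost(hours, rate_per_hr, gpus=1):
    """Cost of running `gpus` GPUs for `hours` at an hourly per-GPU rate."""
    return hours * rate_per_hr * gpus


def sandbox_cost(hours, vcpus, ram_gib):
    """Sandbox cost: vCPU-hours plus GiB-hours of RAM."""
    return hours * (vcpus * SANDBOX_VCPU_PER_HR + ram_gib * SANDBOX_GIB_PER_HR)


if __name__ == "__main__":
    # Hypothetical workload: 10M input tokens and 2M output tokens.
    serverless = token_cost(10_000_000, 2_000_000, R1_INPUT_PER_M, R1_OUTPUT_PER_M)
    batch = token_cost(10_000_000, 2_000_000, R1_INPUT_PER_M, R1_OUTPUT_PER_M,
                       discount=BATCH_MAX_DISCOUNT)
    print(f"serverless: ${serverless:.2f}")
    print(f"batch (best case): ${batch:.2f}")
    print(f"8x H100 for 24h: ${gpu_cost(24, H100_ON_DEMAND_PER_HR, gpus=8):.2f}")
    print(f"sandbox, 2 vCPU / 4 GiB for 100h: ${sandbox_cost(100, 2, 4):.2f}")
```

At these rates the example workload comes to about $44 on serverless and no less than $22 in batch. Actual batch savings depend on the model and may be smaller than the 50% ceiling.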

Benchmark context

Rank: #10 of 13 in AI/ML Infrastructure & LLM Tools

AI search visibility: 3.2%

Sources and verification

Pricing changes often. Treat this page as evaluation context and verify contract terms, usage limits, and add-ons against the vendor’s current materials before making a buying decision.