Pricing

Together AI pricing context

Human-reviewed pricing summary paired with DevTune’s public AI search visibility benchmark.

Reviewed pricing summary

  • Serverless inference is token-based and varies by model: e.g., Llama 3.3 70B at $0.88/$0.88 per 1M input/output tokens; DeepSeek-R1-0528 at $3.00/$7.00; gpt-oss-120B at $0.15/$0.60.
  • Batch inference available at approximately 50% discount on most serverless models.
  • Dedicated model inference: $3.99/hr (1x H100), $5.49/hr (1x H200), $9.95/hr (1x B200).
  • GPU cluster on-demand pricing: H100 at $3.49/hr, H200 at $4.19/hr, B200 at $7.49/hr per GPU; reserved pricing discounts up to ~27% for 4-6 month commitments.
  • Together Sandbox Code Interpreter: $0.03/session (60 min); VM compute at $0.0446/vCPU/hr and $0.0149/GiB RAM/hr.
  • Managed Storage: $0.16/GiB/month.
  • Fine-tuning: LoRA from $0.48/1M tokens (models up to 16B).
  • A free-tier credit ($100 at signup, up to $50,000 via Startup Accelerator) is available with no credit card required.

Benchmark context

#6

of 10 in AI Code Sandboxes & Agent Runtimes

3.2%

AI search visibility

Sources and verification

Pricing changes often. Treat this page as evaluation context and verify contract terms, usage limits, and add-ons against the vendor’s current materials before making a buying decision.