Pricing
Together AI pricing context
Human-reviewed pricing summary paired with DevTune’s public AI search visibility benchmark.
Reviewed pricing summary
- Serverless inference is token-based and varies by model: e.g., Llama 3.3 70B at $0.88/$0.88 per 1M input/output tokens; DeepSeek-R1-0528 at $3.00/$7.00; gpt-oss-120B at $0.15/$0.60.
- Batch inference available at approximately 50% discount on most serverless models.
- Dedicated model inference: $3.99/hr (1x H100), $5.49/hr (1x H200), $9.95/hr (1x B200).
- GPU cluster on-demand pricing: H100 at $3.49/hr, H200 at $4.19/hr, B200 at $7.49/hr per GPU; reserved pricing discounts up to ~27% for 4-6 month commitments.
- Together Sandbox Code Interpreter: $0.03/session (60 min); VM compute at $0.0446/vCPU/hr and $0.0149/GiB RAM/hr.
- Managed Storage: $0.16/GiB/month.
- Fine-tuning: LoRA from $0.48/1M tokens (models up to 16B).
- A free-tier credit ($100 at signup, up to $50,000 via Startup Accelerator) is available with no credit card required.
Benchmark context
#6
of 10 in AI Code Sandboxes & Agent Runtimes
3.2%
AI search visibility
Sources and verification
Pricing changes often. Treat this page as evaluation context and verify contract terms, usage limits, and add-ons against the vendor’s current materials before making a buying decision.