Pricing

Together AI pricing context

Human-reviewed pricing summary paired with DevTune’s public AI search visibility benchmark.

Serverless inference is token-based and varies by model: e.g., Llama 3.3 70B at $0.88/$0.88 per 1M input/output tokens; DeepSeek-R1-0528 at $3.00/$7.00; gpt-oss-120B at $0.15/$0.60. Batch inference available at approximately 50% discount on most serverless models. Dedicated model inference: $3.99/hr (1x H100), $5.49/hr (1x H200), $9.95/hr (1x B200). GPU cluster on-demand pricing: H100 at $3.49/hr, H200 at $4.19/hr, B200 at $7.49/hr per GPU; reserved pricing discounts up to ~27% for 4-6 month commitments. Together Sandbox Code Interpreter: $0.03/session (60 min); VM compute at $0.0446/vCPU/hr and $0.0149/GiB RAM/hr. Managed Storage: $0.16/GiB/month. Fine-tuning: LoRA from $0.48/1M tokens (models up to 16B). A free-tier credit ($100 at signup, up to $50,000 via Startup Accelerator) is available with no credit card required.

together.ai together.ai together.ai together.ai together.ai docs.together.ai