Pricing
Together AI pricing context
A human-reviewed pricing summary, paired with DevTune’s public AI search visibility benchmark.
Together AI uses pay-per-use pricing across all product lines.

- Serverless inference: billed per million tokens. Small models start at roughly $0.03–$0.10/1M input tokens; large reasoning models such as DeepSeek R1 run up to $3.00/1M input and $7.00/1M output tokens.
- Batch inference: priced up to 50% below serverless rates.
- Fine-tuning: from $0.48/1M tokens for LoRA SFT on models up to 16B parameters, scaling up to specialized model pricing (e.g., $10/1M for DeepSeek-class models).
- GPU clusters: on-demand at $3.49/hr (H100), $4.19/hr (H200), and $7.49/hr (B200), with reserved discounts for commitments of one week or longer (e.g., H100 from $2.55/hr on 4–6 month terms).
- Dedicated inference instances: from $3.99/hr (H100).
- Sandbox compute: $0.0446/vCPU/hr and $0.0149/GiB RAM/hr.
- Enterprise tier, AI Factory, and GB200/GB300 clusters: pricing requires direct sales contact.
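To make the per-million-token billing concrete, here is a minimal sketch of how a request cost works out. The rates used are the DeepSeek R1-class figures quoted above; the function name and exact rates are illustrative, not an official Together AI calculator, and actual prices vary by model and may change.

```python
def serverless_cost(input_tokens: int, output_tokens: int,
                    input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Dollar cost for a request billed per million tokens.

    Rates are hypothetical inputs; check the current price list for real values.
    """
    return ((input_tokens / 1_000_000) * input_rate_per_m
            + (output_tokens / 1_000_000) * output_rate_per_m)

# Example: 250k input + 50k output tokens at the R1-class rates above
# ($3.00/1M input, $7.00/1M output):
cost = serverless_cost(250_000, 50_000, 3.00, 7.00)
print(f"${cost:.2f}")  # → $1.10
```

The same arithmetic applies to batch jobs, just with the discounted rates substituted in.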