Pricing

Replicate pricing context

Human-reviewed pricing summary paired with DevTune’s public AI search visibility benchmark.

Reviewed pricing summary

  • Replicate uses pure pay-as-you-go billing with no free tier.
  • Public models are billed by the second based on GPU hardware: Nvidia T4 at $0.000225/sec ($0.81/hr), L40S at $0.000975/sec ($3.51/hr), A100 80GB at $0.001400/sec ($5.04/hr), and H100 at $0.001525/sec ($5.49/hr).
  • Multi-GPU configurations up to 8×H100 are available via committed-spend contracts.
  • Some models use per-output pricing (e.g., FLUX Schnell at $3.00/1,000 images; FLUX Dev at $0.025/image).
  • Language models use per-token rates (e.g., DeepSeek-R1 at $3.75 per million input tokens).
  • Private custom models run on dedicated hardware and are billed for idle time as well as active time.
  • Enterprise plans add a dedicated account manager, priority support, higher GPU limits, performance SLAs, and volume discounts.
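
As a rough illustration of how the three billing modes above translate into cost, the sketch below converts the listed per-second GPU rates to hourly rates and estimates per-run, per-image, and per-token charges. The function names and the example workloads are hypothetical, the rates are copied from this summary and may be out of date, and the sketch ignores idle-time charges, minimums, and discounts.

```python
from decimal import Decimal

# Per-second GPU rates in USD, copied from the summary above.
# Assumption: these may no longer match the vendor's current pricing.
GPU_RATE_PER_SEC = {
    "T4": Decimal("0.000225"),
    "L40S": Decimal("0.000975"),
    "A100 80GB": Decimal("0.001400"),
    "H100": Decimal("0.001525"),
}


def hourly_rate(gpu: str) -> Decimal:
    """Convert a per-second rate to a per-hour rate (3,600 seconds)."""
    return GPU_RATE_PER_SEC[gpu] * 3600


def run_cost(gpu: str, seconds: float) -> Decimal:
    """Estimate the cost of one run billed by the second."""
    return GPU_RATE_PER_SEC[gpu] * Decimal(str(seconds))


def per_image_cost(price_per_thousand: Decimal) -> Decimal:
    """Convert a price per 1,000 images to a price per image."""
    return price_per_thousand / 1000


def token_cost(tokens: int, rate_per_million: Decimal) -> Decimal:
    """Estimate the cost of a token count at a per-million-token rate."""
    return rate_per_million * tokens / 1_000_000


if __name__ == "__main__":
    for gpu in GPU_RATE_PER_SEC:
        print(f"{gpu}: ${hourly_rate(gpu):.2f}/hr")
    # Hypothetical workloads, for illustration only.
    print("90 s on H100:", run_cost("H100", 90))
    print("FLUX Schnell per image:", per_image_cost(Decimal("3.00")))
    print("200,000 input tokens at $3.75/M:", token_cost(200_000, Decimal("3.75")))
```

Decimal arithmetic is used here so that the hourly figures reproduce the listed values exactly rather than picking up floating-point rounding. At these rates, FLUX Schnell works out to $0.003 per image, roughly one eighth of the listed FLUX Dev price of $0.025 per image.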

Benchmark context

Replicate ranks #10 of 10 in LLM Inference & Serverless GPU, with 0.0% AI search visibility.

Sources and verification

Pricing changes often. Treat this page as evaluation context and verify contract terms, usage limits, and add-ons against the vendor’s current materials before making a buying decision.