
Live model performance metrics accessible via AI Gateway

AI Gateway adds hourly-updated P50 latency and throughput metrics across hundreds of models and providers, sortable in the UI and accessible via the REST API.

Jan 26 · 14:00:00 · primary · 1 source · updated Jan 26 · 14:00:00

AI Gateway now displays throughput and latency metrics across hundreds of models, helping you choose the right model based on live performance data. Metrics appear in three places and are updated every hour.

The AI Gateway model list now includes sortable columns for latency and throughput. Each row displays the best P50 metrics (lowest latency, highest throughput) for that model across all of its available providers, based on live AI Gateway customer requests. Sort by throughput to find the fastest token generation, or by latency to find the models with the quickest time-to-first-token.
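The per-model "best P50" row can be sketched in Python. This is a minimal sketch under stated assumptions: the `ProviderMetrics` and `ModelRow` records, their field names, and the helpers `best_per_model` and `sort_rows` are hypothetical and do not reflect AI Gateway's actual REST API schema. The sketch reads "lowest latency, highest throughput" as two independent choices, so the two values in one row may come from different providers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderMetrics:
    """Hypothetical record: one provider's hourly P50 values for one model."""
    model: str
    provider: str
    p50_latency_ms: float      # time to first token; lower is better
    p50_throughput_tps: float  # tokens per second; higher is better


@dataclass(frozen=True)
class ModelRow:
    """Hypothetical model-list row: best P50 values across all providers."""
    model: str
    best_latency_ms: float
    best_throughput_tps: float


def best_per_model(records: list[ProviderMetrics]) -> list[ModelRow]:
    """Collapse provider records into one row per model.

    Lowest latency and highest throughput are chosen independently, so the
    two values in a row may come from different providers.
    """
    latency: dict[str, float] = {}
    throughput: dict[str, float] = {}
    for r in records:
        latency[r.model] = min(latency.get(r.model, float("inf")), r.p50_latency_ms)
        throughput[r.model] = max(throughput.get(r.model, float("-inf")), r.p50_throughput_tps)
    return [ModelRow(m, latency[m], throughput[m]) for m in latency]


def sort_rows(rows: list[ModelRow], by: str) -> list[ModelRow]:
    """Sort like the list columns: throughput descending, latency ascending."""
    if by == "throughput":
        return sorted(rows, key=lambda r: r.best_throughput_tps, reverse=True)
    if by == "latency":
        return sorted(rows, key=lambda r: r.best_latency_ms)
    raise ValueError(f"unknown sort key: {by!r}")
```

Because each column is sorted on its own best value, a model can rank near the top for throughput while sitting lower for latency, which is why the list offers both sort orders.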

On the individual model pages, you can see P50 latency and throughput for each provider that has recorded usage. This helps you compare provider performance for the same model and choose the best option for your use case. To access these pages, click any model in the model list for a detailed breakdown across all the providers that carry that model in AI Gateway. Metrics are refreshed hourly and appear only for providers with sufficient traffic. Here is an example for openai/gpt-oss-120b. Similar to the overall model list, you can sort by latency…
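A per-provider breakdown like the one on the model pages can be sketched as follows, assuming raw per-request samples of (provider, time-to-first-token in ms, tokens per second) for one model over one hour, with P50 taken as the median. The `MIN_SAMPLES` cutoff is a hypothetical stand-in for "sufficient traffic", since no threshold is stated; the function names are also illustrative and not part of AI Gateway.

```python
import statistics
from collections import defaultdict

# Hypothetical cutoff standing in for "sufficient traffic"; no real value is implied.
MIN_SAMPLES = 20


def provider_p50s(
    samples: list[tuple[str, float, float]], min_samples: int = MIN_SAMPLES
) -> dict[str, tuple[float, float]]:
    """Compute per-provider P50 (median) latency and throughput for one model.

    Each sample is (provider, latency_ms, throughput_tps) for a single request.
    Providers with fewer than `min_samples` requests are omitted, mirroring how
    metrics only appear for providers with sufficient traffic.
    """
    latencies: dict[str, list[float]] = defaultdict(list)
    throughputs: dict[str, list[float]] = defaultdict(list)
    for provider, latency_ms, throughput_tps in samples:
        latencies[provider].append(latency_ms)
        throughputs[provider].append(throughput_tps)

    return {
        provider: (statistics.median(lat), statistics.median(throughputs[provider]))
        for provider, lat in latencies.items()
        if len(lat) >= min_samples
    }


def rank_providers(p50s: dict[str, tuple[float, float]], by: str = "latency") -> list[str]:
    """Order providers for one model: latency ascending or throughput descending."""
    if by == "latency":
        return sorted(p50s, key=lambda p: p50s[p][0])
    if by == "throughput":
        return sorted(p50s, key=lambda p: p50s[p][1], reverse=True)
    raise ValueError(f"unknown sort key: {by!r}")
```

Unlike the model-list sketch, each provider keeps its own latency and throughput pair here, so the comparison is between complete provider profiles for the same model.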


§ sources · 1 publication

vercel.com · Live model performance metrics accessible via AI Gateway · primary · 14:00:00