§ agents · storyline

Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks

Forge, an open-source reliability layer for local LLM tool-calling, raises an 8B model's agentic task success rate from 53% to 99.3% by wrapping it with guardrails, retry logic, and context management.

May 19 · 14:23:07 · primary fetch · 1 source · updated May 19 · 14:23:07

Hi HN, I'm Antoine Zambelli, AI Director at Texas Instruments.

I built Forge, an open-source reliability layer for self-hosted LLM tool-calling.

What it does:

- Adds domain- and tool-agnostic guardrails (retry nudges, step enforcement, error recovery, VRAM-aware context management) to local models running on consumer hardware
- Takes an 8B model from ~53% to ~99% on multi-step agentic workflows without changing the model - just the system around it
- Ships with an eval harness and interactive dashboard so you can reproduce every number

I wanted to run a handful of always-on agentic systems for my portfolio, didn't want to pay cloud frontier costs, and immediately hit the compounding math problem on local models. 90% per-step accuracy sounds great, but with a 5-step workflow that's roughly a 41% failure rate (0.9^5 ≈ 0.59 success). No existing framework seemed to address this mechanical reliability issue - they all seemed tailor-made for cloud frontier models.

Demo video: https://youtu.be/MzRgJoJAXGc (side-by-side: same model, same task, with and without Forge guardrails)

The paper (accepted to ACM CAIS '26, presenting May 26-29 in San Jose) covers the peer-reviewed findings across 97…
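The compounding math mentioned in the post can be checked with a few lines of Python. This is a minimal sketch, assuming each step succeeds or fails independently. The function names `workflow_success` and `per_step_with_retries` are hypothetical and are not part of Forge. The retry model (each retry is a fresh, independent attempt) is a simplifying assumption for illustration, not a description of how Forge's guardrails work.

```python
def workflow_success(per_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step ** steps


def per_step_with_retries(per_step: float, retries: int) -> float:
    """Probability a step eventually succeeds with `retries` extra attempts.

    Assumption (illustrative only): each attempt is independent and has the
    same success probability as the first one.
    """
    return 1.0 - (1.0 - per_step) ** (retries + 1)


if __name__ == "__main__":
    base = workflow_success(0.90, 5)
    # prints: 5 steps at 90%: 59.0% success, 41.0% failure
    print(f"5 steps at 90%: {base:.1%} success, {1 - base:.1%} failure")

    boosted = workflow_success(per_step_with_retries(0.90, 2), 5)
    # prints: with 2 retries per step: 99.5% success
    print(f"with 2 retries per step: {boosted:.1%} success")
```

Under these assumptions, small per-step gains compound just as per-step errors do, which is why a layer that only changes the system around the model could plausibly move the end-to-end number so much. Real retries are rarely independent, so treat the second figure as an upper-bound intuition rather than a prediction.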


§ sources · 1 publication

github.com · Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks · primary · 14:23:07