Testing if "bash is all you need"
Braintrust tests bash-versus-SQL agent approaches on GitHub issue datasets, finding SQL achieves 100% accuracy against bash's 53% while using far fewer tokens and costing 6.5x less.
We invited Ankur Goyal from Braintrust to share how they tested the "bash is all you need" hypothesis for AI agents.

There's a growing conviction in the AI community that filesystems and bash are the optimal abstraction for AI agents. The logic makes sense: LLMs have been extensively trained on code, terminals, and file navigation, so you should be able to give your agent a shell and let it work.

Even non-coding agents may benefit from this approach. Vercel's recent post on building agents with filesystems and bash showed this by mapping sales calls, support tickets, and other structured data onto the filesystem. The agent greps for relevant sections, pulls what it needs, and builds context on demand.

But there's an alternative view worth testing.
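The filesystem pattern described above can be sketched in a few lines. This is a minimal illustration, not Vercel's implementation: the ticket records, the `<customer>/<id>.txt` layout, and the `write_tickets` and `grep` helpers are all hypothetical, and `grep` is a pure-Python stand-in for running `grep -ri` in a shell.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical records standing in for structured data such as support tickets.
TICKETS = [
    {"id": 101, "customer": "acme", "body": "Login fails with a 500 error after password reset."},
    {"id": 102, "customer": "globex", "body": "Invoice PDF is missing the tax line."},
    {"id": 103, "customer": "acme", "body": "Password reset email never arrives."},
]


def write_tickets(root: Path, tickets: list[dict]) -> None:
    """Map each record onto a file using a hypothetical <root>/<customer>/<id>.txt layout."""
    for t in tickets:
        folder = root / t["customer"]
        folder.mkdir(parents=True, exist_ok=True)
        (folder / f"{t['id']}.txt").write_text(t["body"] + "\n")


def grep(root: Path, pattern: str) -> list[tuple[str, str]]:
    """Rough stand-in for `grep -ri PATTERN root`: return (relative path, matching line) pairs."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path in sorted(root.rglob("*.txt")):
        for line in path.read_text().splitlines():
            if rx.search(line):
                hits.append((path.relative_to(root).as_posix(), line))
    return hits


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        write_tickets(root, TICKETS)
        # The agent pulls only the sections that match, building context on demand.
        for path, line in grep(root, r"password reset"):
            print(f"{path}: {line}")
```

Running it prints the two `acme` tickets that mention a password reset, each prefixed with its path. The directory layout doubles as metadata (the customer name lives in the path), which is what lets an agent navigate with ordinary shell tools.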
Filesystems may be the right abstraction for exploring and retrieving context, but what about querying structured data? We built an eval harness to find out.

We tasked agents with querying a dataset of GitHub issues and pull requests. This type of semi-structured data mirrors real-world use cases like customer support tickets or sales call transcripts.

Question complexity ranged from:

Three agent approaches competed:

Each agent received the…
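To make the contrast with the filesystem approach concrete, here is a minimal sketch of an aggregate question over issues and pull requests answered in SQL. The schema, column names, rows, and questions are hypothetical, since the harness's actual dataset layout is not described here, and each item carries a single label for simplicity. The point it illustrates is mechanical: filtering, counting, and grouping happen inside one declarative query, whereas an agent working over files would have to grep for matches and then count and de-duplicate the output itself.

```python
import sqlite3

# Hypothetical rows: (number, kind, state, label, title).
ROWS = [
    (1, "issue", "open", "bug", "Crash on startup"),
    (2, "pr", "closed", "bug", "Fix crash on startup"),
    (3, "issue", "open", "feature", "Add dark mode"),
    (4, "issue", "closed", "bug", "Typo in README"),
    (5, "issue", "open", "bug", "Memory leak in parser"),
]


def load(conn: sqlite3.Connection) -> None:
    """Create a hypothetical `items` table and insert the sample rows."""
    conn.execute(
        "CREATE TABLE items (number INTEGER PRIMARY KEY, kind TEXT, state TEXT, label TEXT, title TEXT)"
    )
    conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?, ?)", ROWS)


def open_bug_count(conn: sqlite3.Connection) -> int:
    """'How many open issues are labeled bug?' as a single aggregate query."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM items WHERE kind = 'issue' AND state = 'open' AND label = 'bug'"
    ).fetchone()
    return count


def issues_by_label(conn: sqlite3.Connection) -> dict[str, int]:
    """'How many issues carry each label?' via GROUP BY; pull requests are excluded."""
    rows = conn.execute(
        "SELECT label, COUNT(*) FROM items WHERE kind = 'issue' GROUP BY label ORDER BY label"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn)
    print(open_bug_count(conn))   # 2
    print(issues_by_label(conn))  # {'bug': 3, 'feature': 1}
    conn.close()
```

With the sample rows, issues 1 and 5 are the open bugs, and the grouped query returns three `bug` issues and one `feature` issue; pull request 2 is excluded from both by the `kind = 'issue'` filter.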