Today I discovered…
Browserbase Stagehand
A library/framework to build AI-powered browser automation on top of Playwright. It can work with or without Computer Use Models (CUA). Technically, the Stagehand project handles only the DOM manipulation and state management, while Browserbase handles the core browser automation and interaction capabilities. I am treating the combo as a single project here.
💖 What I like about Stagehand:
Easy to get started and maintain: Thanks to an intuitive API structure and excellent docs, it was ridiculously easy to perform browser actions and extract content.
await page.goto("https://maps.google.com");
await page.act("Type 'Turkish food in Delhi'");
await page.extract(….)
The `extract` API is also impressive; I can't recall a single instance when it failed me.
Fine control vs one-shot: It supports both atomic operations (visit this site, click here, extract that) via its `goto`, `act`, and `extract` methods, and high-level goals (get me this info) via its Operator agent template or a Computer Use model. I prefer to use Operator for exploratory work and then write specific atomic-operation code for better reliability and performance in production automation.
Cheap atomic operations: It cost only $0.001 (7k tokens of gpt-4o-mini) for an automation of 3 atomic steps (extracting info from Google Maps).
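To make the "explore first, then pin down atomic steps" workflow concrete, here is a minimal sketch in TypeScript. It is not Stagehand's actual API: the `AtomicPage` interface is hypothetical and only borrows the `goto`, `act`, and `extract` method names from this post, and `FakePage`, `findRestaurants`, the instruction strings, and the returned data are all made up for illustration. The point is only that a fixed sequence of atomic steps can live in one small function that is easy to rerun and test.

```typescript
// Hypothetical interface: only the three method names come from the post.
// Real Stagehand signatures may differ, so check the official docs before use.
interface AtomicPage {
  goto(url: string): Promise<void>;
  act(instruction: string): Promise<void>;
  extract(instruction: string): Promise<string[]>;
}

// One fixed, repeatable sequence of atomic steps: visit, act, extract.
async function findRestaurants(page: AtomicPage, query: string): Promise<string[]> {
  await page.goto("https://maps.google.com");
  await page.act(`Type '${query}' in the search box and press Enter`);
  return page.extract("Extract the names of the listed restaurants");
}

// In-memory stand-in so the sketch runs without a browser or an LLM.
class FakePage implements AtomicPage {
  readonly log: string[] = [];

  async goto(url: string): Promise<void> {
    this.log.push(`goto ${url}`);
  }

  async act(instruction: string): Promise<void> {
    this.log.push(`act ${instruction}`);
  }

  async extract(instruction: string): Promise<string[]> {
    this.log.push(`extract ${instruction}`);
    return ["Placeholder Restaurant"]; // made-up data, not a real result
  }
}

// Usage: run the three steps against the fake page and show what was called.
const demoPage = new FakePage();
findRestaurants(demoPage, "Turkish food in Delhi").then((names) => {
  console.log(names);
  console.log(demoPage.log);
});
```

Because the function depends only on the small interface, the same steps could in principle be pointed at a real page object once its method signatures are confirmed.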
👎 What I dislike about Stagehand:
Expensive autonomous high-level goal execution by the Operator agent: It cost $0.07 (500k+ tokens of gpt-4o-mini), 70x the cost of the same task done with atomic operations.
Doesn’t support open LLMs yet. I hope it does soon.
Because of its cost, it is not a practical replacement for automated frontend testing (usually done via Playwright). But support for open LLMs (ideally via Ollama) could change that.
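The cost gap is easier to judge at volume. The sketch below uses only the figures reported in this post ($0.001 and 7k tokens for the atomic run, $0.07 and 500k+ tokens for the Operator run). The `runsPerDay` value of 1000 is a hypothetical volume chosen for illustration, and since "500k+" is a lower bound, the token count of 500000 is an assumption too.

```typescript
// Figures reported in the post; "500k+" is a lower bound, so 500000 is an assumption.
interface Run {
  label: string;
  costUsd: number;
  tokens: number;
}

const atomicRun: Run = { label: "3 atomic steps", costUsd: 0.001, tokens: 7000 };
const agentRun: Run = { label: "Operator agent", costUsd: 0.07, tokens: 500000 };

// How many times more expensive `b` is than `a`, rounded to a whole number.
function costMultiple(a: Run, b: Run): number {
  return Math.round(b.costUsd / a.costUsd);
}

// Projected spend in USD for repeating a run `times` times.
function projectedCost(run: Run, times: number): number {
  return run.costUsd * times;
}

const runsPerDay = 1000; // hypothetical volume, e.g. a test suite run on every commit
for (const run of [atomicRun, agentRun]) {
  const total = projectedCost(run, runsPerDay).toFixed(2);
  console.log(`${run.label}: $${total} per ${runsPerDay} runs`);
}
console.log(`Agent costs about ${costMultiple(atomicRun, agentRun)}x more per run`);
```

At that assumed volume the atomic version works out to about $1 a day and the agent version to about $70 a day, which is why the agent looks fine for exploration but hard to justify for routine test runs.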
Overall, I like the choices the project owners have made; they align with my mental model of keeping things simple and handy for quick production-grade implementations. Naturally, I found myself sticking with this one more than other similar projects. I will continue using it for use cases such as scraping and automating repeated tasks. And when support for open LLMs arrives, I will try it for automated testing as well.
⭐ Ratings and metrics
Based on my experience, I would rate this project as follows:
Production readiness: 9/10
Docs rating: 9/10
Time to POC (proof of concept): less than an hour
Authors: Sean McGuire @mcguiresean_, Anirudh Kamath @kamathematic, Jeremy Press @jeremypress
Demo | Source
🛡 License: MIT
Tech Stack: TypeScript, JavaScript
Note: In my trials, I always build the project from the source code to make sure that I test what I see on GitHub. Not the docker build, not the hosted version.
If you discovered an interesting Open-Source project and want me to feature it in the newsletter, get in touch with me.
To support this newsletter and Open-Source authors, follow #OpenSourceDiscovery on LinkedIn and Twitter