
The web data layer for Yalc workflows. An MCP server, SDKs in six languages, and 500 free pages a month. There's no reason not to use it.
Firecrawl turns the web into structured data Claude can read. Four core verbs: scrape (one URL to clean markdown or JSON), crawl (follow links across a whole site), search (find relevant pages by query), and interact (click, scroll, and navigate JavaScript-heavy sites). Output is markdown by default, optimized for LLM context windows, with optional structured JSON via schemas.
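For a concrete feel for the API, here is a minimal scrape call: a sketch assuming Firecrawl's v1 REST endpoint and a `FIRECRAWL_API_KEY` environment variable, with a placeholder target URL. The official SDKs wrap the same call.

```python
import os
import requests

# Minimal scrape: one URL in, clean markdown out.
# Assumes the v1 REST API; check the current docs for newer versions.
API_KEY = os.environ["FIRECRAWL_API_KEY"]

resp = requests.post(
    "https://api.firecrawl.dev/v1/scrape",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "url": "https://example.com/pricing",  # placeholder target
        "formats": ["markdown"],               # LLM-friendly output
    },
    timeout=60,
)
resp.raise_for_status()
markdown = resp.json()["data"]["markdown"]
print(markdown[:500])
```

The other verbs follow the same pattern against their own endpoints; crawl runs as an asynchronous job you poll for results.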
For Yalc workflows, Firecrawl is the canonical web intake. When a prompt says "look up this competitor's pricing," "extract structured data from these 50 vendor pages," "monitor changes on this product page," or "search the web for fintech news this week," Firecrawl is the layer that handles the wire. Yalc's job is upstream (what to fetch) and downstream (what to do with the result). Firecrawl handles the actual fetching, JS rendering, anti-bot evasion, and parsing.
Firecrawl sits at the **intake** node for any web-sourced data. It complements Crustdata: where Crustdata gives you structured B2B databases, Firecrawl covers everything else (vendor pricing pages, product changelogs, competitor blogs, public company sites, news pages).
The web intake node. Yalc invokes Firecrawl when the answer lives on a public web page rather than in a database. Output flows downstream into whatever the workflow needs (Notion writeback, Claude analysis, comparison report).
Copy-paste prompts for Claude Code that invoke Firecrawl.
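For example, an illustrative prompt (placeholder URL, not taken from the skill docs): "Use Firecrawl to scrape https://competitor.example/pricing, pull out the plan names and monthly prices as JSON, and write a one-paragraph comparison against our current pricing."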
Yalc integrates with Firecrawl through Claude's native HTTP tool plus a first-party `web-browsing` skill that wraps the four core verbs. The Firecrawl MCP server is also registered, which means Claude can call Firecrawl directly during a Yalc session as a native tool.
✓ Yalc skill available.

Firecrawl runs a real free tier (500 pages a month), which is enough to build and validate any Yalc scraping workflow before paying anything. Paid tiers (Hobby, Standard, Growth) scale with monthly page volume and concurrency. Annual billing gives two months free.
The product is open source (100,000-plus GitHub stars), so for self-hosted use the only cost is your own infrastructure. The hosted offering is what most Yalc workflows use because it handles the messy parts (anti-bot evasion, JS rendering, caching, scaling) so you don't have to.
- **Free:** 500 pages a month. Right for piloting and low-volume scraping.
- **Hobby / Standard / Growth:** volume tiers. Higher tiers add concurrency and faster crawling.
- **Self-hosted:** open source, run it yourself. Right when data sensitivity prohibits third-party scraping.
Cheaper at small-to-mid scale unless your data is unusually sensitive (then self-host the open source version). The product handles JS rendering, anti-bot evasion, retries, and caching; building and maintaining all of that yourself is real work. Firecrawl is the right buy-versus-build call for most teams.
Yes. Firecrawl renders JavaScript by default, with a smart wait that times the page load. SPAs and React apps work without manual configuration.
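If the default wait isn't enough for a particularly slow SPA, you can extend it explicitly. A minimal sketch, assuming the v1 scrape endpoint's `waitFor` option (milliseconds) and a placeholder app URL; verify the parameter name against the current API reference:

```python
import os
import requests

API_KEY = os.environ["FIRECRAWL_API_KEY"]

# Override the default smart wait for a slow single-page app.
# "waitFor" (milliseconds) is assumed from the v1 scrape options.
resp = requests.post(
    "https://api.firecrawl.dev/v1/scrape",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "url": "https://app.example.com/dashboard",  # placeholder SPA URL
        "formats": ["markdown"],
        "waitFor": 5000,  # give client-side rendering up to 5 seconds
    },
    timeout=90,
)
resp.raise_for_status()
print(resp.json()["data"]["markdown"][:300])
```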
For LinkedIn, no: LinkedIn aggressively blocks general-purpose scrapers, so use Unipile (API-based) instead. For Reddit, technically yes, but use Apify's Reddit actors for production volume because they're battle-tested.
500 pages a month, no card required. Genuinely enough to build a working Yalc workflow and validate it before you pay anything. Paid-tier pricing is reasonable once you actually need volume.
Pass a JSON schema to the scrape endpoint. Firecrawl runs the page through an LLM with the schema and returns structured JSON. It works most of the time; complex schemas may need a few iterations.
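A minimal sketch of schema-based extraction, assuming the v1 scrape endpoint's `extract` format (newer API versions expose the same feature under `json`/`jsonOptions`); the schema fields and target URL here are placeholders:

```python
import os
import requests

API_KEY = os.environ["FIRECRAWL_API_KEY"]

# JSON Schema describing the fields we want back (placeholder fields).
schema = {
    "type": "object",
    "properties": {
        "plan_names": {"type": "array", "items": {"type": "string"}},
        "monthly_prices_usd": {"type": "array", "items": {"type": "number"}},
    },
    "required": ["plan_names"],
}

resp = requests.post(
    "https://api.firecrawl.dev/v1/scrape",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "url": "https://competitor.example/pricing",  # placeholder target
        "formats": ["extract"],          # called "json" in newer versions
        "extract": {"schema": schema},
    },
    timeout=120,
)
resp.raise_for_status()
structured = resp.json()["data"]["extract"]
print(structured)
```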
Same core engine, fully open source. Self-hosting means you manage the infrastructure (browsers, queues, scaling) yourself. The hosted version is the convenient option for most teams.
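Switching a workflow between hosted and self-hosted is mostly a base-URL change. A sketch assuming the default self-hosted port 3002 from the repo's Docker setup; verify your instance's port and whether it expects an API key:

```python
import requests

# Point the same v1 calls at a self-hosted instance instead of the cloud.
# Port 3002 is assumed from the repo's Docker setup; auth may be disabled
# or require a placeholder key depending on your configuration.
BASE_URL = "http://localhost:3002"

resp = requests.post(
    f"{BASE_URL}/v1/scrape",
    json={"url": "https://example.com", "formats": ["markdown"]},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["data"]["markdown"][:300])
```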
Open source, so your data stays on your machine.