Prospecting · Claude Code skill

Apify Reddit Scraping skill and the Yalc Framework

The single source of truth for Reddit scraping in Yalc workflows. Knows the working actor IDs, the broken paths, and the per-client spend caps.

Yalc Fit Score
9/10
License
MIT (Yalc)
Actor
oAuCIx3ItNrs2okjQ
Budget
Per-client cap
Last reviewed
2026-04-29
Trigger phrases

Say this to fire the Apify Reddit Scraping skill

Any of these natural language phrases activates the skill inside Claude Code.

scraping Reddit
fetching Reddit posts
running the Apify actor
polling /new/
scraping /hot/
Reddit keyword search
What it does

Apify Reddit Scraping, plainly

The Apify Reddit Scraping skill is the canonical Yalc wrapper for Reddit data collection via Apify. It encapsulates the learned Earleads playbook: which actor IDs work (`oAuCIx3ItNrs2okjQ`), which paths return clean data (`/new/`, `/hot/`), which fail silently (keyword search on this actor is broken), and how to enforce per-client spend caps (logs at `logs/apify_spend_{agent}.json`).

For any Yalc workflow that touches Reddit (monitoring, repurposing, content discovery), this skill is the gateway. Direct Apify API calls bypass the playbook and risk burning budget or shipping bad data. The skill is the abstraction.
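The gating the skill performs before any Apify call can be sketched as follows. This is a minimal illustration of the playbook rules described above, not the skill's actual interface; the function name and parameters are hypothetical, though `startUrls` and `maxItems` are real fields of this actor's input.

```python
# Sketch of the skill's gate: validate the feed path and the budget
# before any Apify call. Names here are illustrative, not the skill's API.
WORKING_PATHS = {"/new/", "/hot/"}  # keyword search on this actor is broken

def build_scrape_request(subreddit: str, path: str, daily_cap_usd: float,
                         spent_today_usd: float, est_cost_usd: float) -> dict:
    """Return an actor run input, or raise if the run violates the playbook."""
    if path not in WORKING_PATHS:
        raise ValueError(f"path {path!r} is unsupported; use /new/ or /hot/")
    if spent_today_usd + est_cost_usd > daily_cap_usd:
        raise RuntimeError("daily budget cap would be exceeded; aborting run")
    return {
        "startUrls": [{"url": f"https://www.reddit.com/r/{subreddit}{path}"}],
        "maxItems": 100,  # global across startUrls, hence one subreddit per run
    }
```

Direct API calls skip exactly these two checks, which is why the skill is the mandated entry point.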

Where it slots in

Position in the GTM operating system

Intake
Enrich
Score
Route
Draft
Send
Listen

The Apify Reddit Scraping skill sits at the **intake** node for any Reddit-sourced data. It powers the daily Reddit monitoring agents (per client), the repurposing agents, and on-demand thread discovery.

The skill enforces per-client budget caps. Apify charges per compute unit; Earleads' rule of thumb is $2/day per monitoring client and $0.50-$1.50/day per community agent. Without the skill, an unbounded actor run can rack up significant charges fast.

The Yalc Framework

Running the Apify Reddit Scraping skill end to end

Workflow position

The Reddit data intake. Yalc invokes this skill whenever a workflow needs Reddit posts, comments, or subreddit feeds. Output is structured JSON ready for downstream classification (sentiment, content fit) or writeback (Notion).

Required inputs

  • Subreddit names to scrape (each as a separate run; maxItems is global, not per-sub)
  • Feed path to scrape (`/new/` or `/hot/`)
  • Per-run budget cap and item limit
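A typical actor input built from those values looks like this. The `startUrls` and `maxItems` field names come from the actor itself; the subreddit and limit are illustrative.

```json
{
  "startUrls": [{ "url": "https://www.reddit.com/r/SaaS/new/" }],
  "maxItems": 50
}
```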

Outputs

  • JSON of Reddit posts with author, body, score, comment count, timestamp
  • Spend log entry at logs/apify_spend_{agent}.json
  • Error log if the actor failed or hit rate limits

Chaining recommendations

Upstream: launchd schedule or Yalc prompt → apify-reddit-scraping
Downstream: Reddit JSON → reddit-thread-writer (content), sentiment-analysis (intel), or notion-page-writer (writeback)

Anti patterns to avoid

Don't use keyword search on actor `oAuCIx3ItNrs2okjQ`. It's broken (returns garbage). Only `/new/` and `/hot/` feeds work reliably.
Don't scrape multiple subreddits in one actor run. maxItems is global across all startUrls. Run each subreddit separately to get accurate counts.
Don't skip the budget pre-flight check. The skill estimates cost before running. Always honor the estimate; production agents have learned the hard way.
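The one-subreddit-per-run rule above can be sketched as a simple run planner. The helper below is hypothetical; only `startUrls` and `maxItems` are real actor input fields.

```python
# One actor run per subreddit, so each run's maxItems applies to exactly
# one feed. Scraping several subs in one run would split the global limit
# unpredictably across them.
def plan_runs(subreddits: list[str], per_sub_items: int = 50) -> list[dict]:
    return [
        {
            "startUrls": [{"url": f"https://www.reddit.com/r/{sub}/new/"}],
            "maxItems": per_sub_items,  # now an accurate per-subreddit count
        }
        for sub in subreddits
    ]
```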
Operator take

Pros, cons, who it's for

Pros

  • Encapsulates Earleads' learned Reddit scraping playbook
  • Per-client budget caps prevent runaway charges
  • Knows which actor paths work and which don't
  • Outputs clean JSON ready for downstream processing
  • Used in production by 8+ daily monitoring agents

Cons

  • Specific to actor `oAuCIx3ItNrs2okjQ`. Other actors require different inputs.
  • Keyword search is unsupported (actor limitation, not skill limitation)
  • Apify pricing fluctuates with actor compute units. Budget caps require occasional adjustment.
  • Requires existing per-client agent config files for full automation

Who it's for

  • Reddit GEO operators running daily monitoring or repurposing
  • Earleads agents (8+ active in production)
  • Any Yalc workflow that needs Reddit data at production volume
Dependencies

What this skill expects to find

MCP servers

Environment variables

The skill assumes per-client spend logs exist under `logs/apify_spend_{agent}.json`. If a new client doesn't have a log yet, the skill creates one on first run. Daily caps are configured in the per-client agent YAML.

Related

The Apify Reddit Scraping ecosystem inside Yalc

Alternatives

Skills that overlap

FAQ

Frequently asked

Why this specific actor (oAuCIx3ItNrs2okjQ)?

Earleads tested several Reddit actors. This one returns the cleanest JSON, has the most stable rate limits, and is actively maintained. Other actors either have stale data or unstable schemas.

How are budget caps enforced?

Before each actor run, the skill reads the per-client spend log and estimates the cost of the planned scrape. If the day's spend would exceed the cap, the run is aborted with a clear error message.
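A minimal sketch of that pre-flight check, assuming a simple log shape of `{date: spend}` under `logs/apify_spend_{agent}.json`; the actual log schema is not documented here, so treat the structure as an assumption.

```python
import json
from datetime import date
from pathlib import Path

def preflight(log_path: Path, daily_cap_usd: float, est_cost_usd: float) -> bool:
    """Return True if the planned scrape fits today's budget, else False."""
    today = date.today().isoformat()
    spend = json.loads(log_path.read_text()) if log_path.exists() else {}
    spent_today = spend.get(today, 0.0)
    if spent_today + est_cost_usd > daily_cap_usd:
        return False  # caller aborts with a clear error message
    # Record the estimated spend up front so later runs today see it.
    spend[today] = spent_today + est_cost_usd
    log_path.write_text(json.dumps(spend))
    return True
```

Note the skill's documented behavior of creating the log on first run falls out naturally: a missing file reads as zero spend.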

What's the recommended scrape cadence per subreddit?

For monitoring (catch new posts), run /new/ every 1 to 4 hours. For trending content, run /hot/ once per day. Don't poll more frequently than 1 hour without a strong reason.

How does the skill handle Reddit rate limits?

Rate limits surface as 429 errors from Apify. The skill catches these, waits 60 seconds, and retries once. If the second attempt fails, the run is logged and the schedule moves on.
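That retry behavior can be sketched like this. The `run_actor` callable and the `RateLimited` error are placeholders for whatever the skill actually invokes and catches.

```python
import time

class RateLimited(Exception):
    """Stand-in for Apify surfacing an HTTP 429."""

def run_with_retry(run_actor, wait_seconds: int = 60, sleep=time.sleep):
    """Call run_actor; on a 429, wait once and retry. Two failures -> None."""
    for attempt in (1, 2):
        try:
            return run_actor()
        except RateLimited:
            if attempt == 1:
                sleep(wait_seconds)  # back off before the single retry
            else:
                # second failure: log it and let the schedule move on
                return None
```

The `sleep` parameter is injected only so the backoff is testable; the default matches the documented 60-second wait.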

Why doesn't keyword search work on this actor?

The root cause is unknown; the failure is verified empirically in the Earleads playbook. The actor's keyword search returns inconsistent or empty results. Use subreddit feed scraping instead and filter results client-side via Claude.

Can I scrape comments, not just posts?

Yes, via a different actor input mode. The skill exposes an `--include-comments` flag. Note that comment scraping multiplies cost (often 5x to 10x the cost of a post scrape), so use it sparingly.

Get the Apify Reddit Scraping skill

Clone the Yalc skill set, drop in your env, run from your next Claude Code session.

gh repo clone Othmane-Khadri/YALC-the-GTM-operating-system && cp -r YALC-the-GTM-operating-system/.claude/skills/apify-reddit-scraping ./.claude/skills/