Methodology · Cohort

What we scrape

Cohort runs a weekly listening pipeline across 19 peptide and GLP-1 subreddits. The current corpus is 8,322 posts and 12,256 comments (20,578 records in total), spanning r/Wegovy, r/Mounjaro, r/Ozempic, r/Zepbound, r/Semaglutide, r/Retatrutide, r/tirzepatidecompound, r/PCOS, r/MaintenancePhase, r/MounjaroMaintenance, r/loseit, r/PeptideStacks, r/peptides, r/researchcompounds, r/retatrutidetalk, r/bodyhackguide, r/glp1, r/OzempicForWeightLoss, and r/WegovyWeightLoss.

We use Reddit's public Atom/RSS feeds at /r/<sub>/top.rss?t=<window>. No OAuth, no requests beyond Reddit's rate limits, no private data. We pull three time windows per sub (month, year, all-time) and deduplicate the results by Reddit post ID. Each run yields ~200–300 unique posts per sub.
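
As a rough illustration of the window-and-dedupe step, here is a minimal TypeScript sketch. It is not the actual scrape.ts. The names TimeWindow, FeedPost, feedUrls and dedupeById are hypothetical, and the www.reddit.com host is an assumption on top of the path given above. Only the URL shape, the three windows, and deduplication by post ID come from the description.

```typescript
// Hypothetical sketch; not the real scrape.ts.
type TimeWindow = "month" | "year" | "all";

interface FeedPost {
  id: string;          // Reddit post ID, used as the dedup key
  title: string;
  window: TimeWindow;  // which feed this copy of the post came from
}

const WINDOWS: TimeWindow[] = ["month", "year", "all"];

// Build the three public feed URLs for one subreddit.
// Assumption: feeds are served from https://www.reddit.com.
function feedUrls(sub: string): string[] {
  return WINDOWS.map(
    (w) => `https://www.reddit.com/r/${encodeURIComponent(sub)}/top.rss?t=${w}`
  );
}

// Keep the first copy of each post ID; the same post often
// appears in more than one time window.
function dedupeById(posts: FeedPost[]): FeedPost[] {
  const seen = new Map<string, FeedPost>();
  for (const post of posts) {
    if (!seen.has(post.id)) {
      seen.set(post.id, post);
    }
  }
  return Array.from(seen.values());
}
```

Keying a Map on the post ID keeps the first-seen copy and preserves fetch order, so a post that shows up in both the month and year feeds is counted once.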

What we analyze

The raw posts get sent through Anthropic's Claude with structured prompts that extract: the symptom catalog with frequency counts, the top questions the community is asking, the recurring confusion topics, the weight-loss reality check, and the notable threads worth surfacing. Then we walk back through the comment threads beneath the highest-signal posts to capture the peer corrections and the inter-thread arguments — the stuff that doesn't make it into post bodies.

We hash author IDs before sending anything to Claude. The analyzer's system prompt explicitly forbids including usernames in any output. Quotes are paraphrased to ≤8 words. Nothing identifies a Redditor by name.
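
The two privacy rules above can be sketched in a few lines of TypeScript. This is an illustration under stated assumptions, not the pipeline's code. The choice of SHA-256, the salt parameter, the 16-character truncation, and the function names hashAuthor and withinWordLimit are all hypothetical. The source only states that author IDs are hashed and that quotes are paraphrased to ≤8 words.

```typescript
import { createHash } from "node:crypto";

// Hypothetical: replace an author ID with a salted SHA-256 digest.
// The salt and the 16-character truncation are assumptions.
function hashAuthor(authorId: string, salt: string): string {
  return createHash("sha256")
    .update(salt)
    .update(authorId)
    .digest("hex")
    .slice(0, 16);
}

// Hypothetical check: true when a paraphrase is within the word limit.
function withinWordLimit(quote: string, maxWords: number = 8): boolean {
  const words = quote.trim().split(/\s+/).filter((w) => w.length > 0);
  return words.length <= maxWords;
}
```

The same ID and salt always produce the same token, so an author's comments can still be grouped within a run without the username ever reaching the model.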

What we publish

Three things, in this order:

Q&A pages at /questions: short, direct answers to the questions the cohort actually asks. Each one states how many posts, across how many subs, it is anchored on.
Pillar guides: long-form on the highest-leverage topics, namely the off-ramp, the dopamine displacement story, eating-enough math, and the constipation playbook.
Reddit megaposts: Field Reports that go back to the source subreddits with a synthesis of what the community itself has surfaced. The format is data-led, cites specific posts, and ends with an invitation to DM rather than a public pitch.

What we'd send you

If you DM'd asking for the raw scrape + analysis templates, here's the short version: scripts live in scripts/reddit/ in the Cohort repo. The pipeline is six scripts:

scrape.ts — RSS scraper, deduplicated across time windows
scrape-comments.ts — sub-level top comments
scrape-post-comments.ts — per-post threads
analyze-by-sub.ts — per-sub markdown report via Claude
analyze-thread-comments.ts — per-thread synthesis
extract-seo-questions.ts — ranked content backlog (no Claude needed, pure markdown parsing)

A full weekly run costs ~$0.30 in API fees and takes ~75 minutes of wall-clock time, which is cheap enough to run every week without a second thought.

What we won't do

Publish identifiable user content. Hashed before send, paraphrased on output.
Sell the corpus or any analysis derived from it. This is editorial input for Cohort, not a data product.
Make medical claims. We synthesize what the cohort reports. We do not prescribe.
Trust LLM output blindly. Every published synthesis has been read by a human before it goes out.

How we listen to the cohort.

If you're on a GLP-1 yourself.