Methodology
How we detect, classify, and publish South African disruption events.
1. Data sources
We aggregate from public sources only: union press feeds, political party news channels, and established South African news outlets. We do not scrape sites whose terms of service prohibit it. Every event links back to its source.
2. Detection pipeline
- Scrapers run every 15 minutes against each source
- Articles are keyword-filtered for disruption signals (strike, protest, march, blockade, etc.)
- A first-pass AI classifier (triage) decides whether each candidate describes a real, concrete event
- Approved candidates pass to a second AI extractor that pulls structured details: type, severity, location, date, organiser
- Location is geocoded to latitude/longitude and suburb via the Google Geocoding API
- High-confidence, high-severity events publish automatically; uncertain ones go to human moderation
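As a rough illustration of the keyword-filtering step above, here is a minimal sketch. The keyword tuple holds only the four terms named in the list (the full production list is not given here), and the function name `disruption_signals` is hypothetical, not the pipeline's actual code.

```python
import re

# Only the four keywords named in the text; the real list is longer ("etc.").
KEYWORDS = ("strike", "protest", "march", "blockade")

# Match a keyword at a word boundary, allowing suffixes ("protesters", "marching").
_PATTERN = re.compile(r"\b(?:" + "|".join(KEYWORDS) + r")\w*", re.IGNORECASE)


def disruption_signals(text: str) -> list[str]:
    """Return the distinct keywords found in text, in order of first appearance."""
    found: list[str] = []
    for match in _PATTERN.finditer(text):
        word = match.group(0).lower()
        # Map the matched word back to the keyword it starts with.
        stem = next(k for k in KEYWORDS if word.startswith(k))
        if stem not in found:
            found.append(stem)
    return found


def is_candidate(text: str) -> bool:
    """An article becomes a candidate if at least one keyword appears."""
    return bool(disruption_signals(text))
```

A filter like this is deliberately permissive: a headline such as "Budget speech set for March 2025" would also match on "march", which is the kind of false hit the triage classifier in the next step is there to reject.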
3. Severity scale
- Low — small picket, single workplace, no road impact
- Medium — planned demonstration with road or service impact in a single suburb
- High — regional shutdown, multiple sites, major highway risk, taxi strike
- Critical — nationwide stayaway, riot conditions reported, multiple injuries
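One way to represent the scale above in code is an ordered enumeration, so that severities can be compared and sorted. This is a sketch only; the class name, the numeric values, and the helper `parse_severity` are assumptions for illustration.

```python
from enum import IntEnum


class Severity(IntEnum):
    """The four levels from the severity scale, ordered lowest to highest."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def parse_severity(label: str) -> Severity:
    """Convert a label such as "High" or " critical " to a Severity member.

    Raises KeyError for labels outside the four-level scale.
    """
    return Severity[label.strip().upper()]
```

Because `IntEnum` members compare as integers, a check such as `parse_severity("High") >= Severity.HIGH` or `max(severities)` works without extra lookup tables.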
4. Accuracy
Each AI extraction produces a confidence score from 0.0 to 1.0. Events scoring below 0.6 are routed to manual review instead of being auto-published. Our target is a false-positive rate under 10% on auto-published events; we audit weekly and tighten the classifier prompts when drift appears.
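The threshold and the audit target can be expressed in a few lines. The sketch below assumes a score of exactly 0.6 counts as passing (the text only says "below 0.6" goes to review) and uses hypothetical function names and route labels; other publishing conditions, such as severity, are not modelled.

```python
REVIEW_THRESHOLD = 0.6        # from the text: below this goes to manual review
FALSE_POSITIVE_TARGET = 0.10  # from the text: target is under 10%


def route(confidence: float) -> str:
    """Route an extracted event by its confidence score."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0.0, 1.0], got {confidence}")
    if confidence < REVIEW_THRESHOLD:
        return "manual_review"
    return "auto_publish_eligible"


def false_positive_rate(audited: int, false_positives: int) -> float:
    """Share of audited auto-published events that were not real events."""
    if audited <= 0:
        raise ValueError("audited must be a positive count")
    if not 0 <= false_positives <= audited:
        raise ValueError("false_positives must be between 0 and audited")
    return false_positives / audited


def meets_target(rate: float) -> bool:
    """True when the audited rate is strictly under the 10% target."""
    return rate < FALSE_POSITIVE_TARGET
```

For example, a weekly audit that finds 14 false positives among 200 auto-published events gives a rate of 0.07, which is within the target; 20 out of 200 would not be.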
5. POPIA & ethical guardrails
- No personal information of individuals is stored
- We never name private individuals — only public officials in their official capacity
- All copy is politically neutral; the classifier is explicitly prompted to reject partisan framing
- Removal requests sent to privacy@disruption.co.za are processed within 7 days
6. Update cadence
Scrapers run on a 15-minute rotation; AI classification runs every 10 minutes; the public site rebuilds pages every 5 minutes via Next.js Incremental Static Regeneration. End-to-end, a new event typically reaches the public site within 30 minutes of being reported by a primary source.
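The 30-minute figure is consistent with simply adding up the three intervals. The sketch below works through that arithmetic under simplifying assumptions that are ours, not stated above: the stages run as independent periodic jobs, processing time within each stage is negligible, and an item arrives at a uniformly random point within each interval.

```python
# Intervals from the text, in minutes.
INTERVALS_MIN = {"scrape": 15, "classify": 10, "rebuild": 5}


def worst_case_wait(intervals: dict[str, int]) -> int:
    """An item that just misses every cycle waits one full interval per stage."""
    return sum(intervals.values())


def average_wait(intervals: dict[str, int]) -> float:
    """With uniform arrival, the mean wait per stage is half its interval."""
    return sum(minutes / 2 for minutes in intervals.values())


print(worst_case_wait(INTERVALS_MIN))  # 30
print(average_wait(INTERVALS_MIN))     # 15.0
```

Under these assumptions the worst-case scheduling delay is 30 minutes and the average is about 15, excluding any queueing, manual review, or source-side delay.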