Operations
How the project runs day-to-day: the automated work the system performs continuously, the human work that happens around it, and the external surfaces the project maintains.
For who does the work, see GOVERNANCE.md. For how the money flows, see ECONOMICS.md. For the moderation system, see MODERATION.md.
Automated jobs
The system runs continuously without human intervention for the bulk of its work. Failures that cross a paging threshold page the on-call engineer; lesser failures only notify (see On-call).
Continuous (every minute)
- API request serving. The primary workload. Cloudflare in front, Postgres behind.
- Webhook delivery. Outgoing webhook calls to subscribed consumers, with retry queue.
- Edit auto-approval. New submissions matching auto-approval criteria are processed within seconds of arrival.
Hourly
- High-priority scrapers. Top-50 airlines and top-30 airports get checked every hour for changes to baggage policies, fee tables, and contact information. The scrapers run in parallel from Fly.io containers with rate limiting per origin.
- Cache warming. Pages that have been touched in the last hour are re-rendered and pushed to the edge cache.
- Search index update. Meilisearch index is refreshed with edits from the previous hour.
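The per-origin rate limiting mentioned for the scrapers could look roughly like the sketch below. It is illustrative only: the class name, the `min_interval` parameter, and the choice to key on the URL's network location are assumptions, not the project's actual scraper code.

```python
from urllib.parse import urlparse


class PerOriginLimiter:
    """Allow at most one request per origin every `min_interval` seconds.

    Hypothetical helper: parallel scrapers ask it how long to sleep
    before fetching a URL, so no single airline or airport site is hit
    too often.
    """

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._next_allowed = {}  # origin -> earliest time of the next request

    def wait_time(self, url, now):
        """Return seconds to wait before fetching `url`, and reserve the slot."""
        origin = urlparse(url).netloc
        ready_at = max(now, self._next_allowed.get(origin, now))
        self._next_allowed[origin] = ready_at + self.min_interval
        return ready_at - now


limiter = PerOriginLimiter(min_interval=2.0)
first = limiter.wait_time("https://a.example/fees", now=0.0)
second = limiter.wait_time("https://a.example/baggage", now=0.5)
other = limiter.wait_time("https://b.example/contact", now=0.5)
print(first, second, other)
```

Passing `now` explicitly keeps the sketch deterministic. A real scraper would more likely read `time.monotonic()` and sleep for the returned duration.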
Every 6 hours
- Medium-priority scrapers. Airlines 51–200 and airports 31–150.
- Stale-data detection. Any field with a last-verified date older than its category's freshness threshold is flagged for verification. Fees: 90 days. Contacts: 60 days. Baggage rules: 180 days. The flagged items appear in contributors' verification queues.
- Provenance score recompute. The confidence_score on every fact is recalculated based on age, verifier count, and source quality.
- Sitemap regeneration. XML sitemaps for the consumer site, with priority and lastmod values.
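As a minimal sketch of the stale-data check, the following uses the thresholds stated above (fees 90 days, contacts 60 days, baggage rules 180 days). The `Fact` record, its field names, and the category keys are hypothetical; the real schema may differ.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Freshness thresholds in days, taken from the stale-data detection item.
# The category keys themselves are assumed names.
FRESHNESS_DAYS = {"fee": 90, "contact": 60, "baggage_rule": 180}


@dataclass
class Fact:
    fact_id: str
    category: str
    last_verified: date


def find_stale(facts, today):
    """Return facts whose last-verified date is older than their category threshold."""
    stale = []
    for fact in facts:
        limit = timedelta(days=FRESHNESS_DAYS[fact.category])
        if today - fact.last_verified > limit:
            stale.append(fact)
    return stale


facts = [
    Fact("f1", "fee", date(2024, 1, 1)),
    Fact("f2", "contact", date(2024, 3, 1)),
    Fact("f3", "baggage_rule", date(2024, 1, 1)),
]
stale_ids = [f.fact_id for f in find_stale(facts, today=date(2024, 4, 15))]
print(stale_ids)
```

In this example only the fee fact is flagged: it is 105 days old against a 90-day threshold, while the baggage rule of the same age is still within its 180 days.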
Daily (00:00 UTC)
- All other scrapers. Long-tail airlines and airports. Roughly 400 scrapers run with rate limiting; the full cycle takes 6–8 hours.
- Full dataset snapshot. A tar.gz of the entire dataset is committed to flighthelp/data and uploaded to R2 for bulk download.
- Reputation recompute. All contributor reputation scores updated based on the day's activity.
- Badge evaluation. New badges awarded to contributors who crossed thresholds.
- Edit log compaction. The detailed edit log gets a daily summary published to the activity feed.
- Database backup. Postgres dump to encrypted S3, with 30-day rolling retention plus monthly archives kept indefinitely.
- Translation freshness check. For each translated string, flag those whose source text has changed since the translation date.
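The backup retention rule (30-day rolling window plus monthly archives) can be sketched as a pruning function. This assumes the monthly archive is the earliest backup taken in each calendar month, which the text does not specify; the function name is also illustrative.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today, rolling_days=30):
    """Keep every backup from the last `rolling_days` days, plus the
    earliest backup of each calendar month (assumed monthly archive)."""
    cutoff = today - timedelta(days=rolling_days)
    keep = {d for d in backup_dates if d >= cutoff}

    first_of_month = {}
    for d in sorted(backup_dates):
        # setdefault keeps the first (earliest) date seen for each month
        first_of_month.setdefault((d.year, d.month), d)
    keep.update(first_of_month.values())
    return sorted(keep)


dates = [
    date(2024, 1, 1), date(2024, 1, 15), date(2024, 2, 1),
    date(2024, 3, 1), date(2024, 3, 20),
]
kept = backups_to_keep(dates, today=date(2024, 3, 25))
print(kept)
```

Anything not returned is eligible for deletion. Here the 15 January dump is the only one dropped.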
Weekly (Sunday 02:00 UTC)
- Newsletter draft. Curated digest of the week's most significant changes: major airline policy updates, new scenarios, notable disputes, and new contributor highlights. Sent to subscribers Monday morning.
- Top contributor recognition. Notifications sent each Sunday to contributors who topped any leaderboard that week.
- Moderator queue audit. Old queue items get extra attention; moderator workload distribution is reviewed.
- API consumer health check. Webhook subscribers with failing deliveries are notified to fix or disable.
Monthly
- Public statistics report. Edits processed, scenarios added, contributors active, top airlines by edit volume, top airports — published to a stats page.
- Moderation transparency report. Aggregate moderation statistics: edits processed, approval rates, rejection codes distribution, dispute resolutions, bans.
- Translation push. Strings sent to Crowdin/Weblate for community translation; completed translations pulled in.
- Scraper health audit. Scrapers that have failed more than 3 times in the past month are surfaced for community fix.
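The scraper health audit rule ("failed more than 3 times in the past month") might be expressed as below. The 30-day window standing in for "the past month", the input shape, and the scraper names are assumptions made for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def scrapers_needing_fix(failures, today, window_days=30, max_failures=3):
    """failures: iterable of (scraper_name, failure_date) pairs.

    Returns the names of scrapers with more than `max_failures`
    failures inside the window, sorted alphabetically.
    """
    cutoff = today - timedelta(days=window_days)
    counts = Counter(name for name, when in failures if when >= cutoff)
    return sorted(name for name, n in counts.items() if n > max_failures)


failures = (
    [("airline_a", date(2024, 5, d)) for d in (1, 5, 9, 20)]   # 4 failures
    + [("airport_b", date(2024, 5, d)) for d in (2, 3, 4)]     # 3 failures
    + [("airline_c", date(2024, 3, d)) for d in (1, 2, 3, 4)]  # outside window
)
flagged = scrapers_needing_fix(failures, today=date(2024, 5, 31))
print(flagged)
```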
Quarterly
- Transparency report. Financial statements (income, expenses by category, runway), board minutes summary, governance changes, major decisions.
- Schema review. Pending schema change proposals are batched and reviewed.
- Regulatory review. Aviation lawyer contributors review pending regulation changes and notify the rules-engine maintainers.
- Roadmap update. Public roadmap updated with what was shipped and what's planned for the next quarter.
Annually
- Annual report. Long-form public document covering the year's work, finances, governance, community growth, technical changes.
- Contributor recognition cycle. Postcards and video calls for top contributors (see CONTRIBUTORS.md).
- Board election. Core team rotation according to staggered terms.
- Funding cycle. Grant applications for the following year prepared and submitted.
Human roles
The project employs as few people as possible. Most work is volunteer. Paid roles exist where the work cannot be reliably volunteered.
Paid roles (typical at maturity)
- Executive director (1). Operations, fundraising, governance facilitation, public face. The most senior paid role.
- Engineering lead (1). Technical direction, on-call rotation organizer, infrastructure decisions.
- Engineers (2–4). Maintain the core repos, build new features, respond to incidents. Hired from the contributor community where possible.
- Community manager (1). Onboarding, contributor support, dispute facilitation when moderators escalate, public communications.
- Legal counsel (fractional). General counsel on retainer; jurisdiction-specific counsel hired ad hoc when issues arise.
- Bookkeeper (fractional). Quarterly close, transparency report preparation.
Total paid headcount at maturity: ~6 FTE. Total payroll cost: ~$700K–$900K depending on location.
Volunteer roles
- Moderators (~30 at maturity)
- Translators (~100 active at any time, ~500 contributing across years)
- Scraper maintainers (~50 active)
- Trusted contributors (~500 active)
- Regular contributors (~10,000 active monthly)
- Aviation lawyers and consumer advocates (informal advisory, ~10 active)
- Core team / board (5–9)
The volunteer-to-paid ratio is intentionally extreme. Most public-infrastructure projects fail when they become too dependent on paid headcount — they can't sustain the burn, and the volunteer culture withers because volunteers feel redundant. flighthelp's design keeps paid headcount in a supporting role to the community, not the other way around.
On-call
A small rotation of paid engineers (typically 3–4 people) handles after-hours incidents. The on-call:
- Responds to PagerDuty alerts
- Triages incidents
- Communicates on the status page
- Decides whether to wake other team members
What pages:
- API uptime below 99% over a 10-minute window
- Webhook delivery error rate above 5%
- Database replication lag above 60 seconds
- Cloudflare edge errors above 1%
- Security alerts (suspicious access patterns, credential exposure)
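The paging thresholds above can be read as a small rule table. The sketch below is one way to evaluate them; the metric names and the `alerts_to_page` function are hypothetical, and a real deployment would more likely express these as rules in the monitoring system itself.

```python
# Paging thresholds from the list above. Metric names are assumed.
PAGING_RULES = {
    "api_uptime_pct_10m": lambda v: v < 99.0,      # uptime over a 10-minute window
    "webhook_error_rate_pct": lambda v: v > 5.0,
    "db_replication_lag_s": lambda v: v > 60.0,
    "edge_error_rate_pct": lambda v: v > 1.0,
}


def alerts_to_page(metrics, security_alert=False):
    """Return the names of the paging conditions that are currently breached."""
    breached = [
        name
        for name, is_breached in PAGING_RULES.items()
        if name in metrics and is_breached(metrics[name])
    ]
    if security_alert:
        # Security alerts always page, regardless of numeric metrics.
        breached.append("security_alert")
    return breached


metrics = {
    "api_uptime_pct_10m": 99.4,
    "webhook_error_rate_pct": 7.2,
    "db_replication_lag_s": 12,
    "edge_error_rate_pct": 0.3,
}
print(alerts_to_page(metrics))
```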
What doesn't page (but does notify):
- Individual scraper failures (community fixes these on their own time)
- Edit queue backlog (moderators handle on their schedule)
- Low-priority content issues
The on-call rotation is paid extra on top of base salary for the inconvenience. The project does not pretend that 2 AM incidents are "part of the job."
Communications surface
The project maintains several outbound channels.
Status page (status.flighthelp.net)
Real-time uptime for the API, the website, the database, and the search index. Incident history. Scheduled maintenance windows. Major incidents get a public post-mortem within 72 hours.
Newsletter
Weekly digest published Monday morning. Subscribed via email. Contents: significant data changes from the week, new scenarios published, contributor highlights, governance updates, upcoming events. Roughly 1,500 words. Archived publicly.
Blog
Long-form posts on an infrequent cadence (typically 1–2 per month). Topics: technical decisions, governance changes, partnership announcements, in-depth investigations into airline policy patterns, retrospectives on incidents.
Social media
A presence on Mastodon and Bluesky. The accounts post about significant updates and link to the blog. No engagement metrics drive content. The project does not post natively on Twitter/X; posts are mirrored there read-only.
Mailing lists
- announce@: low-volume, official announcements only
- discuss@: open discussion, community-driven
- dev@: developer discussion for the schemas, engine, and API
- moderators@: private list for moderators
- core@: private list for the core team
GitHub
Issues and PRs are the primary work surface for engineering. Discussions are used for proposals and broader community input. The roadmap is a GitHub project board, publicly visible.
Office hours
A weekly hour-long video call, open to anyone, where a rotating core team member is available for questions. Recorded and posted afterwards. The format is informal: bring a question, get an answer.
Incident response
When something breaks:
- Detect. Automated monitoring catches most issues. Community reports catch others.
- Acknowledge. On-call posts to the status page within 5 minutes of detection.
- Investigate. On-call diagnoses, brings in additional team members as needed.
- Communicate. Status updates every 30 minutes minimum until resolved.
- Resolve. Service restored.
- Post-mortem. Public write-up within 72 hours describing what happened, why, what was done, what's being done to prevent recurrence.
The post-mortem template is blameless: it focuses on system failures, not individual mistakes. Every post-mortem is archived publicly.
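The timing targets in the steps above (acknowledge within 5 minutes, updates at least every 30 minutes, post-mortem within 72 hours) lend themselves to an after-the-fact check. This is a sketch under assumptions: the function and its arguments are hypothetical, the first status post is treated as the acknowledgement, and the 72-hour clock is assumed to start at resolution, which the text does not state.

```python
from datetime import datetime, timedelta

ACK_LIMIT = timedelta(minutes=5)
UPDATE_INTERVAL = timedelta(minutes=30)
POSTMORTEM_LIMIT = timedelta(hours=72)  # assumed to be measured from resolution


def missed_targets(detected, status_posts, resolved, postmortem_published):
    """Return the timing targets an incident missed.

    status_posts: chronologically sorted datetimes of status page posts,
    all between detection and resolution.
    """
    misses = []
    if not status_posts or status_posts[0] - detected > ACK_LIMIT:
        misses.append("acknowledge")
    # Gaps between consecutive posts, and between the last post and resolution.
    checkpoints = list(status_posts) + [resolved]
    if any(b - a > UPDATE_INTERVAL for a, b in zip(checkpoints, checkpoints[1:])):
        misses.append("update_cadence")
    if postmortem_published - resolved > POSTMORTEM_LIMIT:
        misses.append("postmortem")
    return misses


result = missed_targets(
    detected=datetime(2024, 6, 1, 2, 0),
    status_posts=[
        datetime(2024, 6, 1, 2, 3),
        datetime(2024, 6, 1, 2, 30),
        datetime(2024, 6, 1, 3, 10),  # 40 minutes after the previous post
    ],
    resolved=datetime(2024, 6, 1, 3, 20),
    postmortem_published=datetime(2024, 6, 3, 12, 0),
)
print(result)
```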
Data freshness as a continuous concern
The most important operational signal is data freshness, because freshness is the moat (Principle 3).
The project tracks:
- % of fact-carrying entities verified in the last 90 days — target: >70%
- % of high-traffic airline data verified in the last 30 days — target: >85%
- % of scenario pages with active maintainer — target: 100%
- Median age of contact-method success-rate samples — target: <30 days
When any of these slip below threshold, the system surfaces targeted verification tasks to contributors. If contributor capacity isn't sufficient, the core team can hire short-term contractors or run a targeted campaign.
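A minimal sketch of how one of these freshness metrics could be computed and compared with its target. The function names are illustrative, and treating "target: >70%" as a strict inequality is an assumption about how the threshold is meant to be read.

```python
from datetime import date, timedelta


def pct_verified_within(last_verified_dates, today, days):
    """Percentage of entities whose last verification falls within `days` days."""
    if not last_verified_dates:
        return 0.0
    cutoff = today - timedelta(days=days)
    fresh = sum(1 for d in last_verified_dates if d >= cutoff)
    return 100.0 * fresh / len(last_verified_dates)


def below_target(pct, target_pct):
    """True when the metric fails a 'greater than target' goal."""
    return not pct > target_pct


today = date(2024, 6, 30)
# Seven entities verified 10 days ago, three verified 200 days ago.
dates = [today - timedelta(days=10)] * 7 + [today - timedelta(days=200)] * 3
pct = pct_verified_within(dates, today, days=90)
print(pct, below_target(pct, target_pct=70))
```

In this example exactly 70% of entities are fresh, which does not satisfy a ">70%" target, so the check would trigger targeted verification tasks.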