Open Source Strategy

The project's open-source posture is structural, not promotional. Specific licensing choices are deliberate and load-bearing. This document explains what's open, what's not, what license each piece carries, and why.

For the repository structure these licenses apply to, see REPOSITORIES.md. For the legal implications, see LEGAL.md.

The general principle

Open everything that builds trust and capability. Close only what protects the community from manipulation.

The default is open. Closing requires justification.

The justifications that have been accepted, and the only ones currently in force, are:

Opening this would let bad actors game the system (moderation tooling, reputation engine, abuse signals)
Opening this would violate user privacy (internal analytics, contributor risk scoring)

All other components — schemas, engine, dataset, scrapers, website, contributor app, docs — are open.

License choices, by component

Dataset: CC-BY 4.0

The dataset is the project's most-shared asset. It must be reusable by everyone — commercial, non-commercial, academic, journalistic, AI training, government.

Why CC-BY:

Permits commercial use (required by the mission; if the data were non-commercial, AI labs and commercial integrations couldn't ground their products on it)
Requires attribution (preserves credit and helps the project become known to the downstream consumer's users)
Well-understood by lawyers internationally
Compatible with the major open-data licenses (it can generally be combined with datasets released under ODbL, ODC-BY, and similar open-data licenses)
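To make the attribution requirement concrete, here is a minimal sketch of how a downstream consumer might generate a credit line. The class name, the dataset title, the author string, and the example.org URL are all hypothetical; only the CC BY 4.0 license URL is real. CC BY 4.0 also asks reusers to indicate whether changes were made, which the `modified` flag models.

```python
from dataclasses import dataclass

CC_BY_4_URL = "https://creativecommons.org/licenses/by/4.0/"


@dataclass(frozen=True)
class Attribution:
    """Title, author, source, and license: the usual parts of a credit line."""

    title: str
    author: str
    source_url: str
    license_name: str = "CC BY 4.0"
    license_url: str = CC_BY_4_URL

    def as_text(self, modified: bool = False) -> str:
        text = (
            f'"{self.title}" by {self.author} ({self.source_url}), '
            f"licensed under {self.license_name} ({self.license_url})."
        )
        # CC BY 4.0 asks reusers to indicate if they changed the material.
        if modified:
            text += " Changes were made."
        return text


# Hypothetical values, for illustration only.
credit = Attribution(
    title="flighthelp dataset",
    author="flighthelp contributors",
    source_url="https://example.org/flighthelp-dataset",
)
print(credit.as_text(modified=True))
```

A credit line like this is the whole cost of reuse for a commercial consumer, which is the point of choosing CC-BY over a Share-Alike license.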

Why not CC0:

CC0 removes the attribution requirement. The project relies on attribution as the cultural mechanism that makes the work visible. Without attribution, the work fades; with it, downstream consumers gradually learn about the project and become contributors, funders, or commercial customers.

Why not CC-BY-SA:

Share-Alike (copyleft for data) sounds appealing because it forces downstream data to stay open, but in practice it deters commercial adoption. AI labs in particular are nervous about CC-BY-SA because it is unclear how far the Share-Alike obligation extends into what they build on the data. Some other projects have lost adoption by choosing CC-BY-SA over CC-BY.

Why not ODbL:

The Open Database License has stronger Share-Alike requirements for derivative databases. It has the same problem as CC-BY-SA, only more so. OpenStreetMap uses it, and OSM regularly ends up in compliance disputes with downstream consumers. The project chooses friction-free adoption over copyleft propagation for the dataset.

Schemas: MIT

The schemas need maximum permissiveness. Anyone modeling air-travel data should be able to adopt them with zero legal review.

Why MIT:

Industry-standard for libraries and specifications
Trivially compatible with closed-source consumers
No copyleft surface that scares enterprise legal teams
The schemas alone have no commercial value; permissiveness loses nothing

Why not Apache 2.0:

Apache 2.0 adds an explicit patent grant and contribution terms. These are valuable for code but mostly unnecessary for schemas. MIT is shorter, and community trust in it is higher for spec-style projects.

Why not public-domain dedication:

Some jurisdictions don't permit waiving copyright. MIT is functionally equivalent in result and works everywhere.

Rules engine: MIT

Same reasoning as schemas. The engine's value comes from universal adoption, not from copyleft.

A specific consideration: the engine is the most likely target for incorporation into commercial closed-source products (claims companies, insurance underwriters, AI grounding pipelines). Making this frictionless is a feature, not a bug. The audited public version becomes the reference; the closed integrations are leaf consumers that the project doesn't need to control.

Scrapers: MIT

The scrapers are community-maintained and frequently break (when airlines change their websites). Making them MIT lets contributors fork, fix, and contribute back without legal overhead.

A consideration: scrapers contain encoded knowledge about specific airline websites. If a scraper is forked into a closed-source project, the project loses some of that knowledge to the closed fork. The community-maintenance dynamic accepts this trade-off: most contributors prefer to push fixes upstream, and the few who don't would not contribute anyway.

Website: AGPL-3.0

The reference website is the only copyleft component. The reasoning is precise.

Why AGPL:

Prevents a parasitic fork — someone takes the codebase, runs a competing service with the project's own technology, contributes nothing back
Ensures that if a fork ships improvements, those improvements come back to the project (or remain available to the public)
Strong enough copyleft to be meaningful, well-understood enough to not be exotic

The trade-off:

Companies cannot run a proprietary internal fork without legal complications
This is intentional. Companies wanting to integrate flighthelp use the API. The source code of the reference site is for the open community.

Why not GPL-3.0:

GPL-3.0 would let someone run a modified version as a web service without releasing its source code, because serving users over a network does not count as distribution. AGPL-3.0 closes this "network use" loophole.

Why not MIT for the website:

MIT on the website would permit anyone to fork it into a closed-source competitor running their own commercial service. The project's main visible surface would lose its open status. AGPL prevents this.

Contributor PWA: AGPL-3.0

Same reasoning as the website. It's part of the same monorepo.

Long-form documentation: CC-BY 4.0

Documentation can be reused, translated, and cited. CC-BY preserves attribution.

Governance documents: CC0

The governance documents (bylaws, code of conduct, conflict-of-interest policy, succession plan) are released under CC0 — true public-domain dedication.

Why CC0:

The project benefits from other public-interest projects adopting good governance. CC0 lowers the bar to copying these documents.
The governance docs have no commercial value; there is nothing to protect.
This is what Software Freedom Conservancy, Open Knowledge Foundation, and other infrastructure-aligned organizations do with their internal documents.

Case-law archive: CC-BY 4.0 (where permitted)

Court rulings and regulator opinions are typically public-domain in their underlying form. The project's curation (metadata, summaries, citation links) is licensed CC-BY 4.0. The underlying documents retain their original status.
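The per-component choices above can be kept in one machine-readable table. The sketch below is an illustration, not the project's actual tooling: the component keys and the helper name are hypothetical, and the SPDX identifiers are a best guess at the intended licenses. In particular, this document says "AGPL-3.0" without stating "only" or "or later", so `AGPL-3.0-only` is an assumption.

```python
# Component -> SPDX license identifier, following the sections above.
# Component keys are hypothetical names chosen for this sketch.
COMPONENT_LICENSES = {
    "dataset": "CC-BY-4.0",
    "schemas": "MIT",
    "rules-engine": "MIT",
    "scrapers": "MIT",
    "website": "AGPL-3.0-only",          # assumption: "only" vs "or-later" is not stated
    "contributor-pwa": "AGPL-3.0-only",  # same assumption as the website
    "docs": "CC-BY-4.0",
    "governance": "CC0-1.0",
    "case-law-curation": "CC-BY-4.0",    # covers the curation, not the underlying rulings
}


def spdx_header(component: str, comment_prefix: str = "#") -> str:
    """Return the one-line SPDX tag to place at the top of a source file."""
    return f"{comment_prefix} SPDX-License-Identifier: {COMPONENT_LICENSES[component]}"


def components_under(license_id: str) -> list[str]:
    """List every component that carries the given license, sorted by name."""
    return sorted(c for c, lic in COMPONENT_LICENSES.items() if lic == license_id)


print(spdx_header("rules-engine"))
print(spdx_header("website", comment_prefix="//"))
print(components_under("MIT"))
```

Keeping the table in one place makes it easy to check that every file header matches the license this document promises for its component.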

Closed components

Three components are deliberately not open. Publishing that fact is itself part of the openness posture: the project does not hide the existence of closed code, only its contents.

Moderation tooling

The internal queue interface, the anti-spam heuristics, the IP and device fingerprint clustering, the abuse-signal scoring, the auto-approval thresholds.

Why closed: opening this gives bad actors a runbook. They can:

Calibrate spam campaigns to fly under the auto-approval thresholds
Mask their device fingerprints to evade cluster detection
Time their submissions to dodge rate limits
Game the new-contributor scrutiny period

The high-level approach (graduated trust, public reason codes, audit log) is documented in MODERATION.md. The implementation details are private.

Reputation engine

The exact weights and decay functions used to compute contributor reputation scores.

Why closed: opening this makes reputation farming trivial. A bad actor with the exact formula can:

Submit the cheapest-weighted edits at maximum cadence
Time their activity to exploit the recency weighting and avoid score decay
Coordinate with collaborators to optimize for the formula

The general structure (what inputs matter, what time-decay applies) is documented in CONTRIBUTORS.md. The specific numbers are not.

Internal analytics

The page-view counts, search queries, error rates, conversion through verification queues, individual contributor activity timelines.

Why closed: opening this would create:

Privacy risk for contributors (individual activity timelines)
Information asymmetry exploitable by bad actors (which pages are most visited; which scenarios are most queried)
A target for surveillance (legal demands for traffic data)

The aggregate statistics (total visits, total edits, total contributors) are published quarterly in the transparency report. The raw analytics are not.

License selection process

When adding a new component, the project chooses a license based on:

What is this for? Infrastructure (engine, schema, library) → MIT. Application (site, PWA) → AGPL. Data → CC-BY. Documentation → CC-BY. Governance → CC0.
Who needs to use this? If the answer includes commercial closed-source products, the license must permit that. If only open-source forks should exist, copyleft applies.
What's the worst-case fork? If someone running a closed competitor with this code would be devastating, copyleft. If a closed competitor is merely possible but the project's strengths (community, freshness, trust) make it unlikely to be successful, permissive.

The decision is documented in the PR proposing the new component. License changes within a component (e.g., adding Apache 2.0 as a dual-license alongside MIT) require board approval.
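The three questions can be read as a small decision procedure. The following is a sketch under stated assumptions, not the project's actual process: the enum, the function name, and the rule that conflicting answers are escalated to a human are all hypothetical. The defaults come straight from the first question.

```python
from enum import Enum


class Purpose(Enum):
    INFRASTRUCTURE = "infrastructure"  # engine, schema, library
    APPLICATION = "application"        # site, PWA
    DATA = "data"
    DOCUMENTATION = "documentation"
    GOVERNANCE = "governance"


# Question 1: what is this for?
DEFAULT_LICENSE = {
    Purpose.INFRASTRUCTURE: "MIT",
    Purpose.APPLICATION: "AGPL-3.0",
    Purpose.DATA: "CC-BY 4.0",
    Purpose.DOCUMENTATION: "CC-BY 4.0",
    Purpose.GOVERNANCE: "CC0",
}

CODE_PURPOSES = {Purpose.INFRASTRUCTURE, Purpose.APPLICATION}


def suggest_license(
    purpose: Purpose,
    closed_consumers_needed: bool,
    closed_fork_devastating: bool,
) -> str:
    """Suggest a license; the final call still belongs in the PR discussion."""
    if closed_consumers_needed and closed_fork_devastating:
        # Assumption: contradictory answers are escalated rather than auto-resolved.
        raise ValueError("answers conflict; needs a human decision")
    if purpose in CODE_PURPOSES:
        if closed_fork_devastating:   # Question 3: worst-case fork
            return "AGPL-3.0"
        if closed_consumers_needed:   # Question 2: who needs to use this?
            return "MIT"
    return DEFAULT_LICENSE[purpose]


print(suggest_license(Purpose.INFRASTRUCTURE, True, False))
print(suggest_license(Purpose.APPLICATION, False, True))
```

The sketch only overrides the default for code. Data, documentation, and governance licenses do not change with the answers to the second and third questions here, which is a simplification.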

Compatibility map

The licenses interact in specific ways:

CC-BY data + MIT engine + AGPL site: The site (AGPL) can use the engine (MIT) and the data (CC-BY). Downstream consumers using the site must comply with AGPL; downstream consumers using only the engine and data don't.
MIT engine in a closed commercial product: Permitted. The product must retain the MIT notice but doesn't need to be open.
CC-BY data in a closed commercial product: Permitted with attribution.
CC-BY-SA photos in a closed commercial product: Restricted. Unmodified photos may be shown with attribution, but any adaptation must itself be released under CC-BY-SA, which generally isn't viable for a closed product.

This is intentional. Photos are the only contribution type the project asks for CC-BY-SA on, because they carry stronger individual moral-rights interests, and contributors expect their personal photos to not be folded into closed products.
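The map above can be expressed as a lookup that a downstream integrator might run before shipping a closed product. This is a rough sketch, not legal advice: the type and function names are hypothetical, and treating AGPL and CC-BY-SA as outright blockers is a simplification of the positions stated in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Terms:
    closed_product_ok: bool
    obligation: str


# Keys use this document's short license labels.
TERMS = {
    "MIT": Terms(True, "retain the MIT notice"),
    "CC-BY": Terms(True, "give attribution"),
    # Simplification: treated as blockers, following the map above. The precise
    # boundary depends on how the closed product uses or adapts the material.
    "AGPL": Terms(False, "comply with AGPL source-sharing terms"),
    "CC-BY-SA": Terms(False, "release adaptations under CC-BY-SA"),
}


def review_closed_product(licenses_used: list[str]) -> tuple[bool, list[str], list[str]]:
    """Return (ok, blocking licenses, obligations for the permitted licenses)."""
    used = set(licenses_used)
    blockers = sorted(lic for lic in used if not TERMS[lic].closed_product_ok)
    duties = sorted(TERMS[lic].obligation for lic in used if TERMS[lic].closed_product_ok)
    return (not blockers, blockers, duties)


# Engine plus data: allowed, with two obligations.
print(review_closed_product(["MIT", "CC-BY"]))
# Adding contributor photos changes the answer.
print(review_closed_product(["MIT", "CC-BY", "CC-BY-SA"]))
```

An integrator that sticks to the engine and the dataset gets a clean result, which matches the intent that closed products consume the API and data rather than the site or the photos.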

What gets published when

New code is open from the first commit. There is no internal-then-public step. The repos are created public and stay public.

Exceptions:

Security patches developed under embargo are kept private until coordinated disclosure
Pre-release versions of governance changes are discussed in board sessions before public proposal
New regime additions to the rules engine are typically developed in a public draft branch with full visibility, but the merge timing is announced rather than continuous

When license changes happen

The project commits to:

Not changing existing licenses to be more restrictive
Adding optional permissive dual-licenses when a downstream consumer needs them and the change doesn't compromise the open community

For example, the project would consider offering Apache 2.0 as a dual-license on the engine if a specific enterprise needed the patent grant. The project would not relicense the engine away from MIT or remove the MIT option.

This commitment is in the bylaws and cannot be reversed by a simple board vote. The reasoning: downstream consumers must be able to depend on the licenses they integrated against. A bait-and-switch licensing change would destroy the trust the project depends on.
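One way to read the commitment is as a superset rule: a change is acceptable only if every license option a consumer already relies on is still offered afterward. The sketch below encodes just that rule. The function names are hypothetical, and the check does not judge whether an added license is permissive or whether it compromises the open community, which remain decisions for people.

```python
def change_is_allowed(current: set[str], proposed: set[str]) -> bool:
    """True if every currently offered license option is still offered."""
    return current <= proposed


def added_options(current: set[str], proposed: set[str]) -> set[str]:
    """License options the change would add on top of the existing ones."""
    return proposed - current


engine_now = {"MIT"}

# Adding Apache 2.0 as a dual-license keeps MIT available: allowed.
print(change_is_allowed(engine_now, {"MIT", "Apache-2.0"}))
# Replacing MIT with Apache 2.0 removes an option consumers depend on: not allowed.
print(change_is_allowed(engine_now, {"Apache-2.0"}))
```

Under this reading, relicensing away from an existing option always fails the check, regardless of how permissive the replacement is.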