Schemas

The schemas are the foundation of the entire project. Everything else — the rules engine, the data, the API, the reference site — is structured by them. If the schemas are right, everything above them can be built. If they are wrong, every layer above them inherits the mistake.

This document covers the philosophy of schema design, the versioning policy, the release process, and how downstream consumers should adopt them. The field-by-field specification of each entity is in DATA-MODELS.md.

What the schemas are

The schemas are JSON Schema (Draft 2020-12) definitions for every entity in the system. Each entity (Airline, Airport, BaggageRule, ContactMethod, Fee, Scenario, Regulation, Contributor, Edit, Source, Badge) gets its own schema file. From those schema files, the project generates:

  • OpenAPI 3.1 fragments (for API documentation)
  • TypeScript types (using json-schema-to-typescript)
  • Python types (Pydantic v2 models)
  • Go structs
  • PHP classes (using cuyz/valinor or equivalent)
  • Rust structs (with serde derive)
  • Java records
  • Swift Codable structs

The generation is one-way: the JSON Schema is canonical, the language bindings are downstream artifacts. A change to a generated TypeScript file is meaningless; the source of truth is the JSON Schema.

The repository is flighthelp/schema, licensed MIT.

Why JSON Schema and not Protobuf, GraphQL SDL, or OpenAPI

JSON Schema was chosen deliberately over the alternatives.

Protobuf is wire-efficient and language-portable, but it conflates serialization with type definition, requires a build step that is awkward in many ecosystems, and has a less expressive constraint language. The constraint expressivity matters: airline data has constraints (max weight 0–999 kg, IATA airline codes of exactly two characters, currency codes from ISO 4217) that schemas should enforce.

GraphQL SDL is good for API surface design but is weaker as a data definition language. It does not handle versioning well, has limited constraint expressivity, and forces a particular API shape.

OpenAPI is for documenting APIs, not for defining data. The project uses OpenAPI for API documentation, but OpenAPI internally uses JSON Schema for its data definitions. Picking JSON Schema directly avoids the indirection.

JSON Schema wins on three axes: it is the most widely supported constraint language across ecosystems, it is human-readable, and it composes well (the $ref system lets entities reference each other cleanly). Nearly every mainstream language has at least one library that consumes JSON Schema directly. It is the closest thing to a universal data definition language that exists.

Design principles for individual schemas

Every field is documented. No field exists without a description in the JSON Schema. The description is the canonical documentation; the generated docs and language bindings inherit it.

Constraints are encoded, not commented. If a field has bounds, the bounds go in the schema (minimum, maximum, pattern, format, enum). The constraint becomes part of the validation contract. Downstream tools can rely on it.
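
As a rough sketch of what "encoded, not commented" means, the fragment below writes the bounds mentioned earlier (weight 0 to 999 kg, a two-character airline code, a currency enum) as JSON Schema keywords held in a Python dict. The field names max_weight_kg, iata_code, and currency, the three-currency enum, and the hand-written check function are all assumptions made for illustration; a real consumer would use a full JSON Schema validator rather than this toy.

```python
import re

# Hypothetical fragment; field names are illustrative, not taken from DATA-MODELS.md.
BAGGAGE_FRAGMENT = {
    "type": "object",
    "properties": {
        "max_weight_kg": {"type": "number", "minimum": 0, "maximum": 999},
        "iata_code": {"type": "string", "pattern": "^[A-Z0-9]{2}$"},
        "currency": {"type": "string", "enum": ["USD", "EUR", "GBP"]},
    },
}


def check(instance: dict, schema: dict) -> list[str]:
    """Check only minimum, maximum, pattern and enum; a real validator does far more."""
    errors = []
    for name, rules in schema["properties"].items():
        if name not in instance:
            continue  # absent optional fields are not an error
        value = instance[name]
        if "minimum" in rules and value < rules["minimum"]:
            errors.append(f"{name}: below minimum {rules['minimum']}")
        if "maximum" in rules and value > rules["maximum"]:
            errors.append(f"{name}: above maximum {rules['maximum']}")
        if "pattern" in rules and not re.search(rules["pattern"], value):
            errors.append(f"{name}: does not match {rules['pattern']}")
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{name}: not one of {rules['enum']}")
    return errors
```

Because the bounds live in the schema itself, any tool that reads the schema sees the same limits without parsing prose.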

Required fields are minimal. Only fields without which the entity is meaningless are required. Everything else is optional. This makes the schemas tolerant of partial data, which is the reality of community-verified datasets.

Enumerations are stable. If a field is an enum (alliance: star | oneworld | skyteam | none), adding a new value is a minor version bump; removing or renaming a value is a major version bump. Enums are documented with the meaning of each value.

Identifiers are typed. Fields that reference other entities use typed reference patterns (airline_id, airport_id, regulation_id) rather than raw strings. The patterns are documented and validated. This catches the most common integration bug — passing the wrong kind of ID — at validation time.
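
A minimal sketch of how typed identifiers catch a wrong-kind ID. The prefixed formats used here (airline:XX, airport:XXX) are invented for the example; the real patterns are whatever DATA-MODELS.md specifies.

```python
import re

# Hypothetical ID formats, chosen only to make the idea concrete.
ID_PATTERNS = {
    "airline_id": re.compile(r"airline:[A-Z0-9]{2}"),
    "airport_id": re.compile(r"airport:[A-Z]{3}"),
}


def is_valid_id(field: str, value: str) -> bool:
    """Return True when value matches the pattern registered for field."""
    return ID_PATTERNS[field].fullmatch(value) is not None
```

With patterns like these, passing an airport ID where an airline ID is expected fails validation instead of silently producing an empty lookup.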

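Timestamps are ISO 8601 with timezone. No naive timestamps. Every time field is format: date-time per RFC 3339, which always carries a UTC offset. Draft 2020-12 treats format as an annotation by default, so validators must be configured to assert it.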
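
On the consumer side, rejecting naive timestamps can be sketched with the Python standard library alone. The helper name parse_aware is an assumption, and datetime.fromisoformat is more lenient than RFC 3339, so treat this as a guard against missing offsets rather than a strict RFC 3339 parser.

```python
from datetime import datetime


def parse_aware(value: str) -> datetime:
    """Parse an ISO 8601 timestamp and reject values without a UTC offset."""
    # Older Python versions do not accept a trailing "Z", so normalize it first.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        raise ValueError(f"naive timestamp rejected: {value!r}")
    return parsed
```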

Localizable strings are first-class. Fields that vary by locale (common_name, notes, title) use a structured localizable-string pattern: an object keyed by IETF BCP 47 language tag, with at least an en key as fallback.
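
One way a consumer might resolve a localizable string is sketched below. The lookup order (exact tag, then primary language subtag, then en) is an assumption; the schema only guarantees that an en key exists.

```python
def localize(text: dict[str, str], locale: str) -> str:
    """Pick the best match: exact tag, then primary language, then the en fallback."""
    if locale in text:
        return text[locale]
    primary = locale.split("-")[0]
    if primary in text:
        return text[primary]
    return text["en"]


# Illustrative value for a common_name field.
common_name = {"en": "Munich Airport", "de": "Flughafen München"}
```

Here a request for de-AT would fall back to the de entry, and a request for fr would fall back to en.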

Money is structured. Any field representing a price has both amount (decimal) and currency (ISO 4217). No prices baked into a string such as "USD25.99". The schema rejects ambiguity.
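
A structured money value might map onto a consumer-side type like the one below. This is a sketch under two assumptions that the text does not confirm: that amount is serialized as a string to avoid float rounding, and that checking the shape of the currency code (three uppercase letters) is enough for the example. A real binding would check against the actual ISO 4217 list.

```python
import re
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    amount: Decimal  # exact decimal, never a float
    currency: str    # ISO 4217 alphabetic code, e.g. "USD"

    @classmethod
    def from_json(cls, data: dict) -> "Money":
        # Assumes the amount arrives as a string such as "25.99".
        currency = data["currency"]
        if re.fullmatch(r"[A-Z]{3}", currency) is None:
            raise ValueError(f"not an ISO 4217-shaped code: {currency!r}")
        return cls(Decimal(data["amount"]), currency)
```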

Provenance is required. Every fact-carrying entity (BaggageRule, ContactMethod, Fee, Regulation) carries last_verified_at, verifier_count, and sources[]. The schema enforces this so the project cannot accidentally publish data without provenance.
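
In JSON Schema terms this amounts to listing the three fields under required. A consumer-side check equivalent to that rule could look like the sketch below; the helper name is an assumption, and it tests presence only, since the text does not say whether sources may be empty.

```python
PROVENANCE_REQUIRED = ("last_verified_at", "verifier_count", "sources")


def missing_provenance(entity: dict) -> list[str]:
    """List provenance fields that are absent from a fact-carrying entity."""
    return [name for name in PROVENANCE_REQUIRED if name not in entity]
```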

Versioning policy

Strict semver, with public migration paths.

Major version bump (1.0.0 → 2.0.0) for any change that breaks downstream consumers: removing a field, renaming a field, changing a field's type, removing an enum value, adding a required field, narrowing a constraint (e.g., changing maxLength from 200 to 100).

Minor version bump (1.0.0 → 1.1.0) for backwards-compatible additions: adding an optional field, adding an enum value, adding a new entity schema, widening a constraint (e.g., changing maxLength from 100 to 200).

Patch version bump (1.0.0 → 1.0.1) for documentation changes, description clarifications, and schema-internal refactors that do not affect validation behavior.
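
The three rules above can be read as a lookup from change kind to bump size, where a release takes the largest bump any of its changes demands. The sketch below encodes that reading; the change-kind labels are hypothetical names for the cases the policy lists, not identifiers the project uses.

```python
from enum import IntEnum


class Bump(IntEnum):
    PATCH = 0
    MINOR = 1
    MAJOR = 2


# Hypothetical labels for the cases named in the versioning policy.
BUMP_FOR_CHANGE = {
    "remove_field": Bump.MAJOR,
    "rename_field": Bump.MAJOR,
    "change_field_type": Bump.MAJOR,
    "remove_enum_value": Bump.MAJOR,
    "add_required_field": Bump.MAJOR,
    "narrow_constraint": Bump.MAJOR,
    "add_optional_field": Bump.MINOR,
    "add_enum_value": Bump.MINOR,
    "add_entity_schema": Bump.MINOR,
    "widen_constraint": Bump.MINOR,
    "edit_description": Bump.PATCH,
    "internal_refactor": Bump.PATCH,
}


def required_bump(changes: list[str]) -> Bump:
    """A release takes the largest bump demanded by any single change."""
    return max(BUMP_FOR_CHANGE[change] for change in changes)


def next_version(current: str, bump: Bump) -> str:
    """Apply a bump to a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in current.split("."))
    if bump is Bump.MAJOR:
        return f"{major + 1}.0.0"
    if bump is Bump.MINOR:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

A release that adds an optional field and fixes a description is a minor bump; the same release plus a removed enum value becomes a major bump.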

Every release publishes:

  • A tagged git release with the new schemas
  • An updated set of language bindings (npm, PyPI, Composer, etc.)
  • A MIGRATION.md describing what changed and what consumers need to do
  • An updated CHANGELOG.md

Major versions are supported for at least 36 months after the next major version ships. During that window, security and clarification patches are backported. After 36 months, the major version is marked end-of-life with at least 6 months of warning.
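
The support window is simple calendar arithmetic. As an illustration with a made-up date: if 2.0.0 were to ship on 2026-03-15, the 1.x line would be supported until at least 2029-03-15. The helper names below are assumptions.

```python
import calendar
from datetime import date

SUPPORT_MONTHS = 36  # minimum support after the next major version ships


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of shorter months."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def earliest_end_of_support(successor_release: date) -> date:
    """Earliest date the previous major version may stop receiving backports."""
    return add_months(successor_release, SUPPORT_MONTHS)
```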

Schema change process

Schema changes are governance-relevant because they affect every downstream consumer. The process:

  1. Proposal. Anyone — core team, contributor, external builder — can open an issue in flighthelp/schema proposing a change. The issue must describe the change, the motivation, the downstream impact, and at least one example use case.

  2. Discussion period. A 14-day public comment period. The proposal is visible on GitHub and is also linked from the weekly newsletter. Comments from external builders carry the same weight as internal comments.

  3. Decision. A simple majority of the core team approves or rejects the proposal. Rationale is published. If the proposal involves a major version bump, the decision additionally requires advisory approval from a quorum of trusted contributors.

  4. Implementation. Approved changes are implemented as a PR against flighthelp/schema. The PR includes the schema diff, updated language bindings, updated documentation, and MIGRATION.md updates.

  5. Release. Once merged, the new version is tagged and published to all language ecosystems within 24 hours.

For urgent schema fixes (e.g., a bug that causes validation to incorrectly reject valid data), the discussion period can be shortened to 48 hours by core team consensus, with public justification.

How downstream consumers should depend on schemas

Pin to a major version. Use ^1.0.0 (npm), ~=1.0 (pip), ^1.0 (Composer). Accept minor and patch updates automatically. Do not accept major updates without review.
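
For readers unfamiliar with caret ranges, ^1.0.0 means "at least 1.0.0 and below 2.0.0". The sketch below spells that out for plain MAJOR.MINOR.PATCH strings. It is a simplification: it ignores pre-release tags and the special caret rules for 0.x versions, and the function name is an assumption.

```python
def satisfies_caret(version: str, base: str) -> bool:
    """Check version against ^base, for a base whose major version is non-zero."""
    v = tuple(int(part) for part in version.split("."))
    b = tuple(int(part) for part in base.split("."))
    # Same major version, and not older than the base.
    return v[0] == b[0] and v >= b
```

Under this rule 1.4.2 is accepted automatically and 2.0.0 is not, which is the behavior the pinning advice relies on.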

Validate at the boundary. When data enters your system from the flighthelp API, validate it against the schema once at the ingestion point. Inside your system, you can trust it. This catches API version skew immediately rather than failing deep in your application logic.
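
The shape of boundary validation can be sketched as a single ingestion function that parses, validates once, and either returns trusted data or fails loudly. The validator is passed in as a plain callable here; in practice it would wrap whatever JSON Schema library your language offers. The names ingest and Validator are assumptions.

```python
import json
from typing import Callable

# A validator takes a parsed payload and returns a list of error messages.
Validator = Callable[[dict], list]


def ingest(raw: str, validate: Validator) -> dict:
    """Parse and validate once at the boundary; downstream code trusts the result."""
    payload = json.loads(raw)
    errors = validate(payload)
    if errors:
        raise ValueError("schema validation failed: " + "; ".join(errors))
    return payload
```

Keeping validation in one place means a version skew shows up as a single clear error at ingestion, not as a missing attribute several layers down.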

Subscribe to the changelog. The flighthelp/schema repo publishes a GitHub release feed and a JSON changelog. Subscribe to either. Once a new major version ships, the previous one is supported for at least 36 months, which is your window to migrate.

Do not extend the schemas in incompatible ways. If you need additional fields for your application, add them in a separate type that composes the flighthelp schema, rather than modifying it. This keeps your code interoperable with other consumers.
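
Composition might look like the sketch below, where an application type wraps the upstream type and leaves it untouched. The Airline class is a stand-in for a generated binding, and its fields, along with internal_rating, are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Airline:
    """Stand-in for a generated binding; the fields here are illustrative."""
    airline_id: str
    alliance: str


@dataclass(frozen=True)
class RatedAirline:
    """Application-specific type that wraps the upstream type without editing it."""
    airline: Airline
    internal_rating: int  # hypothetical local field, not part of the flighthelp schema
```

The wrapped airline value can still be validated against, and serialized to, the unmodified flighthelp schema.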

What the schemas are not

Not an ontology. The schemas describe how data is represented, not what concepts mean in a philosophical sense. There is no attempt to model the full conceptual space of air travel. The scope is operational: the data the project actually serves.

Not a standards body output. flighthelp is a project, not a standards organization. The schemas are open and adoptable, but the project does not pretend to be ISO, IATA, or the OpenTravel Alliance. If a real standards body eventually publishes compatible schemas, the project will move toward them.

Not frozen. The schemas are designed for stability, not stasis. Air travel changes — new regulations, new ticket structures, new contact channels (WhatsApp didn't exist in 2004). The schemas evolve with the domain. The versioning policy exists so they can evolve without breaking downstream consumers.