| 1 | +# Data directory |
| 2 | + |
| 3 | +Purpose-built utilities, schemas, and workflows that power our Liquid `{% data %}` and `{% indented_data_reference %}` tags, reusable content, UI strings, and feature metadata. This subject focuses on how we read, validate, and serve files in `data/` across languages. |
| 4 | + |
| 5 | +## Purpose & scope |
| 6 | +- Provide a consistent API (`getDataByLanguage`, `getDeepDataByLanguage`) to load `data/` files for Liquid rendering and server contexts. |
| 7 | +- Enforce schemas for critical data (features, variables, learning tracks, release notes, tables, glossaries, code languages, CTAs). |
| 8 | +- Ship CLI and CI helpers that keep `data/` clean (orphaned feature detection, deleted-feature PR guardrails). |
| 9 | +- Out of scope: content authoring guidance (see `content/`), page routing (see `src/app`/`src/frame`), and general linter rules (see `src/content-linter`). |
| 10 | + |
| 11 | +## Architecture & key assets |
| 12 | +- `lib/get-data.ts`: translation-aware loader with memoized reads, forced-English exceptions, and UI data merging; used by Liquid tags and server contexts. |
| 13 | +- `lib/data-directory.ts` + `lib/filename-to-key.ts`: generic walker that turns files into dotted-key objects with optional preprocessing. |
| 14 | +- `lib/data-schemas/`: AJV schema registry that auto-discovers `data/tables/*.yml` schemas and registers other critical shapes (features, variables, learning tracks, release notes, glossaries, code languages, CTAs). |
| 15 | +- Middleware: `middleware/data-tables.ts` caches table data into `req.context.tables` (English). |
| 16 | +- Scripts: `scripts/find-orphaned-features/*` (detect/delete unused `data/features/*.yml`) and `scripts/deleted-features-pr-comment.ts` (warn on feature deletions in PRs). |
| 17 | +- Tests: `tests/` cover schema validation, data loading, key normalization, and orphan detection fixtures. |
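The "dotted-key object" idea used by the walker can be illustrated with a minimal sketch. This is not the actual `lib/filename-to-key.ts` code: the function names `filenameToKeySketch` and `setDeep` are hypothetical, and the sketch assumes a key is formed by dropping the file extension and joining path segments with dots.

```typescript
// Hypothetical sketch; the real logic lives in lib/filename-to-key.ts and may differ.
// Assumption: a key is the relative path without its extension, joined by dots.
function filenameToKeySketch(relativePath: string): string {
  return relativePath
    .replace(/\\/g, '/') // normalize Windows separators
    .replace(/\.(json|md|markdown|yml)$/i, '') // drop a supported extension
    .split('/')
    .filter((segment) => segment.length > 0)
    .join('.')
}

type Tree = { [key: string]: unknown }

// Store a value at a dotted key, creating intermediate objects as needed.
function setDeep(target: Tree, dottedKey: string, value: unknown): void {
  const parts = dottedKey.split('.')
  const leaf = parts.pop() ?? dottedKey
  let node: Tree = target
  for (const part of parts) {
    if (typeof node[part] !== 'object' || node[part] === null) {
      node[part] = {}
    }
    node = node[part] as Tree
  }
  node[leaf] = value
}

// Example: one file becomes one nested entry in the data object.
const tree: Tree = {}
setDeep(tree, filenameToKeySketch('reusables/foo/bar.md'), 'Hello')
```

The sketch only covers the file-to-key step; keys that live inside a YAML file (for example, a single variable within a variables file) would be resolved after the file is parsed.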
| 18 | + |
| 19 | +## Data loading contracts |
| 20 | +- `lib/get-data.ts` |
| 21 | + - `getDataByLanguage(dottedPath, langCode)`: Returns a single value (YAML/MD/variables/reusables/ui/glossaries/release-notes/product-examples). |
| 22 | + - `getDeepDataByLanguage(dottedPath, langCode)`: Returns nested objects for an entire subtree (e.g., `tables`, `features`). |
| 23 | + - Translation fallbacks: If a localized file is missing or unparsable, the loader falls back to English. Certain files are always read from English (`ALWAYS_ENGLISH_YAML_FILES`, `ALWAYS_ENGLISH_MD_FILES`). |
| 24 | + - Memoization: Caches reads except in `NODE_ENV=development` to simplify local debugging. |
| 25 | +- `lib/data-directory.ts` |
| 26 | + - Recursively walks a directory, filters by extensions (`.json`, `.md/.markdown`, `.yml`) and ignore patterns, and emits a dotted-key object using `filename-to-key`. |
| 27 | + - Optional `preprocess` hook for content transformation (used in tests/prior scripts). |
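The fallback and memoization behavior described for `lib/get-data.ts` can be sketched roughly as follows. Everything here is an illustrative assumption rather than the real implementation: `makeLoader`, the `Reader` type, and the in-memory `fakeDisk` are hypothetical stand-ins, and values are simplified to strings.

```typescript
// Hypothetical sketch of English fallback plus memoized reads.
// A Reader returns undefined when the file is missing and may throw when it cannot be parsed.
type Reader = (dottedPath: string, langCode: string) => string | undefined

function makeLoader(readFromDisk: Reader, memoize: boolean) {
  const cache = new Map<string, string | undefined>()

  const read: Reader = (dottedPath, langCode) => {
    const cacheKey = `${langCode}:${dottedPath}`
    if (memoize && cache.has(cacheKey)) return cache.get(cacheKey)
    let value: string | undefined
    try {
      value = readFromDisk(dottedPath, langCode)
    } catch {
      value = undefined // treat an unparsable file like a missing one
    }
    if (memoize) cache.set(cacheKey, value)
    return value
  }

  return function getData(dottedPath: string, langCode: string): string | undefined {
    // Try the requested language first, then fall back to English.
    const localized = langCode === 'en' ? undefined : read(dottedPath, langCode)
    return localized ?? read(dottedPath, 'en')
  }
}

// Usage with an in-memory stand-in for the data/ directory.
const fakeDisk: Record<string, string> = {
  'en:ui.pages.home': 'Home',
  'ja:ui.pages.home': 'ホーム',
  'en:ui.pages.search': 'Search',
}
let diskReads = 0
const getData = makeLoader((dottedPath, langCode) => {
  diskReads++
  return fakeDisk[`${langCode}:${dottedPath}`]
}, true)
```

In this sketch the `memoize` flag plays the role of the `NODE_ENV=development` check: passing `false` makes every call hit the reader again, which is the behavior that helps local debugging.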
| 28 | + |
| 29 | +## Schemas and validation |
| 30 | +- Schema registry: `lib/data-schemas/index.ts` maps data paths to schema modules; auto-registers any `data/tables/*.yml` that has a matching `data-schemas/tables/{name}.ts`. |
| 31 | +- Tests: `src/data-directory/tests/data-schemas.ts` loads schemas via AJV and asserts every registered file validates. |
| 32 | +- Adding a schema: |
| 33 | + 1. Create `src/data-directory/lib/data-schemas/<name>.ts` (or `tables/<table>.ts`). |
| 34 | + 2. If non-table, add to `manualSchemas` in `data-schemas/index.ts`; table schemas are auto-detected. |
| 35 | + 3. Run tests (see below). |
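As a rough illustration of step 1, a schema module could hold a plain JSON Schema object like the one below. The table name `example-limits` and the fields `rows`, `name`, and `limit` are invented for this sketch; real schemas in `lib/data-schemas/` may be organized and exported differently.

```typescript
// Hypothetical schema for an imaginary data/tables/example-limits.yml.
// All field names are invented for illustration only.
// A real schema module would export this object so the registry can load it.
const exampleLimitsSchema = {
  type: 'object',
  additionalProperties: false,
  required: ['rows'],
  properties: {
    rows: {
      type: 'array',
      items: {
        type: 'object',
        additionalProperties: false,
        required: ['name', 'limit'],
        properties: {
          name: { type: 'string' },
          limit: { type: 'integer', minimum: 0 },
        },
      },
    },
  },
} as const
```

Setting `additionalProperties: false` is a design choice that makes typos in data files fail validation instead of being silently ignored.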
| 36 | + |
| 37 | +## Middleware |
| 38 | +- `middleware/data-tables.ts` populates `req.context.tables` with `getDeepDataByLanguage('tables', 'en')`. Intended for server/Express contexts where table data is needed without per-request file IO. |
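A minimal sketch of this caching pattern, assuming an Express-style `(req, res, next)` signature. The types `RequestLike` and `Next` and the loader `getDeepDataByLanguageStub` are hypothetical stand-ins so the example runs without Express or the real loader.

```typescript
// Minimal stand-ins for Express types so the sketch is self-contained.
type Tables = Record<string, unknown>
interface RequestLike {
  context?: { tables?: Tables; [key: string]: unknown }
}
type Next = () => void

// Hypothetical stub; the real loader is getDeepDataByLanguage in lib/get-data.ts.
let loadCount = 0
function getDeepDataByLanguageStub(dottedPath: string, langCode: string): Tables {
  loadCount++
  return { example: { source: `${langCode}:${dottedPath}` } }
}

// Module-level cache: table data is loaded once and shared across requests.
let cachedTables: Tables | undefined

function dataTablesSketch(req: RequestLike, _res: unknown, next: Next): void {
  if (!cachedTables) {
    cachedTables = getDeepDataByLanguageStub('tables', 'en')
  }
  req.context = { ...(req.context ?? {}), tables: cachedTables }
  next()
}
```

Because the cache lives at module level, only the first request pays the loading cost; later requests reuse the same object.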
| 39 | + |
| 40 | +## Scripts & workflows |
| 41 | +- `npm run find-orphaned-features -- --source-directory data/features --output orphans.json` |
| 42 | + - Scans pages, reusables, variables (all languages) for `{% ifversion %}` feature references and reports unused `data/features/*.yml`. |
| 43 | +- `npm run find-orphaned-features delete -- orphans.json --max 10` |
| 44 | + - Deletes up to `--max` orphaned feature files (English root); review `orphans.json` manually before running it. |
| 45 | +- `npm run deleted-features-pr-comment -- <owner> <repo> <base_sha> <head_sha>` |
| 46 | + - Generates a Markdown warning if a PR removes or renames feature files; used in CI (requires `GITHUB_TOKEN`). |
| 47 | + |
| 48 | +## Testing |
| 49 | +- All tests: `npm test -- src/data-directory/tests` |
| 50 | +- Targeted: |
| 51 | + - Schemas: `npm test -- src/data-directory/tests/data-schemas.ts` |
| 52 | + - Orphans: `npm test -- src/data-directory/tests/orphaned-features.ts` |
| 53 | + - Loader basics: `npm test -- src/data-directory/tests/get-data.ts` |
| 54 | + |
| 55 | +## Data conventions and consumers |
| 56 | +- File locations: Everything under `data/` (English and localized mirrors). Reusables/variables/ui are read via dotted paths (`reusables.foo.bar`, `variables.product.prodname_ghe_server`, `ui.pages.home`). |
| 57 | +- Markdown in data: Frontmatter is stripped by `gray-matter`; content is trimmed. |
| 58 | +- Downstream consumers: |
| 59 | + - Liquid tags: `content-render/liquid/data.ts`, `indented-data-reference.ts` |
| 60 | + - Content linter: `content-linter/lib/linting-rules/liquid-data-tags.ts`, `frontmatter-intro-links.ts` |
| 61 | + - Server: `app/lib/app-router-context.ts`, `app/lib/server-context-utils.ts` |
| 62 | + - Metrics/tests: `content-render/tests`, `content-linter/tests/site-data-references.ts` |
| 63 | +- Translation notes: |
| 64 | + - Fallbacks ensure that missing localized YAML/MD files are read from English. |
| 65 | + - Specific files are forced-English to avoid corrupt translations (see constants in `get-data.ts`). |
| 66 | + |
| 67 | +## Setup & usage tips |
| 68 | +- Ensure `data/` exists relative to project root; schemas auto-scan `data/tables` at runtime. |
| 69 | +- Set `DEBUG_JIT_DATA_READS=true` to log every on-disk read from the data loaders; useful alongside tests or local runs to trace which data files are touched. |
| 70 | +- When adding a new data directory: |
| 71 | + - Prefer YAML for structured data; add a schema if the shape matters to correctness. |
| 72 | + - Add a README under `data/<dir>/` when introducing new contracts. |
| 73 | + - Update `manualSchemas` if not a table. |
| 74 | + |
| 75 | +## Ownership & escalation |
| 76 | +- Primary: Docs Engineering. |
| 77 | +- Content changes: Docs Content (docs-content). |
| 78 | + |
| 79 | +## Current state & next steps |
| 80 | +- Current state: KTLO (keep the lights on); minimal changes expected. Update this README when touching data loaders, schemas, or scripts. |
| 81 | +- Next steps: Keep the schema registry aligned with new data shapes and rerun `npm test -- src/data-directory/tests` when data contracts change. |