Commit 95fadcf

docs: update data-directory README (#58807)
1 parent 7e0d2a2 commit 95fadcf

1 file changed: +81 -0 lines changed

src/data-directory/README.md

Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
# Data directory

Purpose-built utilities, schemas, and workflows that power our Liquid `{% data %}` and `{% indented_data_reference %}` tags, reusable content, UI strings, and feature metadata. This subject focuses on how we read, validate, and serve files in `data/` across languages.

## Purpose & scope
- Provide a consistent API (`getDataByLanguage`, `getDeepDataByLanguage`) to load `data/` files for Liquid rendering and server contexts.
- Enforce schemas for critical data (features, variables, learning tracks, release notes, tables, glossaries, code languages, CTAs).
- Ship CLI and CI helpers that keep `data/` clean (orphaned feature detection, deleted-feature PR guardrails).
- Exclude: content authoring guidance (see `content/`), page routing (see `src/app`/`src/frame`), and general linter rules (see `src/content-linter`).
10+
11+
## Architecture & key assets
12+
- `lib/get-data.ts`: translation-aware loader with memoized reads, forced-English exceptions, and UI data merging; used by Liquid tags and server contexts.
13+
- `lib/data-directory.ts` + `lib/filename-to-key.ts`: generic walker that turns files into dotted-key objects with optional preprocessing.
14+
- `lib/data-schemas/`: AJV schema registry that auto-discovers `data/tables/*.yml` schemas and registers other critical shapes (features, variables, learning tracks, release notes, glossaries, code languages, CTAs).
15+
- Middleware: `middleware/data-tables.ts` caches table data into `req.context.tables` (English).
16+
- Scripts: `scripts/find-orphaned-features/*` (detect/delete unused `data/features/*.yml`) and `scripts/deleted-features-pr-comment.ts` (warn on feature deletions in PRs).
17+
- Tests: `tests/` cover schema validation, data loading, key normalization, and orphan detection fixtures.

## Data loading contracts
- `lib/get-data.ts`
  - `getDataByLanguage(dottedPath, langCode)`: Returns a single value (YAML/MD/variables/reusables/ui/glossaries/release-notes/product-examples).
  - `getDeepDataByLanguage(dottedPath, langCode)`: Returns nested objects for an entire subtree (e.g., `tables`, `features`).
  - Translation fallbacks: If a localized file is missing or unparsable, the loader falls back to English. Certain files are forced-English (`ALWAYS_ENGLISH_YAML_FILES`, `ALWAYS_ENGLISH_MD_FILES`).
  - Memoization: Caches reads, except when `NODE_ENV=development`, to simplify local debugging.
- `lib/data-directory.ts`
  - Recursively walks a directory, filters by extension (`.json`, `.md`, `.markdown`, `.yml`) and ignore patterns, and emits a dotted-key object using `filename-to-key`.
  - Optional `preprocess` hook for content transformation (used in tests/prior scripts).
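
As a rough illustration of the walker contract above, the sketch below turns a flat set of relative file paths into a nested dotted-key object. It is not the code in `lib/data-directory.ts`: `toDottedKey`, `buildTree`, and the sample paths are hypothetical, files are held in memory instead of read from disk, and contents stay as raw strings rather than being parsed.

```typescript
// Extensions the walker accepts, per the list above.
const EXTENSIONS = [".json", ".md", ".markdown", ".yml"];

// Hypothetical stand-in for filename-to-key:
// "reusables/foo/bar.md" -> "reusables.foo.bar".
function toDottedKey(relativePath: string): string {
  return relativePath
    .replace(/\\/g, "/") // normalize Windows separators
    .replace(/^\/+/, "") // drop leading separators
    .replace(/\.[^./]+$/, "") // drop the file extension
    .split("/")
    .join(".");
}

type Tree = { [key: string]: Tree | string };

// Builds a nested object from { relativePath: content } pairs.
// `preprocess` mirrors the optional hook: it transforms each file's content.
function buildTree(
  files: Record<string, string>,
  preprocess: (content: string) => string = (content) => content,
): Tree {
  const root: Tree = {};
  for (const [relativePath, content] of Object.entries(files)) {
    // Skip anything that is not a recognized data file.
    if (!EXTENSIONS.some((ext) => relativePath.endsWith(ext))) continue;
    const segments = toDottedKey(relativePath).split(".");
    const leaf = segments.pop();
    if (leaf === undefined) continue;
    let node = root;
    for (const segment of segments) {
      const existing = node[segment];
      if (typeof existing === "object") {
        node = existing;
      } else {
        const created: Tree = {};
        node[segment] = created;
        node = created;
      }
    }
    node[leaf] = preprocess(content);
  }
  return root;
}

const tree = buildTree(
  {
    "reusables/foo/bar.md": "  Reusable text.  ",
    "variables/product.yml": "prodname: Example",
    "images/logo.png": "ignored: wrong extension",
  },
  (content) => content.trim(),
);
console.log(JSON.stringify(tree));
```

Here `tree.reusables.foo.bar` holds the trimmed Markdown text, and the `.png` entry is dropped by the extension filter.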

## Schemas and validation
- Schema registry: `lib/data-schemas/index.ts` maps data paths to schema modules; auto-registers any `data/tables/*.yml` that has a matching `data-schemas/tables/{name}.ts`.
- Tests: `src/data-directory/tests/data-schemas.ts` loads schemas via AJV and asserts every registered file validates.
- Adding a schema:
  1. Create `src/data-directory/lib/data-schemas/<name>.ts` (or `tables/<table>.ts`).
  2. If non-table, add to `manualSchemas` in `data-schemas/index.ts`; table schemas are auto-detected.
  3. Run tests (see Testing).
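
The split between hand-registered and auto-detected schemas can be pictured with the minimal sketch below. It assumes the registry is a plain map from data path to schema module path; the real shape of `manualSchemas` in `data-schemas/index.ts` may differ, and `discoverTableSchemas` plus every file name shown are hypothetical. AJV is not involved here; only the matching logic is illustrated.

```typescript
// Assumed registry shape: data path -> schema module path.
type SchemaRegistry = Record<string, string>;

// Non-table schemas are listed by hand (entries are illustrative only).
const manualSchemas: SchemaRegistry = {
  "data/features": "data-schemas/features.ts",
  "data/variables": "data-schemas/variables.ts",
};

// Registers a table only when a schema module with the same base name exists.
function discoverTableSchemas(
  tableFiles: string[],
  schemaModules: string[],
): SchemaRegistry {
  const available = new Set(schemaModules.map((file) => file.replace(/\.ts$/, "")));
  const registry: SchemaRegistry = {};
  for (const file of tableFiles) {
    if (!file.endsWith(".yml")) continue;
    const name = file.slice(0, -".yml".length);
    if (available.has(name)) {
      registry[`data/tables/${file}`] = `data-schemas/tables/${name}.ts`;
    }
  }
  return registry;
}

// "example-table" has a matching schema module; "no-schema-table" does not.
const registry: SchemaRegistry = {
  ...manualSchemas,
  ...discoverTableSchemas(
    ["example-table.yml", "no-schema-table.yml"],
    ["example-table.ts"],
  ),
};
console.log(Object.keys(registry));
```

Under this assumption, a new table needs only a schema file with a matching name, while any other data shape needs an explicit `manualSchemas` entry.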

## Middleware
- `middleware/data-tables.ts` populates `req.context.tables` with `getDeepDataByLanguage('tables', 'en')`. Intended for server/Express contexts where table data is needed without per-request file I/O.
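
One way such a middleware can avoid per-request file I/O is to load the tables once and reuse the result, as in the sketch below. This is an assumption about the approach, not a copy of `middleware/data-tables.ts`: `loadEnglishTables` is a hypothetical stand-in for `getDeepDataByLanguage('tables', 'en')`, and the request type is a minimal structural stub so the example runs without Express installed.

```typescript
type Tables = Record<string, unknown>;

// Minimal request shape; a real Express request carries far more.
interface RequestLike {
  context?: Record<string, unknown> & { tables?: Tables };
}
type Next = () => void;

// Hypothetical stand-in for getDeepDataByLanguage('tables', 'en').
// The counter makes it visible how often the "disk" is touched.
let diskReads = 0;
function loadEnglishTables(): Tables {
  diskReads += 1;
  return { "example-table": { rows: [] } };
}

// Module-level cache shared by every request.
let cachedTables: Tables | undefined;

function dataTables(req: RequestLike, _res: unknown, next: Next): void {
  if (cachedTables === undefined) {
    cachedTables = loadEnglishTables();
  }
  req.context = { ...(req.context ?? {}), tables: cachedTables };
  next();
}

// Two simulated requests share one load.
const first: RequestLike = {};
const second: RequestLike = {};
let nextCalls = 0;
dataTables(first, {}, () => { nextCalls += 1; });
dataTables(second, {}, () => { nextCalls += 1; });
console.log(diskReads, nextCalls);
```

Both requests receive the same object reference, so consumers should treat `req.context.tables` as read-only in a design like this.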

## Scripts & workflows
- `npm run find-orphaned-features -- --source-directory data/features --output orphans.json`
  - Scans pages, reusables, and variables (all languages) for `{% ifversion %}` feature references and reports unused `data/features/*.yml`.
- `npm run find-orphaned-features delete -- orphans.json --max 10`
  - Deletes up to N orphaned feature files (English root) after manual review.
- `npm run deleted-features-pr-comment -- <owner> <repo> <base_sha> <head_sha>`
  - Generates a Markdown warning if a PR removes or renames feature files; used in CI (requires `GITHUB_TOKEN`).
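
The core idea behind orphan detection can be sketched as follows: collect every feature name mentioned in an `{% ifversion %}` expression, then report feature files that are never mentioned. This is a simplified illustration, not the logic in `scripts/find-orphaned-features/*`. The function names and sample documents are hypothetical, and treating `{% elsif %}` expressions as references is an assumption.

```typescript
// Captures the expression inside {% ifversion ... %} and {% elsif ... %} tags,
// including whitespace-control variants such as {%- ifversion ... -%}.
const VERSION_TAG_SOURCE = "\\{%-?\\s*(?:ifversion|elsif)\\s+(.*?)\\s*-?%\\}";

function referencedFeatures(text: string, knownFeatures: Set<string>): Set<string> {
  const found = new Set<string>();
  // A fresh RegExp per call avoids shared lastIndex state.
  const pattern = new RegExp(VERSION_TAG_SOURCE, "g");
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const expression = match[1] ?? "";
    // Operators ("or", "not") and plan names are ignored unless they are known features.
    for (const token of expression.split(/\s+/)) {
      if (knownFeatures.has(token)) found.add(token);
    }
  }
  return found;
}

function findOrphans(featureFiles: string[], documents: string[]): string[] {
  const nameOf = (file: string): string => file.replace(/\.yml$/, "");
  const known = new Set(featureFiles.map(nameOf));
  const used = new Set<string>();
  for (const doc of documents) {
    referencedFeatures(doc, known).forEach((name) => used.add(name));
  }
  return featureFiles.filter((file) => !used.has(nameOf(file))).sort();
}

const orphans = findOrphans(
  ["used-feature.yml", "unused-feature.yml", "negated-feature.yml"],
  [
    "{% ifversion used-feature %}Shown when enabled.{% endif %}",
    "{%- ifversion fpt or ghec -%}A{%- elsif not negated-feature -%}B{%- endif -%}",
  ],
);
console.log(orphans);
```

In this sample only `unused-feature.yml` is reported, because the other two names appear inside version expressions.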

## Testing
- All tests: `npm test -- src/data-directory/tests`
- Targeted:
  - Schemas: `npm test -- src/data-directory/tests/data-schemas.ts`
  - Orphans: `npm test -- src/data-directory/tests/orphaned-features.ts`
  - Loader basics: `npm test -- src/data-directory/tests/get-data.ts`

## Data conventions and consumers
- File locations: Everything under `data/` (English and localized mirrors). Reusables/variables/ui are read via dotted paths (`reusables.foo.bar`, `variables.product.prodname_ghe_server`, `ui.pages.home`).
- Markdown in data: Frontmatter is stripped by `gray-matter`; content is trimmed.
- Downstream consumers:
  - Liquid tags: `content-render/liquid/data.ts`, `indented-data-reference.ts`
  - Content linter: `content-linter/lib/linting-rules/liquid-data-tags.ts`, `frontmatter-intro-links.ts`
  - Server: `app/lib/app-router-context.ts`, `app/lib/server-context-utils.ts`
  - Metrics/tests: `content-render/tests`, `content-linter/tests/site-data-references.ts`
- Translation notes:
  - Fallbacks ensure that a missing localized YAML or Markdown file is read from English instead.
  - Specific files are forced-English to avoid corrupt translations (see constants in `get-data.ts`).
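
The fallback and forced-English behavior described above, together with memoized reads, can be sketched with an in-memory store. None of this is the code in `get-data.ts`: `getData`, `readRaw`, the `ALWAYS_ENGLISH` set, and the sample keys are hypothetical, the real constants list files rather than dotted keys, and the `memoize` flag stands in for the `NODE_ENV=development` check.

```typescript
// In-memory stand-in for data/ and one translated mirror (contents are made up).
const store: Record<string, Record<string, string>> = {
  en: {
    "ui.pages.home": "Home",
    "reusables.example.note": "Note text",
    "variables.product.name": "Example Product",
  },
  es: {
    "ui.pages.home": "Inicio",
    "variables.product.name": "Broken translation",
  },
};

// Hypothetical forced-English list, keyed by dotted path for simplicity.
const ALWAYS_ENGLISH = new Set(["variables.product.name"]);

const cache = new Map<string, string | undefined>();
let reads = 0;

// Simulates one on-disk read.
function readRaw(dottedPath: string, lang: string): string | undefined {
  reads += 1;
  return store[lang]?.[dottedPath];
}

function getData(dottedPath: string, lang: string, memoize = true): string | undefined {
  // Forced-English entries ignore the requested language entirely.
  const effectiveLang = ALWAYS_ENGLISH.has(dottedPath) ? "en" : lang;
  const cacheKey = `${effectiveLang}:${dottedPath}`;
  if (memoize && cache.has(cacheKey)) return cache.get(cacheKey);

  let value = readRaw(dottedPath, effectiveLang);
  // A missing translation falls back to English.
  if (value === undefined && effectiveLang !== "en") {
    value = readRaw(dottedPath, "en");
  }
  if (memoize) cache.set(cacheKey, value);
  return value;
}

const home = getData("ui.pages.home", "es"); // localized value
const note = getData("reusables.example.note", "es"); // missing in es, falls back
const product = getData("variables.product.name", "es"); // forced English
console.log(home, note, product);
```

Calling `getData` again with the same arguments returns the cached value without another read, while passing `memoize = false` re-reads every time.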

## Setup & usage tips
- Ensure `data/` exists relative to project root; schemas auto-scan `data/tables` at runtime.
- Set `DEBUG_JIT_DATA_READS=true` to log every on-disk read from the data loaders; useful alongside tests or local runs to trace which data files are touched.
- When adding a new data directory:
  - Prefer YAML for structured data; add a schema if the shape matters to correctness.
  - Add a README under `data/<dir>/` when introducing new contracts.
  - Update `manualSchemas` if not a table.

## Ownership & escalation
- Primary: Docs Engineering.
- Content changes: Docs Content (docs-content).

## Current state & next steps
- Current state: KTLO (keep the lights on); minimal changes expected. Update this README when touching data loaders, schemas, or scripts.
- Next steps: Keep the schema registry aligned with new data shapes and rerun `npm test -- src/data-directory/tests` when data contracts change.
