Skip to content

feat(compile): migrate legacy {{ workspace }} markers and validate checkout-aware paths#1263

Merged
jamesadevine merged 4 commits into
mainfrom
feat/deprecate-legacy-path-markers
Jul 1, 2026
Merged

feat(compile): migrate legacy {{ workspace }} markers and validate checkout-aware paths#1263
jamesadevine merged 4 commits into
mainfrom
feat/deprecate-legacy-path-markers

Conversation

@jamesadevine

Copy link
Copy Markdown
Collaborator

Summary

Before the native-IR migration, the compiler folded the directory markers {{ workspace }}, {{ working_directory }} and {{ trigger_repo_directory }} across the entire generated YAML — including user-authored steps: / post-steps: / setup: / teardown:. After the migration these markers flow through verbatim and are no longer substituted (the reported regression).

Rather than restore runtime substitution of a fixed-path anchor — an antipattern under multi-checkout, where $(Build.SourcesDirectory) is the shared root of every checked-out repo — this PR migrates existing sources and adds checkout-aware validation:

  • Codemod 0004_legacy_path_markers — rewrites the markers (interior whitespace ignored, so both {{ workspace }} and {{workspace}} are handled) in every front-matter string scalar to the explicit ADO path they resolved to, derived from the source's own workspace: / repos::

    • {{ workspace }} / {{ working_directory }}$(Build.SourcesDirectory), $(Build.SourcesDirectory)/$(Build.Repository.Name), or $(Build.SourcesDirectory)/<alias>
    • {{ trigger_repo_directory }} → the trigger ("self") repo dir

    Idempotent, pure, registered after m0001; ${{ parameters }} and {{#runtime-import}} spans are preserved.

  • Warning-only path_layout_check pass (hooked into compile_pipeline_inner) — flags $(Build.SourcesDirectory)/<seg> references to declared-but-not-checked-out repos, the multi-checkout self-subfolder form used under a single checkout, {{#runtime-import …}} targets to not-checked-out repos, and deprecated markers left in the agent body (which the codemod cannot migrate). Never fails the compile.

  • Shared pure resolvers — extracted resolve_effective_workspace (pure core) from compute_effective_workspace, plus resolve_working_directory_expr and contains_template_marker, reused by both the codemod and the validation pass to avoid drift.

  • Docsdocs/codemods.md (new section), docs/front-matter.md (deprecation note), and the AGENTS.md architecture tree.

Test plan

  • cargo build
  • cargo test ✓ — 2310+ pass, including:
    • 11 codemod unit tests (single/multi/alias resolution, no-space variant, span preservation, idempotency, all four step blocks)
    • 10 path_layout_check unit tests (each warning case + dedup + extraction helpers)
    • 1 end-to-end CLI integration test (compile_migrates_legacy_workspace_marker_in_steps) asserting source rewrite + lock content + codemod warning
  • cargo clippy --all-targets --all-features ✓ clean

…eckout-aware paths

Before the native-IR migration, the compiler folded {{ workspace }},
{{ working_directory }} and {{ trigger_repo_directory }} markers across
the whole generated YAML (including custom steps). After the migration
these markers flow through verbatim and are no longer substituted.

Rather than restore runtime substitution of a fixed-path anchor (an
antipattern under multi-checkout, where $(Build.SourcesDirectory) is the
shared root of every checked-out repo), this:

- Adds codemod 0004_legacy_path_markers, which rewrites the markers in
  front matter to the explicit ADO path they resolved to, derived from
  the source's own workspace:/repos:.
- Adds a warning-only path_layout_check pass that flags checkout-aware
  path mistakes (references to not-checked-out repos, the multi-checkout
  self-subfolder form under single checkout, runtime-import targets to
  not-checked-out repos) and deprecated markers left in the agent body.
- Extracts shared pure resolvers (resolve_working_directory_expr,
  contains_template_marker) from compute_effective_workspace.

Co-authored-by: Copilot <[email protected]>
@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Looks good overall — clean architecture, solid tests, correct codemod ordering. Two issues worth addressing before merge.

Findings

🐛 Bugs / Logic Issues

  • contains_template_marker / replace_marker can disagree on pathological input (src/compile/common.rs + src/compile/codemods/0004_legacy_path_markers.rs)

    The two functions advance differently on a non-matching {{ ... }} span:

    • contains_template_marker skips ahead to i = start + close + 2 (past the }}).
    • replace_marker advances by one character at a time.

    This means for an input like "{{ bad {{ workspace }} }}", contains_template_marker finds the outer {{, its inner content (" bad {{ workspace }} ") trims to "bad {{ workspace }}""workspace", then jumps to i = 22 — past the closing }} — and returns false. But replace_marker falls through char-by-char and does find and replace the inner {{ workspace }}.

    Impact: the value_has_any_marker early-exit gate in apply_codemod can return false while replace_in_value would have found a real marker. The codemod exits Ok(false) and leaves the marker unsubstituted. This only triggers on doubly-nested {{ ... }} forms which are extremely unlikely in real YAML, but the two helper functions should behave consistently. One fix: in contains_template_marker, on a non-match advance by one code-point (same as replace_marker) rather than past }}.

  • sources_dir_segments advances i to start, not start + seg.len() (src/compile/path_layout_check.rs, line ~786)

    After extracting seg, the loop sets i = start (the position immediately after $(Build.SourcesDirectory)/), not i = start + seg.len(). For a back-to-back double-prefix like "$(Build.SourcesDirectory)/$(Build.SourcesDirectory)/foo", the first pass extracts "$(Build.SourcesDirectory)" as a segment (stops at the embedded /), then the second pass re-searches from start and correctly extracts "foo". No infinite loop is possible, but the spurious "$(Build.SourcesDirectory)" segment is pushed unnecessarily. It doesn't trigger a false warning (no repo alias would match that string), but i = start + seg.len() is the intended advancement.

⚠️ Suggestions

  • replace_marker — prefer ?/unwrap_or over bare expect (src/compile/codemods/0004_legacy_path_markers.rs, line ~287)
    let ch = input[i..].chars().next().expect("non-empty remainder");
    The invariant (while i < input.len()chars().next() is Some) is correct for valid UTF-8 &str, but .expect() on a hot path inside a string-walking loop reads as a potential panic to reviewers. Consider:
    let Some(ch) = input[i..].chars().next() else { break };
    This makes the termination condition explicit and avoids the expect.

✅ What Looks Good

  • Pure/side-effect-free refactoring: Extracting resolve_effective_workspace as a private pure fn with the eprintln! advisory living only in compute_effective_workspace is exactly the right separation for codemod use.
  • Detection-before-deserialization: The early marker scan before lower_repos deserialization is a good performance pattern.
  • Codemod ordering: Registering m0004 after m0001_repos_unified is correct — repos: is in canonical form by the time derive_checkout_aliases runs.
  • Idempotency: The test explicitly verifies a second apply_codemod call is a no-op. ✓
  • Span preservation: ${{ parameters.x }} and {{#runtime-import ...}} are preserved correctly; the integration test covers this. ✓
  • Warning deduplication: warnings.sort(); warnings.dedup(); is a clean way to avoid repeated advisories when the same bad path appears in multiple steps.

Generated by Rust PR Reviewer for issue #1263 · 298.7 AIC · ⌖ 12.7 AIC · ⊞ 34.9K ·

Address Rust PR review feedback on #1263:

- contains_template_marker now advances one code-point past a non-matching {{ (matching replace_marker) so nested {{ ... }} forms are detected consistently and the codemod early-exit gate cannot skip a substitutable marker.

- sources_dir_segments advances i by start + seg.len() instead of start, avoiding re-scanning an extracted segment on back-to-back double prefixes.

- replace_marker uses a let-else break instead of expect() on the string-walk hot path.

Adds regression tests for the nested-marker consistency and double-prefix segment cases.

Co-authored-by: Copilot <[email protected]>
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Looks good overall — clean refactor, solid test coverage, and a well-motivated approach. One real (low-severity) bug to fix and a couple of minor observations.

Findings

🐛 Bugs / Logic Issues

replace_marker / contains_template_marker${{ marker_name }} is not excluded

Both helpers check for {{ without testing whether it's preceded by $. This means ${{ workspace }} (not valid ADO syntax, but theoretically writable) is treated as a legacy marker:

  • contains_template_marker("${{ workspace }}", "workspace")true (false positive, triggers codemod)
  • replace_marker("${{ workspace }}", "workspace", repl)"$<repl>" (outputs $$(Build.SourcesDirectory), which is broken)

The PR's description says ${{ parameters }} spans are preserved — that's true only because parameters.x ≠ "workspace", not because ${{ is explicitly excluded. A future marker name that happens to match an ADO template variable would silently corrupt output.

The existing marker_present_ignores_other_spans test does not cover ${{ workspace }} specifically, so this gap is invisible in CI.

Suggested fix for both helpers: skip any {{ that is immediately preceded by $:

// in contains_template_marker
let preceded_by_dollar = marker_start > 0 && input.as_bytes()[marker_start - 1] == b'$';
if !preceded_by_dollar {
    if let Some(close) = input[start..].find("}}") ... { return true; }
}

// in replace_marker
if input[i..].starts_with("{{") {
    let preceded_by_dollar = i > 0 && input.as_bytes()[i - 1] == b'$';
    if !preceded_by_dollar { ... }
}

⚠️ Suggestions

  • path_layout_check.rs:50collect_path_layout_warnings checks $(Build.SourcesDirectory)/<seg> only in step blocks (setup/steps/post_steps/teardown), but the codemod migrates ALL front-matter string scalars. If a description: or some other field contained a non-checked-out repo reference it would be migrated by the codemod but silently pass the validation pass. Low risk in practice (non-step fields aren't executed as paths) but creates a small scope inconsistency worth a comment explaining the deliberate narrowing.

  • path_layout_check.rs:159sources_dir_segments: when seg is empty (the char right after the prefix is a stopper like space or quote), i is set to start + 0 = start, i.e. pointing immediately past the prefix. The subsequent find then starts from there and cannot re-match the same prefix, so there's no infinite-loop risk — but this invariant isn't stated in the comment. The existing i = start + seg.len() comment discusses back-to-back prefixes but doesn't address the empty-segment case; a one-liner would help future readers.

✅ What Looks Good

  • The extraction of resolve_effective_workspace as a pure core (no stderr side effects) with compute_effective_workspace and resolve_working_directory_expr as thin wrappers is exactly the right way to share logic between a codemod (must stay side-effect-free) and the normal compile path.
  • Detection-before-deserialization pattern in apply_codemod (scanning for any marker before deserializing repos:) is a good cheap fast path.
  • The nested_marker_detection_matches_replacement regression test is a nice guard ensuring contains_template_marker and replace_marker agree on doubly-nested input.
  • sort + dedup for warning deduplication is correct and deterministic.
  • Test coverage is thorough: idempotency, all four step blocks, multi/single checkout resolution, span preservation, and an end-to-end CLI integration test.

Generated by Rust PR Reviewer for issue #1263 · 217.6 AIC · ⌖ 12.5 AIC · ⊞ 36.5K ·

Address second Rust PR review on #1263:

contains_template_marker and replace_marker now skip any {{ immediately preceded by a dollar sign, so an ADO template expression like ${{ workspace }} is neither detected as a legacy marker nor substituted (which would have produced a stray $ before the replacement).

Adds a regression test covering the exclusion for both helpers, and documents the deliberate narrowing of the step-path check to executable step blocks plus the empty-segment no-progress-risk invariant in sources_dir_segments.

Co-authored-by: Copilot <[email protected]>
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Looks good — sound logic, well-tested, no bugs or security issues found. Two minor visibility nits.


Findings

⚠️ Suggestions

  • src/compile/common.rs:555resolve_working_directory_expr is declared pub but is only consumed crate-internally (by m0004_legacy_path_markers). contains_template_marker right below it is correctly pub(crate). Consider making resolve_working_directory_expr pub(crate) as well to keep the API surface consistent and signal that it is not part of the public crate interface.

  • src/compile/path_layout_check.rs:735collect_path_layout_warnings is pub inside a private module (mod path_layout_check in mod.rs). The private module already prevents accidental external exposure, but pub(super) or pub(crate) would make the intent explicit and silence any future lint about "unreachable pub" if the module visibility changes.


✅ What Looks Good

  • Detection-first codemod: the early value_has_any_marker gate avoids deserialising repos: on the common path (no markers present). Smart.
  • Consistency between detect and replace: contains_template_marker and replace_marker use the same "advance one byte past the opening {{" strategy when a span doesn't match, so they agree on every input including the pathological doubly-nested case explicitly covered by the nested_marker_detection_matches_replacement test.
  • ${{ parameters.x }} guard: the preceded_by_dollar check in both helpers correctly skips ADO template expressions that would produce a stray $ if substituted. Tested.
  • Idempotency: the resolved expressions contain no {{ }}, so a second codemod run is provably a no-op. Tested.
  • Clean refactor of compute_effective_workspace: extracting the pure resolve_effective_workspace core (returns (encoded, warn_flag)) neatly separates the side-effectful warning from the testable resolution logic without duplicating any path.
  • sources_dir_segments overflow safety: advancing by start + seg.len() (rather than start) prevents re-matching the same prefix in the back-to-back double-prefix edge case, documented and tested.
  • Warning deduplication: warnings.sort(); warnings.dedup(); keeps output deterministic without pulling in a HashSet; the dedup test confirms repeated identical references emit exactly one advisory.

Generated by Rust PR Reviewer for issue #1263 · 191.2 AIC · ⌖ 12.2 AIC · ⊞ 34.9K ·

Address third Rust PR review on #1263: resolve_working_directory_expr and collect_path_layout_warnings are only used within the crate, so narrow them from pub to pub(crate) to keep the public API surface honest.

Co-authored-by: Copilot <[email protected]>
@jamesadevine jamesadevine merged commit 9a01c07 into main Jul 1, 2026
3 checks passed
@jamesadevine jamesadevine deleted the feat/deprecate-legacy-path-markers branch July 1, 2026 10:17
@github-actions

github-actions Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Solid implementation — logic is correct, well-tested, and the design choices (codemod vs. runtime substitution, warning-only validation) are sound. A few minor points worth noting for follow-up.

Findings

✅ What Looks Good

  • resolve_effective_workspace extraction is clean. The pure-core / side-effectful-wrapper split is exactly the right pattern for codemod reuse — codemods stay side-effect-free while the compile path still surfaces the "workspace: repo with no additional checkouts" advisory via compute_effective_workspace.
  • replace_marker / contains_template_marker consistency is carefully guarded. The ${{ ... }} dollar-escape check prevents both false positives in the detector and the stray-$ corruption that would result in the replacer. The regression test for doubly-nested {{ bad {{ workspace }} }} is a good catch.
  • Idempotency is correctly achieved: resolved $(Build.SourcesDirectory...) expressions contain no {{ }}, so a second codemod run early-exits immediately.
  • Codemod ordering dependency is satisfied: m0004 is registered after m0001 (repos_unified), so repos: is already in the unified form when derive_checkout_aliases deserialises it.
  • Warning deduplication via sort() + dedup() is correct and tested.
  • Test coverage is thorough: 11 codemod unit tests, 10 path-layout-check unit tests, 1 end-to-end CLI integration test.

⚠️ Suggestions

  • [src/compile/codemods/0004_legacy_path_markers.rs] Missing test for {{ trigger_repo_directory }} in single-checkout: generate_trigger_repo_directory([]) returns $(Build.SourcesDirectory) (verified in pre-existing tests), but there is no codemod-level test exercising this path. trigger_repo_directory_marker_multi_checkout covers the multi-checkout case; a trigger_repo_directory_marker_single_checkout test would close the gap.

  • [src/compile/path_layout_check.rs:117] Body deprecation warning for trigger_repo_directory is slightly generic: The message says "Replace it with an explicit $(Build.SourcesDirectory)... path" for all three deprecated markers uniformly. For trigger_repo_directory specifically, pointing at $(Build.SourcesDirectory)/$(Build.Repository.Name) (multi-checkout) vs. $(Build.SourcesDirectory) (single-checkout) would make the advisory more actionable without adding much complexity.

  • [src/compile/path_layout_check.rs:173] sources_dir_segments stopper list is slightly narrow: The take_while terminators are /, whitespace, ", ', \. Common bash metacharacters &, |, ;, `, ) are absent. In practice whitespace protects most real-world patterns (/other &&→ segment other), but a no-space form like /other&&ls would extract segment other&&ls, which won't match any declared alias and silently produces no warning. Acceptable for warning-only code — just worth documenting the known limitation.

  • [src/compile/codemods/0004_legacy_path_markers.rs:68] Codemod propagates alias-validation errors: When markers are present and workspace: badAlias is set (alias not in checkout:), resolve_working_directory_exprresolve_effective_workspace will bail before full validation runs. The error message is descriptive, and the compile would fail at validation anyway, so this is consistent — but it may be surprising to see a codemod step emit a workspace-alias error. A code comment here noting the intentional early-fail would help future maintainers.

Generated by Rust PR Reviewer for issue #1263 · 266.6 AIC · ⌖ 12.3 AIC · ⊞ 35K ·

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant