[refactor] Semantic function clustering: verified duplicates, outliers, scattered helpers & util reimplementations

### &#128295; Semantic Function Clustering Analysis

*Automated analysis of `github/gh-aw` &mdash; 871 Go source files (excl. `_test.go`); mostly `pkg/workflow` (393) & `pkg/cli` (309).* Parallel finder agents clustered functions by purpose; each candidate was then adversarially re-checked against the real code. Build-tag (`*_wasm.go`) variants, engine-specific log parsers, factory-backed codemods, and dual-format renderers were inspected and rejected.

#### Summary

| Category | Findings | Top item |
|---|---|---|
| Duplicate functions | 9 | Safe-output `parse*Config` scaffold (~10 handlers) |
| Outliers (wrong file) | 4 | Concurrency/engine-API logic in `notify_comment.go` |
| Scattered / util-reimpl | 7 | Dedup reimplemented vs `sliceutil` |
| Generics | 1 | `typeutil.Lookup[T]` |
| Near-clone files | 1 | `inline_skill_extractor.go` &harr; `sub_agent_extractor.go` |

**Top 3 fixes (high impact, low risk):** (1) collapse the two near-identical parser extractor files; (2) adopt existing `sliceutil`/`stringutil`/`repoutil` utils for ~10 local reimplementations; (3) extract the safe-output `parse*Config` scaffold (absorbs the 12&times; preprocess wrapper).

---

### 1. Duplicate / Near-Duplicate Functions

<details><summary><b>pkg/workflow</b> &mdash; 5</summary>

**1a. Safe-output `parse*Config` scaffold copy-pasted ~10&times; (HIGH)** &mdash; same 5-step boilerplate (key check &rarr; `preprocessIntFieldAsString("max")` &rarr; `parseConfigScaffold` w/ near-identical `onError` &rarr; default-max). Sites: `assign_to_user.go:20`, `unassign_from_user.go:19`, `assign_to_agent.go:25`, `add_reviewer.go:21`, `add_comment.go:28`, `close_entity_helpers.go:104`, `comment_memory.go:26` (+ create_issue/discussion/pull_request). *Fix:* extend `parseConfigScaffold` (or `parseSafeOutputConfigWithMax[T]`) to take templatable fields + default-max; the `*_entity_helpers.go` generics show the target.

**1b. Inline `[]any&rarr;[]string` loops reimplement `parseStringSliceAny` (HIGH)** &mdash; `parse_helpers.go:70` already does it, re-coded ~17&times; inline: `tools_parser.go` (11&times;, e.g. `:230`,`:301`,`:321`), `comment.go:84`, `repo_memory.go` (2&times;), `safe_outputs_messages_config.go`, `mcp_config_types.go`, `claude_tools.go`. *Fix:* call `parseStringSliceAny` (see model `role_checks.go:186`).
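
For reference, the conversion these inline loops perform can be sketched as follows. `toStringSlice` is a hypothetical stand-in, not the repo's `parseStringSliceAny`; its signature and its policy of skipping non-string elements are assumptions, so the real helper's behavior should be compared against each inline loop before swapping call sites.

```go
package main

import "fmt"

// toStringSlice is a hypothetical stand-in for the repo's parseStringSliceAny.
// It converts a decoded YAML/JSON value ([]any or []string) into []string,
// skipping elements that are not strings (an assumed policy).
func toStringSlice(v any) []string {
	switch items := v.(type) {
	case []string:
		return append([]string(nil), items...) // copy to avoid aliasing the caller's slice
	case []any:
		out := make([]string, 0, len(items))
		for _, item := range items {
			if s, ok := item.(string); ok {
				out = append(out, s)
			}
		}
		return out
	default:
		return nil
	}
}

func main() {
	raw := []any{"bash", 42, "edit"}
	fmt.Println(toStringSlice(raw)) // [bash edit]
}
```

Inline loops that fail on a non-string element instead of skipping it would need a variant that returns an error.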

**1c. `toolCallMap` upsert duplicated ~6&times; across engine log parsers (HIGH)** &mdash; `claude_logs.go:386`, `codex_logs.go:130`,`167`, `copilot_logs.go:102`,`394`,`435`. *Fix:* `recordToolCall(toolCallMap, name, inputSize)`.
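
A minimal sketch of the proposed `recordToolCall` helper. The `toolCallInfo` struct and its fields are assumptions for illustration; the stats type the parsers actually use may carry different fields.

```go
package main

import "fmt"

// toolCallInfo is a hypothetical stand-in for the per-tool stats struct the
// log parsers keep; the real field names may differ.
type toolCallInfo struct {
	Name         string
	CallCount    int
	MaxInputSize int
}

// recordToolCall upserts one observed call: it creates the entry on first
// sight, otherwise bumps the count and keeps the largest input size seen.
func recordToolCall(calls map[string]*toolCallInfo, name string, inputSize int) {
	info, ok := calls[name]
	if !ok {
		calls[name] = &toolCallInfo{Name: name, CallCount: 1, MaxInputSize: inputSize}
		return
	}
	info.CallCount++
	if inputSize > info.MaxInputSize {
		info.MaxInputSize = inputSize
	}
}

func main() {
	calls := map[string]*toolCallInfo{}
	recordToolCall(calls, "bash", 120)
	recordToolCall(calls, "bash", 80)
	fmt.Println(calls["bash"].CallCount, calls["bash"].MaxInputSize) // 2 120
}
```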

**1d. `preprocess<Field>`+"Invalid X"+`return nil` wrapper 12&times; (HIGH)** &mdash; `add_comment.go` (4&times;), `assign_to_user.go` (2&times;), `comment_memory.go` (2&times;), `assign_to_agent.go`, `add_reviewer.go`, `reply_to_pr_review_comment.go`, `noop.go`. *Fix:* fold into 1a via `preprocessFields(...)`.

**1e. "cap max at 50" idiom 3&times; (MED)** &mdash; `call_workflow.go:57`, `dispatch_workflow.go:65`, `dispatch_repository.go:107`. *Fix:* `capMax(maxPtr, limit, log)`.
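
One possible shape for `capMax`, assuming `max` is held as `*int` and the logger is a printf-style function. Both are assumptions; the handlers may store `max` differently (for example as a templatable string).

```go
package main

import "fmt"

// capMax clamps *maxPtr to limit and reports the adjustment through logf.
// Assumptions: max is held as *int, and logf is a printf-style logger.
// A nil pointer or a value already within the limit is left untouched.
func capMax(maxPtr *int, limit int, logf func(format string, args ...any)) {
	if maxPtr == nil || *maxPtr <= limit {
		return
	}
	logf("max %d exceeds limit %d, capping", *maxPtr, limit)
	*maxPtr = limit
}

func main() {
	requested := 75
	capMax(&requested, 50, func(format string, args ...any) {
		fmt.Printf(format+"\n", args...)
	})
	fmt.Println(requested) // 50
}
```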

</details>

<details><summary><b>pkg/cli</b> &mdash; 4</summary>

**2a. Human-comment counting loop reimplemented (HIGH)** &mdash; fetch `issues/%d/comments`, skip `isBotUser`, count: `outcome_eval_issue.go:44`, `outcome_eval_pr.go:107`, `outcome_eval_comment.go:64` (adds time filter). *Fix:* `countHumanComments(repo, num, after)` in `outcome_eval.go`.

**2b. Secret-prompt clones (HIGH)** &mdash; `engine_secrets.go:291/342/389` (`promptForCopilotPATUnified`/`...SystemToken...`/`...GenericAPIKey...`) share the same `huh` password-form &rarr; setenv &rarr; upload skeleton; differ only in intro text + Copilot validate. *Fix:* `promptAndStoreSecret(req, config, intro, extraValidate, successMsg)`.

**2c. Two parallel update-check subsystems (MED)** &mdash; `update_check.go` vs `compile_update_check.go`: `updateLastCheckTime:140`/`updateCompileUpdateCheckTime:298` near-identical (differ by logger+perm); `shouldCheckForUpdate`/`shouldRunCompileUpdateCheck` share throttle core. *Fix:* shared `lastCheckThrottle{file,interval,perm,log}`.

**2d. `gh secret set` upload reimplemented (LOW)** &mdash; `engine_secrets.go:480` & `add_interactive_secrets.go:51`. *Fix:* share `setRepoSecret(name,value,repo)`.

</details>

---

### 2. Outlier Functions (wrong file)

`pkg/workflow` is well-decomposed; mismatches cluster in `notify_comment.go`.

| Function | Now in | Belongs in | Why |
|---|---|---|---|
| `isGroupConcurrencyQueueEnabled`/`parseGroupConcurrencyQueueFeatureValue` `:632/644` | `notify_comment.go` | `concurrency.go` | concurrency-flag logic; already called from `concurrency.go:84` (HIGH) |
| `getEngineAPIHosts` `:764` | `notify_comment.go` | `engine_api_targets.go` | engine-API hostname table (HIGH) |
| `toEnvVarCase` `:740` | `notify_comment.go` | `strings.go` | generic string transformer (MED) |
| `splitShellTokens` `:231` | `gh_cli_permissions.go` | `shell.go` | generic shell tokenizer; single-caller (MED) |

---

### 3. Scattered Helpers & Util Reimplementations

The repo ships shared util packages; these are local reimplementations of them (&#10003; = verified in this run).

<details open><summary>7 findings (5 itemized below: 3a&ndash;3e)</summary>

**3a. &#10003; Order-preserving dedup vs `sliceutil.Deduplicate`/`MergeUnique` (HIGH)** &mdash; `claude_tools.go:413 dedupeAllowedTools` (verified identical body), `docker.go:225 mergeDockerImages` (=`MergeUnique`), `central_slash_command_workflow.go:487 uniqueSorted`, `parser/tools_merger.go:184 mergeAllowedArrays`, + inline loops in `parser/frontmatter_hash.go:~238`, `parser/import_field_extractor.go:466`.
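
The pattern being reimplemented is small enough to show in full. The functions below are illustrative, lowercase stand-ins; the signatures and nil/empty handling of `sliceutil.Deduplicate` and `MergeUnique` are not reproduced here and should be checked in the package itself.

```go
package main

import "fmt"

// deduplicate returns the first occurrence of each element, preserving input
// order. It illustrates the pattern only and is not the sliceutil source.
func deduplicate[T comparable](items []T) []T {
	seen := make(map[T]struct{}, len(items))
	out := make([]T, 0, len(items))
	for _, item := range items {
		if _, dup := seen[item]; dup {
			continue
		}
		seen[item] = struct{}{}
		out = append(out, item)
	}
	return out
}

// mergeUnique concatenates the slices in order and deduplicates the result.
func mergeUnique[T comparable](slices ...[]T) []T {
	var all []T
	for _, s := range slices {
		all = append(all, s...)
	}
	return deduplicate(all)
}

func main() {
	fmt.Println(deduplicate([]string{"bash", "edit", "bash"}))       // [bash edit]
	fmt.Println(mergeUnique([]string{"a", "b"}, []string{"b", "c"})) // [a b c]
}
```

Sites that also sort, such as `uniqueSorted`, would need a sort step after deduplication rather than a plain swap.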

**3b. &#10003; Pass-through wrappers over `stringutil` (HIGH, nuanced)** &mdash; `parser/schema_suggestions.go:234/244` `FindClosestMatches`/`LevenshteinDistance` are one-line delegations (verified). *Fix:* import `stringutil` directly &mdash; unless retained as a deliberate re-export (maintainer call).

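**3c. Truncate-with-ellipsis inline 4&times; &rarr; `stringutil.Truncate` (MED)** &mdash; `audit_report.go:693`,`749`,`784`, `audit_report_analysis.go:57`. Note: the inline versions produce up to N+3 characters (N plus the ellipsis), whereas `Truncate` caps the total at N, so adopting it changes output length unless the limit is adjusted.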
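
The length difference is easiest to see side by side. Both functions are illustrative: `truncateInline` mirrors the inline pattern as described, and `truncateCapped` is an assumed reading of "cap-at-N", not the `stringutil.Truncate` source. Both slice by bytes and assume a non-negative `n`.

```go
package main

import "fmt"

// truncateInline mirrors the inline pattern: keep n bytes, then append "...",
// so the result can be n+3 bytes long.
func truncateInline(s string, n int) string {
	if len(s) > n {
		return s[:n] + "..."
	}
	return s
}

// truncateCapped keeps the whole result within n bytes, ellipsis included.
// This is an assumed reading of "cap-at-N".
func truncateCapped(s string, n int) string {
	if len(s) <= n {
		return s
	}
	if n <= 3 {
		return s[:n]
	}
	return s[:n-3] + "..."
}

func main() {
	s := "abcdefghijklmnop"
	fmt.Println(truncateInline(s, 10)) // abcdefghij... (13 bytes)
	fmt.Println(truncateCapped(s, 10)) // abcdefg... (10 bytes)
}
```

If the audit report output must stay byte-for-byte the same, the call sites would pass N+3 rather than N when switching.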

**3d. `parseRepoSlugLiteral` vs `repoutil.SplitRepoSlug` (MED)** &mdash; `dispatch_workflow_validation.go:176`; contrast the good `cli/engine_secrets.go:694 splitRepoSlug` which delegates.

**3e. `containsAny` reimplemented in 3 packages (MED)** &mdash; `cli/audit_agentic_analysis.go:492`, `errorutil/errors.go:55 containsErrorSubstring`, `linters/errormessage/errormessage.go:213 containsAnyWholeWord`. *Fix:* add `stringutil.ContainsAny(s, subs...)`.
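
A minimal sketch of the proposed `ContainsAny(s, subs...)`, using plain substring matching. Note that the standard library's `strings.ContainsAny` has different semantics (it matches any single character from a set), and `containsAnyWholeWord` applies word-boundary rules that this sketch does not cover.

```go
package main

import (
	"fmt"
	"strings"
)

// ContainsAny reports whether s contains at least one of subs as a substring.
// An empty subs list yields false.
func ContainsAny(s string, subs ...string) bool {
	for _, sub := range subs {
		if strings.Contains(s, sub) {
			return true
		}
	}
	return false
}

func main() {
	msg := "permission denied: token expired"
	fmt.Println(ContainsAny(msg, "timeout", "denied")) // true
	fmt.Println(ContainsAny(msg, "not found"))         // false
}
```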

</details>

---

### 4. Generics Opportunity

**`typeutil.LookupString`/`LookupMap` (`convert.go:150/169`)** are identical apart from the asserted type &rarr; collapse to `func Lookup[T any](m map[string]any, key string) (T, bool)`, keep the two as one-line wrappers (MED; ~3 call sites).
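
The collapse can be sketched as below, using the `Lookup[T]` signature proposed above. The wrapper return types are assumptions; the existing `LookupString`/`LookupMap` signatures in `convert.go` should be kept as they are.

```go
package main

import "fmt"

// Lookup returns m[key] asserted to T; ok is false when the key is missing
// or holds a value of a different type.
func Lookup[T any](m map[string]any, key string) (T, bool) {
	v, ok := m[key].(T)
	return v, ok
}

// LookupString and LookupMap stay as thin wrappers so callers need not change.
// Their (value, bool) return shape is assumed for this sketch.
func LookupString(m map[string]any, key string) (string, bool) {
	return Lookup[string](m, key)
}

func LookupMap(m map[string]any, key string) (map[string]any, bool) {
	return Lookup[map[string]any](m, key)
}

func main() {
	cfg := map[string]any{"name": "ci", "on": map[string]any{"push": true}}
	name, ok := LookupString(cfg, "name")
	fmt.Println(name, ok) // ci true
	_, ok = LookupMap(cfg, "name")
	fmt.Println(ok) // false: "name" holds a string, not a map
}
```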

---

### 5. Near-Clone Files &mdash; highest single win

**`pkg/parser/inline_skill_extractor.go` &harr; `sub_agent_extractor.go`** are structurally line-for-line identical, differing only by marker (`skill:` vs `agent:`), result struct (`InlineSkill`/`InlineSubAgent`, identical fields), valid-field set, and wording.

<details><summary>Paired functions</summary>

| inline_skill_extractor.go | sub_agent_extractor.go |
|---|---|
| `ValidateInlineSkillsFrontmatter:18` | `ValidateInlineSubAgentsFrontmatter:103` |
| `ValidateInlineSkillsInBody:28` | `ValidateInlineSubAgentsInBody:123` |
| `validateInlineSkillFrontmatterFields:44` | `validateSubAgentFrontmatterFields:143` |
| `GetEngineSkillDir:70` | `GetEngineSubAgentDir:178` |
| `ExtractInlineSkills:91` | `ExtractInlineSubAgents:248` |
| `validateUniqueInlineSkillNames:116` | `validateUniqueSubAgentNames:273` |
| `extractInlineSkill:137` | `extractInlineSubAgent:294` |
| `collectInlineSkillH2Positions:129`+regex`:89` | `collectH2Positions:286`+regex`:236` |

`nextInlineSkillH2After:148`/`nextH2After:305` are byte-identical; both H2 regexes are `(?m)^##[ \t]`.

</details>

*Fix:* one parameterized `extractInlineSections(markdown, spec)` + `validateInlineSections(body, spec)` driven by a small `inlineSectionSpec{kind, sepRegex, validFields, validFieldList}`; collapse `InlineSkill`/`InlineSubAgent`&rarr;`InlineSection` (or thin aliases), and `GetEngine*Dir`&rarr;`engineConfigDir(engineID, subdir)`; keep one `h2HeadingRegex`+`collectH2Positions`.
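
A simplified sketch of the spec-driven shape. The marker placement (`## skill: name` as an H2 heading), the trimmed-down `inlineSectionSpec` fields, the `InlineSection` fields, and `validateFields` are all assumptions for illustration; the real extractors carry more fields and validation than shown here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// inlineSectionSpec captures what differs between the two extractors.
// The marker placement ("## skill: name") is an assumption of this sketch.
type inlineSectionSpec struct {
	kind        string          // "skill" or "agent"; used in the marker and in messages
	validFields map[string]bool // frontmatter keys accepted for this kind
}

// InlineSection stands in for the two structurally identical result structs.
type InlineSection struct {
	Name string
	Body string
}

var h2HeadingRegex = regexp.MustCompile(`(?m)^##[ \t]`)

// extractInlineSections splits markdown at H2 headings and keeps the sections
// whose heading starts with "<kind>:".
func extractInlineSections(markdown string, spec inlineSectionSpec) []InlineSection {
	positions := h2HeadingRegex.FindAllStringIndex(markdown, -1)
	var sections []InlineSection
	for i, pos := range positions {
		end := len(markdown)
		if i+1 < len(positions) {
			end = positions[i+1][0]
		}
		heading, body, _ := strings.Cut(markdown[pos[1]:end], "\n")
		name, found := strings.CutPrefix(strings.TrimSpace(heading), spec.kind+":")
		if !found {
			continue
		}
		sections = append(sections, InlineSection{Name: strings.TrimSpace(name), Body: strings.TrimSpace(body)})
	}
	return sections
}

// validateFields rejects frontmatter keys outside the spec's valid set.
func validateFields(fields map[string]any, spec inlineSectionSpec) error {
	for key := range fields {
		if !spec.validFields[key] {
			return fmt.Errorf("inline %s: unknown field %q", spec.kind, key)
		}
	}
	return nil
}

func main() {
	skill := inlineSectionSpec{kind: "skill", validFields: map[string]bool{"description": true}}
	md := "# Doc\n\n## skill: lint\nRun the linter.\n\n## Notes\nUnrelated.\n"
	for _, s := range extractInlineSections(md, skill) {
		fmt.Printf("%s -> %q\n", s.Name, s.Body)
	}
	fmt.Println(validateFields(map[string]any{"model": "x"}, skill))
}
```

With this shape, the sub-agent extractor becomes a second `inlineSectionSpec` value rather than a second file.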

---

### Priority-Ordered Recommendations

**P1 (high impact, mechanical):** collapse the parser extractor files (&sect;5); swap local reimplementations for `sliceutil`/`stringutil`/`repoutil` (&sect;3a,3c,3d); extract the `parse*Config` scaffold + preprocess wrapper (&sect;1a,1d).
**P2:** relocate the 4 outliers (&sect;2; start with the pair already called from `concurrency.go`); extract `countHumanComments`/`promptAndStoreSecret`/`recordToolCall` (items 2a, 2b, 1c); add `stringutil.ContainsAny` and adopt `parseStringSliceAny` at the inline `[]any&rarr;[]string` sites (items 3e, 1b).
**P3:** unify update-check subsystems (&sect;2c); add `typeutil.Lookup[T]` (&sect;4); decide on the `stringutil` re-export wrappers (&sect;3b).

### Checklist
- [ ] Collapse parser inline-skill/sub-agent extractors
- [ ] Adopt `sliceutil`/`stringutil`/`repoutil` utilities at reimpl sites
- [ ] Extract safe-output config-parse scaffold + preprocess wrapper
- [ ] Relocate outlier functions; update imports
- [ ] Extract `countHumanComments`/`promptAndStoreSecret`/`recordToolCall`
- [ ] `go build ./...` + full tests to confirm no behavior change

### Metadata
871 Go files analyzed (excl. `_test.go`) &mdash; `pkg/workflow` 393, `pkg/cli` 309, `pkg/parser` 42, `pkg/console` 26. **22 confirmed findings** (9 dup &middot; 4 outlier &middot; 7 scattered/util &middot; 1 generics &middot; 1 near-clone). Method: parallel semantic finders &rarr; adversarial per-finding verification &rarr; Serena/gopls + naming clustering. Date 2026-06-01.

**References:** [&sect;26728321552](https://github.com/github/gh-aw/actions/runs/26728321552)




> Generated by [&#128295; Semantic Function Refactoring](https://github.com/github/gh-aw/actions/runs/26728321552) &middot; opus48 9.4M &middot; [&#9719;](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fsemantic-function-refactor%22&type=issues)
> - [x] expires on Jun 3, 2026, 12:17 AM UTC

[refactor] Semantic function clustering: verified duplicates, outliers, scattered helpers & util reimplementations #36160
