fix aggregate data step missing some entities #1

Open
Meeran-Tofiq wants to merge 1 commit into gunslingerOP:main from Meeran-Tofiq:main

@Meeran-Tofiq
Change 1 (HIGH IMPACT): Harvest tables from ALL files' touches_data

File: nodes.py, insert after line 4022 (after the for result in results: loop)

Add a new step that collects table references from the touches_data of ALL files across ALL components, not just core-kind files. This catches tables that are referenced only in migrations, schemas, configs, and similar files.
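
A minimal sketch of what the harvesting step could look like. The real shapes in nodes.py are not shown in this PR, so everything here is an assumption: that each component is a dict with "name" and "files", that each file carries a "touches_data" list of {"table", "operations"} dicts, and the helper name harvest_tables_from_touches_data itself.

```python
from typing import Any


def harvest_tables_from_touches_data(
    components: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Collect one entry per touches_data reference, regardless of file kind.

    Hypothetical helper: the field names below are assumed, not taken from nodes.py.
    """
    harvested: list[dict[str, Any]] = []
    for component in components:
        for file_info in component.get("files", []):
            # touches_data may be missing or None for files that touch no data.
            for ref in file_info.get("touches_data") or []:
                table = (ref.get("table") or "").strip()
                if not table:
                    continue  # skip references without a usable table name
                harvested.append(
                    {
                        "table": table,
                        "operations": list(ref.get("operations") or []),
                        "from_component": component.get("name"),
                        "source_file": file_info.get("path"),
                    }
                )
    return harvested
```

The output is deliberately not deduplicated here, so the merge step in Change 2 can combine entries for the same table.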

Change 2: Better dedup that merges operations

File: nodes.py, line 4025

Replace the last-wins dict comprehension with a loop that merges operations and from_component when the same table appears from multiple sources.
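
For illustration, a merging loop might look like the sketch below. It assumes each entry is a dict with "table", "operations" (a list), and "from_component" (a single name), and it stores from_component as a list in the merged result. Both the entry shape and the function name merge_tables are assumptions, since the current code at line 4025 is not shown here.

```python
from typing import Any


def merge_tables(tables: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge entries that share a table name instead of letting the last one win.

    Hypothetical helper: a last-wins version would be {t["table"]: t for t in tables}.
    """
    merged: dict[str, dict[str, Any]] = {}
    for entry in tables:
        current = merged.setdefault(
            entry["table"],
            {"table": entry["table"], "operations": [], "from_component": []},
        )
        # Union the operations while preserving first-seen order.
        for op in entry.get("operations") or []:
            if op not in current["operations"]:
                current["operations"].append(op)
        # Record every component that referenced this table.
        component = entry.get("from_component")
        if component and component not in current["from_component"]:
            current["from_component"].append(component)
    return list(merged.values())
```

With this shape, a table seen as a read in one component and a write in another ends up with both operations and both component names.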

Change 3: Expand core_kinds and raise limits

File: nodes.py

  • Line 3430: Add migration, schema, config, seed, factory, type, interface, middleware to core_kinds
  • Line 3443: Raise cap from 25 to 50
  • Line 3470: Raise truncation from 30K to 80K chars (or slim down per-file payload to just path/kind/summary/touches_data)
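
A rough sketch of the slimmed-payload alternative, combined with the raised limits. The constant and function names are hypothetical, and the existing members of core_kinds are not listed in this PR, so only the proposed additions appear below.

```python
import json
from typing import Any

# Kinds this PR proposes adding; the existing core_kinds members are not shown here.
ADDED_CORE_KINDS = {
    "migration", "schema", "config", "seed",
    "factory", "type", "interface", "middleware",
}
MAX_CORE_FILES = 50          # raised from 25
MAX_PAYLOAD_CHARS = 80_000   # raised from 30K

# Only these fields are sent per file in the slimmed payload.
SLIM_FIELDS = ("path", "kind", "summary", "touches_data")


def slim_file(file_info: dict[str, Any]) -> dict[str, Any]:
    """Keep only the fields the aggregation step needs."""
    return {key: file_info[key] for key in SLIM_FIELDS if key in file_info}


def build_payload(files: list[dict[str, Any]], core_kinds: set[str]) -> str:
    """Select core-kind files, cap the count, slim each one, then truncate."""
    selected = [f for f in files if f.get("kind") in core_kinds][:MAX_CORE_FILES]
    text = json.dumps([slim_file(f) for f in selected])
    # Truncation can cut mid-record, so slimming first reduces how often it triggers.
    return text[:MAX_PAYLOAD_CHARS]
```

Slimming and raising the character limit are not mutually exclusive; slimming first simply makes it less likely that the limit is reached.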

Change 4: Make LLM cleanup less aggressive

File: nodes.py, in _cleanup_extracted_tables prompt (~line 3584)

Add instruction: "When in doubt, KEEP the table. Only filter entries that are clearly NOT data storage identifiers."
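
One way the instruction could be wired into the prompt, as a sketch only. The existing prompt text in _cleanup_extracted_tables is not shown in this PR, so base_prompt stands in for it, and build_cleanup_prompt is a hypothetical helper.

```python
KEEP_WHEN_UNSURE = (
    "When in doubt, KEEP the table. Only filter entries that are clearly "
    "NOT data storage identifiers."
)


def build_cleanup_prompt(base_prompt: str, candidates: list[str]) -> str:
    """Append the keep-when-unsure rule and the candidate list to the prompt.

    base_prompt is a placeholder for whatever the cleanup prompt already says.
    """
    listing = "\n".join(f"- {name}" for name in candidates)
    return f"{base_prompt.rstrip()}\n\n{KEEP_WHEN_UNSURE}\n\nCandidates:\n{listing}"
```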

Change 5 (OPTIONAL): Persist data_structures in SummarizeFiles

File: nodes.py, ~line 1867

Currently data_structures is requested from the LLM but silently dropped from the result. Persisting it would let downstream nodes use it. Existing projects would need SummarizeFiles re-run to pick up the field.
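
A sketch of the persistence change, assuming the LLM response is parsed into a dict and the per-file result is built field by field. The function name build_file_summary and every field other than data_structures and touches_data are assumptions.

```python
from typing import Any


def build_file_summary(path: str, llm_result: dict[str, Any]) -> dict[str, Any]:
    """Build the stored per-file result, now including data_structures.

    Hypothetical helper: the real result shape in SummarizeFiles may differ.
    """
    return {
        "path": path,
        "summary": llm_result.get("summary", ""),
        "touches_data": llm_result.get("touches_data", []),
        # Previously dropped. Default to [] so results summarized before this
        # change (which lack the field) are still readable downstream.
        "data_structures": llm_result.get("data_structures", []),
    }
```

Downstream nodes reading older stored results should use the same default until those projects are re-summarized.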
