fix aggregate data step missing some entities by Meeran-Tofiq · Pull Request #1 · gunslingerOP/PocketFlow-Tutorial-Codebase-Knowledge

Meeran-Tofiq · 2026-02-05T12:26:39Z

Change 1 (HIGH IMPACT): Harvest tables from ALL files' touches_data

File: nodes.py, insert after line 4022 (after the for result in results: loop)

Add a new step that collects table references from the touches_data of ALL files across ALL components, not just core-kind files. This catches tables from migrations, schemas, configs, etc.
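
A minimal sketch of the harvesting step, not the actual code in nodes.py. It assumes each component is a dict with a "name" and a "files" list, and that each touches_data entry is a dict with "table" and "operations" keys. Those shapes and the function name harvest_tables_from_touches are hypothetical; only touches_data and from_component come from this description.

```python
def harvest_tables_from_touches(components):
    """Collect table references from every file's touches_data, whatever its kind.

    Assumed (hypothetical) shapes:
      component: {"name": str, "files": [file_info, ...]}
      file_info: {"path": str, "kind": str, "touches_data": [touch, ...] or None}
      touch:     {"table": str, "operations": [str, ...]}
    """
    harvested = []
    for component in components:
        for file_info in component.get("files", []):
            # No filtering on file_info["kind"]: migrations, schemas and
            # configs contribute their tables just like core files do.
            for touch in file_info.get("touches_data") or []:
                name = (touch.get("table") or "").strip()
                if not name:
                    continue  # skip entries without a usable table name
                harvested.append({
                    "table": name,
                    "operations": list(touch.get("operations") or []),
                    "from_component": component.get("name", ""),
                })
    return harvested
```

The result still contains duplicates when several files touch the same table, so it is meant to feed the dedup step described in Change 2.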

Change 2: Better dedup that merges operations

File: nodes.py, line 4025

Replace the last-wins dict comprehension with a loop that merges operations and from_component when the same table appears from multiple sources.
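
As an illustration of the difference, here is a sketch of a merging loop under assumed data shapes. The entry keys "table", "operations" and "from_component" follow the names used in this description, but the function name merge_tables and the choice to store from_component as a list are assumptions. A last-wins comprehension such as `{e["table"]: e for e in entries}` would keep only the final entry per table.

```python
def merge_tables(entries):
    """Deduplicate table entries by name, merging operations and sources.

    Each entry is assumed to look like:
      {"table": str, "operations": [str, ...], "from_component": str or [str, ...]}
    """
    merged = {}
    for entry in entries:
        key = entry["table"]
        target = merged.setdefault(
            key, {"table": key, "operations": [], "from_component": []}
        )
        # Union the operations while preserving first-seen order.
        for op in entry.get("operations") or []:
            if op not in target["operations"]:
                target["operations"].append(op)
        # Accept either a single component name or a list of names.
        source = entry.get("from_component")
        sources = source if isinstance(source, list) else [source]
        for name in sources:
            if name and name not in target["from_component"]:
                target["from_component"].append(name)
    return list(merged.values())
```

With this loop, a table read by one component and written by another ends up as a single entry listing both operations and both components.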

Change 3: Expand core_kinds and raise limits

File: nodes.py

- Line 3430: Add migration, schema, config, seed, factory, type, interface, middleware to core_kinds
- Line 3443: Raise cap from 25 to 50
- Line 3470: Raise truncation from 30K to 80K chars (or slim down per-file payload to just path/kind/summary/touches_data)
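
The following sketch shows how the three adjustments could fit together. It is written under assumptions: the existing contents of core_kinds are not shown here, so the function takes the full set as a parameter, and the constant and function names (ADDED_CORE_KINDS, MAX_CORE_FILES, MAX_PAYLOAD_CHARS, build_payload) are hypothetical. It applies both the raised truncation limit and the slimmed per-file payload, although the description presents them as alternatives.

```python
import json

# Kinds this change proposes adding; the pre-existing kinds are not shown here.
ADDED_CORE_KINDS = {
    "migration", "schema", "config", "seed",
    "factory", "type", "interface", "middleware",
}
MAX_CORE_FILES = 50          # raised from 25
MAX_PAYLOAD_CHARS = 80_000   # raised from 30K
SLIM_FIELDS = ("path", "kind", "summary", "touches_data")


def build_payload(files, core_kinds):
    """Select core-kind files, keep only the slim fields, and cap the size."""
    selected = [f for f in files if f.get("kind") in core_kinds][:MAX_CORE_FILES]
    slim = [{field: f.get(field) for field in SLIM_FIELDS} for f in selected]
    text = json.dumps(slim, indent=2)
    # Truncation stays as a last-resort guard; slimming makes it less likely.
    return text[:MAX_PAYLOAD_CHARS]
```

Slimming the payload first means the character cap cuts off fewer files, which matters because truncated JSON drops whatever tables appeared in the tail.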

Change 4: Make LLM cleanup less aggressive

File: nodes.py, in _cleanup_extracted_tables prompt (~line 3584)

Add instruction: "When in doubt, KEEP the table. Only filter entries that are clearly NOT data storage identifiers."
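
One way to wire the instruction in, sketched without knowledge of how the _cleanup_extracted_tables prompt is actually assembled. The helper with_keep_bias and the constant name are hypothetical; only the instruction text comes from this description.

```python
KEEP_BIAS_INSTRUCTION = (
    "When in doubt, KEEP the table. "
    "Only filter entries that are clearly NOT data storage identifiers."
)


def with_keep_bias(prompt: str) -> str:
    """Append the keep-biased instruction to a cleanup prompt, at most once."""
    if KEEP_BIAS_INSTRUCTION in prompt:
        return prompt  # already present; avoid duplicating the instruction
    return prompt.rstrip() + "\n\n" + KEEP_BIAS_INSTRUCTION + "\n"
```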

Change 5 (OPTIONAL): Persist data_structures in SummarizeFiles

File: nodes.py, ~line 1867

Currently, data_structures is requested from the LLM but silently dropped from the result. Persisting it would let downstream nodes use it. This requires re-running SummarizeFiles for existing projects.
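
A hedged sketch of what persisting the field could look like. The shape of the LLM result and the helper name build_summary_record are assumptions, since the real result handling in SummarizeFiles is not shown here.

```python
def build_summary_record(path, llm_result):
    """Build the stored per-file record, keeping data_structures.

    llm_result is assumed to be the parsed LLM response as a dict.
    """
    return {
        "path": path,
        "summary": llm_result.get("summary", ""),
        "touches_data": llm_result.get("touches_data") or [],
        # Previously requested from the LLM but never copied into the record.
        "data_structures": llm_result.get("data_structures") or [],
    }
```

Records written before this change would lack the key, so downstream readers should treat a missing data_structures as empty until SummarizeFiles has been re-run.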


Meeran-Tofiq wants to merge 1 commit into gunslingerOP:main from Meeran-Tofiq:main
