Add extensions framework with entity management and graph sync#548
Add extensions framework with entity management and graph sync#548
Conversation
- Introduced support for ledger materialization jobs in the Dagster definitions. - Added installation and loading of the postgres_scanner in the DuckDB connection pool. - Updated the RoboLedger operations docstring to reflect new materialization features. - Adjusted test assertions to expect 16 accounts instead of 15, ensuring accuracy in transaction validation.
…ledger-graph-sync
- Added the "Entity" table to the materialization order to support ENTITY_HAS_TRANSACTION relationships. - Updated the connection string in `build_postgres_connstr` to clarify its usage context for DuckDB's postgres_scanner. - Modified SQL queries in `_staging_sql` to include the tenant schema in `postgres_scan` calls, ensuring accurate data retrieval for various tables.
- Introduced a new method to update entity information from the CompanyInfo connector, enhancing the entity management workflow. - Simplified the entity creation process in the EntityGraphService by removing the controlled ingestion pattern and directly inserting into the OLTP database. - Updated SQL materialization logic to ensure accurate retrieval of entity data, including additional fields for better data integrity. - Enhanced unit tests to reflect changes in entity creation and update flows, ensuring robust testing of the new functionality.
- Included the entity router in the main ledger router to enhance modularity and support for entity-related operations. - This change improves the organization of the API routes within the ledger module.
- Updated environment configuration to replace `ROBOLEDGER_DATABASE_URL` with `EXTENSIONS_DATABASE_URL` for better clarity and modularity. - Modified database initialization scripts to create the `extensions` database and its test counterpart. - Adjusted migration scripts and entry points to support the new extensions database structure. - Enhanced the Dagster job definitions to include materialization for extensions data, ensuring compatibility with the new database schema. - Introduced new models and API responses for extensions, facilitating integration with existing systems.
…sed on external source
- Modified the `_staging_sql` function to generate the `qname` field with dynamic prefixes ('qb:', 'xero:', 'plaid:', or 'rl:') based on the `external_source` value, enhancing data representation in the materialization process.
…sion - Revised the description of the RoboLedger extension to specify that it utilizes OLTP tables in the `extensions` database with a schema-per-tenant approach, enhancing clarity on the database architecture.
- Renamed and updated the materialization job and configuration classes from `Ledger` to `Extensions` to reflect the new focus on extensions data. - Modified the materialization process to support extensions data from the OLTP database, including changes to the API for triggering materialization. - Enhanced the `MaterializeRequest` model to include a `source` field for specifying the data source during materialization. - Updated logging messages and job tags to align with the new extensions functionality.
- Updated the `MaterializeRequest` model to allow `source` to be optional, with auto-detection based on graph type. - Added validation to ensure the correct data source is used for entity and generic graphs during materialization. - Modified the file upload logic to prevent uploads for entity graphs, directing users to use the materialization endpoint instead. - Improved unit tests to reflect changes in graph type handling and validation logic.
… deprecation. The accounting demo is currently disabled due to changes in the entity graph handling, with a note to re-enable it when relevant features are ready.
…eline and related models - Changed the database references in the QuickBooks pipeline to reflect the use of the extensions PostgreSQL database instead of the roboledger database. - Updated descriptions and comments in the codebase to ensure consistency with the new database structure. - Adjusted model documentation to clarify that entity details and fiscal period controls now pertain to the extensions database.
- Updated the README to provide a clearer description of the RoboSystems platform, emphasizing its capabilities in connecting data sources and building domain ontologies. - Introduced new unit tests for the materialization pipeline, covering various scenarios including connection handling, table generation, and source validation for entity and generic graphs. - Added tests for the materialize endpoint to ensure correct source inference and validation based on graph types, enhancing the robustness of the API.
|
@claude please review this PR |
|
Claude finished @jfrench9's task in 2m 24s —— View job PR Review: Add extensions framework with entity management and graph syncTasks:
Overall this is a solid, well-structured PR. The architecture is clean — the SecurityCredentials in SQL strings sent to Graph API —
# materialize.py:95
connstr = f"dbname={dbname} user={user} password={password} host={host} port={port}"
# ...then at :145:
FROM postgres_scan('{c}', '{s}', 'entities') # c = connstr with passwordThis is an accepted pattern in postgres_scanner usage, but worth auditing whether the Graph API logs incoming SQL queries in staging/prod. If it does, consider filtering or truncating query logs for Bugs
When a non-dimension relationship table fails to stage, the error is appended to # materialize.py:507-514
except Exception as e:
error_msg = f"Failed to stage {table_name}: {e!s}"
logger.warning(error_msg)
if table_name in RELATIONSHIP_TABLES and "DIMENSION" in table_name:
logger.info(f"Skipping {table_name} (no dimension data)")
else:
result.errors.append(error_msg)
# ← result.status never set to "error" hereDimension junction tables are never populated but always queried — The
Both
The function signature accepts Code QualityFeature flag naming mismatch — The renamed module is # extensions.py:64
if not env.LEDGER_ENABLED:
raise RuntimeError(
"Extensions database access attempted but LEDGER_ENABLED is false. ..."
)The env var name and the error message both say Tests in
# test_materialize_source.py:48-52
# Simulate the inference logic from the endpoint
if req.source is None:
req.source = "extensions" if graph.graph_type == "entity" else "staged"
assert req.source == "extensions"
# ← Tests the test's own code, not the production endpointAccount deletion scoped to source type, not connection — The existing Minor Notes
Summary: The core architecture is sound. The two issues most worth fixing before merge are the silent success status on staging failures (which could produce corrupted graphs with no visible error), and the non-deterministic entity fetch without ordering/filtering. |
- Updated the `_get_engine` function to provide clearer instructions for enabling the extensions OLTP database. - Refactored the `build_postgres_connstr` function to remove the unnecessary `graph_id` parameter, simplifying connection string construction. - Adjusted entity retrieval queries in the ledger router to filter for parent entities, improving data accuracy. - Enhanced unit tests to reflect changes in connection string handling and entity query logic, ensuring comprehensive coverage.
|
@claude have we fixed all issues? |
|
Claude finished @jfrench9's task in 1m 38s —— View job Here's the status of every issue from the previous review: Issue Status
Details✅ Bug fixes confirmed
❌ Tests still inline production logic (
# test_materialize_source.py:49-52
if req.source is None:
req.source = "extensions" if graph.graph_type == "entity" else "staged"
assert req.source == "extensions"This tests the test's own code. If the inference logic in ❌ Account deletion scoped to source type ( The ℹ️ Credentials in SQL strings — Summary: 3 of 5 actionable bugs are fixed. The two remaining items ( |
- Added the `resolve_materialization_source` function to infer and validate the materialization source based on the graph type, enhancing clarity and maintainability. - Refactored the materialization logic to utilize the new function, streamlining source inference and validation for entity and generic graphs. - Updated unit tests to cover various scenarios for source resolution, ensuring robust validation and error handling for incompatible source and graph type combinations.
- Added `EXTENSIONS_ENABLED` feature flag to the environment configuration, allowing control over the extensions OLTP database and related functionalities. - Updated the README to clarify the role of the extensions in the RoboSystems platform, including the removal of the `LEDGER_ENABLED` references in favor of a more modular approach. - Refactored the entrypoint and database access logic to utilize the new `EXTENSIONS_ENABLED` flag, ensuring proper handling of extensions migrations and database access. - Enhanced comments and documentation throughout the codebase to reflect the changes in feature flagging and database structure.
|
@claude can you review the latest changes. We have implemented a new EXTENSIONS_ENABLED to properly gate the extensions database creation |
|
Claude finished @jfrench9's task in 1m 8s —— View job PR Review: Latest Changes (EXTENSIONS_ENABLED)Tasks:
EXTENSIONS_ENABLED — Correctly Implemented ✅The naming issue from the prior review is fully resolved. The implementation is clean:
# Controls the extensions PostgreSQL database (engine, migrations, Dagster jobs).
# Required for any extension module (RoboLedger, RoboInvestor, etc.).
EXTENSIONS_ENABLED = get_bool_env("EXTENSIONS_ENABLED", ...)
# Controls the /v1/ledger/* API routes. Requires EXTENSIONS_ENABLED=true.
LEDGER_ENABLED = get_bool_env("LEDGER_ENABLED", ...)
The two-flag design is well-thought-out: Prior Review — Updated Issue Tracker
Issue 6 is now fixed: The One Outstanding IssueAccount deletion scoped to source type ( Summary: All 4 bugs flagged in the prior review are fixed, the |
Summary
Introduces a generalized extensions framework that replaces the previous
ledger-scoped module structure, enabling support for multiple extension types (starting with RoboLedger). This PR also adds a new entity management system for the ledger API and enhances the graph materialization pipeline to support dynamic synchronization between the entity graph and extension data.Key Changes
Architecture:
ledger→extensionsRefactorledgermodule hierarchy toextensions, with RoboLedger becoming a sub-module underextensions/roboledger/ledgertoextensionsnamespacingroboledgertoextensionsdb/ledger.py→db/extensions.py) and extended it to support the new entity tablesEntity Management
EntitySQLAlchemy model and corresponding API models for CRUD operations/ledger/entityrouter with full entity creation and update endpoints1eec7fa2156e) for theentitiestableGraph Materialization & Sync
robosystems/operations/extensions/materialize.py(540+ lines) — a comprehensive materialization engine that syncs graph data into extension tablesentity_graph_service.pyto simplify and streamline entity creation/update processesDagster Integration
robosystems/dagster/jobs/extensions.pywith job definitions for extension materialization workflowsInfrastructure
Breaking Changes
robosystems.models.ledger,robosystems.db.ledger,robosystems.operations.ledger, androbosystems.models.api.ledgermust be updated to theirextensionsequivalentsmigrations/ledger.ini→migrations/extensions.ini; existing deployments will need migration path updates.env.exampleupdates)Testing
tests/operations/extensions/test_materialize.py)tests/routers/ledger/test_entity.py)tests/routers/graphs/test_materialize_source.py)Infrastructure Considerations
🤖 Generated with Claude Code
Branch Info:
feature/ledger-graph-syncmainCo-Authored-By: Claude [email protected]