fix(webapp): Don't add internal logs to materialized view task_events_search_mv_v1 #3069
Conversation
This will prevent internal logs from being added.
Walkthrough: A ClickHouse migration updates the materialized view task_events_search_mv_v1 to exclude rows with an empty trace_id.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed
internal-packages/clickhouse/schema/017_update_materialized_conditions_to_task_events_search_v1.sql

```sql
-- +goose Up
-- We drop the existing MV and recreate it with the new filter condition
DROP VIEW IF EXISTS trigger_dev.task_events_search_mv_v1;

CREATE MATERIALIZED VIEW IF NOT EXISTS trigger_dev.task_events_search_mv_v1
TO trigger_dev.task_events_search_v1 AS
SELECT
  environment_id,
  organization_id,
  project_id,
  trace_id,
  span_id,
  run_id,
  task_identifier,
  start_time,
  inserted_at,
  message,
  kind,
  status,
  duration,
  parent_span_id,
  attributes_text,
  fromUnixTimestamp64Nano(toUnixTimestamp64Nano(start_time) + toInt64(duration)) AS triggered_timestamp
FROM trigger_dev.task_events_v2
WHERE
  trace_id != '' -- New condition added here
  AND kind != 'DEBUG_EVENT'
  AND status != 'PARTIAL'
  AND NOT (kind = 'SPAN_EVENT' AND attributes_text = '{}')
  AND kind != 'ANCESTOR_OVERRIDE'
  AND message != 'trigger.dev/start';
```
🚩 MV recreation does not backfill: pre-existing invalid rows remain in target table
The migration drops and recreates the materialized view with the new `trace_id != ''` filter, but materialized views in ClickHouse only apply to newly inserted data. Any rows already in `trigger_dev.task_events_search_v1` with `trace_id = ''` will remain unless manually deleted. The PR description mentions "deleted the old invalid rows" during testing, but this migration alone does not clean up stale data. If deployed without a separate cleanup step (e.g., `ALTER TABLE ... DELETE WHERE trace_id = ''`), the existing invalid rows will persist in the search table indefinitely. The author is likely aware of this based on the PR description, but no automated cleanup is embedded in the migration itself.
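For reference, the kind of one-off cleanup the comment refers to might look like the sketch below. This is not part of the PR's migration; it assumes the stale rows can simply be dropped with a mutation and would be run once per environment.

```sql
-- Hypothetical one-off cleanup, not included in this PR's migration:
-- drop rows that were materialized before the trace_id filter existed.
-- ALTER TABLE ... DELETE runs as an asynchronous mutation in ClickHouse.
ALTER TABLE trigger_dev.task_events_search_v1
DELETE WHERE trace_id = '';
```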
🧹 Nitpick comments (1)
internal-packages/clickhouse/schema/017_update_materialized_conditions_to_task_events_search_v1.sql (1)
1-31: Up migration looks correct — but existing bad rows in the target table are not cleaned up.

The recreated MV will only filter new inserts. Rows with `trace_id = ''` already materialized into `task_events_search_v1` will persist. The PR description mentions manual deletion, but consider adding an explicit `ALTER TABLE ... DELETE` (or `DELETE FROM`) in the up migration so this is reproducible across environments.

Suggested addition after the CREATE (line 31):
```diff
+-- Clean up previously materialized rows with empty trace_id
+ALTER TABLE trigger_dev.task_events_search_v1 DELETE WHERE trace_id = '';
```
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
internal-packages/clickhouse/schema/017_update_materialized_conditions_to_task_events_search_v1.sql
🧰 Additional context used
📓 Path-based instructions (1)
internal-packages/clickhouse/schema/**/*.sql
📄 CodeRabbit inference engine (CLAUDE.md)
internal-packages/clickhouse/schema/**/*.sql: ClickHouse migrations must use Goose format with `-- +goose Up` and `-- +goose Down` markers
Follow ClickHouse naming conventions: `raw_` prefix for input tables, `_v1`, `_v2` suffixes for versioning, `_mv_v1` suffix for materialized views
Files:
internal-packages/clickhouse/schema/017_update_materialized_conditions_to_task_events_search_v1.sql
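For readers unfamiliar with the Goose format these instructions reference, a minimal ClickHouse migration skeleton might look like the sketch below. The table and view names are placeholders chosen only to illustrate the naming conventions above; they do not exist in the repository.

```sql
-- +goose Up
-- Placeholder names only; illustrates the goose markers and naming conventions.
CREATE MATERIALIZED VIEW IF NOT EXISTS trigger_dev.example_mv_v1
TO trigger_dev.example_v1 AS
SELECT *
FROM trigger_dev.raw_example_v1;

-- +goose Down
DROP VIEW IF EXISTS trigger_dev.example_mv_v1;
```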
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-15T11:50:06.067Z
Learning: Applies to internal-packages/database/prisma/migrations/**/*.sql : When editing the Prisma schema, remove extraneous migration lines related to specific tables: `_BackgroundWorkerToBackgroundWorkerFile`, `_BackgroundWorkerToTaskQueue`, `_TaskRunToTaskRunTag`, `_WaitpointRunConnections`, `_completedWaitpoints`, `SecretStore_key_idx`, and unrelated `TaskRun` indexes
📚 Learning: 2026-01-15T11:50:06.067Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-15T11:50:06.067Z
Learning: Applies to internal-packages/clickhouse/schema/**/*.sql : Follow ClickHouse naming conventions: `raw_` prefix for input tables, `_v1`, `_v2` suffixes for versioning, `_mv_v1` suffix for materialized views
Applied to files:
internal-packages/clickhouse/schema/017_update_materialized_conditions_to_task_events_search_v1.sql
📚 Learning: 2026-01-15T11:50:06.067Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-15T11:50:06.067Z
Learning: Applies to internal-packages/database/prisma/migrations/**/*.sql : When editing the Prisma schema, remove extraneous migration lines related to specific tables: `_BackgroundWorkerToBackgroundWorkerFile`, `_BackgroundWorkerToTaskQueue`, `_TaskRunToTaskRunTag`, `_WaitpointRunConnections`, `_completedWaitpoints`, `SecretStore_key_idx`, and unrelated `TaskRun` indexes
Applied to files:
internal-packages/clickhouse/schema/017_update_materialized_conditions_to_task_events_search_v1.sql
📚 Learning: 2026-01-15T11:50:06.067Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-15T11:50:06.067Z
Learning: Applies to internal-packages/clickhouse/schema/**/*.sql : ClickHouse migrations must use Goose format with `-- +goose Up` and `-- +goose Down` markers
Applied to files:
internal-packages/clickhouse/schema/017_update_materialized_conditions_to_task_events_search_v1.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (26)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
- GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
- GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
- GitHub Check: sdk-compat / Cloudflare Workers
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: sdk-compat / Deno Runtime
- GitHub Check: typecheck / typecheck
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
- GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
- GitHub Check: sdk-compat / Bun Runtime
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
🔇 Additional comments (1)
internal-packages/clickhouse/schema/017_update_materialized_conditions_to_task_events_search_v1.sql (1)
33-62: Down migration correctly reverts the change.

The column list and remaining WHERE conditions are identical to the up migration minus the `trace_id != ''` filter. Looks good.
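The down migration itself is not shown in this view. Based on the reviewer's description (same column list and WHERE conditions, minus `trace_id != ''`), it presumably recreates the view roughly as sketched below; treat this as a reconstruction, not the actual file contents.

```sql
-- +goose Down
-- Sketch of the down migration as described by the reviewer;
-- the actual file contents are not shown in this diff view.
DROP VIEW IF EXISTS trigger_dev.task_events_search_mv_v1;

CREATE MATERIALIZED VIEW IF NOT EXISTS trigger_dev.task_events_search_mv_v1
TO trigger_dev.task_events_search_v1 AS
SELECT
  environment_id,
  organization_id,
  project_id,
  trace_id,
  span_id,
  run_id,
  task_identifier,
  start_time,
  inserted_at,
  message,
  kind,
  status,
  duration,
  parent_span_id,
  attributes_text,
  fromUnixTimestamp64Nano(toUnixTimestamp64Nano(start_time) + toInt64(duration)) AS triggered_timestamp
FROM trigger_dev.task_events_v2
WHERE
  kind != 'DEBUG_EVENT'
  AND status != 'PARTIAL'
  AND NOT (kind = 'SPAN_EVENT' AND attributes_text = '{}')
  AND kind != 'ANCESTOR_OVERRIDE'
  AND message != 'trigger.dev/start';
```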
apps/webapp/app/presenters/v3/LogsListPresenter.server.ts

```ts
// This should be removed once we clear the old inserts, 30 DAYS, the materialized view excludes events without trace_id)
queryBuilder.where("trace_id != ''", {
  environmentId,
});
```
🚩 Temporary query-side filter will need cleanup after 30-day retention window
The comment on line 239 states the `trace_id != ''` query-side filter should be removed once old inserts age out (30 days). Since the materialized view now excludes empty `trace_id` rows, only pre-existing rows in the target table need this filter. There's no tracking mechanism (e.g., a TODO issue link) to ensure this cleanup happens. Consider linking to a tracking issue to avoid leaving dead code.
🧹 Nitpick comments (1)
apps/webapp/app/presenters/v3/LogsListPresenter.server.ts (1)
239-242: Unnecessary parameter passed to parameterless WHERE clause.

The clause `trace_id != ''` contains no parameterized placeholder, yet `{ environmentId }` is passed as the params argument. This is dead/misleading — `environmentId` is already correctly bound on Line 244. Remove the unused params object to avoid confusion.

Suggested fix:
```diff
 // This should be removed once we clear the old inserts, 30 DAYS, the materialized view excludes events without trace_id)
-queryBuilder.where("trace_id != ''", {
-  environmentId,
-});
+queryBuilder.where("trace_id != ''");
```
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
apps/webapp/app/presenters/v3/LogsListPresenter.server.ts
🧰 Additional context used
📓 Path-based instructions (7)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead
**/*.{ts,tsx}: Always import tasks from `@trigger.dev/sdk`, never use `@trigger.dev/sdk/v3` or the deprecated `client.defineJob` pattern
Every Trigger.dev task must be exported and have a unique `id` property with no timeouts in the run function
Files:
apps/webapp/app/presenters/v3/LogsListPresenter.server.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use zod for validation in packages/core and apps/webapp
Files:
apps/webapp/app/presenters/v3/LogsListPresenter.server.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Use function declarations instead of default exports
Import from `@trigger.dev/core` using subpaths only, never import from root
Files:
apps/webapp/app/presenters/v3/LogsListPresenter.server.ts
apps/webapp/app/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)
Access all environment variables through the `env` export of `env.server.ts` instead of directly accessing `process.env` in the Trigger.dev webapp
Files:
apps/webapp/app/presenters/v3/LogsListPresenter.server.ts
apps/webapp/**/*.{ts,tsx}
📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)
apps/webapp/**/*.{ts,tsx}: When importing from `@trigger.dev/core` in the webapp, use subpath exports from the package.json instead of importing from the root path
Follow the Remix 2.1.0 and Express server conventions when updating the main trigger.dev webapp
Access environment variables via the `env` export from `apps/webapp/app/env.server.ts`, never use `process.env` directly
Files:
apps/webapp/app/presenters/v3/LogsListPresenter.server.ts
**/*.ts
📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)
**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries
Files:
apps/webapp/app/presenters/v3/LogsListPresenter.server.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}
📄 CodeRabbit inference engine (AGENTS.md)
Format code using Prettier before committing
Files:
apps/webapp/app/presenters/v3/LogsListPresenter.server.ts
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-15T11:50:06.067Z
Learning: Applies to internal-packages/database/prisma/migrations/**/*.sql : When editing the Prisma schema, remove extraneous migration lines related to specific tables: `_BackgroundWorkerToBackgroundWorkerFile`, `_BackgroundWorkerToTaskQueue`, `_TaskRunToTaskRunTag`, `_WaitpointRunConnections`, `_completedWaitpoints`, `SecretStore_key_idx`, and unrelated `TaskRun` indexes
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-15T11:50:06.067Z
Learning: Applies to internal-packages/clickhouse/schema/**/*.sql : ClickHouse migrations must use Goose format with `-- +goose Up` and `-- +goose Down` markers
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-15T11:50:06.067Z
Learning: Applies to internal-packages/clickhouse/schema/**/*.sql : Follow ClickHouse naming conventions: `raw_` prefix for input tables, `_v1`, `_v2` suffixes for versioning, `_mv_v1` suffix for materialized views
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 2175
File: apps/webapp/app/services/environmentMetricsRepository.server.ts:202-207
Timestamp: 2025-06-14T08:07:46.625Z
Learning: In apps/webapp/app/services/environmentMetricsRepository.server.ts, the ClickHouse methods (getTaskActivity, getCurrentRunningStats, getAverageDurations) intentionally do not filter by the `tasks` parameter at the ClickHouse level, even though the tasks parameter is accepted by the public methods. This is done on purpose as there is not much benefit from adding that filtering at the ClickHouse layer.
📚 Learning: 2025-06-14T08:07:46.625Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 2175
File: apps/webapp/app/services/environmentMetricsRepository.server.ts:202-207
Timestamp: 2025-06-14T08:07:46.625Z
Learning: In apps/webapp/app/services/environmentMetricsRepository.server.ts, the ClickHouse methods (getTaskActivity, getCurrentRunningStats, getAverageDurations) intentionally do not filter by the `tasks` parameter at the ClickHouse level, even though the tasks parameter is accepted by the public methods. This is done on purpose as there is not much benefit from adding that filtering at the ClickHouse layer.
Applied to files:
apps/webapp/app/presenters/v3/LogsListPresenter.server.ts
📚 Learning: 2025-07-12T18:06:04.133Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2264
File: apps/webapp/app/services/runsRepository.server.ts:172-174
Timestamp: 2025-07-12T18:06:04.133Z
Learning: In apps/webapp/app/services/runsRepository.server.ts, the in-memory status filtering after fetching runs from Prisma is intentionally used as a workaround for ClickHouse data delays. This approach is acceptable because the result set is limited to a maximum of 100 runs due to pagination, making the performance impact negligible.
Applied to files:
apps/webapp/app/presenters/v3/LogsListPresenter.server.ts
📚 Learning: 2026-02-06T19:53:38.843Z
Learnt from: 0ski
Repo: triggerdotdev/trigger.dev PR: 2994
File: apps/webapp/app/presenters/v3/DeploymentListPresenter.server.ts:233-237
Timestamp: 2026-02-06T19:53:38.843Z
Learning: When constructing Vercel dashboard URLs from deployment IDs, always strip the dpl_ prefix from the ID. Implement this by transforming the ID with .replace(/^dpl_/, "") before concatenating into the URL: https://vercel.com/${teamSlug}/${projectName}/${cleanedDeploymentId}. Consider centralizing this logic in a small helper (e.g., getVercelDeploymentId(id) or a URL builder) and add tests to verify both prefixed and non-prefixed inputs.
Applied to files:
apps/webapp/app/presenters/v3/LogsListPresenter.server.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (26)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
- GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
- GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
- GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
- GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
- GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
- GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
- GitHub Check: sdk-compat / Cloudflare Workers
- GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
- GitHub Check: sdk-compat / Bun Runtime
- GitHub Check: typecheck / typecheck
- GitHub Check: sdk-compat / Deno Runtime
This will prevent internal logs from being added to the `task_events_search_v1` table.
Closes #
✅ Checklist
Testing
Ran the migration, deleted the old invalid rows and ran new tasks.
The undesired logs are not added to the table.
Changelog
Updated the MATERIALIZED VIEW to also filter for `trace_id != ''`.