This folder provides the paper-aligned live-style Docker workflow for comparative evaluation of Rilot routing policies.

- `docker-compose.live.yml`: starts Rilot, ten high-consumption zone simulators, and Prometheus.
- `config.live.json`: 10-zone live-style config with a per-zone share cap (`max_request_share_percent`).
- `prometheus.yml`: scrape config for `/metrics`.
- `scripts/run_live_experiment.sh`: primary comparative runner (10 zones + local dynamic carbon API) that writes to `result_live/`.
- `scripts/run_experiment.sh`: compatibility wrapper that delegates to `run_live_experiment.sh`.
- `scripts/run_comparative_evaluation.py`: request-level and summary report generator.
- `scripts/carbon-signal-api.js`: local ElectricityMap-compatible dynamic API for reproducible live-style runs.
- `carbon-traces/us-grid-sample.csv`: sample trace format.
- `carbon-traces/electricitymap-latest-sample.json`: ElectricityMap-style local fixture.

Run the primary comparative runner:

    ./scripts/run_live_experiment.sh

Outputs are written to `./result_live/comparative-live` by default.
Both scripts generate `charts.html` automatically in that folder.
Generated output includes:
- per-request latency CSV
- per-mode Prometheus snapshots
- summary CSV/JSON/Markdown tables for paper-ready comparison
- baseline-relative trade-off metrics (exposure/CO2e savings, latency delta, error rate, CPU sample delta, memory sample delta)

`requests.csv` now includes explainability fields:

- `request_region` and `selected_region`
- `cross_region_reroute` (`true`/`false`)
- `selected_carbon_intensity_g_per_kwh`
- `zone_filter_reasons` (per-zone eligibility/constraint reason, e.g. `added-latency>50`, `share-cap`, `eligible`)
- `carbon_saved_vs_worst_g_per_kwh`
- `decision_reason`
- `decision_reason_brief`
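
As a minimal sketch of how these fields might be consumed, the snippet below counts cross-region reroutes from rows shaped like `requests.csv`. The column names come from the field list above; the sample rows, the region labels (`us-east`, `us-west`), and the lowercase `true`/`false` encoding are assumptions for illustration, not output from a real run.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; a real file would be
# result_live/comparative-live/requests.csv with more columns.
SAMPLE = """request_region,selected_region,cross_region_reroute
us-east,us-east,false
us-east,us-west,true
us-west,us-west,false
us-west,us-east,true
us-east,us-west,true
"""


def count_reroutes(rows):
    """Count reroutes per (request_region, selected_region) pair."""
    pairs = Counter()
    for row in rows:
        # Assumes the flag is serialized as the string "true" or "false".
        if row["cross_region_reroute"].strip().lower() == "true":
            pairs[(row["request_region"], row["selected_region"])] += 1
    return pairs


reroutes = count_reroutes(csv.DictReader(io.StringIO(SAMPLE)))
total = sum(reroutes.values())
print(f"cross-region reroutes: {total}")
for (src, dst), n in sorted(reroutes.items()):
    print(f"  {src} -> {dst}: {n}")
```

To run this against an actual result file, replace `io.StringIO(SAMPLE)` with an open file handle and check the flag encoding in the CSV first.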

The comparative summary now also reports reroute counts per mode:

- `cross_region_reroutes`
- `east_to_west_reroutes`
- `west_to_east_reroutes`

Resource-overhead fields are also included per scenario:

- `cpu_percent_sample`, `cpu_sample_method`, `cpu_delta_percent_vs_baseline`
- `memory_mb_sample`, `memory_delta_mb_vs_baseline`

Run a longer-duration case study (same setup, larger workload):

    REQUESTS_PER_REGION=1000 ./scripts/run_live_experiment.sh

Run the 10-zone live-style study with high-consumption workloads:

    ./scripts/run_live_experiment.sh

This uses:

- `docker-compose.live.yml`
- `config.live.json` as base, rewritten into `config.live.dynamic.json` for run-time overrides
- local dynamic ElectricityMap-compatible API (`scripts/carbon-signal-api.js`)
- route `"/heavy?burn_ms=40"` by default
- no carbon cache (`carbon.cache_ttl_seconds=0`)
- random carbon intensities in `[100, 700]` by default
- output directory `result_live/comparative-live` (stable path)
- cross-region RTT emulation enabled by default (`RILOT_EMULATE_CROSS_REGION_RTT=true`)
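
The base-config rewrite mentioned above can be pictured with the following sketch. It is an illustration only: the real runner is a shell script and may apply overrides differently. Only the `carbon.cache_ttl_seconds` key path is taken from this document; the base config contents and the helpers `set_path` and `apply_overrides` are hypothetical.

```python
import copy
import json


def set_path(config, dotted_key, value):
    """Set a nested key such as 'carbon.cache_ttl_seconds', creating dicts as needed."""
    node = config
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def apply_overrides(base, overrides):
    """Return a copy of base with dotted-key overrides applied; base is left untouched."""
    result = copy.deepcopy(base)
    for dotted_key, value in overrides.items():
        set_path(result, dotted_key, value)
    return result


# Hypothetical base config; the real config.live.json has more fields.
base_config = {"carbon": {"provider": "local", "cache_ttl_seconds": 60}}

# Disable the carbon cache for the run, as the live-style defaults do.
dynamic_config = apply_overrides(base_config, {"carbon.cache_ttl_seconds": 0})
print(json.dumps(dynamic_config, indent=2))
```

Writing the result to a separate file (here it would be `config.live.dynamic.json`) keeps the checked-in base config unchanged between runs.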

Useful overrides:

    REQUESTS_PER_REGION=500 ./scripts/run_live_experiment.sh
    RILOT_EMULATE_CROSS_REGION_RTT=false ./scripts/run_live_experiment.sh
    CARBON_API_MIN_G=150 CARBON_API_MAX_G=650 ./scripts/run_live_experiment.sh

    # Live ElectricityMap mode (requires API key)
    CARBON_PROVIDER_OVERRIDE=electricitymap \
    ELECTRICITYMAP_API_KEY_OVERRIDE=<your_api_key> \
    REQUESTS_PER_REGION=1000 \
    ./scripts/run_live_experiment.sh

Enable/disable timeout robustness scenario:

    ENABLE_FAILURE_SCENARIO=1 ./scripts/run_live_experiment.sh
    ENABLE_FAILURE_SCENARIO=0 ./scripts/run_live_experiment.sh

Run weight sensitivity analysis:

    python3 ./scripts/run_weight_sensitivity.py

Generate an interactive chart dashboard from the latest comparative run:

    node ./scripts/charts.js

Optional:

    # Use a specific run folder
    node ./scripts/charts.js --input-dir ./result_live/comparative-live
    # Use live result base
    node ./scripts/charts.js --results-base ./result_live

This writes `charts.html` into the selected comparative result folder.

- Carbon-aware modes can reduce carbon-intensity exposure while keeping latency stable; in many runs the gain is modest (for example, ~1-2%) when regional carbon values are close.
- `latency_first` typically minimizes response time at the cost of higher carbon exposure, which is why multi-objective modes (`balanced`, `carbon_first`) are included.
- If CPU columns are `0.0`, host CPU sampling was not captured for that run; avoid making compute-overhead claims from that dataset.
- If memory columns are empty, memory sampling was not captured for that run.
- To increase signal separation, run longer workloads and/or use traces with wider regional carbon spread (high-carbon vs low-carbon regions).
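
To see why a narrow regional spread yields only a modest gain, consider a simplified model in which exposure is the mean carbon intensity of the zones that served the requests, and the saving is measured relative to a baseline that keeps every request local. This model and the intensity values below are assumptions for illustration; they are not the toolkit's reporting formula or measured data.

```python
def exposure_saving_percent(baseline_intensities, routed_intensities):
    """Relative reduction in mean carbon intensity versus a baseline, in percent."""
    baseline = sum(baseline_intensities) / len(baseline_intensities)
    routed = sum(routed_intensities) / len(routed_intensities)
    return 100.0 * (baseline - routed) / baseline


# Hypothetical intensities in g/kWh, one value per request.
# Narrow spread: regions at 400 and 390; half of the traffic moves to the greener one.
narrow = exposure_saving_percent([400, 400, 400, 400], [400, 400, 390, 390])
# Wide spread: regions at 600 and 150 with the same reroute fraction.
wide = exposure_saving_percent([600, 600, 600, 600], [600, 600, 150, 150])

print(f"narrow spread saving: {narrow:.2f}%")
print(f"wide spread saving:   {wide:.2f}%")
```

With the same reroute fraction, the narrow-spread case saves 1.25% while the wide-spread case saves 37.50%, so the achievable gain is bounded mainly by how far apart the regional intensities are.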

Run these commands and include the generated folders in your supplementary package:

    cd research-kit
    ENABLE_FAILURE_SCENARIO=1 REQUESTS_PER_REGION=1000 ./scripts/run_live_experiment.sh

Expected outputs:

- `result_live/comparative-live/summary.{md,csv,json}`
- `result_live/comparative-live/requests.csv`
- `result_live/comparative-live/metrics-*.prom`
- `result_live/comparative-live/charts.html`

Failure/operational evidence is captured by the `carbon_first_provider_timeout` scenario in `summary.*`.
Use this row to demonstrate timeout/fallback behavior and service stability under degraded carbon-signal conditions.

Default comparative scenario order in `summary.*`:

1. `carbon_first`
2. `balanced`
3. `latency_first`
4. `carbon_first_provider_timeout` (when `ENABLE_FAILURE_SCENARIO=1`)
5. `explicit_cross_region_to_green` (when the fixture has a clear greener region)
6. `baseline_no_carbon_strict_local`
7. `baseline_no_carbon_latency_first`
8. `baseline_no_carbon_balanced`

Fairness/user-impact evidence is captured in reroute columns:

- `cross_region_reroutes`
- `east_to_west_reroutes`
- `west_to_east_reroutes`

Use these with latency/error metrics to report trade-offs and justify policy guardrails.

Fairness/locality tuning knobs (in `config.live.json` policy):

- Reduce `w_carbon` and increase `w_latency` for user-facing routes.
- Set tighter `constraints.max_added_latency_ms` and `constraints.p95_latency_budget_ms`.
- Set `constraints.max_request_share_percent` to cap per-zone request concentration (for example `20`).
- Use `route_class=strict-local` for critical locality-sensitive routes.
- Limit migration scope via `constraints.zone_allowlist` and zone `tags`.
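
The interaction between the weights and the constraints can be sketched as follows. This is not Rilot's actual scoring code: the weighted-sum cost, the max-based normalization, the `>=` share-cap comparison, and the zone values are all assumptions chosen to illustrate the knobs. The reason strings mirror the examples given for `zone_filter_reasons`.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    name: str
    carbon_g_per_kwh: float
    added_latency_ms: float
    request_share_percent: float  # share of recent requests already sent here


def filter_reason(zone, max_added_latency_ms, max_request_share_percent):
    """Return 'eligible' or the first constraint the zone violates (assumed order)."""
    if zone.added_latency_ms > max_added_latency_ms:
        return f"added-latency>{max_added_latency_ms}"
    if zone.request_share_percent >= max_request_share_percent:
        return "share-cap"
    return "eligible"


def pick_zone(zones, w_carbon, w_latency, max_added_latency_ms, max_request_share_percent):
    """Pick the eligible zone with the lowest weighted cost (assumed scoring)."""
    reasons = {
        z.name: filter_reason(z, max_added_latency_ms, max_request_share_percent)
        for z in zones
    }
    eligible = [z for z in zones if reasons[z.name] == "eligible"]
    if not eligible:
        return None, reasons
    # Normalize each objective by its maximum among eligible zones; guard against zero.
    max_carbon = max(z.carbon_g_per_kwh for z in eligible) or 1.0
    max_latency = max(z.added_latency_ms for z in eligible) or 1.0

    def cost(z):
        return (w_carbon * z.carbon_g_per_kwh / max_carbon
                + w_latency * z.added_latency_ms / max_latency)

    return min(eligible, key=cost), reasons


# Hypothetical zones for illustration only.
zones = [
    Zone("local", 500, 0, 10),
    Zone("green-near", 200, 30, 10),
    Zone("green-far", 100, 80, 5),    # exceeds the added-latency limit
    Zone("green-hot", 150, 20, 20),   # already at the share cap
]

carbon_pick, reasons = pick_zone(zones, 0.7, 0.3, 50, 20)
latency_pick, _ = pick_zone(zones, 0.2, 0.8, 50, 20)
print("carbon-heavy weights ->", carbon_pick.name)
print("latency-heavy weights ->", latency_pick.name)
print(reasons)
```

In this toy setup, carbon-heavy weights select `green-near` and latency-heavy weights keep the request in `local`, while the two greenest zones are excluded by the latency and share-cap constraints before scoring.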

- `docs/research-toolkit.md`
- `docs/runtime-behavior.md`
- `docs/config-reference.md`