Skip to content

Commit 2746f4a

Browse files
committed
gh-138122: Add exception profiling mode to the sampling profiler
When profiling code it's often valuable to understand what code paths are executing while exceptions are being handled. The new --mode=exception option filters samples to only include threads that have an active exception, making it easy to identify hot spots in error handling code without noise from normal execution paths. This works by checking both the thread's current_exception (for exceptions being raised/propagated) and exc_info (for exceptions being handled in except blocks). The feature integrates with the existing profiler infrastructure including the live TUI, flamegraph output, and gecko format for Firefox Profiler visualization.
1 parent 5b19c75 commit 2746f4a

File tree

17 files changed

+805
-17
lines changed

17 files changed

+805
-17
lines changed

Doc/library/profiling.sampling.rst

Lines changed: 73 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -470,9 +470,10 @@ which you can use to judge whether the data is sufficient for your analysis.
470470
Profiling modes
471471
===============
472472

473-
The sampling profiler supports three modes that control which samples are
473+
The sampling profiler supports four modes that control which samples are
474474
recorded. The mode determines what the profile measures: total elapsed time,
475-
CPU execution time, or time spent holding the global interpreter lock.
475+
CPU execution time, time spent holding the global interpreter lock, or
476+
exception handling.
476477

477478

478479
Wall-clock mode
@@ -553,6 +554,73 @@ single-threaded programs to distinguish Python execution time from time spent
553554
in C extensions or I/O.
554555

555556

557+
Exception mode
558+
--------------
559+
560+
Exception mode (``--mode=exception``) records samples only when a thread has
561+
an active exception::
562+
563+
python -m profiling.sampling run --mode=exception script.py
564+
565+
Samples are recorded in two situations: when an exception is being propagated
566+
up the call stack (after ``raise`` but before being caught), or when code is
567+
executing inside an ``except`` block where exception information is still
568+
present in the thread state.
569+
570+
The profiler detects exceptions by checking two internal thread state fields:
571+
the ``current_exception`` field (set when an exception is actively propagating)
572+
and the ``exc_info`` field (set when inside an ``except`` block). When either
573+
field contains an exception object, the thread is considered to have an active
574+
exception.
575+
576+
The following example illustrates which code regions are captured:
577+
578+
.. code-block:: python
579+
580+
def example():
581+
try:
582+
raise ValueError("error") # Captured: exception being raised
583+
except ValueError:
584+
process_error() # Captured: inside except block
585+
finally:
586+
cleanup() # NOT captured: exception already handled
587+
588+
def example_propagating():
589+
try:
590+
try:
591+
raise ValueError("error")
592+
finally:
593+
cleanup() # Captured: exception propagating through
594+
except ValueError:
595+
pass
596+
597+
def example_no_exception():
598+
try:
599+
do_work()
600+
finally:
601+
cleanup() # NOT captured: no exception involved
602+
603+
Note that ``finally`` blocks are only captured when an exception is actively
604+
propagating through them. Once an ``except`` block finishes executing, Python
605+
clears the exception information before running any subsequent ``finally``
606+
block. Similarly, ``finally`` blocks that run during normal execution (when no
607+
exception was raised) are not captured because no exception state is present.
608+
609+
This mode is useful for understanding where your program spends time handling
610+
errors. Exception handling can be a significant source of overhead in code
611+
that uses exceptions for flow control (such as ``StopIteration`` in iterators)
612+
or in applications that process many error conditions (such as network servers
613+
handling connection failures).
614+
615+
Exception mode helps answer questions like "how much time is spent handling
616+
exceptions?" and "which exception handlers are the most expensive?" It can
617+
reveal hidden performance costs in code that catches and processes many
618+
exceptions, even when those exceptions are handled gracefully. For example,
619+
if a parsing library uses exceptions internally to signal format errors, this
620+
mode will capture time spent in those handlers even if the calling code never
621+
sees the exceptions.
622+
623+
556624
Output formats
557625
==============
558626

@@ -1006,8 +1074,9 @@ Mode options
10061074

10071075
.. option:: --mode <mode>
10081076

1009-
Sampling mode: ``wall`` (default), ``cpu``, or ``gil``.
1010-
The ``cpu`` and ``gil`` modes are incompatible with ``--async-aware``.
1077+
Sampling mode: ``wall`` (default), ``cpu``, ``gil``, or ``exception``.
1078+
The ``cpu``, ``gil``, and ``exception`` modes are incompatible with
1079+
``--async-aware``.
10111080

10121081
.. option:: --async-mode <mode>
10131082

Include/internal/pycore_debug_offsets.h

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -110,8 +110,18 @@ typedef struct _Py_DebugOffsets {
110110
uint64_t status;
111111
uint64_t holds_gil;
112112
uint64_t gil_requested;
113+
uint64_t current_exception;
114+
uint64_t exc_info;
115+
uint64_t exc_state;
113116
} thread_state;
114117

118+
// Exception stack item offset;
119+
struct {
120+
uint64_t size;
121+
uint64_t exc_value;
122+
uint64_t previous_item;
123+
} err_stackitem;
124+
115125
// InterpreterFrame offset;
116126
struct _interpreter_frame {
117127
uint64_t size;
@@ -282,6 +292,14 @@ typedef struct _Py_DebugOffsets {
282292
.status = offsetof(PyThreadState, _status), \
283293
.holds_gil = offsetof(PyThreadState, holds_gil), \
284294
.gil_requested = offsetof(PyThreadState, gil_requested), \
295+
.current_exception = offsetof(PyThreadState, current_exception), \
296+
.exc_info = offsetof(PyThreadState, exc_info), \
297+
.exc_state = offsetof(PyThreadState, exc_state), \
298+
}, \
299+
.err_stackitem = { \
300+
.size = sizeof(_PyErr_StackItem), \
301+
.exc_value = offsetof(_PyErr_StackItem, exc_value), \
302+
.previous_item = offsetof(_PyErr_StackItem, previous_item), \
285303
}, \
286304
.interpreter_frame = { \
287305
.size = sizeof(_PyInterpreterFrame), \

Lib/profiling/sampling/_flamegraph_assets/flamegraph.js

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -717,6 +717,10 @@ function populateThreadStats(data, selectedThreadId = null) {
717717

718718
const gcPctElem = document.getElementById('gc-pct');
719719
if (gcPctElem) gcPctElem.textContent = `${(threadStats.gc_pct || 0).toFixed(1)}%`;
720+
721+
// Exception stats
722+
const excPctElem = document.getElementById('exc-pct');
723+
if (excPctElem) excPctElem.textContent = `${(threadStats.has_exception_pct || 0).toFixed(1)}%`;
720724
}
721725

722726
// ============================================================================

Lib/profiling/sampling/_flamegraph_assets/flamegraph_template.html

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -161,6 +161,10 @@ <h3 class="section-title">Runtime Stats</h3>
161161
<div class="stat-tile-value" id="gc-pct">--</div>
162162
<div class="stat-tile-label">GC</div>
163163
</div>
164+
<div class="stat-tile stat-tile--red" id="exc-stat">
165+
<div class="stat-tile-value" id="exc-pct">--</div>
166+
<div class="stat-tile-label">Exception</div>
167+
</div>
164168
</div>
165169
</div>
166170
</section>

Lib/profiling/sampling/cli.py

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,7 @@
1616
PROFILING_MODE_WALL,
1717
PROFILING_MODE_CPU,
1818
PROFILING_MODE_GIL,
19+
PROFILING_MODE_EXCEPTION,
1920
SORT_MODE_NSAMPLES,
2021
SORT_MODE_TOTTIME,
2122
SORT_MODE_CUMTIME,
@@ -90,6 +91,7 @@ def _parse_mode(mode_string):
9091
"wall": PROFILING_MODE_WALL,
9192
"cpu": PROFILING_MODE_CPU,
9293
"gil": PROFILING_MODE_GIL,
94+
"exception": PROFILING_MODE_EXCEPTION,
9395
}
9496
return mode_map[mode_string]
9597

@@ -213,10 +215,12 @@ def _add_mode_options(parser):
213215
mode_group = parser.add_argument_group("Mode options")
214216
mode_group.add_argument(
215217
"--mode",
216-
choices=["wall", "cpu", "gil"],
218+
choices=["wall", "cpu", "gil", "exception"],
217219
default="wall",
218220
help="Sampling mode: wall (all samples), cpu (only samples when thread is on CPU), "
219-
"gil (only samples when thread holds the GIL). Incompatible with --async-aware",
221+
"gil (only samples when thread holds the GIL), "
222+
"exception (only samples when thread has an active exception). "
223+
"Incompatible with --async-aware",
220224
)
221225
mode_group.add_argument(
222226
"--async-mode",

Lib/profiling/sampling/collector.py

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@
55
THREAD_STATUS_ON_CPU,
66
THREAD_STATUS_GIL_REQUESTED,
77
THREAD_STATUS_UNKNOWN,
8+
THREAD_STATUS_HAS_EXCEPTION,
89
)
910

1011
try:
@@ -170,7 +171,7 @@ def _collect_thread_status_stats(self, stack_frames):
170171
171172
Returns:
172173
tuple: (aggregate_status_counts, has_gc_frame, per_thread_stats)
173-
- aggregate_status_counts: dict with has_gil, on_cpu, etc.
174+
- aggregate_status_counts: dict with has_gil, on_cpu, has_exception, etc.
174175
- has_gc_frame: bool indicating if any thread has GC frames
175176
- per_thread_stats: dict mapping thread_id to per-thread counts
176177
"""
@@ -179,6 +180,7 @@ def _collect_thread_status_stats(self, stack_frames):
179180
"on_cpu": 0,
180181
"gil_requested": 0,
181182
"unknown": 0,
183+
"has_exception": 0,
182184
"total": 0,
183185
}
184186
has_gc_frame = False
@@ -200,6 +202,8 @@ def _collect_thread_status_stats(self, stack_frames):
200202
status_counts["gil_requested"] += 1
201203
if status_flags & THREAD_STATUS_UNKNOWN:
202204
status_counts["unknown"] += 1
205+
if status_flags & THREAD_STATUS_HAS_EXCEPTION:
206+
status_counts["has_exception"] += 1
203207

204208
# Track per-thread statistics
205209
thread_id = getattr(thread_info, "thread_id", None)
@@ -210,6 +214,7 @@ def _collect_thread_status_stats(self, stack_frames):
210214
"on_cpu": 0,
211215
"gil_requested": 0,
212216
"unknown": 0,
217+
"has_exception": 0,
213218
"total": 0,
214219
"gc_samples": 0,
215220
}
@@ -225,6 +230,8 @@ def _collect_thread_status_stats(self, stack_frames):
225230
thread_stats["gil_requested"] += 1
226231
if status_flags & THREAD_STATUS_UNKNOWN:
227232
thread_stats["unknown"] += 1
233+
if status_flags & THREAD_STATUS_HAS_EXCEPTION:
234+
thread_stats["has_exception"] += 1
228235

229236
# Check for GC frames in this thread
230237
frames = getattr(thread_info, "frame_info", None)

Lib/profiling/sampling/constants.py

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,7 @@
55
PROFILING_MODE_CPU = 1
66
PROFILING_MODE_GIL = 2
77
PROFILING_MODE_ALL = 3 # Combines GIL + CPU checks
8+
PROFILING_MODE_EXCEPTION = 4 # Only samples when thread has an active exception
89

910
# Sort mode constants
1011
SORT_MODE_NSAMPLES = 0
@@ -25,10 +26,12 @@
2526
THREAD_STATUS_ON_CPU,
2627
THREAD_STATUS_UNKNOWN,
2728
THREAD_STATUS_GIL_REQUESTED,
29+
THREAD_STATUS_HAS_EXCEPTION,
2830
)
2931
except ImportError:
3032
# Fallback for tests or when module is not available
3133
THREAD_STATUS_HAS_GIL = (1 << 0)
3234
THREAD_STATUS_ON_CPU = (1 << 1)
3335
THREAD_STATUS_UNKNOWN = (1 << 2)
3436
THREAD_STATUS_GIL_REQUESTED = (1 << 3)
37+
THREAD_STATUS_HAS_EXCEPTION = (1 << 4)

Lib/profiling/sampling/gecko_collector.py

Lines changed: 15 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -9,13 +9,14 @@
99
from .collector import Collector
1010
from .opcode_utils import get_opcode_info, format_opcode
1111
try:
12-
from _remote_debugging import THREAD_STATUS_HAS_GIL, THREAD_STATUS_ON_CPU, THREAD_STATUS_UNKNOWN, THREAD_STATUS_GIL_REQUESTED
12+
from _remote_debugging import THREAD_STATUS_HAS_GIL, THREAD_STATUS_ON_CPU, THREAD_STATUS_UNKNOWN, THREAD_STATUS_GIL_REQUESTED, THREAD_STATUS_HAS_EXCEPTION
1313
except ImportError:
1414
# Fallback if module not available (shouldn't happen in normal use)
1515
THREAD_STATUS_HAS_GIL = (1 << 0)
1616
THREAD_STATUS_ON_CPU = (1 << 1)
1717
THREAD_STATUS_UNKNOWN = (1 << 2)
1818
THREAD_STATUS_GIL_REQUESTED = (1 << 3)
19+
THREAD_STATUS_HAS_EXCEPTION = (1 << 4)
1920

2021

2122
# Categories matching Firefox Profiler expectations
@@ -28,6 +29,7 @@
2829
{"name": "CPU", "color": "purple", "subcategories": ["Other"]},
2930
{"name": "Code Type", "color": "red", "subcategories": ["Other"]},
3031
{"name": "Opcodes", "color": "magenta", "subcategories": ["Other"]},
32+
{"name": "Exception", "color": "lightblue", "subcategories": ["Other"]},
3133
]
3234

3335
# Category indices
@@ -39,6 +41,7 @@
3941
CATEGORY_CPU = 5
4042
CATEGORY_CODE_TYPE = 6
4143
CATEGORY_OPCODES = 7
44+
CATEGORY_EXCEPTION = 8
4245

4346
# Subcategory indices
4447
DEFAULT_SUBCATEGORY = 0
@@ -88,6 +91,8 @@ def __init__(self, sample_interval_usec, *, skip_idle=False, opcodes=False):
8891
self.python_code_start = {} # Thread running Python code (has GIL)
8992
self.native_code_start = {} # Thread running native code (on CPU without GIL)
9093
self.gil_wait_start = {} # Thread waiting for GIL
94+
self.exception_start = {} # Thread has an exception set
95+
self.no_exception_start = {} # Thread has no exception set
9196

9297
# GC event tracking: track GC start time per thread
9398
self.gc_start_per_thread = {} # tid -> start_time
@@ -204,6 +209,13 @@ def collect(self, stack_frames):
204209
self._add_marker(tid, "Waiting for GIL", self.gil_wait_start.pop(tid),
205210
current_time, CATEGORY_GIL)
206211

212+
# Track exception state (Has Exception / No Exception)
213+
has_exception = bool(status_flags & THREAD_STATUS_HAS_EXCEPTION)
214+
self._track_state_transition(
215+
tid, has_exception, self.exception_start, self.no_exception_start,
216+
"Has Exception", "No Exception", CATEGORY_EXCEPTION, current_time
217+
)
218+
207219
# Track GC events by detecting <GC> frames in the stack trace
208220
# This leverages the improved GC frame tracking from commit 336366fd7ca
209221
# which precisely identifies the thread that initiated GC collection
@@ -622,6 +634,8 @@ def _finalize_markers(self):
622634
(self.native_code_start, "Native Code", CATEGORY_CODE_TYPE),
623635
(self.gil_wait_start, "Waiting for GIL", CATEGORY_GIL),
624636
(self.gc_start_per_thread, "GC Collecting", CATEGORY_GC),
637+
(self.exception_start, "Has Exception", CATEGORY_EXCEPTION),
638+
(self.no_exception_start, "No Exception", CATEGORY_EXCEPTION),
625639
]
626640

627641
for state_dict, marker_name, category in marker_states:

Lib/profiling/sampling/live_collector/collector.py

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,7 @@
1717
THREAD_STATUS_ON_CPU,
1818
THREAD_STATUS_UNKNOWN,
1919
THREAD_STATUS_GIL_REQUESTED,
20+
THREAD_STATUS_HAS_EXCEPTION,
2021
PROFILING_MODE_CPU,
2122
PROFILING_MODE_GIL,
2223
PROFILING_MODE_WALL,
@@ -61,6 +62,7 @@ class ThreadData:
6162
on_cpu: int = 0
6263
gil_requested: int = 0
6364
unknown: int = 0
65+
has_exception: int = 0
6466
total: int = 0 # Total status samples for this thread
6567

6668
# Sample counts
@@ -82,6 +84,8 @@ def increment_status_flag(self, status_flags):
8284
self.gil_requested += 1
8385
if status_flags & THREAD_STATUS_UNKNOWN:
8486
self.unknown += 1
87+
if status_flags & THREAD_STATUS_HAS_EXCEPTION:
88+
self.has_exception += 1
8589
self.total += 1
8690

8791
def as_status_dict(self):
@@ -91,6 +95,7 @@ def as_status_dict(self):
9195
"on_cpu": self.on_cpu,
9296
"gil_requested": self.gil_requested,
9397
"unknown": self.unknown,
98+
"has_exception": self.has_exception,
9499
"total": self.total,
95100
}
96101

@@ -160,6 +165,7 @@ def __init__(
160165
"on_cpu": 0,
161166
"gil_requested": 0,
162167
"unknown": 0,
168+
"has_exception": 0,
163169
"total": 0, # Total thread count across all samples
164170
}
165171
self.gc_frame_samples = 0 # Track samples with GC frames
@@ -359,6 +365,7 @@ def collect(self, stack_frames):
359365
thread_data.on_cpu += stats.get("on_cpu", 0)
360366
thread_data.gil_requested += stats.get("gil_requested", 0)
361367
thread_data.unknown += stats.get("unknown", 0)
368+
thread_data.has_exception += stats.get("has_exception", 0)
362369
thread_data.total += stats.get("total", 0)
363370
if stats.get("gc_samples", 0):
364371
thread_data.gc_frame_samples += stats["gc_samples"]
@@ -723,6 +730,7 @@ def reset_stats(self):
723730
"on_cpu": 0,
724731
"gil_requested": 0,
725732
"unknown": 0,
733+
"has_exception": 0,
726734
"total": 0,
727735
}
728736
self.gc_frame_samples = 0

0 commit comments

Comments
 (0)