
Commit f863807

committed
pythongh-137122: Improve the profiling section in the 3.15 what's new document
1 parent 572c780 commit f863807

File tree

1 file changed: +139 -13 lines changed


Doc/whatsnew/3.15.rst

Lines changed: 139 additions & 13 deletions
@@ -96,35 +96,161 @@ performance issues in production environments.
Key features include:

* **Zero-overhead profiling**: Attach to any running Python process without
  affecting its performance. Ideal for production debugging where you can't afford
  to restart or slow down your application.

* **No code modification required**: Profile existing applications without restart.
  Simply point the profiler at a running process by PID and start collecting data.

* **Flexible target modes**:

  * Profile running processes by PID (``attach``) - attach to already-running applications
  * Run and profile scripts directly (``run``) - profile from the very start of execution
  * Execute and profile modules (``run -m``) - profile packages run as ``python -m module``

* **Multiple profiling modes**: Choose what to measure based on your performance investigation:

  * **Wall-clock time** (``--mode wall``, default): Measures real elapsed time including I/O,
    network waits, and blocking operations. Use this to understand where your program spends
    calendar time, including when waiting for external resources.
  * **CPU time** (``--mode cpu``): Measures only active CPU execution time, excluding I/O waits
    and blocking. Use this to identify CPU-bound bottlenecks and optimize computational work.
  * **GIL-holding time** (``--mode gil``): Measures time spent holding Python's Global Interpreter
    Lock. Use this to identify which threads dominate GIL usage in multi-threaded applications.

* **Thread-aware profiling**: Option to profile all threads (``-a``) or just the main thread,
  essential for understanding multi-threaded application behavior.

* **Multiple output formats**: Choose the visualization that best fits your workflow:

  * ``--pstats``: Detailed tabular statistics compatible with :mod:`pstats`. Shows function-level
    timing with direct and cumulative samples. Best for detailed analysis and integration with
    existing Python profiling tools (see the example after this list).
  * ``--collapsed``: Generates collapsed stack traces (one line per stack). This format is
    specifically designed for creating flamegraphs with external tools like Brendan Gregg's
    FlameGraph scripts or speedscope.
  * ``--flamegraph``: Generates a self-contained interactive HTML flamegraph using D3.js.
    Opens directly in your browser for immediate visual analysis. Flamegraphs show the call
    hierarchy where width represents time spent, making it easy to spot bottlenecks at a glance.
  * ``--gecko``: Generates Gecko Profiler format compatible with Firefox Profiler
    (https://profiler.firefox.com). Upload the output to Firefox Profiler for advanced
    timeline-based analysis with features like stack charts, markers, and network activity.
  * ``--heatmap``: Generates an interactive HTML heatmap visualization with line-level sample
    counts. Creates a directory with per-file heatmaps showing exactly where time is spent
    at the source code level.

* **Live interactive mode**: Real-time TUI profiler with a top-like interface (``--live``).
  Monitor performance as your application runs with interactive sorting and filtering.

* **Async-aware profiling**: Profile async/await code with task-based stack reconstruction
  (``--async-aware``). See which coroutines are consuming time, with options to show only
  running tasks or all tasks including those waiting.

* **Advanced sorting options**: Sort by direct samples, total time, cumulative time,
  sample percentage, cumulative percentage, or function name. Quickly identify hot spots
  by sorting functions by where they appear most in stack traces.

* **Flexible output control**: Limit results to top N functions (``-l``), customize sorting,
  and disable summary sections for cleaner output suitable for automation.

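As a sketch of that :mod:`pstats` integration, statistics saved with
``--pstats -o profile.stats`` (``profile.stats`` is just an example file name) can be
loaded and explored like any other stats dump, assuming the saved file is readable by
:class:`pstats.Stats`:

.. code-block:: python

   import pstats

   # Load statistics written by e.g.:
   #   python -m profiling.sampling attach --pstats -o profile.stats 1234
   stats = pstats.Stats("profile.stats")

   # Sort by cumulative time and show the ten most expensive functions
   stats.sort_stats("cumulative").print_stats(10)
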
**Basic usage examples:**

Attach to a running process and get quick profiling stats:

.. code-block:: shell

   python -m profiling.sampling attach 1234

Profile a script from the start of its execution:

.. code-block:: shell

   python -m profiling.sampling run myscript.py arg1 arg2

Profile a module (like profiling ``python -m http.server``):

.. code-block:: shell

   python -m profiling.sampling run -m http.server

**Understanding different profiling modes:**

Investigate why your web server feels slow (includes I/O waits):

.. code-block:: shell

   python -m profiling.sampling attach --mode wall 1234

Find CPU-intensive functions (excludes I/O and sleep time):

.. code-block:: shell

   python -m profiling.sampling attach --mode cpu 1234

Debug GIL contention in multi-threaded applications:

.. code-block:: shell

   python -m profiling.sampling attach --mode gil -a 1234

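To see what the different modes measure, consider a toy workload that spends part of its
time blocked and part of it computing (``demo.py`` is a hypothetical file used only for
illustration; the expected behavior follows the mode descriptions above):

.. code-block:: python

   # demo.py -- hypothetical workload for comparing profiling modes
   import time


   def wait_for_io():
       # Blocked time: counted under --mode wall, excluded under --mode cpu
       time.sleep(2)


   def crunch_numbers():
       # Busy CPU time: counted under both wall-clock and CPU modes
       total = 0
       for i in range(10_000_000):
           total += i * i
       return total


   if __name__ == "__main__":
       wait_for_io()
       crunch_numbers()

Profiling this script in wall-clock mode should attribute samples to both functions, while
CPU mode should show ``crunch_numbers`` dominating and ``wait_for_io`` nearly absent, since
no Python code executes while the process sleeps.
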
**Visualization and output formats:**

Generate an interactive flamegraph for visual analysis (opens in browser):

.. code-block:: shell

   python -m profiling.sampling attach --flamegraph 1234

Upload to Firefox Profiler for timeline-based analysis:

.. code-block:: shell

   python -m profiling.sampling attach --gecko -o profile.json 1234
   # Then upload profile.json to https://profiler.firefox.com

Generate collapsed stacks for custom processing:

.. code-block:: shell

   python -m profiling.sampling attach --collapsed -o stacks.txt 1234

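One way to do that custom processing is to aggregate the folded stacks yourself, for
example to rank leaf functions by sample count. This is a minimal sketch assuming
``stacks.txt`` uses the conventional folded-stack layout (semicolon-separated frames
followed by a sample count on each line):

.. code-block:: python

   from collections import Counter

   leaf_samples = Counter()
   with open("stacks.txt") as f:
       for line in f:
           stack, _, count = line.rpartition(" ")
           if not stack:
               continue  # skip lines without a trailing count
           # The last frame is the function that was actually executing
           leaf = stack.split(";")[-1]
           leaf_samples[leaf] += int(count)

   for function, samples in leaf_samples.most_common(10):
       print(f"{samples:8d}  {function}")
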
Generate a line-level heatmap showing exactly where time is spent:

.. code-block:: shell

   python -m profiling.sampling attach --heatmap 1234

**Advanced usage:**

Profile all threads with real-time sampling statistics:

.. code-block:: shell

   python -m profiling.sampling attach -a --realtime-stats 1234

High-frequency sampling (1ms intervals) for 60 seconds:

.. code-block:: shell

   python -m profiling.sampling attach -i 1000 -d 60 1234

Show only the top 20 CPU-consuming functions:

.. code-block:: shell

   python -m profiling.sampling attach --sort tottime -l 20 1234

Use interactive live mode to monitor performance in real-time:

.. code-block:: shell

   python -m profiling.sampling attach --live 1234

Profile async code with task-aware stack reconstruction:

.. code-block:: shell

   python -m profiling.sampling run --async-aware myscript.py

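Here ``myscript.py`` could be any :mod:`asyncio` program; a toy workload along these
lines (hypothetical, for illustration only) gives the profiler several concurrent tasks
whose coroutine stacks can be reconstructed:

.. code-block:: python

   # myscript.py -- hypothetical asyncio workload for --async-aware profiling
   import asyncio


   async def fetch(delay):
       # Simulated I/O wait; samples here are attributed to this coroutine's task
       await asyncio.sleep(delay)


   async def main():
       # Several concurrent tasks give the task-aware profiler distinct stacks to show
       await asyncio.gather(*(fetch(0.5) for _ in range(10)))


   if __name__ == "__main__":
       asyncio.run(main())
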
The profiler generates statistical estimates of where time is spent:
