performance issues in production environments.
Key features include:

* **Zero-overhead profiling**: Attach to any running Python process without
  affecting its performance. Ideal for production debugging where you can't afford
  to restart or slow down your application.

* **No code modification required**: Profile existing applications without restart.
  Simply point the profiler at a running process by PID and start collecting data.

* **Flexible target modes**:

  * Profile running processes by PID (``attach``) - attach to already-running applications
  * Run and profile scripts directly (``run``) - profile from the very start of execution
  * Execute and profile modules (``run -m``) - profile packages run as ``python -m module``

* **Multiple profiling modes**: Choose what to measure based on your performance investigation:

  * **Wall-clock time** (``--mode wall``, default): Measures real elapsed time including I/O,
    network waits, and blocking operations. Use this to understand where your program spends
    calendar time, including when waiting for external resources.
  * **CPU time** (``--mode cpu``): Measures only active CPU execution time, excluding I/O waits
    and blocking. Use this to identify CPU-bound bottlenecks and optimize computational work.
  * **GIL-holding time** (``--mode gil``): Measures time spent holding Python's Global Interpreter
    Lock. Use this to identify which threads dominate GIL usage in multi-threaded applications.

* **Thread-aware profiling**: Option to profile all threads (``-a``) or just the main thread,
  essential for understanding multi-threaded application behavior.

* **Multiple output formats**: Choose the visualization that best fits your workflow:

  * ``--pstats``: Detailed tabular statistics compatible with :mod:`pstats`. Shows function-level
    timing with direct and cumulative samples. Best for detailed analysis and integration with
    existing Python profiling tools.
  * ``--collapsed``: Generates collapsed stack traces (one line per stack). This format is
    specifically designed for creating flamegraphs with external tools like Brendan Gregg's
    FlameGraph scripts or speedscope.
  * ``--flamegraph``: Generates a self-contained interactive HTML flamegraph using D3.js.
    Opens directly in your browser for immediate visual analysis. Flamegraphs show the call
    hierarchy, where width represents time spent, making it easy to spot bottlenecks at a glance.
  * ``--gecko``: Generates Gecko Profiler format compatible with Firefox Profiler
    (https://profiler.firefox.com). Upload the output to Firefox Profiler for advanced
    timeline-based analysis with features like stack charts, markers, and network activity.
  * ``--heatmap``: Generates an interactive HTML heatmap visualization with line-level sample
    counts. Creates a directory with per-file heatmaps showing exactly where time is spent
    at the source code level.

* **Live interactive mode**: Real-time TUI profiler with a top-like interface (``--live``).
  Monitor performance as your application runs with interactive sorting and filtering.

* **Async-aware profiling**: Profile async/await code with task-based stack reconstruction
  (``--async-aware``). See which coroutines are consuming time, with options to show only
  running tasks or all tasks including those waiting.

* **Advanced sorting options**: Sort by direct samples, total time, cumulative time,
  sample percentage, cumulative percentage, or function name. Quickly identify hot spots
  by sorting functions by where they appear most in stack traces.

* **Flexible output control**: Limit results to top N functions (``-l``), customize sorting,
  and disable summary sections for cleaner output suitable for automation.

**Basic usage examples:**

Attach to a running process and get quick profiling stats:

.. code-block:: shell

   python -m profiling.sampling attach 1234

Profile a script from the start of its execution:

.. code-block:: shell

   python -m profiling.sampling run myscript.py arg1 arg2

Profile a module (like profiling ``python -m http.server``):

.. code-block:: shell

   python -m profiling.sampling run -m http.server

**Understanding different profiling modes:**

Investigate why your web server feels slow (includes I/O waits):

.. code-block:: shell

   python -m profiling.sampling attach --mode wall 1234

Find CPU-intensive functions (excludes I/O and sleep time):

.. code-block:: shell

   python -m profiling.sampling attach --mode cpu 1234

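The wall versus CPU distinction can be seen with plain :mod:`time` counters,
entirely outside the profiler (an illustrative sketch, not profiler output): a
blocking call accrues wall-clock time while consuming almost no CPU time.

.. code-block:: python

   import time

   def io_like_work() -> None:
       # Sleeping waits without using the CPU, much like blocking I/O.
       time.sleep(0.2)

   wall_start, cpu_start = time.perf_counter(), time.process_time()
   io_like_work()
   wall = time.perf_counter() - wall_start
   cpu = time.process_time() - cpu_start

   print(wall >= 0.2)   # True: wall-clock time includes the sleep
   print(cpu < 0.1)     # True: CPU time barely moves while sleeping

Per the mode descriptions above, frames blocked like this accumulate samples
under ``--mode wall`` but not under ``--mode cpu``.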
Debug GIL contention in multi-threaded applications:

.. code-block:: shell

   python -m profiling.sampling attach --mode gil -a 1234

**Visualization and output formats:**

Generate an interactive flamegraph for visual analysis (opens in browser):

.. code-block:: shell

   python -m profiling.sampling attach --flamegraph 1234

Upload to Firefox Profiler for timeline-based analysis:

.. code-block:: shell

   python -m profiling.sampling attach --gecko -o profile.json 1234
   # Then upload profile.json to https://profiler.firefox.com

Generate collapsed stacks for custom processing:

.. code-block:: shell

   python -m profiling.sampling attach --collapsed -o stacks.txt 1234

Generate a line-level heatmap showing exactly where time is spent:

.. code-block:: shell

   python -m profiling.sampling attach --heatmap 1234

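Because the tabular output is :mod:`pstats`-compatible, saved statistics can
also be post-processed in Python. A minimal sketch of that API (here
:mod:`cProfile` merely produces a ``Stats`` object to demonstrate; a file saved
with ``-o profile.stats`` would instead be loaded as
``pstats.Stats("profile.stats")``):

.. code-block:: python

   import cProfile
   import io
   import pstats

   def busy() -> int:
       # Small workload so the profile contains a recognizable entry.
       return sum(i * i for i in range(10_000))

   profiler = cProfile.Profile()
   profiler.enable()
   busy()
   profiler.disable()

   # The same pstats API works on files saved by the sampling profiler.
   buffer = io.StringIO()
   stats = pstats.Stats(profiler, stream=buffer)
   stats.sort_stats("cumulative").print_stats(5)
   print("busy" in buffer.getvalue())  # the workload shows up in the report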
**Advanced usage:**

Profile all threads with real-time sampling statistics:

.. code-block:: shell

   python -m profiling.sampling attach -a --realtime-stats 1234

High-frequency sampling (1ms intervals) for 60 seconds:

.. code-block:: shell

   python -m profiling.sampling attach -i 1000 -d 60 1234

Show only the top 20 CPU-consuming functions:

.. code-block:: shell

   python -m profiling.sampling attach --sort tottime -l 20 1234

Use interactive live mode to monitor performance in real-time:

.. code-block:: shell

   python -m profiling.sampling attach --live 1234

Profile async code with task-aware stack reconstruction:

.. code-block:: shell

   python -m profiling.sampling run --async-aware myscript.py

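For that last example, a toy coroutine-based script (hypothetical contents for
``myscript.py``) gives ``--async-aware`` concurrent tasks to attribute time to:

.. code-block:: python

   import asyncio

   async def fetch(delay: float) -> float:
       # Stand-in for awaiting a network call or database query.
       await asyncio.sleep(delay)
       return delay

   async def main() -> float:
       # Two concurrent tasks; task-based stacks show which coroutine waits.
       results = await asyncio.gather(fetch(0.05), fetch(0.1))
       return sum(results)

   total = asyncio.run(main())
   print(round(total, 2))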
The profiler generates statistical estimates of where time is spent:

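The arithmetic behind those estimates is simple proportionality (illustrative
numbers, not real profiler output): a function observed in some fraction of the
collected stack samples is credited with that fraction of the profiling duration.

.. code-block:: python

   # Hypothetical run: 250 of 1000 collected stacks contain our function
   # during a 10-second profiling session.
   samples_with_function = 250
   total_samples = 1000
   duration_seconds = 10.0

   estimated_time = samples_with_function / total_samples * duration_seconds
   print(estimated_time)  # 2.5 seconds attributed to the function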