Fix PDF bloat for off-axis scatter with per-point colors #30746

FazeelUsmani · 2025-11-12T09:41:35Z

PR summary

Problem

When using scatter() with per-point colors and then moving the axes limits so all points are off-screen, the PDF backend still writes all marker paths to the file, resulting in unnecessarily large PDFs (~400KB instead of ~7KB for 1000 points).

Solution

Added bounds checking in draw_path_collection (backend_pdf.py:2121-2125) to skip markers outside the visible canvas. This mirrors the existing optimization already present in draw_markers (lines 2157-2159).

Implementation

Check marker offsets (xo, yo) against canvas bounds (0, 0) to (width*72, height*72)
Skip the output() call for out-of-bounds markers
Conservative, backend-specific change (PDF only)
No API changes or user-facing behavior modifications
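
The center-only check listed above can be sketched in isolation. This is a minimal illustration, not the code in backend_pdf.py: the helper name `center_on_canvas` and its parameters are hypothetical, and the canvas is assumed to span (0, 0) to (width*72, height*72) in points, as stated in the list.

```python
def center_on_canvas(xo, yo, width_in, height_in):
    """Return True if the marker offset (xo, yo), in points, lies on the canvas.

    Hypothetical helper: width_in and height_in are the figure size in inches,
    converted to PDF points at 72 points per inch.
    """
    canvas_width = width_in * 72
    canvas_height = height_in * 72
    return 0 <= xo <= canvas_width and 0 <= yo <= canvas_height


# Offsets for three markers on a 6.4 x 4.8 inch figure.
offsets = [(100.0, 100.0), (-50.0, 100.0), (5000.0, 200.0)]
# Only markers whose center is on the canvas would be written.
kept = [(xo, yo) for xo, yo in offsets if center_on_canvas(xo, yo, 6.4, 4.8)]
```

Note that this test looks only at the marker center, so it takes no account of the marker's size.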

Reproduction Example

import numpy as np
import matplotlib.pyplot as plt

x = np.random.random(1000)
y = np.random.random(1000)
c = np.random.random(1000)

# Before fix: ~410 KB PDF
# After fix: ~7-15 KB PDF
plt.figure()
plt.scatter(x, y, c=c)
plt.xlim(20, 30)  # Move all points off-axis (data is 0-1 range)
plt.savefig('scatter_offaxis.pdf')

Testing

Added three new tests in test_backend_pdf.py:

test_scatter_offaxis_colored_pdf_size
test_scatter_offaxis_colored_visual
test_scatter_mixed_onoff_axis

PR checklist

"Fixes Off-axes scatter() points unnecessarily saved to PDF when coloured #2488" is in the body of the PR description to [link the related issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue)
New and changed code is [tested](https://matplotlib.org/devdocs/devel/testing.html)
[N/A] Plotting related features are demonstrated in an [example](https://matplotlib.org/devdocs/devel/document.html#write-examples-and-tutorials)
[N/A] New Features and API Changes are noted with a [directive and release note](https://matplotlib.org/devdocs/devel/api_changes.html#announce-changes-deprecations-and-new-features)
[N/A] Documentation complies with [general](https://matplotlib.org/devdocs/devel/document.html#write-rest-pages) and [docstring](https://matplotlib.org/devdocs/devel/document.html#write-docstrings) guidelines

Skip emitting markers outside canvas bounds in draw_path_collection to reduce PDF file size when scatter points are off-axis. Fixes matplotlib#2488

jklymak · 2025-11-12T14:59:23Z

This seems to remove a marker if its center is outside the document. What happens if the marker size is large enough that the edge of the marker should still be shown even if the center is not on the page?

tacaswell · 2025-11-12T15:56:35Z

lib/matplotlib/tests/test_backend_pdf.py

+    # The off-axis colored scatter should be close to empty size
+    # Allow up to 50KB overhead for axes/metadata, but should be much smaller
+    # than if all 1000 markers were written (which would add ~200-400KB)
+    assert size_offaxis_colored < size_empty + 50_000, (


The 50 KB padding is confusing to me. The axes should be the same output either way (as the limits are the same etc). If there was anything extra I would expect it to be the structures for the scatter that has no paths (there are some open/close group stuff emitted if I recall correctly). To account for that we could add ax2.scatter([], []) to the empty test.

It is probably worth also adding a third example where there are markers in the plot (ax3.scatter(x+20, y+20, c=c)) to test that the fully off-axis one is about the same size as the empty one and both are smaller than the one with visible markers.

I've updated the tests per your suggestions:

Added ax2.scatter([], []) to the baseline test to match the axes structure exactly

Reduced tolerance from 50KB to 5KB since axes output is now identical

Added a third test case with visible markers (x + 20, y + 20, c=c) that validates:

Visible scatter is >50KB larger than empty

Visible scatter is >50KB larger than off-axis

Now, we can compare all three outputs:

fully off-axis scatter --> small file

empty scatter --> small file

visible scatter --> larger file
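
The three-way comparison above can be written down as plain assertions on file sizes. This is a sketch only: the helper name `check_scatter_pdf_sizes` is hypothetical, the 5 KB tolerance and 50 KB gap are taken from this comment, and the byte counts in the usage line are illustrative rather than measured.

```python
def check_scatter_pdf_sizes(size_empty, size_offaxis, size_visible,
                            tolerance=5_000, min_gap=50_000):
    """Assert the expected ordering of three PDF sizes, all in bytes."""
    # A fully off-axis scatter should add almost nothing over an empty scatter.
    assert size_offaxis < size_empty + tolerance
    # A scatter with visible markers should be clearly larger than both.
    assert size_visible > size_empty + min_gap
    assert size_visible > size_offaxis + min_gap
    return True


# Illustrative numbers only, not output from a real run.
ok = check_scatter_pdf_sizes(size_empty=7_000, size_offaxis=7_500,
                             size_visible=410_000)
```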

tacaswell · 2025-11-12T16:00:56Z

lib/matplotlib/backends/backend_pdf.py

+            # may be partially visible even if its center is outside the canvas.
+            canvas_width = self.file.width * 72
+            canvas_height = self.file.height * 72
+            if not (-max_marker_extent <= xo <= canvas_width + max_marker_extent


Why did you use a maximum extent rather than doing the computation here and having per-marker and per-direction filtering?

We have to compute the extent of every marker no matter what so might as well get the better behavior by doing it here.

I initially used a single max_marker_extent to simplify the bounds check and avoid recalculating the extent for each marker.

Yes, you're right - per-marker filtering is more precise. I've updated the commit.
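
The difference between the two strategies can be illustrated with a small standalone sketch. It is not the backend code: both function names are hypothetical, and each marker is assumed to be a tuple (xo, yo, half_width, half_height) in points.

```python
def visible_with_max_extent(markers, canvas_w, canvas_h):
    """Filter using one global padding: the largest half-extent of any marker."""
    if not markers:
        return []
    pad = max(max(hw, hh) for _, _, hw, hh in markers)
    return [m for m in markers
            if -pad <= m[0] <= canvas_w + pad and -pad <= m[1] <= canvas_h + pad]


def visible_per_marker(markers, canvas_w, canvas_h):
    """Filter using each marker's own half-width (x) and half-height (y)."""
    return [(xo, yo, hw, hh) for xo, yo, hw, hh in markers
            if -hw <= xo <= canvas_w + hw and -hh <= yo <= canvas_h + hh]


# Each marker is (xo, yo, half_width, half_height) in points.
markers = [
    (-40.0, 100.0, 50.0, 50.0),  # large marker, edge reaches onto the canvas
    (-40.0, 100.0, 3.0, 3.0),    # small marker, entirely off the canvas
]
loose = visible_with_max_extent(markers, 460.8, 345.6)  # keeps both markers
tight = visible_per_marker(markers, 460.8, 345.6)       # keeps only the large one
```

With a single maximum extent, one large marker widens the padding for every marker, so small off-canvas markers are still written. The per-marker check avoids that.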

tacaswell · 2025-11-12T16:02:51Z

lib/matplotlib/backends/backend_pdf.py

+                bbox = path.get_extents(transform)
+                max_marker_extent = max(max_marker_extent,
+                                      bbox.width / 2, bbox.height / 2)
            name = self.file.pathCollectionObject(


Does this write something into the file or only update our Python-side state?

It does not write into the file immediately; it seems to write after all drawing is done. Let me fix this by creating path templates only for the path_ids that are used and computing extents for the created paths.

tacaswell · 2025-11-12T16:07:15Z

lib/matplotlib/backends/backend_pdf.py

+                # Get the bounding box of the transformed marker path.
+                # Use get_extents() which is more efficient than transforming
+                # all vertices, and add padding for stroke width.
+                bbox = path.get_extents(transform)


Did you test that this works as expected with log scale and polar?

FazeelUsmani · 2025-11-13T09:14:09Z

This seems to remove a marker if its center is outside the document. What happens if the marker size is large enough that the edge of the marker should still be shown even if the center is not on the page?

Great catch! You're absolutely right - the initial implementation didn't account for marker size. I've
updated the code to compute per-marker extents (bounding boxes) and only skip markers that are
completely outside the canvas.

The new implementation:

Calculates the extent (half-width, half-height) for each marker path
Checks if the marker's bounding box intersects the canvas: (-extent_x <= xo <= canvas_width + extent_x)
Only skips markers where the entire bounding box is outside the visible area

I've also added test_scatter_large_markers_partial_clip, which specifically tests markers with centers
outside the canvas but edges extending into the visible area. I checked, and these are now rendered correctly.

jklymak · 2025-11-13T17:31:02Z

@FazeelUsmani that looks correct. Nice job!

However, now I wonder if we are trading file-size bloat for run-time bloat? This seems to check the extent of every marker - what if I have 10,000 markers, and they are all in the axes - will this be super slow in needlessly getting all the extents? Does a hybrid of the two approaches seem useful (e.g. only get the extent if the marker's centre is not in the figure)?

FazeelUsmani · 2025-11-14T12:15:03Z

@jklymak Yes, a hybrid suits this well. We now get the extent only if the marker's center is off the canvas.
Added the changes.
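
A minimal sketch of the hybrid idea, under stated assumptions: `should_emit` and `get_half_extent` are hypothetical names, and `get_half_extent` stands in for the costlier bounding-box computation, which is called only when the center is off the canvas.

```python
def should_emit(xo, yo, canvas_w, canvas_h, get_half_extent):
    """Decide whether to write a marker, computing its extent only when needed.

    get_half_extent is a hypothetical zero-argument callable returning
    (half_width, half_height) in points.
    """
    # Fast path: the center is on the canvas, so the marker is certainly visible.
    if 0 <= xo <= canvas_w and 0 <= yo <= canvas_h:
        return True
    # Slow path: the center is off the canvas; check whether the box still overlaps.
    hw, hh = get_half_extent()
    return -hw <= xo <= canvas_w + hw and -hh <= yo <= canvas_h + hh


calls = []


def half_extent_10():
    calls.append(1)  # record how often the extent is actually computed
    return 10.0, 10.0


results = [
    should_emit(100.0, 100.0, 460.8, 345.6, half_extent_10),  # center on canvas
    should_emit(-5.0, 100.0, 460.8, 345.6, half_extent_10),   # edge overlaps canvas
    should_emit(-50.0, 100.0, 460.8, 345.6, half_extent_10),  # fully outside
]
```

In this sketch the extent is computed for only the two off-canvas markers, so a plot whose markers all lie inside the canvas pays nothing for the extent computation.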

FazeelUsmani · 2025-11-16T17:00:24Z

@jklymak, it's ready for review.

jklymak

Looks good.

FazeelUsmani · 2025-11-17T09:56:42Z

Hi @tacaswell, can you please take another look?

FazeelUsmani · 2025-11-18T14:29:18Z

Thanks for approving this PR, @jklymak!

I have a question: when is a PR usually merged after the changes are accepted? Is it before the release?

jklymak · 2025-11-18T14:33:41Z

The PR requires two approvals and then will be merged

FazeelUsmani · 2025-11-18T15:12:00Z

I see. @jklymak, can you please look at PR #30756? It has one approval and just needs another.

FazeelUsmani · 2025-11-18T15:12:49Z

The PR requires two approvals and then will be merged

@tacaswell, waiting for you. Let me know if you have any questions or need clarification.

Fix PDF bloat for off-axis scatter with per-point colors

4104dd5

Skip emitting markers outside canvas bounds in draw_path_collection to reduce PDF file size when scatter points are off-axis. Fixes matplotlib#2488

FazeelUsmani marked this pull request as draft November 12, 2025 09:41

github-actions bot added the backend: pdf label Nov 12, 2025

FazeelUsmani marked this pull request as ready for review November 12, 2025 10:29

Skip off-canvas markers in PDF backend, accounting for marker size

f95d761

tacaswell reviewed Nov 12, 2025

FazeelUsmani added 2 commits November 13, 2025 14:34

Fix PDF bloat for off-axis scatter with per-point colors

1f3b3c3

Add tests for scatter PDF optimization with colored markers

1c3e605

implement hybrid approach

3ef8b23

FazeelUsmani requested a review from tacaswell November 16, 2025 16:59

jklymak approved these changes Nov 16, 2025
