Skip to content

Conversation

@mickleness
Copy link
Contributor

@mickleness mickleness commented Dec 28, 2025

When decoding an uninterlaced 8-bit PNG image, the PNGImageDecoder is basically copying one byte at a time.

This PR uses System.arraycopy instead, and it shows approx a 10% improvement.

This graph shows the time it takes different decoders to convert a byte array into a BufferedImage as the size of the PNG image increases:

Screenshot 2025-12-27 at 9 14 19 PM

(This originally came to my attention when looking at an image in Java 1.8. There the ImageConsumer model took approx 400% longer than ImageIO. I was happy to see in recent JDKs that gap narrowed significantly, but there was still a noticeable 10% discrepancy.)

I haven't tried submitting a performance enhancement PR before; I'm not sure if this issue meets this group's threshold for being worth addressing. And if it does: I'm not sure how to structure a unit test for it.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8374377: PNGImageDecoder Slow For 8-bit PNGs (Enhancement - P5)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/29004/head:pull/29004
$ git checkout pull/29004

Update a local copy of the PR:
$ git checkout pull/29004
$ git pull https://git.openjdk.org/jdk.git pull/29004/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 29004

View PR using the GUI difftool:
$ git pr show -t 29004

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/29004.diff

Using Webrev

Link to Webrev Comment

This shows slightly over a 10% improvement in my tests.

This only applies to non-interlaced 8-bit (IndexColorModel) PNGs.
@bridgekeeper
Copy link

bridgekeeper bot commented Dec 28, 2025

👋 Welcome back jwood! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Dec 28, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk
Copy link

openjdk bot commented Dec 28, 2025

@mickleness The following label will be automatically applied to this pull request:

  • client

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 28, 2025
@mlbridge
Copy link

mlbridge bot commented Dec 28, 2025

Webrevs

@gredler
Copy link
Contributor

gredler commented Dec 28, 2025

Do you have a JMH test for the attached graph that can be used to replicate locally? If you've not used JMH before, or haven't used it in the context of OpenJDK, you can check existing performance tests in the test/micro directory, or see this example (not part of OpenJDK, just something I've used in the past to test things locally): https://gist.github.com/gredler/e8ff9d52440cd103cd5b7766defff5b8

This corroborates previous (more informal) tests that show approx a 10% improvement.

Without this PR I observed this result on my Mac 26.2 laptop:

```
Benchmark                                                 Mode  Cnt   Score   Error  Units
PNGImageDecoder_8bit_uninterlaced.measurePNGImageDecoder  avgt   15  25.296 ± 0.521  ms/op
```

With this PR I observed this result:

```
Benchmark                                                 Mode  Cnt   Score   Error  Units
PNGImageDecoder_8bit_uninterlaced.measurePNGImageDecoder  avgt   15  22.132 ± 0.378  ms/op
```
@mickleness
Copy link
Contributor Author

mickleness commented Dec 29, 2025

@gredler Thanks. I'm not familiar with JMH.

I used your example (and a lot of googling) to set up a basic benchmark that is now attached to this ticket.

For a 2500x2500 px image, the time on my Mac went from approx 25.296 to 22.132.

Is there any interest in me:
A. Creating a graph as the image dimensions change?
B. Contrasting this implementation against ImageIO?

I'm not sure what the long-term usage of this benchmark .java would be. If this PR is accepted: the slower "before" time will not be relevant to anyone going forward.

And the most important question (IMO): is there enough interest in this group to review/accept this proposal? (If not: I can stop spending time on it.)

Before this PR

Benchmark                                                 Mode  Cnt   Score   Error  Units
PNGImageDecoder_8bit_uninterlaced.measurePNGImageDecoder  avgt   15  25.296 ± 0.521  ms/op

After this PR

Benchmark                                                 Mode  Cnt   Score   Error  Units
PNGImageDecoder_8bit_uninterlaced.measurePNGImageDecoder  avgt   15  22.132 ± 0.378  ms/op

@liach
Copy link
Member

liach commented Jan 2, 2026

With the patch, the code case 8: bPixels[col+rowOffset] = rowByteBuffer[spos++]; at old line 415/new line 427 is now dead. Should we replace that site with an assertion, or just move the arraycopy code to there instead?

Also this removes the performance comparison. (As of this writing there's a separate `test/micro/org/openjdk/bench/sun/awt/image/PNGImageDecoder_8bit_uninterlaced.java` file used to demonstrate that this change is more performant.)
@mickleness
Copy link
Contributor Author

@liach thanks for checking that far ahead.

I just confirmed that code is NOT dead, though. It should still be used for interlaced 8-bit PNGs. spos is incremented by 1, but col may be incremented by different amounts when we reach col += colInc.

I rewrote the unit test just now so that:

  1. It only tests for correctness (it does NOT try to measure performance anymore)
  2. It tests correctness of 8-bit non-interlaced PNGs (the crux of this bug) and 8-bit interlaced PNGs.

I tried removing the line you mentioned (bPixels[col+rowOffset] = rowByteBuffer[spos++];), and confirmed that the 8-bit interlaced PNG fails without that line.

@jayathirthrao
Copy link
Member

@gredler Thanks. I'm not familiar with JMH.

I used your example (and a lot of googling) to set up a basic benchmark that is now attached to this ticket.

For a 2500x2500 px image, the time on my Mac went from approx 25.296 to 22.132.

Is there any interest in me: A. Creating a graph as the image dimensions change? B. Contrasting this implementation against ImageIO?

I'm not sure what the long-term usage of this benchmark .java would be. If this PR is accepted: the slower "before" time will not be relevant to anyone going forward.

And the most important question (IMO): is there enough interest in this group to review/accept this proposal? (If not: I can stop spending time on it.)

Before this PR

Benchmark                                                 Mode  Cnt   Score   Error  Units
PNGImageDecoder_8bit_uninterlaced.measurePNGImageDecoder  avgt   15  25.296 ± 0.521  ms/op

After this PR

Benchmark                                                 Mode  Cnt   Score   Error  Units
PNGImageDecoder_8bit_uninterlaced.measurePNGImageDecoder  avgt   15  22.132 ± 0.378  ms/op

I also see similar improvements when i run the newly added JMH test locally on my Macbook.

Before PR:
Benchmark Mode Cnt Score Error Units
PNGImageDecoder_8bit_uninterlaced.measurePNGImageDecoder avgt 15 26.428 ± 0.127 ms/op

After PR:
After PR:
Benchmark Mode Cnt Score Error Units
PNGImageDecoder_8bit_uninterlaced.measurePNGImageDecoder avgt 15 22.930 ± 0.125 ms/op

Its good that we are adding a JMH test for performance bench-marking.
I am trying to get information on how frequently these JMH tests are run for Java performance bench-marking. This test will help in future if there are any perf regressions for this particular scenario.

Copy link
Member

@jayathirthrao jayathirthrao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Product change looks good to me.
Some changes are needed to make the JMH test run.

I have also given full functional test run after this update.
Will update once i have results.

* or visit www.oracle.com if you need additional information or have any
* questions.
*/
package org.openjdk.bench.sun.misc;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

package should be org.openjdk.bench.sun.awt.image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is updated

import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to import import org.openjdk.jmh.annotations.Benchmark; and import org.openjdk.jmh.annotations.Setup; also. Otherwise i am seeing compilation errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is updated

Copy link
Contributor

@gredler gredler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change in PNGImageDecoder makes sense to me, although I haven't had a chance to run the JMH benchmark locally on my end.

* PNGImageDecoder. This method makes sure the enhancement didn't cost us
* any accuracy.
*/
private static void testCorrectness(BufferedImage expected,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the correctness check in the JMH benchmark be removed, since that's handled in the unit test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is updated


@Benchmark
public void measurePNGImageDecoder(Blackhole bh) throws Exception {
Image img = Toolkit.getDefaultToolkit().createImage(pngImageData);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the BufferedImage need to be created this way, or could it be simplified down to a simple ImageIO.read() with a ByteArrayInputStream?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ImageIO is a different rendering model; the BufferedImage needs to be created this way. (Or maybe (?) with a PixelGrabber or MediaTracker?)

*
* This test has never failed.
*/
public class PNGImageDecoder_8bit_performance {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is now purely a regression test and the performance aspect has been removed, perhaps rename to e.g. PngImageDecoder8BitTest?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is renamed

int argb1 = expected.getRGB(x, y);
int argb2 = actual.getRGB(x, y);
if (argb1 != argb2) {
throw new Error("x = " + x + ", y = " + y);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error -> RuntimeException, and I'd probably also include the two colors that didn't match in the message

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is updated

private static void testCorrectness(BufferedImage expected,
BufferedImage actual) {
if (expected.getWidth() != actual.getWidth()) {
throw new Error();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better to throw a RuntimeException instead of an Error here and below (at least that seems to be the convention that I've seen elsewhere). Also always best to include a short error message that helps zero in on the exact issue if it ever fails.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is updated

BufferedImage expected = models[0].load(imageData);
BufferedImage actual = models[1].load(imageData);

testCorrectness(expected, actual);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct in assuming that both models end up using PNGImageDecoder under the covers? If so, won't expected and actual always match, even if there is a bug in PNGImageDecoder? I wonder if it would be better to keep the original BufferedImage around (the one we draw on), use it as expected, and compare it to the two model-generated images.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct in assuming that both models end up using PNGImageDecoder under the covers?

It'd make a lot of sense if they shared the same code, but no. It is my understanding we have two separate decoders. ImageIO uses the com.sun.imageio.plugins.png.PNGImageReader , and this PR modifies the sun.awt.image.PNGImageDecoder.

The fact that the PNGImageDecoder is a little slower than the ImageIO classes came to my attention because I was comparing the two decoding models, and the older sun.awt classes appear to be faster than the newer ImageIO classes -- except for this one case. This PR will fix this discrepancy.

(Separately: I've submitted a few other PRs that similarly focus on the older sun.awt decoding classes. See JDK-8357034 , JDK-8356320 , JDK-8356137 , JDK-8351913 . They generally compare the two models against each other for correctness. I also visually inspected the results at the time to triple-check, but in all of those cases ImageIO was already "doing the right thing". I guess I've made it my goal to bring the older sun.awt decoding classes up-to-date.)

I wonder if it would be better to keep the original BufferedImage around (the one we draw on), use it as expected, and compare it to the two model-generated images.

Sure. I updated this test so it avoids decoding with ImageIO.

The PNGImageDecoder_8bit_performance.java /PngImageDecoder8BitTest.java class will test correctness.

This is in response to:
openjdk#29004 (comment)
This is in response to:
hhttps://github.com/openjdk/pull/29004#discussion_r2657759447
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

client [email protected] rfr Pull request is ready for review

Development

Successfully merging this pull request may close these issues.

4 participants