
Devirtualize UTF8 with a specialized text converter #6016

Draft
bbowyersmyth wants to merge 2 commits into npgsql:main from bbowyersmyth:users/bruceb/utf8

Conversation

@bbowyersmyth
Contributor

Encoding.UTF8 actually returns a UTF8EncodingSealed object (https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Text/UTF8Encoding.Sealed.cs), which allows its methods to be devirtualized when accessed via that static property.

Npgsql is creating its own UTF8Encoding objects that don't allow for this specialization.

Using a custom UTF8Encoding, plus a specialized TextConverter when that encoding is in use, gives better performance. I did not see much performance difference from the remaining stack-allocation optimizations in UTF8EncodingSealed.
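As a rough illustration of the approach (the type and member names below are hypothetical, not Npgsql's actual ones): a sealed encoding subclass, referenced through a field of that sealed static type, lets the JIT emit direct rather than virtual calls.

```csharp
using System;
using System.Text;

// Sealed, so the JIT can devirtualize calls made through this static type.
// Configured like Npgsql's encoding: no BOM, throw on invalid bytes.
public sealed class PgUtf8Encoding : UTF8Encoding
{
    public static readonly PgUtf8Encoding Instance = new();
    public PgUtf8Encoding()
        : base(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true) { }
}

public sealed class Utf8TextConverter
{
    // Field is typed as the sealed class, so the calls below are direct.
    readonly PgUtf8Encoding _encoding = PgUtf8Encoding.Instance;

    public string Read(ReadOnlySpan<byte> bytes) => _encoding.GetString(bytes);

    public int Write(ReadOnlySpan<char> chars, Span<byte> destination)
        => _encoding.GetBytes(chars, destination);
}
```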

Before

| Method | Value                | Mean      | Error    | StdDev   | Op/s         | Gen0   | Allocated |
|--------|----------------------|----------:|---------:|---------:|-------------:|-------:|----------:|
| Read   | x                    |  24.72 ns | 0.098 ns | 0.087 ns | 40,449,395.2 | 0.0023 |      24 B |
| Write  | x                    |  NA       | NA       | NA       | NA           | NA     |      NA   |
| Read   | xxxxxxxxxx           |  26.19 ns | 0.078 ns | 0.069 ns | 38,187,844.1 | 0.0046 |      48 B |
| Write  | xxxxxxxxxx           |  18.96 ns | 0.027 ns | 0.024 ns | 52,748,503.0 | 0.0000 |       -   |
| Read   | xxxx(...)xxxx [100]  |  35.01 ns | 0.649 ns | 0.575 ns | 28,564,979.3 | 0.0214 |     224 B |
| Write  | xxxx(...)xxxx [100]  |  24.38 ns | 0.071 ns | 0.059 ns | 41,019,998.9 | 0.0005 |       5 B |
| Read   | xxxx(...)xxxx [1000] | 107.83 ns | 0.627 ns | 0.556 ns |  9,273,447.3 | 0.1935 |    2024 B |
| Write  | xxxx(...)xxxx [1000] |  71.47 ns | 0.165 ns | 0.147 ns | 13,992,508.8 | 0.0051 |      54 B |

After

| Method | Value                | Mean     | Error    | StdDev   | Op/s         | Gen0   | Allocated |
|--------|----------------------|---------:|---------:|---------:|-------------:|-------:|----------:|
| Read   | x                    | 18.88 ns | 0.161 ns | 0.135 ns | 52,956,082.8 | 0.0023 |      24 B |
| Write  | x                    | NA       | NA       | NA       | NA           | NA     |      NA   |
| Read   | xxxxxxxxxx           | 20.33 ns | 0.093 ns | 0.078 ns | 49,191,801.5 | 0.0046 |      48 B |
| Write  | xxxxxxxxxx           | 18.83 ns | 0.019 ns | 0.015 ns | 53,103,167.5 | 0.0000 |       -   |
| Read   | xxxx(...)xxxx [100]  | 27.33 ns | 0.091 ns | 0.085 ns | 36,587,970.4 | 0.0214 |     224 B |
| Write  | xxxx(...)xxxx [100]  | 21.84 ns | 0.028 ns | 0.023 ns | 45,783,841.6 | 0.0005 |       5 B |
| Read   | xxxx(...)xxxx [1000] | 98.62 ns | 0.773 ns | 0.723 ns | 10,139,873.0 | 0.1935 |    2024 B |
| Write  | xxxx(...)xxxx [1000] | 67.47 ns | 0.227 ns | 0.201 ns | 14,822,340.7 | 0.0051 |      54 B |

@bbowyersmyth
Contributor Author

@NinoFloris is this a change you would be interested in? I know there is some desire for small AOT size.

@NinoFloris
Member

Clever observation! Maybe we can devirtualize inside Read and Write? We would add a branch if (Encoding.UTF8 == encoding) but won't need the extra converter and so on.
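The suggested branch could look roughly like this (an illustrative sketch, not the actual Npgsql code): on the fast path the call site's static type is Encoding.UTF8, whose runtime type is the sealed UTF8EncodingSealed, so the JIT can devirtualize it.

```csharp
using System;
using System.Text;

static class TextRead
{
    public static string Read(Encoding encoding, ReadOnlySpan<byte> bytes)
        => ReferenceEquals(encoding, Encoding.UTF8)
            // Fast path: static call site on Encoding.UTF8, devirtualizable.
            ? Encoding.UTF8.GetString(bytes)
            // Slow path: ordinary virtual dispatch for any other encoding.
            : encoding.GetString(bytes);
}
```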

@NinoFloris
Member

NinoFloris commented Feb 8, 2025

Ah, I missed the fact that Npgsql actually defines its own UTF8Encoding instance as well. Since we don't call GetPreamble anywhere, we can just use Encoding.UTF8 for NpgsqlWriteBuffer.UTF8Encoding. However, we still need to use Encoding.UTF8 directly in the converter for the JIT to see that it's of type UTF8EncodingSealed.

EDIT:
I recall I did a similar optimization in https://github.com/npgsql/npgsql/blob/main/src/Npgsql/Internal/Converters/JsonConverter.cs#L122-L131 (which ideally should not depend on CodePage but on a straight reference equality to Encoding.UTF8).
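The distinction between the two checks, sketched with hypothetical helper names: a CodePage comparison matches any UTF-8 encoding, including user-created UTF8Encoding instances with different fallback settings, while reference equality matches only the Encoding.UTF8 singleton, which is the condition devirtualization actually relies on.

```csharp
using System.Text;

static class EncodingChecks
{
    // True for any UTF-8 encoding (code page 65001), regardless of
    // fallback settings or how the instance was created.
    public static bool IsUtf8ByCodePage(Encoding e)
        => e.CodePage == Encoding.UTF8.CodePage;

    // True only for the Encoding.UTF8 singleton itself.
    public static bool IsUtf8Singleton(Encoding e)
        => ReferenceEquals(e, Encoding.UTF8);
}
```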

@bbowyersmyth
Contributor Author

Encoding.UTF8 uses the "replace invalid chars" option whereas the main Npgsql encoding uses the "throw on invalid" setting.

I tried to combine the converters, but most attempts were a regression from the original perf. I had to deconstruct the Read function, and then it was good. The current change might be a bit worse for AOT, but the JIT likes it.
Only the sync Read is modified at the moment.

| Method | Value                | Mean     | Error    | StdDev   | Op/s         | Gen0   | Allocated |
|--------|----------------------|---------:|---------:|---------:|-------------:|-------:|----------:|
| Read   | x                    | 16.68 ns | 0.066 ns | 0.058 ns | 59,945,087.0 | 0.0023 |      24 B |
| Write  | x                    | NA       | NA       | NA       | NA           | NA     |      NA   |
| Read   | xxxxxxxxxx           | 18.09 ns | 0.073 ns | 0.068 ns | 55,264,866.3 | 0.0046 |      48 B |
| Write  | xxxxxxxxxx           | 19.68 ns | 0.028 ns | 0.025 ns | 50,810,920.2 | 0.0000 |       -   |
| Read   | xxxx(...)xxxx [100]  | 25.24 ns | 0.079 ns | 0.074 ns | 39,616,495.6 | 0.0214 |     224 B |
| Write  | xxxx(...)xxxx [100]  | 24.95 ns | 0.053 ns | 0.050 ns | 40,080,983.7 | 0.0005 |       5 B |
| Read   | xxxx(...)xxxx [1000] | 98.19 ns | 1.330 ns | 1.244 ns | 10,184,553.7 | 0.1935 |    2024 B |
| Write  | xxxx(...)xxxx [1000] | 72.26 ns | 0.242 ns | 0.214 ns | 13,838,010.8 | 0.0051 |      54 B |
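The fallback difference mentioned above can be demonstrated directly (hypothetical helper names; the strict settings mirror how Npgsql configures its encoding):

```csharp
using System;
using System.Text;

static class FallbackDemo
{
    // Default Encoding.UTF8: invalid bytes are silently replaced with U+FFFD.
    public static string DecodeLenient(byte[] bytes)
        => Encoding.UTF8.GetString(bytes);

    // Strict settings, as Npgsql uses: invalid bytes throw DecoderFallbackException.
    public static string DecodeStrict(byte[] bytes)
        => new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true)
            .GetString(bytes);
}
// For input { 0x61, 0xFF, 0x62 } (0xFF is never valid in UTF-8),
// DecodeLenient returns "a\uFFFDb" while DecodeStrict throws.
```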

@NinoFloris
Member

NinoFloris commented Feb 9, 2025

Encoding.UTF8 uses the "replace invalid chars" option whereas the main Npgsql encoding uses the "throw on invalid" setting.

I traced this decision back to #503; however, it seems outdated in the current scheme of things. At that time no other encodings were supported (support was introduced in ea9f203). So we're currently in a situation where we only do the strict thing for UTF8, which I don't think makes a lot of sense.

I'd personally be fine using Encoding.UTF8 throughout, @roji what are your thoughts on this?

@NinoFloris
Member

@roji friendly ping, do you think we can use the builtin Encoding.UTF8 over using strict today?

See #6016 (comment)

@roji
Member

roji commented Oct 29, 2025

Thanks for the ping @NinoFloris.

The point about us only being strict for UTF8 is definitely valid... But I think that throwing on invalid characters (as opposed to the, IIRC, default behavior of silently replacing them) is valuable - I remember at least some cases where users unknowingly got corrupt data into their database, and fail-fast behavior seems much better. If anything, I'd rather look into some way for us to be strict for all encodings instead.
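One possible way to be strict for any encoding, not just UTF8 (a sketch with a hypothetical helper name, not a proposed Npgsql API): Encoding.GetEncoding accepts explicit fallback objects, so any encoding can be recreated with exception fallbacks that throw on invalid data instead of substituting characters.

```csharp
using System.Text;

static class StrictEncodings
{
    // Returns a variant of the given encoding that throws on invalid
    // data in both directions rather than silently replacing it.
    public static Encoding MakeStrict(Encoding e)
        => Encoding.GetEncoding(
            e.CodePage,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
}
```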

I don't have all the context here (or even nearly enough), but if this is a tradeoff between slightly better AOT size plus a few nanoseconds vs. throwing for invalid characters, I think I'm more in the latter camp. But is that a valid summary of the decision here?
