-
-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(str): add invariant to avoid unexpected errors when parsing an invalid UTF8 string #410
Conversation
…valid UTF8 string
I took the approach of adding an invariant checking that the problematic parameter is a valid utf8 before doing anything else. Maybe an error handling around |
Pull Request Test Coverage Report for Build 5437950104
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks!
I've added a few small remarks
…ion` when the string is not trimmable
@devnix I took the liberty of performing some benchmarks here: #[Groups(['str'])]
final class Utf8Bench
{
public function benchValidUtf8CheckThroughPreg(): void
{
preg_match('/./u', "HELLO");
}
public function benchValidUtf8CheckThroughEncodingAndPreg(): void
{
Psl\invariant(Str\is_utf8("HELLO"), 'Expected $string to be a valid UTF-8 string.');
preg_match('/./u', "HELLO");
}
} The results are rather painful:
As you can see, this way is close to 6 times slower for |
I think using is_utf8 is not the right solution indeed, we could either manually check for null return, or use |
@azjezz Psl\Regex is about 4.6x slower than regular #[Groups(['str'])]
final class Utf8Bench
{
public function benchValidUtf8CheckThroughPreg(): void
{
if (null === preg_replace('/./u', '', "HELLO")) {
throw new \RuntimeException('xxx');
}
}
public function benchValidUtf8CheckThroughEncodingAndPreg(): void
{
Psl\invariant(Str\is_utf8("HELLO"), 'Expected $string to be a valid UTF-8 string.');
preg_replace('/./u', '', "HELLO");
}
public function benchValidUtfWithPslRegex(): void
{
Regex\replace('HELLO', '/./u', '');
}
}
|
Yep, i guess we just do a null check here then, we should try to minimise the performance difference between Psl and PHP stdlib, even though in this case, PHP does not have an alternative for these string functions ( at least not mb variants ). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Added a small question about the suppress here - but apart from that it looks to do what it needs to do :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the Regex component is probably the best option - even though it's less performant than other options.
Fixes #409