Introduce fast path for innerHTML #9926

cdumez · 2023-02-10T16:00:11Z

`236ad6c`

Introduce fast path for innerHTML
https://bugs.webkit.org/show_bug.cgi?id=252054

Reviewed by Darin Adler.

This patch imports the new fast HTML parser from Blink:
https://bugs.chromium.org/p/chromium/issues/detail?id=1407201

This patch adds a new parser that handles a subset of HTML and is only used by
Element::set[Inner|Outer]HTML. While limited, the hope is this parser is
useful for a number of pages. We could expand on the set of cases this handles
in the future.

As a sanity check, there is a debug assertion in place to make sure that the
fast path generates the same output as the full HTML parser would have.

This is a verified 2.25% progression on Speedometer on macOS and 2.60%
progression on iOS.

* Source/WTF/wtf/text/WTFString.h:
* Source/WebCore/Sources.txt:
* Source/WebCore/WebCore.xcodeproj/project.pbxproj:
* Source/WebCore/dom/Document.h:
(WebCore::Document::isDirAttributeDirty const):
(WebCore::Document::setIsDirAttributeDirty):
(WebCore::Document::isTemplateDocument const):
* Source/WebCore/editing/markup.cpp:
(WebCore::createFragmentForMarkup):
* Source/WebCore/html/HTMLBDIElement.cpp:
(WebCore::HTMLBDIElement::HTMLBDIElement):
* Source/WebCore/html/HTMLButtonElement.cpp:
(WebCore::HTMLButtonElement::create):
* Source/WebCore/html/HTMLButtonElement.h:
* Source/WebCore/html/HTMLElement.cpp:
(WebCore::HTMLElement::childrenChanged):
(WebCore::HTMLElement::dirAttributeChanged):

* Source/WebCore/html/HTMLInputElement.cpp:
(WebCore::HTMLInputElement::~HTMLInputElement):
(WebCore::HTMLInputElement::needsSuspensionCallback):
We have an optimization in HTMLInputElement which delays the initialization of
m_inputType when constructed by the HTML parser. However, in the fast-parsing
case, we may now fail parsing (due to limited support) in the middle of parsing
an HTMLInputElement. As a result, we may end up destroying an HTMLInputElement
before its m_inputType gets initialized, which would cause a crash. I added a
couple of null checks for m_inputType so that the HTMLInputElement destructor
can run even if m_inputType wasn't initialized yet.

* Source/WebCore/html/HTMLLabelElement.cpp:
(WebCore::HTMLLabelElement::create):
* Source/WebCore/html/HTMLLabelElement.h:
* Source/WebCore/html/HTMLSelectElement.cpp:
(WebCore::HTMLSelectElement::create):
* Source/WebCore/html/HTMLSelectElement.h:
* Source/WebCore/html/parser/HTMLDocumentParserFastPath.cpp: Added.
(WebCore::operator==):
(WebCore::onlyContainsLowercaseASCIILetters):
(WebCore::tagNameHash):
(WebCore::HTMLFastPathParser::HTMLFastPathParser):
(WebCore::HTMLFastPathParser::parse):
(WebCore::HTMLFastPathParser::parseResult const):
(WebCore::HTMLFastPathParser::TagInfo::Tag::create):
(WebCore::HTMLFastPathParser::TagInfo::Tag::allowedInPhrasingOrFlowContent):
(WebCore::HTMLFastPathParser::TagInfo::Tag::allowedInFlowContent):
(WebCore::HTMLFastPathParser::TagInfo::ContainerTag::parseChild):
(WebCore::HTMLFastPathParser::TagInfo::ContainsPhrasingContentTag::parseChild):
(WebCore::HTMLFastPathParser::TagInfo::A::parseChild):
(WebCore::HTMLFastPathParser::TagInfo::AWithPhrasingContent::parseChild):
(WebCore::HTMLFastPathParser::TagInfo::B::create):
(WebCore::HTMLFastPathParser::TagInfo::Footer::create):
(WebCore::HTMLFastPathParser::TagInfo::I::create):
(WebCore::HTMLFastPathParser::TagInfo::Input::create):
(WebCore::HTMLFastPathParser::TagInfo::Option::parseChild):
(WebCore::HTMLFastPathParser::TagInfo::Ol::parseChild):
(WebCore::HTMLFastPathParser::TagInfo::Select::parseChild):
(WebCore::HTMLFastPathParser::TagInfo::Strong::create):
(WebCore::HTMLFastPathParser::TagInfo::Ul::parseChild):
(WebCore::HTMLFastPathParser::parseCompleteInput):
(WebCore::HTMLFastPathParser::isWhitespace):
(WebCore::HTMLFastPathParser::isValidUnquotedAttributeValueChar):
(WebCore::HTMLFastPathParser::isValidAttributeNameChar):
(WebCore::HTMLFastPathParser::isCharAfterTagNameOrAttribute):
(WebCore::HTMLFastPathParser::isCharAfterUnquotedAttribute):
(WebCore::HTMLFastPathParser::skipWhitespace):
(WebCore::HTMLFastPathParser::scanText):
(WebCore::HTMLFastPathParser::scanEscapedText):
(WebCore::HTMLFastPathParser::scanTagName):
(WebCore::HTMLFastPathParser::scanAttributeName):
(WebCore::HTMLFastPathParser::scanAttributeValue):
(WebCore::HTMLFastPathParser::scanEscapedAttributeValue):
(WebCore::HTMLFastPathParser::scanHTMLCharacterReference):
(WebCore::HTMLFastPathParser::didFail):
(WebCore::HTMLFastPathParser::peekNext):
(WebCore::HTMLFastPathParser::consumeNext):
(WebCore::HTMLFastPathParser::parseChildren):
(WebCore::HTMLFastPathParser::processAttribute):
(WebCore::HTMLFastPathParser::parseAttributes):
(WebCore::HTMLFastPathParser::parseSpecificElements):
(WebCore::HTMLFastPathParser::parseElement):
(WebCore::HTMLFastPathParser::parseElementAfterTagName):
(WebCore::HTMLFastPathParser::parseContainerElement):
(WebCore::HTMLFastPathParser::parseVoidElement):
(WebCore::canUseFastPath):
(WebCore::tryFastParsingHTMLFragmentImpl):
(WebCore::tryFastParsingHTMLFragment):
* Source/WebCore/html/parser/HTMLDocumentParserFastPath.h: Added.
* Source/WebCore/html/parser/HTMLEntityParser.cpp:
(WebCore::appendLegalEntityFor):
* Source/WebCore/html/parser/HTMLEntityParser.h:
* Source/WebCore/html/parser/HTMLNameCache.h:
(WebCore::HTMLNameCache::makeAttributeQualifiedName):
(WebCore::HTMLNameCache::makeQualifiedName):

Canonical link: https://commits.webkit.org/260148@main

ba5d5d4

Misc	iOS, tvOS & watchOS	macOS	Linux	Windows
❌ 🧪 style	✅ 🛠 ios	✅ 🛠 mac	✅ 🛠 wpe	~~🛠 wincairo~~
✅ 🧪 bindings	✅ 🛠 ios-sim	✅ 🛠 mac-AS-debug	✅ 🛠 gtk
✅ 🧪 webkitperl	~~🧪 ios-wk2~~	✅ 🧪 api-mac	~~🧪 gtk-wk2~~
	✅ 🧪 api-ios	~~🧪 mac-wk1~~	~~🧪 api-gtk~~
✅ 🛠 🧪 jsc	✅ 🛠 tv	✅ 🧪 mac-wk2	~~🛠 jsc-armv7~~
✅ 🛠 🧪 jsc-arm64	✅ 🛠 tv-sim	~~🧪 mac-AS-debug-wk2~~	~~🧪 jsc-armv7-tests~~
	✅ 🛠 watch	✅ 🧪 mac-wk2-stress	✅ 🛠 jsc-mips
✅ 🛠 🧪 merge	✅ 🛠 watch-sim		✅ 🧪 jsc-mips-tests

webkit-early-warning-system · 2023-02-10T16:00:28Z

EWS run on previous version of this PR (hash 7d85c96)

Misc	iOS, tvOS & watchOS	macOS	Linux	Windows
❌ 🧪 style	✅ 🛠 ios	❌ 🛠 mac	❌ 🛠 wpe	✅ 🛠 wincairo
✅ 🧪 bindings	✅ 🛠 ios-sim	✅ 🛠 mac-AS-debug	❌ 🛠 gtk
✅ 🧪 webkitperl	~~🧪 ios-wk2~~	❌ 🧪 api-mac	❌ 🧪 gtk-wk2
	✅ 🧪 api-ios	❌ 🧪 mac-wk1	❌ 🧪 api-gtk
⏳ 🛠 🧪 jsc	✅ 🛠 tv	❌ 🧪 mac-wk2	⏳ 🛠 jsc-armv7
✅ 🛠 🧪 jsc-arm64	✅ 🛠 tv-sim	~~🧪 mac-AS-debug-wk2~~	⏳ 🧪 jsc-armv7-tests
	✅ 🛠 watch	❌ 🧪 mac-wk2-stress	✅ 🛠 jsc-mips
	✅ 🛠 watch-sim		~~🧪 jsc-mips-tests~~

webkit-early-warning-system · 2023-02-10T17:12:26Z

EWS run on previous version of this PR (hash 64b7e15)

Misc	iOS, tvOS & watchOS	macOS	Linux	Windows
❌ 🧪 style	✅ 🛠 ios	✅ 🛠 mac	❌ 🛠 wpe	⏳ 🛠 wincairo
✅ 🧪 bindings	✅ 🛠 ios-sim	✅ 🛠 mac-AS-debug	❌ 🛠 gtk
✅ 🧪 webkitperl	~~🧪 ios-wk2~~	~~🧪 api-mac~~	❌ 🧪 gtk-wk2
	~~🧪 api-ios~~	~~🧪 mac-wk1~~	❌ 🧪 api-gtk
⏳ 🛠 🧪 jsc	⏳ 🛠 tv	~~🧪 mac-wk2~~	⏳ 🛠 jsc-armv7
⏳ 🛠 🧪 jsc-arm64	✅ 🛠 tv-sim	~~🧪 mac-AS-debug-wk2~~	⏳ 🧪 jsc-armv7-tests
	⏳ 🛠 watch	~~🧪 mac-wk2-stress~~	⏳ 🛠 jsc-mips
	✅ 🛠 watch-sim		⏳ 🧪 jsc-mips-tests

webkit-early-warning-system · 2023-02-10T17:50:50Z

EWS run on previous version of this PR (hash be56d57)

Misc	iOS, tvOS & watchOS	macOS	Linux	Windows
❌ 🧪 style	✅ 🛠 ios	✅ 🛠 mac	✅ 🛠 wpe	✅ 🛠 wincairo
✅ 🧪 bindings	✅ 🛠 ios-sim	✅ 🛠 mac-AS-debug	✅ 🛠 gtk
✅ 🧪 webkitperl	✅ 🧪 ios-wk2	✅ 🧪 api-mac	✅ 🧪 gtk-wk2
	✅ 🧪 api-ios	✅ 🧪 mac-wk1	~~🧪 api-gtk~~
✅ 🛠 🧪 jsc	✅ 🛠 tv	✅ 🧪 mac-wk2	⏳ 🛠 jsc-armv7
✅ 🛠 🧪 jsc-arm64	✅ 🛠 tv-sim	✅ 🧪 mac-AS-debug-wk2	⏳ 🧪 jsc-armv7-tests
	✅ 🛠 watch	✅ 🧪 mac-wk2-stress	✅ 🛠 jsc-mips
	✅ 🛠 watch-sim		✅ 🧪 jsc-mips-tests

weinig · 2023-02-10T21:17:10Z

What makes the fast path parser faster? For instance, the fast path CSS property parser skips tokenization and just tries to match certain character sequences directly.

weinig · 2023-02-10T21:19:35Z

Source/WTF/wtf/text/WTFString.h

@@ -104,6 +104,8 @@ class String final {
    unsigned length() const { return m_impl ? m_impl->length() : 0; }
    const LChar* characters8() const { return m_impl ? m_impl->characters8() : nullptr; }
    const UChar* characters16() const { return m_impl ? m_impl->characters16() : nullptr; }
+    Span<const LChar> span8() const { return { characters8(), length() }; }


Kinda feel like these should have the characters8/16 name long term and the raw pointers one can either go away or have a longer name. Not worth changing in this patch, but maybe worth a comment.

I tried to be consistent with StringView which already had those:

const LChar* characters8() const; const UChar* characters16() const; Span<const LChar> span8() const { return { characters8(), length() }; } Span<const UChar> span16() const { return { characters16(), length() }; }

For future consideration, I like Sam’s idea even though I am the one who named the functions in StringView.

cdumez · 2023-02-10T21:24:08Z

What makes the fast path parser faster? For instance, the fast path CSS property parser skips tokenization and just tries to match certain character sequences directly.

I didn't write it. I imported it from Blink and perf-tested it.

I believe the fact that it only deals with a very small subset of HTML makes it faster.

I am still A/B testings some things to see if we can make it even faster but I decided to upload this patch already since it already tested as a very good progression and passes all the tests.

weinig · 2023-02-10T21:28:27Z

Source/WebCore/html/parser/HTMLDocumentParserFastPath.cpp

+    Document& m_document;
+    DocumentFragment& m_fragment;
+
+    const Char* const m_end { m_source.data() + m_source.size() };


Did you consider using an abstraction like the StringParsingBuffer rather than raw pointers?

This is a straight import. I didn't make any tweaks yet. I think we should adopt more WebKit concepts, though I was hoping to do it in a follow-up since each change will have to be perf-tested.

I'll give it a shot in a follow-up and perf test it. I would be nicer than dealing with pointers indeed.

I was able to rewrite the code using StringParsingBuffer easily. I am waiting on A/B perf results.

Looks like a small regression at the moment, I am trying to figure out why.

darinadler

So many coding style tweaks possibilities, and maybe additional optimization too. But it’s great to get this in here.

darinadler · 2023-02-11T01:28:07Z

Source/WTF/wtf/text/WTFString.h

@@ -104,6 +104,8 @@ class String final {
    unsigned length() const { return m_impl ? m_impl->length() : 0; }
    const LChar* characters8() const { return m_impl ? m_impl->characters8() : nullptr; }
    const UChar* characters16() const { return m_impl ? m_impl->characters16() : nullptr; }
+    Span<const LChar> span8() const { return { characters8(), length() }; }


For future consideration, I like Sam’s idea even though I am the one who named the functions in StringView.

darinadler · 2023-02-11T01:29:59Z

Source/WebCore/editing/markup.cpp

+        bool parsedUsingFastPath = tryFastParsingHTMLFragment(markup, document, fragment, contextElement, parserContentPolicy);
+        if (parsedUsingFastPath) {
+#if ASSERT_ENABLED
+            // As a sanity check for the fast-path, create another fragment using the full parser and compare the results.
+            auto referenceFragment = DocumentFragment::create(document);
+            referenceFragment->parseHTML(markup, &contextElement, parserContentPolicy);
+            ASSERT(serializeFragment(fragment, SerializedNodes::SubtreesOfChildren) == serializeFragment(referenceFragment, SerializedNodes::SubtreesOfChildren));
+#endif
+        } else
+            fragment->parseHTML(markup, &contextElement, parserContentPolicy);


Could this all go into the fragment->parseHTML function, not here? The abstraction is "faster way to parse", not "two different ways to parse", so I think one level deeper.

The problem is that the fast path can start parsing and populating the fragment, then encounter something it can't parse and give up. When it gives up we currently destroy the fragment and reconstruct a new one before calling fragment->parseHTML(). This reconstruction makes what you suggest not possible so we'd need another approach. I already tried calling removeChildren() on the fragment but that turned out to be too expensive. I need to find a cheaper way.

darinadler · 2023-02-11T01:31:14Z

Source/WebCore/html/HTMLInputElement.cpp

+    if (!m_inputType)
+        return false;


What is this change for? Change log doesn’t say.

Same as above.

darinadler · 2023-02-11T01:31:26Z

Source/WebCore/html/HTMLInputElement.cpp

@@ -141,7 +141,7 @@ HTMLInputElement::~HTMLInputElement()
    // a radio button that was in a form. The call to setForm(nullptr) above
    // actually adds the button to the document groups in the latter case.
    // That is inelegant, but harmless since we remove it here.
-    if (isRadioButton())
+    if (m_inputType && isRadioButton())


What is this for? Change log doesn’t say.

I'll add in the commit log. A while back, we landed a Speedometer optimization in HTMLInputElement which delays initialization of the inputType when constructed by the parser. This was to avoid creating an inputType for "text" only to figure out later during parsing that we wanted another type.

There is a case in the new HTML fast parsing case where we give up parsing a destroy a newly constructed HTMLInputElement before we're done fully parsing it. As a result, the input element may now get destroyed before its m_inputType got initialized. However, the HTMLInputElement would previously crash if m_inputType is null. It seemed like a waste to force the initialization of m_inputType in this case so I decided to go the other way and add support for an HTMLInputElement to get destroyed before its m_inputType gets initialized.

darinadler · 2023-02-11T01:32:10Z

Source/WebCore/html/parser/HTMLDocumentParserFastPath.cpp

+    FailedMaxDepth,
+    FailedBigText,
+    FailedCssPseudoDirEnabledAndDirAttributeDirty,
+    MaxValue = FailedCssPseudoDirEnabledAndDirAttributeDirty,


We should leave this MaxValue thing out since it’s never used.

darinadler · 2023-02-11T01:49:15Z

Source/WebCore/html/parser/HTMLEntityParser.cpp

@@ -146,4 +146,9 @@ size_t decodeNamedEntityToUCharArray(const char* name, UChar result[4])
    return numberOfCodePoints + appendUChar32ToUCharArray(search.mostRecentMatch()->secondValue, result + numberOfCodePoints);
 }

+void appendLegalEntityFor(UChar32 c, StringBuilder& decodedEntity)


Please use the name character rather than c.

OK.

Is it because it is a UChar32?

You didn't make the same comment for the other functions taking a c so I started wondering.

I just ran out of steam reviewing, really. I think occasionally a single letter name could be justified but in a short function I think we’d want to used a word instead of a letter.

darinadler · 2023-02-11T01:50:33Z

Source/WebCore/html/parser/HTMLDocumentParserFastPath.cpp

+            StringBuilder entity;
+            appendLegalEntityFor(res, entity);
+            for (unsigned i = 0; i < entity.length(); ++i)
+                out.append(entity[i]);


This seems inefficient. Why not just have appendLegalEntityFor take a Vector<UChar>? Making and destroying a string each time won’t be fast.

Oh good point! I'll make this change. I only used StringBuilder for consistency with the other API in that header but didn't think enough about the performance implications.

Planning on going with:

void appendLegalEntityFor(UChar32 c, Vector<UChar>& decodedEntity) { auto legalEntity = HTMLEntityParser::legalEntityFor(c); if (U_IS_BMP(legalEntity)) { decodedEntity.append(static_cast<UChar>(legalEntity)); return; } decodedEntity.append(U16_LEAD(legalEntity)); decodedEntity.append(U16_TRAIL(legalEntity)); }

darinadler · 2023-02-11T01:51:23Z

Source/WebCore/html/parser/HTMLEntityParser.h

 bool consumeHTMLEntity(SegmentedString&, StringBuilder& decodedEntity, bool& notEnoughCharacters, UChar additionalAllowedCharacter = '\0');
+void appendLegalEntityFor(UChar32, StringBuilder& decodedEntity);


The name decodedEntity is not a good name for a StringBuilder that we will be appending to. Sure, the thing we are appending is a decoded entity.

Ok. Proposing ouputBuffer

darinadler · 2023-02-11T01:52:19Z

Source/WebCore/html/parser/HTMLDocumentParserFastPath.cpp

+        m_parsingFailed = true;
+    }
+
+    template<class ResultType> ResultType didFail(HTMLFastPathResult result, ResultType res)


These argument names: result and res, not very helpful. I suggest returnValue instead of res.

Will use ResultType -> ReturnValueType for the template parameter as well, to avoid confusion.

darinadler · 2023-02-11T01:54:20Z

Source/WebCore/html/parser/HTMLDocumentParserFastPath.cpp

+        std::sort(m_attributeNames.begin(), m_attributeNames.end());
+        if (std::adjacent_find(m_attributeNames.begin(), m_attributeNames.end()) != m_attributeNames.end()) {


Is sorting really the fastest way to find duplicates? Seems unlikely. I think a HashSet<AtomString> would be faster.

Ok. I'll add a FIXME comment for now and evaluate this in a follow-up.

Well, interestingly, switching to a HashSet appears to be a confirmed 0.8% regression on Speedometer on iOS.
My bet at the moment is that the issue with the HashSet is that it we don't have a way to clear it while retaining its internal capacity. Unlike a Vector where you can resize(0) and drop its items while retaining its capacity. This is helpful since the parser currently reuses the same parser between elements.

I think we have varieties of HashSet including ones with minimum capacity so we don't spend so much time reallocating memory. If the regression really about the memory allocation and not the hashing I think we can fix it. But maybe this is already fast enough.

But I guess all of them lack a clear function that clears in place. I do think a version that only grows and never shrinks could be useful in many places.

I am A/B testing a version with a HashSet that doesn't clear capacity so see if it is faster. If not, I guess the cost would be the hashing.

So the version that uses a HashSet and clears it while maintaining its capacity is perf-neutral. It's a little disappointing.
We may still want to do it since a HashSet would be less prone to pathological cases than vector sorting & deduping.

cdumez · 2023-02-11T02:39:27Z

Source/WebCore/html/parser/HTMLDocumentParserFastPath.cpp

+    // Used if the attribute name contains upper case ascii (which must be mapped to lower case).
+    // 32 matches that used by HTMLToken::Attribute.
+    Vector<Char, 32> m_attributeNameBuffer;
+    Vector<Attribute> m_attributeBuffer;


FYI, this vector is a big contributor to fastMalloc() in the profiles. I think ideally we'd use Vector<Attribute, 10> like we do in the full HTML parser. However, this requires some code changes to make it build/work. I am currently A/B testing such change.

webkit-early-warning-system · 2023-02-11T03:18:37Z

EWS run on current version of this PR (hash ba5d5d4)

Misc	iOS, tvOS & watchOS	macOS	Linux	Windows
🧪 style	🛠 ios	⏳ 🛠 mac	🛠 wpe	⏳ 🛠 wincairo
🧪 bindings	⏳ 🛠 ios-sim	⏳ 🛠 mac-AS-debug	🛠 gtk
🧪 webkitperl	⏳ 🧪 ios-wk2	⏳ 🧪 api-mac	🧪 gtk-wk2
	⏳ 🧪 api-ios	⏳ 🧪 mac-wk1	🧪 api-gtk
⏳ 🛠 🧪 jsc	⏳ 🛠 tv	⏳ 🧪 mac-wk2	⏳ 🛠 jsc-armv7
🛠 🧪 jsc-arm64	⏳ 🛠 tv-sim	⏳ 🧪 mac-AS-debug-wk2	⏳ 🧪 jsc-armv7-tests
	⏳ 🛠 watch	⏳ 🧪 mac-wk2-stress	⏳ 🛠 jsc-mips
	🛠 watch-sim		⏳ 🧪 jsc-mips-tests

webkit-early-warning-system · 2023-02-11T03:18:39Z

EWS run on current version of this PR (hash ba5d5d4)

Misc	iOS, tvOS & watchOS	macOS	Linux	Windows
❌ 🧪 style	✅ 🛠 ios	✅ 🛠 mac	✅ 🛠 wpe	~~🛠 wincairo~~
✅ 🧪 bindings	✅ 🛠 ios-sim	✅ 🛠 mac-AS-debug	✅ 🛠 gtk
✅ 🧪 webkitperl	~~🧪 ios-wk2~~	✅ 🧪 api-mac	~~🧪 gtk-wk2~~
	✅ 🧪 api-ios	~~🧪 mac-wk1~~	~~🧪 api-gtk~~
✅ 🛠 🧪 jsc	✅ 🛠 tv	✅ 🧪 mac-wk2	~~🛠 jsc-armv7~~
✅ 🛠 🧪 jsc-arm64	✅ 🛠 tv-sim	~~🧪 mac-AS-debug-wk2~~	~~🧪 jsc-armv7-tests~~
	✅ 🛠 watch	✅ 🧪 mac-wk2-stress	✅ 🛠 jsc-mips
✅ 🛠 🧪 merge	✅ 🛠 watch-sim		✅ 🧪 jsc-mips-tests

https://bugs.webkit.org/show_bug.cgi?id=252054 Reviewed by Darin Adler. This patch imports the new fast HTML parser from Blink: https://bugs.chromium.org/p/chromium/issues/detail?id=1407201 This patch adds a new parser that handles a subset of HTML and is only used by Element::set[Inner|Outer]HTML. While limited, the hope is this parser is useful for a number of pages. We could expand on the set of cases this handles in the future. As a sanity check, there is a debug assertion in place to make sure that the fast path generates the same output as the full HTML parser would have. This is a verified 2.25% progression on Speedometer on macOS and 2.60% progression on iOS. * Source/WTF/wtf/text/WTFString.h: * Source/WebCore/Sources.txt: * Source/WebCore/WebCore.xcodeproj/project.pbxproj: * Source/WebCore/dom/Document.h: (WebCore::Document::isDirAttributeDirty const): (WebCore::Document::setIsDirAttributeDirty): (WebCore::Document::isTemplateDocument const): * Source/WebCore/editing/markup.cpp: (WebCore::createFragmentForMarkup): * Source/WebCore/html/HTMLBDIElement.cpp: (WebCore::HTMLBDIElement::HTMLBDIElement): * Source/WebCore/html/HTMLButtonElement.cpp: (WebCore::HTMLButtonElement::create): * Source/WebCore/html/HTMLButtonElement.h: * Source/WebCore/html/HTMLElement.cpp: (WebCore::HTMLElement::childrenChanged): (WebCore::HTMLElement::dirAttributeChanged): * Source/WebCore/html/HTMLInputElement.cpp: (WebCore::HTMLInputElement::~HTMLInputElement): (WebCore::HTMLInputElement::needsSuspensionCallback): We have an optimization in HTMLInputElement which delays the initialization of m_inputType when constructed by the HTML parser. However, in the fast-parsing case, we may now fail parsing (due to limited support) in the middle of parsing an HTMLInputElement. As a result, we may end up destroying an HTMLInputElement before its m_inputType gets initialized, which would cause a crash. I added a couple of null checks for m_inputType so that the HTMLInputElement destructor can run even if m_inputType wasn't initialized yet. * Source/WebCore/html/HTMLLabelElement.cpp: (WebCore::HTMLLabelElement::create): * Source/WebCore/html/HTMLLabelElement.h: * Source/WebCore/html/HTMLSelectElement.cpp: (WebCore::HTMLSelectElement::create): * Source/WebCore/html/HTMLSelectElement.h: * Source/WebCore/html/parser/HTMLDocumentParserFastPath.cpp: Added. (WebCore::operator==): (WebCore::onlyContainsLowercaseASCIILetters): (WebCore::tagNameHash): (WebCore::HTMLFastPathParser::HTMLFastPathParser): (WebCore::HTMLFastPathParser::parse): (WebCore::HTMLFastPathParser::parseResult const): (WebCore::HTMLFastPathParser::TagInfo::Tag::create): (WebCore::HTMLFastPathParser::TagInfo::Tag::allowedInPhrasingOrFlowContent): (WebCore::HTMLFastPathParser::TagInfo::Tag::allowedInFlowContent): (WebCore::HTMLFastPathParser::TagInfo::ContainerTag::parseChild): (WebCore::HTMLFastPathParser::TagInfo::ContainsPhrasingContentTag::parseChild): (WebCore::HTMLFastPathParser::TagInfo::A::parseChild): (WebCore::HTMLFastPathParser::TagInfo::AWithPhrasingContent::parseChild): (WebCore::HTMLFastPathParser::TagInfo::B::create): (WebCore::HTMLFastPathParser::TagInfo::Footer::create): (WebCore::HTMLFastPathParser::TagInfo::I::create): (WebCore::HTMLFastPathParser::TagInfo::Input::create): (WebCore::HTMLFastPathParser::TagInfo::Option::parseChild): (WebCore::HTMLFastPathParser::TagInfo::Ol::parseChild): (WebCore::HTMLFastPathParser::TagInfo::Select::parseChild): (WebCore::HTMLFastPathParser::TagInfo::Strong::create): (WebCore::HTMLFastPathParser::TagInfo::Ul::parseChild): (WebCore::HTMLFastPathParser::parseCompleteInput): (WebCore::HTMLFastPathParser::isWhitespace): (WebCore::HTMLFastPathParser::isValidUnquotedAttributeValueChar): (WebCore::HTMLFastPathParser::isValidAttributeNameChar): (WebCore::HTMLFastPathParser::isCharAfterTagNameOrAttribute): (WebCore::HTMLFastPathParser::isCharAfterUnquotedAttribute): (WebCore::HTMLFastPathParser::skipWhitespace): (WebCore::HTMLFastPathParser::scanText): (WebCore::HTMLFastPathParser::scanEscapedText): (WebCore::HTMLFastPathParser::scanTagName): (WebCore::HTMLFastPathParser::scanAttributeName): (WebCore::HTMLFastPathParser::scanAttributeValue): (WebCore::HTMLFastPathParser::scanEscapedAttributeValue): (WebCore::HTMLFastPathParser::scanHTMLCharacterReference): (WebCore::HTMLFastPathParser::didFail): (WebCore::HTMLFastPathParser::peekNext): (WebCore::HTMLFastPathParser::consumeNext): (WebCore::HTMLFastPathParser::parseChildren): (WebCore::HTMLFastPathParser::processAttribute): (WebCore::HTMLFastPathParser::parseAttributes): (WebCore::HTMLFastPathParser::parseSpecificElements): (WebCore::HTMLFastPathParser::parseElement): (WebCore::HTMLFastPathParser::parseElementAfterTagName): (WebCore::HTMLFastPathParser::parseContainerElement): (WebCore::HTMLFastPathParser::parseVoidElement): (WebCore::canUseFastPath): (WebCore::tryFastParsingHTMLFragmentImpl): (WebCore::tryFastParsingHTMLFragment): * Source/WebCore/html/parser/HTMLDocumentParserFastPath.h: Added. * Source/WebCore/html/parser/HTMLEntityParser.cpp: (WebCore::appendLegalEntityFor): * Source/WebCore/html/parser/HTMLEntityParser.h: * Source/WebCore/html/parser/HTMLNameCache.h: (WebCore::HTMLNameCache::makeAttributeQualifiedName): (WebCore::HTMLNameCache::makeQualifiedName): Canonical link: https://commits.webkit.org/260148@main

webkit-commit-queue · 2023-02-11T07:53:19Z

Committed 260148@main (236ad6c): https://commits.webkit.org/260148@main

Reviewed commits have been landed. Closing PR #9926 and removing active labels.

cdumez self-assigned this Feb 10, 2023

cdumez added the WebCore Misc. For miscellaneous bugs in the WebCore framework (and not JavaScriptCore or WebKit). label Feb 10, 2023

cdumez force-pushed the Blink_FastHTMLParser branch from 7d85c96 to 64b7e15 Compare February 10, 2023 17:12

cdumez force-pushed the Blink_FastHTMLParser branch from 64b7e15 to be56d57 Compare February 10, 2023 17:50

weinig reviewed Feb 10, 2023

View reviewed changes

cdumez marked this pull request as ready for review February 11, 2023 00:45

cdumez requested a review from rniwa as a code owner February 11, 2023 00:45

cdumez requested review from darinadler and geoffreygaren February 11, 2023 00:45

darinadler approved these changes Feb 11, 2023

View reviewed changes

cdumez commented Feb 11, 2023

View reviewed changes

cdumez force-pushed the Blink_FastHTMLParser branch from be56d57 to ba5d5d4 Compare February 11, 2023 03:17

cdumez added the merge-queue Applied to send a pull request to merge-queue label Feb 11, 2023

webkit-early-warning-system force-pushed the Blink_FastHTMLParser branch from ba5d5d4 to 236ad6c Compare February 11, 2023 07:53

webkit-early-warning-system merged commit 236ad6c into WebKit:main Feb 11, 2023

webkit-commit-queue removed the merge-queue Applied to send a pull request to merge-queue label Feb 11, 2023

		bool consumeHTMLEntity(SegmentedString&, StringBuilder& decodedEntity, bool& notEnoughCharacters, UChar additionalAllowedCharacter = '\0');
		void appendLegalEntityFor(UChar32, StringBuilder& decodedEntity);

		std::sort(m_attributeNames.begin(), m_attributeNames.end());
		if (std::adjacent_find(m_attributeNames.begin(), m_attributeNames.end()) != m_attributeNames.end()) {

Introduce fast path for innerHTML #9926

Introduce fast path for innerHTML #9926

Conversation

cdumez commented Feb 10, 2023 • edited by webkit-early-warning-system Loading

webkit-early-warning-system commented Feb 10, 2023 • edited Loading

webkit-early-warning-system commented Feb 10, 2023 • edited Loading

webkit-early-warning-system commented Feb 10, 2023 • edited Loading

weinig commented Feb 10, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cdumez commented Feb 10, 2023

Choose a reason for hiding this comment

cdumez Feb 10, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

darinadler left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

darinadler Feb 11, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

webkit-early-warning-system commented Feb 11, 2023 • edited Loading

webkit-early-warning-system commented Feb 11, 2023 • edited Loading

webkit-commit-queue commented Feb 11, 2023

cdumez commented Feb 10, 2023 •

edited by webkit-early-warning-system

Loading

webkit-early-warning-system commented Feb 10, 2023 •

edited

Loading

webkit-early-warning-system commented Feb 10, 2023 •

edited

Loading

webkit-early-warning-system commented Feb 10, 2023 •

edited

Loading

cdumez Feb 10, 2023 •

edited

Loading

darinadler Feb 11, 2023 •

edited

Loading

webkit-early-warning-system commented Feb 11, 2023 •

edited

Loading

webkit-early-warning-system commented Feb 11, 2023 •

edited

Loading