RESOLVED FIXED55898
REGRESSION (HTML5 tree builder): Text selection in a large text document is extremely slow
https://bugs.webkit.org/show_bug.cgi?id=55898
Summary REGRESSION (HTML5 tree builder): Text selection in a large text document is e...
Alexey Proskuryakov
Reported 2011-03-07 13:08:45 PST
See bug URL. Selecting near the end of document is extremely slow. This regressed because a fix for bug 12833 has been reverted in order to match HTML5. Current releases of Safari and Firefox limit the size of parser inserted text nodes. This has been forbidden in HTML5 and dropped in implementations because client code can break when trying to match text that crosses an engine node boundary, and we shouldn't surprise JS developers like that. I'm not sure if this was a good tradeoff: 1) Most client code needs to deal with text crossing node boundaries anyway (e.g. for style changes, or for line breaks, or for DOM-inserted nodes). 2) We shouldn't surprise client code developers with huge amounts of text either. Just like we bend backwards to spare them from threading or reentrancy issues, we can limit the harm from non-linear algorithms here. 3) The failure mode for an O(N^2) algorithm on an unlimited length text (freezing) is often worse than missing a match in a huge text due to hitting a boundary. This has also been affecting clients of Objective C WebKit API (<rdar://problem/9013049> for those who can see it). There isn't much difference from JavaScript case, with one twist: when using Objective C autorelease pools, a client can easily take a huge hit in memory consumption, in addition to CPU performance. Regardless of what we do long term, I think that we need to re-introduce the node size limit in WebKit until we can sort out internal performance, and get clients updated.
Attachments
proposed fix (21.39 KB, patch)
2011-03-07 16:45 PST, Alexey Proskuryakov
no flags
Adam Barth
Comment 1 2011-03-07 13:13:38 PST
I gave ap some pointers on IRC to where in the code needs to be changed. Eric seems to think he did this already, but I can't find any evidence of that.
Boris Zbarsky
Comment 2 2011-03-07 13:18:48 PST
For what it's worth, the most common compat issue we had with the old behavior in Gecko was sites using XHR and then extracting the text from a node in the responseXML. They _could_ use .textContent, but tended to use .firstChild.data, and this would break in random ways depending on the data... We did have to fix a few of the resulting performance issues in Gecko 2, yes. Wasn't that big a deal.
Eric Seidel (no email)
Comment 3 2011-03-07 15:51:11 PST
I'll dig around. I swear I implemented text node splitting when doing the new parser.
Alexey Proskuryakov
Comment 4 2011-03-07 16:34:10 PST
I have a patch that I'm testing now.
Alexey Proskuryakov
Comment 5 2011-03-07 16:45:38 PST
Created attachment 84996 [details] proposed fix Even if we improve text selection performance, add a quirk for Mail, and decide to not split the nodes, this code will still be needed for the quirk.
Darin Adler
Comment 6 2011-03-07 18:19:10 PST
Comment on attachment 84996 [details] proposed fix Looks good.
Eric Seidel (no email)
Comment 7 2011-03-07 18:56:06 PST
Comment on attachment 84996 [details] proposed fix OK.
WebKit Commit Bot
Comment 8 2011-03-07 19:18:05 PST
Comment on attachment 84996 [details] proposed fix Clearing flags on attachment: 84996 Committed r80526: <http://trac.webkit.org/changeset/80526>
WebKit Commit Bot
Comment 9 2011-03-07 19:18:11 PST
All reviewed patches have been landed. Closing bug.
Note You need to log in before you can comment on or make changes to this bug.