Previously, url.Parse(baseUrl) was called on every self-closing tags, and on
most opening tags, accounting for around 15% of the CPU time spent in
processor.ProcessFeedEntries
The `sanitizer.StripTags` function is calling `html.NewTokenizer`, which is
allocating a 4096 bytes buffer on the heap, as well a running a complex state
machine to tokenize html. There is no need to do all of this for empty strings.
This commit also fixes a TrimSpace/StripTags call inversion.
A bit more than 10% of processor.ProcessFeedEntries' CPU time is spent in
urlcleaner.RemoveTrackingParameters, specifically calling url.Parse, so let's
extract this operation outside of it, and do it once before calling
urlcleaner.RemoveTrackingParameters multiple times.
Co-authored-by: Frédéric Guillot <f@miniflux.net>
Calls to urllib.AbsoluteURL take a bit less than 10% of the time spent in
parser.ParseFeed, completely parsing an url only to check if it's absolute, and
if not, to make it so.
Checking if it starts with `https://` or `http://` is usually enough to find if
an url is absolute, and if is doesn't, it's always possible to fall back to
urllib.AbsoluteURL.
This also comes with the advantage of reducing heap allocations, as most of the
time spent in urllib.AbsoluteURL is heap-related (de)allocations.
Instead of using bytes.Map which is returning a copy of the provided []byte,
use a custom in-place implementation, as the bytes.Map call is taking around
25% of rss.Parse
io.ReadAll is growing the underlying buffer progressively, while
io.Copy is able to allocate it in one go, which is significantly faster.
io.ReadAll is currently accounting for around 10% of the CPU time of rss.Parse
Rationale: Opening links in the current tab is the default browser behavior.
Using `target="_blank"` on external links can lead to accessibility issues and override user preferences. It may also interfere with assistive technologies and expected browser behavior.
To maintain backward compatibility, this option is enabled by default (`true`), which adds `target="_blank"` to links.
The [strings.Fields](https://pkg.go.dev/strings#Fields) considers `'\t', '\n',
'\v', '\f', '\r', ' ', U+0085 (NEL), U+00A0 (NBSP).` as spaces, so no need to
remove them beforehand.
This is a continuation of f2f60a8f73