mirror of https://github.com/miniflux/v2.git synced 2025-08-06 17:41:00 +00:00
Commit graph

54 commits

Author SHA1 Message Date
Julien Voisin
a8b4e88742
perf(sanitizer): improve the performance of the sanitizer (#3497)
- Grow the underlying buffer of SanitizeHTML's strings.Builder to 3/4 of the
  raw HTML from the start, to reduce the amount of iterative allocations. This
  number is a complete guesstimation, but it sounds reasonable to me.
- Add an `absoluteURLParsedBase` function to avoid parsing baseURL over and over.
2025-07-07 15:21:13 -07:00
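The pre-sizing idea from the first bullet above can be sketched as follows. This is a minimal illustration, not the project's code: `rebuildHTML` is a hypothetical stand-in for the sanitizer loop, and the 3/4 factor is simply the guess quoted in the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// rebuildHTML is a hypothetical stand-in for a sanitizer loop: it copies the
// input piece by piece into a strings.Builder that was grown up front.
func rebuildHTML(rawHTML string) string {
	var sb strings.Builder
	// Reserve 3/4 of the input length once, so most of the writes below fit
	// without the builder reallocating its buffer step by step.
	sb.Grow(len(rawHTML) * 3 / 4)
	for _, piece := range strings.SplitAfter(rawHTML, ">") {
		sb.WriteString(piece)
	}
	return sb.String()
}

func main() {
	fmt.Println(rebuildHTML("<p>hello</p><p>world</p>"))
}
```

If the output ends up longer than the reserved capacity, the builder still grows as usual, so the guess only affects the number of allocations and never correctness.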
Frédéric Guillot
cb617ff6e0 test(sanitizer): enhance tests for image width and height attributes 2025-07-01 20:52:45 -07:00
jvoisin
435a950d64 refactor(sanitizer): minor refactorization
Use a proper switch-case instead of a bunch of if.
2025-07-01 19:48:55 -07:00
jvoisin
cdb57b3843 perf(sanitizer): minor simplifications of the sanitizer
- Factorize some conditions
- Remove useless `default` case and move the return at the end of the functions
- Use strings.CutPrefix instead of strings.HasPrefix + strings.TrimPrefix
- Use switch-case constructs instead of slices.Contains, as this reduces the
  complexity of the functions and allows them to be inlined, as well as helping
  the compiler to optimize them, as it sucks at interprocedural optimizations.
2025-06-17 17:42:45 -07:00
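The `strings.CutPrefix` bullet above might look like this in isolation. Both helper names are hypothetical; the point is only that one call replaces the check-then-trim pair.

```go
package main

import (
	"fmt"
	"strings"
)

// trimTwoStep is the older pattern: test for the prefix, then trim it.
func trimTwoStep(s, prefix string) (string, bool) {
	if strings.HasPrefix(s, prefix) {
		return strings.TrimPrefix(s, prefix), true
	}
	return s, false
}

// trimOneStep does the same work with a single call (Go 1.20 or later).
func trimOneStep(s, prefix string) (string, bool) {
	return strings.CutPrefix(s, prefix)
}

func main() {
	rest, found := trimOneStep("data-src", "data-")
	fmt.Println(rest, found)
}
```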
jvoisin
152ef578d2 feat(sanitizer): consider images of size 0x0 as pixel trackers 2025-06-17 17:32:00 -07:00
jvoisin
b296f21e98 refactor(internal): add an urllib.DomainWithoutWWW function 2025-06-17 17:27:36 -07:00
jvoisin
237672a62c perf(sanitizer): use a switch-case instead of a map
This removes a heap allocation, and should be way faster. It also makes the
code shorter/simpler.
2025-06-16 14:54:48 -07:00
jvoisin
e9d4a130fd refactor(sanitizer): remove two useless www. prefixes
No need to have those prefixes, as the check is for substrings, so removing
them will increase the number of matches.
2025-06-16 14:53:15 -07:00
Frédéric Guillot
b95c9023ee refactor(sanitizer): make isValidAttribute() check O(1) 2025-06-13 21:44:25 -07:00
Frédéric Guillot
3538c4271b refactor(sanitizer): use global variables to avoid recreating slices on every call 2025-06-13 21:34:07 -07:00
Frédéric Guillot
ac44507af2 refactor(sanitizer): use a map for iframe allow list 2025-06-13 21:05:23 -07:00
jvoisin
44c48d109f perf(sanitizer): extract a call to url.Parse and make intensive use of it
Previously, url.Parse(baseUrl) was called on every self-closing tag, and on
most opening tags, accounting for around 15% of the CPU time spent in
processor.ProcessFeedEntries.
2025-06-13 17:05:17 -07:00
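A rough sketch of the hoisting described above: the base URL is parsed once, and the parsed value is reused for every reference. `absoluteURLs` is a hypothetical helper, not the sanitizer's actual function.

```go
package main

import (
	"fmt"
	"net/url"
)

// absoluteURLs resolves every reference against a base URL that is parsed
// only once, instead of re-parsing the base for each reference.
func absoluteURLs(baseURL string, refs []string) ([]string, error) {
	base, err := url.Parse(baseURL)
	if err != nil {
		return nil, err
	}
	out := make([]string, 0, len(refs))
	for _, ref := range refs {
		u, err := url.Parse(ref)
		if err != nil {
			continue // skip references that cannot be parsed
		}
		out = append(out, base.ResolveReference(u).String())
	}
	return out, nil
}

func main() {
	urls, err := absoluteURLs("https://example.org/blog/", []string{"img/a.png", "/b.png"})
	fmt.Println(urls, err)
}
```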
jvoisin
7c857bdc72 perf(reader): optimize RemoveTrackingParameters
A bit more than 10% of processor.ProcessFeedEntries' CPU time is spent in
urlcleaner.RemoveTrackingParameters, specifically calling url.Parse, so let's
extract this operation outside of it, and do it once before calling
urlcleaner.RemoveTrackingParameters multiple times.

Co-authored-by: Frédéric Guillot <f@miniflux.net>
2025-06-10 19:29:25 -07:00
Frédéric Guillot
cecc18420d feat(sanitizer): add validation for empty width and height attributes in img tags 2025-06-09 20:38:17 -07:00
Frédéric Guillot
d53fd17e10 feat(sanitizer): validate MathML XML namespace 2025-06-09 20:28:54 -07:00
Frédéric Guillot
21d22d7f0b feat(sanitizer): add support for fetchpriority and decoding attributes in img tags 2025-06-09 20:12:15 -07:00
Frédéric Guillot
8db637cb39 feat(ui): add user setting to control target="_blank" on links
Rationale: Opening links in the current tab is the default browser behavior.

Using `target="_blank"` on external links can lead to accessibility issues and override user preferences. It may also interfere with assistive technologies and expected browser behavior.

To maintain backward compatibility, this option is enabled by default (`true`), which adds `target="_blank"` to links.
2025-06-08 21:07:11 -07:00
Frédéric Guillot
828a4334db fix(sanitizer): MathML tags are not fully supported by golang.org/x/net/html
See https://github.com/golang/net/blob/master/html/atom/gen.go
and https://github.com/golang/net/blob/master/html/atom/table.go
2025-05-06 21:18:19 -07:00
jvoisin
d1dc369bb2 feat(sanitizer): add MathML tags to the sanitizer
This was found by reading the article pointed by https://lobste.rs/s/nobvmp/how_prime_factorizations_govern_collatz
2025-05-06 20:19:56 -07:00
jvoisin
ff2dfe977b feat: remove the ref parameter from url
This is used by (at least) Ghost (https://forum.ghost.org/t/ref-parameter-being-added-to-links/38335)

Examples:
- https://blog.exploits.club/exploits-club-weekly-newsletter-66-mitigations-galore-dirtycow-revisited-program-analysis-for-uafs-and-more/
- https://labs.watchtowr.com/is-the-sofistication-in-the-room-with-us-x-forwarded-for-and-ivanti-connect-secure-cve-2025-22457/
2025-05-06 19:59:55 -07:00
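Dropping a single query parameter such as `ref` can be sketched with `net/url` as below. `dropRef` is a hypothetical helper and only illustrates the idea; it is not the project's URL cleaner.

```go
package main

import (
	"fmt"
	"net/url"
)

// dropRef removes the "ref" query parameter and leaves the rest of the URL
// as it was. Unparseable input is returned unchanged.
func dropRef(rawURL string) string {
	u, err := url.Parse(rawURL)
	if err != nil {
		return rawURL
	}
	q := u.Query()
	q.Del("ref")
	u.RawQuery = q.Encode()
	return u.String()
}

func main() {
	fmt.Println(dropRef("https://example.org/post?ref=newsletter&id=7"))
}
```

Note that `Encode` sorts the remaining parameters by key, so their order may differ from the input.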
NoelNegash
81c7669945
feat(sanitized): allow Spotify iframes 2025-05-02 16:25:17 -07:00
Frédéric Guillot
e342a4f143 fix: address minor issues detected by Go linters 2025-03-24 20:48:46 -07:00
jvoisin
f916373f55 fix: allow the <b> tag 2025-03-06 19:27:30 -08:00
jvoisin
5353211206 fix: allow the <u> tag in feeds 2025-03-06 19:26:26 -08:00
Frédéric Guillot
462ba8d7f7 feat(sanitizer): allow img tags with only a srcset and no src attribute 2025-02-15 18:03:36 -08:00
Frédéric Guillot
e777f12490 fix(sanitizer): correct HTML tag name from tfooter to tfoot 2025-02-06 21:16:29 -08:00
Frédéric Guillot
1faccc7eca fix(sanitizer): non-allowed attributes are not properly stripped
Regression introduced in commit 58178d90cb
2025-01-22 20:50:38 -08:00
Frédéric Guillot
5549f75dd7 fix(sanitizer): allow <hr> tags 2024-12-27 13:56:06 -08:00
Julien Voisin
728423339a
refactor(sanitizer): improve rewriteIframeURL()
- Use `url.Parse` instead of a regex, as this is much faster and way more robust
- Add support for Vimeo's Do Not Track parameter
2024-12-09 17:14:54 -08:00
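One way the two bullets above could combine, as a sketch under assumptions: the URL is parsed with `url.Parse` rather than matched with a regex, and Vimeo's `dnt=1` parameter is appended for player URLs. `addVimeoDNT` is a hypothetical name and does not reproduce `rewriteIframeURL()`.

```go
package main

import (
	"fmt"
	"net/url"
)

// addVimeoDNT appends dnt=1 to Vimeo player URLs and returns every other
// URL unchanged. The host check is an assumption made for this sketch.
func addVimeoDNT(iframeURL string) string {
	u, err := url.Parse(iframeURL)
	if err != nil || u.Hostname() != "player.vimeo.com" {
		return iframeURL
	}
	q := u.Query()
	q.Set("dnt", "1")
	u.RawQuery = q.Encode()
	return u.String()
}

func main() {
	fmt.Println(addVimeoDNT("https://player.vimeo.com/video/123"))
}
```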
Julien Voisin
dea46ac0ea
refactor: optimize sanitizeAttributes
- Use string concatenation instead of `Sprintf`, as this is much faster, and the
  call to `Sprintf` is responsible for 30% of the CPU time of the function
- Anchor the youtube regex, to allow it to bail early, as this also accounts for
  another 30% of the CPU time. It might be worth chaining calls to `TrimPrefix`
  and check if the string has been trimmed instead of using a regex, to speed
  things up even more, but this needs to be benchmarked properly.
2024-12-08 14:42:18 -08:00
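The first bullet above, shown side by side as a minimal sketch. Both function names are hypothetical, and neither escapes the attribute value, which real sanitizer code would need to handle.

```go
package main

import "fmt"

// attrSprintf formats an attribute through fmt, which has to parse the
// format string and box its arguments on every call.
func attrSprintf(name, value string) string {
	return fmt.Sprintf(`%s="%s"`, name, value)
}

// attrConcat builds the same string with plain concatenation.
func attrConcat(name, value string) string {
	return name + `="` + value + `"`
}

func main() {
	fmt.Println(attrSprintf("href", "https://example.org/"))
	fmt.Println(attrConcat("href", "https://example.org/"))
}
```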
jvoisin
2f56ebd3a6 Remove a now-useless function 2024-12-07 16:50:18 -08:00
jvoisin
059f5c0905 Inline a condition 2024-12-07 16:50:18 -08:00
jvoisin
58178d90cb Refactor Sanitize
- Use `token.String()` instead of `html.EscapeString(token.Data)`
- Refactor conditions to highlight their similitude, enabling further
  refactoring

This refactoring brings forth at least one bug: `tagStack` is never emptied.
2024-12-07 16:50:18 -08:00
jvoisin
cc885bbabb config.Opts is guaranteed to never be nil 2024-12-07 16:50:18 -08:00
jvoisin
0e185849b4 Google+ isn't a thing anymore 2024-12-07 16:50:18 -08:00
jvoisin
d0984f29da Simplify isValidTag 2024-12-07 16:50:18 -08:00
jvoisin
902ca63c45 Inline a function and fix a bug in it
The `isAnchor` function's first parameter was always `a`, instead of being
passed `tagName`. As this function is a single line and was only called in a
single place, it can be inlined.
2024-12-07 16:50:18 -08:00
jvoisin
2314500515 Merge two conditions 2024-12-07 16:50:18 -08:00
jvoisin
787d373211 Change the scope of a variable 2024-12-07 16:50:18 -08:00
Julien Voisin
331c831c23
refactor(sanitizer): simplify hasRequiredAttributes
This function takes around 1.5% of the total CPU time on my instance, and most
of it is spent in `mapassign_faststr` to initialize the `map`. This is replaced
with a switch-case construct, that should be both significantly faster as well
as pretty dull in terms of memory consumption.
2024-12-07 16:30:15 -08:00
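A switch-based version of such a check could look roughly like this. The tag and attribute rules below are assumptions chosen for illustration, not the sanitizer's real list.

```go
package main

import (
	"fmt"
	"slices"
)

// hasRequiredAttributes reports whether a tag carries the attribute it
// needs. Because the rules live in a switch, no map is built per call.
func hasRequiredAttributes(tagName string, attrs []string) bool {
	switch tagName {
	case "a":
		return slices.Contains(attrs, "href")
	case "iframe", "img":
		return slices.Contains(attrs, "src")
	}
	// Tags without a rule have no required attribute in this sketch.
	return true
}

func main() {
	fmt.Println(hasRequiredAttributes("a", []string{"href", "title"}))
	fmt.Println(hasRequiredAttributes("img", []string{"alt"}))
}
```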
Frédéric Guillot
92f3dc26e4 feat: add support for aside HTML element in entry content 2024-07-25 21:11:37 -07:00
Frédéric Guillot
c0f6e32a99 feat: remove well-known URL parameter trackers 2024-07-19 21:35:47 -07:00
JohnnyJayJay
ee5e18ea9f sanitizer: add support for HTML hidden attribute
This commit adjusts the `Sanitize` function to skip tags with the
`hidden` attribute, similar to how it skips blocked tags and their
contents.
2024-06-21 14:00:40 -07:00
Frédéric Guillot
b1e73fafdf Enable go-critic linter and fix various issues detected 2024-03-17 13:52:34 -07:00
jvoisin
3d0126be0b Speed the sanitizer up a bit, again
- allow youtube urls to start with `www`
- use `strings.Builder` instead of a `bytes.Buffer`
- use a `strings.NewReader` instead of a `bytes.NewBufferString`
- sprinkle a couple of `continue` to make the code-flow more obvious
- inline calls to `inList`, and put their parameters in the right order
- simplify isPixelTracker
- simplify `isValidIframeSource`, by extracting the hostname and comparing it
  directly, instead of using the full url and checking if it starts with
  multiple variations of the same one (`//`, `http:`, `https://`, each with and
  without `www.`)
- add a benchmark
2024-03-05 19:31:50 -08:00
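The hostname comparison from the `isValidIframeSource` bullet might be sketched as below. `isAllowedIframeHost` and its two-entry allow list are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// isAllowedIframeHost extracts the hostname once and compares it directly,
// so scheme and "www." variants need no separate prefix checks.
func isAllowedIframeHost(src string) bool {
	u, err := url.Parse(src)
	if err != nil {
		return false
	}
	switch strings.TrimPrefix(u.Hostname(), "www.") {
	case "youtube.com", "player.vimeo.com": // assumed allow list
		return true
	}
	return false
}

func main() {
	fmt.Println(isAllowedIframeHost("//www.youtube.com/embed/abc"))
	fmt.Println(isAllowedIframeHost("https://youtube.com.evil.example/embed/abc"))
}
```

Comparing the parsed hostname also rejects look-alike URLs that merely start with an allowed string.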
Frédéric Guillot
c493f8921e Add missing regex anchor detected by CodeQL 2024-02-28 20:50:17 -08:00
jvoisin
f12d5131b0 Divide the sanitization time by 3
Instead of having to allocate a ~100 keys map containing possibly dynamic
values (at least to the go compiler), allocate it once in a global variable.
This significantly speeds things up, by reducing the garbage
collector/allocator involvements.

Local synthetic benchmarks have shown an improvement from 38% of wall time to
only 12%.
2024-02-28 20:00:13 -08:00
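The difference between rebuilding a map on each call and allocating it once can be sketched like this. The three-entry map and both function names are hypothetical; the commit describes a map of roughly 100 keys.

```go
package main

import "fmt"

// allowedTags is built once, when the package is initialized.
var allowedTags = map[string]struct{}{"a": {}, "p": {}, "img": {}}

// isAllowedRebuilt allocates and fills a fresh map on every call.
func isAllowedRebuilt(tag string) bool {
	allowed := map[string]struct{}{"a": {}, "p": {}, "img": {}}
	_, ok := allowed[tag]
	return ok
}

// isAllowedGlobal only performs a lookup in the shared map.
func isAllowedGlobal(tag string) bool {
	_, ok := allowedTags[tag]
	return ok
}

func main() {
	fmt.Println(isAllowedRebuilt("img"), isAllowedGlobal("img"))
	fmt.Println(isAllowedRebuilt("script"), isAllowedGlobal("script"))
}
```

Both functions return the same answers; the global variant just avoids the per-call allocation and the garbage it leaves behind.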
jvoisin
b04550e2f2 Use %q instead of "%s" 2024-02-28 19:47:30 -08:00
jvoisin
54b5be5e7d Significantly simplify/speed up the sanitizer
- Use constant time access for maps instead of iterating on them
- Build a ~large whitelist map inline instead of constructing it item by item
  (and remove a duplicate key/value pair)
- Use `slices` instead of hand-rolled loops
2024-02-25 17:29:46 -08:00
Kristof Mattei
d53ad3b79a fix: clicking youtube links in iframes returns ERR_BLOCKED_BY_RESPONSE 2023-12-10 16:59:58 -08:00