1
0
Fork 0
mirror of https://github.com/miniflux/v2.git synced 2025-08-06 17:41:00 +00:00
Commit graph

61 commits

Author SHA1 Message Date
jvoisin
43546976d2 refactor(tests): use b.Loop() instead of for range b.N
See https://tip.golang.org/doc/go1.24#new-benchmark-function
2025-06-18 20:12:55 -07:00
Frédéric Guillot
6af4d69c39 test(sanitizer): add test case to cover Vimeo iframe rewrite without query string 2025-06-17 17:55:39 -07:00
Frédéric Guillot
27015a5e34 test(sanitizer): add unit test for 0x0 pixel tracker 2025-06-17 17:42:55 -07:00
jvoisin
cdb57b3843 perf(sanitizer): minor simplifications of the sanitizer
- Factorize some conditions
- Remove useless `default` case and move the return at the end of the functions
- Use strings.CutPrefix instead of strings.HasPrefix + strings.TrimPrefix
- Use switch-case constructs instead of slices.Contains, as this reduces the
  complexity of the functions and allows them to be inlined, as well as helping
  the compiler to optimize them, as it sucks at interprocedural optimizations.
2025-06-17 17:42:45 -07:00
jvoisin
152ef578d2 feat(sanitizer): consider images of size 0x0 as pixel trackers 2025-06-17 17:32:00 -07:00
jvoisin
b296f21e98 refactor(internal): add an urllib.DomainWithoutWWW function 2025-06-17 17:27:36 -07:00
jvoisin
237672a62c perf(sanitizer): use a switch-case instead of a map
This removes a heap allocation, and should be way faster. It also makes the
code shorted/simpler.
2025-06-16 14:54:48 -07:00
jvoisin
e9d4a130fd refactor(sanitizer): remove two useless www. prefixes
No need to have those prefixes, as the check is for substrings, so removing
them will improve the amount of matches.
2025-06-16 14:53:15 -07:00
Frédéric Guillot
b95c9023ee refactor(sanitizer): make isValidAttribute() check O(1) 2025-06-13 21:44:25 -07:00
Frédéric Guillot
3538c4271b refactor(sanitizer): use global variables to avoid recreating slices on every call 2025-06-13 21:34:07 -07:00
Frédéric Guillot
ac44507af2 refactor(sanitizer): use a map for iframe allow list 2025-06-13 21:05:23 -07:00
jvoisin
44c48d109f perf(sanitizer): extract a call to url.Parse and make intensive use of it
Previously, url.Parse(baseUrl) was called on every self-closing tags, and on
most opening tags, accounting for around 15% of the CPU time spent in
processor.ProcessFeedEntries
2025-06-13 17:05:17 -07:00
jvoisin
7c857bdc72 perf(reader): optimize RemoveTrackingParameters
A bit more than 10% of processor.ProcessFeedEntries' CPU time is spent in
urlcleaner.RemoveTrackingParameters, specifically calling url.Parse, so let's
extract this operation outside of it, and do it once before calling
urlcleaner.RemoveTrackingParameters multiple times.

Co-authored-by: Frédéric Guillot <f@miniflux.net>
2025-06-10 19:29:25 -07:00
Frédéric Guillot
cecc18420d feat(sanitizer): add validation for empty width and height attributes in img tags 2025-06-09 20:38:17 -07:00
Frédéric Guillot
d53fd17e10 feat(sanitizer): validate MathML XML namespace 2025-06-09 20:28:54 -07:00
Frédéric Guillot
21d22d7f0b feat(sanitizer): add support for fetchpriority and decoding attributes in img tags 2025-06-09 20:12:15 -07:00
Frédéric Guillot
8db637cb39 feat(ui): add user setting to control target="_blank" on links
Rationale: Opening links in the current tab is the default browser behavior.

Using `target="_blank"` on external links can lead to accessibility issues and override user preferences. It may also interfere with assistive technologies and expected browser behavior.

To maintain backward compatibility, this option is enabled by default (`true`), which adds `target="_blank"` to links.
2025-06-08 21:07:11 -07:00
Frédéric Guillot
828a4334db fix(sanitizer): MathML tags are not fully supported by golang.org/x/net/html
See https://github.com/golang/net/blob/master/html/atom/gen.go
and https://github.com/golang/net/blob/master/html/atom/table.go
2025-05-06 21:18:19 -07:00
jvoisin
d1dc369bb2 feat(sanitizer): add MathML tags to the sanitizer
This was found by reading the article pointed by https://lobste.rs/s/nobvmp/how_prime_factorizations_govern_collatz
2025-05-06 20:19:56 -07:00
jvoisin
ff2dfe977b feat: remove the ref parameter from url
This is used by (at least) Ghost (https://forum.ghost.org/t/ref-parameter-being-added-to-links/38335)

Examples:
- https://blog.exploits.club/exploits-club-weekly-newsletter-66-mitigations-galore-dirtycow-revisited-program-analysis-for-uafs-and-more/
- https://labs.watchtowr.com/is-the-sofistication-in-the-room-with-us-x-forwarded-for-and-ivanti-connect-secure-cve-2025-22457/
2025-05-06 19:59:55 -07:00
NoelNegash
81c7669945
feat(sanitized): allow Spotify iframes 2025-05-02 16:25:17 -07:00
Frédéric Guillot
e342a4f143 fix: address minor issues detected by Go linters 2025-03-24 20:48:46 -07:00
jvoisin
f916373f55 fix: allow the <b> tag 2025-03-06 19:27:30 -08:00
jvoisin
5353211206 fix: allow the <u> tag in feeds 2025-03-06 19:26:26 -08:00
jvoisin
4a77e937af perf(sanitizer): remove two useless calls to strings.ReplaceAll
The [strings.Fields](https://pkg.go.dev/strings#Fields) considers `'\t', '\n',
'\v', '\f', '\r', ' ', U+0085 (NEL), U+00A0 (NBSP).` as spaces, so no need to
remove them beforehand.

This is a continuation of f2f60a8f73
2025-02-18 19:42:39 -08:00
Frédéric Guillot
462ba8d7f7 feat(sanitizer): allow img tags with only a srcset and no src attribute 2025-02-15 18:03:36 -08:00
Frédéric Guillot
f2f60a8f73 feat(sanitizer): improve text truncation with better space handling 2025-02-06 21:21:49 -08:00
Frédéric Guillot
e777f12490 fix(sanitizer): correct HTML tag name from tfooter to tfoot 2025-02-06 21:16:29 -08:00
Frédéric Guillot
1faccc7eca fix(sanitizer): non-allowed attributes are not properly stripped
Regression introduced in commit 58178d90cb
2025-01-22 20:50:38 -08:00
Julien Voisin
f116f7dd6a
test(sanitizer): add a fuzzer 2025-01-11 17:19:31 -08:00
Frédéric Guillot
5549f75dd7 fix(sanitizer): allow <hr> tags 2024-12-27 13:56:06 -08:00
Julien Voisin
728423339a
refactor(sanitizer): improve rewriteIframeURL()
- Use `url.Parse` instead of a regex, as this is much faster and way more robust
- Add support for Vimeo's Do Not Track parameter
2024-12-09 17:14:54 -08:00
Julien Voisin
dea46ac0ea
refactor: optimize sanitizeAttributes
- Use string concatenation instead of `Sprintf`, as this is much faster, and the
  call to `Sprintf` is responsible for 30% of the CPU time of the function
- Anchor the youtube regex, to allow it to bail early, as this also account for
  another 30% of the CPU time. It might be worth chaining calls to `TrimPrefix`
  and check if the string has been trimmed instead of using a regex, to speed
  things up even more, but this needs to be benchmarked properly.
2024-12-08 14:42:18 -08:00
jvoisin
2f56ebd3a6 Remove a now-useless function 2024-12-07 16:50:18 -08:00
jvoisin
059f5c0905 Inline a condition 2024-12-07 16:50:18 -08:00
jvoisin
58178d90cb Refactor Sanitize
- Use `token.String()` instead of `html.EscapeString(token.Data)`
- Refactor conditions to highlight their similitude, enabling further
  refactoring

This refactoring brings forth at least one bug: `tagStack` is never emptied.
2024-12-07 16:50:18 -08:00
jvoisin
cc885bbabb config.Opts is guaranteed to never be nil 2024-12-07 16:50:18 -08:00
jvoisin
0e185849b4 Google+ isn't a thing anymore 2024-12-07 16:50:18 -08:00
jvoisin
d0984f29da Simplify isValidTag 2024-12-07 16:50:18 -08:00
jvoisin
902ca63c45 Inline a function and fix a bug in it
The `isAnchor` function's first parameter was always `a`, instead of being
passed `tagName`. As this function is a single line and was only called in a
single place, it can be inlined.
2024-12-07 16:50:18 -08:00
jvoisin
2314500515 Merge two conditions 2024-12-07 16:50:18 -08:00
jvoisin
787d373211 Change the scope of a variable 2024-12-07 16:50:18 -08:00
Julien Voisin
bfb429b919
refactor(sanitizer): optimize internal/reader/sanitizer/strip_tags.go
- Use strings instead of doing string->bytes->string
- Use a strings.Builder to build the output
2024-12-07 16:31:48 -08:00
Julien Voisin
331c831c23
refactor(sanitizer): simplify hasRequiredAttributes
This function takes around 1.5% of the total CPU time on my instance, and most
of it is spent in `mapassign_faststr` to initialize the `map`. This is replaced
with a switch-case construct, that should be both significantly faster as well
as pretty dull in term of memory consumption.
2024-12-07 16:30:15 -08:00
Julien Voisin
92a49d7e69
refactor(sanitizer): micro-optimizations of internal/reader/sanitizer/srcset.go
- Pre-allocate a slice
- Inline a local variable
- Remove a superfluous call to `strings.TrimSpace`
- Simplify some conditions via a switch-case construct
2024-12-07 16:27:56 -08:00
Frédéric Guillot
92f3dc26e4 feat: add support for aside HTML element in entry content 2024-07-25 21:11:37 -07:00
Frédéric Guillot
c0f6e32a99 feat: remove well-known URL parameter trackers 2024-07-19 21:35:47 -07:00
JohnnyJayJay
ee5e18ea9f sanitizer: add support for HTML hidden attribute
This commit adjusts the `Sanitize` function to skip tags with the
`hidden` attribute, similar to how it skips blocked tags and their
contents.
2024-06-21 14:00:40 -07:00
Frédéric Guillot
b1e73fafdf Enable go-critic linter and fix various issues detected 2024-03-17 13:52:34 -07:00
jvoisin
3d0126be0b Speed the sanitizer up a bit, again
- allow youtube urls to start with `www`
- use `strings.Builder` instead of a `bytes.Buffer`
- use a `strings.NewReader` instead of a `bytes.NewBufferString`
- sprinkles a couple of `continue` to make the code-flow more obvious
- inline calls to `inList`, and put their parameters in the right order
- simplify isPixelTracker
- simplify `isValidIframeSource`, by extracting the hostname and comparing it
  directly, instead of using the full url and checking if it starts with
  multiple variations of the same one (`//`, `http:`, `https://` multiplied by
  ``/`www.`)
- add a benchmark
2024-03-05 19:31:50 -08:00