mirror of https://github.com/miniflux/v2.git synced 2025-06-27 16:36:00 +00:00
Commit graph

227 commits

Author SHA1 Message Date
jvoisin
8a014c6abc perf(readability): minor regex improvement
- Improve the check for tags by matching only if its name is followed either by
  a space, a slash or a closing angle
- Use an anonymous group
2025-06-12 19:13:58 -07:00
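A minimal sketch of the two regex changes described above, assuming a hypothetical tag list (`a`, `div`, `p`) rather than the one actually used by the readability package. The pattern matches a tag only when its name is followed by a space, a slash or a closing angle bracket, and it uses a non-capturing group `(?:...)` since the matched name is never read back.

```go
package main

import "regexp"

// tagRe is an illustrative pattern, not the project's actual expression.
// (?:...) groups the alternatives without capturing them, and [ />]
// requires the tag name to end right there, so "<p>" matches but "<pre>" does not.
var tagRe = regexp.MustCompile(`(?i)<(?:a|div|p)[ />]`)

// hasTag reports whether s contains one of the listed opening tags.
func hasTag(s string) bool {
	return tagRe.MatchString(s)
}
```

Without the trailing character class, a pattern such as `<(a|div|p)` would also match `<article>` or `<pre>`.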
jvoisin
60ad19c427 perf(rss): early return when looking for an item's author
The `sanitizer.StripTags` function calls `html.NewTokenizer`, which
allocates a 4096-byte buffer on the heap and runs a complex state
machine to tokenize HTML. There is no need to do all of this for empty strings.

This commit also fixes a TrimSpace/StripTags call inversion.
2025-06-11 19:06:15 -07:00
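The early return can be sketched as follows. This is an illustration only: `stripTags` is a naive stand-in for `sanitizer.StripTags` (the real one relies on the HTML tokenizer), `authorName` is a hypothetical helper name, and the order shown (strip tags first, then trim) is an assumption about what the inversion fix intends.

```go
package main

import "strings"

// stripTags is a simplified stand-in for sanitizer.StripTags: it drops
// everything between '<' and '>' as well as the angle brackets themselves.
func stripTags(s string) string {
	var b strings.Builder
	inTag := false
	for _, r := range s {
		switch {
		case r == '<':
			inTag = true
		case r == '>':
			inTag = false
		case !inTag:
			b.WriteRune(r)
		}
	}
	return b.String()
}

// authorName returns early for empty input so that no tag-stripping work
// (and, in the real code, no tokenizer allocation) happens at all.
func authorName(raw string) string {
	if raw == "" {
		return ""
	}
	// Assumed order: strip tags first, then trim the whitespace left behind.
	return strings.TrimSpace(stripTags(raw))
}
```

Trimming after stripping matters for inputs such as `<b> Jane </b>`, where the surrounding spaces only become leading and trailing once the tags are gone.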
jvoisin
f40c1e7f63 fix(reader): fix a crash introduced by d59990f1
Also add a fuzzer and a test case to validate that nothing breaks.
2025-06-11 19:04:46 -07:00
Frédéric Guillot
a4d16cc5c1 refactor(rewrite): rename Rewriter function to ApplyContentRewriteRules 2025-06-10 20:28:15 -07:00
jvoisin
7c857bdc72 perf(reader): optimize RemoveTrackingParameters
A bit more than 10% of processor.ProcessFeedEntries' CPU time is spent in
urlcleaner.RemoveTrackingParameters, specifically calling url.Parse, so let's
extract this operation outside of it, and do it once before calling
urlcleaner.RemoveTrackingParameters multiple times.

Co-authored-by: Frédéric Guillot <f@miniflux.net>
2025-06-10 19:29:25 -07:00
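A rough sketch of the idea, hoisting `url.Parse` out of the per-entry call. The function names, the blocklist and the `ref` rule below are hypothetical and only illustrate the shape of the change; they are not the signatures or rules of `urlcleaner.RemoveTrackingParameters`.

```go
package main

import (
	"net/url"
	"strings"
)

// Hypothetical blocklist for illustration; not the project's actual list.
var trackingParams = map[string]bool{
	"utm_source": true,
	"utm_medium": true,
}

// removeTrackingParams takes already-parsed URLs, so the caller decides
// how often url.Parse runs.
func removeTrackingParams(feed, entry *url.URL) string {
	q := entry.Query()
	for name, values := range q {
		lower := strings.ToLower(name)
		// Illustrative rule: drop "ref" only when it points back at the feed host.
		isSelfRef := lower == "ref" && len(values) > 0 && values[0] == feed.Hostname()
		if trackingParams[lower] || isSelfRef {
			q.Del(name)
		}
	}
	cleaned := *entry // shallow copy so the caller's URL is left untouched
	cleaned.RawQuery = q.Encode()
	return cleaned.String()
}

// cleanEntryURLs parses the feed URL once, outside the loop, instead of
// re-parsing it for every entry.
func cleanEntryURLs(feedURL string, entryURLs []string) ([]string, error) {
	feed, err := url.Parse(feedURL)
	if err != nil {
		return nil, err
	}
	out := make([]string, 0, len(entryURLs))
	for _, raw := range entryURLs {
		entry, err := url.Parse(raw)
		if err != nil {
			return nil, err
		}
		out = append(out, removeTrackingParams(feed, entry))
	}
	return out, nil
}
```

With this shape, a feed of N entries costs one parse of the feed URL rather than N.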
jvoisin
0caadf82f2 perf(rss): optimize a bit BuildFeed
Calls to urllib.AbsoluteURL take a bit less than 10% of the time spent in
parser.ParseFeed, completely parsing a URL only to check if it's absolute, and
if not, to make it so.

Checking if it starts with `https://` or `http://` is usually enough to find if
a URL is absolute, and if it doesn't, it's always possible to fall back to
urllib.AbsoluteURL.

This also comes with the advantage of reducing heap allocations, as most of the
time spent in urllib.AbsoluteURL is heap-related (de)allocations.
2025-06-10 19:23:16 -07:00
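The fast path described above might look like the sketch below. `absoluteURL` is a hypothetical stand-in written against the standard library only; it is not the body of `urllib.AbsoluteURL`, whose exact behavior is not shown here.

```go
package main

import (
	"net/url"
	"strings"
)

// absoluteURL returns input unchanged when it already carries an http(s)
// scheme, and only falls back to full parsing and resolution otherwise.
func absoluteURL(base, input string) (string, error) {
	// Fast path: a cheap prefix check, no parsing and no allocation.
	if strings.HasPrefix(input, "https://") || strings.HasPrefix(input, "http://") {
		return input, nil
	}
	// Slow path: parse both URLs and resolve input against base.
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(input)
	if err != nil {
		return "", err
	}
	return b.ResolveReference(ref).String(), nil
}
```

One trade-off of this sketch is that the fast path skips any normalization or validation the slow path would perform, so the two paths can return differently formatted strings for equivalent URLs.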
Frédéric Guillot
cecc18420d feat(sanitizer): add validation for empty width and height attributes in img tags 2025-06-09 20:38:17 -07:00
Frédéric Guillot
d53fd17e10 feat(sanitizer): validate MathML XML namespace 2025-06-09 20:28:54 -07:00
Frédéric Guillot
21d22d7f0b feat(sanitizer): add support for fetchpriority and decoding attributes in img tags 2025-06-09 20:12:15 -07:00
jvoisin
d59990f1dd perf(xml): optimize xml filtering
Instead of using bytes.Map which is returning a copy of the provided []byte,
use a custom in-place implementation, as the bytes.Map call is taking around
25% of rss.Parse
2025-06-09 13:49:10 -07:00
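A sketch of in-place filtering under stated assumptions: `filterInPlace` and `isValidXMLChar` are hypothetical names, the predicate follows the XML 1.0 character ranges, and invalid UTF-8 bytes are simply dropped, which may differ from what the project does. The point is that the write index never overtakes the read index, so the input slice can be reused instead of copied as `bytes.Map` does.

```go
package main

import "unicode/utf8"

// isValidXMLChar follows the character ranges allowed by XML 1.0.
func isValidXMLChar(r rune) bool {
	return r == 0x09 || r == 0x0A || r == 0x0D ||
		(r >= 0x20 && r <= 0xD7FF) ||
		(r >= 0xE000 && r <= 0xFFFD) ||
		(r >= 0x10000 && r <= 0x10FFFF)
}

// filterInPlace compacts b so that only valid XML characters remain and
// returns the shortened slice, which shares b's backing array.
func filterInPlace(b []byte) []byte {
	w := 0 // write index, always <= the read index i
	for i := 0; i < len(b); {
		r, size := utf8.DecodeRune(b[i:])
		invalidByte := r == utf8.RuneError && size == 1
		if !invalidByte && isValidXMLChar(r) {
			// Copy the original bytes instead of re-encoding the rune, so
			// the output can never grow past what has already been read.
			w += copy(b[w:], b[i:i+size])
		}
		i += size
	}
	return b[:w]
}
```

Because the result aliases the input, callers must not keep using the original slice as if it were unchanged.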
jvoisin
49085daefe perf(xml): optimized NewXMLDecoder
io.ReadAll is growing the underlying buffer progressively, while
io.Copy is able to allocate it in one go, which is significantly faster.
io.ReadAll is currently accounting for around 10% of the CPU time of rss.Parse
2025-06-09 13:49:10 -07:00
Frédéric Guillot
8db637cb39 feat(ui): add user setting to control target="_blank" on links
Rationale: Opening links in the current tab is the default browser behavior.

Using `target="_blank"` on external links can lead to accessibility issues and override user preferences. It may also interfere with assistive technologies and expected browser behavior.

To maintain backward compatibility, this option is enabled by default (`true`), which adds `target="_blank"` to links.
2025-06-08 21:07:11 -07:00
Frédéric Guillot
8142268799 feat: populate feed description automatically 2025-05-24 21:15:52 -07:00
Anton Larionov
553c578f2e feat(rssbridge): support auth token for RSS-Bridge 2025-05-19 20:47:12 -07:00
Frédéric Guillot
828a4334db fix(sanitizer): MathML tags are not fully supported by golang.org/x/net/html
See https://github.com/golang/net/blob/master/html/atom/gen.go
and https://github.com/golang/net/blob/master/html/atom/table.go
2025-05-06 21:18:19 -07:00
jvoisin
d1dc369bb2 feat(sanitizer): add MathML tags to the sanitizer
This was found by reading the article pointed to by https://lobste.rs/s/nobvmp/how_prime_factorizations_govern_collatz
2025-05-06 20:19:56 -07:00
jvoisin
ff2dfe977b feat: remove the ref parameter from url
This is used by (at least) Ghost (https://forum.ghost.org/t/ref-parameter-being-added-to-links/38335)

Examples:
- https://blog.exploits.club/exploits-club-weekly-newsletter-66-mitigations-galore-dirtycow-revisited-program-analysis-for-uafs-and-more/
- https://labs.watchtowr.com/is-the-sofistication-in-the-room-with-us-x-forwarded-for-and-ivanti-connect-secure-cve-2025-22457/
2025-05-06 19:59:55 -07:00
NoelNegash
81c7669945 feat(sanitized): allow Spotify iframes 2025-05-02 16:25:17 -07:00
Frédéric Guillot
d33e305af9 fix(api): hide_globally categories field should be a boolean 2025-04-21 19:43:25 -07:00
Frédéric Guillot
c87c93d85f feat(config): add SCHEDULER_ROUND_ROBIN_MAX_INTERVAL option
Add option to cap maximum refresh interval when RSS TTL, Retry-After, Cache-Control, or Expires headers specify excessively high values.
2025-04-11 15:40:32 -07:00
Frédéric Guillot
ef22e95f8b feat: implement proxy URL per feed 2025-04-06 21:05:19 -07:00
Frédéric Guillot
c45b51d1f8 feat: use Cache-Control max-age and Expires headers to calculate next check 2025-04-06 16:24:00 -07:00
Frédéric Guillot
0af1a6e121 refactor: avoid logging twice the feed errors in the background worker 2025-04-06 15:39:40 -07:00
Frédéric Guillot
535fd050b7 feat: add proxy rotation functionality 2025-04-06 14:59:00 -07:00
Frédéric Guillot
51560f191f fix(subscription): add /rss/feed.xml to the list of known feed URLs 2025-03-28 16:59:06 -07:00
Frédéric Guillot
e342a4f143 fix: address minor issues detected by Go linters 2025-03-24 20:48:46 -07:00
Frédéric Guillot
315e72c412 fix(rewrite): remove obsolete rule for webtoons.com 2025-03-06 20:11:03 -08:00
jvoisin
f916373f55 fix: allow the <b> tag 2025-03-06 19:27:30 -08:00
jvoisin
5353211206 fix: allow the <u> tag in feeds 2025-03-06 19:26:26 -08:00
AiraNadih
ad02f21d04 refactor(rewrite): reorganize referer rules and remove obsolete mapping 2025-03-02 19:40:52 -08:00
Maytham Alsudany
f01ff067a5 fix(processor): add missing quotation marks to import comments 2025-02-24 16:34:26 -08:00
jvoisin
117d711d7d feat(urlcleaner): add more Google Analytics parameters 2025-02-22 17:07:59 -08:00
jvoisin
4a77e937af perf(sanitizer): remove two useless calls to strings.ReplaceAll
[strings.Fields](https://pkg.go.dev/strings#Fields) treats `'\t', '\n',
'\v', '\f', '\r', ' ', U+0085 (NEL), U+00A0 (NBSP)` as spaces, so there is no
need to remove them beforehand.

This is a continuation of f2f60a8f73
2025-02-18 19:42:39 -08:00
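A short sketch of why the extra `strings.ReplaceAll` calls are redundant: `strings.Fields` splits on any run of Unicode whitespace, so tabs, newlines and non-breaking spaces are already handled. `collapseSpaces` is a hypothetical helper name, not a function from the sanitizer package.

```go
package main

import "strings"

// collapseSpaces replaces every run of whitespace (as defined by
// unicode.IsSpace, which strings.Fields uses) with a single ASCII space
// and trims both ends.
func collapseSpaces(s string) string {
	return strings.Join(strings.Fields(s), " ")
}
```

For example, `collapseSpaces("a\tb\n\u00a0c")` yields `a b c` without any prior replacement of `\t` or `\n`.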
Frédéric Guillot
462ba8d7f7 feat(sanitizer): allow img tags with only a srcset and no src attribute 2025-02-15 18:03:36 -08:00
Frédéric Guillot
6eedf4111f fix(scraper): avoid encoding issue if charset meta tag is after 1024 bytes 2025-02-15 17:05:14 -08:00
Frédéric Guillot
af1f966250 test(encoding): add unit tests for CharsetReader function 2025-02-15 15:40:07 -08:00
Frédéric Guillot
7f54b27079 fix(rss): handle item title with CDATA content correctly
Fix regression introduced in commit a3ce03cc
2025-02-15 14:51:27 -08:00
Frédéric Guillot
a3ce03cc9d feat(rss): add workaround for RSS item title with HTML content 2025-02-14 21:21:49 -08:00
Frédéric Guillot
f2f60a8f73 feat(sanitizer): improve text truncation with better space handling 2025-02-06 21:21:49 -08:00
Frédéric Guillot
e777f12490 fix(sanitizer): correct HTML tag name from tfooter to tfoot 2025-02-06 21:16:29 -08:00
Julien Voisin
7eb1d15315 refactor(date): use an else-if instead of two if statements 2025-02-06 19:44:12 -08:00
Julien Voisin
b193bc212a refactor(xml): improve the performances of NewXMLDecoder
- Invert a condition to make the code more readable
- Extract the encoding directly from the slice of bytes instead of converting
  it to string first.
2025-01-30 19:37:06 -08:00
Julien Voisin
7275bc808a feat(urlcleaner): add trackers to the blocklist 2025-01-29 19:32:19 -08:00
Frédéric Guillot
369054b02d feat(processor): fetch YouTube watch time in bulk using the API 2025-01-24 15:16:23 -08:00
Frédéric Guillot
c3c42b0c37 fix(scraper): update TechCrunch scraper rule 2025-01-23 19:29:32 -08:00
jvoisin
2e57e3351b Remove superfluous parenthesis 2025-01-23 19:20:13 -08:00
jvoisin
a412cde3b3 Don't define receivers on both values and pointer
And use `o` instead of `outline` as done everywhere else.
2025-01-23 19:20:13 -08:00
jvoisin
abfd9306a4 Guard against a potential null dereference 2025-01-23 19:20:13 -08:00
Frédéric Guillot
1faccc7eca fix(sanitizer): non-allowed attributes are not properly stripped
Regression introduced in commit 58178d90cb
2025-01-22 20:50:38 -08:00
Frédéric Guillot
9c82e55b98 fix: do not strip tags in Atom entry title 2025-01-18 15:33:44 -08:00