jvoisin
8a014c6abc
perf(readability): minor regex improvement
...
- Improve the check for tags by matching only if the tag name is followed by
either a space, a slash, or a closing angle bracket
- Use an anonymous group
2025-06-12 19:13:58 -07:00
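A minimal sketch of the kind of pattern this commit describes, not the actual regex from the readability package. The tag names (`div`, `span`) and the `hasTag` helper are assumptions chosen for illustration. The name must be followed by a space, a slash, or a closing angle bracket, and the alternation sits in a non-capturing `(?:...)` group.

```go
package main

import (
	"fmt"
	"regexp"
)

// tagPattern is a hypothetical pattern, not the one used by Miniflux.
// It matches <div or <span only when the name is followed by a space,
// a slash, or a closing angle bracket. (?:...) groups without capturing.
var tagPattern = regexp.MustCompile(`<(?:div|span)[ />]`)

// hasTag reports whether s contains one of the tags above.
func hasTag(s string) bool {
	return tagPattern.MatchString(s)
}

func main() {
	fmt.Println(hasTag(`<div class="a">`)) // true
	fmt.Println(hasTag(`<span/>`))         // true
	fmt.Println(hasTag(`<divider>`))       // false: "div" is followed by "i"
}
```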
jvoisin
60ad19c427
perf(rss): early return when looking for an item's author
...
The `sanitizer.StripTags` function calls `html.NewTokenizer`, which allocates a
4096-byte buffer on the heap and runs a complex state machine to tokenize HTML.
There is no need to do all of this for empty strings.
This commit also fixes a TrimSpace/StripTags call inversion.
2025-06-11 19:06:15 -07:00
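A rough sketch of the early return described above. `stripTags` is a simplified stand-in written for this example (it is not `sanitizer.StripTags` and does not use `html.NewTokenizer`), and `authorName` is a hypothetical helper. The point is only the ordering: return early on an empty string, then strip tags, then trim whitespace.

```go
package main

import (
	"fmt"
	"strings"
)

// stripTags is a naive stand-in for a real HTML tag stripper:
// it drops everything between '<' and '>'.
func stripTags(s string) string {
	var b strings.Builder
	inTag := false
	for _, r := range s {
		switch {
		case r == '<':
			inTag = true
		case r == '>':
			inTag = false
		case !inTag:
			b.WriteRune(r)
		}
	}
	return b.String()
}

// authorName (hypothetical) skips the expensive stripping for empty input,
// and trims whitespace after the tags are removed, not before.
func authorName(raw string) string {
	if raw == "" {
		return ""
	}
	return strings.TrimSpace(stripTags(raw))
}

func main() {
	fmt.Printf("%q\n", authorName(""))
	fmt.Printf("%q\n", authorName("<b> Jane Doe </b>"))
}
```

Trimming after stripping matters because whitespace can sit inside the tags, as in the second call.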
jvoisin
f40c1e7f63
fix(reader): fix a crash introduced by d59990f1
...
And add a fuzzer and a testcase as well to validate that nothing breaks.
2025-06-11 19:04:46 -07:00
Frédéric Guillot
a4d16cc5c1
refactor(rewrite): rename Rewriter
function to ApplyContentRewriteRules
2025-06-10 20:28:15 -07:00
jvoisin
7c857bdc72
perf(reader): optimize RemoveTrackingParameters
...
A bit more than 10% of processor.ProcessFeedEntries' CPU time is spent in
urlcleaner.RemoveTrackingParameters, specifically calling url.Parse, so let's
extract this operation outside of it, and do it once before calling
urlcleaner.RemoveTrackingParameters multiple times.
Co-authored-by: Frédéric Guillot <f@miniflux.net>
2025-06-10 19:29:25 -07:00
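A sketch of hoisting a `url.Parse` call out of a per-entry function, assuming the hoisted value is a feed-level URL that stays constant across entries. The names `cleanURL` and `cleanAll`, the blocklist, and the hostname rule are all hypothetical and do not reflect the real `urlcleaner` API.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// trackingParams is a tiny hypothetical blocklist.
var trackingParams = map[string]bool{"utm_source": true, "utm_medium": true}

// cleanURL receives the feed URL already parsed, so it only has to parse
// the entry URL itself.
func cleanURL(feed *url.URL, raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	query := u.Query()
	for name, values := range query {
		switch {
		case trackingParams[strings.ToLower(name)]:
			query.Del(name)
		// Hypothetical rule: drop a parameter whose value is the feed's hostname.
		case len(values) == 1 && values[0] == feed.Hostname():
			query.Del(name)
		}
	}
	u.RawQuery = query.Encode()
	return u.String(), nil
}

// cleanAll parses the feed URL once, then reuses it for every entry.
func cleanAll(feedURL string, entryURLs []string) ([]string, error) {
	feed, err := url.Parse(feedURL)
	if err != nil {
		return nil, err
	}
	cleaned := make([]string, 0, len(entryURLs))
	for _, raw := range entryURLs {
		c, err := cleanURL(feed, raw)
		if err != nil {
			return nil, err
		}
		cleaned = append(cleaned, c)
	}
	return cleaned, nil
}

func main() {
	out, err := cleanAll("https://example.org/feed.xml", []string{
		"https://example.org/post?utm_source=rss&id=7",
		"https://example.org/other?ref=example.org",
	})
	fmt.Println(out, err)
}
```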
jvoisin
0caadf82f2
perf(rss): optimize BuildFeed a bit
...
Calls to urllib.AbsoluteURL take a bit less than 10% of the time spent in
parser.ParseFeed, completely parsing a URL only to check if it's absolute, and
if not, to make it so.
Checking if it starts with `https://` or `http://` is usually enough to find if
a URL is absolute, and if it doesn't, it's always possible to fall back to
urllib.AbsoluteURL.
This also comes with the advantage of reducing heap allocations, as most of the
time spent in urllib.AbsoluteURL is heap-related (de)allocations.
2025-06-10 19:23:16 -07:00
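The fast path described above could look roughly like this. `absoluteURL` is a hypothetical helper built on the standard `net/url` package. It is not the `urllib.AbsoluteURL` function itself, whose exact behavior is not shown here.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// absoluteURL returns input unchanged when it already starts with an HTTP(S)
// scheme, and only falls back to full parsing and resolution otherwise.
func absoluteURL(base, input string) (string, error) {
	if strings.HasPrefix(input, "https://") || strings.HasPrefix(input, "http://") {
		return input, nil // fast path: no parsing, no allocation
	}
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(input)
	if err != nil {
		return "", err
	}
	return b.ResolveReference(ref).String(), nil
}

func main() {
	fmt.Println(absoluteURL("https://example.org/blog/", "https://example.org/a"))
	fmt.Println(absoluteURL("https://example.org/blog/", "post/1"))
}
```

The prefix check is a heuristic: anything it misses still goes through the slower, complete resolution.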
Frédéric Guillot
cecc18420d
feat(sanitizer): add validation for empty width and height attributes in img tags
2025-06-09 20:38:17 -07:00
Frédéric Guillot
d53fd17e10
feat(sanitizer): validate MathML XML namespace
2025-06-09 20:28:54 -07:00
Frédéric Guillot
21d22d7f0b
feat(sanitizer): add support for fetchpriority and decoding attributes in img tags
2025-06-09 20:12:15 -07:00
jvoisin
d59990f1dd
perf(xml): optimize xml filtering
...
Instead of using bytes.Map which is returning a copy of the provided []byte,
use a custom in-place implementation, as the bytes.Map call is taking around
25% of rss.Parse
2025-06-09 13:49:10 -07:00
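A byte-level sketch of in-place filtering, as opposed to `bytes.Map`, which returns a new slice. This is a simplification for illustration and not the code from the commit: it assumes only ASCII control characters need to be dropped, and the names `filterInPlace` and `isAllowedXMLByte` are made up.

```go
package main

import "fmt"

// isAllowedXMLByte keeps tab, newline, carriage return, and every byte from
// 0x20 upwards. Bytes of multi-byte UTF-8 sequences are >= 0x80, so they
// are kept untouched.
func isAllowedXMLByte(c byte) bool {
	return c >= 0x20 || c == '\t' || c == '\n' || c == '\r'
}

// filterInPlace compacts b in place, reusing its backing array instead of
// allocating a copy. The returned slice aliases b.
func filterInPlace(b []byte, keep func(byte) bool) []byte {
	n := 0
	for _, c := range b {
		if keep(c) {
			b[n] = c
			n++
		}
	}
	return b[:n]
}

func main() {
	data := []byte("a\x00b\tc\x1f")
	fmt.Printf("%q\n", filterInPlace(data, isAllowedXMLByte))
}
```

Because the result shares memory with the input, the caller must not rely on the original contents afterwards.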
jvoisin
49085daefe
perf(xml): optimized NewXMLDecoder
...
io.ReadAll is growing the underlying buffer progressively, while
io.Copy is able to allocate it in one go, which is significantly faster.
io.ReadAll is currently accounting for around 10% of the CPU time of rss.Parse
2025-06-09 13:49:10 -07:00
Frédéric Guillot
8db637cb39
feat(ui): add user setting to control `target="_blank"` on links
...
Rationale: Opening links in the current tab is the default browser behavior.
Using `target="_blank"` on external links can lead to accessibility issues and override user preferences. It may also interfere with assistive technologies and expected browser behavior.
To maintain backward compatibility, this option is enabled by default (`true`), which adds `target="_blank"` to links.
2025-06-08 21:07:11 -07:00
Frédéric Guillot
8142268799
feat: populate feed description automatically
2025-05-24 21:15:52 -07:00
Anton Larionov
553c578f2e
feat(rssbridge): support auth token for RSS-Bridge
2025-05-19 20:47:12 -07:00
Frédéric Guillot
828a4334db
fix(sanitizer): MathML tags are not fully supported by golang.org/x/net/html
...
See https://github.com/golang/net/blob/master/html/atom/gen.go
and https://github.com/golang/net/blob/master/html/atom/table.go
2025-05-06 21:18:19 -07:00
jvoisin
d1dc369bb2
feat(sanitizer): add MathML tags to the sanitizer
...
This was found by reading the article pointed to by https://lobste.rs/s/nobvmp/how_prime_factorizations_govern_collatz
2025-05-06 20:19:56 -07:00
jvoisin
ff2dfe977b
feat: remove the `ref` parameter from url
...
This is used by (at least) Ghost (https://forum.ghost.org/t/ref-parameter-being-added-to-links/38335)
Examples:
- https://blog.exploits.club/exploits-club-weekly-newsletter-66-mitigations-galore-dirtycow-revisited-program-analysis-for-uafs-and-more/
- https://labs.watchtowr.com/is-the-sofistication-in-the-room-with-us-x-forwarded-for-and-ivanti-connect-secure-cve-2025-22457/
2025-05-06 19:59:55 -07:00
NoelNegash
81c7669945
feat(sanitized): allow Spotify iframes
2025-05-02 16:25:17 -07:00
Frédéric Guillot
d33e305af9
fix(api): `hide_globally` categories field should be a boolean
2025-04-21 19:43:25 -07:00
Frédéric Guillot
c87c93d85f
feat(config): add `SCHEDULER_ROUND_ROBIN_MAX_INTERVAL` option
...
Add option to cap maximum refresh interval when RSS TTL, Retry-After, Cache-Control, or Expires headers specify excessively high values.
2025-04-11 15:40:32 -07:00
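The capping behavior can be sketched as a simple clamp. `capIntervalMinutes` is a hypothetical helper, and expressing the values in minutes is an assumption made for this example.

```go
package main

import "fmt"

// capIntervalMinutes limits a requested refresh interval (derived from RSS
// TTL, Retry-After, Cache-Control, or Expires) to a configured maximum.
// A non-positive maximum is treated here as "no cap", which is an
// assumption of this sketch.
func capIntervalMinutes(requested, maxInterval int) int {
	if maxInterval > 0 && requested > maxInterval {
		return maxInterval
	}
	return requested
}

func main() {
	fmt.Println(capIntervalMinutes(60, 1440))    // 60
	fmt.Println(capIntervalMinutes(43200, 1440)) // 1440
}
```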
Frédéric Guillot
ef22e95f8b
feat: implement proxy URL per feed
2025-04-06 21:05:19 -07:00
Frédéric Guillot
c45b51d1f8
feat: use `Cache-Control` max-age and `Expires` headers to calculate next check
2025-04-06 16:24:00 -07:00
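One way to derive a refresh delay from these two headers, using only the standard library. Both function names are hypothetical, and taking the larger of the two hints is an assumed policy for this sketch, not necessarily what the scheduler does.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
	"time"
)

// maxAge extracts the max-age directive from a Cache-Control header value.
func maxAge(cacheControl string) (time.Duration, bool) {
	for _, directive := range strings.Split(cacheControl, ",") {
		name, value, found := strings.Cut(strings.TrimSpace(directive), "=")
		if !found || !strings.EqualFold(name, "max-age") {
			continue
		}
		seconds, err := strconv.Atoi(strings.Trim(value, `"`))
		if err != nil || seconds < 0 {
			return 0, false
		}
		return time.Duration(seconds) * time.Second, true
	}
	return 0, false
}

// refreshDelay returns the larger of max-age and the time left until Expires.
// It returns 0 when neither header gives a usable hint.
func refreshDelay(h http.Header, now time.Time) time.Duration {
	var delay time.Duration
	if d, ok := maxAge(h.Get("Cache-Control")); ok {
		delay = d
	}
	if t, err := http.ParseTime(h.Get("Expires")); err == nil {
		if d := t.Sub(now); d > delay {
			delay = d
		}
	}
	return delay
}

func main() {
	h := http.Header{}
	h.Set("Cache-Control", "public, max-age=3600")
	fmt.Println(refreshDelay(h, time.Now()))
}
```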
Frédéric Guillot
0af1a6e121
refactor: avoid logging the feed errors twice in the background worker
2025-04-06 15:39:40 -07:00
Frédéric Guillot
535fd050b7
feat: add proxy rotation functionality
2025-04-06 14:59:00 -07:00
Frédéric Guillot
51560f191f
fix(subscription): add `/rss/feed.xml` to the list of known feed URLs
2025-03-28 16:59:06 -07:00
Frédéric Guillot
e342a4f143
fix: address minor issues detected by Go linters
2025-03-24 20:48:46 -07:00
Frédéric Guillot
315e72c412
fix(rewrite): remove obsolete rule for webtoons.com
2025-03-06 20:11:03 -08:00
jvoisin
f916373f55
fix: allow the `<b>` tag
2025-03-06 19:27:30 -08:00
jvoisin
5353211206
fix: allow the `<u>` tag in feeds
2025-03-06 19:26:26 -08:00
AiraNadih
ad02f21d04
refactor(rewrite): reorganize referer rules and remove obsolete mapping
2025-03-02 19:40:52 -08:00
Maytham Alsudany
f01ff067a5
fix(processor): add missing quotation marks to import comments
2025-02-24 16:34:26 -08:00
jvoisin
117d711d7d
feat(urlcleaner): add more Google Analytics parameters
2025-02-22 17:07:59 -08:00
jvoisin
4a77e937af
perf(sanitizer): remove two useless calls to strings.ReplaceAll
...
The [strings.Fields](https://pkg.go.dev/strings#Fields) function considers
`'\t', '\n', '\v', '\f', '\r', ' ', U+0085 (NEL), U+00A0 (NBSP)` as spaces, so
there is no need to remove them beforehand.
This is a continuation of f2f60a8f73
2025-02-18 19:42:39 -08:00
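A small illustration of why the extra `strings.ReplaceAll` calls are redundant: `strings.Fields` already splits on tabs, newlines, and the other whitespace characters listed above. `collapseSpaces` is a hypothetical helper, not the sanitizer's function.

```go
package main

import (
	"fmt"
	"strings"
)

// collapseSpaces splits on any run of Unicode whitespace and rejoins the
// pieces with single spaces. No prior replacement of tabs or newlines
// is needed.
func collapseSpaces(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	fmt.Printf("%q\n", collapseSpaces("  a\t\tb\n c\u00a0d  ")) // "a b c d"
}
```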
Frédéric Guillot
462ba8d7f7
feat(sanitizer): allow `img` tags with only a `srcset` and no `src` attribute
2025-02-15 18:03:36 -08:00
Frédéric Guillot
6eedf4111f
fix(scraper): avoid encoding issue if charset meta tag is after 1024 bytes
2025-02-15 17:05:14 -08:00
Frédéric Guillot
af1f966250
test(encoding): add unit tests for CharsetReader function
2025-02-15 15:40:07 -08:00
Frédéric Guillot
7f54b27079
fix(rss): handle item title with CDATA content correctly
...
Fix regression introduced in commit a3ce03cc
2025-02-15 14:51:27 -08:00
Frédéric Guillot
a3ce03cc9d
feat(rss): add workaround for RSS item title with HTML content
2025-02-14 21:21:49 -08:00
Frédéric Guillot
f2f60a8f73
feat(sanitizer): improve text truncation with better space handling
2025-02-06 21:21:49 -08:00
Frédéric Guillot
e777f12490
fix(sanitizer): correct HTML tag name from `tfooter` to `tfoot`
2025-02-06 21:16:29 -08:00
Julien Voisin
7eb1d15315
refactor(date): use an else-if instead of two if statements
2025-02-06 19:44:12 -08:00
Julien Voisin
b193bc212a
refactor(xml): improve the performance of NewXMLDecoder
...
- Invert a condition to make the code more readable
- Extract the encoding directly from the slice of bytes instead of converting
it to string first.
2025-01-30 19:37:06 -08:00
Julien Voisin
7275bc808a
feat(urlcleaner): add trackers to the blocklist
2025-01-29 19:32:19 -08:00
Frédéric Guillot
369054b02d
feat(processor): fetch YouTube watch time in bulk using the API
2025-01-24 15:16:23 -08:00
Frédéric Guillot
c3c42b0c37
fix(scraper): update TechCrunch scraper rule
2025-01-23 19:29:32 -08:00
jvoisin
2e57e3351b
Remove superfluous parentheses
2025-01-23 19:20:13 -08:00
jvoisin
a412cde3b3
Don't define receivers on both values and pointers
...
And use `o` instead of `outline` as done everywhere else.
2025-01-23 19:20:13 -08:00
jvoisin
abfd9306a4
Guard against a potential null dereference
2025-01-23 19:20:13 -08:00
Frédéric Guillot
1faccc7eca
fix(sanitizer): non-allowed attributes are not properly stripped
...
Regression introduced in commit 58178d90cb
2025-01-22 20:50:38 -08:00
Frédéric Guillot
9c82e55b98
fix: do not strip tags in Atom entry title
2025-01-18 15:33:44 -08:00