mirror of https://github.com/miniflux/v2.git synced 2025-06-27 16:36:00 +00:00
Commit graph

186 commits

Author SHA1 Message Date
Julien Voisin
b193bc212a
refactor(xml): improve the performance of NewXMLDecoder
- Invert a condition to make the code more readable
- Extract the encoding directly from the slice of bytes instead of converting
  it to string first.
2025-01-30 19:37:06 -08:00
Julien Voisin
7275bc808a
feat(urlcleaner): add trackers to the blocklist 2025-01-29 19:32:19 -08:00
Frédéric Guillot
369054b02d feat(processor): fetch YouTube watch time in bulk using the API 2025-01-24 15:16:23 -08:00
Frédéric Guillot
c3c42b0c37 fix(scraper): update TechCrunch scraper rule 2025-01-23 19:29:32 -08:00
jvoisin
2e57e3351b Remove superfluous parenthesis 2025-01-23 19:20:13 -08:00
jvoisin
a412cde3b3 Don't define receivers on both values and pointer
And use `o` instead of `outline` as done everywhere else.
2025-01-23 19:20:13 -08:00
jvoisin
abfd9306a4 Guard against a potential null dereference 2025-01-23 19:20:13 -08:00
Frédéric Guillot
1faccc7eca fix(sanitizer): non-allowed attributes are not properly stripped
Regression introduced in commit 58178d90cb
2025-01-22 20:50:38 -08:00
Frédéric Guillot
9c82e55b98 fix: do not strip tags in Atom entry title 2025-01-18 15:33:44 -08:00
Frédéric Guillot
e9520f5d1c fix(finder): do not add redirections to the list of subscriptions to avoid confusion 2025-01-12 17:09:32 -08:00
Jake Walker
6cbe8c3a9d
feat: add fix_ghost_cards rewrite rule 2025-01-12 14:43:27 -08:00
Julien Voisin
f116f7dd6a
test(sanitizer): add a fuzzer 2025-01-11 17:19:31 -08:00
Frédéric Guillot
5549f75dd7 fix(sanitizer): allow <hr> tags 2024-12-27 13:56:06 -08:00
Julien Voisin
8df4b780a8
refactor(readingtime): replace whatlanggo with an ad-hoc implementation
The package `github.com/abadojack/whatlanggo` has been unmaintained for 5 years, is
overkill for simply detecting CJK, and is quite slow.
2024-12-26 14:21:07 -08:00
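An ad-hoc CJK check of the kind described in the commit above can be sketched with the standard `unicode` tables. The function name `isCJK` and the choice of scripts are assumptions made for illustration, not necessarily what miniflux uses.

```go
package main

import (
	"fmt"
	"unicode"
)

// isCJK reports whether text contains at least one Han, Hiragana, Katakana,
// or Hangul rune. Hypothetical helper: the name and the script list are
// assumptions for this sketch.
func isCJK(text string) bool {
	for _, r := range text {
		if unicode.In(r, unicode.Han, unicode.Hiragana, unicode.Katakana, unicode.Hangul) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isCJK("hello world"))
	fmt.Println(isCJK("こんにちは"))
}
```

The loop stops at the first matching rune, so the cost for CJK text is small, and no third-party dependency is needed.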
Julien Voisin
195b75d185
refactor(rewriter): use custom title case converter implementation instead of golang.org/x/text/cases.Title()
The implementation is equivalent to
`cases.Title(language.English).String(strings.ToLower(…))`,
and this is the only place in miniflux where
"golang.org/x/text/cases" and "golang.org/x/text/language"
are (directly) used.

This reduces the binary size from 27015590 to
26686112 on my machine.

Kudos to https://gsa.zxilly.dev for making it straightforward to catch things
like this.
2024-12-23 21:16:02 -08:00
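A custom title-case converter in the spirit of the commit above might look like the sketch below. It lowercases the input and upper-cases the first rune after whitespace. The name `titleCase` is hypothetical, and this simplified version is not guaranteed to match `cases.Title` on every edge case.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// titleCase lowercases s, then converts the first rune of every
// whitespace-separated word to title case. Hypothetical helper for
// illustration only.
func titleCase(s string) string {
	var b strings.Builder
	b.Grow(len(s))
	startOfWord := true
	for _, r := range strings.ToLower(s) {
		if startOfWord {
			r = unicode.ToTitle(r)
		}
		b.WriteRune(r)
		// The next rune starts a word only if this one is whitespace.
		startOfWord = unicode.IsSpace(r)
	}
	return b.String()
}

func main() {
	fmt.Println(titleCase("hello WORLD"))
}
```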
jvoisin
bd91e5f320 Add more referer spoofing
Based on #2261. For moyu.im/jandan.net, see https://github.com/DIYgod/RSSHub/issues/11528
2024-12-20 11:53:38 -08:00
Sevi.C
bca9bea676
feat: add date-based entry filtering rules 2024-12-16 20:38:20 -08:00
jvoisin
7939b54341 Resize favicons to 32x32 to account for scaling
As suggested by @michaelkuhn in https://github.com/miniflux/v2/pull/2998#issuecomment-2546702212
2024-12-16 19:28:38 -08:00
jvoisin
2df59b4865 Refactor internal/reader/readability/testdata
- Use chained strings.Contains instead of a regex for
  blacklistCandidatesRegexp, as this is a bit faster
- Simplify a Find.Each.Remove to Find.Remove
- Don't concatenate id and class for removeUnlikelyCandidates, as it makes no
  sense to match on overlaps. It might also marginally improve performance, as
  the regexes now run on two shorter strings separately instead of on their
  concatenation.
- Add a small benchmark
2024-12-15 20:52:32 -08:00
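The first point of the commit above, replacing a plain alternation regex with chained `strings.Contains` calls, can be sketched as follows. The word list is illustrative and the function names are hypothetical. The two functions are meant to return the same result for any input.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Illustrative alternation of literal substrings (assumed word list).
var blacklistRegexp = regexp.MustCompile(`popupbody|-ad`)

// matchesWithRegexp runs the regex engine for a purely literal search.
func matchesWithRegexp(s string) bool {
	return blacklistRegexp.MatchString(s)
}

// matchesWithContains performs the same literal search without a regex.
func matchesWithContains(s string) bool {
	return strings.Contains(s, "popupbody") || strings.Contains(s, "-ad")
}

func main() {
	for _, s := range []string{"sidebar-ad", "article-body"} {
		fmt.Println(s, matchesWithRegexp(s), matchesWithContains(s))
	}
}
```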
Julien Voisin
777d0dd248
feat: resize favicons before storing them
Some websites are using images of O(10kB), if not O(100kB), for their
favicons. As miniflux only displays them with a 16x16 resolution, let's do our
best to resize them before storing them in the database. This should make
miniflux consume less bandwidth when serving pages, for the joy of mobile users
on a small data plan.

Of course, images that already are 16x16 aren't resized.
2024-12-15 20:47:19 -08:00
Julien Voisin
cfda948c3a
refactor(rewriter): avoid the use of regex in addDynamicImage
See https://dustri.org/b/parsing-noscript-tags-with-goquery.html for the whole
story.
2024-12-15 17:56:39 -08:00
Julien Voisin
945d436055
refactor(rewriter): replace regex with URL parsing for referrer override
No need for brittle regex when matching plain strings or domain names.
This should save some negligible amount of heap memory as well as
tremendously speeding up the matching.
2024-12-13 14:50:12 -08:00
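Matching a referrer override by parsing the URL instead of running a regex could look roughly like this. The function name `refererFor` and the domain names are placeholders chosen for this sketch. They are not taken from miniflux.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// refererFor returns the Referer value to send when fetching rawURL, or an
// empty string when no override applies. Hypothetical helper; the hosts are
// placeholders.
func refererFor(rawURL string) string {
	u, err := url.Parse(rawURL)
	if err != nil {
		return ""
	}
	host := u.Hostname()
	switch {
	case host == "cdn.example.org":
		// Exact host match.
		return "https://example.org"
	case strings.HasSuffix(host, ".example.net"):
		// Any subdomain of example.net.
		return "https://example.net"
	}
	return ""
}

func main() {
	fmt.Println(refererFor("https://cdn.example.org/a.png"))
	fmt.Println(refererFor("https://img.example.net/b.jpg"))
}
```

Comparing the parsed hostname also avoids false positives when a domain name merely appears in the path or query string.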
Frédéric Guillot
c3649bd6b1 refactor(rewrite): remove unused function arguments 2024-12-12 21:10:35 -08:00
Julien Voisin
6ad5ad0bb2
refactor(readability): various improvements and optimizations
- Replace a completely overkill regex
- Use `.Remove()` instead of a hand-rolled loop
- Use a strings.Builder instead of a bytes.NewBufferString
- Replace a call to Fprintf with string concatenation, as the latter is much
  faster
- Remove a superfluous cast
- Delay some computations
- Add some tests
2024-12-12 20:41:56 -08:00
Frédéric Guillot
113abeea59 test(rewrite): add unit test for referer rewrite function 2024-12-12 20:11:47 -08:00
Julien Voisin
e6185b1393
refactor: use min/max instead of math.Min/math.Max
This saves a couple of back'n'forth casts.
2024-12-11 19:43:14 -08:00
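The casts saved by the commit above can be seen by comparing the two styles side by side. The `clamp` helpers are hypothetical examples, and the built-in `min` and `max` require Go 1.21 or later.

```go
package main

import (
	"fmt"
	"math"
)

// clampWithMath uses math.Min/math.Max, which only accept float64, so every
// integer has to be converted back and forth.
func clampWithMath(n, lo, hi int) int {
	return int(math.Max(float64(lo), math.Min(float64(n), float64(hi))))
}

// clampWithBuiltins uses the generic built-ins, which work on ints directly.
func clampWithBuiltins(n, lo, hi int) int {
	return max(lo, min(n, hi))
}

func main() {
	fmt.Println(clampWithMath(15, 0, 10), clampWithBuiltins(15, 0, 10))
}
```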
Julien Voisin
1b0b8b9c42
refactor: use a better construct than doc.Find(…).First()
As mentioned in goquery's documentation (https://pkg.go.dev/github.com/PuerkitoBio/goquery#Single):

> By default, Selection.Find and other functions that accept a selector string
to select nodes will use all matches corresponding to that selector. By using
the Matcher returned by Single, at most the first match will be selected.
>
> The one using Single is optimized to be potentially much faster on large documents.
2024-12-11 19:40:55 -08:00
Julien Voisin
3caa16ac31
refactor(processor): use URL parsing instead of a regex 2024-12-11 19:30:59 -08:00
Julien Voisin
637fb85de0
refactor(handler): delay store.UserByID as much as possible
In internal/reader/handler/handler.go:RefreshFeed, there is a call to
store.UserByID pretty early, which is only used for
originalFeed.WithTranslatedErrorMessage(localizedError.Translate(user.Language)).
Its only other usage is in processor.ProcessFeedEntries(store, originalFeed,
user, forceRefresh), which is pretty late in RefreshFeed, and only called if
there are new items in the feed. It makes sense to only fetch the user's
language if the error localization function is used.

Calls to `store.UserByID` take around 10% of the CPU time of RefreshFeed in my
profiling.

This commit also makes `processor.ProcessFeedEntries` take a `userID` instead
of a `user`, to make the code a bit more concise.

This should close #2984
2024-12-09 19:32:59 -08:00
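The idea of the commit above, deferring a lookup until the code path that needs it, can be illustrated with a toy example. Everything here (`fakeStore`, `userLanguage`, `refreshFeed`, `errTimeout`) is a hypothetical stand-in and does not mirror the real miniflux API.

```go
package main

import (
	"errors"
	"fmt"
)

// errTimeout is a sample fetch error used by this sketch.
var errTimeout = errors.New("timeout")

// fakeStore counts how often the (hypothetical) user lookup runs.
type fakeStore struct{ lookups int }

func (s *fakeStore) userLanguage(userID int64) string {
	s.lookups++
	return "en_US"
}

// refreshFeed only queries the store on the error path, where the user's
// language is needed to localize the message.
func refreshFeed(s *fakeStore, userID int64, fetchErr error) string {
	if fetchErr != nil {
		return "[" + s.userLanguage(userID) + "] " + fetchErr.Error()
	}
	return "ok"
}

func main() {
	s := &fakeStore{}
	msg := refreshFeed(s, 1, nil)
	fmt.Println(msg, s.lookups)
	msg = refreshFeed(s, 1, errTimeout)
	fmt.Println(msg, s.lookups)
}
```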
Julien Voisin
02c6d14659
refactor(subscription): use strings.HasSuffix instead of a regex in FindSubscriptionsFromYouTubePlaylistPage 2024-12-09 17:19:28 -08:00
Julien Voisin
728423339a
refactor(sanitizer): improve rewriteIframeURL()
- Use `url.Parse` instead of a regex, as this is much faster and way more robust
- Add support for Vimeo's Do Not Track parameter
2024-12-09 17:14:54 -08:00
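A URL-parsing approach to the Vimeo case could be sketched as below. The function name `addVimeoDNT` is hypothetical, and the sketch assumes that the Do Not Track option is passed to the Vimeo player as the query parameter `dnt=1`.

```go
package main

import (
	"fmt"
	"net/url"
)

// addVimeoDNT appends dnt=1 to Vimeo player URLs and returns every other
// link unchanged. Hypothetical helper; the parameter name is an assumption.
func addVimeoDNT(link string) string {
	u, err := url.Parse(link)
	if err != nil || u.Hostname() != "player.vimeo.com" {
		return link
	}
	q := u.Query()
	q.Set("dnt", "1")
	// Encode sorts the parameters by key.
	u.RawQuery = q.Encode()
	return u.String()
}

func main() {
	fmt.Println(addVimeoDNT("https://player.vimeo.com/video/123"))
}
```

Working on the parsed query keeps existing parameters intact, which is awkward to get right with a regex replacement.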
Julien Voisin
dea46ac0ea
refactor: optimize sanitizeAttributes
- Use string concatenation instead of `Sprintf`, as this is much faster, and the
  call to `Sprintf` is responsible for 30% of the CPU time of the function
- Anchor the youtube regex, to allow it to bail early, as this also accounts for
  another 30% of the CPU time. It might be worth chaining calls to `TrimPrefix`
  and check if the string has been trimmed instead of using a regex, to speed
  things up even more, but this needs to be benchmarked properly.
2024-12-08 14:42:18 -08:00
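Both ideas from the commit above can be sketched in a few lines: building an attribute by concatenation, and detecting a known prefix with `strings.TrimPrefix` instead of a regex. The function names and the prefix list are assumptions made for illustration, and, as the commit notes, any speed-up would need to be benchmarked.

```go
package main

import (
	"fmt"
	"strings"
)

// attributeString builds name="value" by concatenation instead of
// fmt.Sprintf. The value is assumed to be escaped already.
func attributeString(name, value string) string {
	return name + `="` + value + `"`
}

// Assumed list of embed URL prefixes.
var embedPrefixes = []string{
	"https://www.youtube.com/embed/",
	"https://youtube.com/embed/",
}

// youtubeEmbedID returns what follows a known prefix. A shorter result from
// TrimPrefix means the prefix was present.
func youtubeEmbedID(src string) (string, bool) {
	for _, prefix := range embedPrefixes {
		rest := strings.TrimPrefix(src, prefix)
		if len(rest) != len(src) && rest != "" {
			return rest, true
		}
	}
	return "", false
}

func main() {
	fmt.Println(attributeString("href", "https://example.org"))
	fmt.Println(youtubeEmbedID("https://www.youtube.com/embed/abc123"))
}
```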
Julien Voisin
a913f3f75f
feat(rewrite)!: remove parse_markdown rewrite rule
It was added in 2022 by #1513, to support blog.laravel.com, which has
since switched to HTML. The Atom 0.3/1.0, RSS 1.0/2.0, RDF, and JSON formats
don't support markdown in their spec, and any website serving it there should
be considered as buggy and fixed.

This shaves off 2MB from the miniflux binary, which is quite steep for a
feature that nobody is/should be using, and removes a dependency, which is always
a good thing.
2024-12-08 14:34:47 -08:00
Julien Voisin
2671f57edd
refactor(readability): simplify the regexes in internal/reader/readability/readability.go
- Use strings.ToLower() instead of having case-insensitive regex
- Remove overlapping words in the regex
- Split a condition to increase readability
2024-12-07 16:56:19 -08:00
jvoisin
2f56ebd3a6 Remove a now-useless function 2024-12-07 16:50:18 -08:00
jvoisin
059f5c0905 Inline a condition 2024-12-07 16:50:18 -08:00
jvoisin
58178d90cb Refactor Sanitize
- Use `token.String()` instead of `html.EscapeString(token.Data)`
- Refactor conditions to highlight their similarity, enabling further
  refactoring

This refactoring brings forth at least one bug: `tagStack` is never emptied.
2024-12-07 16:50:18 -08:00
jvoisin
cc885bbabb config.Opts is guaranteed to never be nil 2024-12-07 16:50:18 -08:00
jvoisin
0e185849b4 Google+ isn't a thing anymore 2024-12-07 16:50:18 -08:00
jvoisin
d0984f29da Simplify isValidTag 2024-12-07 16:50:18 -08:00
jvoisin
902ca63c45 Inline a function and fix a bug in it
The `isAnchor` function's first parameter was always `a`, instead of being
passed `tagName`. As this function is a single line and was only called in a
single place, it can be inlined.
2024-12-07 16:50:18 -08:00
jvoisin
2314500515 Merge two conditions 2024-12-07 16:50:18 -08:00
jvoisin
787d373211 Change the scope of a variable 2024-12-07 16:50:18 -08:00
Julien Voisin
fefbf2c935
refactor(processor): improve the rewrite URL rule regex
- Use `[^"]` instead of `.`, to help the regex engine to determine boundaries,
  instead of having it bruteforce its way to find them
- Use `+` instead of `*`, as empty rules don't make sense
2024-12-07 16:35:51 -08:00
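The effect of both changes can be sketched on a rule of the form `rewrite("pattern"|"replacement")`. This rule syntax, the regex, and the name `parseRewriteRule` are assumptions made for this example. The actual expression in miniflux may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// [^"]+ stops at the next double quote and requires at least one character,
// so the engine has no ambiguous boundaries to explore and empty rules are
// rejected.
var rewriteRule = regexp.MustCompile(`rewrite\("([^"]+)"\|"([^"]+)"\)`)

// parseRewriteRule extracts the pattern and the replacement from a rule.
func parseRewriteRule(rule string) (pattern, replacement string, ok bool) {
	m := rewriteRule.FindStringSubmatch(rule)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	fmt.Println(parseRewriteRule(`rewrite("^https://a\.example/(.*)"|"https://b.example/$1")`))
	fmt.Println(parseRewriteRule(`rewrite(""|"")`))
}
```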
Julien Voisin
bfb429b919
refactor(sanitizer): optimize internal/reader/sanitizer/strip_tags.go
- Use strings instead of doing string->bytes->string
- Use a strings.Builder to build the output
2024-12-07 16:31:48 -08:00
Julien Voisin
331c831c23
refactor(sanitizer): simplify hasRequiredAttributes
This function takes around 1.5% of the total CPU time on my instance, and most
of it is spent in `mapassign_faststr` to initialize the `map`. This is replaced
with a switch-case construct, that should be both significantly faster as well
as pretty dull in terms of memory consumption.
2024-12-07 16:30:15 -08:00
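A switch-based check of the kind described above might look like this. It is a sketch only: the tag and attribute pairs are illustrative, `slices.Contains` requires Go 1.21 or later, and the real function may be structured differently.

```go
package main

import (
	"fmt"
	"slices"
)

// hasRequiredAttributes reports whether the attributes that a tag needs are
// present. A switch avoids allocating and filling a map on every call.
// The tag/attribute pairs below are assumptions for this sketch.
func hasRequiredAttributes(tagName string, attributes []string) bool {
	switch tagName {
	case "a":
		return slices.Contains(attributes, "href")
	case "iframe", "img":
		return slices.Contains(attributes, "src")
	default:
		// Tags without requirements are always accepted.
		return true
	}
}

func main() {
	fmt.Println(hasRequiredAttributes("a", []string{"href", "title"}))
	fmt.Println(hasRequiredAttributes("img", []string{"alt"}))
}
```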
Julien Voisin
92a49d7e69
refactor(sanitizer): micro-optimizations of internal/reader/sanitizer/srcset.go
- Pre-allocate a slice
- Inline a local variable
- Remove a superfluous call to `strings.TrimSpace`
- Simplify some conditions via a switch-case construct
2024-12-07 16:27:56 -08:00
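Pre-allocating a slice, the first item in the list above, can be sketched as follows. The helper `splitCandidates` is hypothetical, and splitting on commas is a simplification that ignores URLs containing commas.

```go
package main

import (
	"fmt"
	"strings"
)

// splitCandidates splits a srcset-like value into trimmed, non-empty parts.
// The capacity is known up front, so append never has to grow the slice.
func splitCandidates(srcset string) []string {
	parts := strings.Split(srcset, ",")
	candidates := make([]string, 0, len(parts))
	for _, part := range parts {
		if part = strings.TrimSpace(part); part != "" {
			candidates = append(candidates, part)
		}
	}
	return candidates
}

func main() {
	fmt.Println(splitCandidates("a.png 1x, b.png 2x"))
}
```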
Gabe Cook
c3ca603960 fix: load icon from site URL instead of feed URL 2024-12-07 16:06:26 -08:00
telnet23
7e2b50efee feat: optionally fetch watch time from YouTube API instead of website 2024-12-07 16:00:35 -08:00
Gabe Cook
b61ee15c1b fix: feed icon from xml ignored during force refresh 2024-12-07 15:59:49 -08:00