mirror of https://github.com/miniflux/v2.git synced 2025-06-27 16:36:00 +00:00
Commit graph

163 commits

Author SHA1 Message Date
Julien Voisin
6ad5ad0bb2
refactor(readability): various improvements and optimizations
- Replace a completely overkill regex
- Use `.Remove()` instead of a hand-rolled loop
- Use a strings.Builder instead of a bytes.NewBufferString
- Replace a call to Fprintf with string concatenation, as the latter is much
  faster
- Remove a superfluous cast
- Delay some computations
- Add some tests
2024-12-12 20:41:56 -08:00
Frédéric Guillot
113abeea59 test(rewrite): add unit test for referer rewrite function 2024-12-12 20:11:47 -08:00
Julien Voisin
e6185b1393
refactor: use min/max instead of math.Min/math.Max
This saves a couple of back'n'forth casts.
2024-12-11 19:43:14 -08:00
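
To illustrate the casts this commit refers to, here is a minimal sketch, not taken from the miniflux code base. The helper names `clampWithMath` and `clampWithBuiltins` are hypothetical; the `min` and `max` builtins require Go 1.21 or later.

```go
package main

import (
	"fmt"
	"math"
)

// clampWithMath bounds n to [lo, hi] using math.Min/math.Max. Both only
// accept float64, so every operand is cast to float64 and the result is
// cast back to int.
func clampWithMath(n, lo, hi int) int {
	return int(math.Max(float64(lo), math.Min(float64(n), float64(hi))))
}

// clampWithBuiltins does the same with the generic min/max builtins
// (Go 1.21+), which work directly on ints.
func clampWithBuiltins(n, lo, hi int) int {
	return max(lo, min(n, hi))
}

func main() {
	fmt.Println(clampWithMath(15, 0, 10), clampWithBuiltins(15, 0, 10))
	fmt.Println(clampWithMath(-3, 0, 10), clampWithBuiltins(-3, 0, 10))
}
```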
Julien Voisin
1b0b8b9c42
refactor: use a better construct than doc.Find(…).First()
As mentioned in goquery's documentation (https://pkg.go.dev/github.com/PuerkitoBio/goquery#Single):

> By default, Selection.Find and other functions that accept a selector string
to select nodes will use all matches corresponding to that selector. By using
the Matcher returned by Single, at most the first match will be selected.
>
> The one using Single is optimized to be potentially much faster on large documents.
2024-12-11 19:40:55 -08:00
Julien Voisin
3caa16ac31
refactor(processor): use URL parsing instead of a regex 2024-12-11 19:30:59 -08:00
Julien Voisin
637fb85de0
refactor(handler): delay store.UserByID as much as possible
In internal/reader/handler/handler.go:RefreshFeed, there is a call to
store.UserByID pretty early, which is only used for
originalFeed.WithTranslatedErrorMessage(localizedError.Translate(user.Language)).
Its only other usage is in processor.ProcessFeedEntries(store, originalFeed,
user, forceRefresh), which is pretty late in RefreshFeed, and only called if
there are new items in the feed. It makes sense to only fetch the user's
language if the error localization function is used.

Calls to `store.UserByID` take around 10% of the CPU time of RefreshFeed in my
profiling.

This commit also makes `processor.ProcessFeedEntries` take a `userID` instead
of a `user`, to make the code a bit more concise.

This should close #2984
2024-12-09 19:32:59 -08:00
Julien Voisin
02c6d14659
refactor(subscription): use strings.HasSuffix instead of a regex in FindSubscriptionsFromYouTubePlaylistPage 2024-12-09 17:19:28 -08:00
Julien Voisin
728423339a
refactor(sanitizer): improve rewriteIframeURL()
- Use `url.Parse` instead of a regex, as this is much faster and way more robust
- Add support for Vimeo's Do Not Track parameter
2024-12-09 17:14:54 -08:00
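
A rough sketch of the `url.Parse` approach for the Vimeo case follows. It is an illustration only: the function name `addVimeoDNT` and the exact host check are assumptions, not the actual body of `rewriteIframeURL()`. The `dnt=1` query parameter is Vimeo's player option for Do Not Track.

```go
package main

import (
	"fmt"
	"net/url"
)

// addVimeoDNT is a hypothetical helper: it parses the iframe source and,
// for Vimeo player URLs only, sets the dnt=1 query parameter. Any URL
// that fails to parse or points elsewhere is returned unchanged.
func addVimeoDNT(src string) string {
	u, err := url.Parse(src)
	if err != nil {
		return src
	}
	if u.Hostname() != "player.vimeo.com" {
		return src
	}
	q := u.Query()
	q.Set("dnt", "1")
	u.RawQuery = q.Encode() // Encode sorts parameters by key
	return u.String()
}

func main() {
	fmt.Println(addVimeoDNT("https://player.vimeo.com/video/123?title=0"))
	fmt.Println(addVimeoDNT("https://example.org/video/123"))
}
```

Working on the parsed URL means existing query parameters are preserved and the host is compared exactly, two things that are easy to get wrong with a pattern match on the raw string.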
Julien Voisin
dea46ac0ea
refactor: optimize sanitizeAttributes
- Use string concatenation instead of `Sprintf`, as this is much faster, and the
  call to `Sprintf` is responsible for 30% of the CPU time of the function
- Anchor the youtube regex, to allow it to bail early, as this also accounts for
  another 30% of the CPU time. It might be worth chaining calls to `TrimPrefix`
  and checking if the string has been trimmed instead of using a regex, to speed
  things up even more, but this needs to be benchmarked properly.
2024-12-08 14:42:18 -08:00
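
The two ideas in this commit message can be sketched as follows. This is an assumption-laden illustration, not the code of `sanitizeAttributes`: the helper names and the list of YouTube prefixes are hypothetical, and no benchmark figures are implied.

```go
package main

import (
	"fmt"
	"strings"
)

// attrWithSprintf formats an HTML attribute through fmt.Sprintf, which
// has to parse the format string on every call.
func attrWithSprintf(name, value string) string {
	return fmt.Sprintf(`%s="%s"`, name, value)
}

// attrWithConcat builds the same string with plain concatenation.
func attrWithConcat(name, value string) string {
	return name + `="` + value + `"`
}

// youtubeEmbedPrefixes is a hypothetical list, for illustration only.
var youtubeEmbedPrefixes = []string{
	"https://www.youtube.com/embed/",
	"https://youtube.com/embed/",
}

// youtubeVideoID chains TrimPrefix calls and checks whether the string
// was actually trimmed, as the commit message suggests trying instead
// of a regex.
func youtubeVideoID(src string) (string, bool) {
	for _, prefix := range youtubeEmbedPrefixes {
		if rest := strings.TrimPrefix(src, prefix); rest != src {
			return rest, true
		}
	}
	return "", false
}

func main() {
	fmt.Println(attrWithSprintf("href", "https://example.org"))
	fmt.Println(attrWithConcat("href", "https://example.org"))
	fmt.Println(youtubeVideoID("https://www.youtube.com/embed/abc123"))
}
```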
Julien Voisin
a913f3f75f
feat(rewrite)!: remove parse_markdown rewrite rule
It was added in 2022 by #1513, to support blog.laravel.com, which has
since switched to HTML. The Atom 0.3/1.0, RSS 1.0/2.0, RDF, and JSON formats
don't support markdown in their spec, and any website serving it there should
be considered as buggy and fixed.

This shaves off 2MB from the miniflux binary, which is quite steep for a
feature that nobody is/should be using, and removes a dependency, which is always
a good thing.
2024-12-08 14:34:47 -08:00
Julien Voisin
2671f57edd
refactor(readability): simplify the regexes in internal/reader/readability/readability.go
- Use strings.ToLower() instead of having case-insensitive regex
- Remove overlapping words in the regex
- Split a condition to increase readability
2024-12-07 16:56:19 -08:00
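
As a minimal sketch of the first bullet, the pattern below drops the `(?i)` flag and lowercases the input once instead. The word list is hypothetical and much shorter than whatever readability.go actually matches.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical patterns, for illustration only.
var (
	caseInsensitive = regexp.MustCompile(`(?i)comment|sidebar|footer`)
	lowercaseOnly   = regexp.MustCompile(`comment|sidebar|footer`)
)

// matchesInsensitive lets the regex engine handle case folding.
func matchesInsensitive(s string) bool {
	return caseInsensitive.MatchString(s)
}

// matchesLowered lowercases the input once and uses the simpler,
// case-sensitive pattern.
func matchesLowered(s string) bool {
	return lowercaseOnly.MatchString(strings.ToLower(s))
}

func main() {
	for _, class := range []string{"SideBar-left", "article-body"} {
		fmt.Println(class, matchesInsensitive(class), matchesLowered(class))
	}
}
```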
jvoisin
2f56ebd3a6 Remove a now-useless function 2024-12-07 16:50:18 -08:00
jvoisin
059f5c0905 Inline a condition 2024-12-07 16:50:18 -08:00
jvoisin
58178d90cb Refactor Sanitize
- Use `token.String()` instead of `html.EscapeString(token.Data)`
- Refactor conditions to highlight their similarity, enabling further
  refactoring

This refactoring brings forth at least one bug: `tagStack` is never emptied.
2024-12-07 16:50:18 -08:00
jvoisin
cc885bbabb config.Opts is guaranteed to never be nil 2024-12-07 16:50:18 -08:00
jvoisin
0e185849b4 Google+ isn't a thing anymore 2024-12-07 16:50:18 -08:00
jvoisin
d0984f29da Simplify isValidTag 2024-12-07 16:50:18 -08:00
jvoisin
902ca63c45 Inline a function and fix a bug in it
The `isAnchor` function's first parameter was always `a`, instead of being
passed `tagName`. As this function is a single line and was only called in a
single place, it can be inlined.
2024-12-07 16:50:18 -08:00
jvoisin
2314500515 Merge two conditions 2024-12-07 16:50:18 -08:00
jvoisin
787d373211 Change the scope of a variable 2024-12-07 16:50:18 -08:00
Julien Voisin
fefbf2c935
refactor(processor): improve the rewrite URL rule regex
- Use `[^"]` instead of `.`, to help the regex engine to determine boundaries,
  instead of having it bruteforce its way to find them
- Use `+` instead of `*`, as empty rules don't make sense
2024-12-07 16:35:51 -08:00
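
The effect of both bullets can be sketched with two patterns. The `rewrite("…"|"…")` rule syntax used here is assumed for illustration, and the variable names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// looseRule uses `.*`, which can run across quote characters and
	// also accepts empty patterns.
	looseRule = regexp.MustCompile(`rewrite\("(.*)"\|"(.*)"\)`)
	// strictRule uses `[^"]+`: each group stops at the next quote and
	// must contain at least one character.
	strictRule = regexp.MustCompile(`rewrite\("([^"]+)"\|"([^"]+)"\)`)
)

func main() {
	rule := `rewrite("^https://a/(.*)"|"https://b/$1")`
	fmt.Printf("%q\n", strictRule.FindStringSubmatch(rule))

	empty := `rewrite(""|"")`
	fmt.Println(looseRule.MatchString(empty), strictRule.MatchString(empty))
}
```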
Julien Voisin
bfb429b919
refactor(sanitizer): optimize internal/reader/sanitizer/strip_tags.go
- Use strings instead of doing string->bytes->string
- Use a strings.Builder to build the output
2024-12-07 16:31:48 -08:00
Julien Voisin
331c831c23
refactor(sanitizer): simplify hasRequiredAttributes
This function takes around 1.5% of the total CPU time on my instance, and most
of it is spent in `mapassign_faststr` to initialize the `map`. This is replaced
with a switch-case construct, which should be both significantly faster as well
as pretty dull in terms of memory consumption.
2024-12-07 16:30:15 -08:00
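
A minimal sketch of the map-versus-switch change described above. The tag and attribute lists are assumptions chosen for illustration, and the function signatures are hypothetical rather than those of the real `hasRequiredAttributes`. It uses the `slices` package from Go 1.21.

```go
package main

import (
	"fmt"
	"slices"
)

// hasRequiredAttributesMap rebuilds a map on every call, so each call
// pays for the allocation and the map assignments.
func hasRequiredAttributesMap(tagName string, attributes []string) bool {
	required := map[string][]string{
		"a":      {"href"},
		"iframe": {"src"},
		"img":    {"src"},
	}
	for _, name := range required[tagName] {
		if !slices.Contains(attributes, name) {
			return false
		}
	}
	return true
}

// hasRequiredAttributesSwitch expresses the same rules as a switch,
// which allocates nothing.
func hasRequiredAttributesSwitch(tagName string, attributes []string) bool {
	switch tagName {
	case "a":
		return slices.Contains(attributes, "href")
	case "iframe", "img":
		return slices.Contains(attributes, "src")
	default:
		return true
	}
}

func main() {
	fmt.Println(hasRequiredAttributesSwitch("a", []string{"href", "title"}))
	fmt.Println(hasRequiredAttributesSwitch("img", []string{"alt"}))
	fmt.Println(hasRequiredAttributesSwitch("p", nil))
}
```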
Julien Voisin
92a49d7e69
refactor(sanitizer): micro-optimizations of internal/reader/sanitizer/srcset.go
- Pre-allocate a slice
- Inline a local variable
- Remove a superfluous call to `strings.TrimSpace`
- Simplify some conditions via a switch-case construct
2024-12-07 16:27:56 -08:00
Gabe Cook
c3ca603960 fix: load icon from site URL instead of feed URL 2024-12-07 16:06:26 -08:00
telnet23
7e2b50efee feat: optionally fetch watch time from YouTube API instead of website 2024-12-07 16:00:35 -08:00
Gabe Cook
b61ee15c1b fix: feed icon from xml ignored during force refresh 2024-12-07 15:59:49 -08:00
Gabe Cook
30c2e09a56 chore: remove blog.laravel.com rewrite rule 2024-12-03 01:21:42 -08:00
3zero2
c6c71c58b8
feat: add predefined scraper rules for arstechnica.com 2024-11-14 17:47:31 -08:00
AiraNadih
f0fe91172f feat(mediaProxy): update predefined referer spoofing rules for restricted media resources 2024-11-12 19:47:23 -08:00
AiraNadih
b0a3b4d5d9 style(mediaProxy): format with gofmt to pass linter checks 2024-10-30 19:50:12 -07:00
AiraNadih
469f23968e feat(mediaProxy): implement referer spoofing for restricted media resources 2024-10-30 19:50:12 -07:00
Frédéric Guillot
191f3a7ad7 feat(rss): calculate hash based on item title/content for feeds without GUID and link 2024-10-18 18:37:38 -07:00
July
86c0cc61ba
feat: set entry URL to rewritten URL if a rewrite rule is defined 2024-10-13 21:21:28 -07:00
Frédéric Guillot
5c4df786de fix: avoid panic in IsRateLimited() function 2024-10-06 21:34:23 -07:00
Frédéric Guillot
e1050e21b5
feat: take Retry-After header into consideration for rate limited feeds 2024-10-05 22:26:05 -07:00
Frédéric Guillot
f16735fd6d feat: update feed icon during force refresh 2024-10-04 20:51:40 -07:00
Scott Leggett
562a7b79a5 fix: update Last-Modified if it changes in a 304 response
When a server returns a 304 response with a strong validator, any other
stored fields must be updated if they are also present in the response.

This behaviour is described in RFC9111, sections 3.2 and 4.3.4.
2024-10-04 17:47:48 -07:00
Scott Leggett
cb610230d9 chore: update test case comment
The updated comment reflects a better understanding of the RFCs.
2024-10-04 17:47:48 -07:00
Frédéric Guillot
cfe410f202 refactor: split processor package into smaller files 2024-09-22 18:54:19 -07:00
Qeynos
c2ac2bfb83
feat: use Bilibili API instead of web scraping to get video watch time 2024-09-22 18:05:43 -07:00
Pontus Jensen Karlsson
ade412f453 fix: Honor hide_globally when creating a new feed through the api
TestGetGlobalEntriesEndpoint was failing because CreateFeed ignored HideGlobally; this fixes that.
2024-08-12 20:20:44 -07:00
Qeynos
bcbf9f4025
feat: add FETCH_BILIBILI_WATCH_TIME config option 2024-08-01 19:52:31 -07:00
Frédéric Guillot
37309adbc0 fix: do not alter the original URL if there is no tracker parameter 2024-07-25 22:10:28 -07:00
Frédéric Guillot
92f3dc26e4 feat: add support for aside HTML element in entry content 2024-07-25 21:11:37 -07:00
Frédéric Guillot
f6dc952551 feat: add support for base element when discovering feeds 2024-07-25 20:54:51 -07:00
Frédéric Guillot
29387f2d60 feat: implement base element handling in content scraper 2024-07-25 20:36:56 -07:00
Frédéric Guillot
c0f6e32a99 feat: remove well-known URL parameter trackers 2024-07-19 21:35:47 -07:00
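
A minimal sketch of tracker removal follows, under stated assumptions: the parameter list is a small hypothetical sample and `removeTrackers` is an illustrative name. It also returns the input untouched when no tracker is present, in the spirit of the follow-up fix 37309adbc0 listed above.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// trackingParams is a hypothetical, deliberately short list.
var trackingParams = map[string]bool{
	"utm_source":   true,
	"utm_medium":   true,
	"utm_campaign": true,
	"fbclid":       true,
}

// removeTrackers strips known tracking parameters from raw. When the
// URL cannot be parsed or contains no tracker, raw is returned as-is,
// so its parameter order and encoding are not altered.
func removeTrackers(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return raw
	}
	query := u.Query()
	removed := false
	for name := range query {
		if trackingParams[strings.ToLower(name)] {
			query.Del(name)
			removed = true
		}
	}
	if !removed {
		return raw
	}
	u.RawQuery = query.Encode()
	return u.String()
}

func main() {
	fmt.Println(removeTrackers("https://example.org/a?utm_source=x&id=1"))
	fmt.Println(removeTrackers("https://example.org/a?b=2&a=1"))
}
```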
Frédéric Guillot
36c25e7689 refactor: simplify Youtube feeds discovery 2024-07-13 12:17:13 -07:00
Frédéric Guillot
cb97d4a1a8 feat: remove YouTube video page subscription finder because meta[itemprop="channelId"] no longer exists 2024-07-13 11:11:50 -07:00