- Don't define the queries before possible early returns
- Check for the presence of the href attribute in the queries themselves,
instead of checking it later while iterating over the selection
- Add two edge cases to the tests
- Use EachIter instead of Each, if only to avoid the lambda (see the sketch below)
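A hedged sketch of the combined result; the selector and names are illustrative, not the actual miniflux code (EachIter requires goquery ≥ 1.10 and Go ≥ 1.23):

```go
// The `a[href]` selector filters on the attribute up front, and EachIter
// avoids the callback that Each requires.
for _, link := range doc.Find("a[href]").EachIter() {
	href, _ := link.Attr("href") // guaranteed present by the selector
	process(href)                // hypothetical downstream handling
}
```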
There is no need to try to match the regexp over the whole input; having it
anchored is enough. If we wanted to be extra lenient, we could strip leading and
trailing spaces, but I don't think it's necessary.
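As a generic illustration (not the actual pattern used here):

```go
// With the `^` anchor, the engine only attempts a match at the start of the
// input and fails fast, instead of retrying at every position in the string.
var prefixRE = regexp.MustCompile(`^\d{4}-\d{2}-\d{2}`)

func looksLikeDate(s string) bool {
	return prefixRE.MatchString(s)
}
```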
This commit also inverts a condition to reduce the level of nested indentation,
and makes another condition stricter.
Instead of doing some cipher-suite manipulation before instantiating the
http.Transport and then assigning the result, instantiate the http.Transport
first, and do the manipulation inside an if afterwards. This makes the code a
bit clearer, which is always nice when it comes to cryptographic shenanigans.
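A minimal sketch of the resulting shape, with a hypothetical `useCustomCiphers` flag (needs crypto/tls and net/http):

```go
// Build the transport first; only touch the cipher suites in a conditional.
transport := &http.Transport{
	TLSClientConfig: &tls.Config{},
}
if useCustomCiphers { // hypothetical flag
	var suites []uint16
	for _, s := range tls.CipherSuites() {
		suites = append(suites, s.ID)
	}
	transport.TLSClientConfig.CipherSuites = suites
}
```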
There is no need to allocate half a kilobyte of memory only to check that a
buffer starts with a bunch of spaces and a `{`: 32 bytes should be more than
enough. Also, there is no need to allocate it on the heap; keeping it on the
stack works perfectly.
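A minimal sketch of the check, assuming an io.Reader `r` and illustrative names (needs "bytes" and "io"):

```go
// A small fixed-size array stays on the stack; 32 bytes is plenty to see
// whether the payload starts with optional whitespace followed by a `{`.
var buf [32]byte
n, _ := io.ReadFull(r, buf[:]) // short reads are fine for this check
trimmed := bytes.TrimLeft(buf[:n], " \t\r\n")
looksLikeJSON := len(trimmed) > 0 && trimmed[0] == '{'
```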
Instead of using an ugly (and incomplete) regex, let's use a simple for-loop to
parse ISO 8601 durations, and make it explicit that we're only supporting a
subset of the spec, as we only care about YouTube video durations.
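A hedged sketch of the approach, covering only the PT…H…M…S subset that YouTube emits; this is not the exact miniflux implementation (needs "fmt", "strings", and "time"):

```go
func parseYouTubeDuration(s string) (time.Duration, error) {
	if !strings.HasPrefix(s, "PT") {
		return 0, fmt.Errorf("unsupported duration: %q", s)
	}
	var total time.Duration
	var n int64
	for _, r := range s[2:] {
		switch {
		case r >= '0' && r <= '9':
			n = n*10 + int64(r-'0') // accumulate the current number
		case r == 'H':
			total, n = total+time.Duration(n)*time.Hour, 0
		case r == 'M':
			total, n = total+time.Duration(n)*time.Minute, 0
		case r == 'S':
			total, n = total+time.Duration(n)*time.Second, 0
		default:
			return 0, fmt.Errorf("unsupported token %q in %q", r, s)
		}
	}
	return total, nil
}
```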
- Grow the underlying buffer of SanitizeHTML's strings.Builder to 3/4 of the
raw HTML's size from the start, to reduce the amount of iterative allocations.
This number is a complete guesstimate, but it sounds reasonable to me.
- Add an `absoluteURLParsedBase` function to avoid parsing baseURL over and
over (see the sketch after this list).
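Both points, sketched with illustrative names (`rawHTML`, `baseURL`, `href` are assumptions; needs "net/url" and "strings"):

```go
var b strings.Builder
b.Grow(len(rawHTML) * 3 / 4) // preallocate ~3/4 of the input size up front

// Parse the base URL once, then reuse it for every link:
base, _ := url.Parse(baseURL) // error handling elided in this sketch
ref, _ := url.Parse(href)
absolute := base.ResolveReference(ref).String()
```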
When we're only interested in the length of the contained Text, there is no
need to materialize it fully just to call len() on the result: we can simply
iterate over the text nodes and sum their lengths instead (see the sketch after
this list).
- There is no need to materialize all the content of a given Node when we can
simply compute its length directly, saving a lot of memory: on the order of
several megabytes on my instance, with peaks of a couple dozen.
- One might object to the usage of a recursive construct, but this is a direct
port of goquery's Text method, so this change doesn't make anything worse.
- linkLength could be computed in the same way, but this can go in another
commit, as it's a bit trickier: we need to get the length of every Node that
has an `a` Node as parent, without iterating over the whole parent chain every
time.
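A sketch of the length-only recursion, based on the description above; it mirrors the structure of goquery's Text method but sums lengths instead of concatenating (needs golang.org/x/net/html):

```go
func textLength(n *html.Node) int {
	if n.Type == html.TextNode {
		return len(n.Data)
	}
	// Recurse over children, exactly as the string-building version would,
	// but without materializing any text.
	total := 0
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		total += textLength(c)
	}
	return total
}
```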
Before
```console
$ go test -bench=.
goos: linux
goarch: arm64
pkg: miniflux.app/v2/internal/reader/readability
BenchmarkExtractContent-8 34 86102474 ns/op
BenchmarkGetWeight-8 10573 103045 ns/op
PASS
ok miniflux.app/v2/internal/reader/readability 5.409s
```
After
```console
$ go test -bench=.
goos: linux
goarch: arm64
pkg: miniflux.app/v2/internal/reader/readability
BenchmarkExtractContent-8 56 83130924 ns/op
BenchmarkGetWeight-8 246541 5241 ns/op
PASS
ok miniflux.app/v2/internal/reader/readability 6.026s
```
This should make ProcessFeedEntries marginally faster, while saving
some memory.
- Use an array of strings instead of a regex, as done in ef13756b1a7a7ba30fd34174a5367381fd8b4849
- Extract the `shouldRemove` function from `removeUnlikelyCandidates`, as there
is no reason for it to live there instead of being a proper standalone function
(see the sketch after this list).
- Fix a condition where the goquery selection would have its `id` attribute
left unchecked whenever a `class` attribute was present, regardless of whether
`class` was a candidate for removal.
- Add some comments
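A hedged sketch of how the extracted helper could look; the real marker list and matching details live in the miniflux source:

```go
// Hypothetical list of substrings that mark a node as an unlikely candidate.
var unlikelyMarkers = []string{"banner", "comment", "footer", "sidebar"}

// shouldRemove now checks both `class` and `id`, instead of skipping `id`
// whenever `class` was present.
func shouldRemove(s *goquery.Selection) bool {
	if class, ok := s.Attr("class"); ok && containsAny(class) {
		return true
	}
	if id, ok := s.Attr("id"); ok && containsAny(id) {
		return true
	}
	return false
}

func containsAny(attr string) bool {
	attr = strings.ToLower(attr)
	for _, marker := range unlikelyMarkers {
		if strings.Contains(attr, marker) {
			return true
		}
	}
	return false
}
```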
This has close to no impact for now, as our slog.Debug/Info/... calls are
leaking their parameters to the heap, but using proper typing instead of Any
lets us skip some reflection-based computation, making things marginally faster
and removing the corresponding heap leak.
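For illustration, with hypothetical attribute names:

```go
// Typed attributes skip the reflection path that slog.Any takes.
slog.Debug("processing entry",
	slog.String("entry_url", entryURL), // instead of slog.Any("entry_url", entryURL)
	slog.Int64("feed_id", feedID),
)
```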
- Use proper variable names for the parts of `key=value` strings (see the
sketch after this list)
- Explicitly assign false to the `match` boolean
- Use an explicit `len(parts) == 2` assertion to help the compiler remove
`isSliceInBounds` calls.
- Refactor identical code into a containsRegexPattern function.
- Early exit when parsing the first date fails when using the `Between`
operator, instead of trying to parse the second one.
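A sketch of the first three points, with illustrative names (`rule` and `matches` are assumptions):

```go
match := false
parts := strings.SplitN(rule, "=", 2)
if len(parts) == 2 { // explicit length check helps the compiler drop isSliceInBounds
	key, value := parts[0], parts[1]
	match = matches(key, value) // hypothetical predicate
}
```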
As youtubeVideoID is assigned the result of getVideoIDFromYouTubeURL(entry.URL),
there is no need to call the latter again when we can simply use youtubeVideoID
instead.
There is no need to use SHA256 everywhere, especially on small inputs where we
don't care about its cryptographic properties. We're using FNV as it's the
fastest available hash in Go's standard library, and we're picking its "a"
variant as it has slightly better avalanche characteristics, which are
relevant for small inputs.
This commit has the side-effect of invalidating all favicons saved in the
database, which is desirable to benefit from the resize process implemented in
777d0dd2, as it didn't apply retroactively.
We're also making use of hex.EncodeToString instead of fmt.Sprintf, as it's
marginally faster.
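A sketch of the described scheme, assuming the 64-bit variant (the exact width isn't stated here); needs "encoding/hex" and "hash/fnv":

```go
func faviconHash(data []byte) string {
	h := fnv.New64a() // FNV-1a: better avalanche behaviour on small inputs
	h.Write(data)
	return hex.EncodeToString(h.Sum(nil)) // faster than fmt.Sprintf("%x", ...)
}
```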
Note that we can't change the usage of sha256 for feed.Hash as it's used to
deduplicate entries in the database.