miniflux-v2

mirror of https://github.com/miniflux/v2.git synced 2025-08-06 17:41:00 +00:00

Author	SHA1	Message	Date
jvoisin	69a74c4abf	refactor(readability): minor clean up Remove a now-useless regex and its associated test.	2025-07-02 16:50:49 -07:00
Frédéric Guillot	8c3f280f32	test(readability): add test case for `ExtractContent` with broken reader	2025-07-01 20:14:52 -07:00
jvoisin	2f7b2e7375	perf(readability): improve getLinkDensity - There is no need to materialize all the content of a given Node when we can simply compute its length directly, saving a lot of memory, on the order of several megabytes on my instance, with peaks at a couple of dozen. - One might object to the usage of a recursive construct, but this is a direct port of goquery's Text method, so this change doesn't make anything worse. - The computation of linkLength can be similarly computed, but this can go in another commit, as it's a bit trickier, since we need to get the length of every Node that has a `a` Node as parent, without iterating on the whole parent chain every time.	2025-07-01 19:40:47 -07:00
Frédéric Guillot	6eeccae7cd	test(readability): increase test coverage	2025-06-30 21:29:07 -07:00
jvoisin	aed99e65c1	perf(readability): improve getClassWeight speed Before ```console $ go test -bench=. goos: linux goarch: arm64 pkg: miniflux.app/v2/internal/reader/readability BenchmarkExtractContent-8 34 86102474 ns/op BenchmarkGetWeight-8 10573 103045 ns/op PASS ok miniflux.app/v2/internal/reader/readability 5.409s ``` After ```console $ go test -bench=. goos: linux goarch: arm64 pkg: miniflux.app/v2/internal/reader/readability BenchmarkExtractContent-8 56 83130924 ns/op BenchmarkGetWeight-8 246541 5241 ns/op PASS ok miniflux.app/v2/internal/reader/readability 6.026s ``` This should make ProcessFeedEntries marginally faster, while saving some memory.	2025-06-30 19:28:20 -07:00
Frédéric Guillot	a68de4ee6a	test(readability): add tests for `getArticle` function	2025-06-29 16:03:17 -07:00
Frédéric Guillot	5129f53d58	test(readability): add tests for `removeUnlikelyCandidates` function	2025-06-29 15:23:56 -07:00
Frédéric Guillot	e60f0fd142	test(readability): add tests for `getClassWeight` function	2025-06-29 13:24:06 -07:00
Frédéric Guillot	6d58052504	fix(readability): do not remove elements within code blocks `<span class="hljs-comment"># exit 1</span>` will match the `unlikelyCandidatesRegexp` because it contains the `comment` string.	2025-06-19 21:03:53 -07:00
jvoisin	2df59b4865	Refactor internal/reader/readability/testdata - Use chained strings.Contains instead of a regex for blacklistCandidatesRegexp, as this is a bit faster - Simplify a Find.Each.Remove to Find.Remove - Don't concatenate id and class for removeUnlikelyCandidates, as it makes no sense to match on overlaps. It might also marginally improve performances, as regex now have to run on two strings separately, instead of both. - Add a small benchmark	2024-12-15 20:52:32 -08:00
Julien Voisin	6ad5ad0bb2	refactor(readability): various improvements and optimizations - Replace a completely overkill regex - Use `.Remove()` instead of a hand-rolled loop - Use a strings.Builder instead of a bytes.NewBufferString - Replace a call to Fprintf with string concatenation, as the latter are much faster - Remove a superfluous cast - Delay some computations - Add some tests	2024-12-12 20:41:56 -08:00
Frédéric Guillot	29387f2d60	feat: implement base element handling in content scraper	2024-07-25 20:36:56 -07:00

12 commits