mirror of https://github.com/miniflux/v2.git synced 2025-08-06 17:41:00 +00:00
Commit graph

9 commits

Author SHA1 Message Date
Frédéric Guillot
6eeccae7cd test(readability): increase test coverage 2025-06-30 21:29:07 -07:00
jvoisin
aed99e65c1 perf(readability): improve getClassWeight speed
Before

```console
$ go test -bench=.
goos: linux
goarch: arm64
pkg: miniflux.app/v2/internal/reader/readability
BenchmarkExtractContent-8   	     34	 86102474 ns/op
BenchmarkGetWeight-8        	  10573	    103045 ns/op
PASS
ok  	miniflux.app/v2/internal/reader/readability	5.409s
```

After

```console
$ go test -bench=.
goos: linux
goarch: arm64
pkg: miniflux.app/v2/internal/reader/readability
BenchmarkExtractContent-8   	     56	 83130924 ns/op
BenchmarkGetWeight-8        	 246541	     5241 ns/op
PASS
ok  	miniflux.app/v2/internal/reader/readability	6.026s
```

This should make ProcessFeedEntries marginally faster, while saving
some memory.
2025-06-30 19:28:20 -07:00
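Comparing the ns/op figures side by side shows how "marginally" should be read. On these single runs, `BenchmarkGetWeight` is roughly 20× faster, while `BenchmarkExtractContent` spends only about 3.5% less time per operation:

```latex
\begin{aligned}
\text{GetWeight speedup} &= \frac{103045\ \text{ns/op}}{5241\ \text{ns/op}} \approx 19.7 \\
\text{ExtractContent reduction} &= \frac{86102474 - 83130924}{86102474} = \frac{2971550}{86102474} \approx 3.5\,\%
\end{aligned}
```

The Go benchmark harness chooses the iteration counts (34 vs 56, 10573 vs 246541) to fill its time budget. That makes the ns/op column the one to compare, not the iteration column.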
Frédéric Guillot
a68de4ee6a test(readability): add tests for getArticle function 2025-06-29 16:03:17 -07:00
Frédéric Guillot
5129f53d58 test(readability): add tests for removeUnlikelyCandidates function 2025-06-29 15:23:56 -07:00
Frédéric Guillot
e60f0fd142 test(readability): add tests for getClassWeight function 2025-06-29 13:24:06 -07:00
Frédéric Guillot
6d58052504
fix(readability): do not remove elements within code blocks
`<span class="hljs-comment"># exit 1</span>` matches `unlikelyCandidatesRegexp` because its class contains the string `comment`.
2025-06-19 21:03:53 -07:00
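Any pattern that contains `comment` reproduces this false positive. The sketch below shows one possible guard under stated assumptions: a simplified stand-in pattern, and a hypothetical `isUnlikelyCandidate` helper that receives the element's ancestor tag names. Elements nested in `<pre>` or `<code>` are never treated as unlikely candidates. The real code works on goquery selections, and its actual fix may be structured differently.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical stand-in for unlikelyCandidatesRegexp. The real pattern is
// longer, but like this one it matches any value containing "comment".
var unlikelyCandidates = regexp.MustCompile(`(?i)comment|sidebar|footer|banner`)

// isUnlikelyCandidate (hypothetical) reports whether an element should be
// dropped. ancestors holds the element's ancestor tag names, parent first.
// Anything nested in <pre> or <code> is kept, so syntax-highlighted markup
// such as <span class="hljs-comment"> survives.
func isUnlikelyCandidate(class, id string, ancestors []string) bool {
	for _, tag := range ancestors {
		if tag == "pre" || tag == "code" {
			return false
		}
	}
	return unlikelyCandidates.MatchString(class) || unlikelyCandidates.MatchString(id)
}

func main() {
	// The pattern alone flags the highlighted comment span.
	fmt.Println(unlikelyCandidates.MatchString("hljs-comment")) // true
	// The ancestor check keeps it when it sits inside <pre><code>.
	fmt.Println(isUnlikelyCandidate("hljs-comment", "", []string{"code", "pre", "body"})) // false
	// A comment section outside any code block is still flagged.
	fmt.Println(isUnlikelyCandidate("comments", "", []string{"div", "body"})) // true
}
```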
jvoisin
2df59b4865 Refactor internal/reader/readability/testdata
- Use chained strings.Contains instead of a regex for
  blacklistCandidatesRegexp, as this is a bit faster
- Simplify a Find.Each.Remove to Find.Remove
- Don't concatenate id and class for removeUnlikelyCandidates, as it makes no
  sense to match on overlaps. It might also marginally improve performance, as
  the regexes now run on each string separately instead of on a concatenation
  of both.
- Add a small benchmark
2024-12-15 20:52:32 -08:00
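The sketch below illustrates the first and third points. Treat the blacklist terms `popupbody`, `-ad` and `g-plus` as assumptions for illustration, and check the miniflux source for the real list. Two further details are also assumptions: that the old code concatenated class and id with no separator, and the helper names. A chain of `strings.Contains` calls gives the same answers as the alternation regex without invoking the regex engine. Matching class and id separately avoids a term being "found" across the boundary between the two attributes.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Assumed blacklist terms, expressed as a regex alternation.
var blacklistRegexp = regexp.MustCompile(`popupbody|-ad|g-plus`)

// isBlacklistedRegexp is the regex-based check.
func isBlacklistedRegexp(s string) bool {
	return blacklistRegexp.MatchString(s)
}

// isBlacklisted is the equivalent chained strings.Contains check: one plain
// substring search per term, short-circuiting on the first hit.
func isBlacklisted(s string) bool {
	return strings.Contains(s, "popupbody") ||
		strings.Contains(s, "-ad") ||
		strings.Contains(s, "g-plus")
}

// matchesConcatenated mimics the older approach (assumed: class+id with no
// separator), kept only for comparison.
func matchesConcatenated(class, id string) bool {
	return isBlacklisted(strings.ToLower(class + id))
}

// matchesSeparately checks class and id on their own, so a term can never
// straddle the two attributes.
func matchesSeparately(class, id string) bool {
	return isBlacklisted(strings.ToLower(class)) || isBlacklisted(strings.ToLower(id))
}

func main() {
	for _, s := range []string{"popupbody", "sidebar-ad", "article"} {
		fmt.Println(s, isBlacklistedRegexp(s), isBlacklisted(s))
	}
	// "top-" + "admin-panel" = "top-admin-panel", which contains "-ad".
	fmt.Println(matchesConcatenated("top-", "admin-panel")) // true
	fmt.Println(matchesSeparately("top-", "admin-panel"))   // false
}
```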
Julien Voisin
6ad5ad0bb2
refactor(readability): various improvements and optimizations
- Replace a completely overkill regex
- Use `.Remove()` instead of a hand-rolled loop
- Use a strings.Builder instead of a bytes.NewBufferString
- Replace a call to Fprintf with string concatenation, as the latter is much
  faster
- Remove a superfluous cast
- Delay some computations
- Add some tests
2024-12-12 20:41:56 -08:00
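The sketch below contrasts two output-building patterns. It places a `bytes.Buffer` written to with `fmt.Fprintf` next to a `strings.Builder` fed with plain string concatenation. Both functions are hypothetical, and the `<div>`/`<p>` markup is purely illustrative. The sketch only shows that the two patterns produce identical output. A `testing.B` benchmark would be needed to measure the speed difference on real input.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// wrapWithFprintf uses a bytes.Buffer plus fmt.Fprintf, which parses the
// format string and goes through reflection-based formatting on every call.
func wrapWithFprintf(fragments []string) string {
	output := bytes.NewBufferString("<div>")
	for _, f := range fragments {
		fmt.Fprintf(output, "<p>%s</p>", f)
	}
	output.WriteString("</div>")
	return output.String()
}

// wrapWithBuilder uses a strings.Builder and plain concatenation.
func wrapWithBuilder(fragments []string) string {
	var output strings.Builder
	output.WriteString("<div>")
	for _, f := range fragments {
		output.WriteString("<p>" + f + "</p>")
	}
	output.WriteString("</div>")
	return output.String()
}

func main() {
	parts := []string{"first", "second"}
	fmt.Println(wrapWithBuilder(parts))                           // <div><p>first</p><p>second</p></div>
	fmt.Println(wrapWithFprintf(parts) == wrapWithBuilder(parts)) // true
}
```

`strings.Builder.String()` returns the accumulated bytes without copying them, whereas `bytes.Buffer.String()` allocates a copy. That difference is one reason the Builder is the usual choice when the end result is a string.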
Frédéric Guillot
29387f2d60 feat: implement base element handling in content scraper 2024-07-25 20:36:56 -07:00
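Base element handling means a page's `<base href="...">` changes how its relative links resolve. Per the HTML specification, only the first `<base>` element with an `href` counts, and that href is itself resolved against the page URL. The sketch below shows the resolution step with Go's `net/url`. It assumes a hypothetical `resolveWithBase` helper whose caller has already extracted the href from the document. It is not the content scraper's actual code.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveWithBase (hypothetical) resolves link against the page URL, or
// against the page's <base href> when one is given. A relative base href is
// first resolved against the page URL itself.
func resolveWithBase(pageURL, baseHref, link string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	if baseHref != "" {
		href, err := url.Parse(baseHref)
		if err != nil {
			return "", err
		}
		base = base.ResolveReference(href)
	}
	ref, err := url.Parse(link)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	page := "https://example.org/blog/post.html"
	for _, tc := range []struct{ base, link string }{
		{"", "images/a.png"},                          // no <base>: page URL is used
		{"/static/", "images/a.png"},                  // relative <base href>
		{"https://cdn.example.com/assets/", "/x.png"}, // absolute <base href>
	} {
		resolved, err := resolveWithBase(page, tc.base, tc.link)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(resolved)
	}
}
```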