mirror of https://github.com/miniflux/v2.git synced 2025-06-27 16:36:00 +00:00

14 commits

Author SHA1 Message Date
jvoisin
8a014c6abc perf(readability): minor regex improvement
- Improve the check for tags by matching only if the tag name is followed by
  a space, a slash, or a closing angle bracket
- Use a non-capturing group
2025-06-12 19:13:58 -07:00
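
A minimal sketch of the tag-matching idea, assuming a hypothetical pattern for just two tag names (the actual regex in readability.go is not reproduced here). The `(?:...)` group collects the alternatives without capturing them, and the `[ />]` class requires the name to be followed by a space, a slash, or `>`, so `<p>` matches while `<pre>` and `<param>` do not.

```go
package main

import (
	"fmt"
	"regexp"
)

// hypotheticalTagRegexp is illustrative only, not the pattern used in miniflux.
// (?:p|div) is a non-capturing group; [ />] requires the tag name to be
// followed by a space, a slash, or a closing angle bracket.
var hypotheticalTagRegexp = regexp.MustCompile(`<(?:p|div)[ />]`)

// hasTag reports whether html contains an opening or self-closing <p> or <div>.
func hasTag(html string) bool {
	return hypotheticalTagRegexp.MatchString(html)
}

func main() {
	for _, s := range []string{"<p>hi</p>", `<div class="x">`, "<br/>", "<pre>code</pre>", "<param>"} {
		fmt.Printf("%-20q %v\n", s, hasTag(s))
	}
}
```

Without the trailing character class, a pattern like `<(?:p|div)` would also match `<pre>` and `<param>`, which is the false positive this commit avoids.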
jvoisin
2df59b4865 Refactor internal/reader/readability/testdata
- Use chained strings.Contains instead of a regex for
  blacklistCandidatesRegexp, as this is a bit faster
- Simplify a Find.Each.Remove to Find.Remove
- Don't concatenate id and class for removeUnlikelyCandidates, as matches
  spanning the boundary between the two values make no sense. It might also
  marginally improve performance, as the regexes now run on the two strings
  separately instead of on their concatenation.
- Add a small benchmark
2024-12-15 20:52:32 -08:00
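
A sketch of swapping a literal-alternation regex for chained `strings.Contains` calls, assuming a hypothetical keyword list (the exact words in blacklistCandidatesRegexp are not shown here). When every alternative is a plain literal, the two checks are equivalent, and the substring version avoids running the regex engine.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical keywords, for illustration only.
var blacklistRegexp = regexp.MustCompile(`popupbody|-ad|g-plus`)

// isBlacklistedRegexp performs the check with a single regex alternation.
func isBlacklistedRegexp(s string) bool {
	return blacklistRegexp.MatchString(s)
}

// isBlacklistedContains performs the same check with chained substring
// searches; || short-circuits on the first hit.
func isBlacklistedContains(s string) bool {
	return strings.Contains(s, "popupbody") ||
		strings.Contains(s, "-ad") ||
		strings.Contains(s, "g-plus")
}

func main() {
	for _, s := range []string{"sidebar-ad", "main-content", "g-plus-share"} {
		fmt.Println(s, isBlacklistedRegexp(s), isBlacklistedContains(s))
	}
}
```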
Julien Voisin
6ad5ad0bb2 refactor(readability): various improvements and optimizations
- Replace a completely overkill regex
- Use `.Remove()` instead of a hand-rolled loop
- Use a strings.Builder instead of a bytes.NewBufferString
- Replace a call to Fprintf with string concatenation, as the latter is much
  faster
- Remove a superfluous cast
- Delay some computations
- Add some tests
2024-12-12 20:41:56 -08:00
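
A before/after sketch of the Builder and Fprintf changes, using hypothetical helpers that wrap HTML fragments in a `<div>` (the names and markup are illustrative, not taken from readability.go). Both produce identical output; the second skips Fprintf's format-string parsing and lets `strings.Builder` return its contents without an extra copy.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// wrapWithBuffer is the "before" style: bytes.Buffer plus fmt.Fprintf.
func wrapWithBuffer(id string, parts []string) string {
	buf := bytes.NewBufferString("")
	fmt.Fprintf(buf, "<div id=\"%s\">", id)
	for _, p := range parts {
		buf.WriteString(p)
	}
	buf.WriteString("</div>")
	return buf.String()
}

// wrapWithBuilder is the "after" style: strings.Builder plus plain string
// concatenation; Builder.String returns the accumulated bytes without copying.
func wrapWithBuilder(id string, parts []string) string {
	var b strings.Builder
	b.WriteString("<div id=\"" + id + "\">")
	for _, p := range parts {
		b.WriteString(p)
	}
	b.WriteString("</div>")
	return b.String()
}

func main() {
	parts := []string{"<p>a</p>", "<p>b</p>"}
	fmt.Println(wrapWithBuffer("example", parts))
	fmt.Println(wrapWithBuilder("example", parts))
}
```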
Julien Voisin
e6185b1393 refactor: use min/max instead of math.Min/math.Max
This saves a couple of back-and-forth casts to and from float64.
2024-12-11 19:43:14 -08:00
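
An illustration of the casts being removed, using a hypothetical `clamp` helper: `math.Min` and `math.Max` only accept `float64`, so integer arguments must be converted in and out, while the `min` and `max` built-ins (Go 1.21+) work directly on any ordered type.

```go
package main

import (
	"fmt"
	"math"
)

// clampOld uses math.Min/math.Max, forcing int -> float64 -> int conversions.
func clampOld(v, lo, hi int) int {
	return int(math.Max(float64(lo), math.Min(float64(v), float64(hi))))
}

// clampNew uses the min and max built-ins, so no conversions are needed.
func clampNew(v, lo, hi int) int {
	return max(lo, min(v, hi))
}

func main() {
	fmt.Println(clampOld(15, 0, 10), clampNew(15, 0, 10))
}
```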
Julien Voisin
1b0b8b9c42 refactor: use a better construct than doc.Find(…).First()
As mentioned in goquery's documentation (https://pkg.go.dev/github.com/PuerkitoBio/goquery#Single):

> By default, Selection.Find and other functions that accept a selector string
to select nodes will use all matches corresponding to that selector. By using
the Matcher returned by Single, at most the first match will be selected.
>
> The one using Single is optimized to be potentially much faster on large documents.
2024-12-11 19:40:55 -08:00
Julien Voisin
2671f57edd refactor(readability): simplify the regexes in internal/reader/readability/readability.go
- Use strings.ToLower() instead of case-insensitive regexes
- Remove overlapping words in the regex
- Split a condition to increase readability
2024-12-07 16:56:19 -08:00
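
A sketch of replacing a `(?i)` regex with `strings.ToLower` plus a case-sensitive regex, assuming a hypothetical keyword pattern. For ASCII input the two approaches agree; Unicode case folding and `ToLower` can differ in rare edge cases, so this is an equivalence for typical class and id values, not for arbitrary text.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical keyword pattern, for illustration only.
var insensitiveRegexp = regexp.MustCompile(`(?i)comment|sidebar`)

// The same pattern without the (?i) flag; the input is lowercased first.
var lowercaseRegexp = regexp.MustCompile(`comment|sidebar`)

// matchInsensitive relies on the regex engine to fold case.
func matchInsensitive(s string) bool {
	return insensitiveRegexp.MatchString(s)
}

// matchLowered lowercases the input once and runs a case-sensitive regex.
func matchLowered(s string) bool {
	return lowercaseRegexp.MatchString(strings.ToLower(s))
}

func main() {
	for _, s := range []string{"Comments", "SIDEBAR-left", "article-body"} {
		fmt.Println(s, matchInsensitive(s), matchLowered(s))
	}
}
```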
Frédéric Guillot
29387f2d60 feat: implement base element handling in content scraper 2024-07-25 20:36:56 -07:00
Frédéric Guillot
b1e73fafdf Enable go-critic linter and fix various issues detected 2024-03-17 13:52:34 -07:00
jvoisin
347740dce1 Speed up removeUnlikelyCandidates
`.Not` returns a brand new Selection, copied element by element.
2024-02-29 19:38:43 -08:00
Frédéric Guillot
97765b93a9 Revert "Minor internal/reader/readability/readability.go speedup"
This reverts commit 4db138d4b8.

```
panic: runtime error: index out of range [-1]

goroutine 49 [running]:
miniflux.app/v2/internal/reader/readability.getArticle.func1(0x8?, 0xc000b56570)
        /home/fred/repos/miniflux/v2/internal/reader/readability/readability.go:120 +0x2ac
github.com/PuerkitoBio/goquery.(*Selection).Each(0xc000b56510, 0xc000892fa8)
        /home/fred/go/pkg/mod/github.com/!puerkito!bio/goquery@v1.9.0/iteration.go:10 +0x62
miniflux.app/v2/internal/reader/readability.getArticle(0xc00044f1f0, 0xc000a04a50)
        /home/fred/repos/miniflux/v2/internal/reader/readability/readability.go:101 +0x15d
miniflux.app/v2/internal/reader/readability.ExtractContent({0x1005d00?, 0xc0001522d0?})
        /home/fred/repos/miniflux/v2/internal/reader/readability/readability.go:91 +0x211
miniflux.app/v2/internal/reader/scraper.ScrapeWebsite(0xc000893688?, {0xc0007ce720, 0x54}, {0x0, 0x0})
        /home/fred/repos/miniflux/v2/internal/reader/scraper/scraper.go:63 +0x859
miniflux.app/v2/internal/reader/processor.ProcessFeedEntries(0xc000133188, 0xc000502c40, 0xc0003e6360, 0x0)
        /home/fred/repos/miniflux/v2/internal/reader/processor/processor.go:77 +0x8ea
miniflux.app/v2/internal/reader/handler.RefreshFeed(0xc000133188, 0x10cf, 0x52d5c, 0x0)
        /home/fred/repos/miniflux/v2/internal/reader/handler/handler.go:301 +0x1485
miniflux.app/v2/internal/cli.refreshFeeds.func1(0x0)
        /home/fred/repos/miniflux/v2/internal/cli/refresh_feeds.go:59 +0x2d7
created by miniflux.app/v2/internal/cli.refreshFeeds in goroutine 1
        /home/fred/repos/miniflux/v2/internal/cli/refresh_feeds.go:50 +0x5d5
```
2024-02-29 19:06:03 -08:00
jvoisin
4db138d4b8 Minor internal/reader/readability/readability.go speedup
- Don't use a capturing group in `divToPElementsRegexp`
- Remove a duplicate condition
- Replace a regex with a fixed-string comparison and a `Contains`
2024-02-28 20:03:14 -08:00
jvoisin
61af08a721 Use .WriteString( instead of .Write([]byte(… 2024-02-28 19:47:30 -08:00
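
A sketch of the `WriteString` change, assuming the writer is a `strings.Builder` (a `bytes.Buffer` offers the same two methods). `Write([]byte(s))` first converts the string to a byte slice, which may copy it, while `WriteString(s)` appends the string directly; the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// joinWithWrite is the older style: each string is converted to []byte first.
func joinWithWrite(parts []string) string {
	var b strings.Builder
	for _, p := range parts {
		b.Write([]byte(p))
	}
	return b.String()
}

// joinWithWriteString passes each string straight to WriteString.
func joinWithWriteString(parts []string) string {
	var b strings.Builder
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	parts := []string{"<p>", "hello", "</p>"}
	fmt.Println(joinWithWrite(parts), joinWithWriteString(parts))
}
```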
Frédéric Guillot
c0e954f19d Implement structured logging using log/slog package 2023-09-24 22:37:33 -07:00
Frédéric Guillot
168a870c02 Move internal packages to an internal folder
For reference: https://go.dev/doc/go1.4#internalpackages
2023-08-10 20:29:34 -07:00
Renamed from reader/readability/readability.go