Julien Voisin
6ad5ad0bb2
refactor(readability): various improvements and optimizations
...
- Replace a completely overkill regex
- Use `.Remove()` instead of a hand-rolled loop
- Use a strings.Builder instead of a bytes.NewBufferString
- Replace a call to Fprintf with string concatenation, as the latter are much
faster
- Remove a superfluous cast
- Delay some computations
- Add some tests
2024-12-12 20:41:56 -08:00
Julien Voisin
e6185b1393
refactor: use min/max instead of math.Min/math.Max
...
This saves a couple of back'n'forth casts.
2024-12-11 19:43:14 -08:00
Julien Voisin
1b0b8b9c42
refactor: use a better construct than doc.Find(…).First()
...
As mentioned in goquery's documentation (https://pkg.go.dev/github.com/PuerkitoBio/goquery#Single ):
> By default, Selection.Find and other functions that accept a selector string
to select nodes will use all matches corresponding to that selector. By using
the Matcher returned by Single, at most the first match will be selected.
>
> The one using Single is optimized to be potentially much faster on large documents.
2024-12-11 19:40:55 -08:00
Julien Voisin
2671f57edd
refactor(readability): simplify the regexes in internal/reader/readability/readability.go
...
- Use strings.ToLower() instead of having case-insensitive regex
- Remove overlapping words in the regex
- Split a condition to increase readability
2024-12-07 16:56:19 -08:00
Frédéric Guillot
29387f2d60
feat: implement base element handling in content scraper
2024-07-25 20:36:56 -07:00
Frédéric Guillot
b1e73fafdf
Enable go-critic linter and fix various issues detected
2024-03-17 13:52:34 -07:00
jvoisin
347740dce1
Speed up removeUnlikelyCandidates
...
`.Not` returns a brand new Selection, copied element by element.
2024-02-29 19:38:43 -08:00
Frédéric Guillot
97765b93a9
Revert "Minor internal/reader/readability/readability.go speedup"
...
This reverts commit 4db138d4b8
.
```
panic: runtime error: index out of range [-1]
goroutine 49 [running]:
miniflux.app/v2/internal/reader/readability.getArticle.func1(0x8?, 0xc000b56570)
/home/fred/repos/miniflux/v2/internal/reader/readability/readability.go:120 +0x2ac
github.com/PuerkitoBio/goquery.(*Selection).Each(0xc000b56510, 0xc000892fa8)
/home/fred/go/pkg/mod/github.com/!puerkito!bio/goquery@v1.9.0/iteration.go:10 +0x62
miniflux.app/v2/internal/reader/readability.getArticle(0xc00044f1f0, 0xc000a04a50)
/home/fred/repos/miniflux/v2/internal/reader/readability/readability.go:101 +0x15d
miniflux.app/v2/internal/reader/readability.ExtractContent({0x1005d00?, 0xc0001522d0?})
/home/fred/repos/miniflux/v2/internal/reader/readability/readability.go:91 +0x211
miniflux.app/v2/internal/reader/scraper.ScrapeWebsite(0xc000893688?, {0xc0007ce720, 0x54}, {0x0, 0x0})
/home/fred/repos/miniflux/v2/internal/reader/scraper/scraper.go:63 +0x859
miniflux.app/v2/internal/reader/processor.ProcessFeedEntries(0xc000133188, 0xc000502c40, 0xc0003e6360, 0x0)
/home/fred/repos/miniflux/v2/internal/reader/processor/processor.go:77 +0x8ea
miniflux.app/v2/internal/reader/handler.RefreshFeed(0xc000133188, 0x10cf, 0x52d5c, 0x0)
/home/fred/repos/miniflux/v2/internal/reader/handler/handler.go:301 +0x1485
miniflux.app/v2/internal/cli.refreshFeeds.func1(0x0)
/home/fred/repos/miniflux/v2/internal/cli/refresh_feeds.go:59 +0x2d7
created by miniflux.app/v2/internal/cli.refreshFeeds in goroutine 1
/home/fred/repos/miniflux/v2/internal/cli/refresh_feeds.go:50 +0x5d5
```
2024-02-29 19:06:03 -08:00
jvoisin
4db138d4b8
Minor internal/reader/readability/readability.go speedup
...
- Don't use a capturing group in `divToPElementsRegexp`
- Remove a duplicate condition
- Replace a regex with a fixed-comparison and a `Contains`
2024-02-28 20:03:14 -08:00
jvoisin
61af08a721
Use .WriteString( instead of .Write([]byte(…
2024-02-28 19:47:30 -08:00
Frédéric Guillot
c0e954f19d
Implement structured logging using log/slog package
2023-09-24 22:37:33 -07:00
Frédéric Guillot
168a870c02
Move internal packages to an internal folder
...
For reference: https://go.dev/doc/go1.4#internalpackages
2023-08-10 20:29:34 -07:00