miniflux-v2

mirror of https://github.com/miniflux/v2.git synced 2025-08-06 17:41:00 +00:00

Author	SHA1	Message	Date
jvoisin	4e1f836266	refactor(readability): simplify getArticle a bit - Use a proper division instead of multiplying by a float. - Extract a condition in the parent scope - Use an else-if construct instead of a simple if	2025-06-29 16:06:34 -07:00
jvoisin	c064891314	perf(readability): Simplify removeUnlikelyCandidates - Use an array of strings instead of a regex, as done in ef13756b1a7a7ba30fd34174a5367381fd8b4849 - Extract the `shouldRemove` function from `removeUnlikelyCandidates`, as there is no reason to have it there instead of being a proper standalone function. - Improve a condition, where the goquery selection would have its `id` attribute left unchecked if a `class` one was present, regardless of whether `class` was a candidate for removal. - Add some comments	2025-06-29 15:31:01 -07:00
Frédéric Guillot	6d58052504	fix(readability): do not remove elements within code blocks `<span class="hljs-comment"># exit 1</span>` will match the `unlikelyCandidatesRegexp` because it contains the `comment` string.	2025-06-19 21:03:53 -07:00
jvoisin	8a014c6abc	perf(readability): minor regex improvement - Improve the check for tags by matching only if its name is followed by a space, a slash, or a closing angle bracket - Use an anonymous group	2025-06-12 19:13:58 -07:00
jvoisin	2df59b4865	Refactor internal/reader/readability/testdata - Use chained strings.Contains instead of a regex for blacklistCandidatesRegexp, as this is a bit faster - Simplify a Find.Each.Remove to Find.Remove - Don't concatenate id and class for removeUnlikelyCandidates, as it makes no sense to match on overlaps. It might also marginally improve performance, as the regex now runs on two strings separately instead of on their concatenation. - Add a small benchmark	2024-12-15 20:52:32 -08:00
Julien Voisin	6ad5ad0bb2	refactor(readability): various improvements and optimizations - Replace a completely overkill regex - Use `.Remove()` instead of a hand-rolled loop - Use a strings.Builder instead of a bytes.NewBufferString - Replace a call to Fprintf with string concatenation, as the latter is much faster - Remove a superfluous cast - Delay some computations - Add some tests	2024-12-12 20:41:56 -08:00
Julien Voisin	e6185b1393	refactor: use min/max instead of math.Min/math.Max This saves a couple of back'n'forth casts.	2024-12-11 19:43:14 -08:00
Julien Voisin	1b0b8b9c42	refactor: use a better construct than `doc.Find(…).First()` As mentioned in goquery's documentation (https://pkg.go.dev/github.com/PuerkitoBio/goquery#Single): > By default, Selection.Find and other functions that accept a selector string to select nodes will use all matches corresponding to that selector. By using the Matcher returned by Single, at most the first match will be selected. > > The one using Single is optimized to be potentially much faster on large documents.	2024-12-11 19:40:55 -08:00
Julien Voisin	2671f57edd	refactor(readability): simplify the regexes in `internal/reader/readability/readability.go` - Use strings.ToLower() instead of having case-insensitive regex - Remove overlapping words in the regex - Split a condition to increase readability	2024-12-07 16:56:19 -08:00
Frédéric Guillot	29387f2d60	feat: implement base element handling in content scraper	2024-07-25 20:36:56 -07:00
Frédéric Guillot	b1e73fafdf	Enable go-critic linter and fix various issues detected	2024-03-17 13:52:34 -07:00
jvoisin	347740dce1	Speed up removeUnlikelyCandidates `.Not` returns a brand new Selection, copied element by element.	2024-02-29 19:38:43 -08:00
Frédéric Guillot	97765b93a9	Revert "Minor internal/reader/readability/readability.go speedup" This reverts commit `4db138d4b8`. ``` panic: runtime error: index out of range [-1] goroutine 49 [running]: miniflux.app/v2/internal/reader/readability.getArticle.func1(0x8?, 0xc000b56570) /home/fred/repos/miniflux/v2/internal/reader/readability/readability.go:120 +0x2ac github.com/PuerkitoBio/goquery.(*Selection).Each(0xc000b56510, 0xc000892fa8) /home/fred/go/pkg/mod/github.com/!puerkito!bio/goquery@v1.9.0/iteration.go:10 +0x62 miniflux.app/v2/internal/reader/readability.getArticle(0xc00044f1f0, 0xc000a04a50) /home/fred/repos/miniflux/v2/internal/reader/readability/readability.go:101 +0x15d miniflux.app/v2/internal/reader/readability.ExtractContent({0x1005d00?, 0xc0001522d0?}) /home/fred/repos/miniflux/v2/internal/reader/readability/readability.go:91 +0x211 miniflux.app/v2/internal/reader/scraper.ScrapeWebsite(0xc000893688?, {0xc0007ce720, 0x54}, {0x0, 0x0}) /home/fred/repos/miniflux/v2/internal/reader/scraper/scraper.go:63 +0x859 miniflux.app/v2/internal/reader/processor.ProcessFeedEntries(0xc000133188, 0xc000502c40, 0xc0003e6360, 0x0) /home/fred/repos/miniflux/v2/internal/reader/processor/processor.go:77 +0x8ea miniflux.app/v2/internal/reader/handler.RefreshFeed(0xc000133188, 0x10cf, 0x52d5c, 0x0) /home/fred/repos/miniflux/v2/internal/reader/handler/handler.go:301 +0x1485 miniflux.app/v2/internal/cli.refreshFeeds.func1(0x0) /home/fred/repos/miniflux/v2/internal/cli/refresh_feeds.go:59 +0x2d7 created by miniflux.app/v2/internal/cli.refreshFeeds in goroutine 1 /home/fred/repos/miniflux/v2/internal/cli/refresh_feeds.go:50 +0x5d5 ```	2024-02-29 19:06:03 -08:00
jvoisin	4db138d4b8	Minor internal/reader/readability/readability.go speedup - Don't use a capturing group in `divToPElementsRegexp` - Remove a duplicate condition - Replace a regex with a fixed-comparison and a `Contains`	2024-02-28 20:03:14 -08:00
jvoisin	61af08a721	Use .WriteString( instead of .Write([]byte(…	2024-02-28 19:47:30 -08:00
Frédéric Guillot	c0e954f19d	Implement structured logging using log/slog package	2023-09-24 22:37:33 -07:00
Frédéric Guillot	168a870c02	Move internal packages to an internal folder For reference: https://go.dev/doc/go1.4#internalpackages	2023-08-10 20:29:34 -07:00

17 commits
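
Several of the commits above replace a regex with plain substring checks and stop concatenating `id` and `class` before matching. The following is a minimal sketch of that idea using only the Go standard library, not the code from the repository: the keyword list and the `containsAny` helper are hypothetical, and the `shouldRemove` signature is an assumption made for illustration (the real function operates on a goquery selection).

```go
package main

import (
	"fmt"
	"strings"
)

// unlikelyKeywords is a hypothetical, shortened keyword list used only for
// this illustration; it is not the list used by miniflux.
var unlikelyKeywords = []string{"banner", "comment", "footer", "sidebar", "sponsor"}

// containsAny (hypothetical helper) reports whether s contains any of the
// keywords. Lowercasing once stands in for a case-insensitive regex.
func containsAny(s string, keywords []string) bool {
	s = strings.ToLower(s)
	for _, kw := range keywords {
		if strings.Contains(s, kw) {
			return true
		}
	}
	return false
}

// shouldRemove checks class and id separately. Matching on their
// concatenation could produce false positives on the overlap,
// e.g. class "foo" followed by id "ter" would spell "footer".
func shouldRemove(class, id string) bool {
	return containsAny(class, unlikelyKeywords) || containsAny(id, unlikelyKeywords)
}

func main() {
	fmt.Println(shouldRemove("post-body", "Sidebar-left")) // id matches "sidebar"
	fmt.Println(shouldRemove("foo", "ter"))                // no match when checked separately
	// A syntax-highlighting class also matches "comment", which is the kind
	// of false positive the code-block fix in the log above is about.
	fmt.Println(shouldRemove("hljs-comment", ""))
}
```

Checking both attributes unconditionally also avoids the situation described in c064891314, where `id` was skipped whenever a `class` attribute was present.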