- There is no need to materialize all the content of a given Node when we can
simply compute its length directly. This saves a lot of memory: on the order of
several megabytes on my instance, with peaks at a couple dozen megabytes.
- One might object to the usage of a recursive construct, but this is a direct
port of goquery's Text method, so this change doesn't make anything worse.
- linkLength can be computed in a similar way, but that can go in another
commit because it's a bit trickier. We need the length of every Node that has
an `a` Node as an ancestor, without walking the whole parent chain every
time.
Before
```console
$ go test -bench=.
goos: linux
goarch: arm64
pkg: miniflux.app/v2/internal/reader/readability
BenchmarkExtractContent-8 34 86102474 ns/op
BenchmarkGetWeight-8 10573 103045 ns/op
PASS
ok miniflux.app/v2/internal/reader/readability 5.409s
```
After
```console
$ go test -bench=.
goos: linux
goarch: arm64
pkg: miniflux.app/v2/internal/reader/readability
BenchmarkExtractContent-8 56 83130924 ns/op
BenchmarkGetWeight-8 246541 5241 ns/op
PASS
ok miniflux.app/v2/internal/reader/readability 6.026s
```
This should make ProcessFeedEntries marginally faster, while saving
some memory.
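Below is a minimal sketch of the length-only traversal from the commit above. It uses a hypothetical `node` type that mirrors only the fields of golang.org/x/net/html.Node it relies on (`Type`, `Data`, `FirstChild`, `NextSibling`). `textLength` follows the recursion of goquery's Text method, but it sums byte lengths instead of writing into a buffer. `linkTextLength` is only an assumption about how the deferred linkLength change could avoid walking the parent chain: it passes an "inside a link" flag down the recursion.

```go
package main

import "fmt"

// nodeType and node are hypothetical, minimal stand-ins for
// golang.org/x/net/html.NodeType and html.Node.
type nodeType int

const (
	elementNode nodeType = iota
	textNode
)

type node struct {
	Type        nodeType
	Data        string // tag name for elements, text for text nodes
	FirstChild  *node
	NextSibling *node
}

// textLength walks the tree like goquery's Text method, but only sums the
// byte lengths of text nodes instead of concatenating them into a buffer.
func textLength(n *node) int {
	length := 0
	if n.Type == textNode {
		length += len(n.Data)
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		length += textLength(c)
	}
	return length
}

// linkTextLength (hypothetical) sums the length of text nodes located under
// an <a> element. The flag is carried down the recursion, so no node ever
// has to inspect its ancestors.
func linkTextLength(n *node, insideLink bool) int {
	if n.Type == elementNode && n.Data == "a" {
		insideLink = true
	}
	length := 0
	if n.Type == textNode && insideLink {
		length += len(n.Data)
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		length += linkTextLength(c, insideLink)
	}
	return length
}

func main() {
	// <p>Hello <a>world</a></p>
	link := &node{Type: elementNode, Data: "a", FirstChild: &node{Type: textNode, Data: "world"}}
	p := &node{Type: elementNode, Data: "p", FirstChild: &node{Type: textNode, Data: "Hello ", NextSibling: link}}
	fmt.Println(textLength(p), linkTextLength(p, false)) // 11 5
}
```

Like the original, both functions are recursive. Their depth is bounded by the depth of the document tree.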
- Use an array of strings instead of a regex, as done in ef13756b1a7a7ba30fd34174a5367381fd8b4849
- Extract the `shouldRemove` function from `removeUnlikelyCandidates`, as there
is no reason for it to live there rather than be a proper standalone function.
- Fix a condition that left the goquery selection's `id` attribute unchecked
whenever a `class` attribute was present, regardless of whether `class` was a
candidate for removal.
- Add some comments
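Here is a rough sketch of the shape of this change, under a few assumptions. `shouldRemove` takes the two attribute values as plain strings instead of a goquery selection. The pattern lists are abridged and hypothetical. Each attribute is checked on its own, so a harmless `class` can no longer hide a suspicious `id`. Substring checks over a slice of strings replace the alternation regexes.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical, abridged pattern lists standing in for the former regexes.
var unlikelyCandidates = []string{"banner", "comment", "footer", "sidebar", "sponsor"}
var okMaybeItsACandidate = []string{"article", "body", "column", "main"}

// containsAny reports whether s contains at least one of the needles.
func containsAny(s string, needles []string) bool {
	for _, needle := range needles {
		if strings.Contains(s, needle) {
			return true
		}
	}
	return false
}

// shouldRemove reports whether an element with the given class and id
// attributes looks like an unlikely content candidate. Both attributes are
// examined, even when the class attribute is present and harmless.
func shouldRemove(class, id string) bool {
	for _, attr := range []string{class, id} {
		attr = strings.ToLower(attr) // the former regexes were case-insensitive
		if containsAny(attr, unlikelyCandidates) && !containsAny(attr, okMaybeItsACandidate) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(shouldRemove("sidebar", ""))             // true
	fmt.Println(shouldRemove("content", "comment-list")) // true: id is still checked
	fmt.Println(shouldRemove("sidebar-main", ""))        // false: "main" rescues it
}
```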
This has close to no impact for now, as our slog.Debug/Info/... calls leak
their parameters to the heap. Still, using proper typing instead of Any lets us
skip some reflection-based computation, which makes things marginally faster
and removes the corresponding heap leak.
- Use proper variable names for the parts of `key=value` strings
- Explicitly assign false to the `match` boolean
- Use an explicit `len(parts) == 2` assertion to help the compiler remove
`isSliceInBounds` calls.
- Refactor identical code into a containsRegexPattern function.
- Early exit when parsing the first date fails when using the `Between`
operator, instead of trying to parse the second one.
As youtubeVideoID is assigned getVideoIDFromYouTubeURL(entry.URL), there is no
need to call the latter again when we can simply use youtubeVideoID instead.
There is no need to use SHA256 everywhere, especially on small inputs where we
don't care about its cryptographic properties. We're using FNV because it's one
of the fastest hashes in Go's standard library. We picked its "a" variant
because it has slightly better avalanche characteristics, which matter for
small inputs.
This commit has the side effect of invalidating all favicons saved in the
database. This is desirable: it lets them benefit from the resize process
implemented in 777d0dd2, which didn't apply retroactively.
We're also making use of hex.EncodeToString instead of fmt.Sprintf, as it's
marginally faster.
Note that we can't change the usage of sha256 for feed.Hash as it's used to
deduplicate entries in the database.
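The sketch below compares the two approaches using only hash/fnv, crypto/sha256 and encoding/hex. The function names are hypothetical. It also assumes the 64-bit FNV-1a variant (`fnv.New64a`), since the commit does not say which width is used.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash/fnv"
)

// hashFNV hashes s with 64-bit FNV-1a and hex-encodes the big-endian digest.
func hashFNV(s string) string {
	h := fnv.New64a()
	h.Write([]byte(s)) // Write on a hash.Hash never returns an error
	return hex.EncodeToString(h.Sum(nil))
}

// hashSHA256 shows the previous style: a cryptographic digest formatted
// with fmt.Sprintf, which is slower than hex.EncodeToString.
func hashSHA256(s string) string {
	return fmt.Sprintf("%x", sha256.Sum256([]byte(s)))
}

func main() {
	fmt.Println(hashFNV("a"))    // af63dc4c8601ec8c
	fmt.Println(hashSHA256("a")) // 64 hex characters
}
```

The FNV digest is 16 hex characters instead of 64. That shorter output also changes the stored values, which is why existing favicons are invalidated.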
- Factorize some conditions
- Remove useless `default` cases and move the return to the end of the functions
- Use strings.CutPrefix instead of strings.HasPrefix + strings.TrimPrefix
- Use switch-case constructs instead of slices.Contains. This reduces the
complexity of the functions, allows them to be inlined and helps the compiler
optimize them, since it's bad at interprocedural optimizations.
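Here is a small illustration of the patterns listed above. The helpers `isMediaTag` and `normalizeURL` and their tag lists are hypothetical. The point is the form: a `switch` with no `default` and a single return at the end, in place of `slices.Contains([]string{...}, tag)`. Another part of the example is `strings.CutPrefix`, which tests and strips the prefix in one call instead of `strings.HasPrefix` followed by `strings.TrimPrefix`.

```go
package main

import (
	"fmt"
	"strings"
)

// isMediaTag uses a switch instead of slices.Contains. The function stays
// small and call-free, which likely makes it an inlining candidate.
func isMediaTag(tag string) bool {
	switch tag {
	case "audio", "img", "picture", "video":
		return true
	}
	return false
}

// normalizeURL turns a protocol-relative URL into an https one.
// strings.CutPrefix checks and removes the prefix in a single call.
func normalizeURL(rawURL string) string {
	if rest, found := strings.CutPrefix(rawURL, "//"); found {
		return "https://" + rest
	}
	return rawURL
}

func main() {
	fmt.Println(isMediaTag("img"), isMediaTag("div"))  // true false
	fmt.Println(normalizeURL("//example.org/a.png")) // https://example.org/a.png
}
```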
The previous regex used the `[ABC..D]*[ABC]` pattern, which caused a lot of
backtracking. The new regex stops matching at the first space or at the end of
the text, then removes the trailing `.` if one is present.
The backtracking was taking around 50% of the CPU time spent in atom.Parse.
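The commit doesn't show the actual expression. The sketch below therefore assumes, purely for illustration, that the regex extracts URLs from text; `urlPattern` and `findURLs` are hypothetical. The pattern greedily consumes non-space characters, so it never has to reconsider where to stop the way a `[set]*[subset]` ending forces. The trailing period is stripped afterwards with `strings.TrimSuffix` instead of being excluded by a final character class.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// urlPattern (hypothetical) matches up to the first whitespace or the end of
// the text. There is no trailing character class to satisfy.
var urlPattern = regexp.MustCompile(`https?://\S+`)

// findURLs returns every match, with one trailing '.' removed when present.
func findURLs(text string) []string {
	matches := urlPattern.FindAllString(text, -1)
	for i, m := range matches {
		matches[i] = strings.TrimSuffix(m, ".")
	}
	return matches
}

func main() {
	fmt.Println(findURLs("See https://miniflux.app. Or http://example.org/a.b here"))
	// [https://miniflux.app http://example.org/a.b]
}
```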
Previously, url.Parse(baseUrl) was called on every self-closing tag and on
most opening tags, accounting for around 15% of the CPU time spent in
processor.ProcessFeedEntries.
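The commit message doesn't spell out the fix. A natural approach, assumed in the sketch below, is to parse the base URL once and reuse the resulting `*url.URL` for every reference found in the document. `resolveAll` is a hypothetical helper. `url.Parse` and `(*url.URL).ResolveReference` are standard net/url calls.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveAll parses baseURL a single time, then resolves every reference
// against it. Only the references themselves are parsed in the loop.
func resolveAll(baseURL string, refs []string) ([]string, error) {
	base, err := url.Parse(baseURL)
	if err != nil {
		return nil, err
	}
	resolved := make([]string, 0, len(refs))
	for _, ref := range refs {
		u, err := url.Parse(ref)
		if err != nil {
			return nil, err
		}
		resolved = append(resolved, base.ResolveReference(u).String())
	}
	return resolved, nil
}

func main() {
	resolved, err := resolveAll("https://example.org/blog/post", []string{"image.png", "/feed.xml"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(resolved) // [https://example.org/blog/image.png https://example.org/feed.xml]
}
```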