- Don't define the queries before possible early returns
- Check for the presence of the href attribute in the queries, instead of later
on iterating on the selection
- Add two edge-cases to the tests
- Use EachIter instead of Each, if only to avoid the lambda
There is no need to try to match the regexp over the whole input, having it
anchored is enough. If we feel extra-lenient, we might strip spaces in
front/tail, but I don't think it's necessary.
This commit also invert a condition to reduce the level of nested indentation,
and make a condition stricter.
Instead of doing some ciphers manipulation before instantiating the http.Transport
and then assigning them, instantiate http.Transport, and then in an if do the
manipulation. This makes the code a bit clearer, which is always nice when it
comes to cryptographic shenanigans.
- Surface the faulty line number when trying to parse it
- Use strings.Cut instead of strings.SplitN
- Use strings.TrimSuffix instead of an if
- Simplify parseStringList and make its code more compact
There is no need to allocate half a kilobyte of memory only check that a buffer
starts with a bunch of spaces and a `{`, 32b should be more than enough. Also,
no need to allocate it on the heap, having it on the stack works perfectly.
- Reuse isAudio and isVideo instead of re-implementing them
- In IsImage, return early is the mimetype is enough, instead of systematically
lowercasing the url as well.
- Extract the common parts of the two ProxifyAbsoluteURL implementations into a
private function, make the code smaller and clearer.
- Fix a logic error where `A && B || C` was used instead of `A && (B || C)
Instead of using an ugly (and incomplete) regex, let's use a simple for-loop to
parse ISO8601 dates, and make it explicit that we're only supporting a subset
of the spec, as we only care about youtube video durations.
There is no need to use a mutex to check the length of the proxies list,
as it's read-only during the whole lifetime of a ProxyRotator structure.
Moreover, it's a bit clearer to explicitly wrap the 2 lines of mutex-needing
operations between a Lock/Unlock instead of using a defer.
Instead of having a switch-case returning a function to be executed, it's
simpler/faster to have a single function containing a switch-case. It also
allows to group languages with identical plural form in a single
implementation, and remove the "default" guard value, as switch-case already
have a `default:` case.
- Grow the underlying buffer of SanitizeHTML's strings.Builder to 3/4 of the
raw HTML from the start, to reduce the amount of iterative allocations. This
number is a complete guesstimation, but it sounds reasonable to me.
- Add a `absoluteURLParsedBase` function to avoid parsing baseURL over and over.
When we're only interested in the length of contained Text, there is no need to
materialize it fully to then call len() on the result: we can simply iterate
over the text element and sum their length instead.
- There is no need to materialize all the content of a given Node when we can
simply compute its length directly, saving a lot of memory, on the order of
several megabytes on my instance, with peaks at a couple of dozen.
- One might object to the usage of a recursive construct, but this is a direct
port of goquery's Text method, so this change doesn't make anything worse.
- The computation of linkLength can be similarly computed, but this can go in
another commit, as it's a bit trickier, since we need to get the length of
every Node that has a `a` Node as parent, without iterating on the whole
parent chain every time.
There is no need to send the whole title and content to have them truncated on
postgresql's side when we can do this client-side. This should save some
memory on the database's side, as well as some bandwidth
if it's located on another server. And it makes the SQL queries a tad more
readable as well.
Before
```console
$ go test -bench=.
goos: linux
goarch: arm64
pkg: miniflux.app/v2/internal/reader/readability
BenchmarkExtractContent-8 34 86102474 ns/op
BenchmarkGetWeight-8 10573 103045 ns/op
PASS
ok miniflux.app/v2/internal/reader/readability 5.409s
```
After
```console
$ go test -bench=.
goos: linux
goarch: arm64
pkg: miniflux.app/v2/internal/reader/readability
BenchmarkExtractContent-8 56 83130924 ns/op
BenchmarkGetWeight-8 246541 5241 ns/op
PASS
ok miniflux.app/v2/internal/reader/readability 6.026s
```
This should make ProcessFeedEntries marginally faster, while saving
some memory.