mirror of https://github.com/miniflux/v2.git synced 2025-08-11 17:51:01 +00:00
Commit graph

315 commits

Author SHA1 Message Date
jvoisin
485baf9654 refactor(misc): fix a handful of TODO 2025-08-09 15:22:02 -07:00
Julien Voisin
06cbf1b3b3
fix(icon): update incorrect log messages 2025-08-09 15:20:33 -07:00
Tim Douglas
a4f672b589 fix: URL detection incorrectly capturing newlines in media descriptions 2025-08-08 10:42:09 -07:00
Julien Voisin
566670cc06
refactor: unexport symbols 2025-08-07 17:27:04 -07:00
jvoisin
5affd78f4f refactor(reader): move the fetcher outside of a loop
There is no need to rebuild the fetcher for every item; creating it once is
enough.
2025-08-05 17:39:23 -07:00
Frédéric Guillot
0f3c04a98a test(rewrite): fix flaky test case by sorting query string keys 2025-08-05 17:31:43 -07:00
Julien Voisin
a43d150a27
refactor(parser): centralize seek logic and provide a hint for the compiler to eliminate a useless bound check
- Move the seeking inside of DetectFeedFormat instead of having it everywhere
  in ParseFeed
- Provide a hint for the compiler to eliminate a useless bound check in
  DetectJSONFormat, otherwise it'll check that buffer[i] is valid on every
  iteration of the loop. This shouldn't make a big difference, but oh well.
2025-08-03 12:53:10 -07:00
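The bound-check hint mentioned in a43d150a27 can be illustrated with a minimal sketch. This is not the actual `DetectJSONFormat` code: `startsLikeJSON` and its parameters are hypothetical names, and the `_ = buffer[n-1]` line is the common Go idiom for such hints, assumed here rather than taken from the commit.

```go
package main

import "fmt"

// startsLikeJSON reports whether the first non-whitespace byte among the
// first n bytes of buffer is '{'.
// Hypothetical stand-in for illustration, not Miniflux's DetectJSONFormat.
func startsLikeJSON(buffer []byte, n int) bool {
	if n <= 0 || n > len(buffer) {
		return false
	}
	// Bounds-check hint: touching the last index once lets the compiler
	// prove that buffer[i] is in range for every i < n in the loop below.
	_ = buffer[n-1]
	for i := 0; i < n; i++ {
		switch buffer[i] {
		case ' ', '\t', '\n', '\r':
			continue // skip leading whitespace
		case '{':
			return true
		default:
			return false
		}
	}
	return false
}

func main() {
	data := []byte("  \n{\"title\": \"example\"}")
	fmt.Println(startsLikeJSON(data, len(data)))
}
```

Whether the check is really eliminated depends on the compiler version; building with `-gcflags="-d=ssa/check_bce/debug=1"` lists the bounds checks that remain.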
Julien Voisin
cce0e7bd29
refactor(rewrite): replaced regex-based YouTube and Invidious video ID extraction with URL parsing 2025-08-01 17:44:12 -07:00
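As a rough sketch of what URL-based extraction (as in cce0e7bd29) can look like, the following uses only `net/url`. The function name `youtubeVideoID`, the accepted host names, and the `/watch` path check are assumptions for illustration; the actual Miniflux code may accept other URL shapes and also handles Invidious.

```go
package main

import (
	"fmt"
	"net/url"
)

// youtubeVideoID returns the "v" query parameter of a YouTube watch URL,
// or an empty string when the URL does not look like one.
// Hypothetical helper; host list and path check are assumptions.
func youtubeVideoID(rawURL string) string {
	u, err := url.Parse(rawURL)
	if err != nil {
		return ""
	}
	switch u.Hostname() {
	case "youtube.com", "www.youtube.com":
	default:
		return ""
	}
	if u.Path != "/watch" {
		return ""
	}
	return u.Query().Get("v")
}

func main() {
	fmt.Println(youtubeVideoID("https://www.youtube.com/watch?v=abc123&t=10"))
}
```

Compared with a regex, the parser deals with parameter order and extra query parameters on its own.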
Frédéric Guillot
0c3e251884 refactor(filter): parse and merge filters only once per refresh 2025-07-30 21:34:03 -07:00
jvoisin
9eea9873b5 feat(rewrite): add a rule to remove useless heading images on phoronix 2025-07-30 18:53:04 -07:00
Frédéric Guillot
f3ac4dacf6 test(rewrite): add unit tests for addYoutubeVideoFromId and addInvidiousVideo functions 2025-07-29 21:51:15 -07:00
Frédéric Guillot
66b269e6cd feat(readability): avoid removing elements with content class 2025-07-25 19:59:47 -07:00
Frédéric Guillot
54abd0a736 fix(parser): handle feeds with leading whitespace that exceeds buffer size 2025-07-23 21:06:15 -07:00
jvoisin
a62b97bddd refactor(readability): get rid of getClassWeight
Its naming was confusing, and its code simple enough that it could be inlined.
2025-07-23 19:55:47 -07:00
jvoisin
1de9cf4241 perf(readability): simplify removeUnlikelyCandidates
- Use an iterator instead of generating a whole slice when iterating on the selection.
- Using an iterator allows the use of a for-loop construct instead of a lambda,
  which is a bit clearer.
- Do the filtering in Find()'s selector instead of in the loop. This doesn't
  matter much now that we're using an iterator, but it makes the code a bit
  simpler and more obvious, and likely reduces the number of iterations a bit.
2025-07-23 19:55:47 -07:00
jvoisin
7912b9b8fb perf(readability): avoid materializing text to count commas
There is no need to materialize the whole text content of the selection only to
count its number of commas. As we already have a getLengthOfTextContent
function that is pretty similar, this commit refactors it to make it more
generic, in the form of a map/fold(+).
2025-07-23 19:55:47 -07:00
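The map/fold(+) idea from 7912b9b8fb can be sketched without goquery. The `node` type below is a hypothetical, simplified stand-in for an HTML node tree, and `sumOverText`, `textLength` and `commaCount` are illustrative names, not the functions from the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a simplified stand-in for an HTML node: text is set on text
// nodes, children on element nodes. Hypothetical type for illustration.
type node struct {
	text     string
	children []*node
}

// sumOverText applies f to every text node under n and adds up the results,
// without ever concatenating the text into one string.
func sumOverText(n *node, f func(string) int) int {
	if n == nil {
		return 0
	}
	total := 0
	if n.text != "" {
		total += f(n.text)
	}
	for _, child := range n.children {
		total += sumOverText(child, f)
	}
	return total
}

// textLength counts the bytes of text under n.
func textLength(n *node) int {
	return sumOverText(n, func(s string) int { return len(s) })
}

// commaCount counts the commas in the text under n.
func commaCount(n *node) int {
	return sumOverText(n, func(s string) int { return strings.Count(s, ",") })
}

func main() {
	tree := &node{children: []*node{
		{text: "a, b"},
		{children: []*node{{text: ", c"}, {text: "d"}}},
	}}
	fmt.Println(textLength(tree), commaCount(tree))
}
```

The only thing that varies between the two callers is the per-text-node function, which is what makes a single traversal helper sufficient.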
jvoisin
2d24f5d04e refactor(readability): minor code folding 2025-07-23 19:55:47 -07:00
Frédéric Guillot
f02213a168 refactor(readability): use String explicitly in debug log instead of Any 2025-07-19 10:58:49 -07:00
Frédéric Guillot
d9de9d1852 feat(rss): fallback to enclosure URL when entry URL is missing 2025-07-19 10:46:43 -07:00
Frédéric Guillot
dc81725788 fix(filter): remove \r\n in rule parsing 2025-07-16 21:03:53 -07:00
Julien Voisin
86e2ce6d0b
perf(readability): move transformMisusedDivsIntoParagraphs call after removeUnlikelyCandidates 2025-07-13 14:34:14 -07:00
jvoisin
0e9da3a090 refactor(icon): simplify findIconURLsFromHTMLDocument
- Don't define the queries before possible early returns
- Check for the presence of the href attribute in the queries, instead of later
  on iterating on the selection
- Add two edge-cases to the tests
- Use EachIter instead of Each, if only to avoid the lambda
2025-07-10 19:32:29 -07:00
jvoisin
57bd384951 refactor(icon): unexport a bunch of symbols 2025-07-10 19:32:29 -07:00
jvoisin
f455c18c66 perf(rewrite): anchor the rewrite regex
There is no need to try to match the regexp over the whole input, having it
anchored is enough. If we feel extra-lenient, we might strip spaces in
front/tail, but I don't think it's necessary.

This commit also inverts a condition to reduce the level of nested indentation,
and makes a condition stricter.
2025-07-10 19:23:54 -07:00
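To illustrate the effect of anchoring described in f455c18c66, here is a small sketch. The pattern is a made-up rule syntax, not the regex used by Miniflux, and `isRule` and `isRuleUnanchored` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Unanchored: the engine may find a match anywhere in the input.
	unanchoredRule = regexp.MustCompile(`[a-z_]+\("[^"]*"\)`)
	// Anchored: the whole input must be exactly one rule.
	anchoredRule = regexp.MustCompile(`^[a-z_]+\("[^"]*"\)$`)
)

// isRuleUnanchored reports whether a rule appears somewhere in s.
func isRuleUnanchored(s string) bool { return unanchoredRule.MatchString(s) }

// isRule reports whether s as a whole is a rule.
func isRule(s string) bool { return anchoredRule.MatchString(s) }

func main() {
	input := `junk remove("img") junk`
	fmt.Println(isRuleUnanchored(input), isRule(input))
}
```

Besides being stricter, the anchored form lets the matcher give up as soon as the start of the input does not fit, instead of retrying at every offset.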
jvoisin
46adb0ffad refactor(fetcher): simplification of ExecuteRequest
Instead of doing some cipher manipulation before instantiating the http.Transport
and then assigning the result, instantiate http.Transport first, and then do the
manipulation inside an if. This makes the code a bit clearer, which is always nice
when it comes to cryptographic shenanigans.
2025-07-09 19:36:36 -07:00
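The ordering described in 46adb0ffad (build the transport, then adjust ciphers inside a conditional) might look roughly like the sketch below. `newTransport` and its `allowWeakCiphers` flag are hypothetical; the real `ExecuteRequest` has more options and may select ciphers differently.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
)

// newTransport builds the transport first and only then, if requested,
// widens the cipher suite list. Hypothetical helper for illustration.
func newTransport(allowWeakCiphers bool) *http.Transport {
	transport := &http.Transport{
		TLSClientConfig: &tls.Config{},
	}
	if allowWeakCiphers {
		// Assumption: "weak" means Go's default suites plus the ones the
		// standard library itself marks as insecure.
		var ids []uint16
		for _, suite := range tls.CipherSuites() {
			ids = append(ids, suite.ID)
		}
		for _, suite := range tls.InsecureCipherSuites() {
			ids = append(ids, suite.ID)
		}
		transport.TLSClientConfig.CipherSuites = ids
	}
	return transport
}

func main() {
	fmt.Println(len(newTransport(false).TLSClientConfig.CipherSuites))
	fmt.Println(len(newTransport(true).TLSClientConfig.CipherSuites) > 0)
}
```

Keeping the cipher handling in one block next to the field it modifies makes the non-default path easy to audit.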
Frédéric Guillot
2e26f5ca75 test(reader): ensure consistent tags parsing across feed formats 2025-07-07 20:07:35 -07:00
jvoisin
d6d18a2d61 perf(reader): shrink the json detection buffer
There is no need to allocate half a kilobyte of memory only to check that a buffer
starts with a bunch of spaces and a `{`; 32 bytes should be more than enough. Also,
no need to allocate it on the heap, having it on the stack works perfectly.
2025-07-07 19:21:59 -07:00
Frédéric Guillot
e7b98afdbe refactor(subscription): avoid using Sprintf to construct Youtube playlist feed URL 2025-07-07 17:08:47 -07:00
Frédéric Guillot
2cfeefc8d2 test(processor): increase test coverage for parseISO8601Duration 2025-07-07 17:01:10 -07:00
jvoisin
b48e6472f5 refactor(processor): parse ~ISO8601 in a proper way
Instead of using an ugly (and incomplete) regex, let's use a simple for-loop to
parse ISO8601 durations, and make it explicit that we're only supporting a subset
of the spec, as we only care about youtube video durations.
2025-07-07 16:28:58 -07:00
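A loop-based parser in the spirit of b48e6472f5 could be sketched as follows. This is not the `parseISO8601Duration` from Miniflux: `parseVideoDuration` is a hypothetical name, and the supported subset (a `PT` prefix followed by integer hours, minutes and seconds) is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// parseVideoDuration parses a small subset of ISO 8601 durations:
// "PT" followed by integer H, M and S components, e.g. "PT1H2M3S".
// Hypothetical sketch; days, weeks and fractions are not supported.
func parseVideoDuration(s string) (time.Duration, error) {
	rest, ok := strings.CutPrefix(s, "PT")
	if !ok {
		return 0, errors.New("only PT-prefixed durations are supported")
	}
	var total time.Duration
	number, hasDigits := 0, false
	for _, c := range rest {
		switch {
		case c >= '0' && c <= '9':
			number = number*10 + int(c-'0')
			hasDigits = true
		case c == 'H' || c == 'M' || c == 'S':
			if !hasDigits {
				return 0, fmt.Errorf("missing number before %q", c)
			}
			switch c {
			case 'H':
				total += time.Duration(number) * time.Hour
			case 'M':
				total += time.Duration(number) * time.Minute
			case 'S':
				total += time.Duration(number) * time.Second
			}
			number, hasDigits = 0, false
		default:
			return 0, fmt.Errorf("unsupported character %q", c)
		}
	}
	if hasDigits {
		return 0, errors.New("trailing number without a unit")
	}
	return total, nil
}

func main() {
	fmt.Println(parseVideoDuration("PT1H2M3S"))
}
```

Rejecting anything outside the subset with an error makes the limitation explicit instead of silently returning a wrong duration.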
jvoisin
7a394b0bf8 refactor(subscription): replace a regex with strings.CutPrefix 2025-07-07 15:44:45 -07:00
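For a fixed prefix, `strings.CutPrefix` (Go 1.20+) does the job of a capturing regex in a single call, which is the kind of replacement 7a394b0bf8 describes. The `/channel/` prefix and the `channelIDFromPath` helper below are assumptions for illustration, not the code touched by the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// channelIDFromPath extracts the part after "/channel/" from a URL path.
// Hypothetical helper; the prefix is an assumption.
func channelIDFromPath(path string) (string, bool) {
	id, found := strings.CutPrefix(path, "/channel/")
	if !found || id == "" {
		return "", false
	}
	return id, true
}

func main() {
	fmt.Println(channelIDFromPath("/channel/UC123"))
	fmt.Println(channelIDFromPath("/watch"))
}
```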
Julien Voisin
a8b4e88742
perf(sanitizer): improve the performance of the sanitizer (#3497)
- Grow the underlying buffer of SanitizeHTML's strings.Builder to 3/4 of the
  raw HTML from the start, to reduce the amount of iterative allocations. This
  number is a complete guesstimation, but it sounds reasonable to me.
- Add an `absoluteURLParsedBase` function to avoid parsing baseURL over and over.
2025-07-07 15:21:13 -07:00
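Both ideas from a8b4e88742 can be sketched with the standard library alone. `newOutputBuilder` and `resolveAll` are hypothetical names; only the 3/4 ratio and the "parse the base URL once" approach come from the commit message, the rest is assumed.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// newOutputBuilder returns a builder whose buffer is pre-grown to 3/4 of
// the input size, so early writes do not trigger repeated reallocations.
func newOutputBuilder(rawHTML string) *strings.Builder {
	b := new(strings.Builder)
	b.Grow(len(rawHTML) * 3 / 4)
	return b
}

// resolveAll parses baseURL once and resolves every reference against it,
// instead of re-parsing the base for each reference.
func resolveAll(baseURL string, refs []string) ([]string, error) {
	base, err := url.Parse(baseURL)
	if err != nil {
		return nil, err
	}
	resolved := make([]string, 0, len(refs))
	for _, ref := range refs {
		u, err := base.Parse(ref)
		if err != nil {
			return nil, err
		}
		resolved = append(resolved, u.String())
	}
	return resolved, nil
}

func main() {
	fmt.Println(newOutputBuilder("<p>hello</p>").Cap() >= 9)
	fmt.Println(resolveAll("https://example.org/blog/", []string{"a.png", "/b.png"}))
}
```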
jvoisin
69a74c4abf refactor(readability): minor clean up
Remove a now-useless regex and its associated test.
2025-07-02 16:50:49 -07:00
jvoisin
766d4ab834 refactor(readability): make use of getSelectionLength 2025-07-02 16:47:27 -07:00
Frédéric Guillot
cb617ff6e0 test(sanitizer): enhance tests for image width and height attributes 2025-07-01 20:52:45 -07:00
Frédéric Guillot
8c3f280f32 test(readability): add test case for ExtractContent with broken reader 2025-07-01 20:14:52 -07:00
jvoisin
8a98926674 refactor(readability): add a getSelectionLength function
When we're only interested in the length of the contained text, there is no need to
materialize it fully to then call len() on the result: we can simply iterate
over the text elements and sum their lengths instead.
2025-07-01 19:52:53 -07:00
jvoisin
435a950d64 refactor(sanitizer): minor refactorization
Use a proper switch-case instead of a bunch of if.
2025-07-01 19:48:55 -07:00
jvoisin
89c32d518d perf(readability): significantly improve transformMisusedDivsIntoParagraphs 2025-07-01 19:44:58 -07:00
jvoisin
2f7b2e7375 perf(readability): improve getLinkDensity
- There is no need to materialize all the content of a given Node when we can
  simply compute its length directly, saving a lot of memory, on the order of
  several megabytes on my instance, with peaks at a couple of dozen.
- One might object to the usage of a recursive construct, but this is a direct
  port of goquery's Text method, so this change doesn't make anything worse.
- linkLength can be computed in a similar way, but this can go in
  another commit, as it's a bit trickier, since we need to get the length of
  every Node that has an `a` Node as parent, without iterating on the whole
  parent chain every time.
2025-07-01 19:40:47 -07:00
Frédéric Guillot
6eeccae7cd test(readability): increase test coverage 2025-06-30 21:29:07 -07:00
jvoisin
aed99e65c1 perf(readability): improve getClassWeight speed
Before

```console
$ go test -bench=.
goos: linux
goarch: arm64
pkg: miniflux.app/v2/internal/reader/readability
BenchmarkExtractContent-8   	     34	 86102474 ns/op
BenchmarkGetWeight-8        	  10573	    103045 ns/op
PASS
ok  	miniflux.app/v2/internal/reader/readability	5.409s
```

After

```console
$ go test -bench=.
goos: linux
goarch: arm64
pkg: miniflux.app/v2/internal/reader/readability
BenchmarkExtractContent-8   	     56	 83130924 ns/op
BenchmarkGetWeight-8        	 246541	     5241 ns/op
PASS
ok  	miniflux.app/v2/internal/reader/readability	6.026s
```

This should make ProcessFeedEntries marginally faster, while saving
some memory.
2025-06-30 19:28:20 -07:00
jvoisin
d1a3f98df9 perf(fetcher): save 8 bytes in the RequestBuilder struct
before:

```
  // request_builder.go:25 | Size: 64 (Optimal: 56)
  type RequestBuilder struct {
    headers          http.Header                 ■ ■ ■ ■ ■ ■ ■ ■
    clientProxyURL   *url.URL                    ■ ■ ■ ■ ■ ■ ■ ■
    useClientProxy   bool                        ■ □ □ □ □ □ □ □
    clientTimeout    int                         ■ ■ ■ ■ ■ ■ ■ ■
    withoutRedirects bool                        ■
    ignoreTLSErrors  bool                          ■
    disableHTTP2     bool                            ■ □ □ □ □ □
    proxyRotator     *proxyrotator.ProxyRotator  ■ ■ ■ ■ ■ ■ ■ ■
    feedProxyURL     string                      ■ ■ ■ ■ ■ ■ ■ ■
                                                 ■ ■ ■ ■ ■ ■ ■ ■
  }
```

after:

```
  // request_builder.go:25 | Size: 56
  type RequestBuilder struct {
    headers          http.Header                 ■ ■ ■ ■ ■ ■ ■ ■
    clientProxyURL   *url.URL                    ■ ■ ■ ■ ■ ■ ■ ■
    clientTimeout    int                         ■ ■ ■ ■ ■ ■ ■ ■
    useClientProxy   bool                        ■
    withoutRedirects bool                          ■
    ignoreTLSErrors  bool                            ■
    disableHTTP2     bool                              ■ □ □ □ □
    proxyRotator     *proxyrotator.ProxyRotator  ■ ■ ■ ■ ■ ■ ■ ■
    feedProxyURL     string                      ■ ■ ■ ■ ■ ■ ■ ■
                                                 ■ ■ ■ ■ ■ ■ ■ ■
  }
```
2025-06-29 16:10:35 -07:00
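The saving shown in d1a3f98df9 comes from grouping the small fields so they share one padded word. A reduced sketch, with hypothetical structs `loose` and `tight` that only mimic the shape of `RequestBuilder`, can confirm the effect with `unsafe.Sizeof`:

```go
package main

import (
	"fmt"
	"unsafe"
)

// loose scatters its bools between word-sized fields, so each group of
// bools gets padded separately (48 bytes on a typical 64-bit platform).
type loose struct {
	headers      map[string]string
	useProxy     bool
	timeout      int
	noRedirects  bool
	ignoreTLS    bool
	disableHTTP2 bool
	proxyURL     string
}

// tight keeps all bools together so they share a single padded word
// (40 bytes on a typical 64-bit platform).
type tight struct {
	headers      map[string]string
	timeout      int
	useProxy     bool
	noRedirects  bool
	ignoreTLS    bool
	disableHTTP2 bool
	proxyURL     string
}

// looseSize and tightSize expose the struct sizes in bytes.
func looseSize() uintptr { return unsafe.Sizeof(loose{}) }
func tightSize() uintptr { return unsafe.Sizeof(tight{}) }

func main() {
	fmt.Println(looseSize(), tightSize())
}
```

The exact sizes depend on the target architecture; the ordering effect holds wherever `int` and pointers are wider than `bool`.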
jvoisin
4e1f836266 refactor(readability): simplify getArticle a bit
- Use a proper division instead of multiplying by a float.
- Extract a condition in the parent scope
- Use an else-if construct instead of a simple if
2025-06-29 16:06:34 -07:00
Frédéric Guillot
a68de4ee6a test(readability): add tests for getArticle function 2025-06-29 16:03:17 -07:00
jvoisin
c064891314 perf(readability): Simplify removeUnlikelyCandidates
- Use an array of strings instead of a regex, like done in ef13756b1a7a7ba30fd34174a5367381fd8b4849
- Extract the `shouldRemove` function from `removeUnlikelyCandidates`, as there
  is no reason to have it there instead of being a proper standalone function.
- Improve a condition, where the goquery selection would have its `id`
  attribute left unchecked if a `class` one was present, regardless of whether
  `class` was a candidate for removal or not.
- Add some comments
2025-06-29 15:31:01 -07:00
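The first and third points of c064891314 can be sketched together: a fixed list of substrings instead of a regex, and a check that looks at both `class` and `id`. The keyword list and the `containsUnlikelyKeyword` helper are assumptions for illustration; the real list in Miniflux is different, and the real `shouldRemove` may take other arguments.

```go
package main

import (
	"fmt"
	"strings"
)

// unlikelyKeywords is a made-up sample list, not the one used by Miniflux.
var unlikelyKeywords = [...]string{"banner", "comment", "footer", "sidebar", "sponsor"}

// containsUnlikelyKeyword reports whether value contains any keyword,
// ignoring case.
func containsUnlikelyKeyword(value string) bool {
	value = strings.ToLower(value)
	for _, keyword := range unlikelyKeywords {
		if strings.Contains(value, keyword) {
			return true
		}
	}
	return false
}

// shouldRemove checks both attributes, so an innocuous class no longer
// prevents the id from being inspected.
func shouldRemove(class, id string) bool {
	return containsUnlikelyKeyword(class) || containsUnlikelyKeyword(id)
}

func main() {
	fmt.Println(shouldRemove("article-body", "sidebar-left"))
	fmt.Println(shouldRemove("article-body", "main"))
}
```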
Frédéric Guillot
5129f53d58 test(readability): add tests for removeUnlikelyCandidates function 2025-06-29 15:23:56 -07:00
Frédéric Guillot
e60f0fd142 test(readability): add tests for getClassWeight function 2025-06-29 13:24:06 -07:00
Julien Voisin
2b26a345cd
perf(processor): minify content even further
There is no need to keep comments (conditionals or not, as IE isn't a thing
anymore), nor default attribute values.
2025-06-29 12:55:34 -07:00
Frédéric Guillot
3de31a1a4d test(processor): add more unit tests for minifyContent function 2025-06-29 12:53:23 -07:00