miniflux-v2

mirror of https://github.com/miniflux/v2.git synced 2025-06-27 16:36:00 +00:00

Author	SHA1	Message	Date
Frédéric Guillot	6d58052504	fix(readability): do not remove elements within code blocks. For example, `<span class="hljs-comment"># exit 1</span>` matches the `unlikelyCandidatesRegexp` because it contains the `comment` string.	2025-06-19 21:03:53 -07:00
Frédéric Guillot	db49e41acf	refactor(processor): move FilterEntryMaxAgeDays filter to filter package	2025-06-19 17:56:45 -07:00
Frédéric Guillot	e6b814199b	feat(filter): add `EntryDate=max-age:duration` filter. Example: `EntryDate=max-age:30d` or `EntryDate=max-age:1h`. Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h", "d".	2025-06-19 17:25:19 -07:00
Frédéric Guillot	9c05c3c493	feat(filter): merge user and feed entry filter rules	2025-06-19 16:24:57 -07:00
Frédéric Guillot	2a9d91c783	feat: add entry filters at the feed level	2025-06-19 15:15:16 -07:00
Frédéric Guillot	cb59944d6b	refactor(processor): move `RewriteEntryURL` function to `rewrite` package	2025-06-19 13:22:29 -07:00
Frédéric Guillot	c12476c1a9	refactor(filter): avoid code duplication between IsBlockedEntry and IsAllowedEntry functions	2025-06-19 12:55:00 -07:00
Frédéric Guillot	bc6ab44ff2	fix(filter): skip invalid rules instead of exiting the loop	2025-06-19 12:36:35 -07:00
Frédéric Guillot	6282ac1f38	refactor(processor): move filters to a `filter` package	2025-06-19 12:06:30 -07:00
jvoisin	96c0ef4efd	refactor(processor): massive refactoring of filters.go - Use proper variable names for `key=value` strings parts - Explicitly assign false to the `match` boolean - Use an explicit `len(parts) == 2` assertion to help the compiler remove `isSliceInBounds` calls. - Refactor identical code into a containsRegexPattern function. - Early exit when parsing the first date fails when using the `Between` operator, instead of trying to parse the second one.	2025-06-19 11:43:47 -07:00
jvoisin	b139ac4a2c	refactor(youtube): Remove a regex and make use of fetchWatchTime	2025-06-19 11:43:47 -07:00
jvoisin	c818d5bbb8	refactor(youtube): initialize two maps to the proper length	2025-06-19 11:43:47 -07:00
jvoisin	e366710529	refactor(processor): remove a useless type declaration	2025-06-19 11:43:47 -07:00
jvoisin	5cff4d7117	refactor(processor): remove a duplicate function call. As youtubeVideoID is assigned to getVideoIDFromYouTubeURL(entry.URL), there is no need to call the latter again when we can simply use youtubeVideoID instead.	2025-06-19 11:43:47 -07:00
jvoisin	f31a784eaa	refactor(processor): refactor common code into a fetchWatchTime function Both nebula and odysee were using the same function to parse time.	2025-06-19 11:43:47 -07:00
jvoisin	7edfcc3cf7	refactor(processor): remove a useless type declaration	2025-06-19 11:43:47 -07:00
jvoisin	fe4b00b9f8	refactor(processor): extract some functions into a utils.go file	2025-06-19 11:43:47 -07:00
jvoisin	46b159ac58	refactor(processor): simplify bilibili processing - Use strings.Contains instead of a regex - Use strings concatenation instead of a call to fmt.Sprintf - Use `any` instead of `interface{}`	2025-06-19 11:43:47 -07:00
jvoisin	86c58e11f6	perf(reader): use a non-cryptographic hash when possible. There is no need to use SHA256 everywhere, especially on small inputs where we don't care about its cryptographic properties. We're using FNV as it's the fastest available hash in Go's standard library, and we're picking its "a" variant (FNV-1a) as it has slightly better avalanche characteristics, which are relevant for small inputs. This commit has the side-effect of invalidating all favicons saved in the database, which is desirable to benefit from the resize process implemented in `777d0dd2`, as it didn't apply retroactively. We're also making use of hex.EncodeToString instead of fmt.Sprintf, as it's marginally faster. Note that we can't change the usage of sha256 for feed.Hash as it's used to deduplicate entries in the database.	2025-06-18 20:28:23 -07:00
jvoisin	43546976d2	refactor(tests): use b.Loop() instead of for range b.N See https://tip.golang.org/doc/go1.24#new-benchmark-function	2025-06-18 20:12:55 -07:00
Frédéric Guillot	6af4d69c39	test(sanitizer): add test case to cover Vimeo iframe rewrite without query string	2025-06-17 17:55:39 -07:00
Frédéric Guillot	27015a5e34	test(sanitizer): add unit test for 0x0 pixel tracker	2025-06-17 17:42:55 -07:00
jvoisin	cdb57b3843	perf(sanitizer): minor simplifications of the sanitizer - Factor out some conditions - Remove useless `default` cases and move the return to the end of the functions - Use strings.CutPrefix instead of strings.HasPrefix + strings.TrimPrefix - Use switch-case constructs instead of slices.Contains, as this reduces the complexity of the functions and allows them to be inlined, which also helps the compiler optimize them since it is weak at interprocedural optimizations.	2025-06-17 17:42:45 -07:00
jvoisin	152ef578d2	feat(sanitizer): consider images of size 0x0 as pixel trackers	2025-06-17 17:32:00 -07:00
jvoisin	72486b9bd1	refactor(processor): minor simplification of a loop This makes the code a tad clearer.	2025-06-17 17:30:13 -07:00
jvoisin	81df0b2a16	perf(rewrite): make getPredefinedRewriteRules O(1)	2025-06-17 17:27:36 -07:00
jvoisin	b296f21e98	refactor(internal): add a urllib.DomainWithoutWWW function	2025-06-17 17:27:36 -07:00
jvoisin	af15032145	perf(fetcher): pre-allocate the cipherSuites	2025-06-17 16:53:00 -07:00
jvoisin	8660f5e3c7	perf(media): minor regex simplification The previous regex was using the [ABC..D]*[ABC] pattern, resulting in a lot of backtracking. The new regex is stopping the matching at the first space or end of text (and removes the trailing `.` should one be present). The backtracking was taking around 50% of the CPU time spent in atom.Parse	2025-06-17 16:49:07 -07:00
Frédéric Guillot	da4ab4263c	feat(rewrite): add `parkablogs.com` to the referer override list	2025-06-16 20:28:11 -07:00
jvoisin	237672a62c	perf(sanitizer): use a switch-case instead of a map. This removes a heap allocation, and should be way faster. It also makes the code shorter/simpler.	2025-06-16 14:54:48 -07:00
jvoisin	e9d4a130fd	refactor(sanitizer): remove two useless `www.` prefixes. No need to have those prefixes, as the check is for substrings, so removing them will increase the number of matches.	2025-06-16 14:53:15 -07:00
Frédéric Guillot	b95c9023ee	refactor(sanitizer): make `isValidAttribute()` check O(1)	2025-06-13 21:44:25 -07:00
Frédéric Guillot	3538c4271b	refactor(sanitizer): use global variables to avoid recreating slices on every call	2025-06-13 21:34:07 -07:00
Frédéric Guillot	ac44507af2	refactor(sanitizer): use a map for iframe allow list	2025-06-13 21:05:23 -07:00
jvoisin	44c48d109f	perf(sanitizer): extract a call to url.Parse and make intensive use of it Previously, url.Parse(baseUrl) was called on every self-closing tags, and on most opening tags, accounting for around 15% of the CPU time spent in processor.ProcessFeedEntries	2025-06-13 17:05:17 -07:00
Frédéric Guillot	40727704c2	feat(rewrite): add support for YouTube Shorts video URL pattern	2025-06-12 21:02:46 -07:00
jvoisin	8a014c6abc	perf(readability): minor regex improvement - Improve the check for tags by matching only if its name is followed either by a space, a slash or a closing angle - Use an anonymous group	2025-06-12 19:13:58 -07:00
jvoisin	60ad19c427	perf(rss): early return when looking for an item's author. The `sanitizer.StripTags` function calls `html.NewTokenizer`, which allocates a 4096-byte buffer on the heap, as well as running a complex state machine to tokenize HTML. There is no need to do all of this for empty strings. This commit also fixes a TrimSpace/StripTags call inversion.	2025-06-11 19:06:15 -07:00
jvoisin	f40c1e7f63	fix(reader): fix a crash introduced by `d59990f1` And add a fuzzer and a testcase as well to validate that nothing breaks.	2025-06-11 19:04:46 -07:00
Frédéric Guillot	a4d16cc5c1	refactor(rewrite): rename `Rewriter` function to `ApplyContentRewriteRules`	2025-06-10 20:28:15 -07:00
jvoisin	7c857bdc72	perf(reader): optimize RemoveTrackingParameters A bit more than 10% of processor.ProcessFeedEntries' CPU time is spent in urlcleaner.RemoveTrackingParameters, specifically calling url.Parse, so let's extract this operation outside of it, and do it once before calling urlcleaner.RemoveTrackingParameters multiple times. Co-authored-by: Frédéric Guillot <f@miniflux.net>	2025-06-10 19:29:25 -07:00
jvoisin	0caadf82f2	perf(rss): optimize BuildFeed a bit. Calls to urllib.AbsoluteURL take a bit less than 10% of the time spent in parser.ParseFeed, completely parsing a URL only to check if it's absolute, and if not, to make it so. Checking if it starts with `https://` or `http://` is usually enough to find if a URL is absolute, and if it doesn't, it's always possible to fall back to urllib.AbsoluteURL. This also comes with the advantage of reducing heap allocations, as most of the time spent in urllib.AbsoluteURL is heap-related (de)allocations.	2025-06-10 19:23:16 -07:00
Frédéric Guillot	cecc18420d	feat(sanitizer): add validation for empty width and height attributes in img tags	2025-06-09 20:38:17 -07:00
Frédéric Guillot	d53fd17e10	feat(sanitizer): validate MathML XML namespace	2025-06-09 20:28:54 -07:00
Frédéric Guillot	21d22d7f0b	feat(sanitizer): add support for fetchpriority and decoding attributes in img tags	2025-06-09 20:12:15 -07:00
jvoisin	d59990f1dd	perf(xml): optimize xml filtering Instead of using bytes.Map which is returning a copy of the provided []byte, use a custom in-place implementation, as the bytes.Map call is taking around 25% of rss.Parse	2025-06-09 13:49:10 -07:00
jvoisin	49085daefe	perf(xml): optimize NewXMLDecoder. io.ReadAll grows the underlying buffer progressively, while io.Copy is able to allocate it in one go, which is significantly faster. io.ReadAll currently accounts for around 10% of the CPU time of rss.Parse.	2025-06-09 13:49:10 -07:00
Frédéric Guillot	8db637cb39	feat(ui): add user setting to control `target="_blank"` on links Rationale: Opening links in the current tab is the default browser behavior. Using `target="_blank"` on external links can lead to accessibility issues and override user preferences. It may also interfere with assistive technologies and expected browser behavior. To maintain backward compatibility, this option is enabled by default (`true`), which adds `target="_blank"` to links.	2025-06-08 21:07:11 -07:00
Frédéric Guillot	8142268799	feat: populate feed description automatically	2025-05-24 21:15:52 -07:00

264 commits
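The readability fix in 6d58052504 can be illustrated with a small Go sketch. In this sketch, the unlikely-candidate check is skipped for any element that sits inside a `<pre>` or `<code>` block. The `node` type, the regex contents and the function names are hypothetical. They do not reproduce miniflux's actual readability code.

```go
package main

import (
	"fmt"
	"regexp"
)

// node is a minimal, hypothetical stand-in for an HTML element.
type node struct {
	tag, class string
	parent     *node
}

// unlikelyCandidates is an illustrative subset, not the real readability regex.
var unlikelyCandidates = regexp.MustCompile(`(?i)banner|comment|footer|sidebar`)

// insideCodeBlock reports whether n has a <pre> or <code> ancestor.
func insideCodeBlock(n *node) bool {
	for p := n.parent; p != nil; p = p.parent {
		if p.tag == "pre" || p.tag == "code" {
			return true
		}
	}
	return false
}

// shouldRemove applies the unlikely-candidate heuristic only outside code blocks.
func shouldRemove(n *node) bool {
	return !insideCodeBlock(n) && unlikelyCandidates.MatchString(n.class)
}

func main() {
	pre := &node{tag: "pre"}
	code := &node{tag: "code", parent: pre}
	span := &node{tag: "span", class: "hljs-comment", parent: code}
	div := &node{tag: "div", class: "comment-list"}
	fmt.Println(shouldRemove(span), shouldRemove(div)) // false true
}
```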
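The change in bc6ab44ff2 stops exiting the loop on an invalid rule and skips that rule instead. Below is a hypothetical Go sketch. It assumes one `EntryTitle=<regex>` rule per line and ignores the other fields the real rule syntax supports.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matchesTitleRule reports whether any valid "EntryTitle=<regex>" rule matches
// title. Malformed lines, other fields (ignored in this sketch) and invalid
// regexes are skipped with `continue`, so one bad rule does not prevent the
// remaining rules from being evaluated.
func matchesTitleRule(rules, title string) bool {
	for _, line := range strings.Split(rules, "\n") {
		field, pattern, found := strings.Cut(strings.TrimSpace(line), "=")
		if !found || field != "EntryTitle" || pattern == "" {
			continue
		}
		re, err := regexp.Compile(pattern)
		if err != nil {
			continue
		}
		if re.MatchString(title) {
			return true
		}
	}
	return false
}

func main() {
	rules := "not-a-rule\nEntryTitle=[unclosed\nEntryTitle=(?i)sponsored"
	fmt.Println(matchesTitleRule(rules, "A Sponsored Post")) // true
}
```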
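The hashing approach described in 86c58e11f6 uses FNV-1a from the standard library, hex-encoded with `hex.EncodeToString`. It looks roughly like the sketch below. `hashKey` is a hypothetical helper name, and the choice of the 64-bit variant is an assumption.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"hash/fnv"
)

// hashKey returns a hex-encoded 64-bit FNV-1a digest of s. It is suitable as
// a cache or lookup key, not for anything security-sensitive.
func hashKey(s string) string {
	h := fnv.New64a()
	h.Write([]byte(s)) // hash.Hash.Write never returns an error
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	fmt.Println(hashKey("https://example.org/favicon.ico"))
}
```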
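Two of the patterns listed in cdb57b3843 are sketched below:

* a `switch` on constant strings in place of `slices.Contains`;
* `strings.CutPrefix` in place of `HasPrefix` + `TrimPrefix`.

Both functions and the scheme list are illustrative, not taken from the miniflux sanitizer.

```go
package main

import (
	"fmt"
	"strings"
)

// isAllowedScheme uses a switch on constant strings instead of calling
// slices.Contains on a package-level slice; the schemes are illustrative.
func isAllowedScheme(scheme string) bool {
	switch scheme {
	case "http", "https", "mailto":
		return true
	}
	return false
}

// upgradeProtocolRelative rewrites "//host/path" to "https://host/path".
// strings.CutPrefix tests for and removes the prefix in a single call.
func upgradeProtocolRelative(src string) string {
	if rest, ok := strings.CutPrefix(src, "//"); ok {
		return "https://" + rest
	}
	return src
}

func main() {
	fmt.Println(isAllowedScheme("javascript"))                      // false
	fmt.Println(upgradeProtocolRelative("//cdn.example.org/a.png")) // https://cdn.example.org/a.png
}
```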
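For 152ef578d2, a pixel-tracker check over the `width` and `height` attributes of an `<img>` might look like the following Go sketch. Only the 0x0 case comes from the commit. Also flagging 1x1 images, and the `isPixelTracker` signature, are assumptions.

```go
package main

import "fmt"

// isPixelTracker reports whether an <img> with the given width and height
// attribute values looks like a tracking pixel. The 0x0 case follows the
// commit; treating 1x1 images the same way is an assumption of this sketch.
func isPixelTracker(width, height string) bool {
	return (width == "1" && height == "1") || (width == "0" && height == "0")
}

func main() {
	fmt.Println(isPixelTracker("0", "0"))   // true
	fmt.Println(isPixelTracker("0", "300")) // false
}
```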
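81df0b2a16 and b296f21e98 together suggest replacing a scan over all predefined domains with a direct map lookup. The lookup would be keyed by the host without its `www.` prefix. Below is a Go sketch under that assumption. The map contents and the function names are placeholders.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// predefinedRules maps a bare domain to its rewrite rules. The entry here is
// a placeholder, not the real list.
var predefinedRules = map[string]string{
	"example.org": "add_image_title",
}

// domainWithoutWWW is a hypothetical stand-in for urllib.DomainWithoutWWW.
func domainWithoutWWW(hostname string) string {
	return strings.TrimPrefix(hostname, "www.")
}

// getRewriteRules does a single map lookup (O(1) on average) instead of
// scanning every predefined domain. Only an exact domain, with or without a
// leading "www.", matches; other subdomains do not.
func getRewriteRules(entryURL string) string {
	u, err := url.Parse(entryURL)
	if err != nil {
		return ""
	}
	return predefinedRules[domainWithoutWWW(u.Hostname())]
}

func main() {
	fmt.Println(getRewriteRules("https://www.example.org/comic/1")) // add_image_title
}
```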
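The fast path from 0caadf82f2 can be sketched as a prefix check. This check skips URL parsing entirely for URLs that are already absolute. `absoluteURL` here is hypothetical. Its fallback uses `url.Parse` and `ResolveReference` as a stand-in for `urllib.AbsoluteURL`, whose exact behavior may differ.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// absoluteURL returns input unchanged when it already starts with "https://"
// or "http://", and only otherwise pays for url.Parse and ResolveReference.
func absoluteURL(baseURL, input string) (string, error) {
	if strings.HasPrefix(input, "https://") || strings.HasPrefix(input, "http://") {
		return input, nil
	}
	base, err := url.Parse(baseURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(input)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	fmt.Println(absoluteURL("https://example.org/feed/", "item/1"))
}
```

Inputs with other casings, such as `HTTP://`, simply miss the fast path and are still resolved correctly by the fallback.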
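An in-place alternative to `bytes.Map`, in the spirit of d59990f1dd, compacts the slice while walking it. The character ranges follow the XML 1.0 `Char` production. The function names are hypothetical, and this is not the exact miniflux implementation.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isInCharacterRange reports whether r is allowed by the XML 1.0 Char production.
func isInCharacterRange(r rune) bool {
	return r == 0x09 || r == 0x0A || r == 0x0D ||
		r >= 0x20 && r <= 0xD7FF ||
		r >= 0xE000 && r <= 0xFFFD ||
		r >= 0x10000 && r <= 0x10FFFF
}

// filterInvalidXMLChars drops disallowed characters by compacting data in
// place, unlike bytes.Map, which allocates and returns a new slice. The
// original bytes of each kept rune are copied rather than re-encoded, so the
// write index can never overtake the read index. Invalid UTF-8 bytes decode
// to U+FFFD, which is in range, so they are kept unchanged.
func filterInvalidXMLChars(data []byte) []byte {
	w := 0
	for i := 0; i < len(data); {
		r, size := utf8.DecodeRune(data[i:])
		if isInCharacterRange(r) {
			copy(data[w:], data[i:i+size])
			w += size
		}
		i += size
	}
	return data[:w]
}

func main() {
	data := []byte("<title>ok\x00</title>")
	fmt.Printf("%q\n", filterInvalidXMLChars(data))
}
```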