mirror of https://github.com/miniflux/v2.git synced 2025-06-27 16:36:00 +00:00
14 commits

Author SHA1 Message Date
Frédéric Guillot
6eedf4111f fix(scraper): avoid encoding issue if charset meta tag is after 1024 bytes 2025-02-15 17:05:14 -08:00
Frédéric Guillot
c3c42b0c37 fix(scraper): update TechCrunch scraper rule 2025-01-23 19:29:32 -08:00
Julien Voisin
1b0b8b9c42
refactor: use a better construct than doc.Find(…).First()
As mentioned in goquery's documentation (https://pkg.go.dev/github.com/PuerkitoBio/goquery#Single):

> By default, Selection.Find and other functions that accept a selector string
> to select nodes will use all matches corresponding to that selector. By using
> the Matcher returned by Single, at most the first match will be selected.
>
> The one using Single is optimized to be potentially much faster on large documents.
2024-12-11 19:40:55 -08:00
3zero2
c6c71c58b8
feat: add predefined scraper rules for arstechnica.com 2024-11-14 17:47:31 -08:00
Frédéric Guillot
29387f2d60 feat: implement base element handling in content scraper 2024-07-25 20:36:56 -07:00
x
839fc3843a Add pitchfork.com scraping rule 2024-06-10 21:08:59 -07:00
jvoisin
fc4bdf3ab0 Inline a one-liner function
No need to expose a symbol for this.
2024-03-20 17:21:30 -07:00
jvoisin
c2d2f31438 Improve internal/reader/scraper/scraper.go
- make findContentUsingCustomRules more idiomatic,
  since in Go a function returning an error might
  return garbage in its other return values. Moreover, ignoring
  errors is bad practice.
- getPredefinedScraperRules now runs in constant time,
  instead of iterating over a list of around 50 items.
2024-02-26 18:00:23 -08:00
Frédéric Guillot
d0f99cee1a Regression: ensure all HTML documents are encoded in UTF-8
Fixes #2196
2023-12-01 16:52:03 -08:00
Frédéric Guillot
14e25ab9fe Refactor HTTP Client and LocalizedError packages 2023-10-22 13:09:30 -07:00
Frédéric Guillot
c0e954f19d Implement structured logging using log/slog package 2023-09-24 22:37:33 -07:00
jgbresson
691f56fde9 Update rules.go for webtoons.com
Include author text
2023-08-18 16:53:14 -07:00
Frédéric Guillot
e5d9f2f5a0 Rename internal url package to avoid overlap with net/url 2023-08-13 19:57:04 -07:00
Frédéric Guillot
168a870c02 Move internal packages to an internal folder
For reference: https://go.dev/doc/go1.4#internalpackages
2023-08-10 20:29:34 -07:00