Mirror of https://github.com/miniflux/v2.git (synced 2025-08-26 18:21:01 +00:00)
fix(scraper): avoid encoding issue if charset meta tag is after 1024 bytes
parent af1f966250
commit 6eedf4111f
12 changed files with 352 additions and 10 deletions
internal/reader/encoding/testdata/utf8-meta-after-1024.html (+48 lines, vendored, Normal file)
@@ -0,0 +1,48 @@
<!DOCTYPE html>
<html>
<!---

This text is greater than 1024 bytes which are used by the charset.NewReader to determine the encoding of the file.

This comment is used to pad the file to 1024 bytes.

The <meta> tag must be after 1024 bytes to ensure that the encoding is detected correctly.

---

More text to pad the file to 1024 bytes.
More text to pad the file to 1024 bytes.
More text to pad the file to 1024 bytes.
More text to pad the file to 1024 bytes.
More text to pad the file to 1024 bytes.
More text to pad the file to 1024 bytes.
More text to pad the file to 1024 bytes.
More text to pad the file to 1024 bytes.
More text to pad the file to 1024 bytes.
More text to pad the file to 1024 bytes.
More text to pad the file to 1024 bytes.
More text to pad the file to 1024 bytes.
More text to pad the file to 1024 bytes.
More text to pad the file to 1024 bytes.
More text to pad the file to 1024 bytes.
More text to pad the file to 1024 bytes.
More text to pad the file to 1024 bytes.
More text to pad the file to 1024 bytes.
More text to pad the file to 1024 bytes.
More text to pad the file to 1024 bytes.
More text to pad the file to 1024 bytes.
More text to pad the file to 1024 bytes.
More text to pad the file to 1024 bytes.
More text to pad the file to 1024 bytes.
More text to pad the file to 1024 bytes.
More text to pad the file to 1024 bytes.

-->
<head>
<meta charset="utf-8">
<title>Frédéric</title>
</head>
<body>
<p>Café</p>
</body>
</html>