There is no need to use SHA256 everywhere, especially on small inputs where we
don't care about its cryptographic properties. We're using FNV as it's the
fastest available hash in Go's standard library, and we're picking its "a"
version as it has slightly better avalanche characteristics, which are
relevant for small inputs.
This commit has the side-effect of invalidating all favicons saved in the
database, which is desirable to benefit from the resize process implemented in
777d0dd2, as it didn't apply retroactively.
We're also making use of hex.EncodeToString instead of fmt.Sprintf, as it's
marginally faster.
Note that we can't change the usage of sha256 for feed.Hash as it's used to
deduplicate entries in the database.
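
As a rough illustration of the change described above, here is a minimal sketch
of hashing a small input with FNV-1a and encoding the digest with
hex.EncodeToString. The helper name `fastHash` and the choice of the 64-bit
variant are assumptions made for this example; the commit message does not say
which FNV size is used.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"hash/fnv"
)

// fastHash is a hypothetical helper: it returns the hex-encoded FNV-1a
// digest of data. The 64-bit variant is an assumption for this sketch.
func fastHash(data []byte) string {
	h := fnv.New64a()
	h.Write(data) // Write on a hash.Hash never returns an error.
	// hex.EncodeToString avoids the format parsing done by fmt.Sprintf("%x", ...).
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	fmt.Println(fastHash([]byte("https://example.org/favicon.ico")))
}
```

Such a hash is only suitable where collisions are harmless and nothing relies
on cryptographic guarantees.
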
The `sanitizer.StripTags` function calls `html.NewTokenizer`, which allocates
a 4096-byte buffer on the heap, as well as running a complex state machine to
tokenize HTML. There is no need to do all of this for empty strings.
This commit also fixes a TrimSpace/StripTags call inversion.
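
The empty-string shortcut can be sketched as follows. To keep the example free
of external dependencies, `slowStripTags` is a simplified stand-in for the
tokenizer-based implementation and is not the actual `sanitizer.StripTags`
code; both function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// slowStripTags is a naive stand-in for a tokenizer-based tag stripper.
// It drops everything between '<' and the next '>'.
func slowStripTags(input string) string {
	var b strings.Builder
	inTag := false
	for _, r := range input {
		switch {
		case r == '<':
			inTag = true
		case r == '>' && inTag:
			inTag = false
		case !inTag:
			b.WriteRune(r)
		}
	}
	return b.String()
}

// stripTags returns early on empty input, so the expensive path
// (and its allocations) is skipped entirely.
func stripTags(input string) string {
	if input == "" {
		return ""
	}
	return slowStripTags(input)
}

func main() {
	fmt.Printf("%q\n", stripTags(""))
	fmt.Printf("%q\n", stripTags("<b>hello</b> world"))
}
```
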
Calls to urllib.AbsoluteURL take a bit less than 10% of the time spent in
parser.ParseFeed, completely parsing a URL only to check if it's absolute, and
if not, to make it so.
Checking if it starts with `https://` or `http://` is usually enough to tell
whether a URL is absolute, and if it doesn't, it's always possible to fall
back to urllib.AbsoluteURL.
This also comes with the advantage of reducing heap allocations, as most of the
time spent in urllib.AbsoluteURL is heap-related (de)allocations.
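
A minimal sketch of the prefix check with a fallback is shown below. The
function `absoluteURL` is a hypothetical stand-in written with net/url; it is
not the urllib.AbsoluteURL implementation, whose exact behavior is not shown in
this message.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// absoluteURL is a hypothetical helper. It returns input unchanged when it
// already looks absolute, and otherwise resolves it against baseURL.
func absoluteURL(baseURL, input string) (string, error) {
	// Fast path: no parsing and no allocation for the common case.
	if strings.HasPrefix(input, "https://") || strings.HasPrefix(input, "http://") {
		return input, nil
	}

	// Slow path: full parsing, only for URLs that are not obviously absolute.
	base, err := url.Parse(baseURL)
	if err != nil {
		return "", fmt.Errorf("invalid base URL: %w", err)
	}
	ref, err := url.Parse(input)
	if err != nil {
		return "", fmt.Errorf("invalid URL: %w", err)
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	for _, u := range []string{"https://example.com/a", "/img/logo.png", "item/1"} {
		abs, err := absoluteURL("https://example.org/feed/", u)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(abs)
	}
}
```

One trade-off of this approach is that the fast path returns the input as-is,
so any validation or normalization done by the full parser is skipped for those
URLs.
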
This commit adds a bunch of checks to prevent reader/rss from adding empty tags
to RSS items, as well as some minor refactors, such as nested conditions and
loop unrolling.
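
The kind of check described above might look like the following sketch. The
helper `cleanTags` is hypothetical and is not taken from reader/rss; it only
illustrates skipping tags that are empty once surrounding whitespace is removed.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanTags is a hypothetical helper: it trims each raw tag and keeps
// only the ones that are not empty afterwards.
func cleanTags(raw []string) []string {
	tags := make([]string, 0, len(raw))
	for _, tag := range raw {
		tag = strings.TrimSpace(tag)
		if tag != "" {
			tags = append(tags, tag)
		}
	}
	return tags
}

func main() {
	fmt.Printf("%q\n", cleanTags([]string{"go", "", "  ", " rss "}))
}
```
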