From 8ffef6d40951505bfd27e889d15d7adafb9bd22b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Nicolas=20L=C5=93uillet?=
Date: Wed, 30 Jul 2014 05:13:53 -0700
Subject: [PATCH] Created Creating a config file for well parsing a website (markdown)

---
 ...-config-file-for-well-parsing-a-website.md | 20 +++++++++++++++++++
 1 file changed, 20 insertions(+)
 create mode 100644 Creating-a-config-file-for-well-parsing-a-website.md

diff --git a/Creating-a-config-file-for-well-parsing-a-website.md b/Creating-a-config-file-for-well-parsing-a-website.md
new file mode 100644
index 0000000..c142566
--- /dev/null
+++ b/Creating-a-config-file-for-well-parsing-a-website.md
@@ -0,0 +1,20 @@
+If wallabag cannot fetch an article correctly, you can create a site config file for the website that causes trouble.
+
+Here is an example:
+
+For bfmtv.com, you must have a specific file. Create a `bfmtv.com.txt` file in `/inc/3rdparty/site_config/custom` with this content:
+
+```
+title: //title
+body: //h2 | //span[@class='masque'] | //article[@class='corps_article_right']
+prune: no
+tidy: no
+
+test_url: http://www.bfmtv.com/societe/cigarette-electronique-dangers-588622.html
+```
+
+The `title` and `body` parameters use [XPath](http://en.wikipedia.org/wiki/XPath) syntax.
+
+You can also try the [Visual content block selector](http://siteconfig.fivefilters.org/).
+
+You can find the files already created for specific websites here: https://github.com/wallabag/wallabag/tree/master/inc/3rdparty/site_config/standard
\ No newline at end of file
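
To illustrate what the `title` and `body` XPath expressions in the example config select, here is a minimal Python sketch. It is not how wallabag evaluates site configs. The sample page is hypothetical (it is not the real bfmtv.com markup), and `select_text` is an illustrative helper, not part of wallabag. Python's standard `xml.etree.ElementTree` supports only a subset of XPath, so the sketch assumes the `|` union can be emulated by evaluating each alternative separately. That returns matches grouped by alternative rather than in document order, and it would not handle a `|` inside a predicate.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page; NOT the real bfmtv.com markup.
SAMPLE_PAGE = """
<html>
  <head><title>Sample article</title></head>
  <body>
    <h2>Lead paragraph</h2>
    <span class="masque">Hidden summary</span>
    <article class="corps_article_right">Main text</article>
    <div class="ads">Advertisement</div>
  </body>
</html>
""".strip()


def select_text(root, expression):
    """Return the text of every node matched by a '|'-separated expression.

    Illustrative helper (an assumption, not wallabag code): the union
    operator is emulated by running each alternative on its own.
    """
    results = []
    for part in expression.split("|"):
        # ElementTree requires relative paths, so '//x' becomes './/x'.
        path = "." + part.strip()
        for node in root.findall(path):
            results.append((node.text or "").strip())
    return results


root = ET.fromstring(SAMPLE_PAGE)

# The same expressions as in the example site config.
title = select_text(root, "//title")
body = select_text(
    root,
    "//h2 | //span[@class='masque'] | //article[@class='corps_article_right']",
)

print(title)
print(body)
```

The `div` with class `ads` is not matched by any alternative in the `body` expression, so its text is left out of the extracted body. This is the effect a site config aims for on a real page.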