ZIM file creation from MediaWiki:
I heard that HTTrack has not been developed since 2017. GitHub already has an HTML version of the wiki. In what way could that be improved?
ZIM files are single files, but note that the Firefox reader add-on can only search page titles, not page content.
I made a functional .zim file using zimwriterfs and WhonixBOT’s HTML wiki from GitHub. Zimwriterfs requires a PNG favicon, so I upscaled the Whonix website favicon. The HTML wiki is 144 MB as a zip and 254 MB uncompressed; the ZIM file is 65 MB. I can upload the file if requested.
Edit: in-page TOC links are broken (Firefox doesn’t know where to find the target), but links between different pages work.
How does zimmer fare? Is it easy to set up?
To anyone… How is whonix-wiki-html? Useful? Any limitations?
As far as I understand https://openzim.org/wiki/Build_your_ZIM_file, there are two routes:
website -> HTML dump -> ZIM file
website -> ZIM file
Why not go directly to the ZIM file? Why take the extra step through HTML?
We need a script or at least instructions on how to create the ZIM file.
Would MWoffliner produce better results?
Zimwriterfs is simple to use, and the viewer interface is simple too. It uses a start page and has a search function, but only for page titles. Good for kiosk applications.
There are different viewers/servers available, but I use the Firefox extension. Some features are restricted (e.g. service workers), so viewing active content is not possible in this browser. Some files throw errors when following links between articles; I’m not sure whether the extension or the source HTML is at fault.
I love it! Browsing complete HTML is so much better than markdown. Limitation: I remember one or two hyperlinks that gave “link not found” when they shouldn’t have. That’s a common issue with mirrored websites, though.
No load on Whonix servers and I am comfortable with using the HTML dumps.
Sadly, I don’t know how to script.
Usage: zimwriterfs [mandatory arguments] [optional arguments] HTML_DIRECTORY ZIM_FILE
$ ./zimwriterfs --welcome=Documentation.html --favicon=favicon.png --language=eng --title='The Whonix Wiki' --description='A crash course in anonymity and security on the Internet' --creator='Whonix Project' --publisher=daniel.d whonix-wiki Whonix_Wiki.zim
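For anyone who would rather not retype that command, here is a minimal sketch of a wrapper around the same invocation. The function name build_zim, the ZIMWRITERFS override and the default paths are hypothetical; it only uses the flags from the command above and assumes the welcome page and favicon paths are relative to the HTML directory, which is how zimwriterfs expects them.

```sh
#!/bin/sh
# Hypothetical wrapper around the zimwriterfs command above.
# Assumptions: the HTML dump is in $1 (default: whonix-wiki), it contains
# Documentation.html and favicon.png at its top level, and zimwriterfs is
# on PATH unless ZIMWRITERFS points elsewhere (e.g. ./zimwriterfs).
build_zim() {
    html_dir=${1:-whonix-wiki}
    zim_file=${2:-Whonix_Wiki.zim}
    zimwriterfs_bin=${ZIMWRITERFS:-zimwriterfs}

    if ! command -v "$zimwriterfs_bin" >/dev/null 2>&1; then
        echo "error: $zimwriterfs_bin not found" >&2
        return 1
    fi
    # --welcome and --favicon are relative to the HTML directory,
    # so check for both files there before calling zimwriterfs.
    for required in Documentation.html favicon.png; do
        if [ ! -f "$html_dir/$required" ]; then
            echo "error: $html_dir/$required not found" >&2
            return 1
        fi
    done

    "$zimwriterfs_bin" \
        --welcome=Documentation.html \
        --favicon=favicon.png \
        --language=eng \
        --title='The Whonix Wiki' \
        --description='A crash course in anonymity and security on the Internet' \
        --creator='Whonix Project' \
        --publisher=daniel.d \
        "$html_dir" "$zim_file"
}
```

After sourcing the file, ZIMWRITERFS=./zimwriterfs build_zim whonix-wiki Whonix_Wiki.zim should be equivalent to the command above, but with a readable error if the dump or favicon is missing.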
I’ve been polishing
The issue was introduced with whonix-wiki-html. I was fixing some broken links too; let’s see whether that fixes this as well. I’ll post when the update is available.
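Broken local links (the “link not found” cases mentioned earlier) can be found mechanically before a release. A minimal sketch, assuming the pages sit in one flat directory and link to each other as href="./Page.html"; find_missing_links is a hypothetical helper, and percent-encoded file names are not decoded.

```sh
#!/bin/sh
# Sketch: list relative links in an HTML dump whose target file is missing.
# Assumes links of the form href="./Page.html" in a single flat directory.
find_missing_links() {
    dir=${1:?usage: find_missing_links HTML_DIR}
    for page in "$dir"/*.html; do
        [ -f "$page" ] || continue
        # Extract href="./target" values, dropping any #fragment.
        grep -o 'href="\./[^"#]*' "$page" | sed 's|^href="\./||' | sort -u |
        while IFS= read -r target; do
            # Report "page -> target" for every target that does not exist.
            [ -e "$dir/$target" ] || printf '%s -> %s\n' "${page##*/}" "$target"
        done
    done
}
```

Running find_missing_links whonix-wiki prints one line per missing target; an empty result means every ./ link resolves to a file.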
whonix-wiki-html by itself is useful…
Any compelling advantages?
Now documented how one can already use
While imperfect, we now have a functional implementation of offline documentation for all users.
Keep the docs as they are, and forget about Kiwix. The advantage I expected was an alternative distribution channel to raise awareness. If everything stays first-party, that advantage does not exist.
Clicking on images is currently broken.
For example, when visiting the online version of https://www.whonix.org/wiki/Tor_Browser and clicking on an image, the File: page opens, i.e. https://www.whonix.org/wiki/File:Tor_browser_how_tor_works.png
These links/files are currently excluded from whonix-wiki-html. Technically it would probably be easy to include them as well. Would that be worth it?
The size of whonix-wiki-html might increase. I haven’t tested by how much yet. If we were to use compression (for example, if we were to pre-install offline documentation inside Whonix-Workstation), the difference in disk space use may or may not be negligible. Needs to be tested.
Yeah, images would be helpful in some of the tutorials. It depends on how many there are, though. Too many may be a lot of hassle to fix.
It’s probably not feasible to manually curate which wiki/File: links should be included and which not. It’s either all or none.
About the size…
wiki without files decompressed folder (without
wiki without files brotli compressed file:
wiki with files decompressed folder (without
wiki with files brotli compressed file:
In summary: currently 47 MB without images vs. 127 MB with images. Which one shall it be?
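To re-run this comparison after each change, something like the following could be used. It is a sketch that assumes the two figures are disk usage of the unpacked folder and the size of a brotli-compressed tar stream of it; the thread doesn’t state exactly how the numbers above were measured, and report_sizes is a hypothetical name.

```sh
#!/bin/sh
# Sketch: report unpacked vs. brotli-compressed size of a wiki folder.
# Assumes the brotli CLI (google/brotli) and a tar that supports -C.
report_sizes() {
    dir=${1:?usage: report_sizes DIR}
    # Disk usage of the unpacked folder in KiB (du -k uses 1024-byte units).
    unpacked_kb=$(du -sk "$dir" | cut -f1)
    # Byte size of the folder packed as a tar stream and compressed by brotli.
    packed_bytes=$(tar -cf - -C "$dir" . | brotli -c | wc -c | tr -d ' ')
    printf '%s: %s KiB unpacked, %s bytes brotli-compressed\n' \
        "$dir" "$unpacked_kb" "$packed_bytes"
}
```

Calling it once on the folder without File: pages and once on the folder with them gives the two numbers to compare.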
It’s probably trivial to add wiki/File: links. I just tested the download and checked the size. I haven’t previewed it yet, but I don’t foresee any issues.
Are we using the most size-efficient image format (WebP)? Perhaps there are more savings to be wrung out this way.
Probably not an option anytime soon. The offline documentation is based on webpage2html, which inlines all images into the HTML source. Its development doesn’t seem active enough to add support for better image formats to save space. I don’t know whether webpage2html would inline WebP if the source (the whonix.org MediaWiki) served WebP. Untested.
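Whether WebP would help can be estimated without touching the source: since webpage2html inlines the images, counting the inlined payload per image type shows how much of the offline HTML is image data and in which format. A minimal sketch, assuming the images are embedded as data:image/...;base64 URIs; inline_image_sizes is a hypothetical name, and the totals are base64 characters, roughly 4/3 of the decoded image size.

```sh
#!/bin/sh
# Sketch: total base64 characters of inlined images per MIME type.
# Assumes images are embedded as data:image/TYPE;base64,PAYLOAD URIs.
inline_image_sizes() {
    grep -ho 'data:image/[A-Za-z0-9.+-]*;base64,[A-Za-z0-9+/=]*' "$@" |
    awk -F';' '{
        # $1 is "data:image/TYPE"; subtract it and ";base64," (8 chars)
        # to get the payload length.
        chars[$1] += length($0) - length($1) - 8
    }
    END { for (t in chars) printf "%s %d\n", substr(t, 6), chars[t] }' |
    sort
}
```

For example, inline_image_sizes whonix-wiki/*.html prints lines such as "image/png N"; a large PNG total would suggest that converting at the source could pay off.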
Well, perhaps WebP should be used at the source (the whonix.org MediaWiki) anyhow. That’s not easy at all either.
MediaWiki also isn’t clever enough yet to convert images to WebP.
Manually switching all images to WebP is an insurmountable task.
Perhaps I can run a script on the server to create WebP versions and then have nginx serve WebP.
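A minimal sketch of that server-side script idea, assuming all uploads live under one directory and using cwebp from Google’s libwebp tools. Each image.png gets a sibling image.png.webp, which nginx could then prefer for browsers whose Accept header includes image/webp (that nginx configuration is not shown). The function name, the quality default of 80 and the naming scheme are assumptions.

```sh
#!/bin/sh
# Sketch: create a .webp sibling (image.png -> image.png.webp) for every
# PNG/JPEG under an upload directory. Requires cwebp (libwebp).
make_webp_siblings() {
    upload_dir=${1:?usage: make_webp_siblings UPLOAD_DIR}
    quality=${WEBP_QUALITY:-80}

    find "$upload_dir" -type f \( -iname '*.png' -o -iname '*.jpg' -o -iname '*.jpeg' \) |
    while IFS= read -r img; do
        # Skip images whose .webp copy already exists and is newer.
        if [ -f "$img.webp" ] && [ "$img.webp" -nt "$img" ]; then
            continue
        fi
        cwebp -quiet -q "$quality" "$img" -o "$img.webp" || echo "failed: $img" >&2
    done
}
```

Because it skips up-to-date copies, it could be re-run (e.g. from cron) after new uploads without reconverting everything.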
That seems useful either way.
Possible but not trivial.
I would need some string manipulation command to change this:
<a class="image" data-href="/w/index.php?title=File:Tbbbbbb.png&filetimestamp=20180917091939&" href="https://www.whonix.org/w/index.php?title=File:Tbbbbbb.png&filetimestamp=20180917091939&"><img alt="Tbbbbbb.png"
into this:
<a class="image" data-href="/w/index.php?title=File:Tbbbbbb.png&filetimestamp=20180917091939&" href="./File:Tbbbbbb.png.html&filetimestamp=20180917091939&"><img alt="Tbbbbbb.png"
Tbbbbbb.png is variable content. Could also be something else such as
&filetimestamp=20180917091939& (also variable content) would be stripped. It could take multiple commands to process this, as many as necessary; it doesn’t need to be one unified, super-complex regex.
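Two sed substitutions are enough for the example above: the first turns the absolute File: URL into ./File:NAME.html, the second strips the timestamp. A minimal sketch, assuming every link has exactly the https://www.whonix.org/w/index.php?title=File: form shown, that the timestamp is always digits, and that only href (not data-href) should change, as in the target line; rewrite_file_links is a hypothetical name.

```sh
#!/bin/sh
# Sketch: rewrite MediaWiki File: links to local ./File:NAME.html pages.
# Reads HTML on stdin, writes the rewritten HTML to stdout.
rewrite_file_links() {
    sed \
        -e 's|href="https://www\.whonix\.org/w/index\.php?title=File:\([^&"]*\)|href="./File:\1.html|g' \
        -e 's|\(href="\./File:[^"&]*\.html\)&filetimestamp=[0-9]*&|\1|g'
}
```

The first expression alone produces exactly the intermediate line shown above; the second then removes &filetimestamp=...&. To apply it to a whole dump: for f in whonix-wiki/*.html; do rewrite_file_links < "$f" > "$f.tmp" && mv "$f.tmp" "$f"; done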