
When writing software there is always something that is a dev's worst nightmare, be it a mismatch of tabs and spaces in Python or a missing semicolon in C, you get the point. But the scariest thing is when you can't ~~steal code~~ search for help because Stack Overflow is down.

When it's 15 minutes of downtime that's OK, I deal with it, and even then I'll probably not notice, but when it's 2 hours in the middle of a project it's quite annoying.
Offline

So how do I download Stack Overflow locally on my NAS? The noobie approach would be to just scrape every question and re-host the site statically; the issue is that Stack Overflow is, well, quite big, and while I probably would have the space, it's really stupid, especially considering the second option.

ZIM files

The ZIM file format is an open file format that stores wiki content for offline usage. Its primary focus is the contents of Wikipedia and other Wikimedia projects. ZIM stands for "Zeno IMproved", as it replaces the earlier Zeno file format. The format allows for the compression of articles, features a full-text search index and native category and image handling similar to MediaWiki, and the entire file is easily indexable and readable using a program like Kiwix – unlike native Wikipedia XML database dumps.
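Because the full-text index ships inside the ZIM file itself, you can poke at one directly from Python. Here is a minimal sketch using the python-libzim bindings; the file name and the search term are placeholders I made up, and the exact property names may vary between libzim versions.

```python
# Minimal sketch: open a ZIM file and query its built-in full-text index with
# the python-libzim bindings (`pip install libzim`). The file name and the
# query string are placeholders, not from the original post.
from libzim.reader import Archive
from libzim.search import Query, Searcher

zim = Archive("stackoverflow_en_all.zim")
print("entries:", zim.entry_count)
print("main entry:", zim.main_entry.get_item().path)

# The searcher runs against the index embedded in the file, no server needed.
searcher = Searcher(zim)
search = searcher.search(Query().set_query("segmentation fault"))
for path in search.getResults(0, 10):          # first 10 matching entry paths
    entry = zim.get_entry_by_path(path)
    print(entry.title, "->", entry.path)
```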
Well that sounds perfect, so where can I get the Stack Overflow one? Well, the Kiwix wiki provides a list of content. Among them is Stack Overflow in multiple languages (es, ja, pt, ru), including the main one, which is a mind-bending 134 GB of compressed data; that's more than the entire dump of English Wikipedia (2021-12), which is around 86 GB 😲.

There is another issue though: the dump provided by Kiwix dates from February 2019. That isn't that old, but in a field such as software/firmware development, which is constantly moving and innovating, I feel that a 3-year-old dump might lack some newer stuff. So how do we get a newer ZIM file? Well, searching around, it turns out that the Stack Exchange post database is dumped quarterly on the Internet Archive. Looking at the -PostHistory.7z we can see that the dump was done in December 2021, much better; that dump is also 162 GB 🤯.
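If you'd rather script that download than click around archive.org, the internetarchive Python package can pull individual files from an item. The snippet below is only a sketch under my own assumptions: the item identifier, the glob pattern and the destination directory are guesses, so check the actual archive.org listing before running it.

```python
# Sketch: fetch only the Stack Overflow posts from the quarterly Stack Exchange
# dump on the Internet Archive (`pip install internetarchive`). The item name,
# glob pattern and destination path are assumptions, not taken from the post.
from internetarchive import download

download(
    "stackexchange",                          # assumed archive.org item identifier
    glob_pattern="stackoverflow.com-Posts*",  # grab only the Posts archive
    destdir="/mnt/nas/dumps",                 # hypothetical NAS mount point
    verbose=True,
)
```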
The thing is, I'm not in the mood to write a script to process 162 GB of data (I will probably write it at some point because it'll probably be a great exercise in managing huge datasets).
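If I ever do, it will have to be a streaming parse so those 162 GB never sit in memory. A rough sketch of the idea, assuming the usual Stack Exchange dump layout (a single <posts> root full of <row> elements) and the .7z already extracted; the file path and tag filter are placeholders.

```python
# Rough sketch of streaming through Posts.xml from the Stack Exchange dump
# without loading 162 GB into memory. Assumes the .7z has already been
# extracted; the file path and the tag filter are just placeholders.
import xml.etree.ElementTree as ET

questions = 0
context = ET.iterparse("Posts.xml", events=("start", "end"))
_, root = next(context)              # grab the enclosing <posts> element

for event, elem in context:
    if event == "end" and elem.tag == "row":
        # In the Stack Exchange schema, PostTypeId 1 = question, 2 = answer
        if elem.get("PostTypeId") == "1" and "<python>" in (elem.get("Tags") or ""):
            questions += 1
        root.clear()                 # drop processed rows to keep memory flat

print(f"python-tagged questions: {questions}")
```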
So to solve this I browsed a bit more and of course reddit came to the rescue: someone generated a ZIM file for September 2021 (which still works for me). That dump has 21,958,765 articles and weighs 161 GB, not bad!

Well, next was downloading the dump. At least they provided a torrent, and with 3 or 4 peers at most it took around 2 days to download, which is a lot less fun. But the important thing is that the file is sitting on my NAS, yay!

Setting up the reader
