Update, 2023-06-13: please refer to the GitHub page for how to use the download and import. The details on this page are out of date, and a recent import surfaced some of the issues.
I was working with someone to set up another copy of a wiki, and I found that there is a way to download the wiki pages (/wiki/Special:Export/) and a matching way to import them (/wiki/Special:Import).
So I thought I'd try scripting it.
Downloading the pages was the "easy" part. I had to toy around with some things, but I managed to throw together a wiki downloader.
https://github.com/isaaclw/download-wiki
It scans all the pages (from /wiki/Special:AllPages), generates a list of files (which can be cached), and then goes through and downloads each one individually into a folder that you specify.
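Under the hood this is just fetching /wiki/Special:Export/<Title> for each page title. As a rough sketch of the idea (not the actual script), assuming $WIKI is your wiki's base URL, titles.txt is the cached list of page titles, and a dump/ folder already exists, the manual equivalent is roughly:
while read -r title; do curl -s "$WIKI/wiki/Special:Export/$title" -o "dump/$title.xml"; done < titles.txt
(Titles with spaces, slashes, or other special characters would need URL-encoding first.)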
The four main chunks to download are:
- categories
- files
- templates
- pages (main)
There's a switch for each one in the script. For a complete copy, include all four.
You can merge the files into one file (stripping each file's <mediawiki> wrapper and re-adding a single one around the merged content):
(echo "<mediawiki>"; cat *.xml | grep -Ev "^<\/?mediawiki.*?>"; echo "</mediawiki>") > ~/xml-download.xml
Then transfer that to your host. I had problems loading the large file with MediaWiki's PHP import script, though, so I just transferred each small file instead. (It crashed partway through and I had to start over; I wasn't watching, so I don't know why.)
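Either way, it's worth sanity-checking the XML before running the import. Assuming xmllint is installed, this flags any file that isn't well-formed:
for f in *.xml; do xmllint --noout "$f" || echo "broken: $f"; done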
I'm doing the import on hostmonster, so I had to modify the php.ini file. I added the following to a local configuration:
register_argc_argv=true
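You can check that the override is actually being picked up before running anything heavy; this just prints the effective setting:
php -c /home/user/php.ini -i | grep register_argc_argv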
Then I changed directory to the folder with the folders I uploaded and ran this:
find . -type f | while read f; do echo "processing $f"; php -c /home/user/php.ini /home/user/public_html/site/maintenance/importDump.php < "$f"; sleep 1; done
I added the 'sleep' because I was afraid overloading the server might kill the process on hostmonster, but that might not be needed.
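If you want a record of which dumps imported cleanly (the output scrolls by fast), the same loop can be wrapped with tee; this is just a convenience variant, assuming one big log file is fine:
find . -type f | while read f; do echo "processing $f"; php -c /home/user/php.ini /home/user/public_html/site/maintenance/importDump.php < "$f"; sleep 1; done 2>&1 | tee import.log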
Rebuild recent changes (as requested) after loading everything:
php -c /home/user/php.ini /home/user/public_html/site/maintenance/rebuildrecentchanges.php
I also ran 'rebuildall', but I can't remember why.
php -c /home/user/php.ini ./maintenance/rebuildall.php
Import the images:
php -c /home3/user/php.ini /home/user/public_html/site/maintenance/importImages.php /home3/user/public_html/site/path_to_files/