Saturday, September 27, 2014

Transwiki copying

Update, 2023-06-13: please refer to the GitHub page for how to use the download and import.
The details on this page are out of date, and a recent import exposed some of the issues.


I was working with someone to set up another copy of a wiki, and I found that MediaWiki can export pages as XML (/wiki/Special:Export/) and import them again on another wiki (/wiki/Special:Import).
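
To make the export side concrete, here is a minimal sketch of building the export URL for a single page. The `export_url` helper and the `wiki.example.org` host are hypothetical names for illustration, and it assumes the wiki serves pages under /wiki/ as in the paths above.

```shell
# Build the Special:Export URL for one page title.
# MediaWiki treats spaces and underscores in titles as equivalent,
# so spaces are turned into underscores here.
export_url() {
    base=$1    # wiki root, e.g. https://wiki.example.org (hypothetical)
    title=$(printf '%s' "$2" | tr ' ' '_')
    printf '%s/wiki/Special:Export/%s\n' "$base" "$title"
}

# Fetching one page's XML would then look like this (needs curl and network access):
# curl -fsS "$(export_url https://wiki.example.org 'Main Page')" -o Main_Page.xml
```

Titles containing characters such as `?`, `&` or `%` would also need percent-encoding, which this sketch does not attempt.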

So I thought I'd try scripting it.

Downloading the pages was the "easy" part. I had to toy around with some things, but I managed to throw together a wiki downloader.

https://github.com/isaaclw/download-wiki

It scans all the pages (from /wiki/Special:AllPages), generates a list of files (which can be cached), and then downloads each one individually into a folder that you specify.


The four main chunks to download are:
- categories
- files
- templates
- pages (main)

There's a switch for each one in the script. For a complete copy, include all four.

You can merge the files into one file:
(echo "<mediawiki>"; cat *.xml | grep -Ev "^</?mediawiki.*>"; echo "</mediawiki>") > ~/xml-download.xml
and transfer that to your host, but I had problems loading the large file with MediaWiki's PHP script, so I just transferred each small file. (The large import crashed partway through and I had to start over. I wasn't watching, so I don't know why.)
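
If you do go the merged-file route, a quick sanity check is to compare the number of pages in the merged file against the individual files. This is only a sketch: `count_pages` and `check_merge` are hypothetical helper names, and it assumes each `<page>` tag sits on its own line, as it does in MediaWiki's export output.

```shell
# Count lines containing an opening <page> tag across the given files.
# "|| true" keeps the function from failing when the count is zero.
count_pages() {
    cat "$@" | grep -c '<page>' || true
}

# Compare the merged file (first argument) against the files it was built from.
check_merge() {
    merged=$1; shift
    want=$(count_pages "$@")
    got=$(count_pages "$merged")
    if [ "$want" -eq "$got" ]; then
        echo "ok: $got pages"
    else
        echo "mismatch: expected $want pages, merged file has $got" >&2
        return 1
    fi
}
```

Run from the download folder, that would be something like `check_merge ~/xml-download.xml *.xml`.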

I'm doing the import on HostMonster, so I had to modify the php.ini file so that PHP passes command-line arguments through to the maintenance scripts. I added the following to a local configuration:
register_argc_argv=true

Then I changed to the directory containing the folders I uploaded and ran this:
find . -type f | while IFS= read -r f; do echo "processing $f"; php -c /home/user/php.ini /home/user/public_html/site/maintenance/importDump.php < "$f"; sleep 1; done

I added the 'sleep' because I was afraid that running the imports back to back might get the process killed on HostMonster, but it might not be needed.
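
Since my earlier attempt crashed without leaving a clue, a variant of that loop that records which files imported cleanly may be worth having, so a rerun can pick up where it stopped. This is a rough sketch, not what I actually ran: `import_all`, the `.imported` log file and the `IMPORT_DELAY` variable are all hypothetical names, and it assumes the import command reads the dump on stdin and returns a non-zero exit status on failure.

```shell
# Hypothetical helper: feed every .xml file under a directory to an import
# command on stdin, remembering successes so a rerun skips them.
import_all() {
    dir=$1; shift                      # remaining arguments are the import command
    done_log="$dir/.imported"          # hypothetical name for the progress log
    touch "$done_log"
    find "$dir" -type f -name '*.xml' | sort | while IFS= read -r f; do
        if grep -qxF -e "$f" "$done_log"; then
            echo "skipping $f"
            continue
        fi
        echo "processing $f"
        if "$@" < "$f"; then
            printf '%s\n' "$f" >> "$done_log"
        else
            echo "FAILED: $f" >&2
        fi
        sleep "${IMPORT_DELAY:-1}"     # hypothetical variable; defaults to a 1 second pause
    done
}
```

With the paths from this post, the call would be `import_all . php -c /home/user/php.ini /home/user/public_html/site/maintenance/importDump.php`. Failed files are printed to stderr and left out of the log, so running the same command again retries only those.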

Rebuild recent changes (as requested) after loading everything.
php -c /home/user/php.ini /home/user/public_html/site/maintenance/rebuildrecentchanges.php

I also ran 'rebuildall' but I can't remember why.
php -c /home/user/php.ini ./maintenance/rebuildall.php


Import the images:
php -c /home3/user/php.ini /home/user/public_html/site/maintenance/importImages.php /home3/user/public_html/site/path_to_files/
