We had downtime tonight between the following times, due to a faulty UPS.
Aaaaand that backup apparently was 1.5 years old. :( We rolled back to August 6 again, and applied a backup that was taken of core code only. Any changes to player or domain code, or to character data, made after August 6 can be considered lost.
Regrettably, Kjelle's full backup had captured similar corruption to that of the central backup. I have rolled back to the backup from August 6, and applied Kjelle's backup of code only, which seems to be more like what we wanted.
Kjelle had a backup of most of everything that was up-to-date as late as maybe a couple of hours before the crash, and he transferred that backup to the server so that I could restore from it. The administration is testing.
Around 2014-08-11 00:00 - 00:30 UTC, VikingMUD suffered a catastrophic system failure. Even though the server's data storage was a RAID-5 array with a hot spare, two disks failed completely in short order, and the remaining disks managed to corrupt parts of their data. Additionally, a fault in the server motherboard prevented normal bootup.
My guess is that a power spike occurred and fried several components. Since the server is protected by a server-grade UPS, this probably means a faulty power supply in the server itself, which isn't much of a consolation.
I've been working on restoring data and services from backup, which has regrettably turned out to be more difficult than anticipated. The data corruption occurred shortly before or during the backup, which means that the daily backup got corrupted at the backup host. This leaves a periodic full backup from August 1-6, which I've restored from. The backup SHOULD include most of everything as of 2014-08-06 00:00 UTC, but I can't really be sure.
The administration has been informed about the situation, and I've also posted some updates to our Facebook page during the restore.
What remains now is to see whether some potential live backups that may have been made to a secondary server are functional, or whether the corruption has spread irreparably there as well. In the meantime, we're not quite reopening the MUD for service. Also, other services on the server may be unstable, unreliable or simply dysfunctional as I try to find what needs fixing.