Virtual instances - Vancouver, Canada
Node v1215: File system corruption (moved to node v1210)
20:01 PST: Following the power outage incident earlier today, node v1215 did not fully boot due to file system corruption. We are currently investigating.
20:14 PST: Our engineers have begun file system recovery. Due to the amount of data occupied by clients' disk images, recovery is likely to take 3-6 hours.
01:04 PST: We have made good progress with file system recovery; however, at this point it is evident that a subset of client VMs are likely affected by some level of data corruption.
02:11 PST: File system recovery has been completed. All affected clients on node v1215 have been automatically migrated to node v1210 and restarted.
If your VM does not start, it may be due to disk image corruption.
We employ RAID on all our nodes without exception; however, RAID does not protect against file system corruption. While we always strongly advise all our clients to keep offsite backups, our KiwiVM panel also offers free automated backups that require no user action.
If you find your VM failing to boot, check your KiwiVM panel for recent automated backups. A backup can be imported into Snapshots, and the snapshot can then be restored onto your VM. Our backups are stored offsite and are therefore not affected by this incident.
We thank all our customers for their patience during this recovery process.
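For readers unfamiliar with the file system check mentioned in these updates, here is an illustrative sketch of what such a check looks like. This is not the provider's exact procedure; it creates a throwaway ext4 image purely for demonstration, where the real recovery operated on production storage.

```shell
# Create a small scratch ext4 image (64 MB) purely for demonstration.
# (-F forces mkfs to work on a regular file; -q suppresses output.)
dd if=/dev/zero of=scratch.img bs=1M count=64 status=none
mkfs.ext4 -Fq scratch.img

# Read-only check: -n answers "no" to all repair prompts, so nothing
# is modified. Exit status 0 means the filesystem is clean.
fsck.ext4 -n scratch.img
echo "fsck exit status: $?"

# An actual repair pass, like the multi-hour recovery described above,
# would instead use -y to auto-approve fixes on the real device, e.g.:
#   fsck.ext4 -y /dev/sdX
rm -f scratch.img
```

A check on a freshly created filesystem exits with status 0; nonzero statuses indicate errors were found (and, with -y, whether they were corrected).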
At 17:50 PST we were alerted by our monitoring system to a multiple-server outage in our Vancouver datacenter. Initial investigation suggests a power-related outage. We are checking with the datacenter for further information.
At 18:01 PST a technician was dispatched to the datacenter for further investigation and is expected to arrive within 20 minutes.
At 18:26 PST the engineer is on site and investigating.
At 18:38 PST we have determined that the power circuit external to our rack is not providing power, for an unknown reason. We are in touch with building management regarding possible resolutions.
At 18:55 PST datacenter management dispatched an engineer to repair the circuit.
Meanwhile, we were able to restore power to 3 nodes: v1213, v1214, and v1216 (we moved them onto another available circuit).
At 19:32 PST the circuit was restored; we will be powering on the remaining servers shortly.
At 19:54 PST remaining nodes to power on: v1215, v1218, v1219
At 19:58 PST all nodes are back online except v1215, which entered fsck (file system check) upon boot. We will create a new incident for affected clients on node v1215.