[Resolved] xn2 drive failure

January 7th, 2012

Our monitoring picked up a failed drive in the RAID10 array on xn2 earlier this evening.

The misbehaving drive has been hot-swapped out and the array is rebuilding. There was no interruption to service, but note that I/O will be a little slow until the array restores full redundancy.

Thanks

[Completed] cp1 maintenance

December 10th, 2011

Following a previous crash due to high load, we are doing a bit of maintenance on this server to ensure everything is running optimally.

During this time there may be some brief load spikes as we apply updates. Apologies for any inconvenience.

Update: This has been completed and we are seeing improved disk performance and lower load.

[Resolved] cp1 outage

December 9th, 2011

cp1, our main shared/reseller cPanel system, has crashed and is rebooting.

It’s currently running an automatic fsck of the filesystem, as its last reboot was over 250 days ago. It will be up soon.

Update: Server is now up and load is slowly coming down. Will keep an eye on this for the next 24hrs.
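For anyone wondering why a reboot after a long uptime can trigger a filesystem check: ext-family filesystems can be set to force a check at boot once a maximum time since the last check has passed. Below is a minimal sketch of that rule. The 180-day interval and the dates are assumptions for illustration only, not values read from cp1, and `fsck_due` is a hypothetical helper name.

```python
from datetime import date, timedelta


def fsck_due(last_checked: date, today: date, interval_days: int = 180) -> bool:
    """Return True if a time-based filesystem check would run at the next boot.

    interval_days is an assumed setting; 0 means time-based checks are disabled.
    """
    if interval_days <= 0:
        return False
    return today - last_checked >= timedelta(days=interval_days)


# Hypothetical dates: a last check roughly 250 days before the 9 December 2011 crash.
crash_day = date(2011, 12, 9)
last_check = crash_day - timedelta(days=250)
print(fsck_due(last_check, crash_day))
```

With these assumed values the check is overdue, which matches what we saw at boot: the server could not come back until the check had finished.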

[Resolved] Xn5 Outage

December 6th, 2011

xn5 appears to be down and is being checked.

Update: Looks like a VPS is under attack and is causing high packet loss. Working to get this under control ASAP.

Update 2: Service restored and we will continue to monitor. This was actually caused by a FreeBSD VM which had run out of disk space and was maxing out CPU and disk, very unusual!

Downtime: Approx 4mins

[Completed] Disk Replacement on VZ2

November 13th, 2011

Below is an email sent to all users on vz2 approx 2PM 12/11/11:

You are receiving this email because you have one or more VPS on the OpenVZ node vz2.pcsmarthosting.co.uk.

A drive has failed in the RAID10 array on this machine and requires replacement. Due to previous issues we have had on this node following hot-swap disk replacements, we are going to shut down the machine cleanly, replace the drive and power it back up again.

We are sorry for any inconvenience this may cause, especially as we only replaced two other drives a couple of months ago. This machine is running a 4-disk RAID10 array and it’s not uncommon for drives to fail around the same time, as they are all from the same batch.

Outage Details:

Task: Replace failed drive in the RAID array.

Scheduled Time: 11.00PM UK Time (GMT) on Sunday 13/11/2011

Expected Downtime: Approx 20 Minutes, though as VPS are started individually it may vary.

If you have any questions please don’t hesitate to open a ticket.

Update: The server halted much faster than expected and is now down. We are waiting for remote hands to swap the drive out for us and power on.

Update 2: The drive has been replaced but is not being picked up. This looks like a backplane/RAID controller issue. We will arrange another maintenance window in due course to resolve it.

[Completed] Planned maintenance on xn15

October 20th, 2011

Below is a copy of an email sent to customers on 20/11/2011:

This email is to inform you about some upcoming planned maintenance affecting a small number of customers on the machine xn15.pcsmarthosting.co.uk. To check if you have a VPS on this machine, log into SolusVM at https://solusvm.dns4vps.com:5656 and check the Location, or open a ticket with your IP address and we will check for you.

If your VPS is not hosted on XN15, please discard this email as it does not affect you.

If your VPS is hosted on XN15, our datacenter will be moving this node onto the new network setup with increased resilience, as we have done with our other servers a few months back.

This will be taking place between 1AM and 3AM (UK Time, GMT+1) on 10/11/2011 and the server will be down for approximately 30 minutes while it is moved to another rack and powered up.

If you have any questions please let us know.

Regards,

The PCSmart Team

[Resolved] Xn2 Outage

September 24th, 2011

xn2 appears to be down and is being checked.

The server is now back up and VPS are starting.

[Resolved] VZ2 Issue

September 14th, 2011

One of the new Western Digital drives fitted only last week has failed in VZ2 and caused the host filesystem to go read-only. We are working on this and service will be restored ASAP.

Update: We have been waiting over 15 minutes so far with no response from our datacenter (iomart). Further updates will follow…
Update 2: We now have a USB DVD drive connected at last and are working to restore the machine.
Update 3: Since /lib is missing and a few other parts of the OS are resolving to the wrong files, we are going to take the fastest route, which is to reload the OS on the host. Customer data and configs are on a separate partition, so your data should be safe. The reload is currently at 50%, so we expect to be up in around 20 minutes.
Update 4: OS reload complete. We are now doing a quick OS update, installing OpenVZ, rebooting and getting VPS up. SolusVM will be restored shortly after.
Update 5: Server is now booting into OpenVZ.
Update 6: VPS are now booting up one by one and we are restoring SolusVM access to this node.
Update 7: SolusVM now restored, VPS are booting up.

[Resolved] VZ2 and XN2 Outages

August 31st, 2011

From 8PM on 02/09/11 I am going to be onsite doing some work on a few machines.

VZ2 will be down while we troubleshoot a RAID controller issue and replace two faulty disks.
XN2 will be rebooted as the RAID controller has not correctly detected a new disk that was hot-swapped in.

If your VPS is currently down we apologise for the inconvenience, but it’s important we return your host machine to its fully working state with an Optimal RAID10 disk array.

Thanks

[Completed] UK Network Maintenance/Upgrades 27/08/2011

August 26th, 2011

As per an email sent out to all customers last week, we are carrying out some network upgrades on Saturday 27th August at 10.00PM (GMT+1). This is to reconfigure our network with a Cisco HSRP setup for improved redundancy in case of any router issues.

There will be a short outage of between 10 and 20 minutes affecting all of our UK servers, except XN15 and XN16, which are in different racks.

Updates will be posted here throughout.

21.50: We are now preparing for the upgrade
22.03: Upgrade has started. Network is now down
22.11: Connectivity is restored. All finished!
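The actual length of the outage can be worked out from the timeline above. This is a small sketch of the arithmetic, assuming both entries fall on the same evening; `minutes_between` is a hypothetical helper name used only for illustration.

```python
from datetime import datetime


def minutes_between(start: str, end: str) -> int:
    """Whole minutes between two same-day timestamps written as HH.MM."""
    fmt = "%H.%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Network went down at 22.03 and connectivity was restored at 22.11.
print(minutes_between("22.03", "22.11"))  # 8
```

That is 8 minutes of downtime, slightly under the 10 to 20 minutes estimated beforehand.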