At approximately 19:10 UTC, a public cloud server stopped reporting and rebooted. User services on the affected server were unresponsive while the server came back online. Approximately 40 minutes later, all user services and instances were restored.
A race condition was introduced in the upstream build of SmartOS, the operating system used on Pagoda Box servers. The technical explanation of the race condition can be found here.
A patch has been released to fix the issue, and we are running an emergency OS image build to include it. However, the affected server booted into the image that still contains the race condition and will need to be updated. We will schedule a maintenance window that minimizes the impact on users, during which we'll replace the current server image(s) with the patched image.