Forum:Recent Downtime

From Orain Meta

Good day, Orain community. I thought it would be appropriate to outline the recent 15-hour downtime of all Orain wikis: how it happened, who was responsible, and how we are going to act in future to prevent a repeat, or at least to monitor the servers and systems so that we are immediately aware of another global issue like the one we experienced.

First, I would like to apologise to you all on behalf of the Orain system administrators for not responding promptly to this downtime. Around 16 hours ago, an issue occurred on our production 2 host (the main webserver) which left the whole farm inoperable for all wikis and our own functions. We are still unaware of how this happened, but I have tracked it down to nginx not working properly at the time. We will investigate further over the course of today (I certainly will), and if we find any major flaw we will fix it promptly and upgrade software as necessary. Since we are unaware of the fault, and of any system administrator being connected to the server at the time of the downtime, we are currently treating this as a server fault rather than one of us acting inappropriately or changing something we shouldn't have.

On the topic of preventing this and responding immediately to future occurrences, we will review how we currently monitor the servers and will invest in new software or services as we see fit, to provide more appropriate monitoring and ways for us to be notified immediately of future downtime or service failures. One option we are keen to look at is IRC notifications from a bot that either monitors pages on the wiki or works with a new server monitoring tool and reports events as they happen.
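
To make the bot idea concrete, here is a rough sketch of the page-monitoring half, not something we are running. It assumes Python's standard library only. The function names, the `#example` channel and the `example.invalid` URL are placeholders invented for illustration. The only IRC-specific part is the standard `PRIVMSG` line, which a real bot would write to its open connection.

```python
import urllib.request


def check_page(url, fetch=None, timeout=10):
    """Return (is_up, detail) for one page.

    `fetch` is a hypothetical hook that takes a URL and returns an HTTP
    status code; it defaults to a real request and exists so the check
    can be exercised without touching the network.
    """
    if fetch is None:
        def fetch(u):
            with urllib.request.urlopen(u, timeout=timeout) as resp:
                return resp.status
    try:
        status = fetch(url)
    except Exception as exc:
        # urlopen raises on HTTP 4xx/5xx, timeouts and refused connections.
        return False, f"{type(exc).__name__}: {exc}"
    return (200 <= status < 400), f"HTTP {status}"


def irc_alert_line(channel, url, detail):
    """Build the raw IRC line a bot would send to report a failed check."""
    return f"PRIVMSG {channel} :ALERT {url} appears down ({detail})\r\n"


if __name__ == "__main__":
    # Placeholder target; a real deployment would loop over its own pages.
    up, detail = check_page("http://example.invalid/")
    if not up:
        print(irc_alert_line("#example", "http://example.invalid/", detail), end="")
```

A check like this only shows that a page failed to load from wherever the bot runs. It cannot tell a webserver daemon fault apart from the whole host being off, so it would complement a proper server monitoring tool rather than replace one.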

If you have any questions or wish to say anything, please respond to this post.

Thanks and on behalf of the Orain staff, John (talk) 10:56, 5 April 2014 (UTC)

As a Project Co-Leader, I'd like to personally apologize for the downtime. We're investigating two issues: the server itself being off, and the webserver daemon failing to work. Kudu ~I/O~ 17:25, 5 April 2014 (UTC)