Tech:Incidents/2015-02-07


Extension testing that went wrong, combined with a bad dblist, caused about 20 minutes of downtime on 7 February 2015.

Timeline

  • 21:35 Dusti: merges this commit
  • ~21:41 Orain: Orain goes down
  • 21:46 Dusti: reports Orain down on IRC
  • 21:55 Southparkfan: reverts various commits
  • 21:57 Southparkfan: forces Ansible runs on all servers
  • 21:58 Southparkfan: confirms that Ansible ran successfully, but instead of blank pages, users now get 404 Wiki Not Found errors everywhere
  • 22:00 Southparkfan: discovers that the dblist is empty, replaces it with "metawiki|Orain|en|" so that Orain Meta is accessible again for get_db_list.py, and runs get_db_list.py on all servers
  • 22:03 Southparkfan: all is up again

In Hindsight

  • Extension testing should at all times be done on extloadwiki, not in production.
  • The bad dblist problem has already caused an incident before, 8 months ago: Tech:Incidents/2014-06-14. The db fetching script should check whether a dblist looks sane before actually fetching it.
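
The sanity check suggested in the second point could look roughly like the sketch below. It is not taken from get_db_list.py: the function names are hypothetical, and the reading of each line as dbname|sitename|language is an assumption based only on the "metawiki|Orain|en|" entry in the timeline.

```python
# Hypothetical sketch of a dblist sanity check; not the real get_db_list.py.
# Assumption: one wiki per line, formatted as "dbname|sitename|lang|".

def parse_dblist(text):
    """Return a list of (dbname, sitename, lang) tuples, or raise ValueError."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        fields = line.split("|")
        # Require at least three non-empty leading fields.
        if len(fields) < 3 or not all(f.strip() for f in fields[:3]):
            raise ValueError(f"line {lineno} is malformed: {line!r}")
        entries.append(tuple(f.strip() for f in fields[:3]))
    return entries


def dblist_looks_sane(text, required=("metawiki",)):
    """Reject empty or malformed lists, and lists missing a required wiki."""
    try:
        entries = parse_dblist(text)
    except ValueError:
        return False
    names = {dbname for dbname, _, _ in entries}
    return bool(entries) and all(name in names for name in required)
```

A fetching script could call dblist_looks_sane() on the newly fetched list and keep the previous list whenever the check fails, so that an empty dblist never reaches the servers.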

Meta

  • Staff on hand during downtime: Dusti, Kudu, Southparkfan
  • Report published by: Southparkfan
  • Timestamp: 22:20, 7 February 2015 (GMT)