Tech:Incidents/2015-02-07
Extension testing that went wrong, combined with a bad DB list, caused about 20 minutes of downtime on 7 February 2015.
Timeline
- 21:35 Dusti: merges this commit
- ~21:41 Orain: Orain goes down
- 21:46 Dusti: reports Orain down on IRC
- 21:55 Southparkfan: reverts various commits
- 21:57 Southparkfan: forces Ansible runs on all servers
- 21:58 Southparkfan: confirms Ansible ran successfully, but instead of blank pages, people now get 404 Wiki Not Found errors everywhere
- 22:00 Southparkfan: discovers that the dblist is empty; replaces it with "metawiki|Orain|en|" to make Orain Meta accessible again for get_db_list.py, then runs get_db_list.py on all servers
- 22:03 Southparkfan: all is up again
In Hindsight
- Extension testing should at all times be done on extloadwiki, not in production.
- The bad dblist problem already caused an incident 8 months ago: Tech:Incidents/2014-06-14. The DB fetching script should determine whether a dblist looks sane before actually fetching it.
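The sanity check suggested in the last point could look roughly like the sketch below. This is not the actual get_db_list.py code. The function name dblist_looks_sane is hypothetical, the pipe-separated line format with the database name in the first field is inferred from the "metawiki|Orain|en|" entry in the timeline, and treating metawiki as a mandatory entry is an assumption.

```python
def dblist_looks_sane(text, required_db="metawiki", min_fields=3):
    """Return True if a dblist is non-empty, well-formed and lists required_db.

    Assumed format (inferred, not confirmed): one wiki per line,
    pipe-separated, database name first, e.g. "metawiki|Orain|en|".
    """
    # Ignore blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        # An empty dblist is what produced the 404 errors in this incident.
        return False

    names = set()
    for line in lines:
        fields = line.split("|")
        # Reject lines with too few fields or an empty database name.
        if len(fields) < min_fields or not fields[0]:
            return False
        names.add(fields[0])

    # Assumption: the meta wiki must always be present in a valid list.
    return required_db in names
```

A fetching script could call this on the newly generated list and keep the previous dblist in place whenever the check returns False, so that a bad list never replaces a working one.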
Meta
- Staff on hand during downtime: Dusti, Kudu, Southparkfan
- Report published by: Southparkfan
- Timestamp: 22:20, 7 February 2015 (GMT)