Tech:Incidents/2015-05-ddos
Orain suffered 9 days of persistent issues and downtime for most users due to a UDP DDoS attack, which caused DigitalOcean to null-route our hosts.
(Times are UTC+0)
Timeline
Note: small(er) issues have been reported before.
Note: DigitalOcean's automated emails for some reason did not get sent during the first stages of the attack.
- 20 May
- 06:15 Nagios: sends out its last alerts
- 06:56 Southparkfan: discovers Orain is down and sends mail to staff@orain.org notifying them of the downtime
- 07:06 Southparkfan: realizes that prod6 appears to be down; sends mail to Dusti's and Addshore's personal email addresses
- 10:15 Addshore: mails back and says Orain is up. It is unknown whether Addshore accessed Orain over IPv4 or IPv6.
- 10:25 Addshore: mails a picture of the prod10 graphs, including the graph of inbound and outbound traffic on the private and public interfaces. It shows spikes in public inbound traffic, one of them reaching 800mb/s.
- 17:54 DigitalOcean Incoming DDoS Detected -- prod5.orain.org
- 17:54 DigitalOcean Incoming DDoS Detected -- prod6.orain.org
- 17:54 DigitalOcean Incoming DDoS Detected -- prod8.orain.org
- 19:57 Southparkfan: FastLizard4 reports that Orain is accessible over IPv6, but not over IPv4.
- 19:58 Southparkfan: tries to SSH into prod10 using either prod10.orain.org or its public IP, but neither works. SSHing into prod10 using prod8 as a proxy does work.
- 22:33 Southparkfan: proposes that FastLizard4 (who was able to make changes, while Southparkfan was not) redirect *.orain.org to prod13-temp.orain.org
- 22:40 Southparkfan: notes that the above setup would break IPv6 support; proposes to revert the whole DNS repo to b83d1ca08fe6bc728427d061b040d0245078e031
- 22:45 FastLizard4: tries to push commits to the DNS repo, but gets stuck with permission errors.
- 21 May
- 06:46 Southparkfan: reverts /config to b83d1ca08fe6bc728427d061b040d0245078e031
- 06:46 Southparkfan: grants operations full access to the DNS repo (the group should already have had it, but it is now also added under 'Collaborators')
- 07:01 Southparkfan: All The Tropes and TestWiki are both confirmed back online, Meta is still down
- 10:44 FastLizard4: confirms Orain is up
- 14:36 Southparkfan: confirms Orain is up
- 16:37 DigitalOcean Incoming DDoS Detected -- prod8.orain.org
- 16:37 DigitalOcean Incoming DDoS Detected -- prod9.orain.org
- 17:24 DigitalOcean Incoming DDoS Detected -- prod7.orain.org
- 17:24 DigitalOcean Incoming DDoS Detected -- prod11.orain.org
- 17:24 DigitalOcean Incoming DDoS Detected -- prod10.orain.org
- 17:25 DigitalOcean Incoming DDoS Detected -- prod12.orain.org
- 22 May
- 01:39 DigitalOcean Incoming DDoS Detected -- prod11.orain.org
- 23 May
- 24 May
- 19:58 DigitalOcean Incoming DDoS Detected -- prod10.orain.org
- 25 May
- 20:30 DigitalOcean Incoming DDoS Detected -- prod10.orain.org
- 23:41 DigitalOcean Incoming DDoS Detected -- prod10.orain.org
- 26 May
- 01:40 DigitalOcean Incoming DDoS Detected -- prod5.orain.org
- 01:40 DigitalOcean Incoming DDoS Detected -- prod7.orain.org
- 01:40 DigitalOcean Incoming DDoS Detected -- prod8.orain.org
- 01:41 DigitalOcean Incoming DDoS Detected -- prod6.orain.org
- 01:50 DigitalOcean Incoming DDoS Detected -- prod11.orain.org
- 27 May
- 01:11 DigitalOcean Incoming DDoS Detected -- prod10.orain.org
- 28 May
- Addshore: switches Orain DNS to CloudFlare (and waits for propagation)
- 18:04 DigitalOcean Incoming DDoS Detected -- prod10.orain.org
- 29 May
- 02:54 DigitalOcean Incoming DDoS Detected -- prod10.orain.org
- 10:00 Addshore: removes all public IPs from the Ansible playbook
- 10:33 Addshore: powers everything off
- 10:34 Addshore: snapshot prod10
- 10:34 Addshore: snapshot prod7
- 10:34 Addshore: snapshot prod6
- 10:35 Addshore: snapshot prod5
- 10:35 Addshore: snapshot prod8
- 10:35 Addshore: snapshot prod9
- 10:36 Addshore: snapshot prod11
- 10:36 Addshore: snapshot prod12
- 10:50 Addshore: prod6 done (12 minutes 28 seconds)
- 10:51 Addshore: prod8 done (12 minutes 38 seconds)
- 10:51 Addshore: prod9 done (10 minutes 11 seconds)
- 10:51 Addshore: prod10 done (6 minutes 24 seconds)
- 10:51 Addshore: prod11 done (5 minutes 22 seconds)
- 10:54 Addshore: prod12 done (16 minutes 42 seconds)
- 10:55 Addshore: prod5 done (19 minutes 44 seconds)
- 10:56 Addshore: prod7 done (21 minutes 1 second)
- 10:57 Addshore: powers everything off
- 11:03 Addshore: all powered off
- 11:05 Addshore: prod10 create
- 11:08 Addshore: prod11 create
- 11:08 Addshore: prod12 create
- 11:10 Addshore: prod5 create
- 11:10 Addshore: prod6 create
- 11:10 Addshore: prod7 create
- 11:12 Addshore: prod8 create
- 11:12 Addshore: prod9 create
- 11:25 Addshore: pushes a change to Ansible rotating all IPs
- 11:38 Addshore: updates DNS settings
- 11:55 Addshore: all nginx servers are having "Permission denied" issues
- 12:17 Addshore: everything up! :)
- 12:23 Addshore: deletes all old droplets
- Notes from Addshore
- The Ansible run on prod6 was failing
- Mail is not working
- TODO: add a hack to wiki creation so that it also adds a CNAME to CloudFlare
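
As a rough illustration of the CNAME TODO in the notes above, here is a minimal Python sketch of what such a hook might look like. It is not Orain's actual tooling. It assumes CloudFlare's v4 REST API (creating a record by POSTing to the zone's dns_records endpoint) with bearer-token authentication. The function name build_cname_request, its parameters, and the choice of a proxied record are hypothetical and chosen only for illustration. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

# Base URL of CloudFlare's v4 API (assumption: the v4 REST API is used).
API_BASE = "https://api.cloudflare.com/client/v4"


def build_cname_request(zone_id, api_token, subdomain, target,
                        zone_name="orain.org"):
    """Build (but do not send) a request creating a CNAME for a new wiki.

    Hypothetical helper: zone_id, api_token and target must be supplied
    by the caller; none of these values come from the incident report.
    """
    payload = {
        "type": "CNAME",
        "name": f"{subdomain}.{zone_name}",
        "content": target,
        "ttl": 1,         # 1 means "automatic" TTL in CloudFlare's API
        "proxied": True,  # assumption: traffic should go through CloudFlare
    }
    return urllib.request.Request(
        url=f"{API_BASE}/zones/{zone_id}/dns_records",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A wiki creation script could call build_cname_request(...) with the new wiki's subdomain and pass the result to urllib.request.urlopen to actually create the record. Keeping the request construction separate from sending it makes the hook easy to test without touching live DNS.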