Tech:Incidents/2015-05-ddos


Orain suffered 9 days of persistent issues and downtime for most users due to a UDP DDoS attack, which caused DigitalOcean to null-route our hosts.

(Times are UTC+0)

Timeline

Note: smaller issues had been reported before this incident.

Note: for some reason, DigitalOcean's automated emails were not sent during the first stages of the attack.

  • 20 May
  • 06:15 Nagios: sends out its last alerts
  • 06:56 Southparkfan: discovers Orain is down and mails staff at orain.org to notify them of the downtime
  • 07:06 Southparkfan: realizes that prod6 seems to be down, sends mail to Dusti's and Addshore's personal email addresses
  • 10:15 Addshore: mails back and says Orain is up. It is unknown whether Addshore accessed Orain over IPv6 or IPv4.
  • 10:25 Addshore: mails a picture of the prod10 graphs, including a graph of inbound and outbound private and public traffic. Public inbound traffic shows spikes, with one reaching 800mb/s
  • 17:54 DigitalOcean Incoming DDoS Detected -- prod5.orain.org
  • 17:54 DigitalOcean Incoming DDoS Detected -- prod6.orain.org
  • 17:54 DigitalOcean Incoming DDoS Detected -- prod8.orain.org
  • 19:57 Southparkfan: FastLizard4 reports that Orain is accessible over IPv6, but not over IPv4.
  • 19:58 Southparkfan: tries to SSH into prod10 using either prod10.orain.org or its public IP, but neither works. SSH'ing into prod10 using prod8 as a proxy works, though.
  • 22:33 Southparkfan: proposes to FastLizard4 (since he was able to do things, and Southparkfan wasn't) to redirect *.orain.org to prod13-temp.orain.org
  • 22:40 Southparkfan: notes that the above setup would break IPv6 support and proposes to revert the whole DNS repo to b83d1ca08fe6bc728427d061b040d0245078e031
  • 22:45 FastLizard4: tries to push commits to the DNS repo, but is blocked by permission errors.
  • 21 May
  • 06:46 Southparkfan: reverts /config to b83d1ca08fe6bc728427d061b040d0245078e031
  • 06:46 Southparkfan: grants operations full access to the DNS repo (it should already have had it, but adds the group under 'Collaborators' too)
  • 07:01 Southparkfan: All The Tropes and TestWiki are both confirmed back online, Meta is still down
  • 10:44 FastLizard4: confirms Orain is up
  • 14:36 Southparkfan: confirms Orain is up
  • 16:37 DigitalOcean Incoming DDoS Detected -- prod8.orain.org
  • 16:37 DigitalOcean Incoming DDoS Detected -- prod9.orain.org
  • 17:24 DigitalOcean Incoming DDoS Detected -- prod7.orain.org
  • 17:24 DigitalOcean Incoming DDoS Detected -- prod11.orain.org
  • 17:24 DigitalOcean Incoming DDoS Detected -- prod10.orain.org
  • 17:25 DigitalOcean Incoming DDoS Detected -- prod12.orain.org
  • 22 May
  • 01:39 DigitalOcean Incoming DDoS Detected -- prod11.orain.org
  • 23 May
  • 24 May
  • 19:58 DigitalOcean Incoming DDoS Detected -- prod10.orain.org
  • 25 May
  • 20:30 DigitalOcean Incoming DDoS Detected -- prod10.orain.org
  • 23:41 DigitalOcean Incoming DDoS Detected -- prod10.orain.org
  • 26 May
  • 01:40 DigitalOcean Incoming DDoS Detected -- prod5.orain.org
  • 01:40 DigitalOcean Incoming DDoS Detected -- prod7.orain.org
  • 01:40 DigitalOcean Incoming DDoS Detected -- prod8.orain.org
  • 01:41 DigitalOcean Incoming DDoS Detected -- prod6.orain.org
  • 01:50 DigitalOcean Incoming DDoS Detected -- prod11.orain.org
  • 27 May
  • 01:11 DigitalOcean Incoming DDoS Detected -- prod10.orain.org
  • 28 May
  • Addshore: switches Orain DNS to CloudFlare (and waits for propagation)
  • 18:04 DigitalOcean Incoming DDoS Detected -- prod10.orain.org
  • 10:34 - Addshore snapshot prod10
  • 10:34 - Addshore snapshot prod7
  • 10:34 - Addshore snapshot prod6
  • 10:35 - Addshore snapshot prod5
  • 10:35 - Addshore snapshot prod8
  • 10:35 - Addshore snapshot prod9
  • 10:36 - Addshore snapshot prod11
  • 10:36 - Addshore snapshot prod12
  • 10:50 - Addshore prod6 done 12 minutes 28 seconds
  • 10:51 - Addshore prod8 done 12 minutes 38 seconds
  • 10:51 - Addshore prod9 done 10 minutes 11 seconds
  • 10:51 - Addshore prod10 done 6 minutes 24 seconds
  • 10:51 - Addshore prod11 done 5 minutes 22 seconds
  • 10:54 - Addshore prod12 done 16 minutes 42 seconds
  • 10:55 - Addshore prod5 done 19 minutes 44 seconds
  • 10:56 - Addshore prod7 done 21 minutes 1 second
  • 10:57 - Addshore Power everything off
  • 11:03 - Addshore All powered off
  • 11:05 - Addshore prod10 create
  • 11:08 - Addshore prod11 create
  • 11:08 - Addshore prod12 create
  • 11:10 - Addshore prod5 create
  • 11:10 - Addshore prod6 create
  • 11:10 - Addshore prod7 create
  • 11:12 - Addshore prod8 create
  • 11:12 - Addshore prod9 create
  • 11:25 - Addshore Push change to Ansible rotating all IPs
  • 11:38 - Addshore updating DNS settings
  • 11:55 - Addshore all nginx servers having Permission denied issues
  • 12:17 - Addshore everything up! :)
  • 12:23 - Addshore Delete all old droplets
  • Notes from Addshore
  • prod6 Ansible run was failing
  • mail is not working
  • TODO: add hack to create wiki so that it adds a CNAME to CloudFlare
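
The CNAME TODO in the notes above could be approached roughly as follows. This is only a sketch, assuming the current Cloudflare v4 API's DNS records endpoint with bearer-token authentication. The function name `build_cname_request`, the zone ID, the token, and the CNAME target are all hypothetical placeholders, not Orain's actual setup. The request is built but deliberately not sent.

```python
import json
import urllib.request

# Base URL of the Cloudflare v4 API (assumption: this sketch targets the
# current API, not whatever was available at the time of the incident).
API_BASE = "https://api.cloudflare.com/client/v4"


def build_cname_request(zone_id, token, subdomain, target):
    """Build (but do not send) a request that would create a CNAME record.

    zone_id, token, subdomain and target are placeholders supplied by the
    caller; none of them come from the incident report.
    """
    payload = {
        "type": "CNAME",
        "name": subdomain,   # e.g. the new wiki's hostname
        "content": target,   # hostname the CNAME should point at
        "ttl": 1,            # 1 means "automatic" in the Cloudflare API
        "proxied": True,     # route traffic through Cloudflare
    }
    return urllib.request.Request(
        url=f"{API_BASE}/zones/{zone_id}/dns_records",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A wiki-creation hook could call `urllib.request.urlopen()` on the returned request and check the JSON response for success before announcing the new wiki.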

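Several timeline entries (10:15 and 19:57 on 20 May) turn on whether Orain was reachable over IPv4, IPv6, or both. A check like the one below could remove that ambiguity in future reports. It is a minimal sketch: the function name `reachable_by_family` and the default port are illustrative assumptions, and it only tests whether a TCP connection can be opened, not whether the wiki actually serves pages.

```python
import socket


def reachable_by_family(host, port=443, timeout=5.0):
    """Return {"IPv4": bool, "IPv6": bool} for TCP connects to host:port.

    The default port is an assumption; pass whichever port the service uses.
    """
    results = {}
    for label, family in (("IPv4", socket.AF_INET), ("IPv6", socket.AF_INET6)):
        results[label] = False
        try:
            # Resolve only addresses of the address family being tested.
            infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
        except socket.gaierror:
            continue  # host has no address of this family
        for fam, socktype, proto, _canon, sockaddr in infos:
            try:
                with socket.socket(fam, socktype, proto) as sock:
                    sock.settimeout(timeout)
                    sock.connect(sockaddr)
                results[label] = True
                break  # one successful connect is enough
            except OSError:
                continue  # try the next resolved address, if any
    return results
```

Recording the result for each address family alongside an "Orain is up" report makes it clear which path was actually tested.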

Links