Tech:Server admin log: Difference between revisions

== July 19 ==
* '''Addshore''' Fixed problems with Disk space --[[User:Reception123|Reception123]] ([[User talk:Reception123|talk]]) 06:36, 20 July 2015 (BST)
 
 
== July 4 ==
* I fixed everything (see git history) .... '''[[User:Addshore|<span style="color:black">·addshore·</span>]]''' <sup>[[User_talk:Addshore|<span style="color:black;">talk to me!</span>]]</sup> 09:28, 4 July 2015 (BST)
 
== June 30 ==
* ~09:10 Southparkfan: pooled prod9 back in prod with below changes applied. Ansible on both servers disabled. DO NOT run ansible on those servers unless you are 100% sure it won't cause issues.
* 08:54 Southparkfan: disable CSS, OnlineStatus and EmbedVideo on All The Tropes Wiki. Meta (why Meta too?) and All The Tropes are now back online and running without throwing MWExceptions.
 
== June 29 ==
* 14:48 Southparkfan: shutdown & destroy prod11
* 12:21 NDKilla: Not experiencing issues on any of the wikis that reported them. extloadtest still shows frequent errors
* 11:20 NDKilla: Rebuild LC on extloadwiki per SPF
* 10:48 NDKilla: Ran all jobs on metawiki and allthetropeswiki
* 10:38 NDKilla: Investigating DB issues (and hoping I didn't cause them)
* 02:02 GethN7 notifies #orain of a lot of DB issues on allthetropeswiki
 
== June 28 ==
* Late afternoon: Manually ran "sudo /root/ans-all --skip-tags=slow" on prod8, prod9, and prod11
 
== June 16 ==
* 17:16 Southparkfan: "sudo usermod -u 2020 www-scripts" on prod9 and prod11
 
== June 13 ==
* 11:46 Southparkfan: destroyed prod8 for testing
 
== May 14 ==
* 20:49 Southparkfan: DROP DATABASE spamwiki; on prod12 - massive disk space free up :D
* 20:49 Southparkfan: ran php5 /srv/mediawiki/w/maintenance/Orain/removeDeletedWikis.php --wiki loginwiki on prod9
 
== April 28 ==
* 13:09 Southparkfan: pooled prod11 back
* 13:02 Southparkfan: reboot prod11
* 12:51 Southparkfan: depooled prod11 from haproxy
 
== April 4 ==
* ..... Stuff happened, [[Tech:Incidents/2015-04-04-prod7-resize]]
* 20:10 Addshore: Restart prod7
* 19:43 Addshore: prod9 back up and resized
* 19:41 Addshore: resize prod9 to 512mb instance and restart
* 19:36 Southparkfan: removed prod9 from haproxy config (planned for downgrade/re-install as needed)
* 14:51 Addshore: Login issues, Redis down, Restarted (We should really have a watchdog or something check and restart this)
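The watchdog suggested in the 14:51 entry could look roughly like the sketch below. It is only an illustration, not something that was deployed: the host, port, and function names are assumptions, and a Redis instance that requires authentication would answer PING with an error and be treated as down.

```python
import socket
from typing import Callable


def redis_is_alive(host: str = "127.0.0.1", port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if a Redis server answers an inline PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            return conn.recv(16).startswith(b"+PONG")
    except OSError:
        # Connection refused, timed out, or reset: treat the server as down.
        return False


def watchdog_once(is_alive: Callable[[], bool], restart: Callable[[], None]) -> bool:
    """Run one check; call restart() and return True if the service was down."""
    if is_alive():
        return False
    restart()
    return True
```

Run from cron, `watchdog_once(redis_is_alive, restart)` would take a `restart` callable that wraps whatever restart command the host uses; that command is deliberately left out here because it depends on the server setup.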
 
== April 3 ==
* 21:24 Addshore: "pear install net_smtp" on prod9
* 20:35 Addshore: added prod9 back to LB, Cheers SPF!
* 20:24 Addshore: added prod9 back to LB -> it broke stuff -> promptly removed
* 20:20 Addshore: restarted redis-server on prod7 (yes everyone got logged out...)
* 20:16 Addshore: removed prod8 from LB for reboot then added back
* 20:10 Addshore: removed prod11 from LB for reboot then added back
* 19:00 Addshore: removed prod9 from LB and rebuilt (SPFCloud to add everything to prod9 and add back to LB)
 
== April 1 ==
* 13:15 Addshore: got reports users were unable to login. Redis was no longer running on prod7, restarted.
 
== March 26 ==
* 19:53 Southparkfan: noticed great things on prod9 :D
* 19:34 addshore: resize complete, powering back prod9
* 19:27 addshore: shutdown prod9 for resize
 
== March 17 ==
* 15:00 Southparkfan: ran update.php on memewiki
 
== March 16 ==
* 13:09 Southparkfan: HHVM died on prod8 for an unknown reason, causing downtime on the farm - restarted it
 
== March 14 ==
* 15:42 Southparkfan: ran update.php on lovelifesiftwwiki
 
== March 10 ==
* 17:23 Southparkfan: restart HHVM on all servers for HHVM admin password reset
 
== March 6 ==
* 23:47 Southparkfan: enable ansible on prod7
* 20:59 Southparkfan: disable ansible on prod7
* 20:48 Southparkfan: restart ssh on prod7
 
== March 5 ==
* 16:24 Southparkfan: kill'd & restarted HHVM on prod9 and prod11 too. Let's see if performance is improved now.
* 16:21 Southparkfan: disable ansible cron on prod9 and prod11
* 16:06 Southparkfan: start HHVM on prod8
* 16:06 Southparkfan: kill HHVM on prod8
* 15:48 Southparkfan: disable ansible cron on prod8 for HHVM testing
 
== March 4 ==
* 17:53 Southparkfan: (prod7) sudo cp -R /tmp/lovelivesiftw.twgg.org/ /var/mediawiki/uploads/ - all should be fixed now
* 17:52 Southparkfan: (prod7) sudo rm -rf lovelivesiftw.twgg.org lovelivesiftw.orain.org/
* 17:43 Southparkfan: possibly messed up the below commands, so restored directories from backup and tried again
* 17:32 Southparkfan: (prod7) sudo rm -rf http:/lovelivesiftw.twgg.org <- "lovelivesiftw.twgg.org" was a directory inside another directory, "http:"
* 17:30 Southparkfan: (prod7) sudo cp -R lovelivesiftw.twgg.org/ /var/mediawiki/uploads/lovelivesiftw.twgg.org
 
== March 2 ==
* 14:04 Southparkfan: after a bunch of Piwik issues complaining about "mysqli extension (could) not (be) loaded/found" and a thousand restarts, fixed php5-fpm issues
* 13:42 Southparkfan: stop php5-fpm on prod6 (kill -9'd all processes)
 
== February 28 ==
* 13:05 Southparkfan: reload nagios on prod6
 
== February 27 ==
* 23:14 Southparkfan: deleted all jobs from allthetropeswiki's job table
* 23:05 Southparkfan: "delete from job where job_cmd = 'cirrusSearchLinksUpdate';" on prod5
* 22:40 Southparkfan: enable cron again
* 22:29 Southparkfan: disable ansible cron on prod6
* 14:23 Southparkfan: restarted redis
 
== February 25 ==
* 19:43 Southparkfan: ran some apt-get commands on prod7 to get some more disk space
* 19:28 Southparkfan: for now, I did it for trialsintaintedspacewiki, spiralwiki and metawiki too on prod9. The servers should be able to survive a few days/weeks now with the broken logrotate.
* 19:21 Southparkfan: cleaned some diskspace (5.5GB) by compressing files of some wikis on prod11 too. Wikis: (incomplete list) trialsintaintedspacewiki, rightwiki, corruptionsofchampionswiki, loginwiki, metawiki, allthetropeswiki
* 19:06 Southparkfan: below has been done for loginwiki and allthetropeswiki too on prod9. Looks all good, moving on to prod11 for now
* 19:00 Southparkfan: compressed corruptionsofchampions.orain.org.log manually (to corruptionsofchampionswiki.gz), and then deleted the .log file
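The manual compress-then-delete step from the 19:00 entry can be sketched as a small helper. This is a hypothetical illustration: it names the archive by appending ".gz" rather than using the renamed file from the log entry, and it assumes nothing is still writing to the log while it is copied.

```python
import gzip
import os
import shutil


def compress_log(log_path: str) -> str:
    """Gzip log_path to log_path + '.gz', delete the original, return the new path."""
    gz_path = log_path + ".gz"
    with open(log_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Only reached if the copy finished without raising.
    os.remove(log_path)
    return gz_path
```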
 
== February 24 ==
* 20:30 Southparkfan: enable ansible cron again on prod6. php5-fpm will now replace mod_php forever, and this will make Piwik twice as fast!
* 20:07 Southparkfan: restarted php5-fpm and apache2 on prod6. Apache will now serve stuff via php5-fpm!
* 20:04 Southparkfan: disable ansible on prod6
* 11:38 Southparkfan: enable ansible again for now. Will test apache stuff at another moment.
* 11:34 Southparkfan: disable ansible temporarily on prod6 (apache testing)
* 11:29 Southparkfan: install python-mwclient.deb (and python-support) on prod6 for orainLog
 
== February 23 ==
* 23:38 Southparkfan: completed a full security upgrade of all packages on all servers (duration: more than one hour)
* 16:43 Southparkfan: killed all LC rebuild processes on prod9 (but at least prod9 is up again!)
* 16:40 Dusti: reboot prod9
* 12:59 Southparkfan: stop apache2 on prod12
 
== February 22 ==
* 12:48 Southparkfan: below on prod11 too
* 12:46 Southparkfan: cd /var/log/mediawiki/ && sudo rm -f spam.orain.org* on prod9
 
== February 21 ==
* 16:18 Southparkfan: (prod10) sudo service cron restart
* 16:17 Southparkfan: change root user password on prod10 again
* 15:33 Southparkfan: change password of root user on prod10
* 15:12 Southparkfan: sudo service cron restart on prod10
* 15:12 Southparkfan: sudo service cron start on prod10
 
== February 19 ==
* 14:12 Southparkfan: (prod8, prod9, prod11) cd /var/log/mediawiki/ && sudo rm -f spam.orain.org*
 
== February 16 ==
* 06:39 Southparkfan: restart HHVM on prod11
 
== February 15 ==
* 16:45 Southparkfan: ran below again
* 16:38 Southparkfan: (prod9, prod11 - /var/log/mediawiki/) sudo rm -f *1.gz - logrotate is having trouble with these files ending with "1.gz" (for an unknown reason these files are not compressed log files, just empty files)
* 09:28 Southparkfan: re-installed php5-gd package on prod6
* 09:06 Southparkfan: restart HHVM on prod9 (it died for a still unknown reason)
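For the empty "1.gz" files mentioned in the 16:38 entry, a narrower alternative to removing everything that matches the pattern is to list only the zero-byte files first. The helper below is a sketch with an assumed function name; it only reports candidates and deletes nothing.

```python
import os
from typing import List


def empty_rotated_files(log_dir: str, suffix: str = "1.gz") -> List[str]:
    """List zero-byte regular files in log_dir whose names end with suffix."""
    matches = []
    for entry in os.scandir(log_dir):
        if entry.is_file(follow_symlinks=False) and entry.name.endswith(suffix):
            if entry.stat(follow_symlinks=False).st_size == 0:
                matches.append(entry.path)
    return sorted(matches)
```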
 
== February 14 ==
* 19:09 Southparkfan: (prod8, prod9, prod11) cd /var/log/mediawiki/ && sudo rm -f spam.orain.org*
* 15:09 Southparkfan: remove unnecessary packages on prod9, cleans up another 300MB.
* 14:50 Southparkfan: forced a logrotate run on prod9 again (all logs)
* 14:46 Southparkfan: forced a logrotate run on prod9 (mediawiki logs only)
 
== February 13 ==
* 14:34 Southparkfan: forced a logrotate run on prod11 too per the same reason as below. It seems it partially failed too, but still freed up ~100MB disk space.
* 14:14 Southparkfan: forced a logrotate run on prod9 due to a critical amount of disk space left (<150 MB). It seems it partially failed, but it at least freed up something like 400MB disk space or so. Finding out now how to make the run succeed and compress even more log files.
 
== February 11 ==
* 14:22 Addshore: fixed ansible run on prod7 due to conflict of user ids with 'git' user id 2003: Ran the following:
<pre>
usermod -u 2103 git
groupmod -g 2103 git
find / -user 2003 -exec chown -h 2103 {} \;
find / -group 2005 -exec chgrp -h 2103 {} \;
usermod -g 2103 git
</pre>
 
* 14:15 Addshore: remove and re clone private repo on prod12 (fixes ansible run)
 
== February 9 ==
* 12:30 Addshore: reloading haproxy on prod10
 
== February 8 ==
* 13:57 Southparkfan: ran changePassword.php on techwiki for OrainLog
 
== February 7 ==
* 15:15 Southparkfan: deleted "zacharydubois" on prod6 per request
* Dusti upgraded GitHub to the Silver plan which includes private repos. SPF working on moving prod7 to a private git repo.
 
== February 6 ==
* 17:01 Southparkfan: upgraded packages with security fixes
 
== February 5 ==
Addshore: on prod6! @ 5:00 GMT / 00:30hrs EST
 
Killed all processes for users noreply and jasper and altered their user IDs to fix the ansible run. Ran the following:
 
<pre>
usermod -u 2101 noreply
groupmod -g 2101 noreply
find / -user 2006 -exec chown -h 2101 {} \;
find / -group 2008 -exec chgrp -h 2101 {} \;
usermod -g 2101 noreply
</pre>
<pre>
usermod -u 2102 jasper
groupmod -g 2102 jasper
find / -user 2007 -exec chown -h 2102 {} \;
find / -group 2009 -exec chgrp -h 2102 {} \;
usermod -g 2102 jasper
</pre>
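Before renumbering IDs as in the blocks above, it can help to preview which paths still carry the old uid or gid. The following is a read-only sketch with an assumed helper name; it mirrors the `find -user` / `find -group` checks but changes nothing, and unlike `find` it does not test the starting directory itself.

```python
import os
from typing import Iterator, Optional


def owned_by(root: str, uid: Optional[int] = None, gid: Optional[int] = None) -> Iterator[str]:
    """Yield paths under root whose owner uid or group gid matches (symlinks not followed)."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # path vanished or is unreadable
            if (uid is not None and st.st_uid == uid) or (gid is not None and st.st_gid == gid):
                yield path
```

For example, `list(owned_by("/srv", uid=2006))` would list what a later `chown -h` pass is expected to touch under that directory.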
 
== January 25 ==
* 17:14 Southparkfan: upgraded packages with security fixes (again)
 
== January 23 ==
* 22:30 Tanner: Migrated DNS to CloudFlare for stability.
* 18:49 Southparkfan: installed security updates across the servers
 
== January 22 ==
* 23:39 Addshore - manually add technoratimedia_sv_115e9.txt file to the root mediawiki directory for Dusti, No point in this being in ansible, it can be removed / vanish whenever...
 
== January 21 ==
* 15:00 Addshore - Killed udp.py script on prod6 that was pointing at JDnet
** Manually copied script to /home/addshore/udp.py for testing (Not in ansible....) - seems to work fine and will run in screen
 
== January 20 ==
* Sometime - Addshore: Manually patched rebuildtextindex in a secret place and ran across ALL wikis in a Screen on some prod. Run successful and all indexes rebuilt.
* 14:10 Southparkfan: ran rebuildtextindex.php on metawiki again (with php instead of php5)
* 13:45 Southparkfan: ran rebuildtextindex.php on metawiki
 
== January 17 ==
* 17:47 Southparkfan: changed Southparkfan2's password with changePassword.php (my account of which I forgot the password, and no email was set on the account).
* 00:25 JohnLewis: prod9 has been running at 100% CPU since December 8th. Missing from ganglia. Hard reboot and investigating.
 
== January 9 ==
* 18:42 JohnLewis: update.php on donjonwiki for BF tables
* 18:41 Southparkfan: ran update.php again on donjonwiki to fix dberrors (run conflict with John but k)
* 16:16 Southparkfan: ran update.php on donjonwiki
 
== January 8 ==
* 16:30 Southparkfan: ran importImages.php again on donjonwiki (a few files had bad filenames, and now still a few have....)
* 15:52 Southparkfan: ran importImages.php on donjonwiki
 
== January 4 ==
* 18:24 Southparkfan: ran importDump.php on donjonwiki
 
== December 30 ==
* 15:17 Southparkfan: ran importImages.php on robloxclanswiki
 
== December 29 ==
* 21:13 JohnLewis: delete councilwiki
 
== December 22 ==
* 22:21 JohnLewis: destroy prod3
* 22:19 JohnLewis: push prod3 decom changes and pool prod12 in its place
* 17:32 JohnLewis: deleted 5 wikis from prod3 and cleared respective tables in CA and loginwiki.
* 09:05 JohnLewis: boot prod3 after uninitiated power down. Investigating.
 
== December 19 ==
* 22:20 JohnLewis: passwords have been migrated
* 21:05 JohnLewis: begin password type migration (pbkdf2-legacyB)
* 20:58 JohnLewis: prod8 and prod11 are now running MW1.24. prod9 is still depooled pending finalising the update. Passwords need to be wrapped (will do shortly)
 
== December 5 ==
* 21:10 JohnLewis: MariaDB [(none)]> drop database dalieuwiki;
 
== November 20 ==
* 15:30 Arcane: ran a database/ansible update.
 
== November 14 ==
* 15:43 JohnLewis: MariaDB [(none)]> drop database Powersystemswiki;
 
== October 29 ==
* 20:13 JohnLewis: update hhvm
 
== October 27 ==
* 16:52 JohnLewis: drop database esourcewnywiki; (per technical reasons)
 
== October 15 ==
* 20:02 JohnLewis: shutdown prod4 (planned for reinstall tomorrow)
* 19:50 JohnLewis: remove prod4 from lb and purge DNS on ns1 and ns2
* 17:27 JohnLewis: powercycle prod4 (was not responding to anything)
 
== October 11 ==
* 19:47 JohnLewis: switch ns2.orain.org to prod7 (dns cache)
* 18:46 JohnLewis: deleted wikis in the list [[m:Special:Diff/10079|here]] from prod3.
* 18:43 JohnLewis: php5 /srv/mediawiki/w/maintenance/Orain/removeDeletedWikis.php --wiki loginwiki
* 18:00 JohnLewis: security updates for MediaWiki
* 16:40 JohnLewis: DNS change confirmed to have propagated to myself
* ~14:00 JohnLewis: change orain.org's DNS to ns1.orain.org/ns2.orain.org from pam.ns.cloudflare.com/woz.ns.cloudflare.com
 
== October 5 ==
* 18:10 JohnLewis: php5 /srv/mediawiki/w/maintenance/importDump.php --wiki classwiki /home/johnflewis/backup.xml
 
== October 3 ==
* 18:22 JohnLewis: applied security fixes; email going to sysadmin shortly regarding this.
 
== September 27 ==
* 17:48 JohnLewis: prod6 seems good. Moving onto prod7
* 16:14 JohnLewis: lb.orain.org changed to prod8 (HHVM) for migration over to HHVM and Ubuntu
 
== September 19 ==
* 19:46 JohnLewis: changed prod8's kernel to 3.13.0-32-generic (from 3.13.0-35-generic)
* 17:36 JohnLewis: prod8 is now our first Ubuntu machine
* 17:32 JohnLewis: begin upgrading prod8 to Ubuntu 14.04 (Trusty)
* 17:30 JohnLewis: shutdown prod8
 
== September 12 ==
* 16:37 JohnLewis: DELETE from page WHERE page_id = "409341"; (prod5; allthetropeswiki)
 
== September 10 ==
* 16:28 JohnLewis: root@prod7:/var/mediawiki/uploads/common/skins/foreground/assets# mv font/ fonts/
* 16:09 JohnLewis: usermod -u 1003 www-scripts on prod8
* 15:28 JohnLewis: removed the below
* 15:22 JohnLewis: added 'www-scripts ALL=(ALL) NOPASSWD:ALL' to /etc/sudoers
* 15:21 JohnLewis: chmod 0777 /var/mediawiki/private /var/mediawiki/uploads
 
== September 9 ==
* 15:10 JohnLewis: ufw delete allow 5070 on prod6
 
== September 7 ==
* 11:05 php5-redis wanted to be installed so "dpkg -i /root/debs/php5-redis.deb". This is why our repo for debs / deb locations should be put in dpkg properly
* 11:03 Addshore - ran dpkg --configure -a on prod4 to try to get ansible working
 
== August 27 ==
* Migration done? - Been done for a few days now.
 
== August 20 ==
* 14:01 JohnLewis: disable ansible on all servers
* 14:00 JohnLewis: Migration start!
 
== August 17 ==
* 18:08 JohnLewis: php createLocalAccount.php --wiki airwiki --username Dxing97
* 18:07 JohnLewis: php migrateAccount.php --wiki loginwiki --username Dxing97
 
 
== August 16 ==
* 13:30 JohnLewis: use allthetropeswiki; DELETE from watchlist WHERE wl_user = '21'; SELECT * from watchlist WHERE wl_user = '21';
 
== August 2 ==
* 00:32 JohnLewis: MariaDB [(none)]> drop database aerowikiwiki;
 
== July 29 ==
* 17:20 JohnLewis: /usr/lib/mailman/bin/withlist -l -r fix_url mailman,allthetropes,meta-admin --urlhost=lists.orain.org
 
== July 28 ==
* 01:11 JohnLewis: (prod4 and prod5) ip6tables -I INPUT 1 -p tcp --dport 443 -j ACCEPT
* 01:10 JohnLewis: (prod4 and prod5) ip6tables -I INPUT 1 -p tcp --dport 80 -j ACCEPT
 
== July 26 ==
* 13:26 JohnLewis: chown -R www-data:www-data private/mediawiki/
* 13:24 JohnLewis: mv static.orain.org/jasperinternal.orain.org/* private/mediawiki/jasperinternal.orain.org/
* 13:20 JohnLewis: mkdir /usr/share/nginx/private/mediawiki/jasperinternal.orain.org/
 
== July 23 ==
* 22:53 JohnLewis: reboot prod3 - MySQL health and load is fluctuating massively
 
== July 22 ==
* 19:30 JohnLewis: gave SELECT to archive, revision, user and recentchanges tables on all PUBLIC WIKIS for user 'useranalysis' on prod3. Account used by Cyberpower678 for his useranalysis tool. +1 for Orain-Community relations!
 
== July 21 ==
* 15:42 JohnLewis: restart memcached (causing MediaWiki exceptions)
 
== July 16 ==
* 20:00 JohnLewis: install [[prod5]]'s basic requirements.
* 18:55 JohnLewis: reboot prod4; irregular issues occurring
* 16:05 JohnLewis: revert DNS back after fixing necessary issues relating to DNS
* 15:20 JohnLewis: changed DNS for Orain directly to prod4 - broke all non-prod4 services
 
== July 14 ==
* 21:09 JohnLewis: new SSL cert is installed and confirmed to be functioning correctly
 
== July 13 ==
* 21:38 Addshore: Everything back up
 
== July 8 ==
* 14:43 JohnLewis: mysql -p -e "drop database Techwritewiki"
 
== July 7 ==
* 14:09 JohnLewis: speed seems to have improved. Need to further monitor the SQL downtimes however.
* 14:08 JohnLewis: reboot prod3
* 14:00 JohnLewis: prod3 is not responding to shell commands; matches downtimes with the farm
 
== July 5 ==
* 17:00 JohnLewis: php createLocalAccount.php --wiki detectiveconanwiki --username KidProdigy
* 16:58 JohnLewis: php migrateAccount.php --wiki metawiki --auto --homewiki metawiki --username KidProdigy
* 13:45 JohnLewis: php maintenance/runJobs.php --wiki allthetropeswiki
* 11:51 JohnLewis: mysql -p -e "drop database Revitestwiki"
 
== June 28 ==
* 09:56 JohnLewis: confirmed security fix
* 09:55 JohnLewis: reboot prod4 to force restart
* 09:54 JohnLewis: manually patch OpenSSL to the latest release to fix a security issue (again)
 
== June 20 ==
* 22:17 JohnLewis: updated php5-fpm
* 22:10 JohnLewis: updated OpenSSL to a security fix release
* 00:10 Addshore: Started a run of rebuildFileCache.php for ATT wiki in a SCREEN on prod4 to fix pages once CSS extension has been re enabled
 
== June 19 ==
* 23:45 Addshore: Reenabled ansible on prod4
* 23:30 Addshore: Ran update.php on all wikis
 
== June 18 ==
* 22:24 Addshore: Ran update.php on all wikis
 
== June 15 ==
* 19:41 JohnLewis: mkdir OrainHacks; add a basic extension file and a .magic. file with LQT magicwords in. php rebuildLocalisationCache.php --force --wiki extloadwiki. Happy days! Now need to do it for the other 10 extensions disabled.
 
== June 14 ==
* 19:45 Addshore: metawiki up, running the get db list script
* 19:44 Addshore: DBlist is corrupt, replacing with "metawiki|meta|"
* 19:35 Addshore: Removed Popups extension from mediawiki and reenabled ansible cron
* 19:24 Addshore: all sites getting DB errors
 
== June 11 ==
* 16:48 JohnLewis: disabled ansible to prevent ansible running while I do stuff (staggered committing)
 
== June 10 ==
* 11:10 JohnLewis: added new .log files and rearranged the logging structure
 
== June 9 ==
* 16:21 JohnLewis: upgrade spamassassin on prod1
* 16:03 JohnLewis: php update.php --wiki jossewiki --quick
 
== June 8 ==
* 0:00 JohnLewis: php deleteArchivedRevisions.php --wiki allthetropeswiki --delete
 
== June 7 ==
* 21:36 JohnLewis: re-enable ansible
* 21:30 JohnLewis: ran update.php on all wikis for MW 1.23 update
* 19:35 JohnLewis: disabled ansible ('''for safety''')
* 16:08 JohnLewis: restarted memcached to clean up stuff
* 16:00 JohnLewis: renamed 'spacetimewiki' database to 'timespacewiki'
 
== June 3 ==
* 17:57 JohnLewis: purge torblock's node index
 
== June 1 ==
* 15:34 JohnLewis: force password reset for "Stef99"
* 15:33 JohnLewis: restart memcached
 
== May 31 ==
* 12:46 addshore: ran update.php on ALL wikis
* 12:43 addshore: updating to MW 1.22.7
 
== May 30 ==
* 18:50 JohnLewis: remove 'notice' for CreateWiki on GitHub
* 12:30 JohnLewis: ran ansible on prod4 to catch new nginx rules
* 12:29 JohnLewis: change ufw rules on prod1 for mail
* 01:02 JohnLewis: ufw allow 9300 and ufw allow 9200
* 01:02 JohnLewis: playing tennis for elasticsearch on prod1. restarting it a bit.
* 00:50 JohnLewis: remove elasticsearch from prod1
* 00:25 JohnLewis: massive reduce in disk space :D
* 00:24 addshore: on prod4 rm /root/old
 
== May 29 ==
* 22:53 JohnLewis: ran ansible on prod1; needed to get the port rule in
* 22:17 JohnLewis: del
* 22:17 JohnLewis: restarted nginx
* 22:06 addshore: rebooting prod1
* 18:44 JohnLewis: restarted nagios3 on prod1
* 18:30 addshore: ansible successfully runs on prod1 now, adding to cron
* 18:25 JohnLewis: prod1: nagios3 -v *
* 18:02 addshore: update ansible to 1.6.2 on prod1
* 18:01 addshore: update ansible to 1.6.2 on prod3
* 18:01 JohnLewis: Removed 'notice' from OrainMessages calls from GitHub
* 17:58 addshore: update ansible to 1.6.2 on prod4
* 17:57 addshore: orainLog back up...
* 13:00 - 17:00 - Addshore - Poking prod4 and ansible. Prod4 now again has ansible on a cronjob. There were multiple short downtimes during this time due to the poking of ufw (the firewall), but this was for the greater good!!!
 
== May 24 ==
* 12:51 - JohnLewis - service php5-fpm restart
* 12:46 - pingdom reports site down
* 12:06 - JohnLewis - rename verkeerswiki to verkeerwiki. A bunch of SQL stuff.
 
== May 23 ==
* 20:06 - JohnLewis - php createLocalAccount.php --wiki=espiralarchivowiki John
 
== May 14 ==
* 16:07 - JohnLewis - php createLocalAccount.php --wiki=onepiecewiki Bocaniko
* 16:01 - JohnLewis - clear apc cache
 
== May 12 ==
* Recent downtime was caused by prod4 being suspended by the host, this is resolved.
 
== May 11 ==
* 19:00 - JohnLewis: php reassignEdits.php --wiki allthetropeswiki 300154507a A300154507
 
== May 9 ==
* 13:15 - addshore: added values for duplicity and AWS to prod3 vars
* 13:15 - addshore: added AWS_BACKUPS_ACCESS_KEY_ID to prod3 vars.yml
 
== April 15 ==
* prod2 died, migrated to a new user and the set up was pretty much so hacky nothing worked really. Kudu knows more about that than me.
* A key server file became corrupted and the server crashed. That accounts for around 24 hours, then we moved to a new server and had to deal with a hacky set up which accounts for the other 40 ish hours downtime.
* This is kinda bad to say this downtime happened while we were still looking at the old downtime.. so :/
* At least we know *why* this one occurred.
 
== April 14 ==
* 16:37 Addshore: prod4 - Killed db loop scripts running i18n cache updates
* 16:42 Addshore: prod4 - Updating i18n cache for metawiki and extloadwiki (this is all that is ever needed as extload has everything loaded and i18n cache is shared)
* 16:45 JohnLewis: Reboot prod4
* 16:50 Addshore: prod4 - Updating i18n cache for metawiki and extloadwiki (in a SCREEN)
* 16:51 JohnLewis: root@prod4:/# /etc/init.d/apache2 stop
* 16:51 JohnLewis: root@prod4:/# /etc/init.d/nginx start
* 17:17 Addshore: Remove JohnLewis IP from deny hosts file for sshd again on prod3
 
== April 9 ==
* 22:30 JohnLewis: re enabled ansible cron
* 16:23 JohnLewis: disabled ansible cron (doing live work on prod2 for ATTwiki). I'll post a note when I'm done.
 
== April 6 ==
* 13:16 JohnLewis: run update.php on dangsunsnwiki and cheer
* 13:08 JohnLewis: eval.php some more stuff into my dangsunsnwiki account...
* 13:06 JohnLewis: eval.php an email into my dangsunsnwiki account
* 13:03 JohnLewis: get annoyed about things
* 12:57 JohnLewis: rename buswiki -> dangsunsnwiki
 
== April 5 ==
* 9:10 JohnLewis: prod2 nginx killed and restarted, i18n cache reloaded
 
== April 4 ==
* 13:00 pingdom reports orain down
 
== April 3 ==
* 17:10 JohnLewis: dropped centralnoticetestwiki database as all worked - not needed now
* 17:06 JohnLewis: manually ran ansible
* 16:41 JohnLewis: ran update.php on all wikis
 
== April 1 ==
* 19:14 JohnLewis: restarted nginx (not an April fools)
 
== March 30 ==
* 16:41 JohnLewis: manually ran ansible because Joe is right about my stupidity sometimes
* 16:19 JohnLewis: drop temp database (used to fix some issues with importing)
* 16:18 JohnLewis: run update.php on archivoespiral and metawiki
* 16:15 JohnLewis: do a bunch of SQL stuff on prod3 to get archivoespiral working
 
== March 29 ==
* 19:46 Addshore: Remove JL IP from prod2 deny hosts file
 
== March 28 ==
* 21:30 Addshore: Remove JL IP from prod2 deny hosts file
 
== March 18 ==
* 20:27 JohnLewis: re enabled ansible cron on prod2
* 19:47 JohnLewis: disabled ansible cron on prod2
 
== March 15 ==
* 22:47 kudu: Run fixDoubleRedirects.php on ATT
 
== March 12 ==
* 17:35 JohnLewis: populated interwiki table on some databases
 
== March 8 ==
* 14:32 JohnLewis: changed some centralauth database entries to suit wiki move
 
== March 7 ==
* 22:40 JohnLewis: dumped allthetropeswiki for Arcane
* 16:53 JohnLewis: manually updated ansible
* 16:51 JohnLewis: renamed database trainwiki to reviwiki
 
== March 4 ==
* 20:46 JohnLewis: ran CentralAuth's createLocalAccount.php for myself on a few wikis to fix things
 
== March 2 ==
* 01:25 JohnLewis: ran update.php on all wikis
* 00:54 JohnLewis: manually ran ansible again
* 00:47 JohnLewis: manually updated ansible (debugging - yay)
 
== March 1 ==
* 23:58 JohnLewis: manually ran ansible and update.php on jdwiki
 
== February 26 ==
* 21:53 JohnLewis: ditto on metawiki
* 21:52 JohnLewis: ran update on jh67wiki
 
== February 25 ==
* 03:32 kudu: Ran fixDoubleRedirects.php on ATT
 
== February 21 ==
* 23:54 addshore: ran update.php for pmr2014wiki
* 23:48 addshore: prod2 uninstalled dvipng texlive-latex-base etc. cjk-latex
* 23:30 addshore: .... all of which we have and work.... GAH!
* 23:30 addshore: for the record it is stuck on.. Failed to parse(PNG conversion failed; check for correct installation of latex and dvipng (or dvips + gs + convert))
* 23:29 addshore: apt-get installed dvipng texlive-latex-base texlive-latex-extra tex-live-recommended cjk-latex while trying to fix Math, no success
* 22:36 addshore: chmod and chown Math extension .. we should have this all pulled as www-data
* 22:22 addshore: prod2 ran make in /usr/share/nginx/.orain.org/w/extensions/Math/math
* 22:07 addshore: ran update.php on extload
* 21:56 addshore: reenable prod2 ansible cron
* 21:14 addshore: disabling ansible on prod2
* 20:14 addshore: ran i18n cache rebuild
* 20:14 JohnFLewis: rebooted prod2
* 15:20 JohnLewis: manually update ansible
* 10:34 addshore: ran update.php on all wikis
 
== February 20 ==
* 21:57 JohnLewis: ran update.php on jdwiki
* 21:11 addshore: reenabling ansible pull cron on prod2 after resolving issue 173
* 20:06 addshore: comment out ansible pull from prod2 cron while I manually poke collection extension
* 16:25 JohnLewis: Info: Mail is fully working with a final dovecot restart!
* 16:20 JohnLewis: changed dovecot config and then restarted x3 (issues first two times)
* 15:32 JohnLewis: restarted dovecot on prod1
 
== February 19 ==
* 22:40 addshore: prod2 on mediawiki submodules ran git submodule foreach --recursive git config core.fileMode false - this also solves the dirty Elastica folder
* 22:33 addshore: prod2 on mediawiki submodules ran git submodule foreach git config core.fileMode false
* 22:26 addshore: prod2 chown www-data:www-data /w/extensions/*
* 22:19 addshore: simplified the two ansible cronjobs on prod2
* 22:13 addshore: rm /root/ans on prod2, this file is wrong!
* 00:30 kudu: Ran rebuildtextindex.php on all wikis
 
== February 13 ==
* 16:23 addshore: i18n cache broke, tried rebuilding off extload but the script wouldn't run, ran off metawiki first then off extloadwiki and everything returned to normal. The question remains why did the cache break in the first place and why could we not rebuild from extload wiki in the first place?