Tech:Server admin log

From Orain Meta

July 19

  • Addshore: fixed problems with disk space --Reception123 (talk) 06:36, 20 July 2015 (BST)


July 4

June 30

  • ~09:10 Southparkfan: pooled prod9 back in prod with below changes applied. Ansible on both servers disabled. DO NOT run ansible on those servers unless you are 100% sure it won't cause issues.
  • 08:54 Southparkfan: disable CSS, OnlineStatus and EmbedVideo on All The Tropes Wiki. Meta (why Meta too?) and All The Tropes are now back online and running without throwing MWExceptions.

June 29

  • 14:48 Southparkfan: shutdown & destroy prod11
  • 12:21 NDKilla: Not experiencing issues on any of the wikis that reported issues. extloadtest still shows frequent errors
  • 11:20 NDKilla: Rebuild LC on extloadwiki per SPF
  • 10:48 NDKilla: Ran all jobs on metawiki and allthetropeswiki
  • 10:38 NDKilla: Investigating DB issues (and hoping I didn't cause them)
  • 02:02 GethN7 notifies #orain of a lot of DB issues on allthetropeswiki

June 28

  • Late afternoon: Manually ran "sudo /root/ans-all --skip-tags=slow" on prod8, prod9, and prod11

June 16

  • 17:16 Southparkfan: "sudo usermod -u 2020 www-scripts" on prod9 and prod11

June 13

  • 11:46 Southparkfan: destroyed prod8 for testing

May 14

  • 20:49 Southparkfan: DROP DATABASE spamwiki; on prod12 - massive disk space free up :D
  • 20:49 Southparkfan: ran php5 /srv/mediawiki/w/maintenance/Orain/removeDeletedWikis.php --wiki loginwiki on prod9

April 28

  • 13:09 Southparkfan: pooled prod11 back
  • 13:02 Southparkfan: reboot prod11
  • 12:51 Southparkfan: depooled prod11 from haproxy

April 4

  • ..... Stuff happened, Tech:Incidents/2015-04-04-prod7-resize
  • 20:10 Addshore: Restart prod7
  • 19:43 Addshore: prod9 back up and resized
  • 19:41 Addshore: resize prod9 to 512 MB instance and restart
  • 19:36 Southparkfan: removed prod9 from haproxy config (planned for downgrade/re-install as needed)
  • 14:51 Addshore: Login issues, Redis down, Restarted (We should really have a watchdog or something check and restart this)
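
The watchdog suggested in the 14:51 entry could be sketched roughly as below. This is a minimal illustration, not something the log says was deployed: the function name `watchdog_check` is hypothetical, and the `redis-cli ping` / `service redis-server restart` pairing in the usage comment is an assumption about how the check would be wired up on prod7.

```shell
#!/bin/sh
# Hypothetical watchdog helper: run a health check and, if it fails,
# run a restart command. Both commands are passed in as strings.
watchdog_check() {
    check_cmd=$1
    restart_cmd=$2
    if sh -c "$check_cmd" >/dev/null 2>&1; then
        echo "ok"
    else
        echo "restarting"
        sh -c "$restart_cmd"
    fi
}

# Example wiring (assumed, e.g. called from a cron entry every minute):
#   watchdog_check "redis-cli ping" "service redis-server restart"
```

Passing the check and restart commands as arguments keeps the helper reusable for other services that died in this log (HHVM, php5-fpm) without changing the function itself.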

April 3

  • 21:24 Addshore: "pear install net_smtp" on prod9
  • 20:35 Addshore: added prod9 back to LB, Cheers SPF!
  • 20:24 Addshore: added prod9 back to LB -> it broke stuff -> promptly removed
  • 20:20 Addshore: restarted redis-server on prod7 (yes everyone got logged out...)
  • 20:16 Addshore: removed prod8 from LB for reboot then added back
  • 20:10 Addshore: removed prod11 from LB for reboot then added back
  • 19:00 Addshore: removed prod9 from LB and rebuilt (SPFCloud to add everything to prod9 and add back to LB)

April 1

  • 13:15 Addshore: got reports users were unable to login. Redis was no longer running on prod7, restarted.

March 26

  • 19:53 Southparkfan: noticed great things on prod9 :D
  • 19:34 addshore: resize complete, powering back prod9
  • 19:27 addshore: shutdown prod9 for resize

March 17

  • 15:00 Southparkfan: ran update.php on memewiki

March 16

  • 13:09 Southparkfan: HHVM died on prod8 for an unknown reason, causing downtime on the farm - restarted it

March 14

  • 15:42 Southparkfan: ran update.php on lovelivesiftwwiki

March 10

  • 17:23 Southparkfan: restart HHVM on all servers for HHVM admin password reset

March 6

  • 23:47 Southparkfan: enable ansible on prod7
  • 20:59 Southparkfan: disable ansible on prod7
  • 20:48 Southparkfan: restart ssh on prod7

March 5

  • 16:24 Southparkfan: killed & restarted HHVM on prod9 and prod11 too. Let's see if performance is improved now.
  • 16:21 Southparkfan: disable ansible cron on prod9 and prod11
  • 16:06 Southparkfan: start HHVM on prod8
  • 16:06 Southparkfan: kill HHVM on prod8
  • 15:48 Southparkfan: disable ansible cron on prod8 for HHVM testing

March 4

  • 17:53 Southparkfan: (prod7) sudo cp -R /tmp/lovelivesiftw.twgg.org/ /var/mediawiki/uploads/ - all should be fixed now
  • 17:52 Southparkfan: (prod7) sudo rm -rf lovelivesiftw.twgg.org lovelivesiftw.orain.org/
  • 17:43 Southparkfan: possibly messed up the below commands, so restored directories from backup and tried again
  • 17:32 Southparkfan: (prod7) sudo rm -rf http:/lovelivesiftw.twgg.org <- "lovelivesiftw.twgg.org" was a directory inside another directory, "http:"
  • 17:30 Southparkfan: (prod7) sudo cp -R lovelivesiftw.twgg.org/ /var/mediawiki/uploads/lovelivesiftw.twgg.org

March 2

  • 14:04 Southparkfan: after a bunch of Piwik issues complaining about "mysqli extension (could) not (be) loaded/found" and a thousand restarts, fixed php5-fpm issues
  • 13:42 Southparkfan: stop php5-fpm on prod6 (kill -9'd all processes)

February 28

  • 13:05 Southparkfan: reload nagios on prod6

February 27

  • 23:14 Southparkfan: deleted all jobs from allthetropeswiki's job table
  • 23:05 Southparkfan: "delete from job where job_cmd = 'cirrusSearchLinksUpdate';" on prod5
  • 22:40 Southparkfan: enable cron again
  • 22:29 Southparkfan: disable ansible cron on prod6
  • 14:23 Southparkfan: restarted redis

February 25

  • 19:43 Southparkfan: ran some apt-get commands on prod7 to get some more disk space
  • 19:28 Southparkfan: for now, I did it for trialsintaintedspacewiki, spiralwiki and metawiki too on prod9. The servers should be able to survive a few days/weeks now with the broken logrotate.
  • 19:21 Southparkfan: cleaned some diskspace (5.5GB) by compressing files of some wikis on prod11 too. Wikis: (incomplete list) trialsintaintedspacewiki, rightwiki, corruptionsofchampionswiki, loginwiki, metawiki, allthetropeswiki
  • 19:06 Southparkfan: below has been done for loginwiki and allthetropeswiki too on prod9. Looks all good, moving on to prod11 for now
  • 19:00 Southparkfan: compressed corruptionsofchampions.orain.org.log manually (to corruptionsofchampionswiki.gz), and then deleted the .log file
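
The manual compress-then-delete step from the entries above can be sketched as a loop. This is an illustration only: `compress_logs` is a hypothetical name, the directory is passed in rather than assumed, and unlike the 19:00 entry (which named the archive `corruptionsofchampionswiki.gz`) it keeps gzip's default `<name>.log.gz` naming.

```shell
#!/bin/sh
# Hypothetical helper: gzip every *.log file in a directory.
# gzip replaces each file with <name>.gz, so the .log is removed
# only after it has been compressed successfully.
compress_logs() {
    dir=$1
    for f in "$dir"/*.log; do
        [ -e "$f" ] || continue   # no matching files: the glob stays literal
        gzip -9 "$f"
    done
}

# Example (path taken from elsewhere in this log):
#   compress_logs /var/log/mediawiki
```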

February 24

  • 20:30 Southparkfan: enable ansible cron again on prod6. php5-fpm will now replace mod_php forever, and this will make Piwik twice as fast!
  • 20:07 Southparkfan: restarted php5-fpm and apache2 on prod6. Apache will now serve stuff via php5-fpm!
  • 20:04 Southparkfan: disable ansible on prod6
  • 11:38 Southparkfan: enable ansible again for now. Will test apache stuff at another moment.
  • 11:34 Southparkfan: disable ansible temporarily on prod6 (apache testing)
  • 11:29 Southparkfan: install python-mwclient.deb (and python-support) on prod6 for orainLog

February 23

  • 23:38 Southparkfan: completed a full security upgrade of all packages on all servers (duration: more than one hour)
  • 16:43 Southparkfan: killed all LC rebuild processes on prod9 (but at least prod9 is up again!)
  • 16:40 Dusti: reboot prod9
  • 12:59 Southparkfan: stop apache2 on prod12

February 22

  • 12:48 Southparkfan: below on prod11 too
  • 12:46 Southparkfan: cd /var/log/mediawiki/ && sudo rm -f spam.orain.org* on prod9

February 21

  • 16:18 Southparkfan: (prod10) sudo service cron restart
  • 16:17 Southparkfan: change root user password on prod10 again
  • 15:33 Southparkfan: change password of root user on prod10
  • 15:12 Southparkfan: sudo service cron restart on prod10
  • 15:12 Southparkfan: sudo service cron start on prod10

February 19

  • 14:12 Southparkfan: (prod8, prod9, prod11) cd /var/log/mediawiki/ && sudo rm -f spam.orain.org*

February 16

  • 06:39 Southparkfan: restart HHVM on prod11

February 15

  • 16:45 Southparkfan: ran below again
  • 16:38 Southparkfan: (prod9, prod11 - /var/log/mediawiki/) sudo rm -f *1.gz - logrotate is having trouble with these files ending with "1.gz" (for an unknown reason these files are not compressed log files, just empty files)
  • 09:28 Southparkfan: re-installed php5-gd package on prod6
  • 09:06 Southparkfan: restart HHVM on prod9 (it died for a still unknown reason)
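
The 16:38 cleanup removed every file matching `*1.gz`. A narrower variant, sketched here under the assumption that only the empty files are the problem, deletes just the zero-byte matches and prints what it removed. `remove_empty_rotated` is a hypothetical name, and `-maxdepth` is a GNU/BSD `find` extension rather than strict POSIX.

```shell
#!/bin/sh
# Hypothetical helper: delete only the zero-byte files ending in "1.gz"
# directly inside the given directory, printing each path it removes.
remove_empty_rotated() {
    dir=$1
    find "$dir" -maxdepth 1 -type f -name '*1.gz' -size 0 -print -exec rm -f {} \;
}

# Example (directory taken from the entry above):
#   remove_empty_rotated /var/log/mediawiki
```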

February 14

  • 19:09 Southparkfan: (prod8, prod9, prod11) cd /var/log/mediawiki/ && sudo rm -f spam.orain.org*
  • 15:09 Southparkfan: remove unnecessary packages on prod9, cleans up another 300MB.
  • 14:50 Southparkfan: forced a logrotate run on prod9 again (all logs)
  • 14:46 Southparkfan: forced a logrotate run on prod9 (mediawiki logs only)

February 13

  • 14:34 Southparkfan: forced a logrotate run on prod11 too per the same reason as below. It seems it partially failed too, but still freed up ~100MB disk space.
  • 14:14 Southparkfan: forced a logrotate run on prod9 due to a critical amount of disk space left (<150 MB). It seems it partially failed, but it at least freed up something like 400MB disk space or so. Finding out now how to make the run succeed and compress even more log files.

February 11

  • 14:22 Addshore: fixed ansible run on prod7 due to conflict of user ids with 'git' user id 2003: Ran the following:
usermod -u 2103 git
groupmod -g 2103 git
find / -user 2003 -exec chown -h 2103 {} \;
find / -group 2005 -exec chgrp -h 2103 {} \;
usermod -g 2103 git
  • 14:15 Addshore: remove and re clone private repo on prod12 (fixes ansible run)

February 9

  • 12:30 Addshore: reloading haproxy on prod10

February 8

  • 13:57 Southparkfan: ran changePassword.php on techwiki for OrainLog

February 7

  • 15:15 Southparkfan: deleted "zacharydubois" on prod6 per request

Dusti upgraded GitHub to the Silver plan which includes private repos. SPF working on moving prod7 to a private git repo.

February 6

  • 17:01 Southparkfan: upgraded packages with security fixes

February 5

Addshore: on prod6! @ 5:00 GMT / 00:30hrs EST

Killed all processes for users noreply and jasper and altered users Ids to fix ansible run. Ran the following:

usermod -u 2101 noreply
groupmod -g 2101 noreply
find / -user 2006 -exec chown -h 2101 {} \;
find / -group 2008 -exec chgrp -h 2101 {} \;
usermod -g 2101 noreply
usermod -u 2102 jasper
groupmod -g 2102 jasper
find / -user 2007 -exec chown -h 2102 {} \;
find / -group 2009 -exec chgrp -h 2102 {} \;
usermod -g 2102 jasper
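
The same five-step renumbering was applied to each user above (and again to `git` on February 11). A small sketch of a helper that prints that sequence for review before anything is run; `print_renumber_cmds` is a hypothetical name, and printing instead of executing is a choice made here for safety, not something the log describes.

```shell
#!/bin/sh
# Hypothetical helper: print the command sequence used in this log to move
# a user and its group to a new numeric id. Nothing is executed.
# Arguments: user name, old uid, old gid, new id (used for both uid and gid).
print_renumber_cmds() {
    user=$1 old_uid=$2 old_gid=$3 new_id=$4
    printf 'usermod -u %s %s\n' "$new_id" "$user"
    printf 'groupmod -g %s %s\n' "$new_id" "$user"
    printf 'find / -user %s -exec chown -h %s {} \\;\n' "$old_uid" "$new_id"
    printf 'find / -group %s -exec chgrp -h %s {} \\;\n' "$old_gid" "$new_id"
    printf 'usermod -g %s %s\n' "$new_id" "$user"
}

# Reproduces the noreply sequence listed above.
print_renumber_cmds noreply 2006 2008 2101
```

Keeping the old uid and old gid as separate arguments matters because, as in the entries above, the two numbers did not match (2006/2008 for noreply, 2007/2009 for jasper).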

January 25

  • 17:14 Southparkfan: upgraded packages with security fixes (again)

January 23

  • 22:30 Tanner: Migrated DNS to CloudFlare for stability.
  • 18:49 Southparkfan: installed security updates across the servers

January 22

  • 23:39 Addshore - manually add technoratimedia_sv_115e9.txt file to the root mediawiki directory for Dusti, No point in this being in ansible, it can be removed / vanish whenever...

January 21

  • 15:00 Addshore - Killed udp.py script on prod6 that was pointing at JDnet
    • Manually copied script to /home/addshore/udp.py for testing (Not in ansible....) - seems to work fine and will run in screen

January 20

  • Sometime - Addshore: Manually patched rebuildtextindex in a secret place and ran across ALL wikis in a Screen on some prod. Run successful and all indexes rebuilt.
  • 14:10 Southparkfan: ran rebuildtextindex.php on metawiki again (with php instead of php5)
  • 13:45 Southparkfan: ran rebuildtextindex.php on metawiki

January 17

  • 17:47 Southparkfan: changed Southparkfan2's password with changePassword.php (my account of which I forgot the password, and no email was set on the account).
  • 00:25 JohnLewis: prod9 has been running at 100% CPU since December 8th. Missing from ganglia. Hard reboot and investigating.

January 9

  • 18:42 JohnLewis: update.php on donjonwiki for BF tables
  • 18:41 Southparkfan: ran update.php again on donjonwiki to fix dberrors (run conflict with John but k)
  • 16:16 Southparkfan: ran update.php on donjonwiki

January 8

  • 16:30 Southparkfan: ran importImages.php again on donjonwiki (a few files had bad filenames, and now still a few have....)
  • 15:52 Southparkfan: ran importImages.php on donjonwiki

January 4

  • 18:24 Southparkfan: ran importDump.php on donjonwiki

December 30

  • 15:17 Southparkfan: ran importImages.php on robloxclanswiki

December 29

  • 21:13 JohnLewis: delete councilwiki

December 22

  • 22:21 JohnLewis: destroy prod3
  • 22:19 JohnLewis: push prod3 decom changes and pool prod12 in its place
  • 17:32 JohnLewis: deleted 5 wikis from prod3 and cleared respective tables in CA and loginwiki.
  • 09:05 JohnLewis: boot prod3 after uninitiated power down. Investigating.

December 19

  • 22:20 JohnLewis: passwords have been migrated
  • 21:05 JohnLewis: begin password type migration (pbkdf2-legacyB)
  • 20:58 JohnLewis: prod8 and prod11 are now running MW1.24. prod9 is still depooled pending finalising the update. Passwords need to be wrapped (will do shortly)

December 5

  • 21:10 JohnLewis: MariaDB [(none)]> drop database dalieuwiki;

November 20

  • 15:30 Arcane: ran a database/ansible update.

November 14

  • 15:43 JohnLewis: MariaDB [(none)]> drop database Powersystemswiki;

October 29

  • 20:13 JohnLewis: update hhvm

October 27

  • 16:52 JohnLewis: drop database esourcewnywiki; (per technical reasons)

October 15

  • 20:02 JohnLewis: shutdown prod4 (planned for reinstall tomorrow)
  • 19:50 JohnLewis: remove prod4 from lb and purge DNS on ns1 and ns2
  • 17:27 JohnLewis: powercycle prod4 (was not responding to anything)

October 11

  • 19:47 JohnLewis: switch ns2.orain.org to prod7 (dns cache)
  • 18:46 JohnLewis: deleted wikis in the list here from prod3.
  • 18:43 JohnLewis: php5 /srv/mediawiki/w/maintenance/Orain/removeDeletedWikis.php --wiki loginwiki
  • 18:00 JohnLewis: security updates for MediaWiki
  • 16:40 JohnLewis: DNS change confirmed to have propagated to myself
  • ~14:00 JohnLewis: change orain.org's DNS to ns1.orain.org/ns2.orain.org from pam.ns.cloudflare.com/woz.ns.cloudflare.com

October 5

  • 18:10 JohnLewis: php5 /srv/mediawiki/w/maintenance/importDump.php --wiki classwiki /home/johnflewis/backup.xml

October 3

  • 18:22 JohnLewis: applied security fixes; email going to sysadmin shortly regarding this.

September 27

  • 17:48 JohnLewis: prod6 seems good. Moving onto prod7
  • 16:14 JohnLewis: lb.orain.org changed to prod8 (HHVM) for migration over to HHVM and Ubuntu

September 19

  • 19:46 JohnLewis: changed prod8's kernel to 3.13.0-32-generic (from 3.13.0-35-generic)
  • 17:36 JohnLewis: prod8 is now our first Ubuntu machine
  • 17:32 JohnLewis: begin upgrading prod8 to Ubuntu 14.04 (Trusty)
  • 17:30 JohnLewis: shutdown prod8

September 12

  • 16:37 JohnLewis: DELETE from page WHERE page_id = "409341"; (prod5; allthetropeswiki)

September 10

  • 16:28 JohnLewis: root@prod7:/var/mediawiki/uploads/common/skins/foreground/assets# mv font/ fonts/
  • 16:09 JohnLewis: usermod -u 1003 www-scripts on prod8
  • 15:28 JohnLewis: removed the below
  • 15:22 JohnLewis: added 'www-scripts ALL=(ALL) NOPASSWD:ALL' to /etc/sudoers
  • 15:21 JohnLewis: chmod 0777 /var/mediawiki/private /var/mediawiki/uploads

September 9

  • 15:10 JohnLewis: ufw delete allow 5070 on prod6

September 7

  • 11:05 php5-redis wanted to be installed so "dpkg -i /root/debs/php5-redis.deb". This is why our repo for debs / deb locations should be put in dpkg properly
  • 11:03 Addshore - ran dpkg --configure -a on prod4 to try to get ansible working

August 27

  • Migration done? - Been done for a few days now.

August 20

  • 14:01 JohnLewis: disable ansible on all servers
  • 14:00 JohnLewis: Migration start!

August 17

  • 18:08 JohnLewis: php createLocalAccount.php --wiki airwiki --username Dxing97
  • 18:07 JohnLewis: php migrateAccount.php --wiki loginwiki --username Dxing97


August 16

  • 13:30 JohnLewis: use allthetropeswiki; DELETE from watchlist WHERE wl_user = '21'; SELECT * from watchlist WHERE wl_user = '21';

August 2

  • 00:32 JohnLewis: MariaDB [(none)]> drop database aerowikiwiki;

July 29

  • 17:20 JohnLewis: /usr/lib/mailman/bin/withlist -l -r fix_url mailman,allthetropes,meta-admin --urlhost=lists.orain.org

July 28

  • 01:11 JohnLewis: (prod4 and prod5) ip6tables -I INPUT 1 -p tcp --dport 443 -j ACCEPT
  • 01:10 JohnLewis: (prod4 and prod5) ip6tables -I INPUT 1 -p tcp --dport 80 -j ACCEPT

July 26

  • 13:26 JohnLewis: chown -R www-data:www-data private/mediawiki/
  • 13:24 JohnLewis: mv static.orain.org/jasperinternal.orain.org/* private/mediawiki/jasperinternal.orain.org/
  • 13:20 JohnLewis: mkdir /usr/share/nginx/private/mediawiki/jasperinternal.orain.org/

July 23

  • 22:53 JohnLewis: reboot prod3 - MySQL health and load is fluctuating massively

July 22

  • 19:30 JohnLewis: gave SELECT to archive, revision, user and recentchanges tables on all PUBLIC WIKIS for user 'useranalysis' on prod3. Account used by Cyberpower678 for his useranalysis tool. +1 for Orain-Community relations!

July 21

  • 15:42 JohnLewis: restart memcached (causing MediaWiki exceptions)

July 16

  • 20:00 JohnLewis: install prod5's basic requirements.
  • 18:55 JohnLewis: reboot prod4; irregular issues occurring
  • 16:05 JohnLewis: revert DNS back after fixing necessary issues relating to DNS
  • 15:20 JohnLewis: changed DNS for Orain directly to prod4 - broke all non-prod4 services

July 14

  • 21:09 JohnLewis: new SSL cert is installed and confirmed to be functioning correctly

July 13

  • 21:38 Addshore: Everything back up

July 8

  • 14:43 JohnLewis: mysql -p -e "drop database Techwritewiki"

July 7

  • 14:09 JohnLewis: speed seems to have improved. Need to further monitor the SQL downtimes however.
  • 14:08 JohnLewis: reboot prod3
  • 14:00 JohnLewis: prod3 is not responding to shell commands; matches downtimes with the farm

July 5

  • 17:00 JohnLewis: php createLocalAccount.php --wiki detectiveconanwiki --username KidProdigy
  • 16:58 JohnLewis: php migrateAccount.php --wiki metawiki --auto --homewiki metawiki --username KidProdigy
  • 13:45 JohnLewis: php maintenance/runJobs.php --wiki allthetropeswiki
  • 11:51 JohnLewis: mysql -p -e "drop database Revitestwiki"

June 28

  • 09:56 JohnLewis: confirmed security fix
  • 09:55 JohnLewis: reboot prod4 to force restart
  • 09:54 JohnLewis: manually patch OpenSSL to the latest release to fix a security issue (again)

June 20

  • 22:17 JohnLewis updated php5-fpm
  • 22:10 JohnLewis: updated OpenSSL to a security fix release
  • 00:10 Addshore: Started a run of rebuildFileCache.php for ATT wiki in a SCREEN on prod4 to fix pages once CSS extension has been re enabled

June 19

  • 23:45 Addshore: Reenabled ansible on prod4
  • 23:30 Addshore: Ran update.php on all wikis

June 18

  • 22:24 Addshore: Ran update.php on all wikis

June 15

  • 19:41 JohnLewis: mkdir OrainHacks; add a basic extension file and a .magic. file with LQT magicwords in. php rebuildLocalisationCache.php --force --wiki extloadwiki. Happy days! Now need to do it for the other 10 extensions disabled.

June 14

  • 19:45 Addshore: metawiki up, running the get db list script
  • 19:44 Addshore: DBlist is corrupt, replacing with "metawiki|meta|"
  • 19:35 Addshore: Removed Popups extension from mediawiki and reenabled ansible cron
  • 19:24 Addshore: all sites getting DB errors

June 11

  • 16:48 JohnLewis: disabled ansible to prevent ansible running while I do stuff (staggered committing)

June 10

  • 11:10 JohnLewis: added new .log files and rearranged the logging structure

June 9

  • 16:21 JohnLewis: upgrade spamassassin on prod1
  • 16:03 JohnLewis: php update.php --wiki jossewiki --quick

June 8

  • 0:00 JohnLewis: php deleteArchivedRevisions.php --wiki allthetropeswiki --delete

June 7

  • 21:36 JohnLewis: re-enable ansible
  • 21:30 JohnLewis: ran update.php on all wikis for MW 1.23 update
  • 19:35 JohnLewis: disabled ansible (for safety)
  • 16:08 JohnLewis: restarted memcached to clean up stuff
  • 16:00 JohnLewis: renamed 'spacetimewiki' database to 'timespacewiki'

June 3

  • 17:57 JohnLewis: purge torblock's node index

June 1

  • 15:34 JohnLewis: force password reset for "Stef99"
  • 15:33 JohnLewis: restart memcached

May 31

  • 12:46 addshore: ran update.php on ALL wikis
  • 12:43 addshore: updating to MW 1.22.7

May 30

  • 18:50 JohnLewis: remove 'notice' for CreateWiki on GitHub
  • 12:30 JohnLewis: ran ansible on prod4 to catch new nginx rules
  • 12:29 JohnLewis: change ufw rules on prod1 for mail
  • 01:02 JohnLewis: ufw allow 9300 and ufw allow 9200
  • 01:02 JohnLewis: playing tennis for elasticsearch on prod1. restarting it a bit.
  • 00:50 JohnLewis: remove elasticsearch from prod1
  • 00:25 JohnLewis: massive reduction in disk space usage :D
  • 00:24 addshore: on prod4 rm /root/old

May 29

  • 22:53 JohnLewis: ran ansible on prod1; needed to get the port rule in
  • 22:17 JohnLewis: del
  • 22:17 JohnLewis: restarted nginx
  • 22:06 addshore: rebooting prod1
  • 18:44 JohnLewis: restarted nagios3 on prod1
  • 18:30 addshore: ansible successfully runs on prod1 now, adding to cron
  • 18:25 JohnLewis: prod1: nagios3 -v *
  • 18:02 addshore: update ansible to 1.6.2 on prod1
  • 18:01 addshore: update ansible to 1.6.2 on prod3
  • 18:01 JohnLewis: Removed 'notice' from OrainMessages calls from GitHub
  • 17:58 addshore: update ansible to 1.6.2 on prod4
  • 17:57 addshore: orainLog back up...
  • 13:00 - 17:00 - Addshore - Poking prod4 and ansible. Prod4 now again has ansible on a cronjob. There were multiple short downtimes during this time due to the poking of ufw (the firewall), but this was for the greater good!!!

May 24

  • 12:51 - JohnLewis - service php5-fpm restart
  • 12:46 - pingdom reports site down
  • 12:06 - JohnLewis - rename verkeerswiki to verkeerwiki. A bunch of SQL stuff.

May 23

  • 20:06 - JohnLewis - php createLocalAccount.php --wiki=espiralarchivowiki John

May 14

  • 16:07 - JohnLewis - php createLocalAccount.php --wiki=onepiecewiki Bocaniko
  • 16:01 - JohnLewis - clear apc cache

May 12

  • Recent downtime was caused by prod4 being suspended by the host, this is resolved.

May 11

  • 19:00 - JohnLewis: php reassignEdits.php --wiki allthetropeswiki 300154507a A300154507

May 09

  • 13:15 - addshore: added values for duplicity and AWS to prod3 vars
  • 13:15 - addshore: added AWS_BACKUPS_ACCESS_KEY_ID to prod3 vars.yml

April 15

  • prod2 died, migrated to a new user and the setup was so hacky that pretty much nothing worked. Kudu knows more about that than me.
  • A key server file became corrupted and the server crashed. That accounts for around 24 hours, then we moved to a new server and had to deal with a hacky setup, which accounts for the other 40-ish hours of downtime.
  • It's kinda bad to say that this downtime happened while we were still looking at the old downtime.. so :/
  • At least we know *why* this one occurred.

April 14

  • 16:37 Addshore: prod4 - Killed db loop scripts running i18n cache updates
  • 16:42 Addshore: prod4 - Updating i18n cache for metawiki and extloadwiki (this is all that is ever needed as extload has everything loaded and i18n cache is shared)
  • 16:45 JohnLewis: Reboot prod4
  • 16:50 Addshore: prod4 - Updating i18n cache for metawiki and extloadwiki (in a SCREEN)
  • 16:51 JohnLewis: root@prod4:/# /etc/init.d/apache2 stop
  • 16:51 JohnLewis: root@prod4:/# /etc/init.d/nginx start
  • 17:17 Addshore: Remove JohnLewis IP from deny hosts file for sshd again on prod3

April 9

  • 22:30 JohnLewis: re enabled ansible cron
  • 16:23 JohnLewis: disabled ansible cron (doing live work on prod2 for ATTwiki). I'll post a note when I'm done.

April 6

  • 13:16 JohnLewis: run update.php on dangsunsnwiki and cheer
  • 13:08 JohnLewis: eval.php some more stuff into my dangsunsnwiki account...
  • 13:06 JohnLewis: eval.php an email into my dangsunsnwiki account
  • 13:03 JohnLewis: get annoyed about things
  • 12:57 JohnLewis: rename buswiki -> dangsunsnwiki

April 5

  • 9:10 JohnLewis: prod2 nginx killed and restarted, i18n cache reloaded

April 4

  • 13:00 pingdom reports orain down

April 3

  • 17:10 JohnLewis: dropped centralnoticetestwiki database as all worked - not needed now
  • 17:06 JohnLewis: manually ran ansible
  • 16:41 JohnLewis: ran update.php on all wikis

April 1

  • 19:14 JohnLewis: restarted nginx (not an April fools)

March 30

  • 16:41 JohnLewis: manually ran ansible because Joe is right about my stupidity sometimes
  • 16:19 JohnLewis: drop temp database (used to fix some issues with importing)
  • 16:18 JohnLewis: run update.php on archivoespiral and metawiki
  • 16:15 JohnLewis: do a bunch of SQL stuff on prod3 to get archivoespiral working

March 29

  • 19:46 Addshore: Remove JL IP from prod2 deny hosts file

March 28

  • 21:30 Addshore: Remove JL IP from prod2 deny hosts file

March 18

  • 20:27 JohnLewis: re enabled ansible cron on prod2
  • 19:47 JohnLewis: disabled ansible cron on prod2

March 15

  • 22:47 kudu: Run fixDoubleRedirects.php on ATT

March 12

  • 17:35 JohnLewis: populated interwiki table on some databases

March 8

  • 14:32 JohnLewis: changed some centralauth database entries to suit wiki move

March 7

  • 22:40 JohnLewis: dumped allthetropeswiki for Arcane
  • 16:53 JohnLewis: manually updated ansible
  • 16:51 JohnLewis: renamed database trainwiki to reviwiki

March 4

  • 20:46 JohnLewis: ran CentralAuth's createLocalAccount.php for myself on a few wikis to fix things

March 2

  • 01:25 JohnLewis: ran update.php on all wikis
  • 00:54 JohnLewis: manually ran ansible again
  • 00:47 JohnLewis: manually updated ansible (debugging - yay)

March 1

  • 23:58 JohnLewis: manually ran ansible and update.php on jdwiki

February 26

  • 21:53 JohnLewis: ditto on metawiki
  • 21:52 JohnLewis: ran update on jh67wiki

February 25

  • 03:32 kudu: Ran fixDoubleRedirects.php on ATT

February 21

  • 23:54 addshore: ran update.php for pmr2014wiki
  • 23:48 addshore: prod2 uninstalled dvipng texlive-latex-base etc. cjk-latex
  • 23:30 addshore: .... all of which we have and work.... GAH!
  • 23:30 addshore: for the record it is stuck on.. Failed to parse(PNG conversion failed; check for correct installation of latex and dvipng (or dvips + gs + convert))
  • 23:29 addshore: apt-get installed dvipng texlive-latex-base texlive-latex-extra tex-live-recommended cjk-latex while trying to fix Math, no success
  • 22:36 addshore: chmod and chown Math extension .. we should have this all pulled as www-data
  • 22:22 addshore: prod2 ran make in /usr/share/nginx/.orain.org/w/extensions/Math/math
  • 22:07 addshore: ran update.php on extload
  • 21:56 addshore: reenable prod2 ansible cron
  • 21:14 addshore: disabling ansible on prod2
  • 20:14 addshore: ran i18n cache rebuild
  • 20:14 JohnFLewis: rebooted prod2
  • 15:20 JohnLewis: manually update ansible
  • 10:34 addshore: ran update.php on all wikis

February 20

  • 21:57 JohnLewis: ran update.php on jdwiki
  • 21:11 addshore: reenabling ansible pull cron on prod2 after resolving issue 173
  • 20:06 addshore: comment out ansible pull from prod2 cron while I manually poke collection extension
  • 16:25 JohnLewis: Info: Mail is fully working with a final dovecot restart!
  • 16:20 JohnLewis: changed dovecot config and then restarted x3 (issues first two times)
  • 15:32 JohnLewis: restarted dovecot on prod1

February 19

  • 22:40 addshore: prod2 on mediawiki submodules ran git submodule foreach --recursive git config core.fileMode false - this also solves the dirty Elastica folder
  • 22:33 addshore: prod2 on mediawiki submodules ran git submodule foreach git config core.fileMode false
  • 22:26 addshore: prod2 chown www-data:www-data /w/extensions/*
  • 22:19 addshore: simplified the two ansible cronjobs on prod2
  • 22:13 addshore: rm /root/ans on prod2, this file is wrong!
  • 00:30 kudu: Ran rebuildtextindex.php on all wikis

February 13

  • 16:23 addshore: i18n cache broke, tried rebuilding off extload but the script wouldn't run, ran off metawiki first then off extloadwiki and everything returned to normal. The question remains why did the cache break in the first place and why could we not rebuild from extload wiki in the first place?
  • 16:20 addshore: MWEXCEPTIONS EVERYWHERE!

February 11

  • 03:11 kudu: Compress revisions on ATT using concat mode

February 8

  • 17:52 kudu: Disabled ufw on prod2, wasn't working well

February 7

  • 21:13 addshore: reenable ansible cron on prod2
  • 21:07 addshore: i18n rebuilt
  • 21:06 addshore: No localisation cache found for English. Please run maintenance/rebuildLocalisationCache.php. - rebuild fails
  • 21:05 addshore: exceptions everywhere, disabling ansible cron on prod2
  • 21:03 JohnLewis: rebooted prod2
  • 20:19 addshore: re enable ansible cron on prod2
  • 21:00 Kudu: indexing for elastic search on prod3
  • 19:06 Kudu: Modified ufw settings on prod3 following this article: http://blog.kylemanna.com/linux/2013/04/26/ufw-vps/
  • 13:49 addshore: 504 Gateway Time-out on extload
  • 13:02 addshore: added another case to cc.php, apc clearing is working again
  • 12:38 addshore: running manual ansible pull
  • 12:33 addshore: disabling ansible cron on prod2

January 29

  • 23:14 - addshore - rename Promethiawiki db to promethiawiki. Also created Renaming_a_database to help people fix this in the future

January 27

  • 18:48 - John - re enabled ansible pull
  • 18:36 - John - disabled ansible pull temporarily for now -- needs to be re-enabled shortly

January 26

  • 19:08 - John - prod3 is now set up. Waiting before enabling however.
  • 18:25 - John - finished installing basic things onto prod3.
  • 15:17 - addshore - attempted everything I could think of but think Kudu will have to fix this, no idea what he has done! For tracking see github issue
  • 14:40 - addshore - after trying to revert several changes ansible still doesn't seem to pull, now it appears to just be hanging once updating from the repo. i.e. no tasks are run
  • 14:08 - addshore - ansible pull broken caused by the 'Create User' task, investigating...

January 25

  • 18:18 - John - Ran update.php on techwiki per Addshore's request
  • 18:07 - John - Manually pull ansible to fix my stupidity

January 20

  • 3:14 Kudu (talk) Indexed mediawikitesterswiki pages in CirrusSearch and launched an indexing loop for ATT.

January 19

  • 19:51 - addshore - remove piping to log files for all ansible-pulls. Since http://git.io/JMVAXQ we make ansible do it itself

January 18

  • 17:15 - addshore - remove unused /etc/nginx/sites-enabled/test file that was being included but not in playbook
  • 15:27 - addshore - ansible-pull now runs as a success again!
  • 15:25 - addshore - update ansible to 1.4.4
  • 15:22 - addshore - ran updatedb
  • 15:05 - addshore - failed_when was added in 1.4 per this, so ansible needs to be updated for the below to work
  • 15:03 - addshore - ansible-pull >> ERROR: failed_when is not a legal parameter in an Ansible task or handler >> caused by my commit (in the process of fixing now..)

January 13

January 11

January 06

  • 18:50 - addshore - Manual pull to pick up this commit cleaning cronjobs before the next jobqueue run
  • 18:26 - addshore - Manually ran update.php across everything AGAIN as it was needed for the last commit.
  • 18:21 - addshore - Manual pull to pickup another commit fixing more of local settings, John again needs to be slapped!
  • 18:04 - addshore - Run maintenance.php for ALL wikis, I gather some runs have been missed during the whole cron not pulling ansible thing!
  • 17:49 - addshore - Manually stab the job queue. As far as I can tell from this commit the cron should work but I don't want to wait 10 mins to find out!
  • 17:39 - addshore - Manual ansible pull of this commit to fix broken local settings. John needs to be slapped for this commit
  • 17:25 - addshore - Manual ansible pull
  • 17:22 - addshore - uncomment ansible-pull line from crontab, not sure who has done this... Also change the cron to every 10 mins. As this may have been like this for a while I am bracing for some errors...

2014

December 14

  • 18:30 Kudu (talk) Installed git from wheezy-backports, updated MediaWiki to 1.22 and ran update.php on all wikis.

December 1

  • 00:24 Kudu (talk) Chmodded /var/log/mediawiki to 770.

November 26

  • 03:12 Kudu (talk) Ran deleteBatch.php on a list of ATT redirects and ran deleteArchivedRevisions.php on All The Tropes.

November 23

  • 23:54 Kudu (talk) Re-chmodded the web directory and the MediaWiki log directory to 770 and set the git core.fileMode configuration option to false in the MediaWiki directory to stop it from messing with permissions.
  • 18:26 Kudu (talk) Ran deleteBatch.php on a list of Troper Tales pages and ran deleteArchivedRevisions.php on All The Tropes.

November 21

  • 00:25 addshore - manually run i18n cache update

November 17

  • 18:54 Addshore - update.php on all wikis
  • 18:48 Addshore - 'git stash' changes on extension/CentralNotice prod2. Not sure why the changes were there but they were stopping ansible from updating the extension

November 12

  • 02:47 Kudu (talk) Imported the new file description pages on ATT and ran deleteOldRevisions.php on the file pages thanks to some SQL/xargs magic.

November 11

  • 23:23 Kudu (talk) Imported the file description pages on ATT and ran deleteArchivedFiles.php and deleteOldRevisions.php on the file pages thanks to some SQL/xargs magic.
  • 22:11 Kudu (talk) Running importImages.php on ATT's missing and numeric images.

November 10

  • 15:42 Kudu (talk) Ran update.php and rebuildTitleKeys.php on all wikis.

November 07

  • 20:44 Addshore - Manually run i18n cache update as the cron isn't working
  • 20:09 Addshore - Reenable cron per the 12 commits just being this....
  • 19:59 Addshore - commenting out ansible-pull from prod2 crontab after somehow pushing 12 unexpected and unknown changes to github...
  • 19:52 Addshore Correct file owners and permissions for allthetropes images directory -R

November 02

  • 12:24 Addshore Rebuilding all caches to reflect file location moves due to upload hostname change

November 01

October 20

September 29

  • 10:11 Kudu (talk) Renamed the database `alleniawikiwiki` to `alleniawiki`.

July 31

  • 22:48 Kudu (talk) Changed MySQL parameters: table_open_cache=2500, thread_cache_size=48. Kudu (talk) 22:48, 31 July 2013 (UTC)
  • 22:37 Kudu (talk) Change MySQL parameters: long_query_time=1, query_cache_size=32M, slow_query_log=1, table_open_cache=400, thread_cache_size=4. Those are preliminary settings, they should be adjusted more carefully eventually.