Tech:Server admin log

From Orain Meta
Revision as of 12:29, 30 May 2014 by imported>OrainLog (change ufw rules on prod1 for mail (JohnLewis))

May 30

  • 12:29 JohnLewis: change ufw rules on prod1 for mail
  • 01:02 JohnLewis: ufw allow 9300 and ufw allow 9200
  • 01:02 JohnLewis: playing tennis for elasticsearch on prod1. restarting it a bit.
  • 00:50 JohnLewis: remove elasticsearch from prod1
  • 00:25 JohnLewis: massive reduction in disk usage :D
  • 00:24 addshore: on prod4 rm /root/old

May 29

  • 22:53 JohnLewis: ran ansible on prod1; needed to get the port rule in
  • 22:17 JohnLewis: del
  • 22:17 JohnLewis: restarted nginx
  • 22:06 addshore: rebooting prod1
  • 18:44 JohnLewis: restarted nagios3 on prod1
  • 18:30 addshore: ansible successfully runs on prod1 now, adding to cron
  • 18:25 JohnLewis: prod1: nagios3 -v *
  • 18:02 addshore: update ansible to 1.6.2 on prod1
  • 18:01 addshore: update ansible to 1.6.2 on prod3
  • 18:01 JohnLewis: Removed 'notice' from OrainMessages calls from GitHub
  • 17:58 addshore: update ansible to 1.6.2 on prod4
  • 17:57 addshore: orainLog back up...
  • 13:00 - 17:00 - Addshore - Poking prod4 and ansible. Prod4 now has ansible on a cronjob again. There were multiple short downtimes during this time due to the poking of ufw (the firewall), but this was for the greater good!!!

May 24

  • 12:51 - JohnLewis - service php5-fpm restart
  • 12:46 - pingdom reports site down
  • 12:06 - JohnLewis - rename verkeerswiki to verkeerwiki. A bunch of SQL stuff.

May 23

  • 20:06 - JohnLewis - php createLocalAccount.php --wiki=espiralarchivowiki John

May 14

  • 16:07 - JohnLewis - php createLocalAccount.php --wiki=onepiecewiki Bocaniko
  • 16:01 - JohnLewis - clear apc cache

May 12

  • Recent downtime was caused by prod4 being suspended by the host, this is resolved.

May 11

  • 19:00 - JohnLewis: php reassignEdits.php --wiki allthetropeswiki 300154507a A300154507

May 09

  • 13:15 - addshore: added values for duplicity and AWS to prod3 vars
  • 13:15 - addshore: added AWS_BACKUPS_ACCESS_KEY_ID to prod3 vars.yml

April 15

  • prod2 died; we migrated to a new user, and the setup was so hacky that pretty much nothing worked. Kudu knows more about that than me.
  • A key server file became corrupted and the server crashed. That accounts for around 24 hours; then we moved to a new server and had to deal with a hacky setup, which accounts for the other 40-ish hours of downtime.
  • It's kinda bad to say that this downtime happened while we were still looking into the old downtime... so :/
  • At least we know *why* this one occurred.

April 14

  • 16:37 Addshore: prod4 - Killed db loop scripts running i18n cache updates
  • 16:42 Addshore: prod4 - Updating i18n cache for metawiki and extloadwiki (this is all that is ever needed as extload has everything loaded and i18n cache is shared)
  • 16:45 JohnLewis: Reboot prod4
  • 16:50 Addshore: prod4 - Updating i18n cache for metawiki and extloadwiki (in a SCREEN)
  • 16:51 JohnLewis: root@prod4:/# /etc/init.d/apache2 stop
  • 16:51 JohnLewis: root@prod4:/# /etc/init.d/nginx start
  • 17:17 Addshore: Remove JohnLewis IP from deny hosts file for sshd again on prod3

April 9

  • 22:30 JohnLewis: re enabled ansible cron
  • 16:23 JohnLewis: disabled ansible cron (doing live work on prod2 for ATTwiki). I'll post a note when I'm done.

April 6

  • 13:16 JohnLewis: run update.php on dangsunsnwiki and cheer
  • 13:08 JohnLewis: eval.php some more stuff into my dangsunsnwiki account...
  • 13:06 JohnLewis: eval.php an email into my dangsunsnwiki account
  • 13:03 JohnLewis: get annoyed about things
  • 12:57 JohnLewis: rename buswiki -> dangsunsnwiki

April 5

  • 9:10 JohnLewis: prod2 nginx killed and restarted, i18n cache reloaded

April 4

  • 13:00 pingdom reports orain down

April 3

  • 17:10 JohnLewis: dropped centralnoticetestwiki database as all worked - not needed now
  • 17:06 JohnLewis: manually ran ansible
  • 16:41 JohnLewis: ran update.php on all wikis

April 1

  • 19:14 JohnLewis: restarted nginx (not an April fools)

March 30

  • 16:41 JohnLewis: manually ran ansible because Joe is right about my stupidity sometimes
  • 16:19 JohnLewis: drop temp database (used to fix some issues with importing)
  • 16:18 JohnLewis: run update.php on archivoespiral and metawiki
  • 16:15 JohnLewis: do a bunch of SQL stuff on prod3 to get archivoespiral working

March 29

  • 19:46 Addshore: Remove JL IP from prod2 deny hosts file

March 28

  • 21:30 Addshore: Remove JL IP from prod2 deny hosts file

March 18

  • 20:27 JohnLewis: re enabled ansible cron on prod2
  • 19:47 JohnLewis: disabled ansible cron on prod2

March 15

  • 22:47 kudu: Run fixDoubleRedirects.php on ATT

March 12

  • 17:35 JohnLewis: populated interwiki table on some databases

March 8

  • 14:32 JohnLewis: changed some centralauth database entries to suit wiki move

March 7

  • 22:40 JohnLewis: dumped allthetropeswiki for Arcane
  • 16:53 JohnLewis: manually updated ansible
  • 16:51 JohnLewis: renamed database trainwiki to reviwiki

March 4

  • 20:46 JohnLewis: ran CentralAuth's createLocalAccount.php for myself on a few wikis to fix things

March 2

  • 01:25 JohnLewis: ran update.php on all wikis
  • 00:54 JohnLewis: manually ran ansible again
  • 00:47 JohnLewis: manually updated ansible (debugging - yay)

March 1

  • 23:58 JohnLewis: manually ran ansible and update.php on jdwiki

February 26

  • 21:53 JohnLewis: ditto on metawiki
  • 21:52 JohnLewis: ran update on jh67wiki

February 25

  • 03:32 kudu: Ran fixDoubleRedirects.php on ATT

February 21

  • 23:54 addshore: ran update.php for pmr2014wiki
  • 23:48 addshore: prod2 uninstalled dvipng texlive-latex-base etc. cjk-latex
  • 23:30 addshore: .... all of which we have and work.... GAH!
  • 23:30 addshore: for the record it is stuck on.. Failed to parse(PNG conversion failed; check for correct installation of latex and dvipng (or dvips + gs + convert))
  • 23:29 addshore: apt-get installed dvipng texlive-latex-base texlive-latex-extra tex-live-recommended cjk-latex while trying to fix Math, no success
  • 22:36 addshore: chmod and chown Math extension .. we should have this all pulled as www-data
  • 22:22 addshore: prod2 ran make in /usr/share/nginx/.orain.org/w/extensions/Math/math
  • 22:07 addshore: ran update.php on extload
  • 21:56 addshore: reenable prod2 ansible cron
  • 21:14 addshore: disabling ansible on prod2
  • 20:14 addshore: ran i18n cache rebuild
  • 20:14 JohnFLewis: rebooted prod2
  • 15:20 JohnLewis: manually update ansible
  • 10:34 addshore: ran update.php on all wikis

February 20

  • 21:57 JohnLewis: ran update.php on jdwiki
  • 21:11 addshore: reenabling ansible pull cron on prod2 after resolving issue 173
  • 20:06 addshore: comment out ansible pull from prod2 cron while I manually poke collection extension
  • 16:25 JohnLewis: Info: Mail is fully working with a final dovecot restart!
  • 16:20 JohnLewis: changed dovecot config and then restarted x3 (issues the first two times)
  • 15:32 JohnLewis: restarted dovecot on prod1

February 19

  • 22:40 addshore: prod2 on mediawiki submodules ran git submodule foreach --recursive git config core.fileMode false - this also solves the dirty Elastica folder
  • 22:33 addshore: prod2 on mediawiki submodules ran git submodule foreach git config core.fileMode false
  • 22:26 addshore: prod2 chown www-data:www-data /w/extensions/*
  • 22:19 addshore: simplified the two ansible cronjobs on prod2
  • 22:13 addshore: rm /root/ans on prod2, this file is wrong!
  • 00:30 kudu: Ran rebuildtextindex.php on all wikis
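
The effect of the core.fileMode setting used in the 22:33 and 22:40 entries can be reproduced in a throwaway repository. This is a minimal sketch, assuming git is installed and the filesystem honours the executable bit; the temporary repository, the demo identity, and the file name page.txt are hypothetical and have nothing to do with the prod2 checkout.

```shell
#!/bin/sh
# Sketch: show that a chmod makes a tracked file look modified,
# and that core.fileMode=false makes git ignore that change.
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"

git init -q .
git config user.email "demo@example.invalid"   # hypothetical identity
git config user.name "demo"
git config core.fileMode true                  # track the executable bit

echo 'hello' > page.txt
git add page.txt
git commit -q -m "add page"

# A permissions-only change, like a chmod/chown pass over extensions.
chmod +x page.txt
before=$(git status --porcelain)

# Same setting the log applies to every submodule.
git config core.fileMode false
after=$(git status --porcelain)

echo "status with core.fileMode=true:  ${before:-clean}"
echo "status with core.fileMode=false: ${after:-clean}"
```

With the setting at false, permission-only differences no longer show up as local modifications, which fits the "dirty" folder described in the 22:40 entry.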

February 13

  • 16:23 addshore: i18n cache broke, tried rebuilding off extload but the script wouldn't run, ran off metawiki first then off extloadwiki and everything returned to normal. The question remains why did the cache break in the first place and why could we not rebuild from extload wiki in the first place?
  • 16:20 addshore: MWEXCEPTIONS EVERYWHERE!

February 11

  • 03:11 kudu: Compress revisions on ATT using concat mode

February 8

  • 17:52 kudu: Disabled ufw on prod2, wasn't working well

February 7

  • 21:13 addshore: reenable ansible cron on prod2
  • 21:07 addshore: i18n rebuilt
  • 21:06 addshore: No localisation cache found for English. Please run maintenance/rebuildLocalisationCache.php. - rebuild fails
  • 21:05 addshore: exceptions everywhere, disabling ansible cron on prod2
  • 21:03 JohnLewis: rebooted prod2
  • 20:19 addshore: re enable ansible cron on prod2
  • 21:00 Kudu: indexing for elastic search on prod3
  • 19:06 Kudu: Modified ufw settings on prod3 following this article: http://blog.kylemanna.com/linux/2013/04/26/ufw-vps/
  • 13:49 addshore: 504 Gateway Time-out on extload
  • 13:02 addshore: added another case to cc.php, apc clearing is working again
  • 12:38 addshore: running manual ansible pull
  • 12:33 addshore: disabling ansible cron on prod2

January 29

  • 23:14 - addshore - rename Promethiawiki db to promethiawiki. Also created Renaming_a_database to help people fix this in the future

January 27

  • 18:48 - John - re enabled ansible pull
  • 18:36 - John - disabled ansible pull temporarily for now -- needs to be re-enabled shortly

January 26

  • 19:08 - John - prod3 is now set up. Waiting before enabling however.
  • 18:25 - John - finished installing basic things onto prod3.
  • 15:17 - addshore - attempted everything I could think of but think Kudu will have to fix this, no idea what he has done! For tracking see github issue
  • 14:40 - addshore - after trying to revert several changes ansible still doesn't seem to pull, now it appears to just be hanging once updating from the repo. i.e. no tasks are run
  • 14:08 - addshore - ansible pull broken caused by the 'Create User' task, investigating...

January 25

  • 18:18 - John - Ran update.php on techwiki per Addshore's request
  • 18:07 - John - Manually pull ansible to fix my stupidity

January 20

  • 3:14 Kudu (talk) Indexed mediawikitesterswiki pages in CirrusSearch and launched an indexing loop for ATT.

January 19

  • 19:51 - addshore - remove piping to log files for all ansible-pulls, since http://git.io/JMVAXQ we make ansible do it itself

January 18

  • 17:15 - addshore - remove unused /etc/nginx/sites-enabled/test file that was being included but not in playbook
  • 15:27 - addshore - ansible-pull now runs as a success again!
  • 15:25 - addshore - update ansible to 1.4.4
  • 15:22 - addshore - ran updatedb
  • 15:05 - addshore - failed_when was added in 1.4 per this; need to update ansible in order for the below to work
  • 15:03 - addshore - ansible-pull >> ERROR: failed_when is not a legal parameter in an Ansible task or handler >> caused by my commit (in the process of fixing now..)
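
A version guard is one way to catch this kind of mismatch before a playbook that uses failed_when is pulled. The sketch below assumes a sort that supports -V (GNU coreutils); the helper name version_ge is hypothetical, and the version strings are passed in as literals rather than parsed from the ansible command.

```shell
#!/bin/sh
# Sketch: compare dotted version strings, e.g. to confirm the
# installed ansible is at least 1.4 (the release the log says
# added failed_when).
set -eu

# version_ge A B: succeeds when version A >= version B.
# Relies on `sort -V` (version sort), a GNU coreutils feature.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="1.4"
for installed in 1.3.2 1.4.4 1.6.2; do
    if version_ge "$installed" "$required"; then
        echo "ansible $installed: failed_when available"
    else
        echo "ansible $installed: too old, update first"
    fi
done
```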

January 13

January 11

January 06

  • 18:50 - addshore - Manual pull to pick up this commit cleaning cronjobs before the next jobqueue run
  • 18:26 - addshore - Manually ran update.php across everything AGAIN as it was needed for the last commit.
  • 18:21 - addshore - Manual pull to pick up another commit fixing more of local settings, John again needs to be slapped!
  • 18:04 - addshore - Run maintenance.php for ALL wikis, I gather some runs have been missed during the whole cron not pulling ansible thing!
  • 17:49 - addshore - Manually stab the job queue. As far as I can tell from this commit the cron should work, but I don't want to wait 10 mins to find out!
  • 17:39 - addshore - Manual ansible pull of this commit to fix broken local settings. John needs to be slapped for this commit
  • 17:25 - addshore - Manual ansible pull
  • 17:22 - addshore - uncomment ansible-pull line from crontab, not sure who has done this... Also change the cron to every 10 mins. As this may have been like this for a while I am bracing for some errors...
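
Commenting the ansible-pull line in and out of the crontab recurs throughout this log. It can be sketched as a pair of small helpers, shown here against a temporary file rather than a live crontab. Everything in the sample crontab is an assumption (the repository URL, the backup job, the helper names disable_pull and enable_pull); only the 10-minute schedule comes from the entry above. `sed -i` is used in its GNU form.

```shell
#!/bin/sh
# Sketch: toggle the ansible-pull cron entry by commenting it
# out and back in. Operates on a temp file; on a real host the
# file would come from `crontab -l` and be loaded with `crontab FILE`.
set -eu

cron=$(mktemp)
trap 'rm -f "$cron"' EXIT

# Hypothetical crontab contents.
cat > "$cron" <<'EOF'
*/10 * * * * ansible-pull -U https://example.invalid/playbook.git
0 3 * * * /usr/local/bin/backup
EOF

# Prefix uncommented ansible-pull lines with '#'.
disable_pull() { sed -i 's|^\([^#].*ansible-pull.*\)$|#\1|' "$1"; }

# Strip the leading '#' from commented ansible-pull lines.
enable_pull() { sed -i 's|^#\(.*ansible-pull.*\)$|\1|' "$1"; }

disable_pull "$cron"
echo "disabled:";   cat "$cron"
enable_pull "$cron"
echo "re-enabled:"; cat "$cron"
```

Other cron entries are left untouched, so the toggle can be run before and after live work without disturbing the rest of the schedule.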

2013

December 14

  • 18:30 Kudu (talk) Installed git from wheezy-backports, updated MediaWiki to 1.22 and ran update.php on all wikis.

December 1

  • 00:24 Kudu (talk) Chmodded /var/log/mediawiki to 770.

November 26

  • 03:12 Kudu (talk) Ran deleteBatch.php on a list of ATT redirects and ran deleteArchivedRevisions.php on All The Tropes.

November 23

  • 23:54 Kudu (talk) Re-chmodded the web directory and the MediaWiki log directory to 770 and set the git core.fileMode configuration option to false in the MediaWiki directory to stop it from messing with permissions.
  • 18:26 Kudu (talk) Ran deleteBatch.php on a list of Troper Tales pages and ran deleteArchivedRevisions.php on All The Tropes.

November 21

  • 00:25 addshore - manually run i18n cache update

November 17

  • 18:54 Addshore - update.php on all wikis
  • 18:48 Addshore - 'git stash' changes on extension/CentralNotice prod2. Not sure why the changes were there but they were stopping ansible from updating the extension

November 12

  • 02:47 Kudu (talk) Imported the new file description pages on ATT and ran deleteOldRevisions.php on the file pages thanks to some SQL/xargs magic.

November 11

  • 23:23 Kudu (talk) Imported the file description pages on ATT and ran deleteArchivedFiles.php and deleteOldRevisions.php on the file pages thanks to some SQL/xargs magic.
  • 22:11 Kudu (talk) Running importImages.php on ATT's missing and numeric images.

November 10

  • 15:42 Kudu (talk) Ran update.php and rebuildTitleKeys.php on all wikis.

November 07

  • 20:44 Addshore - Manually run i18n cache update as the cron isn't working
  • 20:09 Addshore - Reenable cron per the 12 commits just being this....
  • 19:59 Addshore - commenting out ansible-pull from prod2 crontab after somehow pushing 12 unexpected and unknown changes to github...
  • 19:52 Addshore Correct file owners and permissions for allthetropes images directory -R

November 02

  • 12:24 Addshore Rebuilding all caches to reflect file location moves due to upload hostname change

November 01

October 20

September 29

  • 10:11 Kudu (talk) Renamed the database `alleniawikiwiki` to `alleniawiki`.

July 31

  • 22:48 Kudu (talk) Changed MySQL parameters: table_open_cache=2500, thread_cache_size=48. Kudu (talk) 22:48, 31 July 2013 (UTC)
  • 22:37 Kudu (talk) Change MySQL parameters: long_query_time=1, query_cache_size=32M, slow_query_log=1, table_open_cache=400, thread_cache_size=4. Those are preliminary settings, they should be adjusted more carefully eventually.
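
Taken together, the two July 31 entries leave five MySQL settings in effect. The sketch below writes them out as a [mysqld] config fragment. The log does not say whether the values were set at runtime or in a config file, so the fragment form is an assumption, and it is written to a temporary file rather than to any real MySQL config path.

```shell
#!/bin/sh
# Sketch: the end state of the two July 31 entries, expressed
# as a [mysqld] fragment in a temp file (not a live config).
set -eu

fragment=$(mktemp)
trap 'rm -f "$fragment"' EXIT

cat > "$fragment" <<'EOF'
[mysqld]
# From the 22:37 entry (described there as preliminary).
long_query_time   = 1
query_cache_size  = 32M
slow_query_log    = 1
# From the 22:48 entry, replacing the earlier 400 and 4.
table_open_cache  = 2500
thread_cache_size = 48
EOF

cat "$fragment"
```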