User Details
- User Since
- Jan 23 2023, 12:05 PM (94 w, 3 d)
- Availability
- Available
- LDAP User
- EoghanGaffney
- MediaWiki User
- EGaffney-WMF
Tue, Nov 5
Thu, Oct 24
We've been going back and forth with znuny about this. They have a few suggestions, which I'm tracking here to see what we can try:
Mon, Oct 21
Silenced these alerts for 48 hours.
Oct 4 2024
The file is cleaned up, closing.
Oct 3 2024
Oct 1 2024
I commented out the extra lines and rebooted; the host came back up and mail is flowing as expected.
Sep 26 2024
208.80.154.21/32 and 2620:0:861:1:208:80:154:21/128 are both from the linked puppet change, but the third address, 2620:0:861:1:208:80:154:81/64, isn't. This was generated somewhere else.
Sep 24 2024
Sep 17 2024
This is no longer an issue
Sep 9 2024
This will mask the service so that it cannot start inadvertently, which should stop these alarms.
Sep 6 2024
This was an after-effect of rebooting the host.
This was due to the other list host rebooting.
Sep 5 2024
This was caused by me restarting the host for T373980.
Yeah, that's right -- we moved from ferm to nftables, but then reverted because of T373637. I'll take a look at the cleanup later this afternoon.
@fgiunchedi pointed out that there was still some free space in the VG, so the volume could be expanded instead. There's approximately 90G of unallocated space on the disk, and since the mailman2 data will never grow (the newest file in that directory is from 2001), this should give us sufficient headroom for logrotate to take care of the rest.
Sep 3 2024
Aug 23 2024
I've changed the backup script to tolerate a failure in the prometheus-pushgateway, so this can be closed.
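A minimal sketch of what tolerating a pushgateway failure could look like, assuming a Python backup script. The real script is not shown here, so the names `run_backup` and `push_metric`, the metric name, and the gateway URL are all hypothetical.

```python
import logging
import urllib.request

logger = logging.getLogger("backup")

# Hypothetical pushgateway endpoint; the real URL is not given in this task.
PUSHGATEWAY_URL = "http://pushgateway.example.org:9091/metrics/job/backup"


def push_metric(body: str, url: str = PUSHGATEWAY_URL, timeout: float = 5.0) -> None:
    """POST a metric in Prometheus text format to the pushgateway."""
    request = urllib.request.Request(url, data=body.encode("utf-8"), method="POST")
    with urllib.request.urlopen(request, timeout=timeout):
        pass


def run_backup(do_backup, push=push_metric) -> bool:
    """Run the backup, then report success; a failed push is logged, not fatal."""
    do_backup()  # a failure here still propagates: the backup itself must succeed
    try:
        push("backup_last_success 1\n")  # hypothetical metric name
    except OSError as exc:  # URLError, HTTPError and socket timeouts are all OSError
        logger.warning("could not push metric, continuing: %s", exc)
        return False
    return True
```

The point of the design is that only the metrics push is wrapped in the `try`, so a genuine backup failure still surfaces while an unreachable pushgateway only produces a warning.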
Aug 22 2024
This seems to have been a blip that hasn't reoccurred.
Aug 16 2024
I've added the user to the wmf group. @dchan, I'm going to close this now, let me know if anything seems missing!
Hi @odimitrijevic, could you please look at this as an approver for the analytics-privatedata-users group? Thanks!
Aug 15 2024
Confirmed working!
Aug 13 2024
Jul 25 2024
Jul 9 2024
lists1001 has been decommissioned and all current hosts are running bookworm.
This was mostly a brain-dump just before I left on PTO, so we're going to close this in favour of some of the other, more detailed tasks, namely T278495: Figure out plan for mailman IP situation and T286066: Put lists.wikimedia.org web interface behind LVS.
Jul 4 2024
lists1003 doesn't exist anymore, so this can probably be closed.
Jul 2 2024
I've spoken with @Ladsgroup, and I think there's nothing immediate for sre-collab to do here, so I'm reassigning. Feel free to send it back to me if that changes!
I think we can close this, since the puppet module now installs mailman3 on lists2001 (albeit disabled), unless I'm missing something.
lists1001 has been powered off. It will stay off for 1 week, and then I'll decommission it fully on Tuesday, 9th July. After that we can close this ticket.
This was fixed by the patch merged on June 20th.
Puppet has been re-enabled and run successfully, sorry for the noise
Jun 21 2024
The migration to the new host is done. The last remaining item before we can close this ticket is to decommission the old host. We're going to keep that around for two weeks after the migration, which will be Tuesday 2nd July. The host will be shut down on that date, and decommissioned on the Tuesday after.
Jun 20 2024
The patch for this was merged, so this should no longer be an issue.
Yep!
Jun 19 2024
The maintenance was completed yesterday and so far the service seems stable. I'm going to close this now, and we can re-open if we come across any issues.
Jun 18 2024
That's right -- we'll be doing that as part of the maintenance work later today. We kept them firewalled off so that the non-active host isn't writing to the database at the same time as the active host. In the future it might make more sense to allow all hosts access, but with a read/write user for the active host and a read-only user for the non-active one.
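As a rough illustration of the read/write versus read-only idea, here is a sketch that builds MySQL-style `GRANT` statements for a set of hosts. The database name, user names, and host names are hypothetical placeholders, the accounts are assumed to exist already, and the privilege list is only an example of what a read/write user might need.

```python
def grant_statements(database, hosts, active_host,
                     rw_user="mailman_rw", ro_user="mailman_ro"):
    """Build GRANT statements: read/write for the active host, read-only for the rest."""
    statements = []
    for host in hosts:
        if host == active_host:
            privileges, user = "SELECT, INSERT, UPDATE, DELETE", rw_user
        else:
            privileges, user = "SELECT", ro_user
        statements.append(
            f"GRANT {privileges} ON `{database}`.* TO '{user}'@'{host}';"
        )
    return statements


# Hypothetical host names, for illustration only.
for statement in grant_statements(
    "mailman3", ["active.example.org", "standby.example.org"], "active.example.org"
):
    print(statement)
```

With this split, a failover would mean swapping which host holds the read/write grant, rather than opening and closing firewall rules.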
Jun 17 2024
It's possible that the grants are already covered by the proxies listed here, but it would be good to check before we start our migration
Jun 14 2024
I've created a sub-task for the migration itself so users and community members can follow it more easily, rather than trawling through comments and patch notifications. It's been tagged with User-notice so it ends up on tech news. The downtime will be on Tuesday 18th from 10-12 UTC.
Jun 10 2024
This was due to the apt package starting the service (and failing) despite the puppet recipe being set to ensure => stopped.
Jun 6 2024
The rough outline for migration is:
May 31 2024
May 30 2024
I've also run the upgrade on gitlab1004, leaving only the primary (gitlab2002) to be upgraded.
May 28 2024
I ran the test upgrade (sudo gitlab-ctl pg-upgrade) this afternoon on gitlab1003, and it succeeded. It took 1m51s in total and generated no error messages.
This looks to be happening when the host is restarted for a backup/restore.
May 27 2024
More details found in T365781
The lists alert was me deploying puppet changes, with more details in T365698
This happened as part of the deployment of mailman3 on the new hosts. It fails because there are firewall rules stopping the new hosts from communicating with the mysql db. The failure occurs when the package is installed and the package installer tries to start the service, but since we want the service to be stopped on the host, we can tolerate this failure temporarily.