Task #3362

Migrate Mentat system to new hardware

Added by Jan Mach over 2 years ago. Updated about 1 year ago.

Status: Closed
Start date: 03/21/2017
Priority: High
Due date:
Assignee: Jan Mach
% Done: 100%
Category: Installation
Target version: 2.0

Description

Overall migration status (periodically updated)

  • (DONE) Install new base server
    • New server mentat-alt.cesnet.cz is ready and installed using Ansible
  • (DONE) Install the Mentat system and the Warden client on the server
    • The development version of the Mentat system is installed on the server using the Debian package system
    • The Warden client is installed on the server and connected to the production instance of the Warden server
  • (IN PROGRESS) Perform data and service migration to new server
    • Prepare database migration scripts
    • Prepare filesystem migration scripts
    • Prepare utility migration scripts
    • Migrate the Mentat service
  • Verify functionality

General guidelines for migration process

  1. On the day before migration, lower the TTL of the relevant DNS records for the mentat-hub.cesnet.cz and mentat-alt.cesnet.cz servers
  2. Presynchronize filesystem data (rsync) so that the actual migration is much quicker later.
    1. report attachments
    2. RRD databases and chart images
    3. cache files
    4. persistent state files
    5. runlog files? (maybe not necessary)
    6. log files? (maybe not necessary)
  3. Shut down the Warden client on the mentat-hub.cesnet.cz and mentat-alt.cesnet.cz servers and let Mentat empty all queues.
  4. Shut down Mentat systems on mentat-hub.cesnet.cz and mentat-alt.cesnet.cz servers
  5. Perform database migration
    1. users
    2. groups
    3. filters
    4. networks
    5. reports
    6. event statistics
  6. Perform filesystem migration (the same data as in step 2)
  7. Perform configuration migration
    1. synchronize content of /etc/mentat configuration directory
  8. Switch the Warden client certificates between mentat-hub.cesnet.cz and mentat-alt.cesnet.cz servers
  9. Switch shibd configuration between mentat-hub.cesnet.cz and mentat-alt.cesnet.cz servers
  10. Switch hostnames and IP addresses between mentat-hub.cesnet.cz and mentat-alt.cesnet.cz servers
  11. Reboot both servers and pray to your favorite god, or as an atheist sit quietly with your hands in your lap
  12. Log in to the new mentat-hub.cesnet.cz and launch everything
    1. Launch Mentat backend services (daemons and scripts)
    2. Launch Warden client and verify messages are being stored into database
    3. Verify that the web interface is accessible
  13. Synchronize the root crontab
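Step 3 asks to let Mentat empty all queues before shutting it down. A minimal sketch of a bounded wait loop, assuming each queue is a spool directory in which every regular file is a pending message; the function name and the directory arguments are hypothetical and not part of Mentat:

```shell
#!/bin/sh
# Hypothetical helper: wait until all given queue directories are empty.
# Assumes pending messages are regular files somewhere below each directory.
# Usage: wait_for_empty_queues TIMEOUT_SECONDS DIR [DIR...]
# Returns 0 once no regular files remain, 1 when the timeout is reached.
wait_for_empty_queues() {
    timeout=$1
    shift
    elapsed=0
    while :; do
        pending=0
        for dir in "$@"; do
            # Count regular files anywhere below this queue directory.
            count=$(find "$dir" -type f | wc -l)
            pending=$((pending + count))
        done
        [ "$pending" -eq 0 ] && return 0
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

It would be called with the spool directories of the actual installation after the Warden client is stopped, and the Mentat shutdown in step 4 would proceed only if it returns 0.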

Event migration might not be necessary. If Mentat runs on the new server for some time without any downtime, we could skip the slow migration of events from MongoDB to PostgreSQL.

Migration process checklist

Trial period before migration

During this period the mentat-alt.cesnet.cz server works as an independent, fully operational instance of the Mentat system, which can be used for testing and development purposes.

  1. (DONE) mentat-alt.cesnet.cz: Install base server.
  2. mentat-alt.cesnet.cz: Configure server monitoring with Nagios.
  3. mentat-alt.cesnet.cz: Configure server backup.
  4. (DONE) mentat-alt.cesnet.cz: Install the development version of the Mentat system. Keep it running and updated during the trial period.
  5. (DONE) mentat-hub.cesnet.cz: Write a script for periodic dumps of MongoDB.
    1. The script is called /root/mentatdb-dump-all.sh.
    2. Verified that the script works properly.
  6. (DONE) mentat-alt.cesnet.cz: Write a script for periodic import of MongoDB dumps from mentat-hub.cesnet.cz.
    1. The script is called /root/mentat-sync-mongodb.sh.
    2. Verified that the script works properly.
  7. (DONE) mentat-hub.cesnet.cz: Install a cronjob for the script /root/mentat-sync-mongodb.sh to periodically test the import process.
    1. Installed with the following root crontab entry (runs at minute 5 of every fourth hour): 5 */4 * * * /root/mentat-sync-mongodb.sh
    2. The script performs a fresh dump using /root/mentatdb-dump-all.sh on mentat-hub.cesnet.cz, fetches the result and imports it into the local MongoDB instance.
    3. Verified that the cronjob works properly.
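The cronjob entries in this checklist can be installed idempotently, so that rerunning a setup step never duplicates them. A minimal sketch that only edits crontab text read from standard input and prints the result; the helper name add_cron_line is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: copy the crontab read from stdin to stdout and
# append LINE only if an identical line is not already present.
add_cron_line() {
    line=$1
    existing=$(cat)
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    # -x: whole-line match, -F: fixed string (the '*' in cron fields is literal)
    if ! printf '%s\n' "$existing" | grep -qxF -- "$line"; then
        printf '%s\n' "$line"
    fi
}

# Example (as root, on the server that should run the job):
#   crontab -l 2>/dev/null | add_cron_line '5 */4 * * * /root/mentat-sync-mongodb.sh' | crontab -
```

Keeping the function a pure text filter means it can be checked on a copy of the crontab before anything is written back with `crontab -`.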

From the day before migration until migration time

After the trial period, mentat-alt.cesnet.cz is prepared for the migration process. All Mentat modules are stopped and data is synchronized to the local filesystem. Only the web interface remains operational to some extent, and it can be used to verify that the migrated data will be accessible.

  1. (DONE) Stop all Warden client daemons.
  2. (DONE) Stop all Mentat modules.
  3. (DONE) mentat-alt.cesnet.cz: Write a script for periodic synchronization of Mentat filesystem data.
    1. The script is called /root/mentat-sync-files.sh.
    2. Verified that the script works properly.
  4. (DONE) mentat-hub.cesnet.cz: Install a cronjob for the script /root/mentat-sync-files.sh to periodically prefetch filesystem data to the target server.
    1. Installed with the following root crontab entry (runs every hour at minute 35): 35 * * * * /root/mentat-sync-files.sh --skip-install
    2. Verified that the cronjob works properly.
  5. (DONE) mentat-alt.cesnet.cz: Prepare the new networking configuration in the file /etc/network/interfaces.new and back up the current settings into the file /etc/network/interfaces.old.
    1. The new networking configuration can be enabled with the command cp /etc/network/interfaces.new /etc/network/interfaces followed by a restart of the networking service.
  6. (DONE) mentat-alt.cesnet.cz: Write a script for quickly renaming the server to a different name.
    1. The script replaces all occurrences of mentat-alt with mentat-hub in a list of selected configuration files.
    2. The script is called /root/system-rename.sh.
    3. Verified that the script works properly.
  7. (DONE) mentat-alt.cesnet.cz: Write a script for quickly switching the most important configurations.
    1. It sends various configuration files to the source server and fetches the corresponding ones from it.
    2. This covers configurations such as server certificates, Shibboleth configurations, Warden client configurations, etc.
    3. The script is called /root/mentat-sync-config.sh.
    4. Verified that the script works properly.
  8. (DONE) Perform an initial database migration, so that the final migration is more efficient and less time-consuming.
    1. Fetched the current database with /root/mentat-sync-mongodb.sh.
    2. Launched the migration with /etc/mentat/scripts/sqldb-migrate-data.py --drop in a tmux terminal.
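The renaming script from step 6 is essentially a search-and-replace over a fixed list of files. A minimal sketch of that idea, not the actual /root/system-rename.sh; the function name is hypothetical, and because the real list of configuration files is not given here, the file names in the example are placeholders:

```shell
#!/bin/sh
# Hypothetical sketch of the idea behind /root/system-rename.sh:
# replace every occurrence of OLD with NEW in each listed file and keep
# a .bak copy of the original (-i.bak works with GNU and BSD sed).
# OLD and NEW must not contain '/' or regex metacharacters.
rename_in_files() {
    old=$1
    new=$2
    shift 2
    for file in "$@"; do
        if [ -f "$file" ]; then
            sed -i.bak "s/${old}/${new}/g" "$file"
        else
            echo "skipping missing file: $file" >&2
        fi
    done
}

# Example (placeholder file list):
#   rename_in_files mentat-alt mentat-hub /etc/hostname /etc/hosts
```

Replacing the short name also turns mentat-alt.cesnet.cz into mentat-hub.cesnet.cz, and the .bak copies allow a quick rollback if the switch has to be reverted.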

Actual migration process

The migration was initiated at 2018-07-27T13:00:00+0200, so all timestamp values below are relative to that date.

Disable the utility migration scripts installed earlier, so that they do not interfere with the migration process.
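One way to disable the cronjobs installed earlier without losing them is to comment them out. A minimal sketch that filters crontab text; the helper name disable_cron_jobs is hypothetical, and the pattern /root/mentat-sync- is chosen to match both the mentat-sync-mongodb.sh and the mentat-sync-files.sh entries:

```shell
#!/bin/sh
# Hypothetical helper: read a crontab on stdin and comment out every
# active line containing PATTERN (matched as a fixed string).
disable_cron_jobs() {
    pattern=$1
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            '#'*) printf '%s\n' "$line" ;;           # already a comment
            *"$pattern"*) printf '#%s\n' "$line" ;; # disable matching job
            *) printf '%s\n' "$line" ;;             # keep everything else
        esac
    done
}

# Example (as root on mentat-alt.cesnet.cz):
#   crontab -l | disable_cron_jobs /root/mentat-sync- | crontab -
```

Commenting out rather than deleting keeps the exact entries at hand in case the prefetching has to be resumed.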

root@mentat-alt$ /root/mentat-sync-config.sh
root@mentat-alt$ /root/mentat-sync-files.sh
root@mentat-alt$ /root/mentat-sync-mongodb.sh
root@mentat-alt$ /etc/mentat/scripts/sqldb-migrate-data.py --clear --from-timestamp 1532304000

2018-07-24 13:59:48,096 sqldb-migrate-data.py INFO: Data migration results:
2018-07-24 13:59:48,096 sqldb-migrate-data.py INFO: --------------------------------------------------
2018-07-24 13:59:48,102 sqldb-migrate-data.py INFO: User count:                       133
2018-07-24 13:59:48,105 sqldb-migrate-data.py INFO: Group count:                      295
2018-07-24 13:59:48,107 sqldb-migrate-data.py INFO: Network count:                  1,833
2018-07-24 13:59:48,110 sqldb-migrate-data.py INFO: Filter count:                      55
2018-07-24 13:59:48,114 sqldb-migrate-data.py INFO: Setting count:                    295
2018-07-24 13:59:48,157 sqldb-migrate-data.py INFO: Event reports count:          163,900
2018-07-24 13:59:48,240 sqldb-migrate-data.py INFO: Event stats count:            375,573
2018-07-24 13:59:48,240 sqldb-migrate-data.py INFO: --------------------------------------------------
2018-07-24 13:59:48,240 sqldb-migrate-data.py INFO: Migration started at:  2018-07-24 13:15:49.078338
2018-07-24 13:59:48,240 sqldb-migrate-data.py INFO: Migration finished at: 2018-07-24 13:59:48.096645
2018-07-24 13:59:48,240 sqldb-migrate-data.py INFO: Migration duration:    0:43:59.018307
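The --from-timestamp argument takes a Unix epoch value in UTC (see revision ece03c44). The value 1532304000 used above is 2018-07-23 00:00:00 UTC, the day before the run shown in the log. A small sketch with GNU date for computing such a value and converting it back; the function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers around GNU date for the --from-timestamp value.

# Print the Unix epoch of midnight UTC for a YYYY-MM-DD date.
day_to_epoch() {
    date -u -d "$1 00:00:00" +%s
}

# Print an epoch value as an ISO 8601 timestamp in UTC.
epoch_to_iso() {
    date -u -d "@$1" +%Y-%m-%dT%H:%M:%SZ
}

# Example: migrate reports and statistics starting from 2018-07-23 UTC.
#   /etc/mentat/scripts/sqldb-migrate-data.py --clear --from-timestamp "$(day_to_epoch 2018-07-23)"
```

Computing the value instead of typing it avoids off-by-one-day mistakes caused by the local +0200 offset.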

root@mentat-alt$ psql -f mentat-tweakdb.sql mentat_main

root@mentat-alt$ /root/system-rename.sh

Configure all Mentat modules by comparing their configuration files with those from the old production server.
Configure all Warden modules and switch the Warden client certificates.
Reconfigure the IP address settings to the values of mentat-hub.cesnet.cz.
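Reconfiguring the IP addresses amounts to activating the interfaces.new file prepared before the migration. A minimal sketch, assuming the Debian ifupdown layout under /etc/network; the function name is hypothetical, and the directory is a parameter so the step can be tried on a scratch copy first:

```shell
#!/bin/sh
# Hypothetical helper: activate a prepared interfaces.new file and keep
# the current configuration as interfaces.old.
# NETDIR defaults to the Debian ifupdown location /etc/network.
switch_interfaces() {
    netdir=${1:-/etc/network}
    if [ ! -f "$netdir/interfaces.new" ]; then
        echo "missing $netdir/interfaces.new" >&2
        return 1
    fi
    cp -p "$netdir/interfaces" "$netdir/interfaces.old"
    cp -p "$netdir/interfaces.new" "$netdir/interfaces"
}

# The new addresses take effect after restarting networking or, as in
# this procedure, after the reboot that follows.
```

Refusing to run when interfaces.new is missing prevents overwriting a working configuration with nothing.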

root@mentat-alt$ reboot

After rebooting, bring the whole system back up:

root@mentat-alt$ mentat-controller.py --command start
root@mentat-alt$ mentat-controller.py --command enable
root@mentat-alt$ /etc/init.d/warden_filer_receiver start
root@mentat-alt$ /etc/init.d/warden_filer_sender start
root@mentat-alt$ update-rc.d warden_filer_receiver defaults
root@mentat-alt$ update-rc.d warden_filer_sender defaults
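Following the hint in the task history (#6), the backend can be brought up first and the data inflow enabled last. A sketch of that ordering using exactly the commands listed above, with a dry-run switch so the sequence can be reviewed before running it; the DRY_RUN variable and the function names are hypothetical:

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: with DRY_RUN set to a non-empty value,
# print each command instead of executing it; otherwise run it and abort
# the whole script on the first failure.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@" || { echo "failed: $*" >&2; exit 1; }
    fi
}

bring_up() {
    # Mentat backend first (commands as listed above).
    run mentat-controller.py --command start
    run mentat-controller.py --command enable
    # Data inflow last: start the Warden filers, then register their
    # init scripts for the default runlevels (Debian update-rc.d).
    run /etc/init.d/warden_filer_receiver start
    run /etc/init.d/warden_filer_sender start
    run update-rc.d warden_filer_receiver defaults
    run update-rc.d warden_filer_sender defaults
}

# Review first, then run for real:
#   DRY_RUN=1 bring_up
#   bring_up
```

Checking that the backend is healthy before the Warden filers start, as the hint suggests, remains a manual step between the two groups.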


Related issues

Related to Mentat - Task #3752: Migration from MongoDB to PostgreSQL Closed 10/10/2017
Related to Mentat - Task #3734: Migrate Hawat web user interface from Perl-based to Python-... Closed
Related to Mentat - Task #3374: Migrate all core modules from legacy Mentat Closed 03/21/2017
Related to Mentat - Task #4210: Release and deploy Mentat package version 2.0 Closed 07/27/2018 07/30/2018

Associated revisions

Revision ece03c44
Added by Jan Mach about 1 year ago

Further improvements in the MongoDB → PostgreSQL migration script.

  • Replaced simple prints with the Python logging framework.
  • Added a command-line argument to start the migration from a given UTC timestamp (reports and statistics).
  • Added a command-line argument for better ignoring of duplicates.
  • Improved output text for better readability.

(Redmine issue: #3362)

History

#1 Updated by Jan Mach over 1 year ago

  • Status changed from New to In Progress
  • Priority changed from Low to High

The development version of the Mentat system is installed on the new hardware. It is currently being used for debugging and testing before a new stable version is released. The database and filesystem migration scripts are ready, but might need one more revision.

#2 Updated by Jan Mach over 1 year ago

  • Related to Task #3752: Migration from MongoDB to PostgreSQL added

#3 Updated by Jan Mach over 1 year ago

  • Related to Task #3734: Migrate Hawat web user interface from Perl-based to Python-based Mentat framework added

#4 Updated by Jan Mach over 1 year ago

  • Related to Task #3374: Migrate all core modules from legacy Mentat added

#5 Updated by Jan Mach over 1 year ago

  • Description updated (diff)
  • Status changed from In Progress to Feedback
  • Assignee changed from Jan Mach to Pavel Kácha
  • % Done changed from 0 to 30

#6 Updated by Pavel Kácha over 1 year ago

  • Assignee changed from Pavel Kácha to Jan Mach

The actual process of migration will be done according to the following checklist:

Hint: Set short (~minutes) TTL on all related A/AAAA/CNAME/PTR RRs.

  1. Presynchronize filesystem data (rsync), so that the actual migration will be much quicker.

Except db perhaps?

  2. Shut down Mentat and Warden systems on mentat-hub.cesnet.cz and mentat-alt.cesnet.cz servers

Hint: Disable the automatic start of anything that makes state changes: warden-filer, cron scripts, automatic downloads, etc.
Hint: Also disable the automatic start of Mentat itself.

  3. Perform database migration

So a real migration of the data, or just run with a month of already saved data? (No hard opinion here; we can import older data later if we find it important.)

  4. Perform filesystem migration

rsync again? Or do you mean something else?

  5. Perform configuration migration
  6. Switch warden client certificates between mentat-hub.cesnet.cz and mentat-alt.cesnet.cz servers
  7. Switch hostnames and IP addresses between mentat-hub.cesnet.cz and mentat-alt.cesnet.cz servers
  8. Reboot both servers and pray to your favorite god, or as an atheist sit quietly with your hands in your lap

Ph’nglui mglw’nafh Cthulhu R’lyeh wgah’nagl fhtagn!

  9. Login to new mentat-hub.cesnet.cz, launch all services

Hint: If only the basic system starts automatically, the daemons can be started and tested by hand from the end of the pipeline (starting with storage), and the data inflow (warden-filer) and disruptive scripts can be started only once everything has been checked as OK.

#7 Updated by Jan Mach over 1 year ago

  • Description updated (diff)
  • Status changed from Feedback to In Progress

#8 Updated by REST Automat Admin about 1 year ago

  • Description updated (diff)

#9 Updated by REST Automat Admin about 1 year ago

Remarks regarding database migration

MongoDB database dump on server mentat-hub.cesnet.cz:

  • /root/mentatdb-dump-all.sh (dump script for Mentat databases)

MongoDB database restore on server mentat-alt.cesnet.cz:

  • /root/mentat-sync-db.sh (executed regularly at 8am by cron to verify functionality)

MongoDB → PostgreSQL database migration on server mentat-alt.cesnet.cz:

  • /etc/mentat/scripts/sqldb-migrate-data.py (migrate metadata database containing users, groups, reports, statistics, etc.)
  • /etc/mentat/scripts/sqldb-migrate-events.py (migrate IDEA messages, might not be necessary)

At this point database migration should be ready.
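The dump, fetch and restore chain described above can be sketched with the standard MongoDB tools mongodump and mongorestore. This is not the content of the actual scripts: it calls mongodump directly instead of /root/mentatdb-dump-all.sh, whose output location is not documented here, and SOURCE_HOST, DUMP_DIR, the SSH/rsync transport and the function names are assumptions. With DRY_RUN set, the commands are only printed:

```shell
#!/bin/sh
# Hypothetical sketch of a MongoDB dump -> fetch -> restore chain, run on
# the target server. SOURCE_HOST and DUMP_DIR are assumptions.
SOURCE_HOST=${SOURCE_HOST:-mentat-hub.cesnet.cz}
DUMP_DIR=${DUMP_DIR:-/var/tmp/mentat-mongodump}

# Print commands when DRY_RUN is non-empty, otherwise run them and abort
# on the first failure.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@" || { echo "failed: $*" >&2; exit 1; }
    fi
}

sync_mongodb() {
    # 1. Fresh dump of all databases on the source server.
    run ssh "root@$SOURCE_HOST" mongodump --out "$DUMP_DIR"
    # 2. Fetch the dump directory to the same path locally.
    run rsync --archive --delete "root@$SOURCE_HOST:$DUMP_DIR/" "$DUMP_DIR/"
    # 3. Restore locally, dropping each collection before restoring it.
    run mongorestore --drop "$DUMP_DIR"
}

# Example:
#   DRY_RUN=1 sync_mongodb
```

The --drop option makes repeated imports replace the previous copy instead of piling duplicates on top of it, which matters when the import is tested periodically.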

Remarks regarding data migration

Migrate data:

  • rsync --archive --update --delete --progress /var/mentat root@target:/var

Clean up runlogs and logs (they might cause issues with the new version):

  • find /var/mentat/log -name '*.log*' -delete
  • find /var/mentat/run -name '*.runlog' -delete
  • find /var/mentat/run -name '*.pstate' -delete
  • find /var/mentat/run -name '*.state' -delete
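The same cleanup as a reusable function, with the base directory as a parameter so that it can be tried on a copy first. The function name is hypothetical; -type f is added so that only regular files are deleted, and each pattern is quoted so the shell does not expand it before find sees it:

```shell
#!/bin/sh
# Hypothetical helper: remove stale log, runlog and state files below
# BASE (default /var/mentat), covering the same patterns as the list above.
cleanup_mentat_state() {
    base=${1:-/var/mentat}
    find "$base/log" -type f -name '*.log*' -delete
    find "$base/run" -type f \( -name '*.runlog' -o -name '*.pstate' -o -name '*.state' \) -delete
}

# Example (on the target server, after the rsync above):
#   cleanup_mentat_state /var/mentat
```

Grouping the three run-directory patterns with -o keeps it to a single traversal of /var/mentat/run.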

#10 Updated by Pavel Kácha about 1 year ago

REST Automat Admin wrote:

At this point database migration should be ready.

"Should" implies that it might not be. What if something goes awry?

Clean up runlogs and logs (they might cause issues with the new version):

What issue? Something critical?

#11 Updated by Jan Mach about 1 year ago

Pavel Kácha wrote:

REST Automat Admin wrote:

At this point database migration should be ready.

"Should" implies that it might not be. What if something goes awry?

You can never be 100% sure. I have tested it many, many times, so the "should" is as close to "will" as possible.

Clean up runlogs and logs (they might cause issues with the new version):

What issue? Something critical?

Some modules have additional runlog attributes. Everything is written with backwards compatibility in mind, but some really old runlogs could cause problems. However, these problems would only show up when evaluating runlogs using the --action=runlogs-evaluate module action. So these possible problems are not critical; they would just make the developer look bad.

#12 Updated by Jan Mach about 1 year ago

New Mentat installation guide in the official documentation:

https://alchemist.cesnet.cz/mentat/doc/development/html/_doclib/installation.html

New Mentat migration guide in the official documentation:

https://alchemist.cesnet.cz/mentat/doc/development/html/_doclib/migration.html

New Mentat reporting guide in the official documentation:

https://alchemist.cesnet.cz/mentat/doc/development/html/_doclib/reporting.html

#13 Updated by Jan Mach about 1 year ago

  • Description updated (diff)

#14 Updated by Jan Mach about 1 year ago

  • Description updated (diff)

#15 Updated by Jan Mach about 1 year ago

  • Description updated (diff)

#16 Updated by Jan Mach about 1 year ago

  • Description updated (diff)

#17 Updated by Jan Mach about 1 year ago

  • Description updated (diff)
  • Category changed from Installation to Documentation

#18 Updated by Jan Mach about 1 year ago

  • Description updated (diff)

#19 Updated by Jan Mach about 1 year ago

  • Description updated (diff)

#20 Updated by Jan Mach about 1 year ago

  • Category changed from Documentation to Installation
  • Status changed from In Progress to Feedback
  • % Done changed from 30 to 100

The migration was successfully performed on 24.7.2018. Waiting for feedback from users before closing it as successful.

#21 Updated by Jan Mach about 1 year ago

  • Related to Task #4210: Release and deploy Mentat package version 2.0 added

#22 Updated by Jan Mach about 1 year ago

  • Status changed from Feedback to In Progress
  • All Ansible roles related to Mentat server management were improved and polished.
  • The automated build system Alchemist received a big overhaul and is now back online. It builds packages for the newly introduced release suite, which sits somewhere between development and production. This will enable us to test the Mentat code in our production environment before releasing it as true production-level code.
  • I am now waiting for confirmation from the manager of our Nagios-based monitoring system that he has updated the monitoring configuration according to the new requirements.

#23 Updated by Jan Mach about 1 year ago

  • Status changed from In Progress to Closed

Migration complete; all Nagios monitoring scripts are fixed, up and running. Closing the issue as resolved. This also completes the work on version 2.0.
