Task #3752

Migration from MongoDB to PostgreSQL

Added by Jan Mach almost 2 years ago. Updated about 1 year ago.

Status:ClosedStart date:10/10/2017
Priority:NormalDue date:
Assignee:Jan Mach% Done:

100%

Category:Development - Core
Target version:2.0

Description

After many months of rigorous testing we have finally decided to migrate from MongoDB to PostgreSQL. The MongoDB performance and long response times are decreasing the usability of the web interface and even of the reporting subsystem. We will now abandon the MongoDB in favor of PostgreSQL.

The migration will occur in following phases:
  1. Define data models for utility objects (users, groups, etc.)
  2. Implement new storage service module for utility object management.
  3. Rewrite all current code working with utility objects.
  4. Implement new storage service for IDEA event management (maybe it will be possible to use the same service as above, maybe there will have to be different one, highly optimized and with lower overhead).
  5. Rewrite all current code working with IDEA messages through persistent database storage.

Related issues

Related to Mentat - Task #3362: Migrate Mentat system to new hardware Closed 03/21/2017
Related to Mentat - Bug #2808: Python and batch_size constraint Closed 03/01/2016

Associated revisions

Revision 9efd34c7
Added by Jan Mach almost 2 years ago

PostgreSQL migration: Implemented models of main datatypes for PostgreSQL.

Following models were implemented using the awesome SQLAlchemy library:

  • UserModel
  • GroupModel
  • NetworkModel
  • FilterModel
  • SettingsReportingModel

(Redmine issue: #3752)

Revision a3bddee3
Added by Jan Mach almost 2 years ago

PostgreSQL migration: Implemented prototype of SQL storage service to be used both in backend and web GUI modules.

(Redmine issue: #3752)

Revision 88d1edbf
Added by Jan Mach almost 2 years ago

PostgreSQL migration: Implemented prototype script for MongoDB to PostgreSQL data migration.

Script is capable of following data migrations:

  • user accounts
  • group account
  • network records
  • reporting filters
  • reporting settings

(Redmine issue: #3752)

Revision 7de3ad42
Added by Jan Mach almost 2 years ago

PostgreSQL migration: Reimplemented DBAL, authentication layer and user management modules in Hawat.

Base on the previous work the DBAL in Hawat was reimplemented to use the new mentat.services.sqlstorage module and sqlalchemy backend. The authentication layer was rewritten to use the new dabase interface. The user management plugin was rewritten to use new database interface.

(Redmine issue: #3752,#3734)

Revision 24a68103
Added by Jan Mach almost 2 years ago

Implemented prototype SQL-based whois service.

(Redmine issue: #3752)

Revision 9699402f
Added by Jan Mach almost 2 years ago

Reimplemented the whois service with the SQL backend.

The old whois service was using MongoDB as data backend. Old features were permanently removed and service was reimplemented with SQL backend. (Redmine issue: #3752)

Revision 430b0ee0
Added by Jan Mach over 1 year ago

Added time tracking to data and event migration scripts.

(Redmine issue: #3752)

Revision fc248306
Added by Jan Mach over 1 year ago

Fix: Fixed cascade configuration in Group - ReportingSettings relationship.

(Redmine issue: #3752)

Revision 01681c23
Added by Jan Mach over 1 year ago

Reimplemented mentat-netmngr.py to work with PostgreSQL instead of MongoDB.

This commit also contains following related improvements:

  • documentation enhancements
  • configuration file cleanup
  • runlog enhancement, better information about items created and deleted
  • code simplification, the status and update command now share the same code
  • coding style improvements using pylint tool

(Redmine issue: #3752)

Revision 32e3d3cb
Added by Jan Mach over 1 year ago

Reimplemented the mentat-backup.py module to work with PostgreSQL instead of MongoDB.

This commit contains also following updates:

  • improved module documentation
  • improved module coding style with pylint tool
  • improved configurability by putting remote_host and remote_dir to configuration options

(Redmine issue: #3752)

Revision 4dc9b0fb
Added by Jan Mach over 1 year ago

Changed PostgreSQL database model for IDEA events.

This commit contains following new features:

  • The PostgreSQL model for IDEA events now contains '(Source|Target|Node).Type’ and 'Description’ attributes. These are now represented as separate IDEA event table columns to enable better searching. This means, that the table schema changed !!!
  • The migration script and all necessary underlying functions had to be updated accordingly.
  • Minor facelift in output of migration script.

(Redmine issue: #3752)

Revision 2576a5b3
Added by Jan Mach over 1 year ago

Fix: Fixed bug in sqldb-migrate-events.py migration script.

(Redmine issue: #3752)

Revision 984b51e1
Added by Jan Mach over 1 year ago

Minor code and output improvements in sqldb-migrate-events.py migration script.

(Redmine issue: #3752)

Revision d0b0a2f2
Added by Jan Mach over 1 year ago

Next iteration in IDEA event search form design.

  • Added the ability to search according to the 'source|target|detector types’ and 'description’.
  • Added the ability to search according to the host address|ports|types. Host is an alias for Source OR Target.
  • Minor design improvements in main menu and other page widgets.
  • Updated mentat-precache.py according to changes mentioned above.
  • Added new 'allow_empty’ feature to mentat-precache.py.

(Redmine issue: #3752,#3734)

Revision 7c979b02
Added by Jan Mach over 1 year ago

Changed PostgreSQL data model for IDEA events.

Added new columns for '_CESNET.EventClass’, '_CESNET.EventSeverity’ and '_CESNET.InspectionErrors’ IDEA message attributes. (Redmine issue: #3752)

Revision 96582cd0
Added by Jan Mach over 1 year ago

Further minor improvements in sqldb-migrate-events.py script.

Improved documentation and output of the script. (Redmine issue: #3752)

Revision 778b3ef6
Added by Jan Mach over 1 year ago

Next iteration in IDEA event search form, result page and detail page design.

This commit contains following bulk changes:

  • The search form was extended to include event class and event severity search options.
  • Fields in event search form were reordered to more logical groups.
  • Event field activation buttons were grouped into toolbar.
  • Unified display of item search button inside event result table.
  • Improved background HTML and JS code for event searc result table.

(Redmine issue: #3752)

Revision e8d57fd8
Added by Jan Mach over 1 year ago

Improvements in MongoDB to PostgreSQL migration scripts.

Improved internal code of both migration scripts, lots of code was moved to shared libraries, improved documentation, coding style and command output. (Redmine issue: #3752)

Revision f9038885
Added by Jan Mach over 1 year ago

Polished design and implementation of whois and geoip Hawat pluggable modules.

(Redmine issue: #3752)

Revision beba2ff5
Added by Jan Mach over 1 year ago

Rewritten the mentat-cleanup.py module to work with PostgreSQL instead of MongoDB.

Part of the work was extending the eventstorage service to enable parametrized deletion of multiple events similar to searching. (Redmine issue: #3381,#3752)

Revision 362ad620
Added by Jan Mach over 1 year ago

Improved data migration script.

The data migration script was enhanced for quick and little bit dirty fix of invalid string character encoding when converting records from MongoDB to new PostgreSQL representation. The old implementation contained a bug and some strings were not stored in correct UTF-8 encoding. After this patch everything should be UTF-8 in PostgreSQL tables. (Redmine issue: #3752)

Revision fc632e4f
Added by Jan Mach over 1 year ago

CHANGE: Changed the datatype for storing IDEA messages within PostgreSQL database.

The JSONB datatype is not capable of storing IDEA messages containing null characters anywhere in the content. This is a feature, that will not be fixed anytime soon, so we had to change the datatype for storing IDEA messages from JSONB to BYTEA. We are losing the ability to use PostgreSQL tools for working with JSON structures. There is however no other way of solving this issue, because the IDEA messages are permitted to containg binary data. (Redmine issue: #3752)

Revision 06c7adb5
Added by Jan Mach over 1 year ago

Implemented new 'enabled’ attribute of user and group objects.

Instead of deleting them entirelly, in many cases it might be better just to mark user accounts and groups as disabled. This commit contains following relevant changes:
  • Updates in SQL data model for users and groups
  • New base views, base form and form templates for generic changes of items
  • Support for enable and disable actions in item forms and views (show,list)
  • Support for enabled/disabled accounts in authentication controllers (disabled accounts will not permit login)

(Redmine issue: #3752)

Revision 9da8ae5b
Added by Jan Mach over 1 year ago

Fix: Fixed the SQL data model to enable cascading deletes into the association tables.

It was not possible to clear the user accounts or groups from the tables due to the reference integrity constraints. This patch fixes that and adds '--clear’ option to data migration script. (Redmine issue: #3752)

Revision b8accc98
Added by Jan Mach over 1 year ago

Fix: Fixed the bug in searching for port numbers using event search form.

The PostgreSQL database complained, that it is unable to serach for integers using list of strings. Added explicit conversion from list of strings (received by user form) to list of integers. (Redmine issue: #3752,#3734)

Revision ec8a1cc1
Added by Jan Mach over 1 year ago

Changed sparse index for cesnet_resolvedabuses column for better performance.

(Redmine issue: #3752)

Revision fb689edf
Added by Jan Mach about 1 year ago

Fix: Fixed invalid cascading rules for database objects.

The cascading rules were invalid, for example SQLAlchemy prevented from deleting users with group memberships. Also cascading rules were defined on user*group association tables, which resulted in deleting all members and managers when group was deleted. It should be fixed now and unit tests were written to ensure that. (Redmine issue: #3752)

Revision 397479a5
Added by Jan Mach about 1 year ago

Improved exception propagation from underlying psycopg2 library through eventstorage library.

Internal psycopg2 exceptions are converted into custom exception classess to better encapsulate the actual implementation. (Redmine issue: #3752)

Revision 6e3b85db
Added by Jan Mach about 1 year ago

Fix: The IP4R PostgreSQL extension does not recognize certain network addressess.

User was presented with HTTP 500 error page without knowing what is actually wrong. Because the error is encountered at database level and the value passess form validation, it was necessary to analyze the exception thrown by database and present user with meaningfull message. For example 195.113.134.0/14 is not a valid network address from the point of view of IP4R extension, because there are more 1s than necessary and the extension does no adjusting on its own. This pathc solves only the error presentation to the user, actual solution stil has to be implemented. (Redmine issue: #3752,#3734)

Revision 217d20c6
Added by Jan Mach about 1 year ago

Improved conversion of 'enabled’ attribute for user accounts.

The 'enabled’ attribute is now set True or False depending on the presence of role 'user’ in the list of active user roles. (Redmine issue: #3752)

Revision 56eb954e
Added by Jan Mach about 1 year ago

Tweaked SQL database migration script to be able to delete reports table separatelly.

TODO: Truncation would be much faster. (Redmine issue: #3752)

E

History

#1 Updated by Pavel Kácha almost 2 years ago

Notes for future production migration:
  • both db engines on the same machine
  • user/group management + relevant data – can be migrated independently of event db
    • then management GUI
    • we will need migration script, which will allow reporter/statistician use Perl DB until also rewritten
  • statistician
    • simple incremental conversion script from Mongo to PQ + GUI, which already looks into PQ
    • then statistician can be written in python, switched and conversion script ditched
  • reporter
    • simple incremental conversion script from Mongo to PQ + GUI, which already looks into PQ
    • then reporter can be written in python, switched and conversion script ditched
  • incoming events forked into both old (Mongo) storage and new (PQ) storage daemons (events go simultaneously into both DBs)
    • then GUI can be written independently

#2 Updated by Jan Mach over 1 year ago

  • Related to Task #3362: Migrate Mentat system to new hardware added

#3 Updated by Jan Mach about 1 year ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

On tuesday 24.7. 2018 we have successfully migrated our production instance of Mentat system to version of Mentat system that is completely written in Python and uses PostgreSQL as database backend. Therefore this task can be considered complete.

#4 Updated by Jan Mach about 1 year ago

  • Related to Bug #2808: Python and batch_size constraint added

Also available in: Atom PDF