Feature #4273
Consider/choose/implement different communication protocol
Status: open, 0% done
Description
The filer communication protocol serves well and is simple enough. However, it has its limitations, and it might make sense to pursue a different direction, so this issue is meant for review/discussion. This need not necessarily mean complete replacement; Mentat can happily support multiple protocols for different situations if that makes sense.
Filer protocol deficiencies:
- too big or too many events may hit the disk and cause thrashing because of interference between the different daemon queues and DB disk access
- does not support inter-machine communication (ok, not easily)
- although easy, is nonstandard
New chosen protocol(s) should:
- be memory based to prevent potential disk thrashing
- support both efficient local and network communication
- be at least somewhat standard
- perform comparably to the current solution
- have a broker (if applicable/used) that is small and lightweight, based on a sane language/platform
Related issues
Updated by Pavel Kácha over 6 years ago
- Assignee deleted (Jan Mach)
Some contenders in no particular order:
- http://zeromq.org/ (https://github.com/zeromq/pyzmq)
- https://stomp.github.io/
- http://mqtt.org/
- http://www.amqp.org/
(RabbitMQ also supports STOMP and MQTT.)
Updated by Pavel Kácha over 6 years ago
- Related to Bug #4261: Shorten events/search output in case of too long events added
Updated by Pavel Kácha over 6 years ago
- Related to Bug #4253: Handling of too big events added
Updated by Pavel Kácha over 6 years ago
- Related to deleted (Bug #4261: Shorten events/search output in case of too long events)
Updated by Pavel Kácha almost 5 years ago
Some additions:
- https://nanomsg.org/ (successor of ZeroMQ, same author)
- https://redis.io/ (supports push/pop, used by IntelMQ)
Heretic thought:
- Message queue within PostgreSQL, while we're already running it..?
Updated by Radko Krkoš almost 5 years ago
Pavel Kácha wrote:
Heretic thought:
- Message queue within PostgreSQL, while we're already running it..?
My intuitive reaction would be a strict NO, as that would never work. But then a simple search provides this: https://gist.github.com/chanks/7585810
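The linked gist implements queueing on top of PostgreSQL with advisory locks; on newer PostgreSQL (9.5+) the same idea is usually expressed with `FOR UPDATE SKIP LOCKED`, which lets concurrent consumers claim rows without blocking each other. A minimal sketch of that pattern follows; the table and column names are made up for illustration only:

```sql
-- Hypothetical queue table; names are illustrative only.
CREATE TABLE IF NOT EXISTS event_queue (
    id      bigserial PRIMARY KEY,
    payload jsonb NOT NULL
);

-- Producer: enqueue one event.
INSERT INTO event_queue (payload) VALUES ('{"msg": "example"}');

-- Consumer: atomically claim and remove the oldest item.
-- SKIP LOCKED (PostgreSQL 9.5+) makes concurrent consumers
-- skip rows already claimed by another transaction.
DELETE FROM event_queue
WHERE id = (
    SELECT id FROM event_queue
    ORDER BY id
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
RETURNING payload;
```

The appeal here is that queue operations share the transactional guarantees of the database already in use, at the cost of tying queue throughput to DB disk access, which is exactly the concern raised in the ticket description.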
Updated by Radko Krkoš almost 5 years ago
Then, if we want to use an SQL DB, we could use sqlite. According to [1], it should work fine. The communication is on localhost only, there is just one writer as it is a pipeline (so the exclusive write lock is not an issue), and the number of disk blocks touched should be lower than with files (and you can still use caching), and lower than with PostgreSQL, as there is no WAL.
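The single-writer pipeline described above can be sketched with nothing but the Python standard library. This is an illustrative sketch only, not Mentat code; the table name and helper functions are made up for the example:

```python
import json
import sqlite3

# Illustrative single-writer queue on SQLite (stdlib only).
# Table and function names are hypothetical, not from Mentat.

def open_queue(path=":memory:"):
    """Open (or create) a queue backed by a single SQLite table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS queue ("
        "  id INTEGER PRIMARY KEY AUTOINCREMENT,"
        "  payload TEXT NOT NULL)"
    )
    return conn

def push(conn, event):
    """Enqueue one event; the single writer holds the lock only briefly."""
    with conn:  # implicit transaction, committed on exit
        conn.execute("INSERT INTO queue (payload) VALUES (?)",
                     (json.dumps(event),))

def pop(conn):
    """Dequeue the oldest event, or return None if the queue is empty."""
    with conn:
        row = conn.execute(
            "SELECT id, payload FROM queue ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
        return json.loads(row[1])
```

With an on-disk path instead of `:memory:`, separate local processes could share the queue, which matches the localhost-only, one-writer scenario above; it does not address the inter-machine communication requirement.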
Updated by Pavel Kácha almost 5 years ago
Radko Krkoš wrote:
Then, if we want to use an SQL DB, we could use sqlite. According to [1], it should work fine. The communication is on localhost only, there is just one writer as it is a pipeline (so the exclusive write lock is not an issue), and the number of disk blocks touched should be lower than with files (and you can still use caching), and lower than with PostgreSQL, as there is no WAL.
Sure, we could use all sorts of things. PG just came to my mind as something we already use heavily, and also, you can put different "queues" in completely different DBs (and machines, while we are at it). However, if I were to consider a lighter DB, I'd go for (for example) Berkeley DB, which is much simpler, solves locking, and provides queueing primitives itself.
Updated by Jan Mach almost 5 years ago
I think that the goal of using a better queue mechanism is also to move to the producer-consumer design pattern, use subscriptions, simplify configuration, and enable arbitrary module ordering/coupling/chaining. This will be a considerable amount of work, so I think the payoff should be worth it. And it should definitely enable process-level parallelism; single processes are not enough... we are already running five enrichers to speed up the event enrichment process.
We can of course discuss this at length, but I think there is no point in using an overly simple tool and then ending up implementing all the safety features ourselves on top of it. And we would implement them eventually; we need a safe communication protocol without data loss.
Updated by Pavel Kácha almost 5 years ago
Jan Mach wrote:
I think that the goal of using a better queue mechanism is also to move to the producer-consumer design pattern
Maybe.
, use subscriptions,
Why? Seems not relevant to me. But I may be wrong.
simplify configuration and enable arbitrary module ordering/coupling/chaining.
Maybe, however I'm not sure about configuration simplification, as I hoped for minimal changes.
This will be a considerable amount of work, so I think the payoff should be worth it. And it should definitely enable process-level parallelism; single processes are not enough... we are already running five enrichers to speed up the event enrichment process.
Maybe.
Do the mentioned "simple" tools preclude us from any of these?
We can of course discuss this at length, but I think there is no point in using an overly simple tool and then ending up implementing all the safety features ourselves on top of it. And we would implement them eventually; we need a safe communication protocol without data loss.
And "safe" is exactly what we do NOT have now. That's why I'm thinking about something simple enough to replace the current state with relative ease.
Updated by Pavel Kácha over 4 years ago
A couple of forgotten notes from an old discussion:
- do we want a memory-based one (to avoid disk thrashing)?
- do we really want a network-based one (for the possibility of running some daemons on a separate machine)?
- do we want a broker-based one (may bring its own ready-to-use ecosystem and tooling)?
- do we want a centralised one (brings a single point of failure)?
- do we want one based on a standard protocol (brings possible remote interoperability)?