Who uses MongoDB for event sourcing

Golo Roden

Event sourcing is also regularly discussed in connection with CQRS and Domain-Driven Design (DDD). The concept describes storing business events from which the current state of the application can be calculated. How does the underlying database, the so-called event store, work?

Event sourcing is a relatively simple concept for storing the state of an application. In contrast to the conventional approach of saving the current state directly in a relational database, event sourcing collects business events in a long list. This list is stored in a database called the event store.

The most noticeable difference to the conventional approach is that the stored events are never changed or deleted. The event store serves as an append-only data store, so the body of data keeps growing over time.

What at first sounds like a waste of storage space turns out to offer completely unexpected advantages in practice: if you save the business events that led to the current state instead of just the state itself, you gain completely new options for evaluating those events. The historical data is available practically en passant, so that, for example, time series can be analyzed with very little effort.

It is also possible to search for connections between different business events, which may lead to new insights into the underlying domain. The special thing is that this even works retrospectively, since the historical data remains available.

Event sourcing in practice

In addition, event sourcing is a concept that has been used successfully in practice for many years. Every bank that manages accounts works this way: instead of saving the current balance, only incoming and outgoing payments are recorded. The current balance can then be calculated from them at any time, including for any point in the past. This process is called a replay.

For example, if an account was first opened with a deposit of 500 euros, then another 200 euros were deposited, and finally 300 euros were paid out, the following calculation results:

500 (deposit)
+ 200 (deposit)
- 300 (payout)
---
= 400 (balance)
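
Expressed in code, a replay is nothing more than folding the list of events into a single value. The following minimal JavaScript sketch illustrates the idea; the event names and their structure are purely illustrative and not tied to any particular event store:

const events = [
  { name: 'deposited', data: { amount: 500 } },
  { name: 'deposited', data: { amount: 200 } },
  { name: 'withdrawn', data: { amount: 300 } }
];

// Replay: fold all events into the current balance.
const balance = events.reduce((current, event) => {
  switch (event.name) {
    case 'deposited': return current + event.data.amount;
    case 'withdrawn': return current - event.data.amount;
    default: return current;
  }
}, 0);

console.log(balance); // => 400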

Every incoming or outgoing payment represents a business event. These are facts that cannot be reversed: they happened. If a booking was made in error, it can only be compensated by a counter-booking that cancels its effect.

However, the business events must first be modeled so that they reflect the domain of the application. Domain-Driven Design (DDD), an approach to domain modeling, is ideal for this purpose. Event sourcing and DDD are not necessarily linked, but they complement each other extremely well, which is why the combination makes sense.

Speed up the replay

If only new events are ever added to the event store, the body of data keeps growing. From a domain point of view this is desirable, but from a technical point of view it poses a problem: the more events have to be replayed, the longer the process takes and the slower the entire application becomes. However, there is an easy way out.

Since new events are always appended to the end of the list and existing events are never changed, a replay for a given point in time always yields the same result. Returning to the account analogy, this is only logical: the account balance at a given point in time is always the same, regardless of which deposits or withdrawals happened afterwards.

You can take advantage of this by occasionally saving the currently calculated state as a so-called snapshot. The entire history then no longer has to be replayed every time; in most cases it is sufficient to start with the last snapshot and only replay the events that have been saved since then. Since a snapshot only supplements the history and does not replace it, the older events are still available if they are needed for an evaluation.
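
The same sketch as above can be extended to show how a snapshot shortens the replay. The snapshot structure used here is again only an assumption for illustration purposes:

// A snapshot records the calculated state and the revision up to which it is valid.
const snapshot = { revision: 2, state: { balance: 700 } };

// Only the events recorded after the snapshot still have to be replayed.
const eventsSinceSnapshot = [
  { name: 'withdrawn', revision: 3, data: { amount: 300 } }
];

const balance = eventsSinceSnapshot.reduce((current, event) => {
  switch (event.name) {
    case 'deposited': return current + event.data.amount;
    case 'withdrawn': return current - event.data.amount;
    default: return current;
  }
}, snapshot.state.balance);

console.log(balance); // => 400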

Define the data structure

In principle, any data store that allows appending entries to the end of a list and reading and searching that list can be used as an event store. This applies to relational and NoSQL databases as well as to simple text files in the file system. First, however, a format must be defined in which the events are to be stored.

If a classic relational database is used, this means that a table schema must first be defined. Since every event relates to an object (or, in DDD terminology, an aggregate), its ID is needed in order to be able to assign the event to it later. In addition, the order of the events for an object must be unambiguous, which is why an additional sort key is required.

In addition, the actual payload of the event must be saved, for example as a JSON or XML structure. Since the event store only ever accesses events via their ID and the sort key, a field of type jsonb or similar is sufficient for the data blob. And since the combination of aggregate ID and sort key must be unique, it is also advisable to define a corresponding table constraint. The basic table structure therefore looks as follows:

CREATE TABLE IF NOT EXISTS "events" (
"aggregateId" uuid NOT NULL,
"revision" integer NOT NULL,
"event" jsonb NOT NULL,
CONSTRAINT "aggregateId_revision" UNIQUE ("aggregateId", "revision")
);

As a rule, a replay only takes place for a single object. However, if a client has lost its connection in the meantime, it may also have to request all events from a certain point onwards. Timestamps only work to a limited extent in distributed systems because of potential clock differences between machines, which is why a sequential ID, the so-called position, is used instead. It can also serve as the primary key, so the previous definition can be extended as follows:

CREATE TABLE IF NOT EXISTS "events" (
"position" bigserial NOT NULL,
"aggregateId" uuid NOT NULL,
"revision" integer NOT NULL,
"event" jsonb NOT NULL,
CONSTRAINT "events_pk" PRIMARY KEY ("position"),
CONSTRAINT "aggregateId_revision" UNIQUE ("aggregateId", "revision")
);

It can also be useful to store the publication status of an event in order to be able to decide, in case of doubt, which events are already known to the clients and which are not. The hasBeenPublished field shown below serves this purpose; depending on the scenario, however, it is not strictly necessary.

Access the Event Store

Since UPDATE and DELETE are not used, developers essentially only have to implement INSERT and SELECT calls. The open source module sparbuch implements such an event store for Node.js and supports PostgreSQL and MongoDB as databases out of the box; PostgreSQL is the better-performing choice. Looking at the schema definition of its events table, it is noticeable that any event can indeed be handled with a single schema:

CREATE TABLE IF NOT EXISTS "${this.namespace}_events" (
"position" bigserial NOT NULL,
"aggregateId" uuid NOT NULL,
"revision" integer NOT NULL,
"event" jsonb NOT NULL,
"hasBeenPublished" boolean NOT NULL,
CONSTRAINT "${this.namespace}_events_pk" PRIMARY KEY ("position"),
CONSTRAINT "${this.namespace}_aggregateId_revision" UNIQUE ("aggregateId", "revision")
);

The schema definition of the snapshots table is similar and also falls back on the jsonb data type:

CREATE TABLE IF NOT EXISTS "${this.namespace}_snapshots" (
"aggregateId" uuid NOT NULL,
"revision" integer NOT NULL,
"state" jsonb NOT NULL,
CONSTRAINT "${this.namespace}_snapshots_pk" PRIMARY KEY ("aggregateId", "revision")
);

Now the functions for loading and saving events and for loading and saving snapshots can be implemented. For performance reasons, the use of so-called prepared statements or stored procedures is recommended.
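
With the node-postgres driver, for example, a statement becomes a prepared statement simply by giving it a name; the database then only has to plan the query once. The following sketch for loading the events of an aggregate assumes the plain events table defined above and a pg connection pool; it is an illustration, not the implementation of any particular module:

const { Pool } = require('pg');

const pool = new Pool({ connectionString: '...' });

const getEvents = (aggregateId, callback) => {
  // Named queries are prepared once per connection and reused afterwards.
  pool.query({
    name: 'get-events-by-aggregate-id',
    text: 'SELECT "event" FROM "events" WHERE "aggregateId" = $1 ORDER BY "revision"',
    values: [ aggregateId ]
  }, (err, result) => {
    if (err) { return callback(err); }

    callback(null, result.rows.map(row => row.event));
  });
};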

Use the Event Store

If you want to use the event store afterwards, you first have to install the sparbuch module. As usual with Node.js, this is done via the npm package manager:

$ npm install sparbuch

In the code you can then choose which of the two databases to use by loading the respective sub-module, either sparbuch/postgres or sparbuch/mongodb:

const sparbuch = require('sparbuch/postgres');

Either way, the instance must then be initialized, which internally takes care of creating the tables, among other things. The initialize function is used for this; it must be passed the connection string and a namespace for the application:

sparbuch.initialize({
  url: '...',
  namespace: 'myApp'
}, err => {
  // ...
});

To be able to react to a lost connection, the disconnect event can be subscribed to:

sparbuch.on('disconnect', () => {
// ...
});

Disconnecting manually is also possible; the destroy function is provided for this:

sparbuch.destroy(() => {
// ...
});

Save events

The next step is to save events that have occurred. The saveEvents function is used for this; the events to be saved are passed to it as an array. sparbuch expects the format defined by the commands-events module. In addition, only the revision property in the metadata has to be set so that the order of the individual events for the object in question is unambiguous.

const { Event } = require('commands-events');

const event1 = new Event(...);
const event2 = new Event(...);

event1.metadata.revision = 1;
event2.metadata.revision = 2;

sparbuch.saveEvents({ events: [ event1, event2 ] }, (err, savedEvents) => {
  // ...
});

If only a single event is to be saved, it can be specified directly instead of the array:

sparbuch.saveEvents({ events: event }, (err, savedEvents) => {
// ...
});

Load events

To load the events for an object, only its ID is required. With it, the getEventStream function can be called:

const id = '24d03f83-dd8d-4a11-93fa-1f3bba8ad3e6';

sparbuch.getEventStream(id, (err, eventStream) => {
// ...
});

Optionally, the range of events can be limited, for example in order to load only the most recent events for an object. Both a lower and an upper limit can be set, using the fromRevision and toRevision properties:

const id = '24d03f83-dd8d-4a11-93fa-1f3bba8ad3e6';

sparbuch.getEventStream(id, {
  fromRevision: 23,
  toRevision: 42
}, (err, eventStream) => {
  // ...
});

If, on the other hand, only the very last event is required, it can be determined with the getLastEvent function:

const id = '24d03f83-dd8d-4a11-93fa-1f3bba8ad3e6';

sparbuch.getLastEvent(id, (err, event) => {
// ...
});

Access to snapshots is just as easy; the saveSnapshot and getSnapshot functions are provided for this.
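
Assuming a calling convention analogous to saveEvents and getEventStream (the parameter names shown here are an assumption for illustration, not a documented signature), their usage might look roughly like this:

const id = '24d03f83-dd8d-4a11-93fa-1f3bba8ad3e6';

// Assumed parameter shape, analogous to saveEvents.
sparbuch.saveSnapshot({ aggregateId: id, revision: 23, state: { balance: 400 } }, err => {
  // ...
});

// Assumed to return the most recent snapshot for the given aggregate.
sparbuch.getSnapshot(id, (err, snapshot) => {
  // ...
});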

Conclusion

In principle, implementing an event store is not difficult once you understand the basic concept and the snapshot optimization. Nevertheless, as so often, the devil is in the details, especially when it comes to the efficiency of the individual queries. It is important to remember that they may be called very frequently, which is why every unnecessary line of code should be avoided. A ready-made module such as sparbuch for Node.js can save a lot of work and helps to avoid common mistakes. (jul)

Golo Roden
is the founder, CTO and managing director of the native web GmbH, a company specializing in native web technologies. For the development of modern web applications he prefers JavaScript and Node.js and, with "Node.js & Co.", wrote the first German-language book on the subject.
