In a microservices architecture, when we persist data, we sometimes store the ids of entities that live in another microservice’s DB.
This can be considered a side effect of the architecture.
Let’s describe a scenario.
Service S1 has a collection C1 whose documents hold a referenced id pointing to collection C2 in service S2.
We want to find all documents that hold a given referenced value and get some information about them.
Cool, so let’s query the data by this field.
That will work pretty well for a small collection of documents.
But what if we have embedded documents containing referenced ids as well? We need to find them too!
We are still good with our naive query.
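To make the naive query concrete, here is a rough sketch. It assumes documents are plain dictionaries held in memory and that the reference sits under a hypothetical field named `s2_id`; neither the field name nor the document shape comes from a real schema.

```python
from typing import Any, Iterator

REF_FIELD = "s2_id"  # hypothetical name of the field holding the id from service S2


def holds_reference(node: Any, ref_id: str) -> bool:
    """Return True if `node`, or anything embedded in it, references `ref_id`."""
    if isinstance(node, dict):
        if node.get(REF_FIELD) == ref_id:
            return True
        return any(holds_reference(value, ref_id) for value in node.values())
    if isinstance(node, list):
        return any(holds_reference(item, ref_id) for item in node)
    return False


def naive_find(collection: list[dict], ref_id: str) -> Iterator[dict]:
    """Scan every document; simple, but the cost grows with the collection size."""
    for document in collection:
        if holds_reference(document, ref_id):
            yield document


# Illustrative in-memory stand-in for collection C1.
c1 = [
    {"_id": 1, "name": "order-a", "s2_id": "u1"},
    {"_id": 2, "name": "order-b", "items": [{"sku": "x", "s2_id": "u1"}]},
    {"_id": 3, "name": "order-c", "s2_id": "u2"},
]

matching_ids = [doc["_id"] for doc in naive_find(c1, "u1")]
```

The scan visits every document and every embedded document, which is exactly why it stops being fine once the collection grows.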
We live in the real world.
Collections usually hold a large number of documents, so a naive query is likely to cause performance issues and delays in fetching our data.
When we fetch our information, we want to reach only the relevant documents, quickly.
We can index our field so that a query touches only the relevant documents.
But we have embedded documents! Mongo indexing can deal with that too.
That can solve most of our problems, but when we deal with unstructured data, it’s unstructured indeed.
Indexing is an idea borrowed from structured databases, so it expects the structure of the data to be known when the index is defined.
I know my structure, I’m good.
Well, you might, but is fetching the referring document enough for you?
If so, you are fine, but usually we want to know more about the documents we just found, and that information may live in another collection in our service.
That means that we need to do another query on our fetched data (performance!).
Mapping gives us more flexibility and keeps only the data we really need within reach.
It also makes us less dependent on a strict data structure: we can have a data structure commonly used by most components in our system, and add functionality for special components that have a different or extended structure.
Each time a document is persisted, we want to store our mapping in a new collection.
By saving to a new collection, we minimize coupling and stay microservice oriented.
In the mapping document we save only the data we really need from service S1, along with our referenced ids from service S2.
If we need more information from another collection, this is our chance: take whatever you need and keep it on the mapping document.
So when we want to query for the information, we fetch the data by the referenced ids from the mapping collection, and we have only what we need, quickly.
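A minimal sketch of such a mapping document follows. It assumes in-memory structures stand in for the mapping collection, and all field names (`s2_ids`, `name`, `owner_email`) are made up for illustration; this is one possible shape, not a prescribed one.

```python
from collections import defaultdict
from typing import Optional

# In-memory stand-ins; in a real service these would be a DB collection and its index.
mapping_collection: list[dict] = []
mapping_by_ref: dict[str, list[dict]] = defaultdict(list)  # lookup keyed by S2 id


def build_mapping(document: dict, extra: dict) -> dict:
    """Keep only what we need from the S1 document, plus its referenced S2 ids."""
    return {
        "source_id": document["_id"],
        "s2_ids": list(document["s2_ids"]),
        "name": document["name"],
        **extra,  # data taken from other collections and kept aside here
    }


def persist_mapping(document: dict, extra: Optional[dict] = None) -> dict:
    """Store the mapping and register it under every referenced id."""
    mapping = build_mapping(document, extra or {})
    mapping_collection.append(mapping)
    for ref_id in mapping["s2_ids"]:
        mapping_by_ref[ref_id].append(mapping)
    return mapping


def find_by_reference(ref_id: str) -> list[dict]:
    """Fetch only the mapping documents that refer to `ref_id`."""
    return list(mapping_by_ref.get(ref_id, []))


persist_mapping(
    {"_id": 1, "name": "order-a", "s2_ids": ["u1"], "payload": "large blob"},
    extra={"owner_email": "a@example.com"},
)
persist_mapping({"_id": 2, "name": "order-b", "s2_ids": ["u1", "u2"]})

found = find_by_reference("u1")
```

Note that the large `payload` field never reaches the mapping document, while `owner_email` (imagined here as coming from another collection) is stored right next to the referenced ids.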
But we duplicate the data!
Correct, but we stay decoupled and improve both performance and availability.
OK, this is what I need. How do I do it?
First we need to change our persistence mechanism.
When we persist our data, we need to create an additional document in the mapping collection.
In most cases our mapping logic will be the same, so we should write it in one place, from which the models inherit the common logic.
But I have many components, and the inheritance structure is not consistent.
So standardize your inheritance, or write a new module that implements this logic.
Write it once, and require it in each of your model implementations.
I used the observer design pattern to solve this issue.
This way, we don’t need to refactor anything, and every model is covered by definition.
The observer is called when a model ‘Save’ event is triggered and does all the work after the model is persisted.
This way, we have a single point of logic that is always called and commonly used.
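Here is a small sketch of the idea. The `Model` base class, its `on_save` registration method and the in-memory `storage` list are all hypothetical stand-ins; a real framework will expose its own save event, so treat this as an illustration of the pattern and not as the actual implementation.

```python
from typing import Any, Callable


class Model:
    """Hypothetical base model: persists itself, then emits a 'save' event."""

    _save_observers: list[Callable[[Any], None]] = []
    storage: list[dict] = []  # stand-in for the real collections

    def __init__(self, **fields: Any) -> None:
        self.fields = fields

    @classmethod
    def on_save(cls, observer: Callable[[Any], None]) -> None:
        """Register an observer once; it then applies to every model."""
        Model._save_observers.append(observer)

    def save(self) -> None:
        Model.storage.append(dict(self.fields))  # persist first
        for observer in Model._save_observers:   # then notify the observers
            observer(self)


mapping_collection: list[dict] = []  # stand-in for the mapping collection


def mapping_observer(model: Model) -> None:
    """Single point of logic: runs after every save, for every model type."""
    mapping_collection.append(
        {"source": type(model).__name__, "s2_id": model.fields.get("s2_id")}
    )


Model.on_save(mapping_observer)


class Order(Model):
    pass


class Invoice(Model):
    pass


Order(name="order-a", s2_id="u1").save()
Invoice(total=10, s2_id="u2").save()
```

Neither `Order` nor `Invoice` knows anything about the mapping, yet both end up in `mapping_collection` because the observer hangs off the shared save event.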
But we talked about being more flexible, right?
While most of our models will share the mapping logic, some models have special-case dependencies that our common logic doesn’t handle.
In this case, after our observer logic has finished inferring the dependencies, we can trigger additional functionality on the model (if it exists) and extend the set of dependencies we already have.
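One way to sketch this extension point, assuming the common logic picks up fields whose names end in `_ref` and that special models expose an optional method called `extra_dependencies` (both conventions are invented for this example):

```python
from typing import Any


def infer_common_dependencies(fields: dict) -> set[str]:
    """Common logic: collect values of fields following the assumed `*_ref` naming."""
    return {value for key, value in fields.items() if key.endswith("_ref")}


def collect_dependencies(model: Any) -> set[str]:
    """Run the common logic, then let the model extend the result if it can."""
    dependencies = infer_common_dependencies(model.fields)
    hook = getattr(model, "extra_dependencies", None)  # optional, model-specific
    if callable(hook):
        dependencies |= set(hook())
    return dependencies


class PlainModel:
    def __init__(self, **fields: Any) -> None:
        self.fields = fields


class SpecialModel(PlainModel):
    def extra_dependencies(self) -> list[str]:
        # Special case: references packed into one comma-separated string.
        packed = self.fields.get("legacy_refs", "")
        return [ref for ref in packed.split(",") if ref]


plain = collect_dependencies(PlainModel(user_ref="u1", name="a"))
special = collect_dependencies(SpecialModel(user_ref="u1", legacy_refs="u2,u3"))
```

Models without the hook go through the common path untouched, so adding a special case never forces a change to the shared logic.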
To stay decoupled, I created a separate API that handles the mapping logic over a new DB collection.
Most programming languages support the observer design pattern, and some ship with ready-made implementations.