How to design a proactive monitoring system?


This is a rather vague design question. I have a microservice that handles order management. It orchestrates every order from Placed to Delivered, and a lot happens in between. Let’s say these are the statuses an order can be in:

  1. Placed
  2. Authorized
  3. Shipped
  4. Delivered
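The status list above can be modelled as an ordered set of states where each state has one expected successor. This is a minimal TypeScript sketch that assumes the four statuses are strictly sequential (a real system would likely also need cancellations, returns, etc.). The names `ORDER_STATUSES` and `nextStatus` are hypothetical.

```typescript
// Hypothetical model of the order lifecycle from the question.
// Assumes a strictly linear flow: Placed -> Authorized -> Shipped -> Delivered.
const ORDER_STATUSES = ["Placed", "Authorized", "Shipped", "Delivered"] as const;

type OrderStatus = (typeof ORDER_STATUSES)[number];

// Returns the status an order is expected to move to next,
// or null when the order has reached its terminal status.
function nextStatus(current: OrderStatus): OrderStatus | null {
  const i = ORDER_STATUSES.indexOf(current);
  return i < ORDER_STATUSES.length - 1 ? ORDER_STATUSES[i + 1] : null;
}

console.log(nextStatus("Placed"));    // Authorized
console.log(nextStatus("Delivered")); // null
```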

I have an Elasticsearch dashboard that shows whether an order is stuck in a particular status and not moving forward. This is a reactive approach. I want to design a monitoring subsystem that actively checks that every order placed in the system moves to its next status within the configured SLA.

The general idea would be to tag every placed order and have a cron worker check whether the order has exceeded the configured SLA for its current status. But I don’t think this will scale well: if we have something like 100k orders placed in a single day, a cron job doesn’t seem like a good way to design this kind of system.
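The cron-based idea can be sketched as a periodic sweep over every tracked order. This is a minimal sketch under stated assumptions: orders are held in memory, each non-terminal status has a fixed SLA in milliseconds (the values below are made up), and `findSlaBreaches` and `TrackedOrder` are hypothetical names. Each run touches every open order, so its cost grows linearly with the number of in-flight orders, which is the scaling concern raised in the question.

```typescript
type OrderStatus = "Placed" | "Authorized" | "Shipped" | "Delivered";

interface TrackedOrder {
  orderId: number;
  status: OrderStatus;
  enteredStatusAt: number; // epoch millis when the current status was reached
}

// Hypothetical SLA: max time (ms) an order may stay in each non-terminal status.
const SLA_MS: Record<Exclude<OrderStatus, "Delivered">, number> = {
  Placed: 15 * 60 * 1000,           // 15 minutes to be authorized
  Authorized: 24 * 60 * 60 * 1000,  // 1 day to be shipped
  Shipped: 5 * 24 * 60 * 60 * 1000, // 5 days to be delivered
};

// One cron tick: scan every order and return those past their SLA.
function findSlaBreaches(orders: TrackedOrder[], now: number): TrackedOrder[] {
  return orders.filter((o) => {
    if (o.status === "Delivered") return false; // terminal status, no SLA
    return now - o.enteredStatusAt > SLA_MS[o.status];
  });
}

const now = Date.UTC(2021, 0, 8, 12, 0, 0);
const orders: TrackedOrder[] = [
  { orderId: 1, status: "Placed", enteredStatusAt: now - 20 * 60 * 1000 }, // 20 min: breached
  { orderId: 2, status: "Placed", enteredStatusAt: now - 5 * 60 * 1000 },  // 5 min: within SLA
  { orderId: 3, status: "Delivered", enteredStatusAt: now - 10 * 24 * 60 * 60 * 1000 }, // terminal
];
console.log(findSlaBreaches(orders, now).map((o) => o.orderId)); // [ 1 ]
```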

So how do people solve this kind of design problem? Pointers to any existing approach, or any ideas at all, are welcome.


You mentioned a microservice, so I think the most “scalable” way of doing this while respecting a microservice architecture would be to perform the monitoring asynchronously. If you don’t already have one, you could set up a message queueing service such as Google Pub/Sub or RabbitMQ. There are a lot of different message queueing services out there, each with its own features and performance characteristics, so you’d need to do some research to find the best fit for your use case.

Once you have set up your MQ service, your Order microservice would dispatch a message such as { orderId: 12345, status: 'Authorized', timestamp: 1610118449538, whatEver: 'foo' } on every status change. That message could then be consumed by any service subscribed to the relevant topic (depending also on the architecture of your MQ).
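Giving the status-change message an explicit type helps the producer and its consumers agree on its shape. This is a minimal sketch: `OrderStatusChanged`, `MessageBroker` and `InMemoryBroker` are hypothetical names, `whatEver` stands in for any extra payload from the example message, and the in-memory broker only mimics topic fan-out. A real deployment would use the client library of the chosen broker (Pub/Sub, RabbitMQ, etc.) instead.

```typescript
type OrderStatus = "Placed" | "Authorized" | "Shipped" | "Delivered";

// Shape of the event the Order microservice publishes on every status change.
interface OrderStatusChanged {
  orderId: number;
  status: OrderStatus;
  timestamp: number; // epoch millis, e.g. Date.now()
  whatEver?: string; // placeholder for any extra payload
}

type Handler = (msg: OrderStatusChanged) => void;

// Hypothetical broker abstraction standing in for a real MQ client.
interface MessageBroker {
  publish(topic: string, msg: OrderStatusChanged): void;
  subscribe(topic: string, handler: Handler): void;
}

// In-memory fan-out: every subscriber of a topic receives each message.
class InMemoryBroker implements MessageBroker {
  private subscribers = new Map<string, Handler[]>();

  subscribe(topic: string, handler: Handler): void {
    const list = this.subscribers.get(topic) ?? [];
    list.push(handler);
    this.subscribers.set(topic, list);
  }

  publish(topic: string, msg: OrderStatusChanged): void {
    // Round-trip through JSON to mimic crossing a process boundary.
    const wire = JSON.stringify(msg);
    for (const handler of this.subscribers.get(topic) ?? []) {
      handler(JSON.parse(wire) as OrderStatusChanged);
    }
  }
}

const broker = new InMemoryBroker();
const received: OrderStatusChanged[] = [];
broker.subscribe("order-status", (m) => received.push(m));
broker.publish("order-status", {
  orderId: 12345,
  status: "Authorized",
  timestamp: 1610118449538,
  whatEver: "foo",
});
console.log(received[0].status); // Authorized
```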

Then I would develop another microservice: the Monitoring microservice. It would subscribe to the topics published by the Order microservice, so it would be aware of every order status change. You could then set up a cron job in this microservice that checks, e.g. every 5 minutes, which orders have not received their next status-change message within the SLA, and act accordingly. This microservice could also communicate with your Elasticsearch. I’d also recommend sharing as much as possible of the business logic around order status changes between the Order and Monitoring microservices, for example through private NPM packages. That way you are less likely to end up with mismatched business rules between the two microservices.
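On the Monitoring side, each consumed message only needs to update a small record per order: its latest status and when it was reached. The periodic check then lists orders whose next message is overdue. This is a minimal in-memory sketch; `OrderMonitor`, `SLA_MS` and the SLA values are hypothetical, and the SLA table is the kind of business logic that could live in the shared package. A real service would persist this state (e.g. in Elasticsearch or a database) and expire finished orders instead of keeping them in a `Map`.

```typescript
type OrderStatus = "Placed" | "Authorized" | "Shipped" | "Delivered";

interface OrderStatusChanged {
  orderId: number;
  status: OrderStatus;
  timestamp: number; // epoch millis
}

// Hypothetical SLA table (ms allowed in each status); null means terminal.
const SLA_MS: Record<OrderStatus, number | null> = {
  Placed: 15 * 60 * 1000,
  Authorized: 24 * 60 * 60 * 1000,
  Shipped: 5 * 24 * 60 * 60 * 1000,
  Delivered: null,
};

class OrderMonitor {
  // Latest known status change per order, keyed by orderId.
  private latest = new Map<number, OrderStatusChanged>();

  // Called for every message consumed from the Order topic.
  onMessage(msg: OrderStatusChanged): void {
    const prev = this.latest.get(msg.orderId);
    // Ignore stale messages that arrive out of order.
    if (prev && prev.timestamp >= msg.timestamp) return;
    this.latest.set(msg.orderId, msg);
  }

  // Periodic check (e.g. every 5 minutes): ids of orders stuck past their SLA.
  overdue(now: number): number[] {
    const ids: number[] = [];
    this.latest.forEach((msg, orderId) => {
      const sla = SLA_MS[msg.status];
      if (sla !== null && now - msg.timestamp > sla) ids.push(orderId);
    });
    return ids;
  }
}

const monitor = new OrderMonitor();
const t0 = Date.UTC(2021, 0, 8, 12, 0, 0);
monitor.onMessage({ orderId: 1, status: "Placed", timestamp: t0 });
monitor.onMessage({ orderId: 2, status: "Placed", timestamp: t0 });
monitor.onMessage({ orderId: 2, status: "Authorized", timestamp: t0 + 60 * 1000 });

// 20 minutes later, order 1 has still not been authorized.
console.log(monitor.overdue(t0 + 20 * 60 * 1000)); // [ 1 ]
```

Only orders with recent activity are touched when messages arrive; the periodic check reads the monitor’s own state rather than querying the Order service.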

Using an MQ service lets you scale as much as needed, because you can then scale your Monitoring and Order microservices horizontally. You’d need some kind of lock/semaphore mechanism between the instances of your Monitoring service, though, so that the same message isn’t processed by multiple instances. If any microservice shuts down, your queue would hold its messages to prevent data loss; once the service is back up, it can process the queued messages. You’d have to consider how to handle downtime of the MQ service itself, too.
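One way to get the lock/semaphore behaviour is to make handling idempotent: before processing a message, an instance atomically claims a key derived from it and skips the message if another instance already holds that key. This sketch assumes a shared store with an atomic “set if absent” operation (Redis `SET key value NX` is one example); here that store is mocked by a hypothetical in-memory `ClaimStore`, so it demonstrates the logic only, not cross-process safety. `onceAcrossInstances` is likewise a hypothetical name. Many brokers can also deliver each message to just one consumer in a competing-consumers setup, but redeliveries can still cause duplicates.

```typescript
type OrderStatus = "Placed" | "Authorized" | "Shipped" | "Delivered";

interface OrderStatusChanged {
  orderId: number;
  status: OrderStatus;
  timestamp: number;
}

// Hypothetical shared store; in production this must be an external service
// offering an atomic "set if absent" (e.g. Redis SET ... NX).
interface ClaimStore {
  claimIfAbsent(key: string, owner: string): boolean;
}

class InMemoryClaimStore implements ClaimStore {
  private owners = new Map<string, string>();

  claimIfAbsent(key: string, owner: string): boolean {
    if (this.owners.has(key)) return false;
    this.owners.set(key, owner);
    return true;
  }
}

// Wraps a handler so each (orderId, status) event is processed at most once
// across all Monitoring instances sharing the same store.
function onceAcrossInstances(
  store: ClaimStore,
  instanceId: string,
  handle: (msg: OrderStatusChanged) => void,
): (msg: OrderStatusChanged) => boolean {
  return (msg) => {
    const key = `order:${msg.orderId}:${msg.status}`;
    if (!store.claimIfAbsent(key, instanceId)) return false; // handled elsewhere
    handle(msg);
    return true;
  };
}

const store = new InMemoryClaimStore();
const handledBy: string[] = [];
const instanceA = onceAcrossInstances(store, "monitor-a", () => handledBy.push("monitor-a"));
const instanceB = onceAcrossInstances(store, "monitor-b", () => handledBy.push("monitor-b"));

const msg: OrderStatusChanged = { orderId: 12345, status: "Authorized", timestamp: 1610118449538 };
instanceA(msg); // claims and handles the message
instanceB(msg); // duplicate delivery: skipped
console.log(handledBy); // [ 'monitor-a' ]
```

Because the key is claimed before handling, a crash mid-handle would leave the event claimed but unprocessed; a fuller version would give claims an expiry or record completion separately.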

Source: stackoverflow