
vunet built vusmartmaps (vsm), an observability platform that financial institutions use to identify, diagnose, and resolve errors across their distributed components. in 2025, i was hired as a design-consultant to work on ambiguous asks.
a fourth of what the product enabled people to do was create, consume & act on alerts. i worked for 3 months to synthesise the product & engineering work done over the previous 3 years, and proposed a better system for alert-consumption & an industry-first approach to alert-creation.
vsm monitored things — individual computing components (such as routers, servers, load-balancers, et-cetera), applications that run on those components (such as a bank's log-in page, core-banking-system, et-cetera), and custom-defined business-journeys (comprising many components & applications).
through human-defined or machine-defined rules, vsm generated alerts for the above-mentioned things. when clicked on, an alert was expected to help a person resolve the issue quickly — by enabling faster escalation, helping with root-cause-analyses, and aiding in debugging.
due to the interdependencies in banking-networks, a single alert left active for a few minutes could result in negative business-impact (and potential fines from authorities).
to avoid this, financial institutions hire external teams (such as vunet) to monitor, filter, and escalate alerts quickly to the relevant personnel (who then fix these issues).
people from these teams were having to deal with multiple alerts in a single minute, manually assessing impact, and jumping between windows (or tabs) to form root-cause-hypotheses. previous design-work had tried to improve these workflows, but had failed to provide a comprehensive solution that was both sellable & usable.
i first studied all the different kinds of alerts that vsm could possibly produce, and then proposed a unified system for consumption that aimed to reduce cognitive load.
next, i observed debugging-conversations in banks, and proposed an appropriate system to display an alert's information for all levels of problem escalation (the person monitoring, the engineer trying to fix the issue, and managers who wish to perform analyses).
due to the complex computing-architecture that indian banks often have, vsm had struggled in the past to offer accurate root-cause-hypotheses. so, i worked with engineering teams to understand the siloed work that had already been put in, and suggested a unified mechanism to display hypotheses. we used correlations (grouping relevant alerts together), time-series data (to identify the first problem in a chain of events), and chose to dynamically fetch poorly-performing granular metrics to help with the debugging of a chosen hypothesis.
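to make the idea concrete, here is a minimal sketch of correlation plus time-ordering. it is not vsm's implementation: the `Alert` fields, the `journey` correlation key, and all function names are hypothetical, chosen only for illustration. alerts are grouped by a shared key, and the earliest alert in each group is treated as the candidate first problem.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    alert_id: str
    component: str
    journey: str      # hypothetical correlation key
    started_at: int   # hypothetical: seconds since epoch


def group_by_journey(alerts):
    """correlation step: put alerts that share a journey into one group."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[alert.journey].append(alert)
    return dict(groups)


def first_problem(group):
    """time-series step: the earliest alert is the candidate root cause."""
    return min(group, key=lambda a: a.started_at)


def hypotheses(alerts):
    """one root-cause hypothesis per correlated group."""
    return {
        journey: first_problem(group)
        for journey, group in group_by_journey(alerts).items()
    }


# made-up example data
alerts = [
    Alert("a1", "core-banking-system", "payments", 1060),
    Alert("a2", "load-balancer", "payments", 1000),
    Alert("a3", "log-in page", "log-in", 1030),
]
result = hypotheses(alerts)
```

in this made-up data, the load-balancer alert started before the core-banking-system alert in the same group, so it becomes the hypothesis for that group. a real system would need a richer notion of correlation than a single shared key.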
via this proposal, the product-design team helped all engineering teams working on alerts to strive towards a common aspiration, reducing the possibility of future siloed work (which is then difficult to stitch together as a product).
while proposing a better system for consumption, shobhan (co-instigator on this project) & i realised that the system people used to configure alerts left many ways to create 'bad' alerts (that are later reported as problems by the people who consume them).
so, we got rid of mechanical norms that the industry used to set up alerts (via numbers & thresholds), and proposed a wysiwyg-editor (what-you-see-is-what-you-get-editor) to help people estimate future-frequency of alerts generated by a particular rule.
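one way such a frequency-estimate might be computed is by replaying a rule over past data and counting how often it would have fired. the sketch below assumes exactly that; the rule shape (a value staying above a limit for some consecutive samples), the function name, and the sample values are all hypothetical, and it does not describe how the proposed editor actually works.

```python
def count_alerts(values, limit, min_consecutive=1):
    """count how often a rule would have fired on past samples.

    the rule fires once per run of samples above `limit`, at the moment
    the run reaches `min_consecutive` samples in a row.
    """
    fired = 0
    run = 0
    for value in values:
        if value > limit:
            run += 1
            if run == min_consecutive:
                fired += 1
        else:
            run = 0  # the run is broken, start counting again
    return fired


# made-up history of a metric, one sample per interval
history = [40, 85, 90, 30, 95, 96, 97, 20, 88]

loose_rule = count_alerts(history, limit=80)
stricter_rule = count_alerts(history, limit=80, min_consecutive=2)
```

on this made-up history, the loose rule would have fired 3 times and the stricter one 2 times. showing a count like this next to a rule while it is being edited is one way to give a person a sense of how noisy that rule is likely to be.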