Mangrove
Mangrove (formerly MDL Standard) is a project to enable "write once, compute anywhere" for metric definitions. Rather than writing separate queries for Kusto, Scope, and Spark, users should be able to write a metric definition once, in a language-agnostic way:
Avg(Sum<User>(Revenue))
which will compile to trustworthy queries in all the supported compute fabrics.
Here are some helpful links for getting started.
- Articles, especially how to use Blitz.
- Request for Comment, documents explaining the motivation and design decisions behind past and planned work.
- API Documentation, generated from the codebase using DocFX.
- Release Notes, announcing important milestones.
- Codebase
- This page: aka.ms/mangrove/docs.
Philosophy
Some operations are "universal" across most big data compute fabrics. For example, the operation "one string contains another" is supported by U-SQL (known internally as Scope), Apache Spark, Spark SQL, and Azure Data Explorer (known internally as Kusto). The syntax differs in each of them, but the concept carries over to all of them. Many key aggregations, such as sum or maximum, carry over in the same way.
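As an illustration of one such universal operation, the following minimal sketch renders a single fabric-agnostic "contains" node into four dialects. It is not Mangrove's API: the `Contains` class, the `render` function, and the fabric names are hypothetical and exist only for this example. Quoting and escaping of the literal are deliberately ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contains:
    """Fabric-agnostic node: does `column` contain the literal `needle`?"""
    column: str
    needle: str


def render(expr: Contains, fabric: str) -> str:
    """Render the node as an expression string for one fabric (hypothetical helper).

    Assumes `needle` has no quote characters; a real compiler would escape it.
    """
    if fabric == "kusto":
        # Kusto's plain `contains` is case-insensitive; `contains_cs` is the
        # case-sensitive operator, matching the other renderings below.
        return f'{expr.column} contains_cs "{expr.needle}"'
    if fabric == "scope":
        # Scope / U-SQL expressions are C#, so string.Contains applies.
        return f'{expr.column}.Contains("{expr.needle}")'
    if fabric == "spark_sql":
        # instr returns a 1-based position, or 0 when the substring is absent.
        return f"instr({expr.column}, '{expr.needle}') > 0"
    if fabric == "pyspark":
        # DataFrame API: Column.contains.
        return f'col("{expr.column}").contains("{expr.needle}")'
    raise ValueError(f"unsupported fabric: {fabric}")


expr = Contains(column="Url", needle="bing")
for fabric in ("kusto", "scope", "spark_sql", "pyspark"):
    print(f"{fabric}: {render(expr, fabric)}")
```

Choosing `contains_cs` rather than `contains` for Kusto shows why a shared definition matters: the same concept can carry subtly different default semantics (here, case sensitivity) in each fabric, and a compiler has to pick renderings that agree.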
Mangrove aims to provide a simple standard for metric definitions, one that allows a concise definition like
Avg(Sum<User>(Revenue))
to be compiled into queries for each supported compute fabric. Moreover, if the data underlying a metric definition lives in a compute fabric that isn't well suited to fast computation (e.g., Cosmos), Mangrove will provide a way to seamlessly cache intermediate expressions between compute fabrics. Two key existing examples are the Hypercube Project and MetricNet.
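To make the compilation step concrete, here is a minimal sketch that turns a two-level metric such as Avg(Sum<User>(Revenue)) into a Kusto query and a Spark SQL query. It assumes that Sum<User>(Revenue) means "sum Revenue within each User" and that Avg then averages those per-user sums; this reading, the `Metric` class, the two `to_*` functions, and the table name `Events` are all assumptions made for the example, not Mangrove's actual implementation or output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """Hypothetical two-level metric: outer(inner<level>(column))."""
    outer: str   # aggregation across groups, e.g. "avg"
    inner: str   # aggregation within each group, e.g. "sum"
    level: str   # grouping column, e.g. "User"
    column: str  # value column, e.g. "Revenue"


def to_kusto(m: Metric, table: str) -> str:
    """Two chained summarize steps: per-level aggregate, then overall aggregate."""
    return (
        f"{table} "
        f"| summarize inner_value = {m.inner}({m.column}) by {m.level} "
        f"| summarize {m.outer}(inner_value)"
    )


def to_spark_sql(m: Metric, table: str) -> str:
    """Same computation as a subquery with GROUP BY, then an outer aggregate."""
    return (
        f"SELECT {m.outer}(inner_value) FROM "
        f"(SELECT {m.inner}({m.column}) AS inner_value "
        f"FROM {table} GROUP BY {m.level}) AS per_level"
    )


# Avg(Sum<User>(Revenue)) under the assumed reading.
metric = Metric(outer="avg", inner="sum", level="User", column="Revenue")
print(to_kusto(metric, "Events"))
print(to_spark_sql(metric, "Events"))
```

Both generated queries describe the same two-stage aggregation, which is the point of a single definition: the metric is stated once and each fabric receives its own syntax.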