
Replacement plan for xForay

Owner: Daniel Miller
Approvers: Craig Boucher, Venky Venkateshaiah
Participants: A&E Analysts, ACE v-team

Both the initial Mangrove Initiative Review and the more recent Samurai Review called out flaws in xForay as one of the key motivations for building Mangrove. However, when those documents were written, the Mangrove codebase was not functional enough to seriously consider immediately replacing xForay. Now, Mangrove is being used to generate code in A&E's shadow-mode pipeline, as well as its Azure Exp (Spark) pipeline, with a clear plan for Mangrove to completely take over Spark, Kusto, and Scope code generation from its predecessors. A plan for deprecating xForay is therefore necessary. This RFC (Request For Comment) provides a set of use cases the replacement needs to support and a proposed design, along with a discussion of some of the alternatives considered and the remaining open concerns.

Since this document is proposing a radical re-architecture of xForay, it does not include a summary of the architecture of xForay—only the use cases we need to migrate over.

Requirements

This section outlines some of the use cases that Blitz extensions must be able to support. It is not intended to be comprehensive. There will likely be new use cases not anticipated by this document.

Note: this document uses the terms "Blitz" and "Mangrove" roughly interchangeably. Technically, Mangrove is the entire codebase including C# APIs, and Blitz is a command-line wrapper for the core code-generation functionality in Mangrove. So to be very precise, this document recommends that contributors add to the Mangrove codebase, and those additions will change / improve the functionality of Blitz, but for most users, that distinction is moot.

Interaction Detection

Representative: Jonathan Litz.

We get the flight column from xForay and parse it to get a list of flights for each user. We cross apply the flight list for each user twice and de-duplicate to create two columns containing each pair of flights a user was in (one pair of flights per row). We then compute the Foray output using one flight from the pair as FLIGHT and the other as SEGMENT_VALUE. (See lines 71–120 in the Avocado task Bing interaction detection.) We then do a number of computations with the output (lines 122 onward).
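As an illustration only (hypothetical types and names, not the actual Avocado task code), the pair-generation step could be sketched in C# as:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FlightPairs
{
    // From one user's flight list, emit each unordered pair of distinct
    // flights; one flight of the pair plays FLIGHT, the other SEGMENT_VALUE.
    public static IEnumerable<(string Flight, string SegmentValue)> Pairs(
        IEnumerable<string> flights)
    {
        var distinct = flights.Distinct().ToList();
        for (int i = 0; i < distinct.Count; i++)
            for (int j = i + 1; j < distinct.Count; j++)
                yield return (distinct[i], distinct[j]);
    }
}
```

The de-duplication happens up front (`Distinct`), so a user who saw flights A, B, and C yields exactly the three rows (A,B), (A,C), (B,C).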

Metric Health

Representative: Carl Mitchell.

We extract the "user-level table" for each metric in a metric set. For example, if a metric is defined as STAVG(col), we extract a rowset with one row per user with that user's UserId and their value of "col". We then either:

  • (slow and inefficient) Process that rowset with the SeedFinderSegmentProcessor and aggregate up to the flight level (i.e. compute the STAVG for each metric-segment pair), or
  • (faster) Create hundreds of copies of that rowset with hashed UserIds with different seeds as dummy flight assignment, then aggregate each of these copies to the flight level (compute STAVG) and UNION ALL the copies back together.
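The seeded-hash trick in the second bullet might look roughly like this (a sketch with a hypothetical helper; Mangrove's real hashing will differ):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class DummyFlights
{
    // Deterministically assign a user to a dummy "treatment"/"control"
    // flight by hashing (seed, userId). Each seed produces an independent
    // assignment, so the user-level rowset can be copied hundreds of times
    // with a different seed per copy, aggregated, and UNION ALL'd back.
    public static string Assign(string userId, int seed)
    {
        using (var md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(seed + ":" + userId));
            return (hash[0] & 1) == 0 ? "control" : "treatment";
        }
    }
}
```

Because the assignment is a pure function of (seed, userId), each copy of the rowset gets a reproducible, roughly balanced dummy split.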

Sync Offline and Online Pipeline

Representative: Kaska Adoteye.

The Windows 10 QF team, as well as others, maintains a pipeline for offline data. This data is then used to compute metrics for their dashboards, as well as to train their ranking models. This forces them to redefine metrics that are already defined for the online ExP scorecards, and since their data pipeline differs from the one used for the scorecard, the values they get can differ as well. Here, we use xForay as an easy-to-use way for them to extract row-level events, as well as the full metrics, exactly as they are used to generate the scorecard, so that there is no discrepancy between the scorecard and their model training or dashboards.

Variance calculation validation

Representative: Pawel Janowski.

The variance of some metrics will be underestimated if the standard variance calculation is used; in those cases, the delta method should be used instead. To validate the delta-method variance: for each aggregation level, we extract the table on which the metrics would be calculated. We randomly sample it with replacement to obtain a table with the same number of rows. We then continue the job, calculating the metrics from that table, and store the variances. Repeating this 200 times yields 200 bootstrapped variances. We take the mean of those and check that it agrees with the metric variance reported in the scorecard.
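Following the steps above literally, with the metric taken to be a simple mean (a sketch only; the real job bootstraps the extracted per-aggregation-level metric table):

```csharp
using System;
using System.Linq;

static class BootstrapCheck
{
    // Resample the rows with replacement, recompute the metric's variance on
    // each bootstrap table, and return the mean of the bootstrapped variances,
    // to be compared against the variance the scorecard reports.
    public static double MeanBootstrappedVariance(double[] values, int replicates, int seed)
    {
        var rng = new Random(seed);
        int n = values.Length;
        double total = 0;
        for (int r = 0; r < replicates; r++)
        {
            var sample = new double[n];
            for (int i = 0; i < n; i++)
                sample[i] = values[rng.Next(n)];        // sample with replacement
            double mean = sample.Average();
            double s2 = sample.Sum(v => (v - mean) * (v - mean)) / (n - 1);
            total += s2 / n;                            // variance of the mean
        }
        return total / replicates;
    }
}
```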

Empty scorecard debugging

Representative: Thomas Portet.

We want the option to remove the scorecard filters from the WHERE clause of the first SELECT statement and expose each filter as an individual column instead. I've found myself doing this manually multiple times: when a scorecard has unexpected traffic volume (either completely empty, or an SRM), the first step is generally to figure out whether one filter in particular is excluding a lot of data.

Being able to easily move the filters from the WHERE clause to individual columns would greatly help the person investigating the unexpected traffic volume, and could also be very helpful for building some sort of "automatic empty/SRM scorecard debugger tool".

Big Movers

Representative: Widad Machmouchi.

Choose a small list of metrics along with a high-cardinality segment, and compute a scorecard with those metrics and that segment. Compute p-values and FDR to identify the most interesting segments.

Proposal

This section gives a high-level design proposal for giving Mangrove the ability to replace xForay. It does not specify all the implementation details, nor is it intended to specify the answers to questions best answered by the implementer. The proposal here is loosely inspired by the .NET Core Global Tool framework.

Add an interface:

namespace Mangrove.Coordinator.Extensions
{
  public interface ICoordinatorExtension
  {
    (MetricsPlan, ComputationConfig) Transform(MetricsPlan plan, ComputationConfig config);
  }
}

in the Coordinator project. Also, add a parameter Extension to the ComputationConfig class. When a value is supplied, use reflection to look up the implementation of ICoordinatorExtension whose name is the provided value. Then, before constructing or running the main part of MasterCoordinator, first run the resolved instance of ICoordinatorExtension, and then use the resulting ComputationConfig and MetricsPlan in the rest of the master coordinator's logic.

Default behavior would be implemented by a NoOp coordinator extension which returns the same metrics plan and configuration objects unchanged.
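A sketch of this resolution logic, with stub MetricsPlan / ComputationConfig classes standing in for the real Mangrove types:

```csharp
using System;
using System.Linq;

// Stubs standing in for the real Mangrove types.
public class MetricsPlan { }
public class ComputationConfig { public string Extension { get; set; } }

public interface ICoordinatorExtension
{
    (MetricsPlan, ComputationConfig) Transform(MetricsPlan plan, ComputationConfig config);
}

// Default behavior: hand back the same plan and configuration unchanged.
public class NoOpExtension : ICoordinatorExtension
{
    public (MetricsPlan, ComputationConfig) Transform(MetricsPlan plan, ComputationConfig config)
        => (plan, config);
}

public static class ExtensionResolver
{
    // Use reflection to find the ICoordinatorExtension implementation whose
    // type name matches the configured value; fall back to the no-op.
    public static ICoordinatorExtension Resolve(string name)
    {
        var type = typeof(ICoordinatorExtension).Assembly.GetTypes()
            .Where(t => typeof(ICoordinatorExtension).IsAssignableFrom(t) && t.IsClass)
            .SingleOrDefault(t => t.Name.Equals(name, StringComparison.OrdinalIgnoreCase));
        return type == null
            ? new NoOpExtension()
            : (ICoordinatorExtension)Activator.CreateInstance(type);
    }
}
```

The master coordinator would call `Resolve` once, run `Transform`, and feed the resulting pair into the rest of its logic.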

Unlike the ForayApp model, where each ForayApp had (in theory) a well-defined owner, every extension in the Mangrove codebase will be production code which is supported by the Mangrove team. This means the bar on code quality will be quite high.

Workflow

This subsection outlines the workflow an analyst / dev who wished to add an extension to Blitz would follow.

  1. Create a local clone and private branch of the Mangrove repository.
    • Fortunately, the only dependencies the Mangrove codebase has are: latest versions of Visual Studio and the .NET Core SDK.
  2. Open Mangrove.sln in the latest version of Visual Studio.
  3. Create a new class in the Extensions folder of the Coordinator project.
  4. Have your class implement the ICoordinatorExtension interface.
  5. Write your implementation of the Transform method, following the other extensions as examples.
  6. Using your local copy of Blitz, make sure your extension works as expected. Bonus points for creating unit tests for your extension at this stage.
  7. Create a pull request of your branch against master. It will be reviewed via the same process as the rest of our production codebase, and changes to your code + unit tests will likely be required before your pull request is approved.
  8. Use your extension in an analysis request! Or, use Blitz to generate a Scope script using your extension, and submit it yourself.

Pipeline Integration

The analysis request object model should be extended to allow an extension name to be passed in. For extensions producing output that is not xCard-compatible, the analysis request should also plug into the planned ExP Insights Compute Framework (alternate doc: Why ACE?). Note: this would require the ExP Insights framework to be able to consume data which is not necessarily xCard-compatible, a slight extension of their current design. In the interim, the document Scope Script as an Input linked to from this Æther wiki page explains how dynamically-generated Scope scripts may be run in Æther modules. A ForayApp could call Foray as a Service to generate a Scope script, then run it via Æther.

The Mangrove "split" command for turnkey caching, outlined in RFC #5, could be potentially implemented as a pair of extensions, one to produce the "cache" component and another to produce the "consume" component of the split MetricsPlan.

Use Cases

This section gives a brief walkthrough of each of the use cases from the requirements section, outlining how they could be (not necessarily will be) implemented in this framework.

  1. Interaction Detection. This would be a bit tricky, as cross apply is not currently a "native Mangrove" concept. For Scope and Kusto, it could be implemented by creating extern nodes capturing the appropriate fabric-specific expressions. The rest of the logic (scorecard computation using specific flight and segment columns) would be simple and robust.
  2. Metric Health. This is a perfect example of the sort of logic which should easily transfer to the new model. "Cutting off" metrics computation at the user level is simple (just follow syntax trees down their first aggregation), as is creating copies of a table and then unioning them back together.
  3. Sync Offline and Online Pipeline. This should also be straightforward to transfer over. For row-level events, it should ideally be a special case of the turnkey cache, tuned to "split" the MetricsPlan at a specific table. For metric values, it just means not turning on the "calculate variance" flag in the configuration file.
  4. Variance calculation validation. Depending on the precise logic here, it might be a bit tricky to port this over.
  5. Empty scorecard debugging. Find all Filter nodes and replace the Filter object with a ColumnReference, and re-use the names of those created ColumnReference nodes as segments.
  6. Big Movers. The core logic of computing a few metrics against a high-cardinality segment should be easy to port. The stats calculation at the end might be a bit subtle, but should be easier if done before any unpivoting is carried out.
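As one concrete illustration, the filter-to-column rewrite in use case 5 could generate column expressions along these lines (hypothetical ScorecardFilter record; Mangrove's actual Filter and ColumnReference node classes will differ):

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a Mangrove Filter node.
public record ScorecardFilter(string Name, string Predicate);

public static class FilterDebugRewrite
{
    // Instead of "WHERE (p1) AND (p2)", emit each predicate as its own
    // boolean column, e.g. "(Market == \"en-us\") AS IsEnUs", so each
    // filter's impact on traffic volume can be inspected as a segment.
    public static IEnumerable<string> ToColumns(IEnumerable<ScorecardFilter> filters)
        => filters.Select(f => $"({f.Predicate}) AS {f.Name}");
}
```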

Concerns

This section outlines some of the potential concerns / risks of this proposal.

Difficult to contribute

Someone writing an extension would have to be able to write C# code to manipulate syntax trees in potentially non-trivial ways. Since the visitor pattern is somewhat sophisticated, this might be a bit difficult.

This difficulty would be mitigated by several factors:

  1. The Mangrove codebase provides a suite of "helper" base classes and methods to make writing logic for modifying syntax trees relatively easy.
  2. The Mangrove team should provide a couple "noob examples" of extension implementations for new contributors to follow.
    • The proposed "noob examples" are:
      • Cut off MetricsPlan at data source extraction level.
      • Cut off MetricsPlan at top non-aggregation level.
      • Cut off MetricsPlan at per-user metrics computation level.
      • Don't unpivot (all this does is modify the configuration object).
  3. Expand the introduction article with a more detailed set of recommendations on the main classes to use (Change*, Fluent*, etc.) and a list of good examples to follow.
  4. Not only will the "helper" base classes and methods be well-documented, but the extensions themselves will be required to have careful docstrings explaining what each extension does and how to use it.

Name conflicts

What if extension names aren't unique? For example, what if someone creates two extensions MyExtension and myExtension? First, good engineering practice (more pertinently, code review for the Mangrove codebase) should catch that. Second, it could easily be enforced by a unit test using reflection. Third, it seems quite unlikely that extension name conflicts will ever be a real problem.
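The reflection-based uniqueness check could be as simple as the following (a sketch against a stub interface; in the real codebase it would scan the Coordinator assembly):

```csharp
using System;
using System.Linq;

public interface ICoordinatorExtension { }   // stub for illustration
public class NoOpExtension : ICoordinatorExtension { }
public class MyExtension : ICoordinatorExtension { }

public static class ExtensionNameTest
{
    // Fails if two extension type names collide case-insensitively,
    // so name-based lookup can never be ambiguous.
    public static bool NamesAreUnique()
    {
        var names = typeof(ICoordinatorExtension).Assembly.GetTypes()
            .Where(t => typeof(ICoordinatorExtension).IsAssignableFrom(t) && t.IsClass)
            .Select(t => t.Name)
            .ToList();
        return names.Count == names.Distinct(StringComparer.OrdinalIgnoreCase).Count();
    }
}
```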

Scary to contribute

Since it is expected that non-developers (hopefully, people who are not even in the Analysis & Experimentation team) will contribute extensions, there needs to be a clear set of expectations and processes for new contributors to follow.

Fortunately, the CONTRIBUTING.md and pull_request_template.md files explain pretty clearly how to contribute to the Mangrove codebase, and how to describe your code change when you're ready to actually make a pull request.

Alternatives

This section outlines some of the alternatives we considered before arriving on the proposal in this document.

Per-fabric implementation

One possibility is to implement and support a small "whitelist" of xForay-style operations in each compute fabric. For example, the Hypercube Engine already supports "compute metric value per-user" as a first-class operation. This approach has two downsides:

  1. It requires a per-fabric implementation of each operation. This is O(operations × fabrics) work to implement.
  2. It is not nearly as expressive as allowing arbitrary manipulation of syntax trees.

If Mangrove extensions are implemented properly, some basic Mangrove operations (e.g. "compute metrics up to a user-level") could be re-used across multiple extensions that carry out more sophisticated manipulations of the syntax trees. This follows a similar pattern to how implementations of the IMetricsPlanTransformer interface may re-use logic by calling each other.

Multiple extensions in a single config

Another possibility is to allow users to pass in an ordered sequence of extension names in the config, and having the master coordinator run those in sequence. We are choosing not to implement this, as some sequences of extensions are not really meaningful (e.g., first "cut off" the syntax trees at user-level, then do variance-calculation validation). It is better to force users to explicitly create an extension for meaningful extension sequences, allowing that sequence to be unit tested and code reviewed.
