We will walk through approaches we’ve developed for measuring the maturity of our AI models and tracking management outcomes.
We’ll start by exploring the way we think about AI model management (one of the core components of Meta’s broader AI governance strategy), then discuss how to move toward consistently defining concepts across authoring, training, and deploying AI models.
At Meta, we have developed measurement processes for specific metrics about AI systems that can be used to make managing models more effective and efficient, and we’ve tested these processes across a diverse ecosystem of tools and systems.
We provide product teams with opportunities to match AI tooling to their needs. Some product teams use tightly coupled custom solutions to meet exceptional feature and scale requirements, while others benefit from the ease of use of simpler, more general tools. This variance allows for effective AI development and fast iteration, but it also adds complexity to model management.
The first step in assessing the artifacts of AI modeling is selecting the right component for each use case. To do so, it can be helpful to picture the general structure of model development, shown below as it looks at Meta.
Model source code is written by engineers to serve a given purpose (such as ranking posts), and then one or more model binaries are trained from this source, often created periodically for either iterative model improvements or retraining.
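To make this one-to-many relationship concrete, here is a minimal sketch in Python of how source code and its trained binaries might be represented. The class and field names are illustrative, not Meta’s internal schema:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class ModelSource:
    """Model code authored by engineers for a given purpose."""
    name: str                 # e.g., a hypothetical "post_ranking" model
    repo_path: str            # where the source code lives
    binaries: list["ModelBinary"] = field(default_factory=list)

@dataclass
class ModelBinary:
    """A trained artifact produced from one piece of model source."""
    source: ModelSource
    trained_at: datetime      # binaries are often produced periodically
    run_id: str               # the training run that created this binary
```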
After selecting the model artifact to measure, additional complexity arises when artifacts with the same function have different labels in different tools. Consider a case where system 1 calls the model source code the “Model,” with another term denoting the trained binaries, while system 2 considers binaries the “Model” and calls the model source code something else. If we were to select all “Models” from these systems, we’d get a mix of source code and binaries, which are not comparable.
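One way to reconcile these labels is a translation table that maps each tool’s own terminology onto canonical artifact types before querying. The sketch below is a simple illustration of the idea; the tool names and the labels “Snapshot” and “Recipe” are invented placeholders, not real system terms:

```python
# Map (tool, tool-specific label) -> canonical artifact type.
# "Snapshot" and "Recipe" are placeholder terms for illustration only.
CANONICAL_LABELS = {
    ("system_1", "Model"): "model_source_code",
    ("system_1", "Snapshot"): "model_binary",
    ("system_2", "Model"): "model_binary",
    ("system_2", "Recipe"): "model_source_code",
}

def canonical_type(tool: str, label: str) -> str:
    """Translate a tool-specific label into a comparable artifact type."""
    try:
        return CANONICAL_LABELS[(tool, label)]
    except KeyError:
        raise ValueError(f"no canonical mapping for {label!r} in {tool!r}")
```

With records normalized this way, a query for model binaries returns only binaries, regardless of which tool logged them.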
To avoid inconsistency while maintaining decentralization of our AI systems, we’ve worked toward consolidating certain logging information into a single platform that can then serve various needs when queried.
• A feature source record stores the origin of data for a particular feature.
• These sources are linked to any number of logical feature records, which store the logic to transform the raw data into a training dataset.
• Model code and its training dataset are connected to a workflow run, which is a pipeline that executes to produce one or more model binaries.
• Finally, the model binaries are linked to one or more model deployments, which track each place where binaries are serving inference requests.
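Taken together, these records form a lineage graph from raw data all the way to serving. Below is a minimal sketch of that graph, assuming simple Python dataclasses with illustrative names; the real platform’s schema will differ:

```python
from dataclasses import dataclass, field

@dataclass
class FeatureSource:
    """Stores the origin of raw data for a particular feature."""
    origin: str

@dataclass
class LogicalFeature:
    """Stores the logic that turns raw data into a training dataset."""
    source: FeatureSource
    transform: str

@dataclass
class WorkflowRun:
    """A pipeline execution tying model code to its training dataset."""
    model_code: str
    features: list[LogicalFeature]
    binaries: list["ModelBinary"] = field(default_factory=list)

@dataclass
class ModelBinary:
    """A trained binary produced by a workflow run."""
    produced_by: WorkflowRun

@dataclass
class ModelDeployment:
    """Tracks one place where a binary serves inference requests."""
    binary: ModelBinary
    endpoint: str

def feature_origins(deployment: ModelDeployment) -> list[str]:
    """Walk backward from a deployment to the raw data it depends on."""
    run = deployment.binary.produced_by
    return [feat.source.origin for feat in run.features]
```

Because every deployment links back through binaries, workflow runs, and features, lineage questions such as “which deployments consume this data source?” become graph traversals rather than per-tool investigations.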
These metrics describe the outcomes expected of models in a standardized, bucketed format.
Even with consistently defined metrics, it can be challenging to make sense of outcomes across each metric individually. Implementing a hierarchical framework structure to group metrics can help alleviate this problem. These groupings constitute a graph of metrics and their aggregations for each model, and various nodes on that graph can be reported for different purposes.
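As a sketch of how such a rollup might work, assume bucketed outcomes and a “worst bucket wins” aggregation rule; both the bucket names and the rule below are assumptions for illustration:

```python
# Ordered from best to worst; bucket names are illustrative.
BUCKET_ORDER = ["healthy", "needs_attention", "at_risk"]

def aggregate(buckets: list[str]) -> str:
    """Roll child outcomes up to a parent node: worst bucket wins."""
    return max(buckets, key=BUCKET_ORDER.index)

# Individual metrics grouped into domains, domains rolled up per model.
domains = {
    "lineage": {"feature_coverage": "healthy",
                "binary_provenance": "needs_attention"},
    "deployment": {"endpoint_tracking": "healthy"},
}
domain_buckets = {name: aggregate(list(metrics.values()))
                  for name, metrics in domains.items()}
model_bucket = aggregate(list(domain_buckets.values()))
print(domain_buckets)  # {'lineage': 'needs_attention', 'deployment': 'healthy'}
print(model_bucket)    # needs_attention
```

Different audiences can then read different nodes of this graph: a team owning a model might look at domain-level buckets for actionable detail, while a broader review reads only the top-level rollup.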
• Think about how different tools contribute to a generic AI development process, and map outputs from various tools to consistent artifacts in that process.
• If your AI systems are complex and can’t be consolidated, think of measurement as a platform with a metadata graph and common interoperability standards.
• For a broad view of outcomes, consider a metric framework that can compare and aggregate across model management domains.