MLTRL
Machine Learning Technology Readiness Levels
“While the building blocks are in place, the principles for putting these blocks together are not, and so the blocks are currently being put together in ad-hoc ways… Unfortunately, we are not very good at anticipating what the next emerging serious flaw will be. What we’re missing is an engineering discipline with principles of analysis and design.” - Michael Jordan, Professor, UC Berkeley (source)
Derived from the robust processes and testing standards of spacecraft development, Machine Learning Technology Readiness Levels (MLTRL) is an industry-proven systems engineering framework for efficient, reliable, robust, responsible AI/ML research, productization, and deployment.
See the main paper “Technology Readiness Levels for Machine Learning Systems” (in-press) and the GitHub for details on the framework and results in areas including ML for medical diagnostics, consumer computer vision, satellite imagery, and particle physics.
MLTRL Cards
From TikTok and Facebook recommendations to algorithms for loan approvals and medical care, machine learning systems are all around us. But the accelerating use of machine learning technologies in systems of software, hardware, data, and people introduces vulnerabilities and risks due to dynamic and unreliable behaviours. Fundamentally, machine learning systems learn from data, introducing known and unknown challenges in how these systems behave and interact with their environment. The development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and treated as a means to an end, with potentially dangerous consequences.
Other domains of engineering, however, such as civil and aerospace, follow well-defined processes and testing standards to streamline development for high-quality, reliable results. The extreme is spacecraft systems, where mission-critical measures and robustness are ingrained in the development process. It is no surprise that this approach to systems engineering is standard process and parlance at NASA and DARPA.
MLTRL takes inspiration from systems engineering to create a general framework for developing robust, reliable, & responsible machine learning from basic research through productization and deployment, including essential data considerations. The framework takes care to prioritize the role of machine learning ethics and fairness, and our systems machine learning approach can help curb the large societal issues that can result from poorly deployed and maintained machine learning technologies, such as the automation of systemic human bias, denial of individual autonomy, and unjustifiable outcomes.
The standardization of the MLTRL systems engineering framework for machine learning development across industry, academia, and government should help teams and organizations develop principled, safe, and trusted machine learning technologies.
Example Cards
- TensorFlow: TensorFlow Model Card Toolkit
- Hugging Face: Building a Model Card Toolkit
- GPT-2: Hugging Face and OpenAI Model Card Toolkit
MLTRL Cards should leverage other community tools (documentation, provenance, etc.) where beneficial. For instance, datasets should have their own “datacards”, which we don’t specify (yet) in MLTRL.
- Google’s “Datasheets for Datasets” (paper, template) — it is straightforward to follow this practice within the context of MLTRL. (Note that Microsoft also provides a version, but in a format that is less implementable and less transparent as a deliverable: MS Datasheets for Datasets)
- Semantic versioning for datasets, prescribed in MLTRL, is becoming a standard practice and is supported by many tools: for example, one can easily coordinate datacards and MLTRL Cards with DVC (including programmatic tools for keeping track of data commits/tags and data artifacts/registries); see the sketch after this list.
- Data accountability, ethics, and overall best-practices are constantly evolving areas that should be tracked, potentially for incorporating new methods into MLTRL, and potentially for MLTRL learning lessons to inform the field. Hutchinson et al. ’21 is a good place to start.
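As a minimal sketch of the DVC route (the dataset path and the semantic-version tag below are hypothetical, not prescribed by MLTRL), a Card can pin its data references to an exact revision:

```python
import dvc.api

DATA_PATH = "data/train.csv"   # assumed dataset path in a DVC-tracked repo
DATA_REV = "data-v1.2.0"       # assumed semantic-version git tag for the dataset

# Resolve the remote-storage URL for the pinned revision; recording this
# on the Card makes the exact dataset snapshot reproducible.
url = dvc.api.get_url(DATA_PATH, rev=DATA_REV)
print(f"dataset {DATA_REV} resolves to: {url}")

# Stream the exact file contents at that revision.
with dvc.api.open(DATA_PATH, rev=DATA_REV) as f:
    header = f.readline()
    print(f"first line of pinned dataset: {header!r}")
```

Tagging data revisions this way keeps datacards, MLTRL Cards, and training runs pointing at the same immutable snapshot.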
How Does It Work?
It’s useful to view the Card contents in the context of real example Cards: Level 4 BayesOpt Card
Card content
First is a quick summary table (sketched in code after the list)…
- Tech name, project ID
- Current level
- Owner(s)
- Reviewer(s)
- Link to main project page (in company wiki, for example)
- Link to code, documentation
- Link to ethics checklist
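To make the fields concrete, here is a rough illustration of the summary block as a data structure; the class and field names are our own, not a prescribed MLTRL schema, and the values echo the Level 4 BayesOpt example linked above:

```python
from dataclasses import dataclass, field

@dataclass
class CardSummary:
    """Hypothetical container for an MLTRL Card's summary table;
    fields mirror the list above, not an official schema."""
    tech_name: str
    project_id: str
    current_level: int                  # current MLTRL level
    owners: list = field(default_factory=list)
    reviewers: list = field(default_factory=list)
    project_page: str = ""              # e.g. company-wiki URL
    code_docs_link: str = ""
    ethics_checklist_link: str = ""

summary = CardSummary(
    tech_name="BayesOpt",
    project_id="bopt-001",              # illustrative project ID
    current_level=4,
    owners=["a.researcher"],
    reviewers=["b.engineer"],
)
print(summary)
```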
Then we have more details…
Top-level requirements
A quick view of the main req’s is very handy for newcomers and stakeholders to quickly grok the tech and development approach. The req’s listed here are top-level, i.e. referenced with integers 1, 2, 3, …, aligned with the full project req’s document, which follows the format number.subset.component.
Then there will be link(s) to full requirements + V&V table; refer to the main manuscript to see how MLTRL defines research- and product-requirements, and the use of verification and validation (V&V).
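For instance, a small validation sketch (our own, not MLTRL tooling) mapping a full requirement ID back to the top-level requirement it falls under:

```python
import re

# Requirement IDs follow number.subset.component, e.g. "2.1.3";
# top-level references on the Card use just the integer, e.g. "2".
REQ_ID = re.compile(r"^\d+(\.\d+){0,2}$")

def top_level(req_id: str) -> int:
    """Return the top-level requirement number for a validated ID."""
    if not REQ_ID.match(req_id):
        raise ValueError(f"malformed requirement ID: {req_id!r}")
    return int(req_id.split(".")[0])

assert top_level("2.1.3") == 2   # full ID -> top-level requirement 2
assert top_level("7") == 7       # top-level reference on the Card
```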
Model/algorithm info
Concise “elevator pitch” — think from the perspective of a MLTRL reviewer who may be a domain expert but not skilled in ML.
Intended use
This can contain mission-critical information and should clearly communicate the what, how, and why of intended and unacceptable use-cases. In general this section is most verbose at the early stages of commercialization (Levels 5-8).
Testing status
What’s tested algorithmically? How is the code/implementation tested? What testing recommendations should be acted on in other MLTRL stages?
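To illustrate the distinction (a hedged sketch; the `predict` stand-in and the chosen property are hypothetical), implementation tests check that the code runs as written, while algorithmic tests check behavioural properties of the model:

```python
import numpy as np

def predict(x: np.ndarray) -> np.ndarray:
    """Stand-in model: a fixed linear scorer (purely illustrative)."""
    w = np.array([0.5, -0.25])
    return x @ w

# Implementation test: the code runs and returns the expected shape.
x = np.ones((4, 2))
out = predict(x)
assert out.shape == (4,)

# Algorithmic test: a behavioural property we expect of this model,
# e.g. permuting rows of a batch permutes the predictions accordingly.
perm = np.random.default_rng(0).permutation(4)
assert np.allclose(predict(x[perm]), predict(x)[perm])
```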
Data considerations
- Refer to the MLTRL manuscript for level-by-level data specs.
- Highlight important/interesting data findings — for example, class imbalances (see the sketch below), subpar labels, noise and gaps, acquisition assumptions, etc.
- Point to specific datasets in use (internal and external sources) and data versioning info.
- Explain precisely what data and conditions a model has been trained and developed on.
Note that the content of this section, among others, can vary significantly depending on the ML model or algorithm and the domain. For instance, this section can be verbose with examples for image models, but not necessarily for time-series algorithms. And in fields such as medicine there should be notes on acquisition, sharing, privacy, and ethics that may not be as significant in e.g. manufacturing.
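As one example of a data finding worth recording on the Card, a quick class-imbalance check (the labels and the 10x reporting threshold below are made up for illustration):

```python
from collections import Counter

# Hypothetical label column from a training set; the 10x imbalance
# threshold is arbitrary and would be chosen per project.
labels = ["neg"] * 950 + ["pos"] * 50
counts = Counter(labels)

ratio = max(counts.values()) / min(counts.values())
print(f"class counts: {dict(counts)}, imbalance ratio: {ratio:.1f}x")

if ratio > 10:
    print("NOTE for Card: significant class imbalance; document mitigation.")
```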
Caveats, known edge cases, recommendations
Additional notes to highlight — for example, this is a good place to call out potential biases, technical or knowledge debt, and other matters to be considered in later stages.
MLTRL stage debrief
Succinct summary of stage progress, in a question-response format that is specified at review time (or using defaults like those in the MLTRL Card examples).
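For instance (the question wording here is illustrative, not a prescribed template), a debrief might be recorded as simple question-response pairs:

```python
# Hypothetical stage-4 debrief entries; the questions are illustrative
# defaults, to be replaced by those specified at review time.
debrief = {
    "What was accomplished at this level?": "Validated BayesOpt prototype on benchmark tasks.",
    "What are the known limitations?": "Untested beyond low-dimensional search spaces.",
    "What is recommended for the next level?": "Define product requirements and V&V plan.",
}
for question, answer in debrief.items():
    print(f"Q: {question}\nA: {answer}\n")
```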
Get Started
It’s Never Too Late or Too Early to Get Started
Planning your ML projects at every stage sets you up for success and helps ensure you meet business objectives and deliver projects on time and on budget.