Machine-Learning Lifecycle and Workflows
Operational Machine-Learning Lifecycle
Unlike the lifecycle of a model developed for a competition, the operational lifecycle is iterative. Each stage has its own workflows and metrics, and some workflows and metrics cut across multiple stages. The competition that matters here is between near-peer nations for superior operational capability, not between model developers for a better metric in a Kaggle-style AI competition.
Stages/Steps of the Lifecycle
Consensus machine-learning lifecycle derivation [1] [2] [3]
not a linear, sequential process
roles and responsibilities across stages and personnel are dynamic
“mind” (models) and “data” are both important
Scope and Objectives
define the scope of the problem and goals for the solution
specify operational requirements
material release
safety analysis
doctrine
human factors
…
specify operational constraints
restrictions on generative factors for evaluating completeness
access to labels/groundtruth
restrictions on lifetime learning
…
Data Engineering
develop data pipelines
data linting
develop labeling protocols and pipelines
label-error detection
formulate sampling protocols
curate static training and test datasets
assess leakage
train/test shift
perform exploratory data analysis
data complexity / metafeatures to evaluate achievability of objectives
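One narrow form of the leakage assessment listed above can be sketched as follows: fingerprint each record and flag test records that also occur in the training set. This is a minimal sketch, not a complete leakage audit. The helper names (`row_fingerprint`, `find_leaked_rows`), the normalization rule, and the toy records are all assumptions made for illustration. Exact-duplicate matching does not catch near-duplicates, temporal leakage, or group leakage (for example, the same platform or subject appearing in both splits).

```python
import hashlib

def row_fingerprint(row):
    """Hash a record after light normalization (strip and lowercase strings)."""
    parts = []
    for value in row:
        if isinstance(value, str):
            value = value.strip().lower()
        parts.append(repr(value))
    # Join with a unit-separator character so field boundaries stay unambiguous.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()

def find_leaked_rows(train_rows, test_rows):
    """Return indices of test rows whose fingerprint also appears in train."""
    train_hashes = {row_fingerprint(r) for r in train_rows}
    return [i for i, r in enumerate(test_rows) if row_fingerprint(r) in train_hashes]

# Hypothetical toy records: (source name, measurement, count).
train = [("Sensor A", 0.51, 3), ("sensor b", 0.12, 7)]
test = [("sensor a ", 0.51, 3), ("sensor c", 0.90, 1)]

leaked = find_leaked_rows(train, test)
print(f"{len(leaked)} of {len(test)} test rows also appear in train: {leaked}")
```

Hashing keeps the check linear in dataset size, which is why it is a reasonable first pass before more expensive near-duplicate or group-aware checks.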
Model Development
model selection
metafeatures
model/data complexity matching
sufficiency assessment
model training
training-data partitioning
training-data augmentation
leakage, bias and label errors
model evaluation
performance
calibration
fairness and generalization
robustness and fault tolerance
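The calibration item under model evaluation is often summarized with an expected calibration error (ECE). The sketch below is one common equal-width-bin formulation, written in plain Python; the function name, the choice of ten bins, and the toy confidences are assumptions for illustration, not a metric definition taken from this document.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: sum over bins of (bin weight) * |accuracy - mean confidence|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece

# Hypothetical predictions: four overconfident ones and two near chance.
confidences = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
correct = [True, True, False, False, True, False]

ece = expected_calibration_error(confidences, correct)
print(f"ECE = {ece:.3f}")
```

A model can score well on accuracy and still have a large ECE, so the two are worth reporting side by side.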
Deployment
online or batch prediction?
online or streaming features?
model update cycle?
model compression
model optimization
Note
Deployment decisions can change model performance metrics, and those changes need to be assessed.
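As an illustration of how a compression decision can move model outputs, the sketch below applies symmetric 8-bit quantization to the weights of a toy linear scorer and measures how far its predictions drift. Everything here is an assumption for illustration: the weights, the inputs, the `predict` stand-in, and the per-tensor quantization scheme. A real assessment would rerun the full evaluation suite on the compressed model rather than compare a handful of scores.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map quantized integers back to approximate float weights."""
    return [v * scale for v in q]

def predict(weights, bias, x):
    """Linear score; stands in for a real model's forward pass."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

# Hypothetical model parameters and evaluation inputs.
weights = [0.82, -1.37, 0.05, 2.10]
bias = -0.3
inputs = [[1.0, 0.5, -2.0, 0.1], [0.0, 1.0, 1.0, 1.0], [-1.5, 0.2, 0.3, 0.7]]

q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

# Largest change in output caused by quantization across the evaluation inputs.
max_drift = max(
    abs(predict(weights, bias, x) - predict(restored, bias, x)) for x in inputs
)
print(f"scale = {scale:.5f}, max prediction drift = {max_drift:.5f}")
```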
Monitoring
data shifts
covariate shift - data monitoring
label shift - prediction monitoring
concept drift
data monitoring
data distribution-shift
feature monitoring
feature distribution-shift
model monitoring
prediction distribution-shift
uncertainty/confidence shifts
accuracy/performance metrics
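Two of the checks listed above can be sketched with the standard library alone: a two-sample Kolmogorov-Smirnov statistic for feature distribution-shift, and a total variation distance between predicted-class frequencies for prediction distribution-shift. The function names, the synthetic data, and the use of the large-sample 5% critical value (coefficient 1.358) as an alert threshold are assumptions for illustration; production monitoring would typically also correct for testing many features at once.

```python
import math
from collections import Counter

def ks_statistic(reference, live):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    ref, cur = sorted(reference), sorted(live)
    i = j = 0
    d = 0.0
    while i < len(ref) and j < len(cur):
        x = min(ref[i], cur[j])
        # Advance past every value equal to x in both samples (handles ties).
        while i < len(ref) and ref[i] == x:
            i += 1
        while j < len(cur) and cur[j] == x:
            j += 1
        d = max(d, abs(i / len(ref) - j / len(cur)))
    return d

def ks_threshold(n, m, c_alpha=1.358):
    """Large-sample critical value; c_alpha=1.358 corresponds to alpha=0.05."""
    return c_alpha * math.sqrt((n + m) / (n * m))

def total_variation(ref_labels, live_labels):
    """Half the L1 distance between two predicted-class frequency distributions."""
    p, q = Counter(ref_labels), Counter(live_labels)
    classes = set(p) | set(q)
    return 0.5 * sum(
        abs(p[c] / len(ref_labels) - q[c] / len(live_labels)) for c in classes
    )

# Hypothetical feature values: the live window is shifted upward by 30 units.
reference_feature = list(range(100))
live_feature = [x + 30 for x in range(100)]
d = ks_statistic(reference_feature, live_feature)
feature_alert = d > ks_threshold(len(reference_feature), len(live_feature))

# Hypothetical predicted classes in the reference and live windows.
reference_preds = ["a"] * 80 + ["b"] * 20
live_preds = ["a"] * 50 + ["b"] * 50
tv = total_variation(reference_preds, live_preds)

print(f"KS = {d:.2f}, feature alert = {feature_alert}, prediction TV = {tv:.2f}")
```

Both checks need only inputs and predictions, so they can run before ground-truth labels arrive; accuracy and other performance metrics still require labels.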
Analysis
determine whether the model achieves the specified goals and objective requirements
refine the data engineering and model development stages as needed to achieve the objective requirements
analyze model predictions to generate operational insights that drive refinement of scope and objectives for future iterations
Always we begin again…
People tend to ask me: ‘How often should I update my models?’… The right question to ask should be: ‘How often can I update my models?’
—Chip Huyen
developing and deploying an ML system is a never-ending cyclical process
the world changes and models must change to adapt to the changing world
modern ML deployment is approaching DevOps timelines
Weibo, Alibaba, and ByteDance reportedly update some of their ML models on a 10-minute cycle