Introduction

In this post, we’ll look at how to save and restore your machine learning models with Weights & Biases.
W&B lets you save everything you need to reproduce your models (weights, architecture, predictions, and code) in a safe place.
This is useful because you don’t have to retrain your models; you can simply review their performance days, weeks, or even months later. Before you deploy, you can compare the performance of all the models you trained in previous months and restore the best-performing one.
If you’d like to follow along, I’ve created a Colab to provide working code samples and an easy demo.

Save your machine learning model

There are two ways to save a file and associate it with a run.
  • Use wandb.save(filename).
  • Put a file in the wandb run directory, and it will get uploaded at the end of the run.
If you want to sync files as they’re being written, you can specify a filename or glob in wandb.save.
Here’s how you can do this in just a few lines of code. See this colab for a complete example:
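import os
import wandb

# Assumes wandb.init() has already been called and `model` is a trained model (e.g. Keras)
# "model.h5" is saved in wandb.run.dir and will be uploaded at the end of training
model.save(os.path.join(wandb.run.dir, "model.h5"))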

# Save a model file manually from the current directory:
wandb.save('model.h5')

# Save all files that currently exist containing the substring "ckpt":
wandb.save('../logs/*ckpt*')

# Save any files starting with "checkpoint" as they're written to:
wandb.save(os.path.join(wandb.run.dir, "checkpoint*"))

You can view your saved models by navigating to a run page, clicking on the Files tab, then clicking on your model file. See an example here.
See the docs for frequently asked questions about saving and restoring.
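Before handing a glob to wandb.save, it can help to check locally which files the pattern matches. The sketch below is a minimal illustration using Python's standard glob module on a temporary directory. The file names and the preview_matches helper are hypothetical, and it assumes wandb.save expands patterns the way glob does, which may differ between W&B versions.

```python
import glob
import os
import tempfile


def preview_matches(pattern):
    """Return the sorted paths that currently match a glob pattern.

    Hypothetical helper, not part of the wandb API.
    """
    return sorted(glob.glob(pattern))


# Build a throwaway directory with a few made-up files to try the pattern on.
with tempfile.TemporaryDirectory() as logs_dir:
    for name in ("epoch1.ckpt", "ckpt-final.index", "notes.txt"):
        open(os.path.join(logs_dir, name), "w").close()

    # Same shape of pattern as '../logs/*ckpt*': any name containing "ckpt".
    matches = preview_matches(os.path.join(logs_dir, "*ckpt*"))
    matched_names = [os.path.basename(path) for path in matches]
    print(matched_names)
```

If the printed list is what you expect, the same pattern string can then be passed to wandb.save as in the snippet above.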

Restore your machine learning model

Now let’s look at how to restore a file, such as a model checkpoint, into your local run folder so you can access it in your script.
Common use cases:
  • restore the model architecture or weights generated by past runs
  • resume training from the last checkpoint in the case of failure (see the section on resuming for crucial details)
Here’s how you can do this in just a few lines of code. See this colab for a complete example:
# restore the model file "model.h5" from a specific run by user "lavanyashukla"
# in project "save_and_restore" from run "10pr4joa"
best_model = wandb.restore('model.h5', run_path="lavanyashukla/save_and_restore/10pr4joa")
# use the "name" attribute of the returned object if your framework expects a filename, e.g. as in Keras
model.load_weights(best_model.name)
See the restore docs for more details.
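The run_path argument above has the form "user/project/run". As a rough sketch of how you might pick the best-performing run and build that string, the example below works on a plain list of records. The run IDs, the val_loss values, the "my-team" and "my-project" names, and both helper functions are made up for illustration and are not part of the wandb API.

```python
def make_run_path(entity, project, run_id):
    """Join the three parts into the "entity/project/run_id" form."""
    parts = [entity, project, run_id]
    if any(not part or "/" in part for part in parts):
        raise ValueError("each part must be non-empty and contain no '/'")
    return "/".join(parts)


def best_run(runs, metric="val_loss"):
    """Return the record with the lowest value for the given metric."""
    return min(runs, key=lambda record: record[metric])


# Hypothetical summaries of past runs; in practice these come from your own logs.
runs = [
    {"run_id": "run_a", "val_loss": 0.42},
    {"run_id": "run_b", "val_loss": 0.31},
    {"run_id": "run_c", "val_loss": 0.57},
]

best = best_run(runs)
run_path = make_run_path("my-team", "my-project", best["run_id"])
print(run_path)
```

The resulting string can be passed as the run_path argument to wandb.restore, as in the snippet above. For a metric where higher is better, such as accuracy, swap min for max.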
This blog has been republished by AIIA. To view the original article, please click HERE.