Learn about the elements you need to build an efficient, scalable edge ML architecture.
There are four components that can help bring order to the chaos that is running ML at the edge and allow you to build an efficient, scalable edge ML architecture:
- Central management hub: rather than manually configuring each device, a central management hub lets you define configurations for your models, including model-specific dependencies and the underlying system-level dependencies you need to run models on devices (a minimal configuration sketch follows this list).
- Device agnostic: ensuring your architecture is device agnostic can save you a lot of time. While many IoT devices ship with Arm chips, you’ll also want to make sure your architecture works on AMD chips. The same goes for data transfer protocols: while MQTT might be the standard in manufacturing, you’ll also want your architecture to work with gRPC, REST, and so on.
- Low latency inference: if fast response time matters for your use case, low latency inference is non-negotiable.
- Ability to operate in a disconnected environment: if you’re running ML models at the edge, chances are, there will be situations where the devices go offline. It’s better to account for these scenarios from the start.
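As a rough illustration of what a hub-managed configuration might capture, here is a minimal Python sketch. The `ModelDeployment` dataclass, its field names, and the registry URL are hypothetical placeholders, not any particular product’s API; the point is that dependencies, target architecture, transport protocol, and offline behavior are declared centrally rather than hand-configured on each device.

```python
from dataclasses import dataclass

@dataclass
class ModelDeployment:
    """Hypothetical declarative config a central hub might push to each device."""
    model_name: str
    model_version: str
    container_image: str                 # immutable image with model + dependencies baked in
    target_arch: str = "arm64"           # or "amd64" -- keep the config device agnostic
    transport: str = "grpc"              # "grpc", "rest", "mqtt", ...
    max_latency_ms: int = 50             # budget for low-latency inference
    offline_cache_dir: str = "/var/cache/models"  # where to buffer results when disconnected

# One config per device class, defined once centrally instead of per-box by hand
fleet = [
    ModelDeployment("defect-detector", "1.4.2",
                    "registry.example.com/defect-detector:1.4.2", target_arch="arm64"),
    ModelDeployment("defect-detector", "1.4.2",
                    "registry.example.com/defect-detector:1.4.2", target_arch="amd64",
                    transport="rest"),
]
```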
By adopting a first principles approach to building out your edge ML architecture, you first consider where your devices are located, and then create a mechanism to configure and interact with them accordingly. Taking things one level down, the key components of your edge-centric AI system include:
- Containers: store libraries, scripts, data files, and assets in an immutable format, locking in your model dependencies and giving you the flexibility to put your models on a range of devices.
- Centralized model store: hosts your container images and allows your edge devices to pull models in from the outside.
- More than one “device”: allows you to run and manage models on multiple devices. This doesn’t just mean single-board computers; it can include cloud or on-prem machines, and is a great way to address the challenges of running models on multi-cloud compute.
- Docker: provides your container runtime, which is helpful for remotely processing data in these locations using the same models (see the container pull sketch after this list).
- REST or gRPC: provides your high-speed, low-latency inferencing. gRPC isn’t quite as user-friendly as REST, which can be a great choice when network speed or latency isn’t a concern (a minimal inference-client sketch also follows this list).
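To make the container and model-store pieces concrete, here is a minimal sketch using the docker-py SDK to pull an image from a registry and run it as a local inference service on the device. The registry URL, image name, and port are placeholder assumptions, not a specific product’s layout.

```python
import docker  # docker-py SDK; assumes the Docker daemon is running on the device

# Placeholder registry and image -- substitute your own centralized model store.
REGISTRY_IMAGE = "registry.example.com/defect-detector"
TAG = "1.4.2"

client = docker.from_env()

# Pull the immutable container image (model + dependencies) from the central store...
client.images.pull(REGISTRY_IMAGE, tag=TAG)

# ...and start it as a local inference service, exposing its serving port on the device.
container = client.containers.run(
    f"{REGISTRY_IMAGE}:{TAG}",
    detach=True,
    ports={"8080/tcp": 8080},                    # assumes the container serves inference on 8080
    restart_policy={"Name": "unless-stopped"},   # keep serving across reboots and offline periods
)
print(f"Serving {REGISTRY_IMAGE}:{TAG} in container {container.short_id}")
```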
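And here is an equally rough sketch of the inference call over REST. The endpoint path, payload shape, and timeout are assumptions about how the container above serves its model; a gRPC setup would use generated client stubs instead of `requests`.

```python
import requests  # simple REST client; gRPC would use generated stubs instead

# Illustrative endpoint -- it depends on how your serving container exposes the model.
INFERENCE_URL = "http://localhost:8080/v1/predict"

def predict(features, timeout_s=0.2):
    """Send one inference request to the locally running container.

    A short timeout keeps the caller responsive; on a disconnected or slow
    device you might fall back to a cached result instead of blocking.
    """
    try:
        resp = requests.post(INFERENCE_URL, json={"inputs": features}, timeout=timeout_s)
        resp.raise_for_status()
        return resp.json()
    except requests.RequestException:
        return None  # caller decides how to degrade gracefully when offline

print(predict([0.12, 0.87, 0.33]))
```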
The main benefit of combining these elements is high performance with low latency, because you’re moving your compute to where your data is being collected. You can be more efficient with your resources by shrinking your models and distributing them to run on many smaller devices, which keeps you cost and hardware efficient, and the great thing is that these same models can run on any other computer!