We carried a 4-core laptop around Boston, comparing runs of a sparsified #YOLOv5 object detection model on the #DeepSparse Engine and #ONNXRuntime.
End result: pruning + INT8 quantization = a 10x faster and 12x smaller model.
Replicate our results: