7/ 📡 Communication Optimization: Techniques like DiLoCo, DiPaCo, and Cocktail-SGD reduce how much information has to be shared over slower internet connections. For example, Cocktail-SGD shows only a 1.2x slowdown in training with a 500 Mbps connection, compared to data-center