@tianle_cai
Very cool work! 💫 We have a NeurIPS 2023 workshop paper with a similar idea and observations: the delta between the fine-tuned and pretrained model is extremely compressible with quantization, and even with simple magnitude-based sparsification:
How much value does your fine-tuning add? Believe it or not, just 1 bit 🤏 Thrilled to unveil BitDelta, a super simple yet effective method for compressing fine-tuning deltas into a single bit while barely touching performance. This approach slashes storage and GPU memory demands…
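The 1-bit compression described above can be sketched roughly as follows: keep only the sign of the fine-tuning delta plus one scalar scale per weight matrix. This is a minimal illustration, not the authors' exact method; the function names and the mean-|delta| scale (which minimizes L2 error for a fixed sign pattern) are assumptions here.

```python
import numpy as np

def one_bit_delta(w_base, w_ft):
    """Compress the fine-tuning delta (w_ft - w_base) to 1 bit per weight.

    Hypothetical sketch: store only sign(delta) plus a single scale.
    For a fixed sign pattern s, the L2-optimal scale is mean(|delta|).
    """
    delta = w_ft - w_base
    sign = np.sign(delta)          # 1 bit per weight (+1 / -1)
    scale = np.abs(delta).mean()   # one scalar for the whole matrix
    return sign, scale

def reconstruct(w_base, sign, scale):
    """Recover an approximation of the fine-tuned weights."""
    return w_base + scale * sign
```

A quick sanity check on random weights: reconstructing with the 1-bit delta should recover the fine-tuned matrix far more closely than using the base weights alone, at roughly 1/16 the storage of an fp16 delta.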