We share the configs, checkpoints, and training logs, along with our negative results from attempts to improve pre-training efficiency.
Advanced optimizers such as Lion and Sophia, ALiBi positional embeddings, and FP16 mixed-precision training did not deliver the expected benefits.
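For context on one of the techniques tried: ALiBi replaces learned positional embeddings with a fixed, head-specific linear penalty on attention logits. The sketch below (a minimal, dependency-free illustration, not our actual training code) builds the standard ALiBi bias matrix; the geometric slope schedule follows the original formulation for power-of-two head counts.

```python
def alibi_slopes(num_heads: int) -> list[float]:
    """Per-head slopes: a geometric sequence starting at 2^(-8/num_heads).

    This is the standard schedule for power-of-two head counts.
    """
    return [2 ** (-8.0 * (i + 1) / num_heads) for i in range(num_heads)]


def alibi_bias(num_heads: int, seq_len: int) -> list[list[list[float]]]:
    """ALiBi bias tensor of shape (num_heads, seq_len, seq_len).

    bias[h][i][j] = -slope_h * (i - j): each head linearly penalizes
    keys by their distance from the query, with no learned parameters.
    In a causal model this bias is added to the attention logits
    before the softmax (positions j > i are masked out separately).
    """
    slopes = alibi_slopes(num_heads)
    return [
        [[-m * (i - j) for j in range(seq_len)] for i in range(seq_len)]
        for m in slopes
    ]
```

Because the bias depends only on relative distance, it can be precomputed once per sequence length and broadcast across the batch; the first head gets the steepest slope (2^-1 for 8 heads) and later heads progressively flatter ones.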