Another useful resource for multimodal Llama 3: OBELICS.
OBELICS is an open, massive, and high-quality collection of interleaved image-text web documents, containing 141M English documents, 115B text tokens, and 353M images, extracted from Common Crawl dumps between February 2020…