New Multimodal Dataset MMSci for PhD-level Scientific Knowledge Understanding #1

@Leezekun

Hi,

Thank you for the excellent work collecting multimodal datasets! I would like to draw your attention to a new one: MMSci, a multimodal, multi-discipline benchmark for PhD-level scientific comprehension, built from articles published in Nature Communications. The dataset covers over 72 advanced scientific disciplines, 131k articles, and 742k figures, all crawled directly from the web rather than extracted from PDFs, to ensure both diversity and quality.

You can find more details at:
ArXiv: https://arxiv.org/abs/2407.04903
Code & Data: https://github.com/Leezekun/MMSci
