.. currentmodule:: torchtune.datasets

For a detailed general usage guide, please see :ref:`datasets_overview`.

torchtune supports several widely used text-only datasets to help quickly bootstrap your fine-tuning.

.. autosummary::
    :toctree: generated/
    :nosignatures:

    alpaca_dataset
    alpaca_cleaned_dataset
    grammar_dataset
    hh_rlhf_helpful_dataset
    samsum_dataset
    slimorca_dataset
    stack_exchange_paired_dataset
    cnn_dailymail_articles_dataset
    wikitext_dataset

.. autosummary::
    :toctree: generated/
    :nosignatures:

    multimodal.llava_instruct_dataset
    multimodal.the_cauldron_dataset
    multimodal.vqa_dataset


torchtune also provides generic dataset builders for common data formats, such as chat and instruct datasets. These are especially useful when specifying a custom dataset from a YAML config.

.. autosummary::
    :toctree: generated/
    :nosignatures:

    instruct_dataset
    chat_dataset
    preference_dataset
    text_completion_dataset
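
As a hedged sketch of the YAML usage mentioned above, a recipe config might point its ``dataset`` entry at one of these builders. The file path and column names below are hypothetical placeholders, and the accepted keys can differ between torchtune versions, so check the generated reference for ``instruct_dataset`` before relying on them.

.. code-block:: yaml

    # Illustrative only: the path and column names are assumptions.
    dataset:
      _component_: torchtune.datasets.instruct_dataset
      source: json                       # forwarded to Hugging Face load_dataset
      data_files: path/to/my_data.json   # hypothetical local file
      split: train
      column_map:
        input: prompt                    # hypothetical column holding the instruction
        output: response                 # hypothetical column holding the answer
      train_on_input: False
      packed: False

The tokenizer is not listed here because the recipe is expected to build it from its own ``tokenizer`` entry and pass it to the builder.
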

Class representations for the above dataset builders.

.. autosummary::
    :toctree: generated/
    :nosignatures:

    TextCompletionDataset
    ConcatDataset
    PackedDataset
    PreferenceDataset
    SFTDataset