General Information

This readme file was generated on 2025-09-02 by the Learning Commons evaluators team.

Note: This file shares high-level details about the dataset. For detailed methodology on dataset creation, see the Literacy Dataset technical doc.

Dataset Information

Title: Learning Commons annotations of CLEAR for qualitative text complexity

Version: v1.0 2025-09-02

Download link: Learning Commons annotations of CLEAR for qualitative text complexity v1.0 2025-09-02.csv

License: https://github.com/learning-commons-org/evaluators/blob/main/LICENSE.md

Data Structure

Number of columns: 14
Number of rows: 1097
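
The counts above can be checked after download. The following is a minimal sketch, not part of the official tooling: it assumes the CSV has a single header row and is UTF-8 encoded, and the function names `read_shape` and `matches_readme` are illustrative.

```python
import csv

EXPECTED_COLUMNS = 14   # from the Data Structure section
EXPECTED_ROWS = 1097    # from the Data Structure section


def read_shape(csv_file):
    """Return (number of columns, number of data rows) for an open CSV file.

    Assumes the first record is a header row. csv.reader keeps quoted
    multi-line fields (such as long passages in Text) as a single record.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    n_rows = sum(1 for _ in reader)
    return len(header), n_rows


def matches_readme(csv_file):
    """True if the file has the shape stated in this README."""
    return read_shape(csv_file) == (EXPECTED_COLUMNS, EXPECTED_ROWS)
```

To use it, open the downloaded file with `open(path, newline="", encoding="utf-8")` and pass the file object to `matches_readme`. The `newline=""` argument is what the Python `csv` module recommends so that line breaks inside quoted fields are handled correctly.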

Column Definitions

UID: Unique identifier for each row
Clear ID: Identifier for texts based on the CLEAR corpus. This is not a unique identifier, as some texts were scored for multiple grades.
Grade: Grade level for which the text is scored. For example, if Grade=3 and Sentence Score=slightly complex, then the text's sentence structure is slightly complex for a third-grade student (see the overall project documentation for specific assumptions).
Flesch Kincaid: Flesch-Kincaid Grade Level score for the text, as provided in the CLEAR corpus.
Text: Text from the CLEAR corpus that was annotated
Sentence Score: Overall annotator rating for sentence structure complexity. Takes the values slightly complex, moderately complex, very complex, and exceedingly complex. See the technical docs for additional details on these categories.
Sentence Score Rationale: Annotators’ explanations for their sentence structure score.
Vocabulary Score: Overall annotator rating for vocabulary complexity. Takes the values slightly complex, moderately complex, very complex, and exceedingly complex. See overall project documentation for additional details on these categories.
Vocabulary Score Rationale: Annotators’ explanations for their vocabulary score.
Tier 2 Words: Tier 2 words identified by annotators.
Tier 3 Words: Tier 3 words identified by annotators.
Archaic Words: Archaic words identified by annotators.
Other Complex Words: Additional complex words for students of the grade level, as identified by annotators.
Background Knowledge Assumption: LLM-generated information on the background knowledge that students of a particular grade are likely to have about a topic. Information was provided to annotators as part of the scoring process. See the overall project documentation for detailed methodology on how this was generated.
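
Two points in the definitions above matter when working with the rows: the score columns take four ordered labels, and Clear ID repeats when a text was scored for several grades. The sketch below shows one way to handle both. It assumes the CSV header names match the column names listed above exactly and that rows are dictionaries such as those produced by `csv.DictReader`. The numeric ranks 1 to 4 are an illustrative convention, not part of the dataset, and the helper names are hypothetical.

```python
from collections import defaultdict

# Ordinal ranks for the four labels, in the order the README lists them.
# The numbers themselves are an assumption made for sorting and comparison.
SCORE_ORDER = {
    "slightly complex": 1,
    "moderately complex": 2,
    "very complex": 3,
    "exceedingly complex": 4,
}


def score_to_rank(label):
    """Return 1-4 for a known label, or None for anything else
    (for example the missing data code)."""
    return SCORE_ORDER.get(label.strip().lower())


def grades_by_clear_id(rows):
    """Map each Clear ID to the list of grades it was scored for.

    Clear ID is not unique, so one text can appear under several grades.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["Clear ID"]].append(row["Grade"])
    return dict(grouped)
```

Use UID rather than Clear ID when a unique key per row is needed; `grades_by_clear_id` is only useful for finding texts that were annotated at more than one grade.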

Missing Data Code

not scored: Text was not annotated for this column.
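
Before analysis, it can help to see how often this code appears in each column. The sketch below is an illustration only: it assumes the code is stored as the literal string `not scored` (surrounding whitespace is ignored) and that rows are dictionaries such as those produced by `csv.DictReader`. The function name `count_not_scored` is hypothetical.

```python
from collections import Counter

MISSING_CODE = "not scored"  # the missing data code defined above


def count_not_scored(rows):
    """Count, per column, how many rows carry the missing data code."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            # Skip non-string values, e.g. None for short rows in DictReader.
            if isinstance(value, str) and value.strip() == MISSING_CODE:
                counts[column] += 1
    return dict(counts)
```

Columns that never contain the code are absent from the returned dictionary, so an empty result means every cell was annotated.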

Methodological Information

Annotators were provided with specific assumptions about the student, the text, and the information to score. These and other details are included in the project docs.
