General Information

This readme file was generated on 2025-09-02 by the Learning Commons evaluators team.

Note: This file shares high-level details about the dataset. For detailed methodology on dataset creation, see the Literacy Dataset technical doc.

Dataset Information

Title: Learning Commons annotations of CLEAR for qualitative text complexity

Version: v1.0 2025-09-02

Download link: Learning Commons annotations of CLEAR for qualitative text complexity v1.0 2025-09-02.csv

License: https://github.com/learning-commons-org/evaluators/blob/main/LICENSE.md

Data Structure

Number of columns: 14
Number of rows: 1097

Column Definitions

  • UID: Unique identifier for each row

  • Clear ID: Identifier for texts based on the CLEAR corpus. This is not a unique identifier, as some texts were scored for multiple grades.

  • Grade: Grade level for which the text is scored. For example, if Grade=3 and Sentence Score=slightly complex, then the text is slightly complex for a third-grade student (see overall project documentation for specific assumptions).

  • Flesch Kincaid: Flesch-Kincaid Grade Level score for the text, as provided by the CLEAR corpus.

  • Text: Text from the CLEAR corpus that was annotated

  • Sentence Score: Overall annotator rating for sentence structure complexity. Takes the values slightly complex, moderately complex, very complex, and exceedingly complex. See the technical docs for additional details on these categories.

  • Sentence Score Rationale: Annotators’ explanations for their sentence structure score.

  • Vocabulary Score: Overall annotator rating for vocabulary complexity. Takes the values slightly complex, moderately complex, very complex, and exceedingly complex. See overall project documentation for additional details on these categories.

  • Vocabulary Score Rationale: Annotators’ explanations for their vocabulary score.

  • Tier 2 Words: Tier 2 words identified by annotators.

  • Tier 3 Words: Tier 3 words identified by annotators.

  • Archaic Words: Archaic words identified by annotators.

  • Other Complex Words: Additional words that are complex for students at the given grade level, as identified by annotators.

  • Background Knowledge Assumption: LLM-generated information on the background knowledge that students of a particular grade are likely to have about a topic. Information was provided to annotators as part of the scoring process. See the overall project documentation for detailed methodology on how this was generated.
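As a rough illustration of how these columns might be used, the sketch below groups rows by Clear ID (which is not unique, since a text can be scored for several grades) and maps the four score labels to an ordinal rank. The sample rows are made up for illustration and are not taken from the dataset, and the numeric ranks 1 to 4 are an assumed convention rather than part of the dataset itself.

```python
import csv
import io
from collections import defaultdict

# The four score labels in increasing order of complexity, as listed in the
# column definitions. The numeric ranks (1-4) are an assumed convention.
SCORE_ORDER = [
    "slightly complex",
    "moderately complex",
    "very complex",
    "exceedingly complex",
]
SCORE_RANK = {label: rank for rank, label in enumerate(SCORE_ORDER, start=1)}

# Hypothetical sample rows for illustration only; values are NOT from the dataset.
SAMPLE_CSV = """UID,Clear ID,Grade,Sentence Score,Vocabulary Score
u1,100,3,slightly complex,moderately complex
u2,100,5,slightly complex,slightly complex
u3,200,4,very complex,exceedingly complex
"""


def grades_by_clear_id(rows):
    """Collect the grades each text (Clear ID) was scored for."""
    grouped = defaultdict(list)
    for row in rows:
        # Grade is kept as a string; its exact format is not assumed here.
        grouped[row["Clear ID"]].append(row["Grade"])
    return dict(grouped)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
by_text = grades_by_clear_id(rows)
sentence_ranks = [SCORE_RANK[row["Sentence Score"]] for row in rows]
```

To work with the actual file, the `io.StringIO(SAMPLE_CSV)` handle could be replaced with an open handle to the downloaded CSV. Use UID, not Clear ID, when a unique key per row is needed.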

Missing Data Code

  • not scored: Text was not annotated for this column.
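One way to handle this code when reading the data is to convert it to an explicit missing value before analysis. The sketch below does this with the standard library; the sample rows are made up for illustration, and matching the code after trimming surrounding whitespace is an assumption about how the value is stored.

```python
import csv
import io

# Missing data code documented above.
MISSING_CODE = "not scored"

# Hypothetical sample rows for illustration only; values are NOT from the dataset.
SAMPLE_CSV = """UID,Sentence Score,Vocabulary Score
u1,slightly complex,not scored
u2,not scored,very complex
u3,moderately complex,moderately complex
"""


def clean_row(row):
    """Replace the missing data code with None in every column of a row."""
    return {
        key: (None if value.strip() == MISSING_CODE else value)
        for key, value in row.items()
    }


def count_missing(rows):
    """Count, per column, how many rows carry no annotation."""
    counts = {}
    for row in rows:
        for key, value in row.items():
            counts[key] = counts.get(key, 0) + (1 if value is None else 0)
    return counts


cleaned = [clean_row(row) for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]
missing_counts = count_missing(cleaned)
```

Converting the code to `None` up front keeps "not scored" from being mistaken for a fifth complexity category in later tallies.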

Methodological Information

Annotators were provided with specific assumptions about the student, the text, and the information to score. These and other details are included in the project docs.