Improve description and use

cgarbin · web-flow · commit a46e13402038 · 2020-08-01T17:56:15.000-04:00
diff --git a/README.md b/README.md
@@ -9,11 +9,24 @@ The template is [here](./datasheet-for-dataset-template.md).
 [Datasheets for datasets](https://arxiv.org/abs/1803.09010) were created to increase transparency
 of datasets.
 
+> [Datasheets for datasets] document [the dataset] motivation, composition, collection process,
+> recommended uses, and so on. [They] have the potential to increase transparency and accountability
+> within the machine learning community, mitigate unwanted biases in machine learning systems, facilitate
+> greater reproducibility of machine learning results, and help researchers and practitioners select more
+> appropriate datasets for their chosen tasks.
+
 The problem it is trying to solve:
 
 > Despite the importance of data to machine learning, there is no standardized process for
 > documenting machine learning datasets. To address this gap, we propose _datasheets for datasets_.
 
+The datasheet is not a passive, after-the-fact document. Dataset creators are expected to read the
+questions in the _motivation_, _composition_, and _collection process_ sections **before** they start
+collecting data for the dataset. The questions in these sections raise considerations that, if not taken
+into account before data is gathered, cannot be easily rectified later. Similarly, the dataset creators
+are expected to read the questions in the _preprocessing/cleaning/labeling_ section before they preprocess
+the raw data.
+
 ## Why use a markdown file for the datasheet?
 
 The short explanation: using a markdown file allows us to easily compare (diff) one version