Commit 10f1ccd

authored
Add section descriptions
1 parent 4cf3346 commit 10f1ccd

1 file changed

Lines changed: 27 additions & 5 deletions

datasheet-for-dataset-template.md

@@ -1,6 +1,6 @@
 # Datasheet for dataset "add dataset name here"

-Question from the [Datasheets for Datasets](https://arxiv.org/abs/1803.09010) paper, v7.
+Questions from the [Datasheets for Datasets](https://arxiv.org/abs/1803.09010) paper, v7.

 Jump to section:

@@ -12,12 +12,12 @@ Jump to section:
 - [Distribution](#distribution)
 - [Maintenance](#maintenance)

-The questions in this section are primarily intended to encourage dataset creators
-to clearly articulate their reasons for creating the dataset and to promote transparency
-about funding interests.
-
 ## Motivation

+_The questions in this section are primarily intended to encourage dataset creators
+to clearly articulate their reasons for creating the dataset and to promote transparency
+about funding interests._
+
 ### For what purpose was the dataset created?

 _Was there a specific task in mind? Was there a specific gap that needed to be filled?
@@ -34,6 +34,12 @@ name and number._

 ## Composition

+_Most of these questions are intended to provide dataset consumers with the
+information they need to make informed decisions about using the dataset for
+specific tasks. The answers to some of these questions reveal information
+about compliance with the EU’s General Data Protection Regulation (GDPR) or
+comparable regulations in other jurisdictions._
+
 ### What do the instances that comprise the dataset represent (e.g., documents, photos, people, countries)?

 _Are there multiple types of instances (e.g., movies, users, and ratings; people and
@@ -115,6 +121,9 @@ _If so, please provide a description._

 ## Collection process

+_\[T\]he answers to questions here may provide information that allow others to
+reconstruct the dataset without access to it._
+
 ### How was the data associated with each instance acquired?

 _Was the data directly observable (e.g., raw text, movie ratings), reported by subjects (e.g.,
@@ -173,6 +182,11 @@ or other access point to any supporting documentation._

 ## Preprocessing/cleaning/labeling

+_The questions in this section are intended to provide dataset consumers with the information
+they need to determine whether the “raw” data has been processed in ways that are compatible
+with their chosen tasks. For example, text that has been converted into a “bag-of-words” is
+not suitable for tasks involving word order._
+
 ### Was any preprocessing/cleaning/labeling of the data done (e.g., discretization or bucketing, tokenization, part-of-speech tagging, SIFT feature extraction, removal of instances, processing of missing values)?

 _If so, please provide a description. If not, you may skip the remainder of the questions in
@@ -190,6 +204,11 @@ _If so, please provide a link or other access point._

 ## Uses

+_These questions are intended to encourage dataset creators to reflect on the tasks
+for which the dataset should and should not be used. By explicitly highlighting these tasks,
+dataset creators can help dataset consumers to make informed decisions, thereby avoiding
+potential risks or harms._
+
 ### Has the dataset been used for any tasks already?

 _If so, please provide a description._
@@ -247,6 +266,9 @@ reproduce, any supporting documentation._

 ## Maintenance

+_These questions are intended to encourage dataset creators to plan for dataset maintenance
+and communicate this plan with dataset consumers._
+
 ### Who is supporting/hosting/maintaining the dataset?

 ### How can the owner/curator/manager of the dataset be contacted (e.g., email address)?