datasheet-for-dataset-template.md
Lines changed: 27 additions & 5 deletions
@@ -1,6 +1,6 @@
 # Datasheet for dataset "add dataset name here"
 
-Question from the [Datasheets for Datasets](https://arxiv.org/abs/1803.09010) paper, v7.
+Questions from the [Datasheets for Datasets](https://arxiv.org/abs/1803.09010) paper, v7.
 
 Jump to section:
 
@@ -12,12 +12,12 @@ Jump to section:
 -[Distribution](#distribution)
 -[Maintenance](#maintenance)
 
-The questions in this section are primarily intended to encourage dataset creators
-to clearly articulate their reasons for creating the dataset and to promote transparency
-about funding interests.
-
 ## Motivation
 
+_The questions in this section are primarily intended to encourage dataset creators
+to clearly articulate their reasons for creating the dataset and to promote transparency
+about funding interests._
+
 ### For what purpose was the dataset created?
 
 _Was there a specific task in mind? Was there a specific gap that needed to be filled?
@@ -34,6 +34,12 @@ name and number._
 
 ## Composition
 
+_Most of these questions are intended to provide dataset consumers with the
+information they need to make informed decisions about using the dataset for
+specific tasks. The answers to some of these questions reveal information
+about compliance with the EU’s General Data Protection Regulation (GDPR) or
+comparable regulations in other jurisdictions._
+
 ### What do the instances that comprise the dataset represent (e.g., documents, photos, people, countries)?
 
 _Are there multiple types of instances (e.g., movies, users, and ratings; people and
@@ -115,6 +121,9 @@ _If so, please provide a description._
 
 ## Collection process
 
+_\[T\]he answers to questions here may provide information that allow others to
+reconstruct the dataset without access to it._
+
 ### How was the data associated with each instance acquired?
 
 _Was the data directly observable (e.g., raw text, movie ratings), reported by subjects (e.g.,
@@ -173,6 +182,11 @@ or other access point to any supporting documentation._
 
 ## Preprocessing/cleaning/labeling
 
+_The questions in this section are intended to provide dataset consumers with the information
+they need to determine whether the “raw” data has been processed in ways that are compatible
+with their chosen tasks. For example, text that has been converted into a “bag-of-words” is
+not suitable for tasks involving word order._
+
 ### Was any preprocessing/cleaning/labeling of the data done (e.g., discretization or bucketing, tokenization, part-of-speech tagging, SIFT feature extraction, removal of instances, processing of missing values)?
 
 _If so, please provide a description. If not, you may skip the remainder of the questions in
@@ -190,6 +204,11 @@ _If so, please provide a link or other access point._
 
 ## Uses
 
+_These questions are intended to encourage dataset creators to reflect on the tasks
+for which the dataset should and should not be used. By explicitly highlighting these tasks,
+dataset creators can help dataset consumers to make informed decisions, thereby avoiding
+potential risks or harms._
+
 ### Has the dataset been used for any tasks already?
 
 _If so, please provide a description._
@@ -247,6 +266,9 @@ reproduce, any supporting documentation._
 
 ## Maintenance
 
+_These questions are intended to encourage dataset creators to plan for dataset maintenance
+and communicate this plan with dataset consumers._
+
 ### Who is supporting/hosting/maintaining the dataset?
 
 ### How can the owner/curator/manager of the dataset be contacted (e.g., email address)?