Fix retrieving available datasets #67

Open
lorebalbo wants to merge 4 commits into main from available-datasets-fix-plus-tests

Conversation

@lorebalbo
Contributor

Summary of Changes

  • Added the InteractiveUtils package so that the subtypes function resolves. available_datasets uses subtypes to build the list of datasets exposed by SoleData.Artifacts (NatOps, Libras, HuGaDb, Epilepsy).
  • Added simple test cases for both available_datasets and load_dataset.
  • Removed the Manifest.toml file from the docs directory.
  • Updated .gitignore to ignore all Manifest.toml files.
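
For context on the first bullet: subtypes is defined in the InteractiveUtils standard library, which is loaded automatically in the REPL but not inside a package module. The sketch below illustrates the pattern under that assumption. AbstractLoader, NatOpsLoader, LibrasLoader, and list_loaders are hypothetical stand-ins, not the actual SoleData.Artifacts definitions.

```julia
using InteractiveUtils  # provides `subtypes`; only the REPL loads it implicitly

# Hypothetical stand-ins for the artifact loader types (illustrative names only).
abstract type AbstractLoader end
struct NatOpsLoader <: AbstractLoader end
struct LibrasLoader <: AbstractLoader end

# Hypothetical helper in the spirit of `available_datasets`:
# collect the names of all direct subtypes, sorted alphabetically.
list_loaders() = sort!(string.(nameof.(subtypes(AbstractLoader))))

println(list_loaders())  # prints ["LibrasLoader", "NatOpsLoader"]
```

Without the `using InteractiveUtils` line, the same call inside a package module would raise an UndefVarError for subtypes.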

Issue Identified

The test for load_dataset fails. The datasets themselves load correctly; the failure comes from the expected return type, Tuple{<:AbstractDataFrame, <:CategoricalArray}. NatOps, Libras, and Epilepsy satisfy this type, but HuGaDb returns a Tuple{DataFrames.DataFrame, Tuple{Vector{SubString{String}}, Vector{Int64}}, Vector{String}}.

This type mismatch triggers the failure. It needs to be clarified whether:

  1. HuGaDb should be updated so its loader returns a consistent type, or
  2. The return type of Dataset should be made more flexible.
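
The mismatch can be reproduced in miniature. This is a rough sketch that uses Base types as stand-ins (AbstractDict for AbstractDataFrame, AbstractVector for CategoricalArray) so it runs without extra packages. Expected, Flexible, two_part, and three_part are hypothetical names, and Flexible is only one possible shape for option 2, not a proposed final signature.

```julia
# Stand-ins from Base only, so the sketch runs without DataFrames or CategoricalArrays.
const Expected = Tuple{<:AbstractDict, <:AbstractVector}

# Shape returned by most loaders: (table, labels).
two_part = (Dict("x" => 1), ["a", "b"])

# Shape analogous to the HuGaDb result: (table, (names, ids), labels).
three_part = (Dict("x" => 1), (["walking"], [1]), ["walking"])

println(two_part isa Expected)    # true
println(three_part isa Expected)  # false: a 3-tuple never matches a 2-tuple type

# One possible relaxation (option 2): fix only the first element, allow any tail.
const Flexible = Tuple{<:AbstractDict, Vararg{Any}}

println(two_part isa Flexible)    # true
println(three_part isa Flexible)  # true
```

The trade-off is that a relaxed type accepts both shapes but no longer tells callers what the trailing elements are, which is the argument for option 1.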

Additional Observations

HuGaDb contains duplicated labels (e.g., multiple "walking" and "standing"). It is unclear if this is intended, but it is worth noting.
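
A quick way to check for repeated labels is to count occurrences. The sketch below is illustrative only: the labels vector is made-up sample data, not the actual HuGaDb label list, and duplicated_labels is a hypothetical helper.

```julia
# Return the labels that occur more than once, sorted alphabetically.
function duplicated_labels(labels::AbstractVector{<:AbstractString})
    counts = Dict{String,Int}()
    for label in labels
        counts[label] = get(counts, label, 0) + 1
    end
    return sort!([label for (label, n) in counts if n > 1])
end

# Made-up sample data for illustration.
labels = ["walking", "standing", "walking", "running", "standing"]

println(duplicated_labels(labels))  # prints ["standing", "walking"]
```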
