Motivation
It would be very useful to be able to get a CVAT-labeled dataset directly into FiftyOne simply and quickly, with as few workarounds as possible. This would make it easy to:
- Perform a "first import" of a CVAT dataset into FiftyOne without intermediate steps (e.g., without doing CVAT -> export to disk -> import into FiftyOne)
- Periodically keep the FiftyOne dataset "mapped" to the latest additions and modifications to the labels made in CVAT (this is different from sporadically sending quick annotation jobs from FiftyOne to CVAT, which creates new tasks and has different use cases).
Describe the problem
fiftyone.utils.cvat.import_annotations() silently fails to import any annotations when CVAT stores filenames with subdirectory structure (e.g. coco8-seg/images/train/000000000009.jpg rather than just 000000000009.jpg). This is common when data is uploaded to CVAT as a directory tree or zip archive.
The root cause is that the data_map dictionary, which maps CVAT filenames to local filepaths, is built using os.path.basename(), stripping all directory components. But CVAT's task metadata API returns the full relative path as the frame name. The lookup data_map.get(filename, None) then fails for every frame, and all annotations are silently ignored.
There are three related bugs: two in how data_map is constructed, and one downstream path mismatch when media is downloaded:
Bug 1 — data_path=None: basenames of existing samples don't match CVAT filenames
fiftyone/fiftyone/utils/cvat.py, line 171 in f488054:

```python
data_map = {os.path.basename(f): f for f in existing_filepaths}
```
os.path.basename(f) produces "000000000009.jpg", but CVAT returns "coco8-seg/images/train/000000000009.jpg". No match.
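The mismatch can be reproduced without CVAT or FiftyOne. The following is a minimal sketch, assuming a hypothetical local root /data and the frame name from the example above; posixpath is used so the result does not depend on the platform.

```python
import posixpath

# Hypothetical local file, and the frame name CVAT reports for it
local_filepaths = ["/data/coco8-seg/images/train/000000000009.jpg"]
cvat_frame_name = "coco8-seg/images/train/000000000009.jpg"

# Same keying scheme as the snippet above: basename only
data_map = {posixpath.basename(f): f for f in local_filepaths}

basename_keys = list(data_map)
lookup_result = data_map.get(cvat_frame_name, None)

print(basename_keys)  # the only key is the bare filename
print(lookup_result)  # None: the CVAT frame name is not a key
```

Because the key holds only the last path component, a frame name with any directory prefix cannot match.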
Bug 2 — data_path is a directory: same basename issue
fiftyone/fiftyone/utils/cvat.py, lines 175 to 181 in f488054:

```python
if os.path.isdir(data_path):
    data_map = {
        os.path.basename(f): f
        for f in etau.list_files(
            data_path, abs_paths=True, recursive=True
        )
    }
```
Same problem — files are listed recursively but keyed by basename only.
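To make the difference concrete, here is a small sketch that builds both kinds of map over a temporary directory tree. It is illustrative only: list_files_recursive is a hypothetical stand-in for the recursive listing (written with os.walk so the example has no dependencies), and it assumes CVAT frame names always use forward slashes.

```python
import os
import tempfile


def list_files_recursive(root):
    """Return the paths of all files under root (hypothetical stand-in helper)."""
    return [
        os.path.join(dirpath, name)
        for dirpath, _, names in os.walk(root)
        for name in names
    ]


with tempfile.TemporaryDirectory() as data_path:
    # Create <data_path>/coco8-seg/images/train/000000000009.jpg as an empty file
    rel = os.path.join("coco8-seg", "images", "train", "000000000009.jpg")
    os.makedirs(os.path.dirname(os.path.join(data_path, rel)))
    open(os.path.join(data_path, rel), "w").close()

    files = list_files_recursive(data_path)

    # Current behavior: keyed by basename
    by_basename = {os.path.basename(f): f for f in files}

    # Alternative: keyed by path relative to data_path, with "/" separators
    by_relpath = {
        os.path.relpath(f, data_path).replace(os.sep, "/"): f for f in files
    }

cvat_frame_name = "coco8-seg/images/train/000000000009.jpg"
basename_hit = cvat_frame_name in by_basename
relpath_hit = cvat_frame_name in by_relpath
print(basename_hit, relpath_hit)
```

Only the map keyed by relative path contains the CVAT frame name. This assumes the directory passed as data_path corresponds to the root of the tree that was uploaded to CVAT.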
Bug 3 — download_media=True with relative data_path: path mismatch after round-trip
fiftyone/fiftyone/utils/cvat.py, lines 311 to 312 in f488054:

```python
if filepath is None and data_dir:
    filepath = os.path.join(data_dir, filename)
```
When download_media=True, the code bypasses data_map and constructs the filepath directly via os.path.join(data_dir, filename). Media downloads correctly and samples are created. However, if data_path is a relative path (e.g. "my_downloads/"), the samples are stored with the relative filepath. FiftyOne then converts it to an absolute path internally when persisting to the database.
Later, _build_sparse_frame_id_map reads back the absolute paths from the database and tries to look them up in cvat_id_map, which still has relative-path keys. Every lookup returns None, so frame_id_map ends up empty and all annotations are silently skipped.
In all three cases the result is the same: a warning about ignored files followed by no annotations being imported, with no error raised.
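The Bug 3 round-trip can be illustrated with plain dictionaries. This is a simplified sketch, not the actual FiftyOne code path: cvat_id_map is modeled here as a plain filepath-to-ID dict, the ID 42 is made up, and os.path.abspath stands in for whatever normalization FiftyOne applies when persisting filepaths.

```python
import os

data_dir = "relative_dir"      # relative data_path, as in Bug 3
filename = "subdir/image.jpg"  # frame name reported by CVAT

# Key recorded when the media is downloaded: a relative path
relative_filepath = os.path.join(data_dir, filename)
cvat_id_map = {relative_filepath: 42}  # 42 is a made-up CVAT frame ID

# Path read back from the database later: absolute
stored_filepath = os.path.abspath(relative_filepath)

# Lookup with the absolute path misses the relative key
missed = cvat_id_map.get(stored_filepath)

# Normalizing the keys the same way makes the lookup succeed
normalized_map = {os.path.abspath(k): v for k, v in cvat_id_map.items()}
found = normalized_map.get(stored_filepath)

print(missed, found)
```

The lookup fails only because the two sides use different forms of the same path, so making either side absolute up front removes the mismatch.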
Code to reproduce issue
```python
import os

import fiftyone as fo
import fiftyone.utils.cvat as fouc

# Assumes a CVAT task where files have subdirectory structure,
# e.g. frame names like "subdir/image.jpg" rather than just "image.jpg"
TASK_ID = 1  # placeholder: replace with the ID of such a task

# Bug 1: data_path=None
ds = fo.Dataset()
ds.add_sample(fo.Sample(filepath="/path/to/subdir/image.jpg"))
fouc.import_annotations(ds, task_ids=[TASK_ID])
# Warning: Ignoring annotations for N files in CVAT (eg subdir/image.jpg)
# that do not appear in the provided data map

# Bug 2: data_path is a directory
ds2 = fo.Dataset()
fouc.import_annotations(
    ds2,
    task_ids=[TASK_ID],
    data_path="/path/to/data/root/",
    insert_new=True,
)
# Same warning, no annotations imported

# Bug 3: download_media=True with relative data_path
ds3 = fo.Dataset()
fouc.import_annotations(
    ds3,
    task_ids=[TASK_ID],
    data_path="relative_dir",
    download_media=True,
    insert_new=True,
)
# Media downloads, samples are created, but labels are empty

# Workaround for Bugs 1 & 2: build a dict manually with relative-path keys
data_dir = "/path/to/data/root"
data_map = {
    os.path.relpath(os.path.join(root, f), data_dir): os.path.join(root, f)
    for root, _, files in os.walk(data_dir)
    for f in files
}
fouc.import_annotations(ds, task_ids=[TASK_ID], data_path=data_map)
# Works correctly

# Workaround for Bug 3: use an absolute data_path
fouc.import_annotations(
    ds3,
    task_ids=[TASK_ID],
    data_path=os.path.abspath("relative_dir"),
    download_media=True,
    insert_new=True,
)
```
System information
- OS Platform and Distribution: Linux Ubuntu (kernel 6.8.0-94-generic)
- Python version: 3.10.12
- FiftyOne version: 1.14.0
- FiftyOne installed from: source (develop branch at f488054)
Other info/logs
Suggested fixes:
- Bugs 1 & 2: Use os.path.relpath(f, data_path) instead of os.path.basename(f) when building data_map from a directory. For the data_path=None case, a fallback chain could try the full basename first, then progressively longer relative suffixes, though the right approach may warrant discussion.
- Bug 3: Convert data_dir to an absolute path before constructing filepaths (e.g. data_dir = os.path.abspath(data_path)), so that the paths stored in cvat_id_map and in the database are consistent.
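As a starting point for that discussion, here is one possible way to match by suffix in the data_path=None case. This is a sketch, not a proposed patch: match_by_suffix is a hypothetical helper, the example paths are made up, and it deliberately returns None when more than one local file matches rather than guessing.

```python
def match_by_suffix(cvat_name, filepaths):
    """Return the single filepath whose trailing path components equal
    cvat_name, or None if there is no match or the match is ambiguous.

    Hypothetical helper for illustration only.
    """
    target = cvat_name.replace("\\", "/").strip("/").split("/")
    matches = [
        f
        for f in filepaths
        if f.replace("\\", "/").split("/")[-len(target):] == target
    ]
    return matches[0] if len(matches) == 1 else None


# Made-up sample filepaths: same basename in two different subdirectories
existing_filepaths = [
    "/data/coco8-seg/images/train/000000000009.jpg",
    "/data/coco8-seg/images/val/000000000009.jpg",
]

# A full relative frame name identifies exactly one file
train_match = match_by_suffix(
    "coco8-seg/images/train/000000000009.jpg", existing_filepaths
)

# A bare basename matches both files, so no file is chosen
ambiguous = match_by_suffix("000000000009.jpg", existing_filepaths)

print(train_match)
print(ambiguous)
```

Comparing whole path components avoids false matches on partial names, and refusing ambiguous matches keeps the failure visible, at the cost of a linear scan per frame name.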
Willingness to contribute
from the FiftyOne community