
[BUG] CVAT import_annotations() fails to match CVAT filenames containing subdirectory paths #7057

@vittorio-prodomo

Description


Motivation

It would be hugely beneficial to need as few workarounds as possible when importing a CVAT-labeled dataset directly into FiftyOne, making this an easy way to:

  • Facilitate a "first import" of a CVAT dataset into FiftyOne without intermediate steps (e.g., without going through CVAT -> export to disk -> import into FiftyOne)
  • Periodically keep the FiftyOne dataset in sync with the latest label additions/modifications made in CVAT (this is different from occasionally sending quick annotation jobs from FiftyOne to CVAT, which creates new tasks and serves different use cases).

Describe the problem

fiftyone.utils.cvat.import_annotations() silently fails to import any annotations when CVAT stores filenames with subdirectory structure (e.g. coco8-seg/images/train/000000000009.jpg rather than just 000000000009.jpg). This is common when data is uploaded to CVAT as a directory tree or zip archive.

The root cause is that the data_map dictionary, which maps CVAT filenames to local filepaths, is built using os.path.basename(), stripping all directory components. But CVAT's task metadata API returns the full relative path as the frame name. The lookup data_map.get(filename, None) then fails for every frame, and all annotations are silently ignored.
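A minimal illustration of the mismatch in plain Python, without FiftyOne or a CVAT server. The local root /path/to/data/root is a placeholder, and the dict comprehension mirrors the basename-keyed construction quoted under Bug 1:

```python
import os

# Local files as they might exist on disk (placeholder root directory)
local_filepaths = [
    "/path/to/data/root/coco8-seg/images/train/000000000009.jpg",
    "/path/to/data/root/coco8-seg/images/train/000000000025.jpg",
]

# Frame name as returned by CVAT's task metadata (full relative path)
cvat_filename = "coco8-seg/images/train/000000000009.jpg"

# data_map keyed by basename only, as in the current construction
data_map = {os.path.basename(f): f for f in local_filepaths}

# The lookup misses, so annotations for this frame are skipped
filepath = data_map.get(cvat_filename, None)
print(filepath)  # None
```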

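There are three related bugs in how CVAT filenames are mapped to local filepaths (two in the data_map construction and one in the download_media=True code path), all leading to the same silent failure: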


Bug 1 — data_path=None: basenames of existing samples don't match CVAT filenames

```python
data_map = {os.path.basename(f): f for f in existing_filepaths}
```

os.path.basename(f) produces "000000000009.jpg", but CVAT returns "coco8-seg/images/train/000000000009.jpg". No match.

Bug 2 — data_path is a directory: same basename issue

```python
if os.path.isdir(data_path):
    data_map = {
        os.path.basename(f): f
        for f in etau.list_files(
            data_path, abs_paths=True, recursive=True
        )
    }
```

Same problem — files are listed recursively but keyed by basename only.
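To check the directory case locally, this sketch builds a tiny tree in a temporary directory and compares basename keys with keys relative to the root. os.walk stands in for etau.list_files here, and normalizing separators to "/" is my assumption about how keys should be compared with CVAT frame names on Windows:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as data_root:
    # Recreate a small tree like the one uploaded to CVAT
    rel = os.path.join("coco8-seg", "images", "train", "000000000009.jpg")
    path = os.path.join(data_root, rel)
    os.makedirs(os.path.dirname(path))
    open(path, "wb").close()

    # Recursive listing (os.walk stands in for etau.list_files)
    all_files = [
        os.path.join(root, f)
        for root, _, files in os.walk(data_root)
        for f in files
    ]

    # Current behavior: keys are basenames only
    basename_map = {os.path.basename(f): f for f in all_files}

    # Suggested behavior: keys are paths relative to data_root, with "/"
    # separators so they compare equal to CVAT frame names on any OS
    relpath_map = {
        os.path.relpath(f, data_root).replace(os.sep, "/"): f
        for f in all_files
    }

    cvat_filename = "coco8-seg/images/train/000000000009.jpg"
    basename_hit = basename_map.get(cvat_filename)
    relpath_hit = relpath_map.get(cvat_filename)

print(basename_hit is None, relpath_hit is not None)  # True True
```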

Bug 3 — download_media=True with relative data_path: path mismatch after round-trip

```python
if filepath is None and data_dir:
    filepath = os.path.join(data_dir, filename)
```

When download_media=True, the code bypasses data_map and constructs the filepath directly via os.path.join(data_dir, filename). Media downloads correctly and samples are created. However, if data_path is a relative path (e.g. "my_downloads/"), the samples are created with the relative filepath, which FiftyOne converts to an absolute path internally when persisting to the database.

Later, _build_sparse_frame_id_map reads back the absolute paths from the database and tries to look them up in cvat_id_map, which still has relative-path keys. Every lookup returns None, so frame_id_map ends up empty and all annotations are silently skipped.
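The relative/absolute mismatch can be reproduced without CVAT. This sketch approximates FiftyOne's internal path normalization with os.path.abspath (an assumption) and uses a plain dict as a stand-in for cvat_id_map:

```python
import os

data_path = "relative_dir"  # relative, as in the Bug 3 reproduction
cvat_filename = "coco8-seg/images/train/000000000009.jpg"

# Current behavior: the filepath is built from the relative directory
# and used as the key of the (stand-in) cvat_id_map
filepath = os.path.join(data_path, cvat_filename)
cvat_id_map = {filepath: 0}

# The database holds the absolute version of that path, so looking it
# up in cvat_id_map misses
stored_filepath = os.path.abspath(filepath)
before_fix = cvat_id_map.get(stored_filepath)

# Suggested fix: make data_dir absolute before joining; normpath also
# normalizes separators so the key equals its abspath on any OS
data_dir = os.path.abspath(data_path)
fixed_filepath = os.path.normpath(os.path.join(data_dir, cvat_filename))
fixed_id_map = {fixed_filepath: 0}
after_fix = fixed_id_map.get(os.path.abspath(fixed_filepath))

print(before_fix, after_fix)  # None 0
```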


In all three cases the result is the same: a warning about ignored files followed by no annotations being imported, with no error raised.

Code to reproduce issue

```python
import os

import fiftyone as fo
import fiftyone.utils.cvat as fouc

# Assumes a CVAT task where files have subdirectory structure
# e.g. frame names like "subdir/image.jpg" rather than just "image.jpg"
TASK_ID = 1  # placeholder: replace with the ID of such a CVAT task

# Bug 1: data_path=None
ds = fo.Dataset()
ds.add_sample(fo.Sample(filepath="/path/to/subdir/image.jpg"))
fouc.import_annotations(ds, task_ids=[TASK_ID])
# Warning: Ignoring annotations for N files in CVAT (eg subdir/image.jpg)
# that do not appear in the provided data map

# Bug 2: data_path is a directory
ds2 = fo.Dataset()
fouc.import_annotations(ds2, task_ids=[TASK_ID],
                        data_path="/path/to/data/root/",
                        insert_new=True)
# Same warning, no annotations imported

# Bug 3: download_media=True with relative data_path
ds3 = fo.Dataset()
fouc.import_annotations(ds3, task_ids=[TASK_ID],
                        data_path="relative_dir",
                        download_media=True, insert_new=True)
# Media downloads, samples are created, but labels are empty

# Workaround for Bugs 1 & 2: build a dict manually with relative-path keys
data_dir = "/path/to/data/root"
data_map = {
    os.path.relpath(os.path.join(root, f), data_dir): os.path.join(root, f)
    for root, _, files in os.walk(data_dir)
    for f in files
}
fouc.import_annotations(ds, task_ids=[TASK_ID], data_path=data_map)
# Works correctly

# Workaround for Bug 3: use an absolute data_path
fouc.import_annotations(ds3, task_ids=[TASK_ID],
                        data_path=os.path.abspath("relative_dir"),
                        download_media=True, insert_new=True)
```

System information

  • OS Platform and Distribution: Linux Ubuntu (kernel 6.8.0-94-generic)
  • Python version: 3.10.12
  • FiftyOne version: 1.14.0
  • FiftyOne installed from: source (develop branch at f488054)

Other info/logs

Suggested fixes:

  • Bugs 1 & 2: Use os.path.relpath(f, data_path) instead of os.path.basename(f) when building data_map from a directory. For the data_path=None case, where there is no root directory to compute relative paths from, a fallback chain could try the basename first, then progressively longer path suffixes (e.g. train/000000000009.jpg, then images/train/000000000009.jpg), though the right approach may warrant discussion.
  • Bug 3: Convert data_dir to an absolute path before constructing filepaths (e.g. data_dir = os.path.abspath(data_path)), so that the paths stored in cvat_id_map and in the database are consistent.
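A rough sketch of the fallback chain for the data_path=None case. match_cvat_filename and path_suffixes are hypothetical helpers, not part of fiftyone.utils.cvat; this version leaves ambiguous names unmatched rather than guessing:

```python
def path_suffixes(path):
    """Yield "/"-joined suffixes of path, shortest (the basename) first."""
    parts = path.replace("\\", "/").strip("/").split("/")
    for i in range(len(parts) - 1, -1, -1):
        yield "/".join(parts[i:])


def match_cvat_filename(cvat_name, local_filepaths):
    """Return the single local filepath matching cvat_name, or None.

    Tries the basename first and only looks at longer suffixes when
    several local files share the shorter one.
    """
    # Compare on "/"-separated copies but return the original filepath
    candidates = [(f.replace("\\", "/"), f) for f in local_filepaths]
    for suffix in path_suffixes(cvat_name):
        candidates = [
            (norm, orig)
            for norm, orig in candidates
            if norm == suffix or norm.endswith("/" + suffix)
        ]
        # Stop once the match is unique (or nothing matches at all)
        if len(candidates) <= 1:
            break
    return candidates[0][1] if len(candidates) == 1 else None


# Example: two local files share a basename; CVAT names disambiguate them
existing_filepaths = ["/data/a/train/img.jpg", "/data/a/val/img.jpg"]
cvat_names = ["a/train/img.jpg", "a/val/img.jpg", "a/test/other.jpg"]

data_map = {}
for name in cvat_names:
    local_path = match_cvat_filename(name, existing_filepaths)
    if local_path is not None:  # skip names with no unambiguous local file
        data_map[name] = local_path

print(data_map)
```

The resulting dict has the same shape as the manual workaround in the reproduction code, so it could be passed as data_path. Trying the basename first keeps the current behavior for flat uploads; longer suffixes only matter when basenames collide.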

Willingness to contribute

  • Yes. I can contribute a fix for this bug independently
  • Yes. I would be willing to contribute a fix for this bug with guidance
    from the FiftyOne community
  • No. I cannot contribute a bug fix at this time

Metadata


Assignees: No one assigned
Labels: bug (Bug fixes)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
