Support using partial pretrained weights for finetuning #124

@pete-machine

To report a bug, please provide the following:

  • the output of the darknet version command
Darknet V4 "Slate" v4.0-51-g53faaf9c-dirty []
CUDA runtime version 12080 (v12.8), driver version 12080 (v12.8)
cuDNN version 12080 (v9.8.0), use of half-size floats is ENABLED
  • the exact command you ran
# Training will not break - but weights will be silently ignored
./darknet detector train data/obj.data yolo-obj.cfg yolov4.conv.137  

# Loading the weights fails with an error
./darknet detector train data/obj.data yolo-obj.cfg yolov4.conv.137.weights
  • the operating system you are using
    OpenCV v4.6.0, Ubuntu 24.04

First, thank you for all your hard work maintaining Darknet. Incredible work!

My issue is that it is no longer possible to train/finetune a model using partial weights. Partial weights are important when users want to finetune a model from pre-trained weights after the number of classes has been changed.
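
To illustrate why a changed class count calls for partial weights: in the stock YOLOv3/YOLOv4 .cfg files, the convolutional layer feeding each [yolo] layer uses filters=(classes+5)*3, so the detection head stops matching the pretrained tensor shapes as soon as classes changes. The sketch below only restates that formula. The function name and the anchors_per_head parameter are hypothetical and not part of Darknet.

```cpp
#include <cassert>

// Hypothetical helper (not a Darknet function): number of filters expected in
// the convolutional layer directly before a [yolo] layer.
// Each anchor predicts 4 box coordinates + 1 objectness score + 1 score per class.
// anchors_per_head is assumed to be 3, as in the stock YOLOv4 .cfg files.
inline int yolo_head_filters(const int classes, const int anchors_per_head = 3)
{
	assert(classes > 0);
	assert(anchors_per_head > 0);
	return (classes + 5) * anchors_per_head;
}
```

For example, yolo_head_filters(80) gives 255 for the 80-class COCO head, while a 2-class model needs 21, which is why only the backbone layers can be reused.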

I previously reported a related issue, #69. Back then, I found that partial weights could be used if yolov4.conv.137 was simply renamed to have a .weights suffix (yolov4.conv.137.weights). However, after updating to a more recent version of Darknet, I discovered that partial weights can no longer be used for training/finetuning. The reason is that an internal check has been added that now reports the weights file as corrupted and raises an error.

Below I have added the section where the issue happens:

// src-lib/weights.cpp, lines 9-22
inline void xfread(void * dst, const size_t size, const size_t count, std::FILE * fp)
{
	const auto items_read = std::fread(dst, size, count, fp);
	if (items_read != count)
	{
		Darknet::display_warning_msg(
			"The .weights file does not match the .cfg file (not enough fields to read in the weights).\n"
			"Normally this means the .weights file was corrupted, or you've mixed up which .cfg file goes with which .weights file.\n");

		// darknet_fatal_error(DARKNET_LOC, "expected to read %lu fields, but only read %lu", count, items_read);
	}
	return;
}

Note: The simple fix for me, as demonstrated above, is to comment out darknet_fatal_error. I am not sure whether that would be a proper fix, though.
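
One possible alternative to commenting out the fatal error is to let the caller decide what a short read means. The sketch below is self-contained and runs under simplifying assumptions: both try_fread and load_leading_layers are hypothetical names, and the "one flat float buffer per layer" layout does not reflect Darknet's real .weights format (which also has a header and per-layer biases and scales). It only shows the idea of loading the leading layers and stopping cleanly where the file ends.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper (not part of Darknet): like xfread(), but reports a
// short read to the caller instead of aborting.
inline bool try_fread(void * dst, const std::size_t size, const std::size_t count, std::FILE * fp)
{
	assert(fp != nullptr);
	return std::fread(dst, size, count, fp) == count;
}

// Hypothetical loader: fills the layer buffers in order and stops at the
// first layer the file cannot fully supply. Returns the number of layers loaded.
inline std::size_t load_leading_layers(std::FILE * fp, std::vector<std::vector<float>> & layers)
{
	std::size_t loaded = 0;
	for (auto & weights : layers)
	{
		// Read into a scratch buffer so a short read leaves the layer untouched.
		std::vector<float> scratch(weights.size());
		if (!try_fread(scratch.data(), sizeof(float), scratch.size(), fp))
		{
			break; // file ended early: this and later layers keep their initial values
		}
		weights.swap(scratch);
		++loaded;
	}
	return loaded;
}
```

With something along these lines, a training command could warn that only the first N layers were initialized from the file, while commands that need complete weights could still treat a short read as fatal.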

To me, it is a critical issue that partial pre-trained weights cannot be used for finetuning, as this is a common approach.

Generally, it would be very helpful to improve and document the flow for training with partial weights. That would resolve four tickets in one go :) (this issue and the three issues below).

Labels: bug (Something isn't working)
