Class Weights for Balancing Data in the TensorFlow Object Detection API
Solution 1:
The API expects a weight for each object (bbox) directly in the annotation files. Given this requirement, the options for using class weights seem to be:
1) If you have a custom dataset you can modify the annotations of each object (bbox) to include the weight field as 'object/weight'.
2) If you don't want to modify the annotations you can recreate only the tf_records file in order to include the weights of the bboxes.
3) Modify the code of the API (seemed to me quite tricky)
I went with #2. The code below generates a weighted TFRecord file for a custom dataset with two classes ("top", "dress") and weights (1.0, 0.1), given a folder of XML annotations:
import os
import io
import glob
import hashlib
import pandas as pd
import xml.etree.ElementTree as ET
import tensorflow as tf
import random
from PIL import Image
from object_detection.utils import dataset_util
# Define the class names and their weight
class_names = ['top', 'dress', ...]
class_weights = [1.0, 0.1, ...]

def create_example(xml_file):
    # Parse one Pascal VOC style XML annotation file
    tree = ET.parse(xml_file)
    root = tree.getroot()
    image_name = root.find('filename').text
    image_path = root.find('path').text
    file_name = image_name.encode('utf8')
    size = root.find('size')
    width = int(size[0].text)
    height = int(size[1].text)
    xmin = []
    ymin = []
    xmax = []
    ymax = []
    classes = []
    classes_text = []
    truncated = []
    poses = []
    difficult_obj = []
    weights = []  # Important line

    for member in root.findall('object'):
        class_name = member[0].text
        if class_name not in class_names:
            print('E: class not recognized!')
            continue
        class_id = class_names.index(class_name)
        # Child order assumed here: name, pose, truncated, difficult, bndbox
        xmin.append(float(member[4][0].text) / width)
        ymin.append(float(member[4][1].text) / height)
        xmax.append(float(member[4][2].text) / width)
        ymax.append(float(member[4][3].text) / height)
        classes_text.append(class_name.encode('utf8'))
        # Label IDs start at 1 and must match your label map
        classes.append(class_id + 1)
        weights.append(class_weights[class_id])  # Important line
        # Placeholder defaults; read these from the XML if your annotations set them
        truncated.append(0)
        poses.append('Unspecified'.encode('utf8'))
        difficult_obj.append(0)

    full_path = image_path
    with tf.gfile.GFile(full_path, 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    if image.format != 'JPEG':
        raise ValueError('Image format not JPEG')
    key = hashlib.sha256(encoded_jpg).hexdigest()

    # Create the TFRecord Example
    example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(file_name),
        'image/source_id': dataset_util.bytes_feature(file_name),
        'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmin),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmax),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymin),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymax),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
        'image/object/difficult': dataset_util.int64_list_feature(difficult_obj),
        'image/object/truncated': dataset_util.int64_list_feature(truncated),
        'image/object/view': dataset_util.bytes_list_feature(poses),
        'image/object/weight': dataset_util.float_list_feature(weights)  # Important line
    }))
    return example


def main():
    weighted_tf_records_output = 'name_of_records_file.record'  # output file
    annotations_path = '/path/to/annotations/folder/*.xml'  # input annotations
    # tf.python_io and tf.gfile are TensorFlow 1.x names
    writer_train = tf.python_io.TFRecordWriter(weighted_tf_records_output)
    xml_files = glob.glob(annotations_path)
    random.shuffle(xml_files)  # avoid writing records in filename order
    for xml_file in xml_files:
        print('-> Processing {}'.format(xml_file))
        example = create_example(xml_file)
        writer_train.write(example.SerializeToString())
    writer_train.close()
    print('-> Successfully converted dataset to TFRecord.')


if __name__ == '__main__':
    main()

If your annotations are in a different format, the code will be very similar, but this exact script will not work as written.
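The weights (1.0, 0.1) in this answer are hand-picked. One common way to derive such values, sketched here as an assumption rather than anything the API prescribes, is inverse class frequency scaled so that the rarest class gets 1.0. The helper names `count_objects` and `inverse_frequency_weights` are hypothetical, and the sketch assumes Pascal VOC style XML with the class name in each object's `name` tag.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_objects(xml_paths):
    """Count annotated boxes per class across Pascal VOC style XML files."""
    counts = Counter()
    for path in xml_paths:
        root = ET.parse(path).getroot()
        for obj in root.findall('object'):
            counts[obj.find('name').text] += 1
    return counts


def inverse_frequency_weights(counts):
    """Weight each class by (rarest count / class count); the rarest class gets 1.0."""
    rarest = min(counts.values())
    return {name: rarest / n for name, n in counts.items()}


# Illustrative counts only: 10 'top' boxes and 100 'dress' boxes
example_weights = inverse_frequency_weights({'top': 10, 'dress': 100})
print(example_weights)
```

In practice you would pass the result of `count_objects(glob.glob(annotations_path))` to `inverse_frequency_weights` and copy the values into `class_weights` in the same order as `class_names`.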
Solution 2:
The Object Detection API losses are defined in:
In particular, the following loss classes have been implemented:
Classification losses:
- WeightedSigmoidClassificationLoss
- SigmoidFocalClassificationLoss
- WeightedSoftmaxClassificationLoss
- WeightedSoftmaxClassificationAgainstLogitsLoss
- BootstrappedSigmoidClassificationLoss
Localization losses:
- WeightedL2LocalizationLoss
- WeightedSmoothL1LocalizationLoss
- WeightedIOULocalizationLoss
The weight parameters are used to balance anchors (prior boxes) and have shape [batch_size, num_anchors]; they are applied in addition to hard negative mining. Alternatively, the focal loss down-weights well-classified examples and focuses on the hard ones.
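To make the down-weighting concrete, here is a minimal pure-Python sketch of the focal loss formula, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), for a single anchor and class. It is an illustration only, not the API's SigmoidFocalClassificationLoss implementation; the function name, the defaults `alpha=0.25` and `gamma=2.0`, and the scalar `weight` argument (standing in for one entry of the [batch_size, num_anchors] weight tensor) are assumptions.

```python
import math


def sigmoid_focal_loss(logit, target, alpha=0.25, gamma=2.0, weight=1.0):
    """Focal loss for one (anchor, class) pair.

    logit:  raw prediction score
    target: 1.0 for a positive, 0.0 for a negative
    weight: per-anchor weight (one entry of a [batch_size, num_anchors] tensor)
    """
    p = 1.0 / (1.0 + math.exp(-logit))       # predicted probability
    p_t = p if target == 1.0 else 1.0 - p     # probability assigned to the true label
    alpha_t = alpha if target == 1.0 else 1.0 - alpha
    cross_entropy = -math.log(p_t)
    # (1 - p_t) ** gamma shrinks the loss of confidently correct predictions
    return weight * alpha_t * (1.0 - p_t) ** gamma * cross_entropy


easy = sigmoid_focal_loss(logit=4.0, target=1.0)   # well-classified positive
hard = sigmoid_focal_loss(logit=-2.0, target=1.0)  # misclassified positive
print(easy < hard)
```

With `gamma=0.0` the modulating factor disappears and the function reduces to alpha-weighted cross entropy, which is a quick way to see what the focal term adds.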
The primary class imbalance is due to many more negative examples (bounding boxes without objects of interest) in comparison to very few positive examples (bounding boxes with object classes). That seems to be the reason why class imbalance within positive examples (i.e. unequal distribution of positive class labels) is not implemented as part of object detection losses.
