tensormonk.detection¶
Implementations may differ from the referenced papers: the intention was not to replicate them exactly, but to provide the flexibility to combine concepts across several papers.
AnchorDetector¶
class AnchorDetector(config: tensormonk.detection.config.CONFIG)[source]
    A common detection module on top of a base network, with NoFPN, BiFPN, FPN, and PAFPN support.

    Base is the backbone network (a pretrained or a custom one). Ex: ResNet-18 with a 1x3x224x224 input:

        input -> o -> o -> o -> o
                 x1   x2   x3   x4
        x1: 1x64x56x56, x2: 1x128x28x28, x3: 1x256x14x14, x4: 1x512x7x7

    Let us call x1, x2, x3, x4 levels. Base2Body has one 1x1 convolutional layer per level to convert the depth of (x1, x2, x3, x4) to a constant depth (config.encoding_depth). Ex: with config.encoding_depth = 60:

        Base2Body((x1, x2, x3, x4))[0].shape == [1, 60, 56, 56]
        Base2Body((x1, x2, x3, x4))[1].shape == [1, 60, 28, 28]
        Base2Body((x1, x2, x3, x4))[2].shape == [1, 60, 14, 14]
        Base2Body((x1, x2, x3, x4))[3].shape == [1, 60, 7, 7]

    Body can have stacks of NoFPN/FPN/BiFPN/PAFPN layers. Essentially, these act as context layers that are interconnected across levels (the exception is the NoFPN layer).
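A minimal, self-contained sketch of the Base2Body idea (a hypothetical module, not tensormonk's actual implementation): one 1x1 convolution per level maps the varying backbone depths to a constant encoding depth.

import torch
import torch.nn as nn

class Base2BodySketch(nn.Module):
    # one 1x1 convolution per level: level depth -> encoding_depth
    def __init__(self, level_depths=(64, 128, 256, 512), encoding_depth=60):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(d, encoding_depth, kernel_size=1) for d in level_depths)

    def forward(self, levels):
        return tuple(conv(x) for conv, x in zip(self.convs, levels))

levels = (torch.randn(1, 64, 56, 56), torch.randn(1, 128, 28, 28),
          torch.randn(1, 256, 14, 14), torch.randn(1, 512, 7, 7))
outs = Base2BodySketch()(levels)
print([tuple(o.shape) for o in outs])
# [(1, 60, 56, 56), (1, 60, 28, 28), (1, 60, 14, 14), (1, 60, 7, 7)]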
batch_detect(p_label: torch.Tensor, p_boxes: torch.Tensor, p_point: torch.Tensor)[source]
    Runs detect on a batch and returns a list of Responses (one per image).

    Parameters
        p_label (Tensor) – label predictions at each pixel for all levels
        p_boxes (Tensor) – boxes predictions at each pixel for all levels
        p_point (Tensor) – point predictions at each pixel for all levels
        p_label.size – must match self.centers.size(0) per image

    Return type
        [tensormonk.detection.Responses, tensormonk.detection.Responses, …]
batch_encode(r_label: Union[list, tuple], r_boxes: Union[list, tuple], r_point: Union[list, tuple])[source]
    Encodes raw labels, boxes, and points for a batch of images.

    Parameters
        r_label (list/tuple) – list/tuple of tensors to encode. See encode for more information.
        r_boxes (list/tuple) – list/tuple of tensors to encode. See encode for more information.
        r_point (list/tuple) – list/tuple of tensors to encode. See encode for more information.
detect(p_label: torch.Tensor, p_boxes: torch.Tensor, p_point: torch.Tensor)[source]
    Detects labels, boxes, and points for a single image.

    Parameters
        p_label (Tensor) – label predictions at each pixel for all levels
        p_boxes (Tensor) – boxes predictions at each pixel for all levels
        p_point (Tensor) – point predictions at each pixel for all levels
        p_label.size – must match self.centers.size(0)

    Return type
        tensormonk.detection.Responses
encode(r_label: torch.Tensor, r_boxes: torch.Tensor, r_point: torch.Tensor)[source]
    Encodes raw labels, boxes, and points of a single image.

    Parameters
        r_label (Tensor) – label for each object (0 is background)
        r_boxes (Tensor) – ltrb boxes of each object (pixel coordinates without any normalization)
        r_point (Tensor) – x, y, x, y, … for each object (pixel coordinates without any normalization); NaNs are skipped during loss computation.
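For illustration, the raw inputs to encode for a single image with two objects might look like this (hypothetical values; shapes follow the parameter descriptions above):

import torch

r_label = torch.tensor([1, 2])                     # 0 is background, so objects use 1+
r_boxes = torch.tensor([[30., 40., 120., 160.],    # ltrb in pixel coordinates
                        [200., 60., 300., 220.]])
# two (x, y) points per object; unknown points are NaN and skipped in the loss
r_point = torch.tensor([[50., 60., 90., 100.],
                        [240., 90., float("nan"), float("nan")]])
# t_label, t_boxes, t_point = detector.encode(r_label, r_boxes, r_point)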
Block¶
class Block(encoding_depth: int, n_features: int, fusion: str = 'softmax')[source]
    DepthWiseSeparable + FeatureFusion, or FeatureFusion + DepthWiseSeparable. (EfficientDet: Scalable and Efficient Object Detection)

    Parameters
        encoding_depth (int, required) – depth of all the input tensors
        n_features (int, required) – number of features to fuse. When n_features = 1, FeatureFusion is performed with the input and the output of the DepthWiseSeparable layer. Otherwise, FeatureFusion is performed on all the inputs, followed by DepthWiseSeparable layers.
        fusion (str, optional) – fusion logic after resizing all the tensors to match the first tensor in the list/tuple/args using bilinear interpolation. Options: "sum", "fast-normalize", "softmax". default = "softmax"
# TODO: More options for Block
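A hedged usage sketch of Block (the forward signature, taking a tuple of same-depth tensors and fusing them to the first tensor's resolution, is an assumption based on the description above):

import torch
from tensormonk.detection import Block

block = Block(encoding_depth=60, n_features=3, fusion="softmax")
# three levels with the same depth but different resolutions
a = torch.randn(1, 60, 28, 28)
b = torch.randn(1, 60, 14, 14)
c = torch.randn(1, 60, 7, 7)
# assumed call: all inputs are resized (bilinear) to a's resolution, fused,
# then passed through the DepthWiseSeparable layers
o = block((a, b, c))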
Classifier¶
class Classifier(config: tensormonk.detection.config.CONFIG)[source]
    Classifier layer to predict labels, boxes, points, objectness, and centerness.

    Parameters
        config (CONFIG) – See tensormonk.detection.CONFIG for more details.
CONFIG¶
class CONFIG(name: str)[source]
    CONFIG is used to configure all the options for object detection tasks.
Example: Assume an object detection model that is trained on 320x320 images to detect dogs and cats.
import tensormonk

config = tensormonk.detection.CONFIG("mnas_bifpn_dogs_cats")

# Define input size
config.t_size = (1, 3, 320, 320)

# Use pretrained MNAS model as base network
config.base_network = "mnas_100"
config.base_network_pretrained = True

# Given the above config and input size of (4, 3, 320, 320), base
# network will return a tuple of tensors of shape
# ((4, 24, 80, 80), (4, 40, 40, 40), (4, 96, 20, 20), (4, 320, 10, 10)).
# By using base_network_forced_stride, base network will return a tuple
# of tensors of shape
# ((4, 24, 40, 40), (4, 40, 20, 20), (4, 96, 10, 10), (4, 320, 5, 5)).
config.base_network_forced_stride = True

# All the outputs from base network are encoded to have constant depth
# (96) using a 1x1 convolution per level.
# Essentially, the base network output with tensor shapes
# ((4, 24, 40, 40), (4, 40, 20, 20), (4, 96, 10, 10), (4, 320, 5, 5))
# is converted to
# ((4, 96, 40, 40), (4, 96, 20, 20), (4, 96, 10, 10), (4, 96, 5, 5))
config.encoding_depth = 96

# Define a body network with 4 "bifpn" layers
config.body_network = "bifpn"
config.body_network_depth = 4

# Define number of labels (labels to detect + background)
config.n_label = 2 + 1
config.label_loss_fn = tensormonk.loss.LabelLoss
config.label_loss_kwargs = {
    "method": "ce_with_negative_mining",
    "pos_to_neg_ratio": 1 / 3.,
    "reduction": "mean"}

# Define loss function and encoding for bounding box
config.is_boxes = True
config.boxes_loss_fn = tensormonk.loss.BoxesLoss
config.boxes_loss_kwargs = {"method": "smooth_l1", "reduction": "mean"}
config.boxes_encode_format = "normalized_offset"

# Enable objectness; disable point and centerness
config.is_point = False
config.is_objectness = True
config.is_centerness = False

# Define encode_iou - minimum iou required for a prior to set a
# location as non background
config.encode_iou = 0.5

# Define detect_iou - iou_threshold for non-maximal suppression
config.detect_iou = 0.2

# Define score_threshold - minimum score required to label an anchor as
# non background during inference
config.score_threshold = 0.46

# Define ignore_base - as a pretrained base network is used in this
# example, disable the gradients to reach base_network for 5000
# iterations
config.ignore_base = 5000

# Define anchors
config.anchors_per_layer = (
    # anchors at 40x40
    (config.an_anchor(32, 32), config.an_anchor(46, 46)),
    # anchors at 20x20
    (config.an_anchor(64, 64), config.an_anchor(90, 90)),
    # anchors at 10x10
    (config.an_anchor(128, 128), config.an_anchor(180, 180)),
    # anchors at 5x5
    (config.an_anchor(256, 256), config.an_anchor(320, 320)))

print(config)
property anchors_per_layer
    All anchors per layer: a list/tuple of list/tuple of config.an_anchor's.

    Parameters
        value (list/tuple, required) – one inner list/tuple of config.an_anchor's per level (see the example in CONFIG).
property base_network
    Base network for anchor detector (str/nn.Module).

    Parameters
        value (str/nn.Module, optional) – Current options are "mnas_050", "mnas_100", and "mobilev2". See tensormonk.architectures.MNAS and tensormonk.architectures.MobileNetV2 for more information. Also accepts a custom network. default = "mnas_050"

    Example custom network:

import torch
import torch.nn as nn
import tensormonk

class Tiny(torch.nn.Module):
    def __init__(self, **kwargs):
        super(Tiny, self).__init__()
        self._layer_0 = torch.nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.PReLU(),
            nn.Conv2d(16, 16, 3, stride=2, padding=1), nn.PReLU())
        self._layer_1 = torch.nn.Sequential(
            nn.Conv2d(16, 24, 3, stride=2, padding=1), nn.PReLU(),
            nn.Conv2d(24, 24, 3, stride=1, padding=1), nn.PReLU())
        self._layer_2 = torch.nn.Sequential(
            nn.Conv2d(24, 32, 3, stride=2, padding=1), nn.PReLU(),
            nn.Conv2d(32, 32, 3, stride=1, padding=1), nn.PReLU())
        self._layer_3 = torch.nn.Sequential(
            nn.Conv2d(32, 48, 3, stride=2, padding=1), nn.PReLU(),
            nn.Conv2d(48, 48, 3, stride=1, padding=1), nn.PReLU())

    def forward(self, tensor: torch.Tensor):
        x0 = self._layer_0(tensor)
        x1 = self._layer_1(x0)
        x2 = self._layer_2(x1)
        x3 = self._layer_3(x2)
        return (x1, x2, x3)

config = tensormonk.detection.CONFIG("tiny")
config.base_network = Tiny
property base_network_forced_stride
    Used when base_network is "mnas_050", "mnas_100", or "mobilev2" to add an additional stride in the second or third convolution layer.
property base_network_pretrained
    Used when base_network is "mnas_050", "mnas_100", or "mobilev2" to load pretrained weights.
property body_fpn_fusion
    Fusion scheme used by FPN and NoFPN. See tensormonk.layers.FeatureFusion and tensormonk.detection.Block for more information.

    Parameters
        value (str, optional) – default = "softmax". See tensormonk.layers.FeatureFusion for all available options.
property body_network
    Body network options are:

        "bifpn" = tensormonk.detection.BiFPNLayer
        "fpn" = tensormonk.detection.FPNLayer
        "nofpn" = tensormonk.detection.NoFPNLayer
        "pafpn" = tensormonk.detection.PAFPNLayer

    Parameters
        value (str, optional) – "bifpn", "fpn", "nofpn", or "pafpn". default = "bifpn"
property body_network_depth
    Number of FPN or NoFPN layers to stack. Below is an example config of a body network with 6 "bifpn" layers:

import tensormonk

config = tensormonk.detection.CONFIG("mnas_bifpn")
config.base_network = "mnas_050"
config.encoding_depth = 96
config.body_network = "bifpn"
config.body_network_depth = 6

    Parameters
        value (int, optional) – default = 2
property body_network_return_responses
    When True, compute_loss in tensormonk.detection.AnchorDetector also returns the responses from the body network.
property boxes_encode_format
    Boxes encoding format. See tensormonk.detection.ObjectUtils for more options.
    Note: IOU-based loss functions require "normalized_offset".

    Parameters
        value (str, optional) – Options are "normalized_gcxcywh" or "normalized_offset". default = "normalized_gcxcywh"
property boxes_encode_var1
    Variance used when encoding boxes. See Single Shot MultiBox Detector (https://arxiv.org/pdf/1512.02325.pdf).

property boxes_encode_var2
    Variance used when encoding boxes. See Single Shot MultiBox Detector (https://arxiv.org/pdf/1512.02325.pdf).
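For context, here is a hedged sketch of SSD-style "gcxcywh" encoding with the two variances (this is the standard SSD formulation; that tensormonk uses exactly this form is an assumption):

import torch

def encode_gcxcywh(boxes_cxcywh, anchors_cxcywh, var1=0.1, var2=0.2):
    # SSD-style encoding: center offsets are scaled by anchor size and var1;
    # log width/height ratios are scaled by var2
    g_cxcy = (boxes_cxcywh[:, :2] - anchors_cxcywh[:, :2]) / \
        (anchors_cxcywh[:, 2:] * var1)
    g_wh = torch.log(boxes_cxcywh[:, 2:] / anchors_cxcywh[:, 2:]) / var2
    return torch.cat([g_cxcy, g_wh], dim=1)

boxes = torch.tensor([[0.50, 0.50, 0.20, 0.30]])    # normalized cx, cy, w, h
anchors = torch.tensor([[0.48, 0.52, 0.25, 0.25]])
print(encode_gcxcywh(boxes, anchors))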
property boxes_loss_fn
    Loss function to compute loss given p_boxes and t_boxes. This function is initialized in tensormonk.detection.AnchorDetector. A custom loss function can be used as long as it is an nn.Module and all the boxes_loss_kwargs are set.

    Parameters
        value (nn.Module, optional) – default = tensormonk.loss.BoxesLoss
property boxes_loss_kwargs
    Dictionary of parameters required to initialize the config.boxes_loss_fn function.

    Parameters
        value (dict, required) – See tensormonk.loss.BoxesLoss for more information if config.boxes_loss_fn is tensormonk.loss.BoxesLoss.
property detect_iou
    IOU threshold used to filter boxes (non-maximal suppression) during detection.

    Parameters
        value (float, optional) – default = 0.5
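To illustrate what this threshold controls, a minimal sketch using torchvision's NMS (tensormonk's internal implementation may differ):

import torch
from torchvision.ops import nms

boxes = torch.tensor([[10., 10., 100., 100.],
                      [12., 12., 98., 102.],     # heavy overlap with the first
                      [200., 200., 300., 300.]])
scores = torch.tensor([0.9, 0.8, 0.7])
keep = nms(boxes, scores, iou_threshold=0.5)  # plays the role of detect_iou
print(keep)  # tensor([0, 2]); the overlapping lower-score box is suppressed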
property encode_iou
    Minimum IOU required for a box to be mapped to an anchor.

    Parameters
        value (float, optional) – default = 0.5

property encode_iou_max_background
    IOU below which an anchor is considered background.

    Parameters
        value (float, optional) – default = 0.5
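A hedged sketch of how encode_iou and encode_iou_max_background typically partition anchors during encoding (ignoring the in-between band in the loss is an assumption; with both defaults at 0.5 the band is empty):

import torch

def assign_anchors(iou, encode_iou=0.5, encode_iou_max_background=0.4):
    # iou: (n_anchors,) best IOU of each anchor against all ground-truth boxes
    positive = iou >= encode_iou                  # mapped to an object
    background = iou < encode_iou_max_background  # negatives
    ignored = ~(positive | background)            # in-between band
    return positive, background, ignored

positive, background, ignored = assign_anchors(torch.tensor([0.7, 0.45, 0.1]))
print(positive, background, ignored)
# tensor([ True, False, False]) tensor([False, False,  True]) tensor([False,  True, False])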
property encoding_depth
    Encoding depth used to convert all the base network outputs to a constant depth, which enables FPN and NoFPN layers.

    Parameters
        value (int, required) – See the example in tensormonk.detection.AnchorDetector for more information.
property hard_encode
    Eliminates boxes with centers that are not within pix2pix_delta.
property ignore_base
    Gradients are not propagated to the base network for the first ignore_base iterations. Useful when fine-tuning a pretrained base network.

    Parameters
        value (int, optional) – default = 0
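Conceptually, this behaves like freezing the backbone for the first few iterations. A minimal sketch of the idea (not tensormonk's actual mechanism, which is handled inside AnchorDetector):

import torch.nn as nn

def gate_base_gradients(base: nn.Module, iteration: int, ignore_base: int):
    # freeze the backbone until ignore_base iterations have elapsed
    requires_grad = iteration >= ignore_base
    for p in base.parameters():
        p.requires_grad_(requires_grad)

# inside a training loop:
# for iteration, batch in enumerate(loader):
#     gate_base_gradients(detector.base, iteration, config.ignore_base)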
property is_boxes
    Flag to enable bounding box detection. Not used in the current implementation; it will be updated when a segmentation task is added.

    Parameters
        value (bool, optional) – default = True
property is_centerness
    Flag to enable centerness prediction. See FCOS: Fully Convolutional One-Stage Object Detection (https://arxiv.org/pdf/1904.01355.pdf).

property is_objectness
    Flag to enable objectness prediction. See YOLOv3: An Incremental Improvement (https://pjreddie.com/media/files/papers/YOLOv3.pdf).
property is_point
    Flag to enable point localization within a bounding box.

    Parameters
        value (bool, optional) – default = False
property label_loss_fn
    Loss function to compute loss given p_label and t_label. This function is initialized in tensormonk.detection.AnchorDetector. A custom loss function can be used as long as it is an nn.Module and all the label_loss_kwargs are set.

    Parameters
        value (nn.Module, optional) – default = tensormonk.loss.LabelLoss
property label_loss_kwargs
    Dictionary of parameters required to initialize the config.label_loss_fn function.

    Parameters
        value (dict, required) – See tensormonk.loss.LabelLoss for more information if config.label_loss_fn is tensormonk.loss.LabelLoss.
property n_label
    Number of labels (including background) to predict.

    Parameters
        value (int, required) – Must be >= 2.
property n_point
    Number of points to detect per object. Relevant to tasks like identifying body parts/joints in person detection, facial landmarks in face detection, etc.

    Parameters
        value (int, optional) – Must be >= 1; set when config.is_point is True.
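For example, a face detection config with five facial landmarks per face might be set up as follows (the landmark count is illustrative):

import tensormonk

config = tensormonk.detection.CONFIG("face_with_landmarks")
config.is_point = True
config.n_point = 5  # e.g., two eyes, nose tip, and two mouth corners
config.point_loss_fn = tensormonk.loss.PointLoss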
property point_encode_format
    Point encoding format. See tensormonk.detection.ObjectUtils for more information.

    Parameters
        value (str, optional) – default = "normalized_xy_offsets"

property point_encode_var
    SSD normalization variance used for points.

    Parameters
        value (float, optional) – default = 0.5
property point_loss_fn
    Loss function to compute loss given p_point and t_point. This function is initialized in tensormonk.detection.AnchorDetector. A custom loss function can be used as long as it is an nn.Module and all the point_loss_kwargs are set.

    Parameters
        value (nn.Module, optional) – default = tensormonk.loss.PointLoss
property point_loss_kwargs
    Dictionary of parameters required to initialize the config.point_loss_fn function.

    Parameters
        value (dict, required) – See tensormonk.loss.PointLoss for more information if config.point_loss_fn is tensormonk.loss.PointLoss.
property score_threshold
    Minimum score required to retain a box during detection.

    Parameters
        value (float, optional) – default = 0.5
property single_classifier_head
    Flag to enable a single classifier head in tensormonk.detection.Classifier.

    Parameters
        value (bool, optional) – default = False. See tensormonk.detection.Classifier for more information.
FPN Layers¶
All FPN layers use DepthWiseSeparable convolutions (with BatchNorm2d and Swish) and a FeatureFusion layer.
BiFPNLayer¶
class BiFPNLayer(config: tensormonk.detection.config.CONFIG)[source]
    A modified version of BiFPNLayer compatible with CONFIG. Upscale/downscale is done with bilinear interpolation. (EfficientDet: Scalable and Efficient Object Detection)
    (The source docstring includes an ASCII schematic of the top-down and bottom-up connections for n_scales = 4.)
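A hedged usage sketch (that the layer consumes and returns a tuple of same-depth tensors, one per level, and that these CONFIG fields suffice to construct it, are assumptions based on the AnchorDetector description):

import torch
import tensormonk

config = tensormonk.detection.CONFIG("bifpn_example")
config.encoding_depth = 60
layer = tensormonk.detection.BiFPNLayer(config)

# four levels at the constant encoding depth
levels = (torch.randn(1, 60, 40, 40), torch.randn(1, 60, 20, 20),
          torch.randn(1, 60, 10, 10), torch.randn(1, 60, 5, 5))
outs = layer(levels)  # expected: a tuple with the same shapes as the input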
FPNLayer¶
class FPNLayer(config: tensormonk.detection.config.CONFIG)[source]
    A modified version of FPN compatible with CONFIG. Upscale/downscale is done with bilinear interpolation. (Feature Pyramid Networks for Object Detection)
    (The source docstring includes an ASCII schematic of a base network followed by a single FPN layer for n_scales = 3.)
NoFPNLayer¶
class NoFPNLayer(config: tensormonk.detection.config.CONFIG)[source]
    Residual DepthWiseSeparable is used as the base block; levels are processed independently, without cross-level connections.
    (The source docstring includes an ASCII schematic of a pretrained base network, e.g. ResNet, followed by per-level detection layers with anchors for n_scales = 3.)
PAFPNLayer¶
class PAFPNLayer(config: tensormonk.detection.config.CONFIG)[source]
    A modified version of PAFPN compatible with CONFIG. Upscale/downscale is done with bilinear interpolation. (Path Aggregation Network for Instance Segmentation)
    (The source docstring includes an ASCII schematic of the top-down and bottom-up paths for n_scales = 3.)
Responses¶
class Responses(label: torch.Tensor, score: torch.Tensor, boxes: torch.Tensor, point: torch.Tensor, objectness: torch.Tensor, centerness: torch.Tensor)[source]
    An object with all the predictions from the anchor detector. Its properties are label, score, boxes, point, objectness, and centerness.

    Parameters
        label (torch.Tensor/None) – Predicted labels.
        score (torch.Tensor/None) – Predicted scores.
        boxes (torch.Tensor/None) – Predicted boxes (encoded).
        point (torch.Tensor/None) – Predicted points (encoded).
        objectness (torch.Tensor/None) – Predicted objectness.
        centerness (torch.Tensor/None) – Predicted centerness.
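Fields that are disabled in CONFIG come back as None. A hedged sketch of consuming a Responses object (assuming it can be constructed directly with these keyword arguments, as the signature above suggests):

import torch
from tensormonk.detection import Responses

# a hand-built example; in practice these come from AnchorDetector.detect
r = Responses(label=torch.tensor([1, 2]),
              score=torch.tensor([0.9, 0.6]),
              boxes=torch.tensor([[10., 10., 50., 50.],
                                  [60., 20., 120., 90.]]),
              point=None, objectness=None, centerness=None)
for label, score, box in zip(r.label, r.score, r.boxes):
    print(int(label), float(score), box.tolist())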
Sample¶
class Sample(image: str, labels: numpy.ndarray, boxes: numpy.ndarray, points: Optional[numpy.ndarray] = None)[source]
    Sample is an object that contains the image path, labels, bounding boxes, and points for object detection tasks that also localize landmarks. It can augment data during training (random 90/180/270 rotations, random padding, and random cropping); boxes and points are adjusted accordingly. The image can be resized along with its boxes and points if Sample.OSIZE is initialized.
    Attributes (set once):
        INVALID (float): For points that are not available, set the value to float("nan"). This makes it possible to track those points after augmentation (they must be filtered during loss computation; tensormonk.loss.PointLoss handles this automatically). default = float("nan")
        OSIZE (tuple): (width, height) of the output image. When not set, the image is returned without resizing, along with its attributes (boxes and points), after augmentation. default = None
        RESIZE (bool): When True (and OSIZE is not None), resizes the image during augmentation and adjusts the boxes and points to the new image size.
        ROTATE_90 (bool): Enables random rotation (90/180/270). default = True
        ROTATE_90_PROBS (tuple): Cumulative probabilities for ROTATE_90. default = (0.4, 0.6, 0.8), i.e., 40%, 20%, 20%, and 20% probability of rotating 0, 90, 180, and 270 degrees, respectively.
        PAD (bool): Enables random padding. default = True
        PAD_PERCENTAGE (float): Maximum percentage of height and width that is padded. Must satisfy 0 < PAD_PERCENTAGE < 1. default = 0.1
        CROP (bool): Enables random cropping. default = True
        CROP_MIN_SIDE_PERCENTAGE (float): Minimum percentage of the size that must be retained. Must satisfy 0 < CROP_MIN_SIDE_PERCENTAGE < 1. default = 0.3
        CROP_MIN_OBJECT_SIDE (int): Minimum side of an object that must be maintained after crop and resize. In case of multiple objects, at least one object will have min(w, h) >= CROP_MIN_OBJECT_SIDE. Must satisfy 0 < CROP_MIN_OBJECT_SIDE < min(Sample.OSIZE). default = 16
        CROP_N_ATTEMPTS (int): Number of attempts to find a random crop; when all attempts fail, one object is randomly selected and a crop is extracted around it. Depends on the CPU (a larger number can slow down the dataloader). default = 16
        RETAIN_AREA (float): An object is retained after a crop only if its visible area >= original area * RETAIN_AREA. default = 0.5

    Parameters
        image (str, required) – Full path to the image (does not accept an ndarray or a PIL image, since a large dataset cannot fit in memory)
        labels (list/tuple/np.ndarray, required) – labels of all the objects in the image. In order to use LabelLoss, use 0 for background.
        boxes (list/tuple/np.ndarray, required) – bounding boxes of all the labels. Must be in pixel coordinates and ltrb form (left, top, right, bottom)
        points (list/tuple/np.ndarray, optional) – [x, y, x, y, …] points of all the bounding boxes. If points for some objects are missing, use float("nan") and keep the same number of points for all labels. When not required, use None.
import torch
from tensormonk.detection import Sample
from torchvision import transforms

Sample.OSIZE = 320, 320
Sample.RESIZE = True
Sample.ROTATE_90 = False
Sample.PAD = False
Sample.CROP = True
Sample.CROP_MIN_SIDE_PERCENTAGE = 0.3
Sample.CROP_MIN_OBJECT_SIDE = 16
Sample.CROP_N_ATTEMPTS = 8

data = [["./image1.jpg", [1], [[4, 6, 4, 6]]],
        ["./image2.jpg", [4, 6], [[4, 6, 4, 6], [2, 6, 3, 6]]]]

class SomeDB(object):
    def __init__(self, data, osize: tuple):
        self.samples = []
        for x in data:
            self.samples.append(
                Sample(image=x[0], labels=x[1], boxes=x[2], points=None))
        # ToTensor must always run; only the color augmentations are random
        self.transforms = transforms.Compose(
            [transforms.RandomApply(
                [transforms.ColorJitter(0.1, 0.1, 0.1, 0.1),
                 transforms.RandomGrayscale(p=0.25)]),
             transforms.ToTensor()])

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        image, labels, boxes, points = self.samples[idx].augmented()
        tensor = self.transforms(image)
        labels = torch.from_numpy(labels).long()
        boxes = torch.from_numpy(boxes).float()
        if points is None:
            return tensor, labels, boxes
        points = torch.from_numpy(points).float()
        return tensor, labels, boxes, points

dataset = SomeDB(data, (320, 320))

# To check how augmentation is working, use the following to visualize
dataset.samples[0].annotate_augmented()
# To visualize original data
dataset.samples[0].annotate()
annotate(ids: list = [], image: Optional[PIL.Image.Image] = None, boxes: Optional[numpy.ndarray] = None, points: Optional[numpy.ndarray] = None)[source]
    Annotates boxes and points on the image.
property boxes
    Returns a copy of all the boxes (np.ndarray) on the image in ltrb format.

property boxes_cxcywh
    Returns a copy of all the boxes (np.ndarray) on the image in cxcywh format.

property image
    Returns a PIL image (read from disk on every access).

property image_name
    Returns the full path of the image.

property labels
    Returns a copy of all the labels (np.ndarray) on the image.

property points
    Returns a copy of all the points (np.ndarray) on the image in pixel coordinates.