tensormonk.layers

Attention, convolution, linear and vision layers.

CondConv2d

class CondConv2d(tensor_size: tuple, n_experts: int, filter_size: int, out_channels: int, strides: int = 1, pad: bool = True, groups: int = 1)[source]

Conditional Convolution (“CondConv: Conditionally Parameterized Convolutions for Efficient Inference”).

Parameters
  • tensor_size (tuple, required) – Input tensor shape in BCHW (None/any integer >0, channels, height, width).

  • n_experts (int, required) – number of expert kernels used for routing.

  • filter_size (tuple/int, required) – size of kernel, integer or tuple of length 2.

  • out_channels (int, required) – output tensor.size(1)

  • strides (int/tuple, optional) – convolution stride, integer or tuple of length 2 (default = 1).

  • pad (bool, optional) – When True, pads to preserve the input spatial size when strides = 1 (default = True).

  • groups (int, optional) – Enables grouped convolution (default = 1).

Return type

torch.Tensor

# TODO: Include normalization and activation similar to Convolution?
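The core of CondConv is that each sample gets its own convolution kernel, built as a routing-weighted sum of expert kernels. The sketch below shows only that kernel-combination step in NumPy; the tensor layouts and the `w_route` projection are hypothetical illustrations, not the layer's actual internals.

```python
import numpy as np

def cond_conv_weights(x, experts, w_route):
    """Combine expert kernels per sample, CondConv-style (sketch).

    x       : (B, C, H, W) input batch
    experts : (E, O, C, K, K) bank of expert kernels (hypothetical layout)
    w_route : (C, E) routing projection (hypothetical)
    """
    pooled = x.mean(axis=(2, 3))                  # (B, C) global average pool
    logits = pooled @ w_route                     # (B, E) routing logits
    r = 1.0 / (1.0 + np.exp(-logits))             # sigmoid routing weights
    # per-sample kernel: routing-weighted sum over the E experts
    return np.einsum("be,eocij->bocij", r, experts)  # (B, O, C, K, K)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 8, 8))
experts = rng.standard_normal((4, 16, 3, 3, 3))
w_route = rng.standard_normal((3, 4))
w = cond_conv_weights(x, experts, w_route)        # one kernel per sample
```

Each of the `B` combined kernels is then applied to its own sample (e.g. via a grouped convolution trick in the real layer).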

Attention Layers

All attention layers.

LocalAttention

class LocalAttention(tensor_size: tuple, filter_size: Union[int, tuple], out_channels: int, strides: int = 1, groups: int = 4, bias: bool = False, replicate_paper: bool = True, normalize_offset: bool = False, **kwargs)[source]

LocalAttention (“Stand-Alone Self-Attention in Vision Models”).

Parameters
  • tensor_size (tuple, required) – Input tensor shape in BCHW (None/any integer >0, channels, height, width).

  • filter_size (int/tuple, required) – size of kernel, integer or list/tuple of length 2.

  • out_channels (int, required) – output tensor.size(1)

  • strides (int/tuple, optional) – convolution stride (default = 1).

  • groups (int, optional) – enables grouped convolution (default = 4).

  • bias (bool, optional) – When True, key, query & value 1x1 convolutions have bias (default = False).

  • replicate_paper (bool, optional) – When False, relative attention logic is different from that of paper (default = True).

  • normalize_offset (bool, optional) – When True (and replicate_paper = False), normalizes the row and column offsets (default = False).

Return type

torch.Tensor
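In stand-alone local self-attention, each output pixel is a softmax-weighted sum of value vectors from its k x k neighborhood, with the query taken at the center pixel. A minimal NumPy sketch of that idea, assuming simple 1x1 projections and omitting the paper's relative positional embeddings, grouping, and strides:

```python
import numpy as np

def local_attention(x, wq, wk, wv, k=3):
    """Local self-attention over k x k neighborhoods (illustrative sketch).

    x          : (C, H, W) single feature map
    wq, wk, wv : (D, C) query/key/value 1x1 projections (hypothetical)
    """
    C, H, W = x.shape
    p = k // 2
    q = np.einsum("dc,chw->dhw", wq, x)            # per-pixel queries
    kx = np.einsum("dc,chw->dhw", wk, x)
    vx = np.einsum("dc,chw->dhw", wv, x)
    kx = np.pad(kx, ((0, 0), (p, p), (p, p)))      # zero-pad key map
    vx = np.pad(vx, ((0, 0), (p, p), (p, p)))      # zero-pad value map
    out = np.empty_like(q)
    for i in range(H):
        for j in range(W):
            keys = kx[:, i:i + k, j:j + k].reshape(len(q), -1)   # (D, k*k)
            vals = vx[:, i:i + k, j:j + k].reshape(len(q), -1)
            logits = keys.T @ q[:, i, j]           # neighborhood scores
            a = np.exp(logits - logits.max())
            a /= a.sum()                           # softmax over k*k window
            out[:, i, j] = vals @ a                # weighted sum of values
    return out
```

The real layer vectorizes this with unfold-style patch extraction instead of Python loops.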

SelfAttention

class SelfAttention(tensor_size: tuple, shrink: int = 8, scale_factor: float = 1.0, return_attention: bool = False, **kwargs)[source]

Self-Attention (“Self-Attention Generative Adversarial Networks”).

Parameters
  • tensor_size (tuple, required) – Input tensor shape in BCHW (None/any integer >0, channels, height, width).

  • shrink (int, optional) – Divisor used to compute the output channels of key and query, i.e., int(tensor_size[1] / shrink) (default = 8).

  • scale_factor (float, optional) – Scale at which attention is computed; use scale_factor < 1 for speed. When scale_factor != 1, the input is rescaled using nearest neighbor interpolation (default = 1).

  • return_attention (bool, optional) – When True, returns a tuple (output, attention) (default = False).

Return type

torch.Tensor
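SAGAN-style self-attention flattens the spatial grid, computes attention between all pixel pairs from shrunken key/query projections, and adds the attended values back through a learned residual scale gamma (zero at initialization in the paper). A NumPy sketch under those assumptions; the projection shapes are illustrative, not the layer's exact parameterization:

```python
import numpy as np

def self_attention(x, wf, wg, wh, gamma=0.0):
    """SAGAN-style self-attention on a flattened feature map (sketch).

    x      : (C, H, W) feature map
    wf, wg : (C // shrink, C) key and query projections (hypothetical)
    wh     : (C, C) value projection (hypothetical)
    gamma  : learned residual scale (0 at initialization in the paper)
    """
    C, H, W = x.shape
    flat = x.reshape(C, H * W)                    # (C, N) with N = H*W
    f = wf @ flat                                 # keys    (C', N)
    g = wg @ flat                                 # queries (C', N)
    h = wh @ flat                                 # values  (C, N)
    logits = f.T @ g                              # (N, N) pairwise scores
    logits -= logits.max(axis=0, keepdims=True)   # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=0, keepdims=True)       # softmax over key pixels
    o = h @ attn                                  # attended values (C, N)
    return (gamma * o + flat).reshape(C, H, W), attn
```

With gamma = 0 the layer is the identity, which is why training can ramp the attention contribution in gradually.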

LucasKanade

class LucasKanade(n_steps: int = 64, width: int = 15, sigma: Optional[int] = None)[source]

Lucas-Kanade tracking (based on “Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors”).

A cleaner implementation based on the original repository, with corrections (the yx point ordering is fixed to xy) and speed improvements.

Parameters
  • n_steps (int, optional) – Number of correction steps (default: 64).

  • width (int, optional) – Width of patches (default: 15).

  • sigma (float, optional) – Sigma for gaussian kernel (default: None).

Return type

torch.Tensor

forward(frame_t0: torch.Tensor, frame_t1: torch.Tensor, points_xy: torch.Tensor)[source]

Tracks points_xy from frame_t0 to frame_t1.

Parameters
  • frame_t0 (torch.Tensor) – 4D tensor of shape BCHW.

  • frame_t1 (torch.Tensor) – 4D tensor of shape BCHW.

  • points_xy (torch.Tensor) – 3D tensor of shape B x n_points x 2.

Return type

torch.Tensor
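Each Lucas-Kanade correction step solves a small least-squares system on a patch around the tracked point: the spatial gradients and the frame difference give 2x2 normal equations whose solution is the point's displacement. A single-step NumPy sketch on one grayscale patch (the layer batches this over points and iterates n_steps times, which this sketch does not do):

```python
import numpy as np

def lk_step(patch_t0, patch_t1):
    """One Lucas-Kanade correction step on a pair of aligned patches.

    patch_t0, patch_t1 : (h, w) grayscale patches around a tracked point.
    Returns the (dx, dy) displacement of the content from t0 to t1,
    in xy order (matching the layer's corrected convention).
    """
    iy, ix = np.gradient(patch_t0)      # spatial gradients (rows = y, cols = x)
    it = patch_t1 - patch_t0            # temporal gradient
    ix, iy, it = ix.ravel(), iy.ravel(), it.ravel()
    # normal equations: [[Σix*ix, Σix*iy], [Σix*iy, Σiy*iy]] d = -[Σix*it, Σiy*it]
    A = np.array([[ix @ ix, ix @ iy],
                  [ix @ iy, iy @ iy]])
    b = -np.array([ix @ it, iy @ it])
    dx, dy = np.linalg.solve(A, b)
    return dx, dy

# demo: a Gaussian bump shifted by +0.3 pixels along x
yy, xx = np.mgrid[-7:8, -7:8].astype(float)
t0 = np.exp(-(xx ** 2 + yy ** 2) / 8.0)
t1 = np.exp(-((xx - 0.3) ** 2 + yy ** 2) / 8.0)
dx, dy = lk_step(t0, t1)               # dx near 0.3, dy near 0
```

Iterating this step (warping the patch by the running estimate each time) is what n_steps controls in the layer.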