Filling Torch Tensor With Zeros After Certain Index

Given a 3D tensor, say batch x sentence length x embedding dim, a = torch.rand((10, 1000, 96)), and an array (or tensor) of actual lengths for each sentence, lengths = torch.randint(1000, (10,)): how can a be filled with zeros after each sentence's actual length (along the sentence-length dimension)?

Solution 1:

You can do it using a binary mask. Using lengths as column indices into mask, we mark the first position past the end of each sequence (note that mask is made one column longer than a.size(1) to allow for sequences of full length). cumsum() then turns every entry of mask at and after that position into 1.

mask = torch.zeros(a.shape[0], a.shape[1] + 1, dtype=a.dtype, device=a.device)
mask[(torch.arange(a.shape[0]), lengths)] = 1  # mark the first position past each sequence
mask = mask.cumsum(dim=1)[:, :-1]              # 1 at/after each length; drop the superfluous column
a = a * (1. - mask[..., None])                 # zero out everything past each sequence's length
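
As a quick sanity check, here is a minimal, self-contained version of the recipe run on the tensors from the question (the assertion loop at the end is added for illustration and is not part of the original answer):

import torch

a = torch.rand((10, 1000, 96))          # batch x sentence length x embedding dim
lengths = torch.randint(1000, (10,))    # actual length of each sentence

mask = torch.zeros(a.shape[0], a.shape[1] + 1, dtype=a.dtype, device=a.device)
mask[(torch.arange(a.shape[0]), lengths)] = 1
mask = mask.cumsum(dim=1)[:, :-1]
a = a * (1. - mask[..., None])

for i, length in enumerate(lengths.tolist()):
    assert a[i, length:].abs().sum() == 0   # everything at/after the length is zero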

For example, take a.shape = (10, 5, 96) and lengths = [1, 2, 1, 1, 3, 0, 4, 4, 1, 3]. After assigning 1 at each row's length index, mask looks like this (note the extra sixth column):

mask = 
tensor([[0., 1., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0.],
        [1., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1., 0.],
        [0., 1., 0., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0.]])

After cumsum() and dropping the superfluous last column, you get

mask = 
tensor([[0., 1., 1., 1., 1.],
        [0., 0., 1., 1., 1.],
        [0., 1., 1., 1., 1.],
        [0., 1., 1., 1., 1.],
        [0., 0., 0., 1., 1.],
        [1., 1., 1., 1., 1.],
        [0., 0., 0., 0., 1.],
        [0., 0., 0., 0., 1.],
        [0., 1., 1., 1., 1.],
        [0., 0., 0., 1., 1.]])

Note that mask now has zeros exactly where the valid sequence entries are and ones everywhere beyond each sequence's length. Multiplying a by 1 - mask (broadcast over the embedding dimension via mask[..., None]) therefore gives you exactly what you want.
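
As a side note (an equivalent alternative, not part of the original answer), the same keep/drop decision can be made in one step by comparing position indices against the lengths with broadcasting, assuming lengths is a tensor:

# Alternative sketch: True for valid positions, False at/after each length.
keep = torch.arange(a.shape[1], device=a.device)[None, :] < lengths[:, None]
a = a * keep[..., None].to(a.dtype)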

Enjoy ;)
