Filling Torch Tensor With Zeros After Certain Index

Given a 3D tensor, say batch x sentence length x embedding dim, a = torch.rand((10, 1000, 96)), and an array (or tensor) of actual lengths for each sentence, lengths = torch.randint(1000, (10,)): how can a be filled with zeros after each sentence's actual length (along the sentence-length dimension)?

Solution 1:

You can do it using a binary mask. Using lengths as column indices into mask, we mark the first position past the end of each sequence (note that mask is made one column longer than a.size(1) to allow for sequences of full length). cumsum() then turns every entry of mask at and after that position into 1.

mask = torch.zeros(a.shape[0], a.shape[1] + 1, dtype=a.dtype, device=a.device)
mask[(torch.arange(a.shape[0]), lengths)] = 1  # mark the first position past each sequence
mask = mask.cumsum(dim=1)[:, :-1]              # 1 at/after each length; drop the superfluous column
a = a * (1. - mask[..., None])                 # zero out everything past each sequence's length
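
As a quick sanity check, here is a minimal, self-contained version of the recipe run on the tensors from the question (the assertion loop at the end is added for illustration and is not part of the original answer):

import torch

a = torch.rand((10, 1000, 96))          # batch x sentence length x embedding dim
lengths = torch.randint(1000, (10,))    # actual length of each sentence

mask = torch.zeros(a.shape[0], a.shape[1] + 1, dtype=a.dtype, device=a.device)
mask[(torch.arange(a.shape[0]), lengths)] = 1
mask = mask.cumsum(dim=1)[:, :-1]
a = a * (1. - mask[..., None])

for i, length in enumerate(lengths.tolist()):
    assert a[i, length:].abs().sum() == 0   # everything at/after the length is zero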

For example, take a.shape = (10, 5, 96) and lengths = [1, 2, 1, 1, 3, 0, 4, 4, 1, 3]. After assigning 1 at each row's length index, mask looks like this (note the extra sixth column):

mask = 
tensor([[0., 1., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0.],
        [1., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1., 0.],
        [0., 1., 0., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0.]])

After cumsum() and dropping the superfluous last column, you get

mask = 
tensor([[0., 1., 1., 1., 1.],
        [0., 0., 1., 1., 1.],
        [0., 1., 1., 1., 1.],
        [0., 1., 1., 1., 1.],
        [0., 0., 0., 1., 1.],
        [1., 1., 1., 1., 1.],
        [0., 0., 0., 0., 1.],
        [0., 0., 0., 0., 1.],
        [0., 1., 1., 1., 1.],
        [0., 0., 0., 1., 1.]])

Note that mask now has zeros exactly where the valid sequence entries are and ones everywhere beyond each sequence's length. Multiplying a by 1 - mask (broadcast over the embedding dimension via mask[..., None]) therefore gives you exactly what you want.
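
As a side note (an equivalent alternative, not part of the original answer), the same keep/drop decision can be made in one step by comparing position indices against the lengths with broadcasting, assuming lengths is a tensor:

# Alternative sketch: True for valid positions, False at/after each length.
keep = torch.arange(a.shape[1], device=a.device)[None, :] < lengths[:, None]
a = a * keep[..., None].to(a.dtype)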

Enjoy ;)
