PyTorch Reshaping with None
I am currently learning the attention mechanism from the Dive into Deep Learning book. In the book I see the following implementation in masked softmax:
import torch

def sequence_mask(X, valid_len, value=-1e6):
    """X is 2D array (number_of_points, maxlen), valid_len is 1D array (number_of_points)"""
    max_len = X.size(1)
    # (1, max_len) position indices compared against (number_of_points, 1) lengths
    # broadcast to a (number_of_points, max_len) boolean mask.
    mask = torch.arange(max_len, dtype=torch.float32, device=X.device)[None, :] < valid_len[:, None]
    X[~mask] = value  # overwrite positions beyond each sequence's valid length
    return X
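Since the title asks about reshaping with None: indexing a tensor with None inserts a new axis of size 1 (equivalent to unsqueeze), which is what lets the row of positions broadcast against the column of lengths. A minimal sketch of that broadcasting (the tensor values here are just for illustration):

import torch

x = torch.arange(4, dtype=torch.float32)  # shape (4,)
print(x[None, :].shape)  # torch.Size([1, 4]): None adds a leading axis
print(x[:, None].shape)  # torch.Size([4, 1]): None adds a trailing axis

valid_len = torch.tensor([4, 2])
# (1, 4) < (2, 1) broadcasts to a (2, 4) boolean mask
mask = torch.arange(4)[None, :] < valid_len[:, None]
print(mask)
# tensor([[ True,  True,  True,  True],
#         [ True,  True, False, False]])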
In sequential data processing, such as natural language processing, the sequence length can vary from one data point to another. For example:
1 : "Welcome To My Blog"
2 : "Hello World"
To solve that problem, we pad the remaining positions with a special token:
1 : "Welcome To My Blog"
2 : "Hello World blnk blnk"
In attention, we do not want to attend to the blnk tokens, so we create a mask for them. In the code, max_len is the maximum sequence length in the batch and valid_len is the actual length of each sequence: for the 1st data point valid_len is 4, and for the 2nd it is 2.
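Putting it together, here is a small usage sketch with sequence_mask as defined above; the all-zero scores are placeholder values standing in for real attention scores:

import torch

scores = torch.zeros(2, 4)          # placeholder attention scores, batch of 2, max length 4
valid_len = torch.tensor([4, 2])

masked = sequence_mask(scores, valid_len)
print(masked)
# tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00],
#         [ 0.0000e+00,  0.0000e+00, -1.0000e+06, -1.0000e+06]])

# After softmax, the -1e6 entries get (essentially) zero attention weight.
print(torch.softmax(masked, dim=-1))
# tensor([[0.2500, 0.2500, 0.2500, 0.2500],
#         [0.5000, 0.5000, 0.0000, 0.0000]])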