
PyTorch Reshaping with None



Currently I am learning the attention mechanism from the Dive into Deep Learning book. In the book I came across the following implementation of masked softmax:

import torch

def sequence_mask(X, valid_len, value=-1e6):
    """X is a 2D tensor (number_of_points, max_len); valid_len is a 1D tensor (number_of_points)."""
    max_len = X.size(1)
    # Compare a (1, max_len) row of positions against a (number_of_points, 1)
    # column of lengths; broadcasting yields a (number_of_points, max_len) boolean mask.
    mask = torch.arange(max_len, dtype=torch.float32, device=X.device)[None, :] < valid_len[:, None]
    # Overwrite every position beyond the sequence's valid length.
    X[~mask] = value
    return X
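The [None, :] and [:, None] indexing is what makes this work: indexing with None inserts a new axis of size 1 (the same effect as unsqueeze), so the two operands can broadcast against each other. A minimal sketch of that broadcast, using made-up values rather than anything from the book:

import torch

# None inserts a new leading axis, so shape (4,) becomes (1, 4).
positions = torch.arange(4, dtype=torch.float32)[None, :]  # shape (1, 4)
# [:, None] inserts a new trailing axis, so shape (2,) becomes (2, 1).
lengths = torch.tensor([4.0, 2.0])[:, None]                # shape (2, 1)
# Comparing (1, 4) with (2, 1) broadcasts to (2, 4): one row per sequence.
mask = positions < lengths
print(mask)
# tensor([[ True,  True,  True,  True],
#         [ True,  True, False, False]])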

In sequential data processing, such as natural language, the sequence length can vary from one data point to another. For example:

1: "Welcome To My Blog"

2: "Hello World"

To solve this, we pad the shorter sequences with a special token so that every sequence has the same length:

1: "Welcome To My Blog"

2: "Hello World blnk blnk"

In attention, we do not want to attend to the blnk tokens, so we create a mask for them. In the code above, max_len is the maximum sequence length and valid_len is the actual length of each sequence: for the 1st data point valid_len is 4 and for the 2nd it is 2.
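As a quick sanity check (my own example values, not from the book), here is sequence_mask applied to a dummy batch standing in for the two padded sentences above:

import torch

# A 2 x 4 batch of scores; real lengths are 4 and 2.
X = torch.zeros(2, 4)
valid_len = torch.tensor([4, 2])
print(sequence_mask(X, valid_len))
# tensor([[ 0., 0., 0., 0.],
#         [ 0., 0., -1000000., -1000000.]])

The padded positions now hold a large negative value, so a subsequent softmax assigns them near-zero attention weight.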
