init parameters
PyTorch initializes parameters automatically with sensible defaults, but there are definitely cases where you need to initialize them manually. The basic approach looks like this:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLayerPerceptronClass(nn.Module):
    """ Multilayer Perceptron (MLP) Class """
    def __init__(self, name='mlp', xdim=784, hdim=256, ydim=10):
        super(MultiLayerPerceptronClass, self).__init__()
        self.name = name
        self.xdim = xdim
        self.hdim = hdim
        self.ydim = ydim
        self.lin_1 = nn.Linear(self.xdim, self.hdim)
        self.lin_2 = nn.Linear(self.hdim, self.ydim)
        self.init_param()  # initialize parameters

    def init_param(self):
        nn.init.kaiming_normal_(self.lin_1.weight)
        nn.init.zeros_(self.lin_1.bias)
        nn.init.kaiming_normal_(self.lin_2.weight)
        nn.init.zeros_(self.lin_2.bias)

    def forward(self, x):
        net = x
        net = self.lin_1(net)
        net = F.relu(net)
        net = self.lin_2(net)
        return net
```
```python
M = MultiLayerPerceptronClass(name='mlp', xdim=784, hdim=256, ydim=10).to(device)
loss = nn.CrossEntropyLoss()
optm = optim.Adam(M.parameters(), lr=1e-3)
print("Done.")
```

session
A big advantage of PyTorch is the absence of sessions. TensorFlow also dropped sessions from v2 onward. Without sessions, you can run a forward pass directly as shown below.
forward
You don't strictly need to call forward explicitly: calling the model object itself, as in M(x), runs forward through nn.Module's __call__. Writing M.forward(x) just makes the forward pass explicit, which can be easier to read.
```python
x_numpy = np.random.rand(2, 784)
x_torch = torch.from_numpy(x_numpy).float().to(device)
y_torch = M.forward(x_torch)  # forward path
# y_torch = M(x_torch)        # forward path
y_numpy = y_torch.detach().cpu().numpy()  # torch tensor to numpy array
print("x_numpy:\n", x_numpy)
print("x_torch:\n", x_torch)
print("y_torch:\n", y_torch)
print("y_numpy:\n", y_numpy)
```

model.eval()
I used to have a vague understanding of this, so here’s a proper summary.
Layers like BatchNorm and Dropout behave differently during training and prediction: Dropout should be disabled at prediction time, and BatchNorm should use its accumulated running statistics instead of per-batch statistics. Calling model.eval() before prediction switches these layers to inference mode, so treat it as standard practice.
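A quick way to see the effect, sketched with a small throwaway network (the module and shapes here are just for illustration):

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(1, 4)

net.train()   # dropout active: outputs vary between calls
net.eval()    # dropout disabled: output is deterministic
y1 = net(x)
y2 = net(x)
print(torch.equal(y1, y2))  # True
```

The same toggle works in the other direction with model.train() before resuming training.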
view
A function that reshapes a tensor while keeping the total number of elements the same, the equivalent of NumPy's reshape (note that view requires the tensor's memory to be contiguous; PyTorch also has a reshape that handles the non-contiguous case). Passing -1 for one dimension lets PyTorch infer that size automatically.
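A minimal standalone sketch of the -1 inference:

```python
import torch

t = torch.arange(6)       # shape (6,)
a = t.view(2, 3)          # explicit shape
b = t.view(-1, 2)         # first dim inferred as 3
print(a.shape)            # torch.Size([2, 3])
print(b.shape)            # torch.Size([3, 2])
```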
```python
batch_in.view(-1, 28*28)
```

item
All values are managed as tensor objects. To convert a single-element tensor to a plain Python scalar (e.g., a float or int), use item().
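For instance, summing a boolean mask gives a 0-dimensional tensor, and item() unwraps it:

```python
import torch

t = torch.tensor([1, 0, 1])
n = (t == 1).sum()   # 0-dim tensor: tensor(2)
print(n.item())      # 2, a plain Python int
```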
```python
n_correct += (y_pred == y_trgt).sum().item()
```

train
```python
print("Start training.")
M.init_param()  # initialize parameters
M.train()
EPOCHS, print_every = 10, 1
for epoch in range(EPOCHS):
    loss_val_sum = 0
    for batch_in, batch_out in train_iter:
        # Forward path
        y_pred = M.forward(batch_in.view(-1, 28*28).to(device))
        loss_out = loss(y_pred, batch_out.to(device))
        # Update
        optm.zero_grad()     # reset gradient
        loss_out.backward()  # backpropagate
        optm.step()          # optimizer update
        loss_val_sum += loss_out.item()  # .item() avoids keeping the autograd graph alive
    loss_val_avg = loss_val_sum / len(train_iter)
    # Print
    if ((epoch % print_every) == 0) or (epoch == (EPOCHS - 1)):
        train_accr = func_eval(M, train_iter, device)
        test_accr = func_eval(M, test_iter, device)
        print("epoch:[%d] loss:[%.3f] train_accr:[%.3f] test_accr:[%.3f]." %
              (epoch, loss_val_avg, train_accr, test_accr))
print("Done")
```

optm.zero_grad()
Earlier we defined the optimizer as follows, specifying which parameters to train:
```python
optm = optim.Adam(M.parameters(), lr=1e-3)
```

zero_grad() resets the gradients of those parameters (to zero, or to None in recent PyTorch versions) so gradients from the previous batch don't accumulate.
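The accumulation behavior is easy to see with a single scalar parameter (a toy example, not from the training code above):

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

(w * 3).backward()
print(w.grad)      # tensor(3.)
(w * 3).backward()
print(w.grad)      # tensor(6.): the second backward() added to the first

opt.zero_grad()    # cleared: None (recent PyTorch) or tensor(0.)
```

This is why zero_grad() is called once per batch in the training loop, before backward().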
loss()
This is the loss function defined earlier as cross entropy. Passing the model output y_pred (raw logits; the softmax is applied internally) and the training labels batch_out returns a 0-dimensional tensor holding the loss value.
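A small standalone sketch with made-up logits and labels:

```python
import torch
import torch.nn as nn

loss = nn.CrossEntropyLoss()
y_pred = torch.tensor([[2.0, 0.5, 0.1],
                       [0.3, 1.8, 0.2]])  # raw logits, shape (N, C)
batch_out = torch.tensor([0, 1])          # class indices, shape (N,)
out = loss(y_pred, batch_out)
print(out.dim())   # 0: a scalar tensor, ready for backward()
```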
backward()
Performs backpropagation: computes the gradient of the loss with respect to each parameter and accumulates it in that parameter's .grad field.
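In the simplest case, backward() on a scalar fills in the gradient of that scalar with respect to every tensor that requires grad:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x   # y = x^2 + 3x
y.backward()         # computes dy/dx = 2x + 3
print(x.grad)        # tensor(7.) at x = 2
```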
step()
Updates the parameters using the gradients stored by backward() together with the optimizer's learning rate and other hyperparameters.
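Putting the three calls together, one full update step on a toy scalar parameter (SGD here, just to keep the arithmetic transparent; Adam follows the same pattern):

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

loss_val = (w - 3.0) ** 2  # minimum at w = 3
opt.zero_grad()            # reset gradient
loss_val.backward()        # dL/dw = 2*(w-3) = -4
opt.step()                 # w <- w - lr * grad, approx. 1.4
```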