init parameters
PyTorch initializes parameters automatically with sensible defaults, but there are definitely cases where you need to initialize them manually. The basic approach looks like this:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLayerPerceptronClass(nn.Module):
    """ Multilayer Perceptron (MLP) Class """
    def __init__(self, name='mlp', xdim=784, hdim=256, ydim=10):
        super(MultiLayerPerceptronClass, self).__init__()
        self.name = name
        self.xdim = xdim
        self.hdim = hdim
        self.ydim = ydim
        self.lin_1 = nn.Linear(self.xdim, self.hdim)
        self.lin_2 = nn.Linear(self.hdim, self.ydim)
        self.init_param()  # initialize parameters

    def init_param(self):
        nn.init.kaiming_normal_(self.lin_1.weight)
        nn.init.zeros_(self.lin_1.bias)
        nn.init.kaiming_normal_(self.lin_2.weight)
        nn.init.zeros_(self.lin_2.bias)

    def forward(self, x):
        net = x
        net = self.lin_1(net)
        net = F.relu(net)
        net = self.lin_2(net)
        return net
```
```python
M = MultiLayerPerceptronClass(name='mlp', xdim=784, hdim=256, ydim=10).to(device)
loss = nn.CrossEntropyLoss()
optm = optim.Adam(M.parameters(), lr=1e-3)
print("Done.")
```

session
A big advantage of PyTorch is the absence of sessions. TensorFlow also dropped sessions from v2 onward. Without sessions, you can run a forward pass directly as shown below.
forward
You don't strictly need to call forward explicitly: calling the model object itself, as in M(x), runs forward through nn.Module's __call__. Writing M.forward(x) just makes the forward pass explicit, which can be easier to read.
```python
x_numpy = np.random.rand(2, 784)
x_torch = torch.from_numpy(x_numpy).float().to(device)
y_torch = M.forward(x_torch)  # forward path
# y_torch = M(x_torch)        # forward path
y_numpy = y_torch.detach().cpu().numpy()  # torch tensor to numpy array
print("x_numpy:\n", x_numpy)
print("x_torch:\n", x_torch)
print("y_torch:\n", y_torch)
print("y_numpy:\n", y_numpy)
```

model.eval()
I used to have a vague understanding of this, so here’s a proper summary.
Layers like BatchNorm and Dropout behave differently during training and prediction: Dropout should be disabled at prediction time, and BatchNorm should use its accumulated running statistics instead of per-batch statistics. Calling model.eval() before prediction switches these layers to inference mode, so treat it as standard practice.
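A quick way to see the effect, sketched with a small throwaway network (the module and shapes here are just for illustration):

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(1, 4)

net.train()   # dropout active: outputs vary between calls
net.eval()    # dropout disabled: output is deterministic
y1 = net(x)
y2 = net(x)
print(torch.equal(y1, y2))  # True
```

The same toggle works in the other direction with model.train() before resuming training.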
view
A function that reshapes a tensor while keeping the total number of elements the same, the equivalent of NumPy's reshape (note that view requires the tensor's memory to be contiguous; PyTorch also has a reshape that handles the non-contiguous case). Passing -1 for one dimension lets PyTorch infer that size automatically.
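A minimal standalone sketch of the -1 inference:

```python
import torch

t = torch.arange(6)       # shape (6,)
a = t.view(2, 3)          # explicit shape
b = t.view(-1, 2)         # first dim inferred as 3
print(a.shape)            # torch.Size([2, 3])
print(b.shape)            # torch.Size([3, 2])
```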
```python
batch_in.view(-1, 28*28)
```

item
All values are managed as tensor objects. To convert a single-element tensor to a plain Python scalar (e.g., a float or int), use item().
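For instance, summing a boolean mask gives a 0-dimensional tensor, and item() unwraps it:

```python
import torch

t = torch.tensor([1, 0, 1])
n = (t == 1).sum()   # 0-dim tensor: tensor(2)
print(n.item())      # 2, a plain Python int
```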
```python
n_correct += (y_pred == y_trgt).sum().item()
```

train
```python
print("Start training.")
M.init_param()  # initialize parameters
M.train()
EPOCHS, print_every = 10, 1
for epoch in range(EPOCHS):
    loss_val_sum = 0
    for batch_in, batch_out in train_iter:
        # Forward path
        y_pred = M.forward(batch_in.view(-1, 28*28).to(device))
        loss_out = loss(y_pred, batch_out.to(device))
        # Update
        optm.zero_grad()     # reset gradient
        loss_out.backward()  # backpropagate
        optm.step()          # optimizer update
        loss_val_sum += loss_out.item()  # .item() avoids keeping the autograd graph alive
    loss_val_avg = loss_val_sum / len(train_iter)
    # Print
    if ((epoch % print_every) == 0) or (epoch == (EPOCHS - 1)):
        train_accr = func_eval(M, train_iter, device)
        test_accr = func_eval(M, test_iter, device)
        print("epoch:[%d] loss:[%.3f] train_accr:[%.3f] test_accr:[%.3f]." %
              (epoch, loss_val_avg, train_accr, test_accr))
print("Done")
```

optm.zero_grad()
Earlier we defined the optimizer as follows, specifying which parameters to train:
```python
optm = optim.Adam(M.parameters(), lr=1e-3)
```

zero_grad() resets the gradients of those parameters (to zero, or to None in recent PyTorch versions) so gradients from the previous batch don't accumulate.
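The accumulation behavior is easy to see with a single scalar parameter (a toy example, not from the training code above):

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

(w * 3).backward()
print(w.grad)      # tensor(3.)
(w * 3).backward()
print(w.grad)      # tensor(6.): the second backward() added to the first

opt.zero_grad()    # cleared: None (recent PyTorch) or tensor(0.)
```

This is why zero_grad() is called once per batch in the training loop, before backward().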
loss()
This is the loss function defined earlier as cross entropy. Passing the model output y_pred (raw logits; the softmax is applied internally) and the training labels batch_out returns a 0-dimensional tensor holding the loss value.
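A small standalone sketch with made-up logits and labels:

```python
import torch
import torch.nn as nn

loss = nn.CrossEntropyLoss()
y_pred = torch.tensor([[2.0, 0.5, 0.1],
                       [0.3, 1.8, 0.2]])  # raw logits, shape (N, C)
batch_out = torch.tensor([0, 1])          # class indices, shape (N,)
out = loss(y_pred, batch_out)
print(out.dim())   # 0: a scalar tensor, ready for backward()
```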
backward()
Performs backpropagation: computes the gradient of the loss with respect to each parameter and accumulates it in that parameter's .grad field.
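In the simplest case, backward() on a scalar fills in the gradient of that scalar with respect to every tensor that requires grad:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x   # y = x^2 + 3x
y.backward()         # computes dy/dx = 2x + 3
print(x.grad)        # tensor(7.) at x = 2
```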
step()
Updates the parameters using the gradients stored by backward() together with the optimizer's learning rate and other hyperparameters.
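Putting the three calls together, one full update step on a toy scalar parameter (SGD here, just to keep the arithmetic transparent; Adam follows the same pattern):

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

loss_val = (w - 3.0) ** 2  # minimum at w = 3
opt.zero_grad()            # reset gradient
loss_val.backward()        # dL/dw = 2*(w-3) = -4
opt.step()                 # w <- w - lr * grad, approx. 1.4
```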