Using Dropout in PyTorch to Prevent Overfitting

2022-10-17 11:17:56

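Inference and Validation

Now that you have learned to train a network, you can use it to make predictions. This is typically called inference, a term borrowed from statistics. However, neural networks tend to perform too well on the training data and fail to generalize to data they have not seen before. This is called overfitting, and it impairs inference performance. To detect overfitting during training, we measure performance not on the training set but on a held-out validation set. While monitoring validation performance during training, we use regularization such as dropout to avoid overfitting. I'll show you how to do this in PyTorch.

As before, let's start by loading the dataset through torchvision. You'll learn more about torchvision and loading data in a later part. This time we'll use the test set, which you get by setting train=False here: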

testset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', download=True, train=False, transform=transform)

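The test set contains images just like the training set. Typically you'll see 10-20% of the original dataset held out for testing and validation, with the rest used for training.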

import torch
from torchvision import datasets, transforms

# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])

# Download and load the training data
trainset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

# Download and load the test data
testset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', download=True, train=False, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=True)
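Fashion-MNIST ships with its own test split, but when a dataset does not, the 10-20% hold-out mentioned above has to be carved out by hand. The following is a minimal sketch of one way to do that, assuming torch.utils.data.random_split and a synthetic TensorDataset as a stand-in for real data; the names full_dataset, train_subset and val_subset, the 20% ratio and the seed are illustrative choices, not part of the original tutorial.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Synthetic stand-in for a labelled dataset: 1000 fake 28x28 images, 10 classes.
features = torch.randn(1000, 1, 28, 28)
targets = torch.randint(0, 10, (1000,))
full_dataset = TensorDataset(features, targets)

# Hold out 20% for validation; the remaining 80% is used for training.
n_val = len(full_dataset) // 5
n_train = len(full_dataset) - n_val

# A seeded generator makes the split reproducible (the seed value is arbitrary).
generator = torch.Generator().manual_seed(0)
train_subset, val_subset = random_split(
    full_dataset, [n_train, n_val], generator=generator
)

print(len(train_subset), len(val_subset))
```

Each subset can then be wrapped in its own DataLoader in the same way as trainset and testset.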

Here I'll create a model for a plain feed-forward neural network.

from torch import nn, optim
import torch.nn.functional as F

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)

    def forward(self, x):
        # make sure input tensor is flattened
        x = x.view(x.shape[0], -1)

        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.log_softmax(self.fc4(x), dim=1)

        return x

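The goal of validation is to measure the model's performance on data that is not part of the training set. Typically this is just accuracy, the percentage of classes the network predicted correctly. Other options include precision, recall, and top-5 error rate. We'll focus on accuracy here. First, I'll do a forward pass with one batch from the test set.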

model = Classifier()

images, labels = next(iter(testloader))
# Get the class probabilities
ps = torch.exp(model(images))
# Make sure the shape is appropriate, we should get 10 class probabilities for 64 examples
print(ps.shape)

torch.Size([64, 10])

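With the probabilities, we can get the most likely class using the ps.topk method. It returns the k highest values. Since we only want the most likely class, we can use ps.topk(1). This returns a tuple of the top-k values and the top-k indices. If the highest value is the fifth element, we'll get back 4 as the index.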

top_p, top_class = ps.topk(1, dim=1)
# Look at the most likely classes for the first 5 examples
print(top_class[:5,:])

tensor([[0],
        [4],
        [4],
        [4],
        [6]])
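The indexing behaviour of topk is easier to see on a tensor small enough to read by eye. The sketch below uses hand-made probabilities (not model output) for two examples over five classes; in the first row the largest value is the fifth element, so the returned index is 4.

```python
import torch

# Hand-made probabilities for 2 examples over 5 classes (each row sums to 1).
ps = torch.tensor([[0.10, 0.05, 0.15, 0.10, 0.60],
                   [0.70, 0.10, 0.10, 0.05, 0.05]])

# k=1 keeps only the single largest entry of each row (dim=1).
top_p, top_class = ps.topk(1, dim=1)

print(top_p.shape, top_class.shape)  # both have shape (2, 1)
print(top_class)                     # indices 4 and 0
```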

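Now we can check whether the predicted classes match the labels. This is simple to do by comparing top_class and labels for equality, but we have to be careful with the shapes. Here top_class is a 2D tensor with shape (64, 1), while labels is 1D with shape (64). For the equality to work the way we want, top_class and labels must have the same shape.

If we do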

equals = top_class == labels

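equals will have shape (64, 64); try it yourself. What happens is that the single element in each row of top_class is compared with every element of labels (broadcasting), which returns 64 True/False values for each row. Reshaping labels to match top_class gives the element-wise comparison we actually want: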

equals = top_class == labels.view(*top_class.shape)
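The shape pitfall can be reproduced with three examples instead of 64. This is a small sketch with made-up predictions and labels; the variable names broadcast and elementwise are only for illustration.

```python
import torch

top_class = torch.tensor([[1], [0], [2]])  # predictions, shape (3, 1)
labels = torch.tensor([1, 2, 2])           # ground truth, shape (3,)

# Shapes (3, 1) and (3,) broadcast to (3, 3): every prediction vs every label.
broadcast = top_class == labels

# Reshaping labels to (3, 1) compares each prediction with its own label only.
elementwise = top_class == labels.view(*top_class.shape)

print(broadcast.shape, elementwise.shape)
print(elementwise.squeeze(1))
```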

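Now we need to calculate the percentage of correct predictions. equals holds binary values, either 0 or 1. So if we sum all the values and divide by the number of values, we get the fraction of correct predictions. This is the same operation as taking the mean, so we can get the accuracy with a call to torch.mean. If only it were that simple. If you try torch.mean(equals), you get an error: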

RuntimeError: mean is not implemented for type torch.ByteTensor

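This happens because equals has type torch.ByteTensor (a boolean tensor in recent PyTorch releases, where the error message is worded differently), and torch.mean is not implemented for tensors of that type. So we need to convert equals to a float tensor. Note that torch.mean returns a scalar tensor; to get the actual value as a Python float we need to call accuracy.item().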

accuracy = torch.mean(equals.type(torch.FloatTensor))
print(f'Accuracy: {accuracy.item()*100}%')

Accuracy: 10.9375%
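The cast-then-average step can be checked on a hand-written comparison result. This sketch assumes equals.float() as a shorthand for equals.type(torch.FloatTensor) on CPU tensors; the four values are made up, with three of them correct.

```python
import torch

# Made-up comparison result: 3 of 4 predictions are correct.
equals = torch.tensor([[True], [False], [True], [True]])

# Cast to float first, then average; .item() extracts a plain Python float.
accuracy = torch.mean(equals.float())

print(f"Accuracy: {accuracy.item() * 100}%")
```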

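The network is untrained, so it is making random guesses and we should see an accuracy of around 10%. Now let's train the network and include a validation pass so we can measure how well the network performs on the test set. Since we are not updating parameters in the validation pass, we can speed up the code by turning off gradients with torch.no_grad():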

# turn off gradients
with torch.no_grad():
    # validation pass here
    for images, labels in testloader:
        ...
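What torch.no_grad() changes can be observed directly on the output tensor. The following is a minimal sketch with a single hypothetical nn.Linear layer standing in for the full model: the same forward pass produces identical values, but inside the context manager no computation graph is recorded.

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)   # hypothetical stand-in for a model
x = torch.randn(3, 4)

out_tracked = layer(x)            # autograd records this operation
with torch.no_grad():
    out_untracked = layer(x)      # nothing is recorded here

print(out_tracked.requires_grad)    # True
print(out_untracked.requires_grad)  # False
```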

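Exercise: Implement the validation loop below and print out the total accuracy after the loop. You should be able to reach an accuracy above 80%.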

model = Classifier()
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.003)

epochs = 30
steps = 0

train_losses, test_losses = [], []
for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:

        optimizer.zero_grad()

        log_ps = model(images)
        loss = criterion(log_ps, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    # for/else: this branch runs once per epoch, after the training loop finishes
    else:
        test_loss = 0
        accuracy = 0

        # Turn off gradients for validation, saves memory and computations
        with torch.no_grad():
            for images, labels in testloader:
                log_ps = model(images)
                # .item() stores plain floats instead of tensors
                test_loss += criterion(log_ps, labels).item()

                ps = torch.exp(log_ps)
                top_p, top_class = ps.topk(1, dim=1)
                equals = top_class == labels.view(*top_class.shape)
                accuracy += torch.mean(equals.type(torch.FloatTensor)).item()

        train_losses.append(running_loss/len(trainloader))
        test_losses.append(test_loss/len(testloader))

        print("Epoch: {}/{}.. ".format(e+1, epochs),
              "Training Loss: {:.3f}.. ".format(running_loss/len(trainloader)),
              "Test Loss: {:.3f}.. ".format(test_loss/len(testloader)),
              "Test Accuracy: {:.3f}".format(accuracy/len(testloader)))

Epoch: 1/30..  Training Loss: 0.514..  Test Loss: 0.420..  Test Accuracy: 0.845
Epoch: 2/30..  Training Loss: 0.392..  Test Loss: 0.422..  Test Accuracy: 0.844
Epoch: 3/30..  Training Loss: 0.354..  Test Loss: 0.387..  Test Accuracy: 0.862
...
Epoch: 28/30..  Training Loss: 0.190..  Test Loss: 0.467..  Test Accuracy: 0.876
Epoch: 29/30..  Training Loss: 0.186..  Test Loss: 0.442..  Test Accuracy: 0.875
Epoch: 30/30..  Training Loss: 0.178..  Test Loss: 0.455..  Test Accuracy: 0.883
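The validation pass in the loop above can also be factored into a helper. The following is a sketch rather than the notebook's own code: the function name evaluate, the synthetic dataset and the one-layer model are all hypothetical. It counts correct predictions over all examples instead of averaging per-batch accuracies, so a smaller final batch is not given the same weight as a full one.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, criterion):
    """Return (mean loss per example, accuracy) over every example in loader."""
    was_training = model.training
    model.eval()  # no effect on plain linear layers, harmless to set
    total_loss, correct, seen = 0.0, 0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            log_ps = model(inputs)
            # criterion returns the batch mean, so re-weight by batch size.
            total_loss += criterion(log_ps, labels).item() * labels.size(0)
            correct += (log_ps.argmax(dim=1) == labels).sum().item()
            seen += labels.size(0)
    if was_training:
        model.train()  # restore the mode the caller had
    return total_loss / seen, correct / seen

# Synthetic stand-in data: 100 examples, so the second batch of 64 is smaller.
data = TensorDataset(torch.randn(100, 784), torch.randint(0, 10, (100,)))
loader = DataLoader(data, batch_size=64)
toy_model = nn.Sequential(nn.Linear(784, 10), nn.LogSoftmax(dim=1))

loss, acc = evaluate(toy_model, loader, nn.NLLLoss())
print(f"loss={loss:.3f} accuracy={acc:.3f}")
```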

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

plt.plot(train_losses, label='Training loss')
plt.plot(test_losses, label='Validation loss')
plt.legend(frameon=False)

[Figure: training loss and validation loss per epoch (output_15_1.png)]

Overfitting

If we look at the training and validation losses as we train the network, we can see a phenomenon known as overfitting.

[Figure: training and validation loss illustrating overfitting (overfitting.png)]

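The network learns the training set better and better, resulting in lower training loss. However, it starts having trouble generalizing to data outside the training set, which causes the validation loss to increase. The ultimate goal of any deep learning model is to make predictions on new data, so we should strive for the lowest possible validation loss. One option is to use the version of the model with the lowest validation loss, here the one at around 8-10 training epochs. This strategy is called early stopping. In practice, you'd save the model frequently as you train, then later choose the checkpoint with the lowest validation loss.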
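The selection rule behind early stopping can be sketched in plain Python. The function best_checkpoint, the patience parameter and the loss values below are all hypothetical: the losses are made-up numbers, and stopping after a fixed number of epochs without improvement is one common variant, assumed here for illustration rather than taken from the tutorial.

```python
def best_checkpoint(val_losses, patience=3):
    """Return (index of best epoch, index of epoch where training would stop)."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New minimum: this is the checkpoint worth keeping.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

# Made-up validation losses: they fall, bottom out, then rise again.
val_losses = [0.42, 0.40, 0.39, 0.37, 0.38, 0.40, 0.43, 0.45]
best_epoch, stop_epoch = best_checkpoint(val_losses, patience=3)
print(best_epoch, stop_epoch)
```

In a real training loop, the point where a new minimum is found is where the model's weights would be saved.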

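The most common method for reducing overfitting is dropout, where we randomly drop input units. This forces the network to share information between weights, increasing its ability to generalize to new data. Adding dropout in PyTorch is straightforward with the nn.Dropout module.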

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)

        # Dropout module with 0.2 drop probability
        self.dropout = nn.Dropout(p=0.2)

    def forward(self, x):
        # make sure input tensor is flattened
        x = x.view(x.shape[0], -1)

        # Now with dropout
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        x = self.dropout(F.relu(self.fc3(x)))

        # output so no dropout here
        x = F.log_softmax(self.fc4(x), dim=1)

        return x
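To see what nn.Dropout does on its own, it helps to apply it to a tensor of ones. This is a minimal sketch with an arbitrary seed and tensor size: in training mode each element is either zeroed or scaled by 1 / (1 - p), so with p=0.2 the surviving values become 1.25 and roughly one in five elements is dropped.

```python
import torch
from torch import nn

torch.manual_seed(0)          # arbitrary seed, only for repeatability
dropout = nn.Dropout(p=0.2)   # a freshly built module starts in training mode
x = torch.ones(1000)

y = dropout(x)

# Each element is either dropped (0) or scaled by 1 / (1 - 0.2) = 1.25.
print(y.unique())
dropped_fraction = (y == 0).float().mean().item()
print(dropped_fraction)       # close to 0.2, varies with the random draw
```

The rescaling keeps the expected value of each activation unchanged, which is why no extra correction is needed when dropout is switched off.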

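During training we want to use dropout to prevent overfitting, but during inference we want to use the entire network. So we need to turn off dropout during validation, testing, and whenever we use the network to make predictions. To do this, use model.eval(). This sets the model to evaluation mode, in which dropout is disabled (effectively a drop probability of 0). You can turn dropout back on by setting the model to train mode with model.train(). In general, the pattern for the validation loop looks like this: turn off gradients, set the model to evaluation mode, calculate the validation loss and metric, then set the model back to train mode.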

# turn off gradients
with torch.no_grad():
    
    # set model to evaluation mode
    model.eval()
    
    # validation pass here
    for images, labels in testloader:
        ...

# set model back to train mode
model.train()
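The effect of switching modes can be checked on a toy network. This is a sketch with a made-up nn.Sequential model standing in for Classifier: in evaluation mode two forward passes over the same input agree exactly because dropout is inactive, and model.training reports which mode is set.

```python
import torch
from torch import nn

# Hypothetical toy model with a dropout layer at the end.
toy_model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Dropout(p=0.5))
x = torch.randn(4, 8)

toy_model.eval()
with torch.no_grad():
    first = toy_model(x)
    second = toy_model(x)

# Dropout is inactive in eval mode, so repeated passes give the same result.
print(toy_model.training, torch.equal(first, second))

toy_model.train()             # dropout is active again
print(toy_model.training)
```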

Exercise: Add dropout to your model and train it on Fashion-MNIST again. See if you can get a lower validation loss or higher accuracy.

## TODO: Define your model with dropout added
from torch import nn, optim
import torch.nn.functional as F

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)
        self.dropout = nn.Dropout(p=0.3)

    def forward(self, x):
        x = x.view(x.shape[0], -1)
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        x = self.dropout(F.relu(self.fc3(x)))
        x = F.log_softmax(self.fc4(x), dim=1)
        return x

## TODO: Train your model with dropout, and monitor the training progress with the validation loss and accuracy
model = Classifier()
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
epochs = 30
steps = 0

train_losses, test_losses = [], []
for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
        optimizer.zero_grad()

        log_ps = model(images)
        loss = criterion(log_ps, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    # for/else: this branch runs once per epoch, after the training loop finishes
    else:
        test_loss = 0
        accuracy = 0
        with torch.no_grad():
            # switch dropout off for the validation pass
            model.eval()
            for images, labels in testloader:
                log_ps = model(images)
                # .item() stores plain floats instead of tensors
                test_loss += criterion(log_ps, labels).item()

                ps = torch.exp(log_ps)
                top_p, top_class = ps.topk(1, dim=1)
                equals = top_class == labels.view(*top_class.shape)
                accuracy += torch.mean(equals.type(torch.FloatTensor)).item()
        # switch dropout back on before the next training epoch
        model.train()
        train_losses.append(running_loss/len(trainloader))
        test_losses.append(test_loss/len(testloader))

        print("Epoch: {}/{}.. ".format(e+1, epochs),
              "Training Loss: {:.3f}.. ".format(running_loss/len(trainloader)),
              "Test Loss: {:.3f}.. ".format(test_loss/len(testloader)),
              "Test Accuracy: {:.3f}".format(accuracy/len(testloader)))

Epoch: 1/30..  Training Loss: 0.653..  Test Loss: 0.469..  Test Accuracy: 0.827
Epoch: 2/30..  Training Loss: 0.477..  Test Loss: 0.434..  Test Accuracy: 0.845
Epoch: 3/30..  Training Loss: 0.436..  Test Loss: 0.397..  Test Accuracy: 0.850
Epoch: 4/30..  Training Loss: 0.408..  Test Loss: 0.396..  Test Accuracy: 0.857
...
Epoch: 28/30..  Training Loss: 0.276..  Test Loss: 0.324..  Test Accuracy: 0.886
Epoch: 29/30..  Training Loss: 0.275..  Test Loss: 0.352..  Test Accuracy: 0.882
Epoch: 30/30..  Training Loss: 0.271..  Test Loss: 0.341..  Test Accuracy: 0.884

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

plt.plot(train_losses, label='Training loss')
plt.plot(test_losses, label='Validation loss')
plt.legend(frameon=False)

[Figure: training loss and validation loss per epoch with dropout (output_21_1.png)]

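Inference

Now that the model is trained, we can use it for inference. We've done this before, but now we need to remember to set the model to inference mode with model.eval(). You'll also want to turn off autograd with torch.no_grad().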

# Import helper module (should be in the repo)
import helper

# Test out your network!
model.eval()

dataiter = iter(testloader)
# use the built-in next(); DataLoader iterators have no .next() method in recent PyTorch
images, labels = next(dataiter)
img = images[0]
# Convert 2D image to 1D vector
img = img.view(1, 784)

# Calculate the class probabilities (softmax) for img
with torch.no_grad():
    output = model.forward(img)

ps = torch.exp(output)

# Plot the image and probabilities
helper.view_classify(img.view(1, 28, 28), ps, version='Fashion')

[Figure: test image with predicted class probabilities (output_23_0.png)]
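When the helper module is not available, the same prediction can be read off numerically. The sketch below is an assumption-laden stand-in, not the tutorial's code: predict is a hypothetical function, and an untrained one-layer network with a random tensor replaces the trained Classifier and a real test image.

```python
import torch
from torch import nn

def predict(model, img):
    """Return (class index, probability) of the most likely class for one image."""
    was_training = model.training
    model.eval()                        # disable dropout for inference
    with torch.no_grad():               # no gradients needed for prediction
        log_ps = model(img.view(1, -1)) # flatten to a batch of one vector
    if was_training:
        model.train()                   # restore the caller's mode
    ps = torch.exp(log_ps)              # log-probabilities -> probabilities
    top_p, top_class = ps.topk(1, dim=1)
    return top_class.item(), top_p.item()

# Untrained stand-in network and a random 28x28 "image".
stand_in = nn.Sequential(nn.Linear(784, 10), nn.LogSoftmax(dim=1))
fake_img = torch.randn(1, 28, 28)

class_idx, prob = predict(stand_in, fake_img)
print(class_idx, round(prob, 3))
```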

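Coming Up Next

In the next part, I'll show you how to save your trained models. In general, you won't want to train a model every time you need it. Instead, you'll train once, save it, then load the model when you want to train more or use it for inference.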