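Summary: Problem description: How do I initialize the weights and biases of a network (for example, with He or Xavier initialization)? Solution 1: Single layer. To initialize the weights of a single layer, use a function from torch.nn.init. For example: conv1 = torch.nn.Conv2d(...) torch.nn.init.xavier_uniform(conv1.weigh...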

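Problem description:

How do I initialize the weights and biases of a network (for example, with He or Xavier initialization)?

Solution 1:

Single layer

To initialize the weights of a single layer, use a function from torch.nn.init. For example: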

conv1 = torch.nn.Conv2d(...)
torch.nn.init.xavier_uniform_(conv1.weight)  # trailing underscore: in-place, non-deprecated variant

Alternatively, you can modify the parameters by writing to conv1.weight.data (which is a torch.Tensor). For example:

conv1.weight.data.fill_(0.01)

The same applies to biases:

conv1.bias.data.fill_(0.01)
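
The question also asks about He initialization, which the snippets above do not show. The following is a minimal sketch for a single layer, assuming a hypothetical 3-to-16-channel convolution followed by a ReLU. The layer sizes, the seed, and the zero bias are illustrative choices, not part of the original answer.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)  # only to make the example reproducible

# hypothetical layer: 3 input channels, 16 output channels, 3x3 kernel
conv = nn.Conv2d(3, 16, kernel_size=3)

# He (Kaiming) normal init: std = sqrt(2 / fan_in) when the nonlinearity is ReLU
nn.init.kaiming_normal_(conv.weight, mode="fan_in", nonlinearity="relu")
nn.init.zeros_(conv.bias)

fan_in = 3 * 3 * 3  # in_channels * kernel_height * kernel_width
expected_std = math.sqrt(2.0 / fan_in)
actual_std = conv.weight.std().item()
print(f"expected std {expected_std:.3f}, sample std {actual_std:.3f}")
```

The functions in torch.nn.init modify the tensor in place without recording the operation in autograd, so no explicit torch.no_grad() block is needed here.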

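`nn.Sequential` or a custom `nn.Module`

Pass an initialization function to torch.nn.Module.apply. It will initialize the weights in the entire nn.Module recursively.

apply(fn): Applies fn recursively to every submodule (as returned by .children()) as well as self. Typical use includes initializing the parameters of a model (see also torch-nn-init).

Example: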

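import torch
import torch.nn as nn

def init_weights(m):
    # apply() calls this for every submodule; only Linear layers are touched
    if isinstance(m, nn.Linear):
        torch.nn.init.xavier_uniform_(m.weight)
        m.bias.data.fill_(0.01)

net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
net.apply(init_weights)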
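
To see what "recursively" means here, the sketch below records the class name of every module that apply visits. The nested container and the record function are hypothetical and exist only for illustration.

```python
import torch.nn as nn

visited = []

def record(m):
    # apply() calls this once per module, children before their parent
    visited.append(type(m).__name__)

net = nn.Sequential(
    nn.Linear(2, 2),
    nn.Sequential(nn.Linear(2, 2), nn.ReLU()),
)
net.apply(record)
print(visited)
```

The Linear layer inside the inner container is reached as well, and each container is visited after its children, which is why an isinstance check inside the function is enough to select the layers to initialize.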

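Solution 2:

We compare different weight-initialization schemes using the same neural network (NN) architecture.

All zeros or all ones

If you follow the principle of Occam's razor, you might think that setting all the weights to 0 or 1 would be the best solution. This is not the case.

With every weight the same, all the neurons in each layer produce the same output. This makes it hard to decide which weights to adjust.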

    # initialize two NN's with 0 and 1 constant weights
    model_0 = Net(constant_weight=0)
    model_1 = Net(constant_weight=1)

After 2 epochs:

[Figure: training loss with weights initialized to a constant]

Validation Accuracy
9.625% -- All Zeros
10.050% -- All Ones
Training Loss
2.304  -- All Zeros
1552.281  -- All Ones
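
The symmetry problem behind these numbers can be checked directly. The sketch below does not use the Net class from this answer (its definition is not shown). It assumes a hypothetical two-layer network with every parameter set to 1, and compares the gradients that the hidden neurons receive.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# hypothetical two-layer network; every weight and bias set to the same constant
net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))
for p in net.parameters():
    nn.init.constant_(p, 1.0)

x = torch.rand(8, 4)          # non-negative inputs keep every ReLU active
target = torch.zeros(8, 1)
loss = nn.functional.mse_loss(net(x), target)
loss.backward()

# each row is the gradient for one hidden neuron's incoming weights
grad = net[0].weight.grad
rows_identical = torch.allclose(grad[0], grad[1]) and torch.allclose(grad[1], grad[2])
print(grad)
print(rows_identical)
```

Because every hidden neuron receives the same gradient, a gradient step keeps them identical, so the layer behaves like a single neuron no matter how wide it is.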

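Uniform initialization

A uniform distribution has an equal probability of picking any number from a set of numbers.

Let's see how well the neural network trains using uniform weight initialization, where low=0.0 and high=1.0.

Below, we'll see another way (besides in the Net class code) to initialize the weights of a network. To define weights outside of the model definition, we can:

1. Define a function that assigns weights by the type of network layer, then
2. Apply those weights to an initialized model using model.apply(fn), which applies a function to each model layer.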

    # takes in a module and applies the specified weight initialization
    def weights_init_uniform(m):
        classname = m.__class__.__name__
        # for every Linear layer in a model..
        if classname.find('Linear') != -1:
            # apply a uniform distribution to the weights and a bias=0
            m.weight.data.uniform_(0.0, 1.0)
            m.bias.data.fill_(0)

    model_uniform = Net()
    model_uniform.apply(weights_init_uniform)

After 2 epochs:

Validation Accuracy
36.667% -- Uniform Weights
Training Loss
3.208  -- Uniform Weights

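General rule for setting weights

The general rule for setting the weights in a neural network is to set them close to zero without being too small.

Good practice is to start your weights in the range [-y, y], where y = 1/sqrt(n)
(n is the number of inputs to a given neuron).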

    import numpy as np

    # takes in a module and applies the specified weight initialization
    def weights_init_uniform_rule(m):
        classname = m.__class__.__name__
        # for every Linear layer in a model..
        if classname.find('Linear') != -1:
            # get the number of the inputs
            n = m.in_features
            y = 1.0/np.sqrt(n)
            m.weight.data.uniform_(-y, y)
            m.bias.data.fill_(0)

    # create a new model with these weights
    model_rule = Net()
    model_rule.apply(weights_init_uniform_rule)
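
One way to see why y = 1/sqrt(n) is a reasonable scale: U(-y, y) has variance y^2/3 = 1/(3n), so for inputs with unit variance each output of the layer has variance n * 1/(3n) = 1/3, whatever n is. The sketch below checks this numerically. The layer sizes, batch size, and seed are assumptions chosen for illustration.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

n = 256                       # number of inputs per neuron (hypothetical size)
layer = nn.Linear(n, 128)
y = 1.0 / math.sqrt(n)
nn.init.uniform_(layer.weight, -y, y)
nn.init.zeros_(layer.bias)

# unit-variance inputs; the predicted output std is sqrt(1/3), independent of n
x = torch.randn(1024, n)
with torch.no_grad():
    out = layer(x)
out_std = out.std().item()
print(f"output std {out_std:.3f}, predicted {math.sqrt(1 / 3):.3f}")
```

With a fixed range such as [0, 1) or [-0.5, 0.5), the output variance instead grows with n, which is consistent with the weaker results reported for those ranges.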

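Below we compare the performance of an NN whose weights are initialized from the uniform distribution [-0.5, 0.5) with one whose weights are initialized using the general rule.

After 2 epochs:

[Figure: performance of uniform weight initialization versus the general rule]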

Validation Accuracy
75.817% -- Centered Weights [-0.5, 0.5)
85.208% -- General Rule [-y, y)
Training Loss
0.705  -- Centered Weights [-0.5, 0.5)
0.469  -- General Rule [-y, y)

Using a normal distribution to initialize the weights

The normal distribution should have a mean of 0 and a standard deviation of y = 1/sqrt(n), where n is the number of inputs to the layer.

    # takes in a module and applies the specified weight initialization
    def weights_init_normal(m):
        '''Takes in a module and initializes all linear layers with weight
           values taken from a normal distribution.'''

        classname = m.__class__.__name__
        # for every Linear layer in a model
        if classname.find('Linear') != -1:
            # number of inputs to the layer
            n = m.in_features
            # m.weight.data should be taken from a normal distribution
            m.weight.data.normal_(0.0, 1 / np.sqrt(n))
            # m.bias.data should be 0
            m.bias.data.fill_(0)

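Below we show the performance of two NNs, one initialized from a uniform distribution and the other from a normal distribution.

After 2 epochs:

[Figure: performance of weight initialization using a uniform versus a normal distribution]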

Validation Accuracy
85.775% -- Uniform Rule [-y, y)
84.717% -- Normal Distribution
Training Loss
0.329  -- Uniform Rule [-y, y)
0.443  -- Normal Distribution

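Solution 3:

To initialize layers, you typically don't need to do anything.

PyTorch will do it for you. If you think about it, this makes a lot of sense. Why should we initialize layers ourselves when PyTorch can do it following the latest trends?

For example, the Linear layer's __init__ method (through reset_parameters) will do Kaiming He initialization: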

init.kaiming_uniform_(self.weight, a=math.sqrt(5))
if self.bias is not None:
    fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
    bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
    init.uniform_(self.bias, -bound, bound)

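The same holds for other layer types, for example Conv2d.

Note: The benefit of proper initialization is faster training. If your problem deserves special initialization, you can still do it afterwards.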
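
With a=sqrt(5), kaiming_uniform_ uses a gain of sqrt(2 / (1 + 5)) = sqrt(1/3) and a bound of gain * sqrt(3 / fan_in), which simplifies to 1/sqrt(fan_in). The sketch below checks that bound on a freshly built layer and then shows the "do it afterwards" route. It assumes a PyTorch version whose Linear source matches the snippet above. The layer sizes are hypothetical.

```python
import math
import torch
import torch.nn as nn

layer = nn.Linear(100, 50)                   # hypothetical sizes
bound = 1 / math.sqrt(layer.in_features)     # 0.1 for 100 inputs

# default weights and biases both fall inside [-bound, bound]
w_max = layer.weight.abs().max().item()
b_max = layer.bias.abs().max().item()
print(w_max <= bound, b_max <= bound)

# a custom scheme can still be applied after construction
nn.init.xavier_uniform_(layer.weight)

# reset_parameters() re-runs the built-in default initialization
layer.reset_parameters()
w_max_after_reset = layer.weight.abs().max().item()
```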

Solution 4:

import torch.nn as nn

# example sizes, assumed here only so that the snippet runs
in_features, h_size = 10, 32

# a simple network
rand_net = nn.Sequential(nn.Linear(in_features, h_size),
                         nn.BatchNorm1d(h_size),
                         nn.ReLU(),
                         nn.Linear(h_size, h_size),
                         nn.BatchNorm1d(h_size),
                         nn.ReLU(),
                         nn.Linear(h_size, 1),
                         nn.ReLU())

# initialization function, first checks the module type,
# then applies the desired changes to the weights
def init_uniform(m):
    if isinstance(m, nn.Linear):
        nn.init.uniform_(m.weight)  # default range is U(0, 1)

# use the module's apply function to recursively apply the initialization
rand_net.apply(init_uniform)

Solution 5:

If you want some extra flexibility, you can also set the weights manually.

Say you have an input of all ones:

import torch
import torch.nn as nn

input = torch.ones((8, 8))
print(input)

tensor([[1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1.]])

And you want to make a dense layer with no bias (so that the result is easy to visualize):

d = nn.Linear(8, 8, bias=False)

Set all the weights to 0.5 (or anything else):

d.weight.data = torch.full((8, 8), 0.5)
print(d.weight.data)

The weights:

Out[14]: 
tensor([[0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000]])

The weights are now all 0.5. Pass the data through:

d(input)

Out[13]: 
tensor([[4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.],
        [4., 4., 4., 4., 4., 4., 4., 4.]], grad_fn=<MmBackward>)

Remember that each neuron receives 8 inputs, all of which have a weight of 0.5 and a value of 1 (and no bias), so each output sums to 8 × 0.5 × 1 = 4.
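
Assigning through .data works, but it bypasses autograd's bookkeeping. A commonly used alternative is to modify the parameter in place inside torch.no_grad(). The sketch below repeats the same experiment that way and is offered as one option, not as the answer's own method.

```python
import torch
import torch.nn as nn

d = nn.Linear(8, 8, bias=False)

# change the values in place without recording the operation in autograd
with torch.no_grad():
    d.weight.fill_(0.5)

out = d(torch.ones(8, 8))
print(out[0])
```

The parameter keeps requires_grad=True, so it can still be trained afterwards.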

Solution 6:

Sorry for being so late, I hope my answer helps.

To initialize weights from a normal distribution, use:

torch.nn.init.normal_(tensor, mean=0, std=1)

Or to fill with a constant value, write:

torch.nn.init.constant_(tensor, value)

Or to use a uniform distribution:

torch.nn.init.uniform_(tensor, a=0, b=1) # a: lower_bound, b: upper_bound

Other methods for initializing tensors are listed in the torch.nn.init documentation.
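
The three calls above can be tried on a plain tensor. This is a small sketch with an arbitrary 3x5 tensor and illustrative argument values. Each function overwrites the tensor in place.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
w = torch.empty(3, 5)   # uninitialized storage, to be filled below

nn.init.normal_(w, mean=0.0, std=1.0)
normal_sample = w.clone()

nn.init.constant_(w, 0.3)
constant_sample = w.clone()

nn.init.uniform_(w, a=-0.1, b=0.1)
uniform_sample = w.clone()

print(normal_sample, constant_sample, uniform_sample, sep="\n")
```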

Solution 7:

Iterate over parameters

If you cannot use apply, for instance because the model does not implement Sequential directly:

Same for all

# see UNet at https://github.com/milesial/Pytorch-UNet/tree/master/unet


def init_all(model, init_func, *params, **kwargs):
    for p in model.parameters():
        init_func(p, *params, **kwargs)

model = UNet(3, 10)
init_all(model, torch.nn.init.normal_, mean=0., std=1) 
# or
init_all(model, torch.nn.init.constant_, 1.)

Depending on shape

def init_all(model, init_funcs):
    for p in model.parameters():
        init_func = init_funcs.get(len(p.shape), init_funcs["default"])
        init_func(p)

model = UNet(3, 10)
init_funcs = {
    1: lambda x: torch.nn.init.normal_(x, mean=0., std=1.), # can be bias
    2: lambda x: torch.nn.init.xavier_normal_(x, gain=1.), # can be weight
    3: lambda x: torch.nn.init.xavier_uniform_(x, gain=1.), # can be conv1D filter
    4: lambda x: torch.nn.init.xavier_uniform_(x, gain=1.), # can be conv2D filter
    "default": lambda x: torch.nn.init.constant_(x, 1.), # everything else
}

init_all(model, init_funcs)

You can try torch.nn.init.constant_(x, len(x.shape)) to check that they are appropriately initialized:

init_funcs = {
    "default": lambda x: torch.nn.init.constant_(x, len(x.shape))
}
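
Shape alone cannot tell a bias from a BatchNorm scale, since both are one-dimensional. A possible variation is to look at the parameter name as well, using named_parameters(). The sketch below assumes a small hypothetical model instead of the UNet referenced above, and its choice of initializers is illustrative.

```python
import torch
import torch.nn as nn

def init_by_name(model):
    for name, p in model.named_parameters():
        if name.endswith("bias"):
            nn.init.zeros_(p)            # all biases start at zero
        elif p.dim() >= 2:
            nn.init.xavier_uniform_(p)   # Linear and Conv weights
        else:
            nn.init.ones_(p)             # 1-D weights, e.g. BatchNorm scale

# hypothetical model, used only to exercise the function
model = nn.Sequential(
    nn.Conv2d(3, 8, 3),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 6 * 6, 10),
)
init_by_name(model)
print([name for name, _ in model.named_parameters()])
```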

Solution 8:

Here is a better way: just pass your whole model

import torch.nn as nn
def initialize_weights(model):
    # Initializes weights according to the DCGAN paper
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d, nn.BatchNorm2d)):
            nn.init.normal_(m.weight.data, 0.0, 0.02)
        # to cover linear layers as well, add one more elif branch
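
The extra branch mentioned in the comment might look like the sketch below. It is a variation, not the answer's code: linear layers are handled together with the convolutions, biases are zeroed, and the BatchNorm scale is drawn around 1.0 as in the PyTorch DCGAN tutorial (the snippet above uses a mean of 0.0 for it). The function name and the test model are hypothetical.

```python
import torch
import torch.nn as nn

def initialize_weights_dcgan(model):
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d, nn.Linear)):
            nn.init.normal_(m.weight, 0.0, 0.02)
            if m.bias is not None:
                nn.init.zeros_(m.bias)
        elif isinstance(m, nn.BatchNorm2d) and m.affine:
            # scale centred at 1 so the layer starts close to the identity
            nn.init.normal_(m.weight, 1.0, 0.02)
            nn.init.zeros_(m.bias)

torch.manual_seed(0)
# hypothetical discriminator-like model for 32x32 RGB inputs
model = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.LeakyReLU(0.2),
    nn.Flatten(),
    nn.Linear(64 * 16 * 16, 1),
)
initialize_weights_dcgan(model)
```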

Solution 9:

Since I don't have enough reputation yet, I can't add a comment under the answer posted by prosti on June 26, 2019 at 13:16.

    def reset_parameters(self):
        init.kaiming_uniform_(self.weight, a=math.sqrt(3))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)

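But I want to point out that some of the assumptions in Kaiming He's paper, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, are known not to hold, even though the deliberately designed initialization method appears to succeed in practice.

For example, in the subsection on the backward propagation case, they assume that $w_l$ and $\delta y_l$ are independent of each other. But as we all know, taking the score map $\delta y^L_i$ as an example, it is often $y_i - \mathrm{softmax}(y^L_i) = y_i - \mathrm{softmax}(w^L_i x^L_i)$ if we use a typical cross-entropy loss as the objective, so it depends on the weights.

So I think the true underlying reason why He initialization works well remains to be unraveled, even though everyone has witnessed its power in boosting deep learning training.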

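Solution 10:

If you see a deprecation warning from the older names without a trailing underscore, such as xavier_uniform (@Fábio Perez), use the in-place variants instead: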

def init_weights(m):
    if type(m) == nn.Linear:
        torch.nn.init.xavier_uniform_(m.weight)
        m.bias.data.fill_(0.01)

net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
net.apply(init_weights)

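Solution 11:

See: https://pytorch.org/tutorials/prototype/skip_param_init.html

It is now possible to skip parameter initialization during module construction, avoiding wasted computation. This is easily accomplished using the torch.nn.utils.skip_init() function.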
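
A minimal sketch of that workflow, assuming a PyTorch version that provides torch.nn.utils.skip_init. The layer sizes and the choice of Xavier initialization are illustrative.

```python
import torch
import torch.nn as nn

# construct the layer without running its default initialization
layer = torch.nn.utils.skip_init(nn.Linear, 5, 3)

# the parameters hold uninitialized memory at this point, so set them explicitly
nn.init.xavier_uniform_(layer.weight)
nn.init.zeros_(layer.bias)

print(layer.weight.shape, layer.bias)
```

This only pays off when a custom initialization follows. Using the layer without initializing it would mean training from arbitrary values.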
