VisualPytorch is published under one domain with two servers:
http://nag.visualpytorch.top/static/ (server 114.115.148.27)
http://visualpytorch.top/static/ (server 39.97.209.22)
Gradient descent: \(w_{i+1} = w_i - LR \cdot g(w_i)\), where the learning rate (LR) controls the step size of each update.
In PyTorch, all learning-rate schedulers inherit from class _LRScheduler.
Main attributes and methods:
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)  # set the learning-rate decay policy
for epoch in range(MAX_EPOCH):
    ...
    for i, data in enumerate(train_loader):
        ...
    scheduler.step()  # update the learning rate; note: call once per epoch, not once per iteration
StepLR
Function: decay the learning rate at equal intervals.
Main parameters:
• step_size: number of epochs between adjustments
• gamma: multiplicative decay factor
\(lr = lr_0 \cdot \gamma^{\lfloor epoch / step\_size \rfloor}\)
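The formula above can be checked with a minimal sketch; the single dummy parameter and the concrete lr/step_size/gamma values are hypothetical, chosen only to make the schedule visible:

```python
import torch
from torch import optim

# hypothetical one-parameter "model", only needed to construct an optimizer
param = torch.zeros(1, requires_grad=True)
optimizer = optim.SGD([param], lr=0.1)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

lrs = []
for epoch in range(30):
    lrs.append(optimizer.param_groups[0]['lr'])
    optimizer.step()       # normally: forward/backward over the training set
    scheduler.step()       # once per epoch

# lr = lr_0 * gamma ** (epoch // step_size)
print(lrs[0], lrs[10], lrs[20])  # 0.1  0.01  0.001 (up to float rounding)
```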
MultiStepLR
Function: decay the learning rate at user-specified epochs.
Main parameters:
• milestones: list of epoch indices at which to decay
• gamma: multiplicative decay factor
At each milestone: \(lr = lr \cdot \gamma\)
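A minimal sketch with hypothetical milestones at epochs 20 and 40 shows the same API, decaying only at the listed epochs:

```python
import torch
from torch import optim

# hypothetical schedule: multiply lr by gamma at epochs 20 and 40
param = torch.zeros(1, requires_grad=True)
optimizer = optim.SGD([param], lr=0.1)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20, 40], gamma=0.1)

lrs = []
for epoch in range(50):
    lrs.append(optimizer.param_groups[0]['lr'])
    optimizer.step()
    scheduler.step()

print(lrs[19], lrs[20], lrs[40])  # 0.1  0.01  0.001 (up to float rounding)
```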
ExponentialLR
Function: decay the learning rate exponentially every epoch.
Main parameters:
• gamma: base of the exponential
\(lr = lr_0 \cdot \gamma^{epoch}\)
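The same kind of sketch, with a hypothetical gamma of 0.9, confirms the exponential form:

```python
import torch
from torch import optim

param = torch.zeros(1, requires_grad=True)
optimizer = optim.SGD([param], lr=0.1)
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

lrs = []
for epoch in range(11):
    lrs.append(optimizer.param_groups[0]['lr'])
    optimizer.step()
    scheduler.step()

# lr = lr_0 * gamma ** epoch
print(abs(lrs[10] - 0.1 * 0.9 ** 10) < 1e-9)  # True
```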
CosineAnnealingLR
Function: adjust the learning rate along a cosine schedule.
Main parameters:
• T_max: the decay period; in the original figure the period is 50 epochs
• eta_min: lower bound of the learning rate
\(\eta_t=\eta_{min}+\frac{1}{2}(\eta_{max}-\eta_{min})\left(1+\cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right)\)
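The closed-form formula above can be compared against the scheduler's actual values; eta_max, eta_min, and T_max below are hypothetical:

```python
import math
import torch
from torch import optim

eta_max, eta_min, T_max = 0.1, 0.001, 50   # hypothetical values
param = torch.zeros(1, requires_grad=True)
optimizer = optim.SGD([param], lr=eta_max)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=T_max, eta_min=eta_min)

lrs = []
for epoch in range(T_max + 1):
    lrs.append(optimizer.param_groups[0]['lr'])
    optimizer.step()
    scheduler.step()

# closed form at T_cur = 25, halfway through the period
expected = eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * 25 / T_max))
print(abs(lrs[25] - expected) < 1e-6)  # True; lrs[50] has annealed down to eta_min
```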
ReduceLROnPlateau
Function: monitor a metric and adjust the learning rate when the metric stops improving.
Main parameters:
• mode: min/max; min means adjust when the monitored metric stops decreasing
• factor: multiplicative decay factor
• patience: how many non-improving steps to tolerate before adjusting
• cooldown: how many steps to pause monitoring after an adjustment
• verbose: whether to print a log message on each adjustment
• min_lr: lower bound of the learning rate
• eps: minimal decay applied to the learning rate
scheduler_lr = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, mode="min",
                                                    patience=10, cooldown=10,
                                                    min_lr=1e-4, verbose=True)
loss_value = 0.5  # hypothetical initial metric, so step() has an argument before epoch 5
for epoch in range(max_epoch):
    for i in range(iteration):
        # train(...)
        optimizer.step()
        optimizer.zero_grad()
    if epoch == 5:
        loss_value = 0.4
    scheduler_lr.step(loss_value)
'''
Epoch 16: reducing learning rate of group 0 to 1.0000e-02.
Epoch 37: reducing learning rate of group 0 to 1.0000e-03.
Epoch 58: reducing learning rate of group 0 to 1.0000e-04.
'''
LambdaLR
Function: custom adjustment policy; different parameter groups can use different learning-rate schedules.
Main parameters:
• lr_lambda: a function or a list of functions
optimizer = optim.SGD([
    {'params': [weights_1]},
    {'params': [weights_2]}], lr=lr_init)
lambda1 = lambda epoch: 0.1 ** (epoch // 20)
lambda2 = lambda epoch: 0.95 ** epoch
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])
pip install tensorboard
pip install future
In the directory containing the runs folder, run tensorboard --logdir=./ from the command line to open the dashboard, as shown in the figure below.
Visualizing the Loss and Accuracy curves of any model's training; Train and Valid must appear in the same plot (excerpt from the RMB-classification training code):
# build the SummaryWriter
writer = SummaryWriter(comment='test_your_comment', filename_suffix="_test_your_filename_suffix")
for epoch in range(MAX_EPOCH):
    loss_mean = 0.
    correct = 0.
    total = 0.
    net.train()
    for i, data in enumerate(train_loader):
        ...
        # log data to the event file
        writer.add_scalars("Loss", {"Train": loss.item()}, iter_count)
        writer.add_scalars("Accuracy", {"Train": correct / total}, iter_count)
    # once per epoch, record gradients and weights
    for name, param in net.named_parameters():
        writer.add_histogram(name + '_grad', param.grad, epoch)
        writer.add_histogram(name + '_data', param, epoch)
    scheduler.step()  # update the learning rate
The first figure shows curves drawn directly with matplotlib (training and validation sets, per iteration); the second is from TensorBoard. Notice that if outlier removal and smoothing are turned off, the two plots are identical.
As the number of iterations grows, the gradients get smaller and smaller. This is not vanishing gradients: the loss itself has already dropped to 1e-4.
add_image()
Function: log an image.
• tag: label of the image, its unique identifier
• img_tensor: image data; note the scale: if the image contains any pixel value > 1, it is no longer multiplied by 255 for normalization
• global_step: x-axis value
• dataformats: data layout, one of CHW, HWC, HW
torchvision.utils.make_grid()
Function: assemble a grid of images.
• tensor: image data in B*C*H*W layout
• nrow: number of images per row (the number of rows is computed automatically)
• padding: spacing between images, in pixels
• normalize: whether to normalize pixel values
• range: value range used for normalization
• scale_each: whether to normalize each image individually
• pad_value: pixel value used for padding
writer = SummaryWriter(comment='test_your_comment', filename_suffix="_test_your_filename_suffix")
alexnet = models.alexnet(pretrained=True)

kernel_num = -1
for sub_module in alexnet.modules():
    if isinstance(sub_module, nn.Conv2d):
        kernel_num += 1
        kernels = sub_module.weight
        c_out, c_int, k_w, k_h = tuple(kernels.shape)

        # plot each kernel's three channels separately
        for o_idx in range(c_out):
            kernel_idx = kernels[o_idx, :, :, :].unsqueeze(1)  # make_grid expects BCHW; expand the C dimension
            kernel_grid = vutils.make_grid(kernel_idx, normalize=True, scale_each=True, nrow=c_int)
            writer.add_image('{}_Convlayer_split_in_channel'.format(kernel_num), kernel_grid, global_step=o_idx)

        # plot all kernels together
        kernel_all = kernels.view(-1, 3, k_h, k_w)  # 3, h, w
        kernel_grid = vutils.make_grid(kernel_all, normalize=True, scale_each=True, nrow=8)  # c, h, w
        writer.add_image('{}_all'.format(kernel_num), kernel_grid, global_step=322)

        print("{}_convlayer shape:{}".format(kernel_num, tuple(kernels.shape)))

# visualizing the model's feature maps
alexnet = models.alexnet(pretrained=True)

# forward
convlayer1 = alexnet.features[0]
fmap_1 = convlayer1(img_tensor)

# preprocess
fmap_1.transpose_(0, 1)  # bchw=(1, 64, 55, 55) --> (64, 1, 55, 55)
fmap_1_grid = vutils.make_grid(fmap_1, normalize=True, scale_each=True, nrow=8)
writer.add_image('feature map in conv1', fmap_1_grid, global_step=322)
writer.close()
add_graph()
Function: visualize the model's computation graph.
• model: the model; must be an nn.Module
• input_to_model: data fed to the model
• verbose: whether to print graph structure information
Note that this method constrains the environment: torch >= 1.3 is required. You can generate the runs folder under that version and then switch back to the original environment to run tensorboard.
torchsummary
Function: inspect model information, convenient for debugging.
• model: a PyTorch model
• input_size: the model's input size
• batch_size: batch size
• device: "cuda" or "cpu"
Tensor.register_hook
Function: register a backward hook, which lets you add specific functionality without modifying the main code.
The hook function takes a single argument: the tensor's gradient.
w = torch.tensor([1.], requires_grad=True)
x = torch.tensor([2.], requires_grad=True)
a = torch.add(w, x)
b = torch.add(w, 1)
y = torch.mul(a, b)

a_grad = list()

def grad_hook(grad):
    a_grad.append(grad)

def grad_hook2(grad):
    grad *= 2
    return grad * 3  # the return value overwrites the original grad, so in the end w.grad = 5 * 6 = 30

handle_w = w.register_hook(grad_hook2)
handle_a = a.register_hook(grad_hook)
y.backward()

# inspect the gradients
print("gradient:", w.grad, x.grad, a.grad, b.grad, y.grad)  # 30 2 None None None
print("a_grad[0]: ", a_grad[0])  # 2
handle_w.remove()
handle_a.remove()
Function | Parameters | Usage |
---|---|---|
Module.register_forward_hook | module, input, output | register a forward hook on a module |
register_forward_pre_hook | module, input | register a hook that runs before a module's forward pass |
register_backward_hook | module, grad_input, grad_output | register a backward hook on a module |
Parameters:
• module: the current layer
• input: the input data of the current layer
• output: the output data of the current layer
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 2, 3)
        self.pool1 = nn.MaxPool2d(2, 2)

    def forward(self, x):
        x = self.conv1(x)
        x = self.pool1(x)
        return x

def forward_hook(module, data_input, data_output):
    fmap_block.append(data_output)
    input_block.append(data_input)

def forward_pre_hook(module, data_input):
    print("forward_pre_hook input:{}".format(data_input))

def backward_hook(module, grad_input, grad_output):
    print("backward hook input:{}".format(grad_input))
    print("backward hook output:{}".format(grad_output))

# initialize the network
net = Net()
net.conv1.weight[0].detach().fill_(1)
net.conv1.weight[1].detach().fill_(2)
net.conv1.bias.data.detach().zero_()

# register hooks
fmap_block = list()
input_block = list()
net.conv1.register_forward_hook(forward_hook)
net.conv1.register_forward_pre_hook(forward_pre_hook)
net.conv1.register_backward_hook(backward_hook)

# inference
fake_img = torch.ones((1, 1, 4, 4))  # batch size * channel * H * W
output = net(fake_img)

loss_fnc = nn.L1Loss()
target = torch.randn_like(output)
loss = loss_fnc(target, output)
loss.backward()
Taking register_forward_hook as an example, when output = net(fake_img) runs, the call sequence is as follows:
After net.conv1.register_forward_hook(forward_hook) is registered, the corresponding _forward_hooks entry already exists in net's _modules.
Module.__call__() consists of 4 steps; net itself holds no hooks, so it goes straight into forward:

def __call__(self, *input, **kwargs):
    # 1. _forward_pre_hooks
    for hook in self._forward_pre_hooks.values():
        result = hook(self, input)
        if result is not None:
            if not isinstance(result, tuple):
                result = (result,)
            input = result
    # 2. forward
    if torch._C._get_tracing_state():
        result = self._slow_forward(*input, **kwargs)
    else:
        result = self.forward(*input, **kwargs)
    # 3. _forward_hooks
    for hook in self._forward_hooks.values():
        hook_result = hook(self, input, result)
        if hook_result is not None:
            result = hook_result
    # 4. _backward_hooks
    if len(self._backward_hooks) > 0:
        var = result
        while not isinstance(var, torch.Tensor):
            if isinstance(var, dict):
                var = next((v for v in var.values() if isinstance(v, torch.Tensor)))
            else:
                var = var[0]
        grad_fn = var.grad_fn
        if grad_fn is not None:
            for hook in self._backward_hooks.values():
                wrapper = functools.partial(hook, self)
                functools.update_wrapper(wrapper, hook)
                grad_fn.register_hook(wrapper)
    return result
Net.forward then calls the first convolutional layer:

def forward(self, x):
    x = self.conv1(x)
    x = self.pool1(x)
    return x
Module.__call__() runs again; this time, after forward, the corresponding hook functions (the ones we defined in the main program) are called.

CAM (class activation map): the ordinary layers at the end of the network are replaced by GAP to obtain the final weight layer, followed by a fully connected layer with softmax; the feature maps are then averaged directly with these weights.
Grad-CAM: an improved version of CAM that uses gradients as the feature-map weights, so the network structure no longer needs to be modified.
From the analysis above we reach an interesting conclusion: what the model uses to predict the presence of an airplane is not the airplane itself but the blue sky. For the code, see PyTorch的hook及其在Grad-CAM中的应用.
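The Grad-CAM recipe above (gradients of the target-class score, pooled per channel, used as feature-map weights) can be sketched with a tensor hook; TinyNet, its layer sizes, and the random input are all hypothetical stand-ins for a real classifier:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    def __init__(self):
        super(TinyNet, self).__init__()
        self.conv = nn.Conv2d(1, 4, 3, padding=1)
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        fmap = F.relu(self.conv(x))       # feature maps: (B, 4, H, W)
        out = fmap.mean(dim=(2, 3))       # GAP
        return self.fc(out), fmap

torch.manual_seed(0)
net = TinyNet()
grads = []

x = torch.randn(1, 1, 8, 8)
logits, fmap = net(x)
fmap.register_hook(lambda g: grads.append(g))  # capture d(score)/d(fmap) via a tensor hook
logits[0, 1].backward()                        # backprop the score of the target class

weights = grads[0].mean(dim=(2, 3), keepdim=True)  # channel weights = GAP of the gradients
cam = F.relu((weights * fmap).sum(dim=1))          # weighted sum over channels, then ReLU
print(cam.shape)  # torch.Size([1, 8, 8])
```

In a real setting, cam would then be upsampled to the input resolution and overlaid on the image as a heatmap.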