Training a Model on the MNIST Dataset

1. Dataset Preparation


   For more details, see: Caffe: LMDB and Data Conversion


MNIST is a handwritten-digit database maintained by deep-learning pioneer Yann LeCun. It was originally used for recognizing handwritten digits on checks, and it has since become the standard introductory dataset for deep learning. LeNet, arguably the earliest CNN model, was designed specifically for MNIST recognition.

MNIST contains 60,000 training samples and 10,000 test samples. Each sample is a 28*28 grayscale image of a handwritten digit from 0 to 9, so there are 10 classes.


1) The data can be downloaded from the MNIST website

2) Or run the following commands

$CAFFE_ROOT denotes the root directory of the Caffe source tree:

cd $CAFFE_ROOT
./data/mnist/get_mnist.sh

After the script runs successfully, there are four files under data/mnist/:
train-images-idx3-ubyte:  training set images (9912422 bytes)
train-labels-idx1-ubyte:  training set labels (28881 bytes)
t10k-images-idx3-ubyte:   test set images (1648877 bytes)
t10k-labels-idx1-ubyte:   test set labels (4542 bytes)

These files cannot be used by Caffe directly; they first need to be converted into LMDB databases:
./examples/mnist/create_mnist.sh
After the conversion succeeds, two datasets are created: examples/mnist/mnist_train_lmdb and examples/mnist/mnist_test_lmdb.
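To sanity-check the conversion, you can read one record back from the LMDB using pycaffe's protobuf bindings and the `lmdb` Python package (a minimal sketch, assuming both are installed and the default LMDB paths above are used):

    import lmdb
    import numpy as np
    from caffe.proto import caffe_pb2

    # Open the training LMDB produced by create_mnist.sh and decode one Datum.
    env = lmdb.open('examples/mnist/mnist_train_lmdb', readonly=True)
    with env.begin() as txn:
        key, value = next(txn.cursor().iternext())
        datum = caffe_pb2.Datum()
        datum.ParseFromString(value)

    img = np.frombuffer(datum.data, dtype=np.uint8)
    img = img.reshape(datum.channels, datum.height, datum.width)
    print(key, datum.label, img.shape)   # expect shape (1, 28, 28) and a label in 0-9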

 

2. LeNet: Training and Testing the MNIST Classification Model

2.1 The LeNet Classification Model

We will use the LeNet network, which is known to work well on digit classification tasks. The design of LeNet contains the essence of CNNs that are still used in larger models such as the ones in ImageNet. In general, it consists of a convolutional layer followed by a pooling layer, another convolutional layer followed by a pooling layer, and then two fully connected layers similar to those in a conventional multilayer perceptron. The layers are defined in `$CAFFE_ROOT/examples/mnist/lenet_train_test.prototxt`.

2.2 Defining the MNIST Network

This section explains the LeNet model definition for MNIST handwritten-digit recognition, `lenet_train_test.prototxt`. Caffe uses protobuf definitions, which live in `$CAFFE_ROOT/src/caffe/proto/caffe.proto`. Specifically, we will write a `caffe::NetParameter` protobuf (or, in Python, `caffe.proto.caffe_pb2.NetParameter`).
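Because the network definition is plain protobuf text, it can also be loaded and inspected from Python (a small sketch, assuming pycaffe is on your PYTHONPATH):

    from google.protobuf import text_format
    from caffe.proto import caffe_pb2

    # Parse the text-format network definition into a NetParameter message.
    net = caffe_pb2.NetParameter()
    with open('examples/mnist/lenet_train_test.prototxt') as f:
        text_format.Merge(f.read(), net)

    print(net.name)                             # "LeNet"
    print([layer.name for layer in net.layer])  # mnist, conv1, pool1, ...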

To begin, give the network a name:
    name: "LeNet"

2.2.1 Data Layer

In this demo, the MNIST data we just converted to LMDB is read through a data layer defined as follows:
    layer {
      name: "mnist"
      type: "Data"
      transform_param {
        scale: 0.00390625
      }
      data_param {
        source: "mnist_train_lmdb"
        backend: LMDB
        batch_size: 64
      }
      top: "data"
      top: "label"
    }

This layer has name `mnist` and type `Data`, and it reads the data from the given LMDB source. The batch size is 64, and the incoming pixels are scaled so that they fall in the range [0, 1). Why 0.00390625? It is 1 divided by 256. Finally, this layer produces two blobs: the `data` blob and the `label` blob.
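A quick check of what that scale does to the raw 8-bit pixel values:

    scale = 1.0 / 256              # = 0.00390625, the transform_param scale above
    print(0 * scale, 255 * scale)  # 0.0 0.99609375 -- every pixel lands in [0, 1)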

2.2.2 Convolution Layer

The convolution layer is defined as:
    layer {
      name: "conv1"
      type: "Convolution"
      param { lr_mult: 1 }
      param { lr_mult: 2 }
      convolution_param {
        num_output: 20
        kernel_size: 5
        stride: 1
        weight_filler {
          type: "xavier"
        }
        bias_filler {
          type: "constant"
        }
      }
      bottom: "data"
      top: "conv1"
    }

This layer takes the `data` blob provided by the data layer and produces the `conv1` blob. It outputs 20 channels, using convolution kernels of size 5 carried out with stride 1.
The fillers allow us to randomly initialize the weights and biases. For the weight filler, we use the `xavier` algorithm, which automatically determines the scale of initialization based on the number of input and output neurons. For the bias filler, we simply initialize it as a constant, with a default value of 0.
`lr_mult` is the learning-rate multiplier for the layer's parameters. Here the weights are learned at the learning rate given by the solver at run time, while the biases are learned at twice that rate; this usually leads to better convergence.
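With a 28x28 input, no padding, a 5x5 kernel and stride 1, each output feature map is 24x24; this matches the `Top shape: 20 24 24` line that appears in the training log later in this tutorial. A quick check:

    # Output spatial size of a convolution with no padding:
    # out = (in - kernel) // stride + 1
    in_size, kernel, stride = 28, 5, 1
    print((in_size - kernel) // stride + 1)   # 24 -> conv1 blob is 20 x 24 x 24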


2.2.3 Pooling Layer


The pooling layer is easier to define:

    layer {
      name: "pool1"
      type: "Pooling"
      pooling_param {
        kernel_size: 2
        stride: 2
        pool: MAX
      }
      bottom: "conv1"
      top: "pool1"
    }
This performs max pooling with a pool kernel size of 2 and a stride of 2 (so there is no overlap between neighbouring pooling regions).
Similarly, you can write up the second convolution and pooling layers; see `$CAFFE_ROOT/examples/mnist/lenet_train_test.prototxt` for details.

2.2.4 Fully Connected Layer


Writing a fully connected layer is also simple:

    layer {
      name: "ip1"
      type: "InnerProduct"
      param { lr_mult: 1 }
      param { lr_mult: 2 }
      inner_product_param {
        num_output: 500
        weight_filler {
          type: "xavier"
        }
        bias_filler {
          type: "constant"
        }
      }
      bottom: "pool2"
      top: "ip1"
    }

This defines a fully connected layer (known in Caffe as an `InnerProduct` layer) with 500 outputs. All the other lines should look familiar by now, right?
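As a back-of-the-envelope check of where the `pool2` bottom comes from and how large `ip1` is (plain arithmetic, not Caffe output): conv1 shrinks 28 to 24, pool1 halves that to 12, conv2 shrinks it to 8, and pool2 halves it to 4.

    # pool2 outputs 50 feature maps of size 4x4 per image.
    ip1_inputs  = 50 * 4 * 4          # 800 values feed each ip1 neuron
    ip1_weights = ip1_inputs * 500    # 400,000 weights
    ip1_biases  = 500
    print(ip1_inputs, ip1_weights + ip1_biases)   # 800, 400500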

2.2.5 ReLU Layer


A ReLU layer is also simple:

    layer {
      name: "relu1"
      type: "ReLU"
      bottom: "ip1"
      top: "ip1"
    }

Since ReLU is an element-wise operation, we can do it *in-place* to save memory. This is achieved by simply giving the same name to the bottom and top blobs. Of course, do NOT use duplicated blob names for other layer types!
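The same idea in NumPy terms, just to illustrate why an element-wise operation can safely reuse its input buffer (an illustration, not Caffe code):

    import numpy as np

    x = np.random.randn(64, 500).astype(np.float32)  # stand-in for the ip1 blob
    # In-place ReLU: the result overwrites x, so no second buffer is allocated --
    # the same memory saving as giving bottom and top the same blob name.
    np.maximum(x, 0, out=x)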

After the ReLU layer, we write another InnerProduct layer:

    layer {
      name: "ip2"
      type: "InnerProduct"
      param { lr_mult: 1 }
      param { lr_mult: 2 }
      inner_product_param {
        num_output: 10
        weight_filler {
          type: "xavier"
        }
        bias_filler {
          type: "constant"
        }
      }
      bottom: "ip1"
      top: "ip2"
    }

2.2.6 Loss Layer


Finally, the loss layer:

    layer {
      name: "loss"
      type: "SoftmaxWithLoss"
      bottom: "ip2"
      bottom: "label"
    }

The `softmax_loss` layer implements both the softmax and the multinomial logistic loss (which saves time and improves numerical stability). It takes two blobs: the first one is the prediction and the second is the `label` provided by the data layer (remember it?). It does not produce any outputs; all it does is compute the loss value, report it when backpropagation starts, and initiate the gradient with respect to `ip2`. This is where all the magic starts.
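For reference, here is a minimal NumPy sketch of the value this combined layer computes (not Caffe's implementation): a numerically stable softmax over the `ip2` scores followed by the averaged multinomial logistic loss.

    import numpy as np

    def softmax_with_loss(scores, labels):
        """scores: (N, 10) raw ip2 outputs; labels: (N,) integer class ids."""
        # Subtracting the per-row max before exponentiating keeps exp() stable.
        shifted = scores - scores.max(axis=1, keepdims=True)
        probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
        # Multinomial logistic loss, averaged over the batch.
        return -np.mean(np.log(probs[np.arange(len(labels)), labels]))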


2.2.7 Additional Notes: Writing Layer Rules

Layer definitions can include rules for whether and when they are included in the network definition, as in the following example:

    layer {
      // ...layer definition...
      include: { phase: TRAIN }
    }

This is a rule that controls layer inclusion in the network based on the network's current state.
You can refer to `$CAFFE_ROOT/src/caffe/proto/caffe.proto` for more information about layer rules and model schema.

In the example above, this layer will be included only in the `TRAIN` phase.
If we change `TRAIN` to `TEST`, the layer will be used only in the test phase.

By default, a layer has no rules and is always included in the network.
Thus, `lenet_train_test.prototxt` defines two `Data` layers (with different `batch_size` values), one for the training phase and one for the testing phase.
There is also an `Accuracy` layer, included only in the `TEST` phase, which reports the model accuracy every 500 iterations (the `test_interval` defined in `lenet_solver.prototxt`).
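To see these rules in the parsed model, you can list each layer's `include` phases from Python (a small sketch; layers with no include rule run in both phases):

    from google.protobuf import text_format
    from caffe.proto import caffe_pb2

    net = caffe_pb2.NetParameter()
    with open('examples/mnist/lenet_train_test.prototxt') as f:
        text_format.Merge(f.read(), net)

    # Layers without an include rule are active in every phase.
    for layer in net.layer:
        phases = [caffe_pb2.Phase.Name(rule.phase) for rule in layer.include]
        print(layer.name, phases or ['ALL'])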


The complete, annotated definition is as follows:


# Network name
name: "LeNet"
# TRAIN data layer
# Input source: mnist_train_lmdb, batch_size: 64
# Outputs: data blob, label blob
# Data transform: scale normalization, 0.00390625 = 1/256
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/mnist/mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
# TEST data layer
# Input source: mnist_test_lmdb, batch_size: 100
# Outputs: data blob, label blob
# Data transform: scale normalization
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "examples/mnist/mnist_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
# Convolution layer conv1
# Input: data blob
# Output: conv1 blob
# Parameters: 20 feature maps with 5*5 kernels, stride 1; weights initialized with xavier, biases as constant (default 0)
# Learning rates: weights at 1x the base learning rate base_lr, biases at 2x base_lr
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
# Pooling layer pool1
# Input: conv1 blob
# Output: pool1 blob
# Pooling: MAX pooling with a 2*2 kernel and stride 2
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}

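# Convolution layer conv2
# Input: pool1 blob
# Output: conv2 blob
# Parameters: 50 feature maps with 5*5 kernels, stride 1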
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
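# Pooling layer pool2
# Input: conv2 blob
# Output: pool2 blob
# Pooling: MAX pooling with a 2*2 kernel and stride 2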
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}

# Fully connected layer ip1
# Input: pool2 blob
# Output: ip1 blob
# Parameters: 500 output nodes; weights initialized with xavier, biases as constant (default 0)
# Learning rates: weights at base_lr, biases at base_lr*2
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
# Non-linear activation layer relu1
# Input: ip1 blob
# Output: ip1 blob (note it is still ip1: ReLU is an element-wise operation, so computing it in place saves memory)
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
# Fully connected layer ip2
# Input: ip1 blob
# Output: ip2 blob (this is also the network output used for the final prediction)
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
# Accuracy layer, used only in the TEST phase
# Inputs: ip2 blob, label blob
# Output: accuracy blob
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
# Loss layer of type SoftmaxWithLoss
# Inputs: ip2 blob, label blob
# Output: loss blob, the final loss
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}



3 Defining the MNIST Solver


Check out the comments explaining each line in the prototxt `$CAFFE_ROOT/examples/mnist/lenet_solver.prototxt`:

    # The train/test net protocol buffer definition
    net: "examples/mnist/lenet_train_test.prototxt"
    # test_iter specifies how many forward passes the test should carry out.
    # In the case of MNIST, we have test batch size 100 and 100 test iterations,
    # covering the full 10,000 testing images.
    test_iter: 100
    # Carry out testing every 500 training iterations.
    test_interval: 500
    # The base learning rate, momentum and the weight decay of the network.
    base_lr: 0.01
    momentum: 0.9
    weight_decay: 0.0005
    # The learning rate policy
    lr_policy: "inv"
    gamma: 0.0001
    power: 0.75
    # Display every 100 iterations
    display: 100
    # The maximum number of iterations
    max_iter: 10000
    # snapshot intermediate results
    snapshot: 5000
    snapshot_prefix: "examples/mnist/lenet"
    # solver mode: CPU or GPU
    solver_mode: GPU


The annotated version is as follows:

# Network definition
net: "examples/mnist/lenet_train_test.prototxt"
# Total number of validation samples covered = test_iter * batch_size
test_iter: 100
# Run one round of validation every 500 training iterations
test_interval: 500
# Initial (base) learning rate
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# Learning rate decay policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display the loss and related values every 100 iterations (training and testing)
display: 100
# Maximum number of iterations
max_iter: 10000
# Snapshot the model every 5000 iterations, in case training is interrupted
snapshot: 5000
snapshot_prefix: "examples/mnist/lenet"
# Solver mode: CPU or GPU
solver_mode: GPU
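The `inv` policy decays the learning rate smoothly as `base_lr * (1 + gamma * iter) ^ (-power)`. A quick check in Python reproduces the rate reported at iteration 100 in the training log below:

    base_lr, gamma, power = 0.01, 0.0001, 0.75

    def inv_lr(iteration):
        # Caffe's "inv" learning-rate policy.
        return base_lr * (1.0 + gamma * iteration) ** (-power)

    print(inv_lr(100))    # ~0.00992565, matching "Iteration 100, lr = 0.00992565"
    print(inv_lr(10000))  # ~0.00594604 by the final iteration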



4 Training and Testing the Model


After writing the network definition protobuf and the solver protobuf, training is simple: run `train_lenet.sh` directly, or use the following commands:

    cd $CAFFE_ROOT
    ./examples/mnist/train_lenet.sh

`train_lenet.sh` is a simple script, but here is a quick explanation: the main tool for training is `caffe` with action `train` and the solver protobuf text file as its argument.
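If you would rather drive training from Python, a roughly equivalent pycaffe sketch looks like this (assuming pycaffe is built and importable):

    import caffe

    caffe.set_mode_gpu()   # or caffe.set_mode_cpu()

    # Load the solver configuration and run the full optimization, i.e. the
    # same thing as `caffe train --solver=examples/mnist/lenet_solver.prototxt`.
    solver = caffe.SGDSolver('examples/mnist/lenet_solver.prototxt')
    solver.solve()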


When you run the code, you will see a lot of messages flying by like this:

    I1203 net.cpp:66] Creating Layer conv1
    I1203 net.cpp:76] conv1 <- data
    I1203 net.cpp:101] conv1 -> conv1
    I1203 net.cpp:116] Top shape: 20 24 24
    I1203 net.cpp:127] conv1 needs backward computation.

These messages tell you the details about each layer, its connections and its output shape, which may be helpful in debugging. After the initialization, the training will start:

    I1203 net.cpp:142] Network initialization done.
    I1203 solver.cpp:36] Solver scaffolding done.
    I1203 solver.cpp:44] Solving LeNet

Based on the solver setting, we will print the training loss function every 100 iterations, and test the network every 500 iterations. You will see messages like this:

    I1203 solver.cpp:204] Iteration 100, lr = 0.00992565
    I1203 solver.cpp:66] Iteration 100, loss = 0.26044
    ...
    I1203 solver.cpp:84] Testing net
    I1203 solver.cpp:111] Test score #0: 0.9785
    I1203 solver.cpp:111] Test score #1: 0.0606671

For each training iteration, `lr` is the learning rate of that iteration, and `loss` is the training loss. For the output of the testing phase, score 0 is the accuracy, and score 1 is the testing loss.

And after a few minutes, you are done!

    I1203 solver.cpp:84] Testing net
    I1203 solver.cpp:111] Test score #0: 0.9897
    I1203 solver.cpp:111] Test score #1: 0.0324599
    I1203 solver.cpp:126] Snapshotting to lenet_iter_10000
    I1203 solver.cpp:133] Snapshotting solver state to lenet_iter_10000.solverstate
    I1203 solver.cpp:78] Optimization Done.

The final model, stored as a binary protobuf file, is stored at

    lenet_iter_10000

which you can deploy as a trained model in your application, if you are training on a real-world application dataset.
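For example, a minimal pycaffe sketch for loading the snapshot at deployment time (assuming the snapshot carries the usual `.caffemodel` extension and that the example's deploy definition `examples/mnist/lenet.prototxt` is used; both names are conventions of the Caffe example rather than something shown above):

    import caffe
    import numpy as np

    caffe.set_mode_cpu()
    # Deploy-time network definition plus the trained weights from the snapshot.
    net = caffe.Net('examples/mnist/lenet.prototxt',
                    'examples/mnist/lenet_iter_10000.caffemodel',
                    caffe.TEST)

    # Classify one 28x28 grayscale image, already scaled to [0, 1).
    image = np.zeros((1, 1, 28, 28), dtype=np.float32)   # placeholder input
    net.blobs['data'].reshape(*image.shape)
    net.blobs['data'].data[...] = image
    out = net.forward()
    print(out['prob'].argmax())   # index of the predicted digit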


To switch between GPU and CPU computation, you only need to change the `solver_mode` entry in `lenet_solver.prototxt` (numerically, 0 means CPU and 1 means GPU). For example, to run on the CPU:

    # solver mode: CPU or GPU
    solver_mode: CPU
How to reduce the learning rate at fixed steps? Look at `lenet_multistep_solver.prototxt`.
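As a sketch of what that policy computes (Caffe's `multistep` rule multiplies the rate by `gamma` each time the iteration passes one of the configured `stepvalue`s; the concrete values below are made-up placeholders, not necessarily those in that file):

    base_lr, gamma = 0.01, 0.9
    stepvalues = [5000, 7000, 8000, 9000]   # placeholder stepvalue entries

    def multistep_lr(iteration):
        # The rate drops by a factor of gamma after each stepvalue is passed.
        steps_passed = sum(iteration >= s for s in stepvalues)
        return base_lr * gamma ** steps_passed

    for it in (0, 5000, 7000, 9500):
        print(it, multistep_lr(it))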