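This post covers two kinds of convolution: 3D convolution and transposed convolution (deconvolution).
The next post covers:
While reading papers recently I ran into 3D convolution in several places and never understood how it actually works. I read through some references,
but was still somewhat confused, so I ran further experiments with TensorFlow to learn more.
The TensorFlow function tf.nn.conv3d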
tf.nn.conv3d( input, filter, strides, padding, data_format='NDHWC', dilations=[1, 1, 1, 1, 1], name=None )
Given a 5-D input and a 5-D filter, it computes a 3D convolution.
input:
A Tensor. Must be one of the following types: half, bfloat16, float32, float64. Shape [batch, in_depth, in_height, in_width, in_channels].
filter:
A Tensor. Must have the same type as input. Shape [filter_depth, filter_height, filter_width, in_channels, out_channels]. in_channels must match between input and filter.
strides:
A list of ints that has length >= 5. 1-D tensor of length 5. The stride of the sliding window for each dimension of input. Must have strides[0] = strides[4] = 1.
padding:
A string from: "SAME", "VALID". The type of padding algorithm to use.
data_format:
An optional string from: "NDHWC", "NCDHW". Defaults to "NDHWC". The data format of the input and output data. With the default format "NDHWC", the data is stored in the order of: [batch, in_depth, in_height, in_width, in_channels]. Alternatively, the format could be "NCDHW", the data storage order is: [batch, in_channels, in_depth, in_height, in_width].
dilations:
An optional list of ints. Defaults to [1, 1, 1, 1, 1]. 1-D tensor of length 5. The dilation factor for each dimension of input. If set to k > 1, there will be k-1 skipped cells between each filter element on that dimension. The dimension order is determined by the value of data_format, see above for details. Dilations in the batch and depth dimensions must be 1.
name:
A name for the operation (optional).
Returns:
A Tensor. Has the same type as input.
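To make the two data_format layouts concrete, here is a small NumPy sketch that converts between them. The helper names ndhwc_to_ncdhw and ncdhw_to_ndhwc are made up for this illustration and are not part of TensorFlow; the tensor size simply reuses the shape from the experiment in this post.

```python
import numpy as np

# Toy tensor in the default NDHWC order:
# [batch, in_depth, in_height, in_width, in_channels]
x_ndhwc = np.zeros((1, 7, 224, 224, 3), dtype=np.float32)


def ndhwc_to_ncdhw(x):
    """Move the channel axis from last position to position 1 (hypothetical helper)."""
    return np.transpose(x, (0, 4, 1, 2, 3))


def ncdhw_to_ndhwc(x):
    """Move the channel axis from position 1 back to the last position (hypothetical helper)."""
    return np.transpose(x, (0, 2, 3, 4, 1))


x_ncdhw = ndhwc_to_ncdhw(x_ndhwc)
print(x_ncdhw.shape)                  # (1, 3, 7, 224, 224)
print(ncdhw_to_ndhwc(x_ncdhw).shape)  # (1, 7, 224, 224, 3)
```

Only the axis order changes; the data and the number of elements stay the same.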
Compare this with the 2D convolution tf.nn.conv2d:
tf.nn.conv2d( input, filter, strides, padding, use_cudnn_on_gpu=True, data_format='NHWC', dilations=[1, 1, 1, 1], name=None )
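The two are actually quite similar. The main difference lies in the input and filter parameters. For 2D convolution these two parameters are specified as: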
input:
A Tensor. Must be one of the following types: half, bfloat16, float32, float64. A 4-D tensor. [batch, height, width, channels].
filter:
A Tensor. Must have the same type as input. A 4-D tensor of shape [filter_height, filter_width, in_channels, out_channels].
The main difference is the in_depth dimension of the input parameter and the filter_depth dimension of the filter parameter in 3D convolution:
input:
A Tensor. Must be one of the following types: half, bfloat16, float32, float64. Shape [batch, in_depth, in_height, in_width, in_channels].
filter:
A Tensor. Must have the same type as input. Shape [filter_depth, filter_height, filter_width, in_channels, out_channels]. in_channels must match between input and filter.
The remaining parameters work in much the same way in both ops.
# Written against the TensorFlow 1.x graph API (tf.Session, filter= keyword).
import tensorflow as tf
import numpy as np

# Input: [batch, in_depth, in_height, in_width, in_channels]
input = tf.constant(1, shape=[1, 7, 224, 224, 3], dtype=tf.float32)
# Filters: [filter_depth, filter_height, filter_width, in_channels, out_channels]
filter_1_2 = tf.constant(2, shape=[1, 3, 3, 3, 64], dtype=tf.float32)
filter_3_4 = tf.constant(2, shape=[1, 5, 3, 3, 64], dtype=tf.float32)

# Stride 1 and stride 2 along the depth dimension, for each filter.
res_1 = tf.nn.conv3d(input=input, filter=filter_1_2, strides=[1, 1, 1, 1, 1], padding='SAME')
res_2 = tf.nn.conv3d(input=input, filter=filter_1_2, strides=[1, 2, 1, 1, 1], padding='SAME')
res_3 = tf.nn.conv3d(input=input, filter=filter_3_4, strides=[1, 1, 1, 1, 1], padding='SAME')
res_4 = tf.nn.conv3d(input=input, filter=filter_3_4, strides=[1, 2, 1, 1, 1], padding='SAME')

sess = tf.Session()
conv_res_1 = sess.run(res_1)
conv_res_2 = sess.run(res_2)
conv_res_3 = sess.run(res_3)
conv_res_4 = sess.run(res_4)

print(conv_res_1.shape)
print(conv_res_2.shape)
print(conv_res_3.shape)
print(conv_res_4.shape)
(1, 7, 224, 224, 64)
(1, 4, 224, 224, 64)
(1, 7, 224, 224, 64)
(1, 4, 224, 224, 64)
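As you can see, convolution along the depth dimension follows the same principle as convolution along the height and width dimensions.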
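To see that depth is treated exactly like height and width, here is a minimal NumPy sketch of a single-channel 3D convolution (cross-correlation, as in TensorFlow) with no padding. The function naive_conv3d_valid is a hypothetical helper written for this illustration, not how tf.nn.conv3d is implemented, and it ignores the batch and channel dimensions.

```python
import numpy as np


def naive_conv3d_valid(volume, kernel, strides=(1, 1, 1)):
    """Single-channel 3D cross-correlation without padding (illustrative only)."""
    # Each of the three axes uses the same size rule: (n - k) // s + 1.
    out_shape = tuple(
        (n - k) // s + 1 for n, k, s in zip(volume.shape, kernel.shape, strides)
    )
    out = np.zeros(out_shape, dtype=volume.dtype)
    kd, kh, kw = kernel.shape
    for d in range(out_shape[0]):
        for h in range(out_shape[1]):
            for w in range(out_shape[2]):
                # Top-left-front corner of the window in the input volume.
                d0, h0, w0 = d * strides[0], h * strides[1], w * strides[2]
                window = volume[d0:d0 + kd, h0:h0 + kh, w0:w0 + kw]
                out[d, h, w] = np.sum(window * kernel)
    return out


volume = np.ones((7, 5, 5), dtype=np.float32)   # [depth, height, width]
kernel = np.ones((3, 3, 3), dtype=np.float32)
result = naive_conv3d_valid(volume, kernel, strides=(2, 1, 1))
print(result.shape)  # (3, 3, 3)
```

The three nested loops are symmetric: the depth loop slides the window over frames the same way the other two loops slide it over rows and columns.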
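The figure above shows a 3D filter (really 4D: depth, channel, height, width) convolving over the input data.
As the figure shows, the filter slides along the depth dimension (frames, or slices) and produces (in_depth - 1) / stride_depth + 1 single-channel feature maps (integer division, with padding='SAME'). Every other filter likewise produces the same number, (in_depth - 1) / stride_depth + 1, of single-layer feature maps. Stacked together, they form an output of shape [batch, (in_depth - 1) / stride_depth + 1, (in_height - 1) / stride_height + 1, (in_width - 1) / stride_width + 1, out_channels], as the example program shows.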
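As a sanity check on the shapes printed by the example program, here is a small sketch of TensorFlow's documented output-size rules: ceil(in / stride) for 'SAME' and ceil((in - filter + 1) / stride) for 'VALID'. The function names conv_output_size and conv3d_output_shape are invented for this illustration, and dilation is assumed to be 1.

```python
import math


def conv_output_size(in_size, filter_size, stride, padding):
    """Output size along one spatial dimension (dilation assumed to be 1)."""
    if padding == "SAME":
        return math.ceil(in_size / stride)
    if padding == "VALID":
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


def conv3d_output_shape(input_shape, filter_shape, strides, padding):
    """Output shape for NDHWC input and [d, h, w, in_c, out_c] filters."""
    batch, in_d, in_h, in_w, _ = input_shape
    f_d, f_h, f_w, _, out_c = filter_shape
    _, s_d, s_h, s_w, _ = strides
    return (
        batch,
        conv_output_size(in_d, f_d, s_d, padding),
        conv_output_size(in_h, f_h, s_h, padding),
        conv_output_size(in_w, f_w, s_w, padding),
        out_c,
    )


shape = conv3d_output_shape(
    (1, 7, 224, 224, 3), (1, 3, 3, 3, 64), (1, 2, 1, 1, 1), "SAME"
)
print(shape)  # (1, 4, 224, 224, 64)
```

With 'SAME' padding the filter size drops out of the result entirely, which is why both filters in the experiment give the same output shapes.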
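Deconvolution (transposed convolution) is used a lot in semantic segmentation. I once read a paper devoted to deconvolution, but I no longer remember it well, so I am relearning it here.
The animations here give a very intuitive picture of deconvolution; a quick look should be enough to understand it.
I would also like to share a paper: A guide to convolution arithmetic for deep learning
tf.nn.conv2d_transpose in TensorFlow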
tf.nn.conv2d_transpose( value, filter, output_shape, strides, padding='SAME', data_format='NHWC', name=None )
value:
A 4-D Tensor of type float and shape [batch, height, width, in_channels] for NHWC data format or [batch, in_channels, height, width] for NCHW data format.
filter:
A 4-D Tensor with the same type as value and shape [height, width, output_channels, in_channels]. filter's in_channels dimension must match that of value.
output_shape:
A 1-D Tensor representing the output shape of the deconvolution op.
strides:
A list of ints. The stride of the sliding window for each dimension of the input tensor.
padding:
A string, either 'VALID' or 'SAME'. The padding algorithm. See the "returns" section of tf.nn.convolution for details.
data_format:
A string. 'NHWC' and 'NCHW' are supported.
name:
Optional name for the returned tensor.
The output_shape parameter is confusing at first: once input, filter, padding, and stride are fixed, why is output_shape not already determined?
# Written against the TensorFlow 1.x graph API (tf.Session, filter= keyword).
import tensorflow as tf
import numpy as np

# value: [batch, height, width, in_channels]
value = tf.constant(1, shape=[1, 3, 3, 3], dtype=tf.float32)
# filter: [height, width, output_channels, in_channels]
filter = tf.constant(2, shape=[3, 3, 64, 3], dtype=tf.float32)

# Two different output shapes requested for the same value, filter, stride, and padding.
output_shape_1 = tf.constant([1, 6, 6, 64])
output_shape_2 = tf.constant([1, 5, 5, 64])

res_1 = tf.nn.conv2d_transpose(
    value=value, filter=filter, output_shape=output_shape_1,
    strides=[1, 2, 2, 1], padding='SAME'
)
res_2 = tf.nn.conv2d_transpose(
    value=value, filter=filter, output_shape=output_shape_2,
    strides=[1, 2, 2, 1], padding='SAME'
)

sess = tf.Session()
conv_res_1 = sess.run(res_1)
conv_res_2 = sess.run(res_2)

print(conv_res_1.shape)
print(conv_res_2.shape)
(1, 6, 6, 64)
(1, 5, 5, 64)
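The example above shows why specifying output_shape matters: the shape after deconvolution is not unique, so you have to pick one from the set of possible shapes.
So how exactly does deconvolution work? Put differently, given input, filter, stride, and padding, which output_shape values are possible?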
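Consider the shape formula for convolution, where i is the input size, o the output size, k the kernel size, s the stride, and p the padding on each side:
conv: i -> o
deconv: o -> i
o = floor((i + 2*p - k) / s) + 1, (1)
Deconvolution has to recover the size i, where i is the size that a convolution would turn into o.
From (1),
floor((i + 2*p - k) / s) = o - 1
so (o - 1)*s <= i + 2*p - k <= (o - 1)*s + s - 1. In other words, s different values of i map to the same o, which is why output_shape has to be given explicitly whenever s > 1.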
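The relation above can be turned into a short enumeration of every size that the forward convolution would map to a given o. This is a sketch using TensorFlow's documented forward rules (o = ceil(i / s) for 'SAME', o = floor((i - k) / s) + 1 for 'VALID'); the function name transpose_output_sizes is made up for this illustration.

```python
def transpose_output_sizes(o, k, s, padding):
    """All sizes i that a forward convolution maps to output size o.

    o: size of the deconvolution input (the forward convolution's output)
    k: kernel size, s: stride, padding: 'SAME' or 'VALID'
    """
    if padding == "SAME":
        # o = ceil(i / s)  <=>  (o - 1) * s < i <= o * s
        smallest = (o - 1) * s + 1
    elif padding == "VALID":
        # o = floor((i - k) / s) + 1  <=>  (o - 1) * s + k <= i < o * s + k
        smallest = (o - 1) * s + k
    else:
        raise ValueError("padding must be 'SAME' or 'VALID'")
    # Exactly s consecutive sizes satisfy the inequality.
    return list(range(smallest, smallest + s))


print(transpose_output_sizes(64, 3, 1, "SAME"))   # [64]
print(transpose_output_sizes(64, 3, 1, "VALID"))  # [66]
print(transpose_output_sizes(64, 3, 2, "SAME"))   # [127, 128]
print(transpose_output_sizes(64, 3, 2, "VALID"))  # [129, 130]
```

For a 64-wide input and a 3x3 kernel, these are the candidate values for the height and width entries of output_shape.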
Testing with a program
# Written against the TensorFlow 1.x graph API (tf.Session, filter= keyword).
import tensorflow as tf
import numpy as np

# value: [batch, height, width, in_channels]
value = tf.constant(1, shape=[1, 64, 64, 3], dtype=tf.float32)
# filter: [height, width, output_channels, in_channels]
filter = tf.constant(2, shape=[3, 3, 256, 3], dtype=tf.float32)


def deconv_s_1():
    # s = 1, padding='SAME'
    output_shape_1_same = tf.constant([1, 64, 64, 256])
    res_1 = tf.nn.conv2d_transpose(
        value=value, filter=filter, output_shape=output_shape_1_same,
        strides=[1, 1, 1, 1], padding='SAME'
    )
    # s = 1, padding='VALID'
    output_shape_1_valid = tf.constant([1, 66, 66, 256])
    res_2 = tf.nn.conv2d_transpose(
        value=value, filter=filter, output_shape=output_shape_1_valid,
        strides=[1, 1, 1, 1], padding='VALID'
    )
    sess = tf.Session()
    conv_res_1 = sess.run(res_1)
    conv_res_2 = sess.run(res_2)
    print("s = 1, padding='SAME', expected: i = o = 64 ")
    print(conv_res_1.shape)
    print("s = 1, padding='VALID', expected i = o + 2 = 66")
    print(conv_res_2.shape)


def deconv_s_2():
    # s = 2, padding='SAME'
    output_shape_2_same_1 = tf.constant([1, 128, 128, 256])
    res_1 = tf.nn.conv2d_transpose(
        value=value, filter=filter, output_shape=output_shape_2_same_1,
        strides=[1, 2, 2, 1], padding='SAME'
    )
    # s = 2, padding='SAME'
    output_shape_2_same_2 = tf.constant([1, 127, 127, 256])
    res_2 = tf.nn.conv2d_transpose(
        value=value, filter=filter, output_shape=output_shape_2_same_2,
        strides=[1, 2, 2, 1], padding='SAME'
    )
    # s = 2, padding='VALID'
    output_shape_2_valid_1 = tf.constant([1, 129, 129, 256])
    res_3 = tf.nn.conv2d_transpose(
        value=value, filter=filter, output_shape=output_shape_2_valid_1,
        strides=[1, 2, 2, 1], padding='VALID'
    )
    # s = 2, padding='VALID'
    print("s = 2, padding='VALID'")
    output_shape_2_valid_2 = tf.constant([1, 130, 130, 256])
    res_4 = tf.nn.conv2d_transpose(
        value=value, filter=filter, output_shape=output_shape_2_valid_2,
        strides=[1, 2, 2, 1], padding='VALID'
    )
    sess = tf.Session()
    conv_res_1 = sess.run(res_1)
    conv_res_2 = sess.run(res_2)
    conv_res_3 = sess.run(res_3)
    conv_res_4 = sess.run(res_4)
    print("s = 2, padding='SAME', expected i = 2o = 128 or i = 2o - 1 = 127 ")
    print(conv_res_1.shape)
    print(conv_res_2.shape)
    print("s = 2, padding='VALID', expected i = 2o + 1 = 129 or i = 2o + 2 = 130")
    print(conv_res_3.shape)
    print(conv_res_4.shape)


def deconv_error():
    # 129 is outside the expected set {127, 128} for padding='SAME',
    # so this call is expected to fail.
    output_shape_2_same_1 = tf.constant([1, 129, 129, 256])
    res_1 = tf.nn.conv2d_transpose(
        value=value, filter=filter, output_shape=output_shape_2_same_1,
        strides=[1, 2, 2, 1], padding='SAME'
    )
    sess = tf.Session()
    conv_res_1 = sess.run(res_1)
    print("s = 2, padding='SAME', expected i = 2o = 128 or i = 2o - 1 = 127 ")
    print(conv_res_1.shape)


print("input_size = 64:")
print("stride = 1")
deconv_s_1()
print("stride = 2")
deconv_s_2()
deconv_error()
input_size = 64:
2018-12-04 00:04:02.363507: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
stride = 1
s = 1, padding='SAME', expected: i = o = 64
(1, 64, 64, 256)
s = 1, padding='VALID', expected i = o + 2 = 66
(1, 66, 66, 256)
stride = 2
s = 2, padding='VALID'
s = 2, padding='SAME', expected i = 2o = 128 or i = 2o - 1 = 127
(1, 128, 128, 256)
(1, 127, 127, 256)
s = 2, padding='VALID', expected i = 2o + 1 = 129 or i = 2o + 2 = 130
(1, 129, 129, 256)
(1, 130, 130, 256)
2018-12-04 00:04:03.186144: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at conv_grad_input_ops.cc:355 : Invalid argument: Conv2DCustomBackpropInput: Size of out_backprop doesn't match computed: actual = 64, computed = 65 spatial_dim: 1 input: 129 filter: 3 output: 64 stride: 2 dilation: 1
2018-12-04 00:04:03.187537: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at conv_grad_input_ops.cc:355 : Invalid argument: Conv2DCustomBackpropInput: Size of out_backprop doesn't match computed: actual = 64, computed = 65 spatial_dim: 1 input: 129 filter: 3 output: 64 stride: 2 dilation: 1
2018-12-04 00:04:03.188916: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at conv_grad_input_ops.cc:355 : Invalid argument: Conv2DCustomBackpropInput: Size of out_backprop doesn't match computed: actual = 64, computed = 65 spatial_dim: 1 input: 129 filter: 3 output: 64 stride: 2 dilation: 1
2018-12-04 00:04:03.189200: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at conv_grad_input_ops.cc:355 : Invalid argument: Conv2DCustomBackpropInput: Size of out_backprop doesn't match computed: actual = 64, computed = 65 spatial_dim: 1 input: 129 filter: 3 output: 64 stride: 2 dilation: 1
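In summary, the program above confirms our reasoning.
So with ksize = 3, choosing padding='SAME' and stride = 2 lets us get an output of size Osize = 2 * Isize.
This post analyzed the shape relationship between convolution and deconvolution, and worked out which output_shape values are possible.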
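For a look at the arithmetic behind those shapes, here is a 1-D NumPy sketch of a transposed convolution, written as the gradient of a forward convolution with respect to its input: each input element scatters a scaled copy of the kernel into a padded buffer, which is then cropped to the requested size. The function conv1d_transpose is a hypothetical helper for illustration only; it assumes TensorFlow's 'SAME' convention of putting the extra padding cell at the end, and it is not TensorFlow's actual implementation.

```python
import numpy as np


def conv1d_transpose(value, kernel, out_size, stride, padding):
    """1-D transposed convolution by scatter-add (illustrative sketch)."""
    o, k = len(value), len(kernel)
    span = (o - 1) * stride + k  # cells touched by the scattered kernels
    if padding == "SAME":
        expected_o = -(-out_size // stride)   # ceil(out_size / stride)
        total_pad = max(span - out_size, 0)
    elif padding == "VALID":
        expected_o = (out_size - k) // stride + 1
        total_pad = 0
    else:
        raise ValueError("padding must be 'SAME' or 'VALID'")
    if expected_o != o:
        # The forward convolution of an out_size input would not yield o outputs.
        raise ValueError(
            "out_size %d implies %d inputs, got %d" % (out_size, expected_o, o)
        )
    pad_before = total_pad // 2
    buf = np.zeros(out_size + total_pad)
    for j in range(o):
        # Input element j contributes a scaled copy of the kernel.
        buf[j * stride:j * stride + k] += value[j] * kernel
    # Drop the padding cells to get the requested size.
    return buf[pad_before:pad_before + out_size]


value = np.ones(3)
kernel = np.full(3, 2.0)
print(conv1d_transpose(value, kernel, 6, 2, "SAME"))  # [2. 2. 4. 2. 4. 2.]
print(conv1d_transpose(value, kernel, 5, 2, "SAME"))  # [2. 4. 2. 4. 2.]
```

Requesting out_size = 7 with these arguments raises a ValueError, because a forward 'SAME' convolution of a 7-wide input at stride 2 yields 4 outputs rather than 3. This is the same kind of mismatch the TensorFlow log reports for the 129 case (actual = 64, computed = 65).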