之前简单看过车道线检测思路,如今看到比较详细文章,之后系统学习。html
Opencv4图像分割和识别实战课程-车道线检测相关理论知识点-计算机...前端
第九章:车道线检测 1. 车道线检测相关理论知识点 [ 27:19 ]java
最近在用深度学习的方法进行车道线检测,现总结以下:
目前,对于车道线检测的方法主要分为两大类,一是基于传统机器视觉的方法,二是基于深度学习大方法。python
1. 边缘检测+霍夫变换
方法流程:彩色图像转灰度,模糊处理,边缘检测,霍夫变换
这种方法通常可以检测出简单场景下的车辆目前行驶的两条车道线,以及偶尔的相邻车道(依赖前视相机的角度)。该方法能够利用霍夫变换的结果(线的斜率),进一步过滤出左右车道线。不过同时,该方法也依赖于边缘检测的结果,因此调参(边缘检测、霍夫变换)以及其余的trick(roi选取等等)是很重要的。git
2. 颜色阈值
方法流程:将图像转颜色空间(通常HSV),对新的color space中的各个通道设置阈值(大于阈值取值为1,小于取值为0),获得结果。
该方法依赖于各通道的阈值的选取,只须要调整几个阈值参数,但我的认为该方法鲁棒性会较差,例如当前车辆前方的车辆可能会被所有置1。github
3. 透视变换
方法流程:获取透视变换矩阵,透视变换,车道线检测(1或者2)
该方法的优势是将前视摄像头抓拍的图像转为鸟瞰图,可以检测到多条线。其关键在于透视变换矩阵的准确性(不考虑转换后的车道线检测),对于转换后的鸟瞰图,能够经过上述两种方式检测车道线。
在实际场景中,传统方法的鲁棒性确实不行,除去光照和邻近车辆的影响外,车道中间的指示箭头和人行道也是此类算法很难处理的挑战。
上述方法的总结建议参考该博文算法
小编主要讲的是如何经过训练一个深度神经网络对车道线进行语义分割,从而实现车道线检测。SegNet网络是一种颇有趣的图像分割技术,是一种encoding-decoding的结构,在使用时可直接调用标准的模型结构。小编训练的网络即是基于SegNet网络构建的。
初始的网络结构以下图:
1.数据集
原图:
分割图(label):
原图点此处下载(https://www.dropbox.com/s/rrh8lrdclzlnxzv/full_CNN_train.p?dl=0)
数据集labels点此处下载(https://www.dropbox.com/s/ak850zqqfy6ily0/full_CNN_labels.p?dl=0)
第一版的网络model以下:express
""" This file contains code for a fully convolutional (i.e. contains zero fully connected layers) neural network for detecting lanes. This version assumes the inputs to be road images in the shape of 80 x 160 x 3 (RGB) with the labels as 80 x 160 x 1 (just the G channel with a re-drawn lane). Note that in order to view a returned image, the predictions is later stacked with zero'ed R and B layers and added back to the initial road image. """ import numpy as np import pickle from sklearn.utils import shuffle from sklearn.model_selection import train_test_split # Import necessary items from Keras from keras.models import Sequential from keras.layers import Activation, Dropout, UpSampling2D, concatenate from keras.layers import Conv2DTranspose, Conv2D, MaxPooling2D from keras.layers.normalization import BatchNormalization from keras.preprocessing.image import ImageDataGenerator from keras import regularizers # Load training images train_images = pickle.load(open("full_CNN_train.p", "rb" )) # Load image labels labels = pickle.load(open("full_CNN_labels.p", "rb" )) # Make into arrays as the neural network wants these train_images = np.array(train_images) labels = np.array(labels) # Normalize labels - training images get normalized to start in the network labels = labels / 255 # Shuffle images along with their labels, then split into training/validation sets train_images, labels = shuffle(train_images, labels) # Test size may be 10% or 20% X_train, X_val, y_train, y_val = train_test_split(train_images, labels, test_size=0.1) # Batch size, epochs and pool size below are all paramaters to fiddle with for optimization batch_size = 16 epochs = 10 pool_size = (2, 2) input_shape = X_train.shape[1:] ### Here is the actual neural network ### model = Sequential() # Normalizes incoming inputs. First layer needs the input shape to work model.add(BatchNormalization(input_shape=input_shape)) # Below layers were re-named for easier reading of model summary; this not necessary # Conv Layer 1 model.add(Conv2D(8, (3, 3), padding='valid', strides=(1,1), activation = 'relu', name = 'Conv1')) # Conv Layer 2 model.add(Conv2D(16, (3, 3), padding='valid', strides=(1,1), activation = 'relu', name = 'Conv2')) # Pooling 1 model.add(MaxPooling2D(pool_size=pool_size)) # Conv Layer 3 model.add(Conv2D(16, (3, 3), padding='valid', strides=(1,1), activation = 'relu', name = 'Conv3')) model.add(Dropout(0.2)) # Conv Layer 4 model.add(Conv2D(32, (3, 3), padding='valid', strides=(1,1), activation = 'relu', name = 'Conv4')) model.add(Dropout(0.2)) # Conv Layer 5 model.add(Conv2D(32, (3, 3), padding='valid', strides=(1,1), activation = 'relu', name = 'Conv5')) model.add(Dropout(0.2)) # Pooling 2 model.add(MaxPooling2D(pool_size=pool_size)) # Conv Layer 6 model.add(Conv2D(64, (3, 3), padding='valid', strides=(1,1), activation = 'relu', name = 'Conv6')) model.add(Dropout(0.2)) # Conv Layer 7 model.add(Conv2D(64, (3, 3), padding='valid', strides=(1,1), activation = 'relu', name = 'Conv7')) model.add(Dropout(0.2)) # Pooling 3 model.add(MaxPooling2D(pool_size=pool_size)) # Upsample 1 model.add(UpSampling2D(size=pool_size)) # Deconv 1 model.add(Conv2DTranspose(64, (3, 3), padding='valid', strides=(1,1), activation = 'relu', name = 'Deconv1')) model.add(Dropout(0.2)) # Deconv 2 model.add(Conv2DTranspose(64, (3, 3), padding='valid', strides=(1,1), activation = 'relu', name = 'Deconv2')) model.add(Dropout(0.2)) # Upsample 2 model.add(UpSampling2D(size=pool_size)) # Deconv 3 model.add(Conv2DTranspose(32, (3, 3), padding='valid', strides=(1,1), activation = 'relu', name = 'Deconv3')) model.add(Dropout(0.2)) # Deconv 4 model.add(Conv2DTranspose(32, (3, 3), padding='valid', strides=(1,1), activation = 'relu', name = 'Deconv4')) model.add(Dropout(0.2)) # Deconv 5 model.add(Conv2DTranspose(16, (3, 3), padding='valid', strides=(1,1), activation = 'relu', name = 'Deconv5')) model.add(Dropout(0.2)) model.get_layer('Conv4') # Upsample 3 model.add(UpSampling2D(size=pool_size)) # Deconv 6 model.add(Conv2DTranspose(16, (3, 3), padding='valid', strides=(1,1), activation = 'relu', name = 'Deconv6')) # Final layer - only including one channel so 1 filter model.add(Conv2DTranspose(1, (3, 3), padding='valid', strides=(1,1), activation = 'relu', name = 'Final')) ### End of network ### # Using a generator to help the model use less data # Channel shifts help with shadows slightly datagen = ImageDataGenerator(channel_shift_range=0.2) datagen.fit(X_train) # Compiling and training the model model.compile(optimizer='Adam', loss='mean_squared_error') model.fit_generator(datagen.flow(X_train, y_train, batch_size=batch_size), steps_per_epoch=len(X_train)/batch_size, epochs=epochs, verbose=1, validation_data=(X_val, y_val)) # Freeze layers since training is done model.trainable = False model.compile(optimizer='Adam', loss='mean_squared_error') # Save model architecture and weights model.save('full_CNN_model.h5') # Show summary of model model.summary()
上述model的创建参考连接(https://github.com/mvirgo/MLND-Capstone)网络
上述的网络model在车道线检测的时候效果并不稳定,因此对其进行了改进。
1.增长了网络的层数,修改了每层卷积核的个数。
2.受Unet网络的启发,采用并联跳跃结构将encoding的feature map 与 decoding的feature map进行链接,这样能够在进行分类预测时利用多层信息。
3.将UpSampling2D改成Conv2DTranspose实现上采样的过程,UpSampling2D直接采用原像素值进行填补不存在学习的过程,而Conv2DTranspose存在学习的过程,效果更好。
训练
改进后的网络model以下:app
import numpy as np import pickle #import cv2 from sklearn.utils import shuffle from sklearn.model_selection import train_test_split # Import necessary items from Keras from keras.models import Model from keras.layers import Activation, Dropout, UpSampling2D, concatenate, Input from keras.layers import Conv2DTranspose, Conv2D, MaxPooling2D from keras.layers.normalization import BatchNormalization from keras.preprocessing.image import ImageDataGenerator from keras.utils import plot_model from keras import regularizers # Load training images train_images = pickle.load(open("full_CNN_train.p", "rb" )) # Load image labels labels = pickle.load(open("full_CNN_labels.p", "rb" )) # Make into arrays as the neural network wants these train_images = np.array(train_images) labels = np.array(labels) # Normalize labels - training images get normalized to start in the network labels = labels / 255 # Shuffle images along with their labels, then split into training/validation sets train_images, labels = shuffle(train_images, labels) # Test size may be 10% or 20% X_train, X_val, y_train, y_val = train_test_split(train_images, labels, test_size=0.1) # Batch size, epochs and pool size below are all paramaters to fiddle with for optimization batch_size = 16 epochs = 15 pool_size = (2, 2) #input_shape = X_train.shape[1:] ### Here is the actual neural network ### # Normalizes incoming inputs. First layer needs the input shape to work #BatchNormalization(input_shape=input_shape) Inputs = Input(batch_shape=(None, 80, 160, 3)) # Below layers were re-named for easier reading of model summary; this not necessary # Conv Layer 1 Conv1 = Conv2D(16, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Inputs) Bat1 = BatchNormalization()(Conv1) # Conv Layer 2 Conv2 = Conv2D(16, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Conv1) Bat2 = BatchNormalization()(Conv2) # Pooling 1 Pool1 = MaxPooling2D(pool_size=pool_size)(Conv2) # Conv Layer 3 Conv3 = Conv2D(32, (3, 3), padding = 'valid', strides=(1,1), activation = 'relu')(Pool1) #Drop3 = Dropout(0.2)(Conv3) Bat3 = BatchNormalization()(Conv3) # Conv Layer 4 Conv4 = Conv2D(32, (3, 3), padding = 'valid', strides=(1,1), activation = 'relu')(Bat3) #Drop4 = Dropout(0.5)(Conv4) Bat4 = BatchNormalization()(Conv4) # Conv Layer 5 Conv5 = Conv2D(32, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Bat4) #Drop5 = Dropout(0.2)(Conv5) Bat5 = BatchNormalization()(Conv5) # Pooling 2 Pool2 = MaxPooling2D(pool_size=pool_size)(Bat5) # Conv Layer 6 Conv6 = Conv2D(64, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Pool2) #Drop6 = Dropout(0.2)(Conv6) Bat6 = BatchNormalization()(Conv6) # Conv Layer 7 Conv7 = Conv2D(64, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Bat6) #Drop7 = Dropout(0.2)(Conv7) Bat7 = BatchNormalization()(Conv7) # Pooling 3 Pool3 = MaxPooling2D(pool_size=pool_size)(Bat7) # Conv Layer 8 Conv8 = Conv2D(128, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Pool3) #Drop8 = Dropout(0.2)(Conv8) Bat8 = BatchNormalization()(Conv8) # Conv Layer 9 Conv9 = Conv2D(128, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Bat8) #Drop9 = Dropout(0.2)(Conv9) Bat9 = BatchNormalization()(Conv9) # Pooling 4 Pool4 = MaxPooling2D(pool_size=pool_size)(Bat9) # Upsample 1 to Deconv 1 Deconv1 = Conv2DTranspose(128, (2, 2), padding='valid', strides=(2,2), activation = 'relu')(Pool4) #Up1 = UpSampling2D(size=pool_size)(Pool4) Mer1 = concatenate([Deconv1, Bat9], axis=-1) # Deconv 2 Deconv2 = Conv2DTranspose(128, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Mer1) DBat2 = BatchNormalization()(Deconv2) # Deconv 3 Deconv3 = Conv2DTranspose(64, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(DBat2) DBat3 = BatchNormalization()(Deconv3) # Upsample 2 to Deconv 4 Deconv4 = Conv2DTranspose(64, (2, 2), padding='valid', strides=(2,2), activation = 'relu')(DBat3) #Up2 = UpSampling2D(size=pool_size)(DBat2) Mer2 = concatenate([Deconv4, Bat7], axis=-1) # Deconv 5 Deconv5 = Conv2DTranspose(64, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Mer2) DBat5 = BatchNormalization()(Deconv5) # Deconv 6 Deconv6 = Conv2DTranspose(32, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(DBat5) DBat6 = BatchNormalization()(Deconv6) # Upsample 3 to Deconv 7 Deconv7 = Conv2DTranspose(32, (2, 2), padding='valid', strides=(2,2), activation = 'relu')(DBat6) #Up3 = UpSampling2D(size=pool_size)(DBat4) Mer3 = concatenate([Deconv7, Bat5], axis=-1) # Deconv 8 Deconv8 = Conv2DTranspose(32, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Mer3) DBat8 = BatchNormalization()(Deconv8) # Deconv 9 Deconv9 = Conv2DTranspose(16, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(DBat8) DBat9 = BatchNormalization()(Deconv9) # Deconv 10 Deconv10 = Conv2DTranspose(16, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(DBat9) DBat10 = BatchNormalization()(Deconv10) # Upsample 4 to Deconv 11 Deconv11 = Conv2DTranspose(16, (2, 2), padding='valid', strides=(2,2), activation = 'relu')(DBat10) #Up4 = UpSampling2D(size=pool_size)(DBat7) Mer4 = concatenate([Deconv11, Bat2], axis=-1) # Deconv 12 Deconv12 = Conv2DTranspose(16, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Mer4) DBat12 = BatchNormalization()(Deconv12) # Deconv 13 Deconv13 = Conv2DTranspose(8, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(DBat12) DBat13 = BatchNormalization()(Deconv13) # Final layer - only including one channel so 1 filter Final = Conv2DTranspose(1, (3, 3), padding='same', strides=(1,1), activation = 'relu')(DBat13) ### End of network ### model = Model(inputs=Inputs, outputs=Final) # Using a generator to help the model use less data # Channel shifts help with shadows slightly datagen = ImageDataGenerator(channel_shift_range=0.2) datagen.fit(X_train) # Compiling and training the model model.compile(optimizer='Adam', loss='mean_squared_error') model.fit_generator(datagen.flow(X_train, y_train, batch_size=batch_size), steps_per_epoch=len(X_train)/batch_size, epochs=epochs, verbose=1, validation_data=(X_val, y_val)) # Freeze layers since training is done model.trainable = False model.compile(optimizer='Adam', loss='mean_squared_error') # Save model architecture and weights model.save('full_CNN_model_HYe15.h5') # Show summary of model model.summary() plot_model(model, to_file='model.png')
测试
将网络model训练好之后,进行测试,每帧图像的检测结果是其前5帧图像检测结果的平均。
import numpy as np import cv2 from scipy.misc import imresize from moviepy.editor import VideoFileClip from IPython.display import HTML from keras.models import load_model import matplotlib.pyplot as plt # Load Keras model model = load_model('full_CNN_model_HY.h5') # Class to average lanes with class Lanes(): def __init__(self): self.recent_fit = [] self.avg_fit = [] def road_lines(image): """ Takes in a road image, re-sizes for the model, predicts the lane to be drawn from the model in G color, recreates an RGB image of a lane and merges with the original road image. """ # Get image ready for feeding into model small_img = imresize(image, (80, 160, 3)) small_img = np.array(small_img) small_img = small_img[None,:,:,:] # Make prediction with neural network (un-normalize value by multiplying by 255) prediction = model.predict(small_img)[0] * 255 # Add lane prediction to list for averaging lanes.recent_fit.append(prediction) # Only using last five for average if len(lanes.recent_fit) > 5: lanes.recent_fit = lanes.recent_fit[1:] # Calculate average detection lanes.avg_fit = np.mean(np.array([i for i in lanes.recent_fit]), axis = 0) # Generate fake R & B color dimensions, stack with G blanks = np.zeros_like(lanes.avg_fit).astype(np.uint8) lane_drawn = np.dstack((blanks, lanes.avg_fit, blanks)) # Re-size to match the original image lane_image = imresize(lane_drawn, (720, 1280, 3)) #plt.imshow(lane_image) #plt.show() # Merge the lane drawing onto the original image result = cv2.addWeighted(image, 1, lane_image, 1, 0) return result lanes = Lanes() # Where to save the output video vid_output = 'project_video_hy.mp4' # Location of the input video clip1 = VideoFileClip("project_video.mp4") vid_clip = clip1.fl_image(road_lines) vid_clip.write_videofile(vid_output, audio=False)
测试结果
测试结果还能够,以后将继续研究利用深度学习对多个车道进行检测,未完待续。。。。。。
深度学习车道线检测 相关内容
深度学习中文ocr 深度学习里的神经网络 深度学习算法数学推导 深度学习图像动漫化 深度学习工具做用 数据扩展 深度学习 深度学习中的可视化 中文标签 深度学习 深度学习论文笔记 深度学习图像分类论文 java实现变声效果---萝莉、大叔、肥仔、搞怪、慢吞吞、网红女、困兽 CSDN测试课程 前端小白入门必修-小米商城开发实战(下)
原
深度学习的车道线检测网络
一.分割模型
1. SCNN,Github,TuSimple Benchmark lane Detection challenge得到了第一名,准确率为96.53%,结合车道线特征提出的专门的分割模型,主要贡献点是spati cnn结构的提出,这个结构主要适用于条状目标的检测例如车道线,电线,电线杆等。
2. LaneNet ,Github 该算法在图森的车道线数据集上的准确率为96.4%,在NVIDIA 1080 TI上的处理速度为52FPS。 多任务的车道线检测模型 一个分支学习一个中间层的特征图用于统计车道线数目,一个分支去分割车道线(二分类,相比于多分类这里参数少了,计算量小了,准确率高了),结合两个分支,利用聚类计算车道线的数量,最终实现车道线的实例分割,最终经过HNet学习透视矩阵参数去实现车道线的拟合
3.ENet-SAD ,Github 自注意力蒸馏分割网络,主干网络采用ENet 在E2-E4模块加上SAD模块去产生一个注意力特征图,利用loss去缩小相邻模块之间的差距,从而让网络可以实现低层模仿高层学习,加快网络的收敛速度。比SCNN参数少20倍,快10倍
二.目标检测模型
Vpgnet ,Github 多任务的车道线检测,有目标检测任务 ,分割任务,以及坐标点回归任务相结合,实现车道线的检测。
1 原理
以前我曾采用传统方法实现了一下车道线检测,
http://www.noobyard.com/article/p-tsohlvro-nt.html
车道线检测是无人车系统里感知模块的重要组成部分。利用视觉算法的车道线检测解决方案是一种较为常看法决方案。视觉检测方案主要基于图像算法,检测出图片中行车道路的车道线标志区域。
高速公路上的车道线检测是一项具备挑战性的任务,因为车道线标志的种类繁多,车辆拥挤形成车道线标志区域被遮挡,车道线可能有腐蚀磨损的状况,以及天气等因素都能给车道线检测任务带来不小的挑战。过去,大部分车道线检测算法基本是经过卷积滤波方法,识别分割出车道线区域,而后结合霍夫变换、RANSAC等算法进行车道线检测,这类算法须要人工手动去调滤波算子,根据算法所针对的街道场景特色手动调节参数,工做量大且鲁棒性较差,当行车环境出现明显变化时,车道线的检测效果不佳。参考个人上一篇博文
https://blog.csdn.net/xiao__run/article/details/82746319
本文基于传统车道线检测算法,结合深度学习技术,提出了一种使用深度神经网络,代替传统算法中手动调滤波算子,对高速公路上的车道线进行语义分割.基本结构就是采用segnet相似结构网络与分割车道线区间,基于fcn, segnet, Unet语义分割网络基本都是encoding,decoding结构,小编在特征选取上作了基本改进,效果还能看.
1 最后一层采用1个3x3的卷积核去回归.
2 采用adam去优化,
3 loss采用loss=‘mean_squared_error’
2 数据集样本
数据集如图所示
分割图
网络model以下:
""" This file contains code for a fully convolutional (i.e. contains zero fully connected layers) neural network for detecting lanes. This version assumes the inputs to be road images in the shape of 80 x 160 x 3 (RGB) with the labels as 80 x 160 x 1 (just the G channel with a re-drawn lane). Note that in order to view a returned image, the predictions is later stacked with zero'ed R and B layers and added back to the initial road image. """python import numpy as np import pickle import cv2 from sklearn.utils import shuffle from sklearn.model_selection import train_test_split # Import necessary items from Keras from keras.models import Model from keras.layers import Activation, Dropout, UpSampling2D, concatenate, Input from keras.layers import Conv2DTranspose, Conv2D, MaxPooling2D from keras.layers.normalization import BatchNormalization from keras.preprocessing.image import ImageDataGenerator from keras.utils import plot_model from keras import regularizers # Load training images train_images = pickle.load(open("full_CNN_train.p", "rb" )) # Load image labels labels = pickle.load(open("full_CNN_labels.p", "rb" )) # Make into arrays as the neural network wants these train_images = np.array(train_images) labels = np.array(labels) # Normalize labels - training images get normalized to start in the network labels = labels / 255 # Shuffle images along with their labels, then split into training/validation sets train_images, labels = shuffle(train_images, labels) # Test size may be 10% or 20% X_train, X_val, y_train, y_val = train_test_split(train_images, labels, test_size=0.1) # Batch size, epochs and pool size below are all paramaters to fiddle with for optimization batch_size = 16 epochs = 10 pool_size = (2, 2) #input_shape = X_train.shape[1:] ### Here is the actual neural network ### # Normalizes incoming inputs. First layer needs the input shape to work #BatchNormalization(input_shape=input_shape) Inputs = Input(batch_shape=(None, 80, 160, 3)) # Below layers were re-named for easier reading of model summary; this not necessary # Conv Layer 1 Conv1 = Conv2D(16, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Inputs) Bat1 = BatchNormalization()(Conv1) # Conv Layer 2 Conv2 = Conv2D(16, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Conv1) Bat2 = BatchNormalization()(Conv2) # Pooling 1 Pool1 = MaxPooling2D(pool_size=pool_size)(Conv2) # Conv Layer 3 Conv3 = Conv2D(32, (3, 3), padding = 'valid', strides=(1,1), activation = 'relu')(Pool1) #Drop3 = Dropout(0.2)(Conv3) Bat3 = BatchNormalization()(Conv3) # Conv Layer 4 Conv4 = Conv2D(32, (3, 3), padding = 'valid', strides=(1,1), activation = 'relu')(Bat3) #Drop4 = Dropout(0.5)(Conv4) Bat4 = BatchNormalization()(Conv4) # Conv Layer 5 Conv5 = Conv2D(32, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Bat4) #Drop5 = Dropout(0.2)(Conv5) Bat5 = BatchNormalization()(Conv5) # Pooling 2 Pool2 = MaxPooling2D(pool_size=pool_size)(Bat5) # Conv Layer 6 Conv6 = Conv2D(64, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Pool2) #Drop6 = Dropout(0.2)(Conv6) Bat6 = BatchNormalization()(Conv6) # Conv Layer 7 Conv7 = Conv2D(64, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Bat6) #Drop7 = Dropout(0.2)(Conv7) Bat7 = BatchNormalization()(Conv7) # Pooling 3 Pool3 = MaxPooling2D(pool_size=pool_size)(Bat7) # Conv Layer 8 Conv8 = Conv2D(128, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Pool3) #Drop8 = Dropout(0.2)(Conv8) Bat8 = BatchNormalization()(Conv8) # Conv Layer 9 Conv9 = Conv2D(128, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Bat8) #Drop9 = Dropout(0.2)(Conv9) Bat9 = BatchNormalization()(Conv9) # Pooling 4 Pool4 = MaxPooling2D(pool_size=pool_size)(Bat9) # Upsample 1 Up1 = UpSampling2D(size=pool_size)(Pool4) Mer1 = concatenate([Up1, Bat9], axis=-1) # Deconv 1 Deconv1 = Conv2DTranspose(128, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Mer1) #DDrop1 = Dropout(0.2)(Deconv1) DBat1 = BatchNormalization()(Deconv1) # Deconv 2 Deconv2 = Conv2DTranspose(64, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(DBat1) #DDrop2 = Dropout(0.2)(Deconv2) DBat2 = BatchNormalization()(Deconv2) # Upsample 2 Up2 = UpSampling2D(size=pool_size)(DBat2) Mer2 = concatenate([Up2, Bat7], axis=-1) # Deconv 3 Deconv3 = Conv2DTranspose(64, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Mer2) #DDrop3 = Dropout(0.2)(Deconv3) DBat3 = BatchNormalization()(Deconv3) # Deconv 4 Deconv4 = Conv2DTranspose(32, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(DBat3) #DDrop4 = Dropout(0.2)(Deconv4) DBat4 = BatchNormalization()(Deconv4) # Upsample 3 Up3 = UpSampling2D(size=pool_size)(DBat4) Mer3 = concatenate([Up3, Bat5], axis=-1) # Deconv 5 Deconv5 = Conv2DTranspose(32, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Mer3) #DDrop5 = Dropout(0.2)(Deconv5) DBat5 = BatchNormalization()(Deconv5) # Deconv 6 Deconv6 = Conv2DTranspose(16, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(DBat5) #DDrop6 = Dropout(0.2)(Deconv6) DBat6 = BatchNormalization()(Deconv6) # Deconv 7 Deconv7 = Conv2DTranspose(16, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(DBat6) #DDrop7 = Dropout(0.2)(Deconv7) DBat7 = BatchNormalization()(Deconv7) # Upsample 4 Up4 = UpSampling2D(size=pool_size)(DBat7) Mer4 = concatenate([Up4, Bat2], axis=-1) # Deconv 8 Deconv8 = Conv2DTranspose(16, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Mer4) #DDrop8 = Dropout(0.2)(Deconv8) DBat8 = BatchNormalization()(Deconv8) # Deconv 9 Deconv9 = Conv2DTranspose(8, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(DBat8) #DDrop9 = Dropout(0.2)(Deconv9) DBat9 = BatchNormalization()(Deconv9) # Final layer - only including one channel so 1 filter Final = Conv2DTranspose(1, (3, 3), padding='same', strides=(1,1), activation = 'relu')(DBat9) ### End of network ### model = Model(inputs=Inputs, outputs=Final) # Using a generator to help the model use less data # Channel shifts help with shadows slightly datagen = ImageDataGenerator(channel_shift_range=0.2) datagen.fit(X_train) # Compiling and training the model model.compile(optimizer='Adam', loss='mean_squared_error') model.fit_generator(datagen.flow(X_train, y_train, batch_size=batch_size), steps_per_epoch=len(X_train)/batch_size, epochs=epochs, verbose=1, validation_data=(X_val, y_val)) # Freeze layers since training is done model.trainable = False model.compile(optimizer='Adam', loss='mean_squared_error') # Save model architecture and weights model.save('full_CNN_model.h5') # Show summary of model model.summary() plot_model(model, to_file='model.png')
进一步改进
接下来我不是直接采用unsample,采用转置卷积进行上采样,一样进行一个多尺度特征融合
,训练代码以下:
""" This file contains code for a fully convolutional (i.e. contains zero fully connected layers) neural network for detecting lanes. This version assumes the inputs to be road images in the shape of 80 x 160 x 3 (RGB) with the labels as 80 x 160 x 1 (just the G channel with a re-drawn lane). Note that in order to view a returned image, the predictions is later stacked with zero'ed R and B layers and added back to the initial road image. """ import numpy as np import pickle #import cv2 from sklearn.utils import shuffle from sklearn.model_selection import train_test_split # Import necessary items from Keras from keras.models import Model from keras.layers import Activation, Dropout, UpSampling2D, concatenate, Input from keras.layers import Conv2DTranspose, Conv2D, MaxPooling2D from keras.layers.normalization import BatchNormalization from keras.preprocessing.image import ImageDataGenerator from keras.utils import plot_model from keras import regularizers # Load training images train_images = pickle.load(open("full_CNN_train.p", "rb" )) # Load image labels labels = pickle.load(open("full_CNN_labels.p", "rb" )) # Make into arrays as the neural network wants these train_images = np.array(train_images) labels = np.array(labels) # Normalize labels - training images get normalized to start in the network labels = labels / 255 # Shuffle images along with their labels, then split into training/validation sets train_images, labels = shuffle(train_images, labels) # Test size may be 10% or 20% X_train, X_val, y_train, y_val = train_test_split(train_images, labels, test_size=0.1) # Batch size, epochs and pool size below are all paramaters to fiddle with for optimization batch_size = 16 epochs = 10 pool_size = (2, 2) #input_shape = X_train.shape[1:] ### Here is the actual neural network ### # Normalizes incoming inputs. First layer needs the input shape to work #BatchNormalization(input_shape=input_shape) Inputs = Input(batch_shape=(None, 80, 160, 3)) # Below layers were re-named for easier reading of model summary; this not necessary # Conv Layer 1 Conv1 = Conv2D(16, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Inputs) Bat1 = BatchNormalization()(Conv1) # Conv Layer 2 Conv2 = Conv2D(16, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Conv1) Bat2 = BatchNormalization()(Conv2) # Pooling 1 Pool1 = MaxPooling2D(pool_size=pool_size)(Conv2) # Conv Layer 3 Conv3 = Conv2D(32, (3, 3), padding = 'valid', strides=(1,1), activation = 'relu')(Pool1) #Drop3 = Dropout(0.2)(Conv3) Bat3 = BatchNormalization()(Conv3) # Conv Layer 4 Conv4 = Conv2D(32, (3, 3), padding = 'valid', strides=(1,1), activation = 'relu')(Bat3) #Drop4 = Dropout(0.5)(Conv4) Bat4 = BatchNormalization()(Conv4) # Conv Layer 5 Conv5 = Conv2D(32, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Bat4) #Drop5 = Dropout(0.2)(Conv5) Bat5 = BatchNormalization()(Conv5) # Pooling 2 Pool2 = MaxPooling2D(pool_size=pool_size)(Bat5) # Conv Layer 6 Conv6 = Conv2D(64, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Pool2) #Drop6 = Dropout(0.2)(Conv6) Bat6 = BatchNormalization()(Conv6) # Conv Layer 7 Conv7 = Conv2D(64, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Bat6) #Drop7 = Dropout(0.2)(Conv7) Bat7 = BatchNormalization()(Conv7) # Pooling 3 Pool3 = MaxPooling2D(pool_size=pool_size)(Bat7) # Conv Layer 8 Conv8 = Conv2D(128, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Pool3) #Drop8 = Dropout(0.2)(Conv8) Bat8 = BatchNormalization()(Conv8) # Conv Layer 9 Conv9 = Conv2D(128, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Bat8) #Drop9 = Dropout(0.2)(Conv9) Bat9 = BatchNormalization()(Conv9) # Pooling 4 Pool4 = MaxPooling2D(pool_size=pool_size)(Bat9) # Upsample 1 to Deconv 1 Deconv1 = Conv2DTranspose(128, (2, 2), padding='valid', strides=(2,2), activation = 'relu')(Pool4) #Up1 = UpSampling2D(size=pool_size)(Pool4) Mer1 = concatenate([Deconv1, Bat9], axis=-1) # Deconv 2 Deconv2 = Conv2DTranspose(128, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Mer1) DBat2 = BatchNormalization()(Deconv2) # Deconv 3 Deconv3 = Conv2DTranspose(64, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(DBat2) DBat3 = BatchNormalization()(Deconv3) # Upsample 2 to Deconv 4 Deconv4 = Conv2DTranspose(64, (2, 2), padding='valid', strides=(2,2), activation = 'relu')(DBat3) #Up2 = UpSampling2D(size=pool_size)(DBat2) Mer2 = concatenate([Deconv4, Bat7], axis=-1) # Deconv 5 Deconv5 = Conv2DTranspose(64, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Mer2) DBat5 = BatchNormalization()(Deconv5) # Deconv 6 Deconv6 = Conv2DTranspose(32, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(DBat5) DBat6 = BatchNormalization()(Deconv6) # Upsample 3 to Deconv 7 Deconv7 = Conv2DTranspose(32, (2, 2), padding='valid', strides=(2,2), activation = 'relu')(DBat6) #Up3 = UpSampling2D(size=pool_size)(DBat4) Mer3 = concatenate([Deconv7, Bat5], axis=-1) # Deconv 8 Deconv8 = Conv2DTranspose(32, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Mer3) DBat8 = BatchNormalization()(Deconv8) # Deconv 9 Deconv9 = Conv2DTranspose(16, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(DBat8) DBat9 = BatchNormalization()(Deconv9) # Deconv 10 Deconv10 = Conv2DTranspose(16, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(DBat9) DBat10 = BatchNormalization()(Deconv10) # Upsample 4 to Deconv 11 Deconv11 = Conv2DTranspose(16, (2, 2), padding='valid', strides=(2,2), activation = 'relu')(DBat10) #Up4 = UpSampling2D(size=pool_size)(DBat7) Mer4 = concatenate([Deconv11, Bat2], axis=-1) # Deconv 12 Deconv12 = Conv2DTranspose(16, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(Mer4) DBat12 = BatchNormalization()(Deconv12) # Deconv 13 Deconv13 = Conv2DTranspose(8, (3, 3), padding='valid', strides=(1,1), activation = 'relu')(DBat12) DBat13 = BatchNormalization()(Deconv13) # Final layer - only including one channel so 1 filter Final = Conv2DTranspose(1, (3, 3), padding='same', strides=(1,1), activation = 'relu')(DBat13) ### End of network ### model = Model(inputs=Inputs, outputs=Final) # Using a generator to help the model use less data # Channel shifts help with shadows slightly datagen = ImageDataGenerator(channel_shift_range=0.2) datagen.fit(X_train) # Compiling and training the model model.compile(optimizer='Adam', loss='mean_squared_error') model.fit_generator(datagen.flow(X_train, y_train, batch_size=batch_size), steps_per_epoch=len(X_train)/batch_size, epochs=epochs, verbose=1, validation_data=(X_val, y_val)) # Freeze layers since training is done model.trainable = False model.compile(optimizer='Adam', loss='mean_squared_error') # Save model architecture and weights model.save('full_CNN_model.h5') # Show summary of model model.summary() plot_model(model, to_file='model.png')
测试
训练获得model以后,接下来咱们用视频进行测试一下:
测试时采用连续5帧作的一个平均
import numpy as np import cv2 from scipy.misc import imresize from moviepy.editor import VideoFileClip from IPython.display import HTML from keras.models import load_model import matplotlib.pyplot as plt # Load Keras model model = load_model('full_CNN_model.h5') # Class to average lanes with class Lanes(): def __init__(self): self.recent_fit = [] self.avg_fit = [] def road_lines(image): """ Takes in a road image, re-sizes for the model, predicts the lane to be drawn from the model in G color, recreates an RGB image of a lane and merges with the original road image. """ # Get image ready for feeding into model small_img = imresize(image, (80, 160, 3)) small_img = np.array(small_img) small_img = small_img[None,:,:,:] # Make prediction with neural network (un-normalize value by multiplying by 255) prediction = model.predict(small_img)[0] * 255 # Add lane prediction to list for averaging lanes.recent_fit.append(prediction) # Only using last five for average if len(lanes.recent_fit) > 5: lanes.recent_fit = lanes.recent_fit[1:] # Calculate average detection lanes.avg_fit = np.mean(np.array([i for i in lanes.recent_fit]), axis = 0) # Generate fake R & B color dimensions, stack with G blanks = np.zeros_like(lanes.avg_fit).astype(np.uint8) lane_drawn = np.dstack((blanks, lanes.avg_fit, blanks)) # Re-size to match the original image lane_image = imresize(lane_drawn, (720, 1280, 3)) #plt.imshow(lane_image) #plt.show() # Merge the lane drawing onto the original image result = cv2.addWeighted(image, 1, lane_image, 1, 0) return result lanes = Lanes() # Where to save the output video vid_output = 'harder_challenge.mp4' clip1 = VideoFileClip("project_video.mp4") vid_clip = clip1.fl_image(road_lines) vid_clip.write_videofile(vid_output, audio=False)
结果
结果如图所示:
效果勉强还能看,后续我将会作进一步改进,例如使用E-Net网络加速,采用lanenet实例分割,聚类等算法训练图森的数据集,获得更加好的效果
车道线检测做为自动驾驶领域的常规工做,在深度学习的浪潮中又有了很大的进步,在此分享我所作的调研工做,部分为ppt截图,为了方便请谅解。
- [论文极简笔记]SCNN-Spatial As Deep: Spatial CNN for Traffic Scene Understanding
- [论文极简笔记] Towards End-to-End Lane Detection: an Instance Segmentation Approach
- [论文极简笔记] LineNet: a Zoomable CNN for Crowdsourced High Definition Maps Modeling in Urban Environm
- [论文极简笔记] Robust Lane Detection from Continuous Driving Scenes Using Deep Neural Networks
如上图所示,车道线检测工做的baseline并不明确,不一样的方法与不一样的场景应用都有各自的局限性。例如:
如上图所示,在评判ture or false时,主要有两种方式:
如上图所示,目前的主流方法pipeline分为多阶段与单阶段。
引用github项目 awesome-lane-detection
Paper
2019
《Robust Lane Detection from Continuous Driving Scenes Using Deep Neural Networks》
《End-to-end Lane Detection through Differentiable Least-Squares Fitting》 github
2018
《End to End Video Segmentation for Driving : Lane Detection For Autonomous Car》
《3D-LaneNet: end-to-end 3D multiple lane detection》
《Efficient Road Lane Marking Detection with Deep Learning》 DSP 2018
《Multiple Lane Detection Algorithm Based on Optimised Dense Disparity Map Estimation》 IST 2018
《LineNet: a Zoomable CNN for Crowdsourced High Definition Maps Modeling in Urban Environments》
《Real-time stereo vision-based lane detection system》
《LaneNet: Real-Time Lane Detection Networks for Autonomous Driving》
《EL-GAN: Embedding Loss Driven Generative Adversarial Networks for Lane Detection》
《Real-time Lane Marker Detection Using Template Matching with RGB-D Camera》
《Towards End-to-End Lane Detection: an Instance Segmentation Approach》 论文解读 github
《Lane Detection and Classification for Forward Collision Warning System Based on Stereo Vision》
《(SCNN)Spatial As Deep: Spatial CNN for Traffic Scene Understanding》 AAAI 2018 CSDN Translator
《Lane Detection Based on Inverse Perspective Transformation and Kalman Filter》
2017
《A review of recent advances in lane detection and departure warning system》
《Deep Learning Lane Marker Segmentation From Automatically Generated Labels》 Youtube
VPGNet: Vanishing Point Guided Network for Lane and Road Marking Detection and Recognition ICCV 2017 github
Code
https://github.com/wvangansbeke/LaneDetection_End2End
https://github.com/georgesung/advanced_lane_detection
https://github.com/MaybeShewill-CV/lanenet-lane-detection
https://github.com/XingangPan/SCNN
https://github.com/davidawad/Lane-Detection
https://github.com/yang1688899/CarND-Advanced-Lane-Lines
https://github.com/SeokjuLee/VPGNet
https://github.com/mvirgo/MLND-Capstone:Lane Detection with Deep Learning
https://github.com/galenballew/SDC-Lane-and-Vehicle-Detection-Tracking
https://github.com/shawshany/Advance_LaneFinding
Blog
Lane Detection with Deep Learning (Part 1)
Simple Lane Detection with OpenCV
Building a lane detection system using Python 3 and OpenCV
Datasets
A Dataset for Lane Instance Segmentation in Urban Environments
后面会各开一篇详细介绍
论文:Towards End-to-End Lane Detection: an Instance Segmentation Approach
代码:https://github.com/MaybeShewill-CV/lanenet-lane-detection
参考:车道线检测算法LaneNet + H-Net(论文解读)
数据集:Tusimple
本文提出一种端到端的车道线检测算法,包含 LanNet + H-Net 两个网络模型。其中 LanNet 是一种将语义分割和对像素进行向量表示结合起来的多任务模型,最后利用聚类完成对车道线的实例分割。H-Net 是有个小的网络结构,负责预测变换矩阵 H,使用转换矩阵 H 对同属一条车道线的全部像素点进行从新建模(使用 y 坐标来表示 x 坐标)。
论文中将实例分割任务拆解成语义分割(LanNet 一个分支)和聚类(LanNet一个分支提取 embedding express, Mean-Shift 聚类)两部分。如上图所示,LanNet 有两个分支任务,分别为 a lane segmentation branch and a lane embedding branch。Segmentation branch负责对输入图像进行语义分割(对像素进行二分类,判断像素属于车道线仍是背景);Embedding branch对像素进行嵌入式表示,训练获得的embedding向量用于聚类。最后将两个分支的结果进行结合利用 Mean-Shift 算法进行聚类,获得实例分割的结果。
在设计语义分割模型时,论文主要考虑了如下两个方面:
1.在构建label时,为了处理遮挡问题,论文对被车辆遮挡的车道线和虚线进行了还原;
2. Loss使用交叉熵,为了解决样本分布不均衡的问题(属于车道线的像素远少于属于背景的像素),参考论文ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation ,使用了boundedinverse class weight对loss进行加权:
$W_{class} = \frac{1}{ln(c+p_(class))}$
其中,p 为对应类别在整体样本中出现的几率,c 是超参数(ENet论文中是1.02,使得权重的取值区间为[1,50])。
为了区分车道线上的像素属于哪条车道,embedding_branch 为每一个像素初始化一个 embedding 向量,而且在设计 loss 时,使属同一条车道线的像素向量距离很小,属不一样车道线的像素向量距离很大。
这部分的loss函数是由两部分组成:方差loss($L_{var}$)和距离loss($L_{dist}$):
$L_{var} = \frac{1}{C} \sum_{c=1}^{C} \frac{1}{N_c} [\Arrowvert \mu_c-x_i \Arrowvert -\delta_v]_{+}^{2}$
$L_{dist} = \frac{1}{C(C-1)} \sum_{c_A=1}^{C} \sum_{c_B=1, c_A \ne C_B}^{C} [\delta_d - \Arrowvert \mu_{c_A} - \mu_{c_B} \Arrowvert ]_{+}^{2}$
其中,C 是车道线数量,$N_c$ 是属同一条车道线的像素点数量,$\mu_c$ 是车道线的均值向量,$x_i$ 是像素向量(pixel embedding),$[x]_+ = max(0, x)$。
该 loss 函数源自于论文 《Semantic Instance Segmentation with a Discriminative loss function》 ,该论文中还有一个正则项,本文没有用:
$L_{reg} = \frac{1}{C} \sum_{c=1}^{C} \Arrowvert \mu_c \Arrowvert $
同一车道线的像素向量,距离车道线均值向量 $\mu_c$ 超过 $\delta_v$ 时, pull force($L_{var}$) 才有意义,使得 $x_i$ 靠近 $\delta_d$;
不一样车道线的均值向量 $\mu_{c_A}$ 和 $\mu_{c_B}$ 之间距离小于 $\delta_d$ 时,push force($L_{dist}$) 才有意义,使得 $\mu_{c_A}$ 和 $\mu_{c_B}$ 彼此远离。
注意,聚类能够看作是个后处理,上一步里 embedding_branch 已经为聚类提供好的特征向量了,利用这些特征向量咱们能够利用任意聚类算法来完成实例分割的目标。
为了方便聚类,论文中设定 $\delta_d > 6\delta_v$。
在进行聚类时,首先使用 mean shift 聚类,使得簇中心沿着密度上升的方向移动,防止将离群点选入相同的簇中;以后对像素向量进行划分:以簇中心为圆心,以 $2\delta_v$ 为半径,选取圆中全部的像素归为同一车道线。重复该步骤,直到将全部的车道线像素分配给对应的车道。
LaneNet是基于ENet 的encoder-decoder模型,如图5所示,ENet由5个stage组成,其中stage2和stage3基本相同,stage1,2,3属于encoder,stage4,5属于decoder。
如上图所示,在LaneNet 中,语义分割和实例分割两个任务共享 stage1 和 stage2,并将 stage3 和后面的 decoder 层做为各自的分支(branch)进行训练;其中,语义分割分支(branch)的输出 shape 为W*H*2,实例分割分支(branch)的输出 shape 为W*H*N,W,H分别为原图宽和高,N 为 embedding vector 的维度;两个分支的loss权重相同。
LaneNet的输出是每条车道线的像素集合,还须要根据这些像素点回归出一条车道线。传统的作法是将图片投影到 bird’s-eye view 中,而后使用 2 阶或者 3 阶多项式进行拟合。在这种方法中,变换矩阵 H 只被计算一次,全部的图片使用的是相同的变换矩阵,这会致使地平面(山地,丘陵)变化下的偏差。
为了解决这个问题,论文训练了一个能够预测变换矩阵 H 的神经网络 H-Net,网络的输入是图片,输出是变换矩阵 H:
经过置 0 对转置矩阵进行约束,即水平线在变换下保持水平。(即坐标 y 的变换不受坐标 x 的影响)
由上式能够看出,转置矩阵 H 只有6个参数,所以H-Net的输出是一个 6 维的向量。H-Net 由 6 层普通卷积网络和一层全链接网络构成,其网络结构如图所示:
Curve fitting的过程就是经过坐标 y 去从新预测坐标 x 的过程:
$P^{'} = HP$
$w = (Y^TY)^{-1}Y^Tx^{'}$
$x_i^{'*} = \alpha y^{'2} + \beta y^{'} + \gamma$
$p_i^{*} = H^{-1}p_i^{'*}$
$Loss = \frac{1}{N} \sum_{i=1}^{N}(x_i^{*} - x_i)^2 $
Dataset : Tusimple
Embedding dimension = 4
δ_v=0.5
δ_d=3
Image size = 512*256
Adam optimizer
Learning rate = 5e-4
Batch size = 8
Dataset : Tusimple
3rd-orderpolynomial
Image size =128*64
Adam optimizer
Learning rate = 5e-5
Batch size = 10
$accuracy = \frac{2}{\frac{1}{recall} + \frac{1}{precision}}$
$recall = \frac{TP_1}{G_1}$
$precision = \frac{TP_0}{G_0}$
其中 $G_1$ 表明 GT二值图里像素值为 1 部分的数量,$TP_1$ 则表明预测结果里相对于 $G_1$, 预测正确的数量。
简单示例:
import numpy as np import tensorflow as tf out_logits = np.array([ [[0.1, 0.1, 0.1, 0.1, 0.1], [0.1, 0.2, 0.2, 0.8, 0.1], [0.1, 0.2, 0.2, 0.2, 0.1], [0.1, 0.2, 0.2, 0.2, 0.1], [0.1, 0.1, 0.1, 0.1, 0.1]], [[0.1, 0.1, 0.1, 0.1, 0.1], [0.1, 0.8, 0.8, 0.2, 0.1], [0.1, 0.8, 0.8, 0.8, 0.1], [0.1, 0.8, 0.8, 0.8, 0.1], [0.1, 0.1, 0.1, 0.1, 0.1]] ]) # 预测结果 binary_label = np.array([ [0, 0, 0, 0, 0], [0, 1, 1, 1, 0], [0, 1, 1, 1, 0], [0, 1, 1, 1, 0], [0, 0, 0, 0, 0] ]) # GT logits = np.transpose(out_logits, (1, 2, 0)) out_logits = tf.constant(logits, dtype=tf.float32) binary_label_tensor = tf.constant(binary_label, dtype=tf.int32) binary_label_tensor = tf.expand_dims(binary_label_tensor, axis=-1) # =================== pix_cls_ret: 对于 GT 中为 1 的部分,统计 Pre 中是否分对,1对0错 out_logits = tf.nn.softmax(logits=out_logits) out = tf.argmax(out_logits, axis=-1) # 最后那个维度上的 max_idx out = tf.expand_dims(out, axis=-1) # 增长一维 5x5 -> 5x5x1 idx = tf.where(tf.equal(binary_label_tensor, 1)) pix_cls_ret = tf.gather_nd(out, idx) # =================== recall: 以GT 中像素值为 1 为基数,统计 recall = TP_1 / G1 recall = tf.count_nonzero(pix_cls_ret) # TP_1 recall = tf.divide(recall, tf.cast(tf.shape(pix_cls_ret)[0], tf.int64)) # =================== pix_cls_ret: 对于 GT 中为 0 的部分,统计 Pre 中是否分对,0对1错 idx = tf.where(tf.equal(binary_label_tensor, 0)) pix_cls_ret = tf.gather_nd(out, idx) # =================== precision: 以 GT 中像素值为 0 为基数,统计 precision = TP_0 / G0 precision = tf.subtract(tf.cast(tf.shape(pix_cls_ret)[0], tf.int64), tf.count_nonzero(pix_cls_ret)) # TP_0 precision = tf.divide(precision, tf.cast(tf.shape(pix_cls_ret)[0], tf.int64)) accuracy = tf.divide(2.0, tf.divide(1.0, recall) + tf.divide(1.0, precision)) with tf.Session() as sess: out_logits = out_logits.eval() out = out.eval() idx = idx.eval() pix_cls_ret = pix_cls_ret.eval() recall = recall.eval() precision = precision.eval() print(accuracy)
View Code
LaneNet: Towards End-to-End Lane Detection: an Instance Segmentation Approach
ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
Discriminative Loss: Semantic Instance Segmentation with a Discriminative loss function
转载于:https://www.cnblogs.com/xuanyuyt/p/11523192.html