GhostVLAD for set-based face recognition

Date: 2019-11-06. Tags: ghostvlad, set-based face recognition

The paper "GhostVLAD for set-based face recognition" addresses template-based face recognition.

VLAD stands for vector of locally aggregated descriptors. It was proposed by Jegou et al. in 2010; its core idea is aggregation, and it is mainly used in image retrieval.
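To make the aggregation idea concrete, here is a minimal NumPy sketch of classic VLAD with hard assignment: each descriptor is assigned to its nearest cluster centre and the residuals are summed per cluster. The function name `vlad_encode`, the use of precomputed centres, and the final L2 normalization are assumptions made for illustration, not details taken from this post.

```python
import numpy as np

def vlad_encode(descriptors, centers):
    """Hard-assignment VLAD (illustrative sketch).

    descriptors: (N, D) local descriptors
    centers:     (K, D) cluster centres, assumed to be precomputed (e.g. by k-means)
    returns:     (K * D,) L2-normalized vector
    """
    # Squared Euclidean distance from every descriptor to every centre: (N, K)
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)

    K, D = centers.shape
    vlad = np.zeros((K, D))
    for k in range(K):
        members = descriptors[nearest == k]
        if len(members):
            # Sum of residuals of the descriptors assigned to cluster k
            vlad[k] = (members - centers[k]).sum(axis=0)

    vlad = vlad.reshape(-1)
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad
```

The output length depends only on K and D, not on the number of input descriptors, which is what makes this kind of encoding attractive for variable-sized sets.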

The paper makes three contributions:

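It proposes a network that aggregates and embeds the face descriptors produced by a CNN into a compact, fixed-length representation.
It proposes a novel GhostVLAD layer that includes ghost clusters, which do not contribute to the aggregation. The paper shows that a quality-based weighting emerges automatically, so that high-quality images contribute more than low-quality ones, and that the ghost clusters improve the network's ability to handle poor-quality images.
It explores how feature dimension, number of clusters, and different training techniques affect recognition performance. The authors report identification and verification results on the IJB-B dataset that exceed the state of the art by a large margin.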

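So what makes set-based (template-based) face recognition hard? The faces in a set can differ in pose, expression, and illumination, and their quality can vary widely. Giving low-quality and high-quality images the same weight will hurt performance, so the network should focus on the informative ones.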

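A direct way to compare two sets is to store every face descriptor of each subject and compare every pair of images across the two subjects, but this is very expensive in both storage and time. An aggregation method instead produces a compact template representation. More importantly, the representation obtained from an image set should be more discriminative: template descriptors of the same subject should be close to each other, and those of different subjects far apart. Although some prior work uses average pooling or max pooling to obtain a fairly compact template representation, this paper looks for a better scheme.

The inspiration comes from encoding methods in image retrieval: Fisher Vector encoding and T-embedding increase the separability of descriptors extracted from related and unrelated image patches. The authors therefore build the network on a similar encoding, NetVLAD, and extend the NetVLAD architecture to include ghost clusters, which are meant to absorb low-quality faces. Although the faces in a template are never weighted explicitly, this behavior emerges automatically: low-quality faces contribute less. The network is trained end to end using only identity-level labels, and it yields large improvements on both IJB-A and IJB-B.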
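The two baselines mentioned above can be sketched in a few lines of NumPy: comparing every cross-set pair of descriptors versus comparing one average-pooled template per set. All function names are hypothetical, and combining the pairwise scores by their mean is an assumption; the post does not say how the pairwise scores would be merged.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def all_pairs_similarity(set_a, set_b):
    """Mean cosine similarity over every cross-set pair (Na * Nb comparisons).

    Requires storing all Na + Nb descriptors.
    """
    a, b = l2_normalize(set_a), l2_normalize(set_b)
    return float((a @ b.T).mean())

def average_pool_template(features):
    """Collapse an (N, D) set into a single unit-norm (D,) template."""
    return l2_normalize(l2_normalize(features).mean(axis=0))

def template_similarity(t_a, t_b):
    """Cosine similarity of two unit-norm templates: a single dot product."""
    return float(t_a @ t_b)
```

Average pooling weights every image equally, which is exactly the limitation that a learned aggregation such as GhostVLAD is meant to address.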

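As the architecture figure shows, features are extracted from every image in a template, and a GhostVLAD layer then aggregates these descriptors into a single fixed-length vector. The final D-dimensional template descriptor is obtained by reducing the dimension with an FC layer, followed by BN and L2 normalization.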
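The last stage of this pipeline (FC, then BN, then L2 normalization) could look roughly like the following at inference time. This is only a sketch: the function name, the parameter names, and the use of stored BN statistics are assumptions, and in practice all parameters would come from the trained network.

```python
import numpy as np

def template_head(v, W_fc, b_fc, bn_mean, bn_var, bn_gamma, bn_beta, eps=1e-5):
    """Map an aggregated vector to a D-dimensional unit-norm template descriptor.

    v:      (D_in,) aggregated vector from the aggregation layer
    W_fc:   (D, D_in) FC weights, b_fc: (D,) FC bias
    bn_*:   (D,) batch-norm statistics and affine parameters (inference mode)
    """
    z = W_fc @ v + b_fc                                             # FC: reduce dimension
    z = bn_gamma * (z - bn_mean) / np.sqrt(bn_var + eps) + bn_beta  # BN with stored statistics
    return z / (np.linalg.norm(z) + 1e-12)                          # L2 normalization
```

Because the output has unit norm, the similarity between two templates reduces to a dot product.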

The network should have the following properties:

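It takes any number of images as input and outputs a fixed-length template descriptor that represents the input image set.
The output template descriptor should be compact, i.e. low-dimensional, so that it needs little storage and allows faster template comparisons.
The output template descriptor should be discriminative, so that the similarity between templates of the same subject is greater than the similarity to templates of other subjects (cohesion).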

These three properties are realized, respectively, as follows:

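A modified NetVLAD layer, GhostVLAD, aggregates the face descriptors.
A trained layer performs the dimensionality reduction.
Discriminability follows because the whole network is trained end to end and because the GhostVLAD layer can down-weight the contribution of low-quality images.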

The core component of the paper: GhostVLAD, i.e. NetVLAD with ghost clusters

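This is a trainable aggregation layer. Given N face descriptors of dimension D_F, it computes a single output of dimension D_F × K. It builds on the NetVLAD layer, which implements an encoding similar to VLAD encoding while being differentiable and therefore trainable. NetVLAD has been shown to outperform average and max pooling. For background, see the NetVLAD paper (NetVLAD: CNN architecture for weakly supervised place recognition).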

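The authors extend NetVLAD with "ghost" clusters to obtain GhostVLAD: in addition to the original K clusters, G "ghost" clusters are added when computing the soft assignments. The ghost clusters take part in the soft assignment but are excluded from the aggregated output.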

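The intuition behind ghost clusters is that they make it easier for the network to adjust the contribution of each face example in a template, by assigning the examples that should be ignored to the ghost clusters. A highly blurry face image, for example, will be assigned largely to a ghost cluster, so its weights on the non-ghost clusters approach zero and its contribution to the template representation becomes negligible.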
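The mechanism described above can be sketched as a NumPy forward pass: a softmax over K + G clusters produces the soft assignments, and only the K non-ghost clusters accumulate residuals. The function name and parameter layout are hypothetical, and the per-cluster and global L2 normalization steps follow the NetVLAD convention as an assumption; this is not the authors' implementation.

```python
import numpy as np

def ghostvlad(x, W, b, centers, num_ghost):
    """GhostVLAD-style aggregation, forward pass only (illustrative sketch).

    x:         (N, D) face descriptors of one template
    W, b:      (K + G, D) and (K + G,) soft-assignment parameters,
               with the G ghost clusters stored last (layout is an assumption)
    centers:   (K, D) centres of the non-ghost clusters
    num_ghost: G, the number of ghost clusters
    returns:   (K * D,) aggregated vector and the (N, K) non-ghost weights
    """
    logits = x @ W.T + b                               # (N, K + G)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    a = np.exp(logits)
    a = a / a.sum(axis=1, keepdims=True)               # softmax over all K + G clusters

    K = W.shape[0] - num_ghost
    a_real = a[:, :K]                                  # ghost assignments are discarded

    residuals = x[:, None, :] - centers[None, :, :]    # (N, K, D)
    V = (a_real[:, :, None] * residuals).sum(axis=0)   # (K, D) weighted residual sums

    # Normalization as in NetVLAD (assumed here): per cluster, then globally
    V = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-12)
    v = V.reshape(-1)
    return v / (np.linalg.norm(v) + 1e-12), a_real
```

Since the softmax is taken over all K + G clusters, any probability mass that a descriptor places on a ghost cluster is removed from its non-ghost weights, so a descriptor that is mostly "ghost" contributes almost nothing to the template.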

Some training details:

To perform set-based training, a fixed number of images belonging to the same identity are repeatedly sampled online.
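One way to picture this sampling step is the sketch below. The function name, the dictionary layout, and the fallback to sampling with replacement when an identity has too few images are all assumptions for illustration; the post does not specify these details.

```python
import random

def sample_template(images_by_identity, set_size, rng=random):
    """Draw one training set: `set_size` images that share a single identity.

    images_by_identity: dict mapping an identity label to a list of images or paths
    rng: the `random` module or a `random.Random` instance
    """
    identity = rng.choice(list(images_by_identity))
    pool = images_by_identity[identity]
    if len(pool) >= set_size:
        images = rng.sample(pool, set_size)                    # without replacement
    else:
        # Assumption: fall back to sampling with replacement for small identities
        images = [rng.choice(pool) for _ in range(set_size)]
    return identity, images
```

Calling `sample_template` repeatedly yields a stream of fixed-size sets, each labeled only with its identity, which matches the identity-level supervision described earlier.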

Testing details:

On IJB-A and IJB-B, the evaluation covers "1:1 face verification" and "1:N face identification".

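The goal of 1:1 face verification is to decide whether two templates belong to the same person, which is done by thresholding the similarity between the templates. Verification performance is evaluated with the ROC curve, i.e. the trade-off between true accept rate (TAR) and false accept rate (FAR).
In 1:N identification, each template in the probe set is compared against all templates in the given gallery. The evaluation metrics are true positive identification rate (TPIR), false positive identification rate (FPIR), and Rank-N.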
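Two of these metrics can be sketched as follows: TAR at a fixed FAR for verification, and closed-set Rank-N accuracy for identification. The function names and the tie-handling rule are assumptions, this is not the official IJB evaluation protocol, and the open-set TPIR/FPIR metrics are not covered here.

```python
import numpy as np

def tar_at_far(genuine_scores, impostor_scores, far):
    """True accept rate at the strictest threshold whose FAR does not exceed `far`.

    A pair is accepted when its similarity is strictly above the threshold.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.sort(np.asarray(impostor_scores, dtype=float))[::-1]  # descending
    n_allowed = int(np.floor(far * len(impostor)))  # impostor pairs allowed through
    if n_allowed >= len(impostor):
        threshold = -np.inf
    else:
        threshold = impostor[n_allowed]
    return float((genuine > threshold).mean())

def rank_n_accuracy(similarity, probe_ids, gallery_ids, n):
    """Fraction of probes whose true identity is among the top-n gallery templates.

    similarity: (P, G) matrix of probe-to-gallery similarities
    """
    top = np.argsort(-np.asarray(similarity), axis=1)[:, :n]  # best n gallery indices
    top_ids = np.asarray(gallery_ids)[top]                    # (P, n) identity labels
    hits = (top_ids == np.asarray(probe_ids)[:, None]).any(axis=1)
    return float(hits.mean())
```

Sweeping `far` over a range of values and plotting the resulting TAR traces out the ROC curve mentioned above.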

Result: low-quality images are clearly down-weighted.

Paper: A Good Practice Towards Top Performance of Face Recognition: Transferred Deep Feature Fusion

A template refers to a collection of all media (images and/or video frames) of an interested face captured under different conditions that can be utilized as a combined single representation for matching task.