Image Formation Pipeline: From 2D to 3D (Part 1)

Date: 2020-12-25 Tags: computer vision, image processing

Why Do We Need an Image Formation Pipeline?

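What is computer vision for? The answer might be image processing, 3D reconstruction, and so on. Today's topic is image processing: how do we handle the relationship between the real 3D world and the 2D image captured by a camera? The Image Formation Pipeline is the bridge between the two (or perhaps "pipe" is the better word?).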

3D World and 2D Image: Coordinate Systems

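A point in the real world needs three dimensions to describe it: (X, Y, Z). If we photograph it with a camera, the point becomes two-dimensional in the resulting image: (x, y). If we cannot take another photo but still need an image of this point from a different perspective, we have to restore the point to the real world, making it a three-dimensional point again, and then project it onto the other perspective.

This process is where the Image Formation Pipeline comes in.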

3D Coordinate Systems: World and Camera

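The world coordinate system is the real coordinate system in which the observed object lives, that is, the coordinate system of the world we are in.
The camera coordinate system is a homogeneous coordinate system; for background on homogeneous coordinates, see here.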

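Homogeneous coordinates let us express affine transformations (including translation) as a single matrix multiplication.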
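To make the point about homogeneous coordinates concrete, here is a minimal sketch in plain Python. The helper names (`mat_vec`, `translation_2d`, `to_homogeneous`, `from_homogeneous`) are made up for this illustration and are not from any library. It shows a 2D translation, which is not a linear map on (x, y), written as a 3x3 matrix acting on (x, y, 1).

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def translation_2d(tx, ty):
    """3x3 homogeneous matrix that translates a 2D point by (tx, ty)."""
    return [[1, 0, tx],
            [0, 1, ty],
            [0, 0, 1]]


def to_homogeneous(p):
    """Append w = 1 to a Cartesian point."""
    return list(p) + [1]


def from_homogeneous(ph):
    """Divide by the last component and drop it."""
    w = ph[-1]
    return [c / w for c in ph[:-1]]


point = (2.0, 3.0)
moved = from_homogeneous(
    mat_vec(translation_2d(5.0, -1.0), to_homogeneous(point))
)
print(moved)  # [7.0, 2.0]
```

The same pattern (append a 1, multiply, divide by the last component) is what the projection steps later in this post rely on.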

2D Coordinate Systems: Image and Pixel

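The image coordinate system (the 2D camera coordinate system) and the pixel coordinate system are both two-dimensional. The difference is that the pixel coordinate system places its origin at one corner of the image (this post assumes the top-left corner), and every pixel's coordinates are integers.

What the image formation pipeline does is convert: world coordinates -> camera coordinates -> image coordinates -> pixel coordinates, or the reverse.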

Image Formation Pipeline

First, let's discuss how to convert the coordinates of a point in the 3D world into pixel coordinates on the image plane.

Perspective Projection — Pinhole Model

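Let's start with the familiar pinhole imaging model. This step establishes the relationship between the camera coordinate system and the image coordinate system.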

The pinhole camera model describes the mathematical relationship between the coordinates of a point in three-dimensional space and its projection onto the image plane of an ideal pinhole camera, where the camera aperture is described as a point and no lenses are used to focus light. -- [Wikipedia]

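F_c is the camera position, and also the origin of the camera-centered 3D coordinate system.
P, with coordinates (X, Y, Z), is a point expressed in that camera-centered 3D coordinate system.
The plane containing the x and y axes is the image plane.
p is the projection of P onto the image plane; its coordinates are written (x, y).
The camera's focal length is f.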

By similar triangles we get:

f / y = Z / Y

y = f Y / Z

Similarly:

x = f X / Z

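In homogeneous (matrix) form:

$$
\begin{bmatrix} x_c \\ y_c \\ \omega_c \end{bmatrix}
=
\begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
$$

Dividing by the third component recovers the image coordinates:

$$
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix} x_c / \omega_c \\ y_c / \omega_c \end{bmatrix}
$$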

After the perspective projection, we have the coordinates of p in the image coordinate system (note: not the pixel coordinate system).
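The projection above can be sketched in a few lines of plain Python. The function name `project_pinhole` and the sample values are made up for illustration, and the check that Z > 0 (the point lies in front of the camera) is an added assumption that the text does not state.

```python
def project_pinhole(point_cam, f):
    """Project a camera-frame point (X, Y, Z) onto the image plane.

    Returns image coordinates (x, y), not pixel coordinates.
    """
    X, Y, Z = point_cam
    # Assumption for this sketch: only points in front of the camera are valid.
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    # Homogeneous form: [x_c, y_c, w_c] = diag(f, f, 1) * [X, Y, Z]
    x_c, y_c, w_c = f * X, f * Y, Z
    # Divide by the third component to get x = f X / Z and y = f Y / Z.
    return (x_c / w_c, y_c / w_c)


x, y = project_pinhole((2.0, 1.0, 4.0), f=0.5)
print(x, y)  # 0.25 0.125
```

Doubling Z halves both x and y, which is the familiar effect of distant objects appearing smaller.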

Shift & Scale — Heading to Pixel Coordinate

A digital image is made up of rows and columns of pixels. A pixel in such an image can be specified by saying which column and which row contains it. In terms of coordinates, a pixel can be identified by a pair of integers giving the column number and the row number.
Conventionally, columns are numbered from left to right, starting with zero, and rows are numbered from top to bottom, starting from zero. -- [Introduction to Computer Vision]

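This step converts from the image coordinate system to the pixel coordinate system.

In a computer, an image is made up of rows and columns of pixels, and a pixel's (column, row) index uniquely identifies a location in the image.

Pixel indices usually start at the top-left corner, count up from 0, and are integers. The coordinates from the previous step are usually fractional, and their origin (0, 0) sits where the camera's Z axis meets the image plane, so we need to shift the axes and change their scale to obtain the corresponding pixel.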

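The conversion, written in homogeneous coordinates, is:

$$
\begin{bmatrix} im_x \\ im_y \\ 1 \end{bmatrix}
=
\begin{bmatrix} 1/s_x & 0 & o_x \\ 0 & 1/s_y & o_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
$$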

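Here (im_x, im_y) are the coordinates of p in the pixel coordinate system, that is, the coordinates we can read off the image.
(o_x, o_y) is the shift: the pixel coordinates of the image-coordinate origin (0, 0), measured from the top-left corner.
s_x and s_y are the scale factors (the size of one pixel along each axis, in image-plane units).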
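As a small sketch of the shift-and-scale step, the matrix form expands to im_x = x / s_x + o_x and im_y = y / s_y + o_y. The function name `image_to_pixel` and the numeric values below are hypothetical, and rounding to the nearest integer at the end is one possible convention for picking a pixel, not something the text prescribes.

```python
def image_to_pixel(x, y, s_x, s_y, o_x, o_y):
    """Map image-plane coordinates (x, y) to continuous pixel coordinates."""
    im_x = x / s_x + o_x  # scale by 1/s_x, then shift by o_x
    im_y = y / s_y + o_y  # scale by 1/s_y, then shift by o_y
    return im_x, im_y


# Hypothetical values: pixel size 1/32 of a unit, origin at pixel (320, 240).
im_x, im_y = image_to_pixel(0.5, -0.25,
                            s_x=0.03125, s_y=0.03125,
                            o_x=320.0, o_y=240.0)
print(im_x, im_y)  # 336.0 232.0

# Assumed convention: snap to the nearest integer pixel index.
pixel = (round(im_x), round(im_y))
print(pixel)  # (336, 232)
```

A point at the image-plane origin (x, y) = (0, 0) lands exactly on (o_x, o_y), which is a quick sanity check for the shift term.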

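The two steps above can also be combined into a single matrix K:

$$
K =
\begin{bmatrix} f/s_x & 0 & o_x \\ 0 & f/s_y & o_y \\ 0 & 0 & 1 \end{bmatrix}
$$

$$
im_p = K \, {}^{C}\!P
$$

Here ${}^{C}\!P = (X, Y, Z)^T$ is P expressed in the camera coordinate system, and the result is a homogeneous vector: divide by its third component to obtain (im_x, im_y).

K is also called the camera's intrinsic parameters.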
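The sketch below builds K from f, s_x, s_y, o_x, o_y and uses it to go from a camera-frame point to pixel coordinates in one multiplication. All function names and numbers are hypothetical. It also includes the reverse direction, which only works under the assumption that the depth Z of the point is known, since a single pixel by itself corresponds to a whole ray of 3D points.

```python
def intrinsic_matrix(f, s_x, s_y, o_x, o_y):
    """Combine perspective projection with shift and scale into K."""
    return [[f / s_x, 0.0, o_x],
            [0.0, f / s_y, o_y],
            [0.0, 0.0, 1.0]]


def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def project_to_pixel(K, point_cam):
    """Camera-frame point (X, Y, Z) -> continuous pixel coordinates."""
    u, v, w = mat_vec(K, point_cam)
    return u / w, v / w  # divide by the third (homogeneous) component


def back_project(K, pixel, depth):
    """Pixel -> camera-frame point, assuming the depth Z is known."""
    fx, fy = K[0][0], K[1][1]
    ox, oy = K[0][2], K[1][2]
    im_x, im_y = pixel
    X = (im_x - ox) * depth / fx
    Y = (im_y - oy) * depth / fy
    return X, Y, depth


K = intrinsic_matrix(f=0.5, s_x=0.03125, s_y=0.03125, o_x=320.0, o_y=240.0)
P_cam = [2.0, 1.0, 4.0]

pixel = project_to_pixel(K, P_cam)
print(pixel)  # (328.0, 244.0)

restored = back_project(K, pixel, depth=4.0)
print(restored)  # (2.0, 1.0, 4.0)
```

Doing the two steps separately (x = f X / Z = 0.25, then 0.25 / 0.03125 + 320 = 328) gives the same pixel coordinates, which is why the two matrices can be merged into K.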

What About the Real World? Rotation & Translation

Because the editor froze and crashed my browser, this part has been moved to the next post.