You have to move the center of your image, which is not georeferenced, to its exact location; this is the movement problem.
As for your question about pixels and cm, think about an image with a resolution of (100, 100) and two different pixel sizes:
1 - 1 cm × 1 cm per pixel
2 - 1000 cm × 1000 cm per pixel
Do you still think calculating in pixels is correct?
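A quick numeric check of that point, as a small sketch (the helper name `ground_extent_cm` is just for illustration): the same 100-pixel width, or the same 10-pixel shift, covers completely different ground distances depending on the pixel size.

```python
def ground_extent_cm(pixels, pixel_size_cm):
    """Ground distance (cm) covered by `pixels` pixels of side `pixel_size_cm`."""
    return pixels * pixel_size_cm

width_px = 100  # resolution from the example above

for pixel_size_cm in (1, 1000):
    width = ground_extent_cm(width_px, pixel_size_cm)
    shift = ground_extent_cm(10, pixel_size_cm)
    print(f"pixel size {pixel_size_cm} cm: image width = {width} cm, "
          f"a 10-pixel shift = {shift} cm")
```

With 1 cm pixels the image is 1 m wide; with 1000 cm pixels it is 1 km wide, so a shift expressed only in pixels is meaningless until you know the pixel size on the ground.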
I don't have any documents, unfortunately.
Convert your image into an array.
Find the x, y of the image center, for example (10 cm, 12 cm).
Convert them to degrees (Xdeg, Ydeg).
dx = Long - Xdeg
dy = Lat - Ydeg
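The center-to-degrees step and the dx, dy offset could be sketched like this. It assumes a spherical-Earth approximation (about 111,320 m per degree of latitude, scaled by cos(latitude) for longitude), which is only reasonable over small distances; the target `Long`, `Lat` values below are made up for illustration, and `cm_to_degrees` is a hypothetical helper.

```python
import math

# Assumption: spherical-Earth approximation, ~111,320 m per degree of latitude
METERS_PER_DEG_LAT = 111_320.0

def cm_to_degrees(x_cm, y_cm, ref_lat_deg):
    """Convert planar offsets in cm to approximate (lon, lat) degrees
    near latitude ref_lat_deg."""
    x_m, y_m = x_cm / 100.0, y_cm / 100.0
    meters_per_deg_lon = METERS_PER_DEG_LAT * math.cos(math.radians(ref_lat_deg))
    return x_m / meters_per_deg_lon, y_m / METERS_PER_DEG_LAT

# Hypothetical target location of the image center (e.g. from the flight GPS)
Long, Lat = 51.3890, 35.6892
# Image center in image coordinates, from the example above
x_cm, y_cm = 10.0, 12.0

Xdeg, Ydeg = cm_to_degrees(x_cm, y_cm, Lat)
dx = Long - Xdeg
dy = Lat - Ydeg
print(dx, dy)
```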
A = shift all cells by dx and dy
B = rotate A by the flight azimuth
C = scale B based on flight height and ..
(I'm not sure about dz.)
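The move/rotate/scale steps can be combined into one sketch that gives every cell a (lon, lat). Several things here are assumptions, not part of the steps above: the image top points along the flight direction, `azimuth_deg` is measured clockwise from north, and `ground_pixel_m` (a hypothetical parameter) is the ground size of one pixel, which you would derive from flight height and the other parameters mentioned. The sketch rotates and scales about the image center first and applies the move last, in degrees; because rotation and uniform scaling are done about the center, this gives the same result as moving first and then rotating/scaling about the new center.

```python
import math
import numpy as np

METERS_PER_DEG_LAT = 111_320.0  # assumption: spherical-Earth approximation

def georeference_cells(rows, cols, Long, Lat, azimuth_deg, ground_pixel_m):
    """Return an array (rows, cols, 2) with the (lon, lat) of each cell center."""
    # Cell-center coordinates in pixels, measured from the image center,
    # x to the right and y upward (row 0 is the top of the image)
    j, i = np.meshgrid(np.arange(cols), np.arange(rows))
    x = j + 0.5 - cols / 2
    y = rows / 2 - (i + 0.5)
    pts = np.stack([x.ravel(), y.ravel()], axis=1)

    # Rotate: map image "up" onto the flight azimuth (clockwise from north),
    # giving (east, north) components, still in pixel units
    t = math.radians(azimuth_deg)
    R = np.array([[math.cos(t), math.sin(t)],
                  [-math.sin(t), math.cos(t)]])
    rotated = pts @ R.T

    # Scale: pixels -> metres on the ground
    metres = rotated * ground_pixel_m

    # Move: metres -> degrees, then shift the center onto (Long, Lat)
    m_per_deg_lon = METERS_PER_DEG_LAT * math.cos(math.radians(Lat))
    lon = Long + metres[:, 0] / m_per_deg_lon
    lat = Lat + metres[:, 1] / METERS_PER_DEG_LAT
    return np.stack([lon, lat], axis=1).reshape(rows, cols, 2)

coords = georeference_cells(100, 100, Long=51.3890, Lat=35.6892,
                            azimuth_deg=30.0, ground_pixel_m=0.05)
print(coords[0, 0], coords[99, 99])
```

For anything beyond a small footprint, a proper projection library would be safer than the degree approximation used here.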