This is a celebration of
camera-to-subject distance. Powerful and misunderstood in equal measure, it’s
an incredibly important property of any shot. We’ll go through some basic
theory, I’ll describe the psychological implications I believe it to have, and
we’ll look at how understanding these can affect the way you design shot
structures for a scene.
In normal life, things
that are closer to us appear bigger and things that are further away appear
smaller.
Let’s call depth the dimension passing through the
lens and receding from the camera in the direction it is pointing. The apparent
size at camera of an object of a given size depends on its distance to the
camera. The depth difference is the distance between two objects along the
depth axis. A depth difference between two objects of equal physical size will
give rise to a difference in apparent size at camera.
I want to call depth rendition what some people call
depth compression/expansion and yet others refer to as perspective, perspective
distortion or one of any number of terms. I use it to describe the difference
in apparent size caused by a given depth difference between objects of equal
physical size. If we create a depth difference between two identical objects, a
frame in which these two objects appear very different in size exhibits expanded depth rendition compared to a
frame in which they appear similar in size. The latter is said to be compressed in rendition.
Take a situation where objects A and B are the same physical size, but there
is a depth difference (A-B) between them giving rise to a difference in
apparent size at the camera. The ratio of A’s apparent size to B’s apparent
size is equal to the inverse of the ratio between their distances to the camera. It can thus
be said that the difference in apparent size depends on the ratio between the
depth difference and the distance between the camera and the first object. It’s
not the absolute distance between the objects that counts, but rather how significant this distance is in
comparison to the camera-to-subject distance. If the camera is a hundred
feet away, a one foot separation between two objects is not going to make them
look very different in size in the frame. But that same one foot distance is
going to produce a massive size difference if the camera is two inches away
from the closer object – why? Because that same one foot is large in comparison
to two inches, but tiny in comparison to a hundred feet. Since depth rendition
is nothing but the apparent size difference created by a given depth
difference, we can say that depth rendition is a function of camera to
subject distance.
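For those who like to see the numbers, here is a minimal Python sketch of the hundred-feet versus two-inches comparison above. It assumes a simple pinhole model in which apparent size is proportional to one over distance, with both objects on the lens axis; the function name and its parameters are made up for illustration.

```python
def apparent_size_ratio(camera_to_near: float, depth_difference: float) -> float:
    """Apparent size of the near object divided by that of the far object.

    Assumes two objects of equal physical size and a pinhole model in
    which apparent size is proportional to 1 / distance. Both arguments
    must be in the same unit (feet here).
    """
    if camera_to_near <= 0:
        raise ValueError("camera_to_near must be positive")
    camera_to_far = camera_to_near + depth_difference
    return camera_to_far / camera_to_near


# One foot of separation, camera a hundred feet from the nearer object.
far_camera = apparent_size_ratio(camera_to_near=100.0, depth_difference=1.0)

# The same one foot of separation, camera two inches (2/12 ft) away.
near_camera = apparent_size_ratio(camera_to_near=2.0 / 12.0, depth_difference=1.0)

print(f"camera at 100 ft: near object looks {far_camera:.2f}x the far one")
print(f"camera at 2 in:   near object looks {near_camera:.2f}x the far one")
```

Under these assumptions the first ratio comes out at about 1.01 and the second at about 7: the same one foot of depth, rendered completely differently, and no focal length appears anywhere in the calculation.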
It’s the same mathematical relationship according to which focus has to
be pulled faster as the subject gets closer to the lens, or printing time
differences in the darkroom have to be measured relative to the total exposure.
Someone who prints in the darkroom will understand all this discussion in a
flash, because if I ask them “How many stops extra is 2 seconds printing time?”
they’ll reply “It depends what your basic exposure was. If it was 2 seconds,
another 2 seconds is a stop. If it was 160 seconds, you won’t see that extra
2”.
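The printer’s answer can be checked with a couple of lines of Python. This is only a sketch, assuming that a stop is a doubling of exposure time, so that the increase in stops is the base-2 logarithm of the new time over the old; the function name is hypothetical.

```python
import math


def extra_stops(base_seconds: float, extra_seconds: float) -> float:
    """Exposure increase, in stops, from adding extra_seconds to base_seconds.

    Assumes one stop equals a doubling of the total exposure time.
    """
    return math.log2((base_seconds + extra_seconds) / base_seconds)


short_base = extra_stops(base_seconds=2.0, extra_seconds=2.0)
long_base = extra_stops(base_seconds=160.0, extra_seconds=2.0)

print(f"2 s on top of 2 s:   {short_base:.3f} stops")
print(f"2 s on top of 160 s: {long_base:.3f} stops")
```

The first case is exactly one stop; the second is roughly two hundredths of a stop. It is the same structure as depth rendition: what matters is the increment relative to the base, not the increment on its own.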
(The mathematically
inclined can enjoy a more rigorous proof here.)
Assuming you understand
the above, now consider what most photography students hear in a classroom, and
therefore carry through to their careers: “Depth
compression/expansion is the effect whereby the lens used alters the
relationship between depth difference and perceived size difference from what
is seen to be ‘normal’. Wide lenses exhibit depth expansion, whereby small
differences in depth give rise to large differences in perceived size, so that
an object can be made massive in relation to another just by moving it closer
to the lens, and size differences are exaggerated. Long lenses exhibit depth compression,
whereby differences in depth do not give rise to large perceived size
variations, so we can move something close to the lens or put it way in the
background and its perceived size does not change much.”
In much the same way that
non-experts in aviation have a mistaken view of how wings work, non-experts in photography
and cinematography have a mistaken view of how focal lengths affect depth
rendition. Depth rendition is not
affected by focal length. It depends only upon camera-to-subject distance.
If we maintain our camera-to-subject distance and change focal length, depth
rendition does not change. The only
way focal length affects depth rendition is if we decide to keep subject size
in the frame the same, because then we have to move the camera further from or closer to the subject and this
changes the depth rendition. So a wide lens seems to expand perspective because
it forces us to move closer. A long lens seems to compress perspective because
it forces us to move further back. In fact, there is no depth rendition
difference between using a 100mm and cropping into a 32mm shot to obtain the
same field of view.
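A rough numerical check of this claim, as a Python sketch. It assumes the pinhole approximation (image size on the sensor equals focal length times object size divided by distance, valid when the distance is much larger than the focal length). The 3 m and 6 m distances and the helper names are illustrative assumptions, not measurements.

```python
def image_size_mm(focal_length_mm: float, object_size_m: float, distance_m: float) -> float:
    """Size of the object's image on the sensor, pinhole approximation."""
    return focal_length_mm * object_size_m / distance_m


def size_ratio(focal_length_mm: float, near_m: float, far_m: float) -> float:
    """Apparent size of the near object divided by that of the far one."""
    object_size_m = 1.0  # any value works; it cancels in the ratio
    near = image_size_mm(focal_length_mm, object_size_m, near_m)
    far = image_size_mm(focal_length_mm, object_size_m, far_m)
    return near / far


# Illustrative setup: two identical objects 3 m and 6 m from the camera.
NEAR_M, FAR_M = 3.0, 6.0

# Camera stays put and only the lens changes: the ratio does not move.
ratio_32 = size_ratio(32.0, NEAR_M, FAR_M)
ratio_100 = size_ratio(100.0, NEAR_M, FAR_M)

# Cropping the 32mm frame by 100/32 enlarges everything by the same factor,
# so the near object ends up the size the 100mm would have rendered it.
crop = 100.0 / 32.0
near_32_cropped = image_size_mm(32.0, 1.0, NEAR_M) * crop
near_100 = image_size_mm(100.0, 1.0, NEAR_M)

# Keeping the near object the same size on the 100mm as on the uncropped
# 32mm means backing the camera off, and that is what changes the ratio.
moved_near_m = NEAR_M * crop
moved_far_m = moved_near_m + (FAR_M - NEAR_M)
ratio_100_moved = size_ratio(100.0, moved_near_m, moved_far_m)

print(f"32mm, camera fixed:       {ratio_32:.2f}")
print(f"100mm, camera fixed:      {ratio_100:.2f}")
print(f"100mm, camera moved back: {ratio_100_moved:.2f}")
```

In this model the two fixed-camera ratios are both 2, while moving back to hold the subject size drops the ratio to about 1.32. The ‘compression’ comes from the move, not from the glass.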
Think about what would
happen if this wasn’t the case: if each focal length had a unique depth
characteristic, and if using different focal lengths changed depth rendition
even if we did not move. Different sensor sizes force us to use different focal
lengths to achieve the same field of view, so every different sized sensor
would have different depth rendition characteristics. Very small sensors would
massively expand perspective whereas IMAX cameras would compress it like
crazy. And we know that’s obviously not the case – if I stand a 2/3” chip
camera next to a 35mm camera, at the same distance from the subject, and use
whatever focal length is required to achieve the same field of view, the depth
rendition will be the same on both cameras even though I’m using a much longer
focal length on the larger chip camera.
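The same thought experiment can be sketched in Python. The sensor widths below are approximate figures I am assuming for illustration, as are the 40 degree field of view and the object distances; the model is again a simple pinhole one and the function names are hypothetical.

```python
import math

# Approximate sensor widths in mm; treat these as illustrative assumptions.
SENSOR_WIDTH_MM = {"2/3-inch chip": 9.6, "35mm motion picture": 24.9}


def focal_length_for_fov(sensor_width_mm: float, horizontal_fov_deg: float) -> float:
    """Focal length giving the requested horizontal field of view (pinhole model)."""
    half_angle = math.radians(horizontal_fov_deg) / 2.0
    return sensor_width_mm / (2.0 * math.tan(half_angle))


def frame_fraction(focal_length_mm: float, sensor_width_mm: float,
                   object_width_m: float, distance_m: float) -> float:
    """Fraction of the frame width that the object's image occupies."""
    return focal_length_mm * object_width_m / (distance_m * sensor_width_mm)


FOV_DEG = 40.0  # same field of view on both cameras
results = {}
for name, width_mm in SENSOR_WIDTH_MM.items():
    f = focal_length_for_fov(width_mm, FOV_DEG)
    near = frame_fraction(f, width_mm, object_width_m=0.5, distance_m=2.0)
    far = frame_fraction(f, width_mm, object_width_m=0.5, distance_m=4.0)
    results[name] = (f, near, far)
    print(f"{name}: {f:.1f}mm lens, near fills {near:.3f} of frame, far fills {far:.3f}")
```

The two cameras need quite different focal lengths for the same field of view, yet the near and far objects occupy identical fractions of the frame on both, because both cameras stand at the same distance from the subject.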
Let’s be clear – we are not wrong when we swing a wider lens because
we want more depth expansion. But unless we realise that it’s not the lens
itself that produces the effect, but the fact that this lens allows us to move
closer than a longer lens would to attain the same subject size in the frame,
we have an inaccurate picture of what’s really going on. The reason I
make such a big deal out of a seemingly small incompleteness in understanding
is that without understanding this properly, it is impossible to make use of
the phenomenon I’m about to describe. Read on!
We’ve established that depth rendition depends only on viewer to subject
distance. This is true for the human eye, and it’s true for the camera. It must
follow that the visual system is used to using the depth cues in a scene to help
determine distances. No matter what lens you use, the distance between the
camera and the subject is ineluctable and is forever imprinted in the image in
the form of depth cues which the viewer’s visual system is subconsciously
reading all the time and using to draw conclusions about distance. This is the
central contention of this article. It is the reason street photographers are
told to ‘get in there’ rather than use a long lens. Once a director told me she
wanted to get ‘close’ to the characters by using a long lens. She may have a
unique way of seeing, but it’s more likely she misunderstood lenses; the way to
get close to the character is to move the camera closer!
On a short I shot earlier this year, our protagonist spots two
characters interacting across the road, and stops to look at them. We started
the scene tracking with him on a 35mm, and stopped with him. The director then
wanted a shot of the two characters across the road. I just walked across with
the 35mm on and lined up a two shot, and the director commented that it just
didn’t feel right. After thinking about it for a second, I realized that it was
a question of narrative point of view. We are telling this scene from this
character’s narrative point of view, so even if the shot of the characters
across the road is not a literal POV, it will only feel like the scene is
consistently in our lead’s narrative point of view if we keep the camera close
to him and switch to a longer lens to shoot the characters across the road. We
shot it on an 85mm. The camera to subject distance is many times greater than
the distance between any two elements in the frame, creating our so-called
depth compression. Our brain is used to reading these depth cues continuously,
so it effortlessly and subconsciously realizes that it is seeing these
characters from some distance away, and concludes that this must be something
like what our protagonist is seeing from his side of the road, even though the size of the two characters in the frame is identical
to how I had them in the 35mm version. The only thing that has changed is the
depth cues produced by moving the camera back across the road.
Instead of using the term camera-to-subject
distance, I’d like to call it proximity.
I owe this terminology to director Oscar Nobi, who came up with it whilst
discussing depth rendition in his film "Sorinne". To the writer Gul Davis, on the
other hand, I owe the term narrative
point of view, which I used in the example above. This doesn’t refer to
what we filmmakers understand as ‘POV’, where the camera is the character’s
eyes, but to whether we are telling the story from the perspective of a
particular character at any given point in the film. If there is considerable
physical distance between two characters and we choose to keep the camera
nearer one of them, we are telling the story from his narrative point of view,
no matter the focal lengths we use or what we point the camera at, because the eye reads the depth cues.
Consider instead a scene with only one character. What happens when we
vary the proximity? There’s something I’ll call emotional distance, which is what it sounds like – how close we
feel emotionally to a certain character. In real life, if we are physically
close to someone, chances are we have an emotional connection with them, and
conversely, we don’t usually share emotional moments across canyons, so long
distances tend to evoke emotional detachment. This is not a straightforward
relationship – it depends on the context – but more often than not, emotional
distance is linked in some way to proximity.
Sometimes you wish to maintain a consistent proximity between two shots
for various reasons: narrative point of view, emotional distance, whatever it
is. In the example from my short, it was a question of coherent narrative point
of view throughout the scene. In this case it would make sense to use a similar
proximity but change the lens. Sometimes you wish to modulate proximity
throughout the scene, for example to get increasingly emotionally close to a
character as the scene goes on. In this case, physically moving the camera between
shots will do the trick. What about zoom moves versus dolly moves? Pushing in
to a character using a dolly alters the proximity, whereas zooming in does not.
Zoom moves are largely out of fashion nowadays, and that’s a shame, because
there is a unique feeling to a zoom move that can be very appropriate in
certain situations – we’re making things bigger or smaller in the frame whilst
keeping the proximity constant.
I want to emphasize here that once we have established what proximity we
want and why, and put our camera down in the desired place, the feeling created
by that proximity is not affected by our choice of focal length. How I like to
think of it, at least for the scenes in which we are clearly in a character’s
narrative point of view, is that given the correct proximity for that point of
view, different focal lengths create different impressions of focus of visual
attention. Here’s a very good example: in Casino Royale, there’s a scene at the
airport in which Bond is among the airplanes looking for the villain and spots
a dead body next to a fuel truck. We have a shot of Bond, and cut to something
like his point of view, and then have a series of shots cut together quickly,
where the camera stays where it is and a longer and longer lens is used for
each shot (or a zoom is punched in between each shot). This sequence is
incredibly effective in giving us the feeling that Bond is ‘homing in’,
suddenly registering the importance of what he’s just seen, and focusing his
attention on it. It’s the feeling of a zoom move but done in a series of cuts
rather than a continuous zoom-in; it has a starkness and urgency that fits the scene.
Formalizing the above, it could be said that when we are roughly seeking
to mimic the human eye with the camera, proximity
controls the depth rendition, and with it the feeling of closeness, while focal length
controls the impression of focus of visual attention. This is why I think the
term ‘normal lens’ is bogus. Back in my photography days one of the big
discussion points was whether the 35 or the 50 was the ‘normal lens’. Attempting
to find a focal length that matches the human eye is missing the point. We
can’t be trying to find a focal length that matches the depth rendition of the human visual system, because we’ve
established that depth rendition depends on where we stand and is independent
of focal length. So we must be trying to find a focal length with a field
of view that matches the human visual system. But the human visual system
is a dynamic entity that adapts to the situation at hand and cannot be thought
of simply as a camera with a prime lens on it, fixed at a certain field of
view. If we strain to be aware of our entire field of view, we can see at least
180° horizontally and vertically, way more than an 8mm! But if we, like James
Bond, suddenly spot a dead body in the distance, our field of view,
perceptually speaking, decreases drastically as our attention becomes focused
on that area directly in front of us. When simulating the eye, instead of
trying to find a ‘normal lens’ we should think about what the perceptual focus
of attention would be for the situation at hand. As I said, changing focal
lengths is an excellent way of simulating various degrees of perceptual
attention on what is in front of us. Bond sees a dead body indicating the
presence of the villain: BANG, BANG, BANG, we cut to progressively longer focal
lengths, the perfect analogy for the focusing of attention on a small area of
our field of view.
Now that I’ve explained the importance of proximity, let’s pitch it
against focal length in terms of how working cameramen line up shots.
Assuming we’re on the same camera throughout the show, we have three
factors: subject size in frame, proximity and focal length. We will typically
choose two of these to our taste, which dictates what the final one should
be. I realize that real world
shooting situations are more complicated, that we may be blocking things to the
camera, etc, but I believe that most of the time we are, either consciously or
subconsciously, tending towards one of two mindsets:
1) Knowing the frame size we want, choosing a focal length, and moving the
camera into a position that gives us the desired frame size with the chosen
lens on.
2) Knowing the frame size we want, knowing the proximity we want, and
selecting the focal length that gives us the desired frame size from the chosen
camera position.
Although both of these mindsets can be executed easily with either a
zoom or primes, it is easy to imagine how someone who wasn’t too conscious of
how they were making these decisions would be prone to lapsing into mindset 1)
if using primes and mindset 2) if using a zoom.
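The ‘choose two, the third follows’ relationship behind both mindsets can be written down in a few lines of Python. This is a sketch under the pinhole approximation; the 14mm sensor height, the 0.5m subject height (roughly a head and shoulders) and the function names are assumptions made for illustration.

```python
def subject_fraction(focal_length_mm: float, proximity_m: float,
                     subject_height_m: float, sensor_height_mm: float) -> float:
    """Fraction of the frame height the subject occupies (pinhole model)."""
    return focal_length_mm * subject_height_m / (proximity_m * sensor_height_mm)


def proximity_for(focal_length_mm: float, fraction: float,
                  subject_height_m: float, sensor_height_mm: float) -> float:
    """Mindset 1: frame size and lens are chosen, the camera position follows."""
    return focal_length_mm * subject_height_m / (fraction * sensor_height_mm)


def focal_length_for(proximity_m: float, fraction: float,
                     subject_height_m: float, sensor_height_mm: float) -> float:
    """Mindset 2: frame size and proximity are chosen, the lens follows."""
    return fraction * sensor_height_mm * proximity_m / subject_height_m


SENSOR_HEIGHT_MM = 14.0   # assumed sensor height
SUBJECT_HEIGHT_M = 0.5    # assumed head-and-shoulders height
FRAME_FRACTION = 0.8      # subject fills 80% of the frame height

# Mindset 1: "get the 50", then walk the camera to wherever the frame works.
distance_m = proximity_for(50.0, FRAME_FRACTION, SUBJECT_HEIGHT_M, SENSOR_HEIGHT_MM)

# Mindset 2: decide to be 1.5 m away, then ask which lens gives the frame.
lens_mm = focal_length_for(1.5, FRAME_FRACTION, SUBJECT_HEIGHT_M, SENSOR_HEIGHT_MM)

print(f"Mindset 1: 50mm lens puts the camera about {distance_m:.2f} m away")
print(f"Mindset 2: 1.5 m proximity calls for roughly a {lens_mm:.1f}mm lens")
```

Both functions are the same equation solved for a different unknown, which is the whole point: the only difference between the two mindsets is which quantity you let the other two dictate.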
Common wisdom has it that zoom lenses are bad training for a DP,
because you'll have a tendency to keep the camera where it is and just 'punch
in and out' on the zoom to reframe the shot. But just as using a zoom lens
may reinforce the habit of staying where one is and punching in, destroying any
field of view consistency and proximity variation, using primes may reinforce
the habit of staying on one field of view and moving around, destroying any
proximity consistency and field of view variation.
I’ll give an example. A character is in a room, another knocks and leans
against the doorjamb on the other side of the room. We want close-ups of both.
Your standard ‘prime trained’ shooter may elect to shoot both on the 50mm,
matching camera-to-subject distance between setups. But what if we want to tell
this scene from the narrative point of view of the character in the room? We
could shoot his close-up on a 35, and just turn the camera around and pop an 85
without moving for the other character’s close-up. The audience will
subconsciously read the depth cues, even if the head sizes in frame are exactly
the same, and as we cut between the two angles they will subliminally perceive
that the scene is being told from the narrative standpoint of the character in
the room.
Another James Bond example for the above: In the Macau casino segment in
Skyfall, Bond spots Severine from the
other side of the room. The first time we see her, it’s in a wide from Bond’s
position. We then have a close up of Bond, shot on something like a 27 or 32.
For all we know they kept the camera where it was, same lens, and spun it
around. Then we cut to a long lens two-shot of Severine and her bodyguard from
the same camera position as the wide. She turns around and sees him. Cut back
to Bond’s close-up. The fact that the camera is kept where it is throughout
this small sequence means that we are firmly in Bond’s narrative point of view.
Up until now in the film, the camera has always been closer to Bond than to
Severine, so that when she finally approaches him it is the first time we
have a close proximity to her.
Isn’t that a higher level of storytelling with the camera compared to
popping a 50 and matching distances? Some will say: “But you’ve introduced
focal length inconsistency and
camera-to-subject distance (depth rendition) inconsistency!” But what we have
maintained is consistent narrative point of view: Bond’s. And that can be a
much more powerful storytelling tool than focal length or depth rendition
consistency. Of course, you may choose to treat both characters with equal
emotional distance, in which case the same lens and same camera-to-subject
distance may be entirely the right thing. But you’ve arrived at the solution
through a much more considered process than “Here are the close-ups, get the
50”.
Obviously beginning shooters don’t really use a conscious thought
process, they just go by eye and try to do the best they can. It is my
contention that once you are past being a total beginner at lining up shots,
there is a tendency to really focus on the focal length and get into that same
focal-length centric mindset that then leads you to say things like ‘The zoom
lens forms bad habits’. Zooms can form bad habits, but so can primes. The real
perversity of the whole anti-zoom school is that they will usually say that using
primes really teaches you about lenses and their unique depth characteristics whereas the truth is that unique depth
characteristics belong not to focal lengths but to camera-to-subject distances,
proximities, awareness of which might well be brought about better by shooting
with a zoom than by shooting with primes! Despite this, I do ultimately agree
that a zoom is a bad tool to learn on, because a beginner should always be
aware of what lens they are on. Although sometimes they should stay where they
are and swing the lens rather than move closer, awareness of exactly what focal
length they have on, rather than just tweaking the barrel, solidifies their
knowledge of focal lengths. But that knowledge consists of awareness of the
field of view and depth of field characteristics of different focal lengths and
not their ‘depth characteristics’, which as we now know, don’t belong to focal
lengths at all but to camera-subject distances, to proximities.
Just before I go, I need to bring you back to reality. In this article,
I have made distinctions stark and situations exaggerated, because things are
so muddled up that if you don’t point the conceptual lines out clearly then
they’re impossible to see. The reality is that you’re going to be framing for a
shot size most of the time, and if you’re experienced enough you’ll know the
proximity a certain focal length will force for a certain frame size. So a
competent DP asking for a 32 could just as well be thinking “That’s the lens
dictated by my choice of frame size and proximity” as “I know this is a lens
that’ll give me roughly the correct proximity for my frame size”. The important
thing is to remember the difference.
And finally, a disclaimer. By heavily intellectualizing all this, I don’t
mean to suggest that the only valid approach on set is an intellectual one. You
could have two DP/director teams, one working on a very intellectual level and another
on a very instinctive level, and both could be capable of producing excellent work.
Whatever works for you. I find that the more I think through these things
logically off set, the more spontaneous and instinctive I can be on set while
still producing good results.
London, April 2013