Cinematic sound aesthetics

What you hear while watching a movie is the result of specific choices made by those who work on film production. The way in which these choices are generally made is the result of a long tradition that over the years has led to the formation of an authentic sound language.
However, we work on ground that is known almost unconsciously, because whoever ends up working in the film industry is often, first of all, a passionate spectator. Being a sound-conscious spectator means having a certain expectation of what you will hear in a given scene.
Expectations about sound often derive from solutions that recur in the history of cinema. These solutions recur because they have proved successful in obtaining a full synesthesia between viewing and listening; as a functional way to arouse certain emotions, they have become the standard endowment in the narrative technique of the sound medium.

The formation of sound language

A very personal reflection of mine leads me to believe that the creation of standard models was also strongly influenced by the technical limitations that existed in the past.
Today, the technology of sound processing and reproduction can achieve extraordinary things. It is possible to work on very sophisticated levels of detail. To simplify things, one could imagine a container in which you can insert the sound information of a scene: the larger the container, the more simultaneous sound events you can reproduce in a clear and distinguishable way.
If today the container is very large, in the past it was not. The pioneers of sound cinema were constantly faced with restrictive choices, forced to decide whether to favor dialogue, sound effects, ambiences or music so as not to “crumple” the final result. It was, therefore, necessary to choose which of these elements would support the narrative. Working in abstraction probably favored the creation of the grammar of sound, precisely because it emphasized the expressive and communicative power of uniting certain sound solutions with certain visual solutions within the cinematographic story.

Sound in the cinematic genre

In cinema there are different genres and styles within which to classify different works: the term “noir”, for example, automatically brings to mind nocturnal scenarios in which silhouetted characters move immersed in fog.
If in genre cinema the image is always strongly characterized, the sound is not far behind. This is because the image is the source from which the sound comes, the event that generates it.
In the cinematographic medium, the primacy of vision is undeniable, but we must not think of sound as a trivial ancillary element; cinematic magic is created only when a sequence combines visual perception and sound perception in an organic way. I’m not just referring to simple synchronization or simple audio contextualization, which means hearing what you see happening on the screen; rather, it’s about creating the correct sound texture based on the specific aesthetic qualities of what you see on the screen.
If you analyze the photographic palette of a film you can see what I mean more clearly.

Looking at this photographic palette, it is easy to imagine a sound fluidity, in which each element is perfectly integrated into the space that the image shows without the aggravation of particular details.

An opposite case follows. Here, the palette is much more fragmented and less nuanced than the previous one.

To make a comparison, often in Tarantino’s movies, the colors in the photography stand out in a lively way and this choice is also reflected in the sound, which in fact tends to be more characterized and distinct. The details, even the slightest ones, explode on the screen, and are sometimes exaggerated to an unbelievable point, reinforcing the pop aesthetic that the image gives.

The audio palette

Can we talk about the audio palette by borrowing the terminology used above?
In the example below, there are two sound elements, both reproducing a gunshot. The first is a bare recording without any kind of intervention; the second is the result of layering five distinct sound events.

In both cases what the viewer hears is a shot, but the communicative character of the two sounds is completely opposite. The first is probably more suitable for a naturalistic context and an authorial narration, while the second would fit better in the action genre or in movies with a more “muscular” aesthetic.
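For readers who think in code, the layering of several sound events into one gunshot can be sketched as a weighted sum of aligned signals. This is only an illustration under stated assumptions, not the process used for the example above: the five layers, their gains, the sample rate and the synthetic signals standing in for real recordings are all hypothetical.

```python
import math
import random

SAMPLE_RATE = 48_000  # samples per second (assumed for this sketch)


def decaying_tone(freq_hz, decay, seconds=0.5):
    """Synthetic stand-in for a recorded layer: a sine with exponential decay."""
    n = int(SAMPLE_RATE * seconds)
    return [
        math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)
        * math.exp(-decay * t / SAMPLE_RATE)
        for t in range(n)
    ]


def noise_burst(decay, seconds=0.5, seed=0):
    """Synthetic stand-in for a noisy layer: white noise with exponential decay."""
    rng = random.Random(seed)
    n = int(SAMPLE_RATE * seconds)
    return [
        rng.uniform(-1.0, 1.0) * math.exp(-decay * t / SAMPLE_RATE)
        for t in range(n)
    ]


def mix_layers(layers, peak=0.9):
    """Sum (gain, samples) layers sample by sample, then rescale the result
    so that its loudest sample has magnitude `peak`."""
    length = max(len(samples) for _, samples in layers)
    mixed = [0.0] * length
    for gain, samples in layers:
        for i, s in enumerate(samples):
            mixed[i] += gain * s
    loudest = max(abs(s) for s in mixed)
    if loudest == 0:
        return mixed  # pure silence: nothing to rescale
    scale = peak / loudest
    return [s * scale for s in mixed]


# Five hypothetical layers; names and gains are illustrative only.
layers = [
    (1.0, noise_burst(decay=40, seed=1)),   # dry crack
    (0.8, decaying_tone(60, decay=8)),      # low-frequency thump
    (0.5, decaying_tone(180, decay=15)),    # body
    (0.3, noise_burst(decay=6, seed=2)),    # long tail
    (0.2, decaying_tone(2500, decay=30)),   # mechanical click
]
layered_shot = mix_layers(layers)
```

Raising or lowering the gains of the added layers moves the result between the two extremes described above: with every gain but the first set to zero, only the bare "recording" remains.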

We must imagine the audio palette as the possibility of tending, more or less, towards one or the other extreme of what we heard above, trying to find the gradient of sound color that suits the image in front of us. We must consider all the possible factors: the narrative moment, the rhythm of the editing, the photographic color, the characters’ appearance, the setting. In short, all the general characteristics of the film must be taken into account to avoid sounds that generate involuntary cognitive dissonance and fail to adhere to the image.

Of course, the example of the shot is an extreme simplification, but we must understand the basic approach and imagine applying it with a certain coherence to every single sound event throughout the film. It is work that, as the following contribution shows, can reach great levels of complexity.

Diegetic and extra-diegetic sound

Other tools for the overall aesthetic definition of a film’s sound fall under the work of dosing diegetic and extra-diegetic sound. (Note: I borrow these terms even though I am aware that all semantic fundamentalists will object to my use of them here.) Reinforcing certain gestures with sounds that would not be heard in a real context is a very common practice in cinematic sound. The most popular example is that of punches: the impact, which like the gunshot can be more or less elaborate, is often preceded by a movement of air (a whoosh) as if the moving arm were huge. Such a solution can be subliminal, or it can be explicitly exaggerated to become a distinctive trait, as in martial arts movies, or to cause a comical effect when completely out of context.

The reason why extra-diegetic solutions are used (i.e. sounds that are not necessarily caused by the events on stage) is to be found in the possibility of making the action represented on the screen in some way more tangible. Accompanying the punch with the whoosh places the spectator’s ear next to the arm of the character who is throwing it and makes the effort physically perceptible; exaggerating its impact makes the viewer an active part of the staging, almost as if the fist represented on the screen had hit him, causing a rumble in his head.

Objective and subjective dimensions

Therefore, we can think of placing the viewer in a subjective dimension, dragging him into the protagonist’s role and thus shifting the aural perspective of the representation into a position that is not necessarily the objective representation of the scene provided by the camera. A common example is when we see two characters talking at the end of an avenue yet we hear their voices in the foreground. The perspective and the real distance should not allow us to hear the words, but this solution is very functional because spectators who identify strongly with the story participate perceptively, just as if they were one of the characters.
Moving from an objective to a subjective level opens up numerous narrative choices, from simple solutions such as the one seen above to sound suggestions that describe altered states or madness.

I tried to put this kind of approach into practice in the movie Veloce come il vento by Matteo Rovere. The movie is set in the world of racing, and the idea was to create an immersive experience for viewers by placing them inside the cockpit or within an inch of the racing car whizzing by on the track. To arrive at solutions that stay true to this initial concept, we made extensive use of extra-diegetic sounds. Much like the example of the shot seen above, the recording of a racing engine by itself did not have the expressive strength to reach the subjective plane. In fact, many ad hoc elements were created to restore the physiological energy that such experiences can transmit, sometimes artificially exaggerating the vibrations of the structure with the use of low frequencies, sometimes characterizing the air and its movements with synthetic sounds, such as the dust that could be heard as a race car passes a few centimeters from us.

Creating sounds that have never been heard

Naturally, what has just been reported is the result of an artistic and technical process which at base is a great work of the imagination. I’ve never been a few inches from a car launched at full speed on the track, and the only driving experience I’m used to is in my old minivan, which at 120 km/h on the autostrada starts to vibrate as if there were an earthquake. If I had had the time and the opportunity, I would gladly have taken a ride on the track. Living certain experiences first-hand is very helpful when it comes to reproducing sound events. In this case, however, a simple chat with a driver generated many ideas and suggestions useful for this work.
Sometimes it’s practically impossible to have a first-hand experience. We know well that cinema does not have as its central theme the reality that surrounds us; on the contrary, it often gives us experiences set in fantasy worlds populated by imaginary creatures, or takes us on journeys into the distant past (or future) where no contemporary man has ever been.
This situation has an advantage. Not knowing the sounds coming from these kinds of environments provides the opportunity – for those who create them – to be the first to propose them. The audience will tend to accept what they hear more easily by the simple fact that they have never heard it before. This advantage has generated iconic sounds in the history of cinema. Today, many are convinced that a laser gun would sound the same as the one depicted in Star Wars. I am personally convinced that the cry of the T-Rex is the one created by Gary Rydstrom for Jurassic Park.

I have personally faced similar situations on two occasions: Il Primo Re by Matteo Rovere and La terra dei figli by Claudio Cupellini, based on the graphic novel by Gipi.

While there are no monsters or alien creatures, both films feature an unknown setting. The first is set in the Iron Age, the second in a hypothetical future in which life on Earth is almost at an end.

To deal with unknown things, we need to research and maintain a logic, create conceptual tracks to follow, and thus avoid “beating around the bush” aimlessly. Working on the sound itself is an activity that could go on for hours without ever reaching a satisfactory conclusion. A good solution is to rely on the most disparate sources. In the cases mentioned, I found excellent insights in academic research into noise pollution. When investigating how to make the sound of a character who has magnetic superpowers, for example, you can find excellent suggestions in physics research that studies how electromagnetic fields affect sound waves.

In short, it is necessary to cling to something when you have to create things that are not really common.

However, we must never forget that the actual effectiveness of the sound of the film is based on pure and simple perception. Study and research must be a stimulus to creation and experimentation, because if you blindly follow one theory, you run the risk of intellectualizing the sound and frustrating the force that characterizes it in transmitting and accompanying the impressions and emotions of the cinematic.

During the creative process on sound, a useful exercise could be to listen to the work again after some time has passed, or to let people who are completely unrelated to the creative process listen to those sounds, in order to get as close as possible to the immediate impression that they give us.


Follow the dynamic narrative

The ones shown so far are just a few of the tools that define the sound aesthetics of a movie. Moving between the extremes of the sound palette or making more or less use of extra-diegetic interventions to increase the subjective experience should not be thought of as rigid technical models to be strictly pursued. In fact, once you find the color of the movie it must not be used exclusively, and you absolutely must not give up on all the other tones of sound. Cinema, as the word itself says, is a dynamic context, and varying, even slightly, the elements put into play during the narration can produce very interesting dramatic results.


Studying the script, knowing the narrative structure underlying the story, is essential. In Matteo Garrone’s Dogman, for example, the sound tries to follow the story’s narrative pattern. In the movie, “the context” takes on a fundamental role, to the point of becoming a real character. The non-place where the story of Marcello, the protagonist, is set seems to be a dystopian abstraction, a frame that contains both a great sense of humanity and a ferocious, inhuman violence. Like all the other characters, it needed a voice of its own.

As the story progresses, Marcello enters a spiral of increasing violence. The sound setting, parallel to the protagonist, is reassuring at first, then gradually begins to fade, as if the context in which Marcello was perfectly integrated were abandoning him and no longer speaking to him, until it arrives at an unreal silence at the end, which is the most dramatic moment of the film. As soon as the screen goes black and our eyes close on this story, the sound returns during the credits, becoming a naturalistic element, indifferent to man’s feelings. It describes a place that continues to live regardless of the drama just shown, in an attempt to emotionally shake the viewer in the face of the absurdity of the tragedy just consummated.

In the artistic process, technically calibrating what has been said so far about the parallel between sound and story is often a subliminal operation, something that should not be “flashy”.

I conclude with a brief final reflection. I think working on sound is both fascinating and frustrating: when it is done well it proceeds invisibly and the audience enjoys it without realizing it. But, if ineffective, it no longer flows in a fluid way into our ears but stands out as an annoying element. Thus, our work involves hours and hours of reflections, attempts and failures to ensure that no one notices the huge effort carried out for months by the dozens of people involved in the success of the sound of a movie.

 Mirko Perri
Giulio Previ

Thanks to Francesca Bianco for the translation