AUDIO VIDEO FORENSIC ANALYST - Career Preparation
As an audio video forensic analyst (AVFA), the majority of your work will be to
improve the perceived audio or visual clarity of a digital recording. However,
your work will also require testing authenticity, taking measurements,
cross-referencing data, maintaining a high standard of ethics, and applying
peer-reviewed methodology in preparation of your expert testimony in the service
of justice. You will be expected to understand industry best practices, stay
abreast of innovative peer-reviewed technologies and methods, adapt existing
knowledge to unexpected circumstances, and understand the rules of evidence
applicable to each case that you are serving.
Before you can authenticate or enhance a recording, you need to
understand how recorded data is retained. Let us consider the simpler
example of audio recordings, where sound is represented by amplitude values
(each drawn from thousands of possible levels) sampled at a fixed time
interval (thousands of times per second), usually in one (mono) or two
(stereo) channels. The resulting recording will likely be compressed to
retain only a small fraction of the potential values.
The frequency range of most recording microphones (measured in Hertz, from
about 20 Hz to 20 kHz) exceeds the frequency range of what we typically hear
(approximately 100 Hz to 6 kHz), with human speech being an even smaller
subset. To reduce file size, digital audio recorders discard much of this
unused headroom prior to saving the recording as an electronic file. The
process of file size reduction is a balancing act because removing too much of
the high frequency range will result in voice pitch distortion, while using too
few bits to represent each data point reduces the signal-to-noise ratio. NOTE:
A 48 kHz recording will capture frequencies up to a ceiling of 24 kHz, half the
recording's sample rate.
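The sample-rate ceiling and the bit-depth trade-off described above can be put into numbers. The following is a minimal sketch, not a measurement tool: the function names are illustrative, and the signal-to-noise figure assumes the textbook approximation for an ideal quantizer driven by a full-scale sine wave (about 6.02 dB per bit plus 1.76 dB), which real recorders will not reach.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a recording at this sample rate can represent."""
    return sample_rate_hz / 2.0


def ideal_snr_db(bits_per_sample: int) -> float:
    """Best-case SNR of an ideal quantizer (textbook approximation)."""
    return 6.02 * bits_per_sample + 1.76


if __name__ == "__main__":
    print("48 kHz ceiling:", nyquist_hz(48_000), "Hz")
    for bits in (8, 16, 24):
        print(bits, "bits:", round(ideal_snr_db(bits), 2), "dB")
```

Each bit removed from the sample size costs roughly 6 dB of best-case signal-to-noise ratio, which is one way to see why aggressive file size reduction degrades a recording.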
If the amplitude of the recording exceeds the allowed values (measured
in decibels) then some data will be clipped, causing distortion. The recording may
also suffer from reverberation (successively fading echoes). Individually,
each of these issues may not seem substantial, but collectively they can
make words unintelligible and inhibit audio enhancement. Many of these
issues become obvious during the analyst's initial process of critical
listening. To avoid introducing additional data defects, the analyst
must maintain a high-quality lossless format throughout the entire
analysis and enhancement process.
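Clipping of the kind described above often shows up as runs of consecutive samples pinned at the largest or smallest value the format allows. Here is a minimal sketch, assuming signed integer PCM samples that have already been decoded into a Python list. The function name, the 16-bit default, and the three-sample threshold are illustrative choices, not an industry rule.

```python
def clipped_runs(samples, bits_per_sample=16, min_run=3):
    """Return (start_index, length) for each run of full-scale samples.

    A run is reported only if it is at least min_run samples long, since
    a single full-scale sample is not necessarily clipping.
    """
    hi = 2 ** (bits_per_sample - 1) - 1      # e.g. 32767 for 16-bit audio
    lo = -(2 ** (bits_per_sample - 1))       # e.g. -32768 for 16-bit audio
    runs = []
    start = None
    for i, value in enumerate(samples):
        at_limit = value >= hi or value <= lo
        if at_limit and start is None:
            start = i                        # a new run begins here
        elif not at_limit and start is not None:
            if i - start >= min_run:
                runs.append((start, i - start))
            start = None
    # Handle a run that continues to the end of the recording.
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples) - start))
    return runs
```

The reported positions tell the analyst where to listen critically. They do not by themselves repair the distortion.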
As the analyst listens to the recording, they will gather clues regarding which processes and enhancement filters should be applied. For example, while
a simple notch filter can attenuate a parasitic sound confined to a narrow, fixed frequency band, adaptive time-frequency filters
are generally the better choice to suppress an unwanted sound that changes over time.
A similar logic applies to video recordings, but here the data set is far
more complex. For example, most surveillance videos are composed of frames
that each interlace two sequential moments in time (fields). Prior to applying
enhancement or performing measurements, these fields must be separated, thus
doubling the video's total frame count, frame rate and aspect ratio (width to
height), because each field carries only half of the frame's rows. Following
the old adage of "garbage in equals garbage out", to achieve the greatest
enhancement clarity, the expert must start with the native, and likely
proprietary, recording in order to preserve field integrity and minimize
the initial compression losses.
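Field separation can be illustrated on a toy frame. This is a minimal sketch, assuming each frame is a list of pixel rows and that the recording is top-field-first. Field order varies between devices and has to be confirmed for each recording. The function names are illustrative.

```python
def separate_fields(frame):
    """Split one interlaced frame (a list of pixel rows) into two fields."""
    top_field = frame[0::2]      # rows 0, 2, 4, ...
    bottom_field = frame[1::2]   # rows 1, 3, 5, ...
    return top_field, bottom_field


def deinterlace(frames, top_field_first=True):
    """Turn N interlaced frames into 2N field images in temporal order."""
    fields = []
    for frame in frames:
        top, bottom = separate_fields(frame)
        fields.extend([top, bottom] if top_field_first else [bottom, top])
    return fields


if __name__ == "__main__":
    # A toy 4-pixel-wide, 6-row frame whose pixel values equal the row number.
    frame = [[row] * 4 for row in range(6)]
    fields = deinterlace([frame, frame])
    print(len(fields), "fields, each", len(fields[0][0]), "x", len(fields[0]))
```

Two frames yield four field images, each with the original width but half the rows, which is why the frame count, frame rate, and width-to-height ratio all double until the fields are resized.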
Modern videos are compressed in stages. One of the lossless stages
involves using tokens to define motion and pixel blocks, saved as
"p" and "b" frames, that reference specific full
video moments ("i" frames). Quantization tables are used to squeeze out
seemingly imperceptible visual data, but this process is lossy and will cause
visual distortion, especially if the codec's compression is excessively
applied. The resulting video is then saved in either an open or proprietary
format. Some free open-source third-party programs (e.g. VideoCleaner)
can convert most proprietary video streams into lossless open formats,
but the original metadata may only be available when accessing the originating
file. There are also tools that can check for steganography (e.g. OpenPuff).
Proprietary surveillance videos commonly use a variable frame rate and,
if direct extraction is not possible, the expert must use some method to recapture the visual contents of
the originating video. Since the capture process is at a fixed frame
rate, the frame rates will not match. If the capture rate is set too
low, then some of the originating frames will be missing. If the capture
rate is set too high, then duplicate frames will be acquired. Depending
on the method of screen capture used, it is also possible for the expert
to unintentionally record blended frames when the newer moment hasn't
finished refreshing the screen. For these reasons, the expert will want
to use the lowest capture frame rate that will ensure no unique moments
are missed.
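The trade-off between missed and duplicated frames can be explored with a small simulation. This is a rough sketch under simplifying assumptions: the source frame times are known, each capture instantly grabs whichever source frame is on screen at that moment, and blended frames are not modeled. The function name and the example timestamps are hypothetical.

```python
from bisect import bisect_right


def simulate_capture(source_times, capture_fps, duration_s):
    """Return (missed, duplicates) for a fixed-rate screen capture.

    source_times: sorted display times (seconds) of the original frames.
    """
    captured = []
    for k in range(int(duration_s * capture_fps)):
        t = k / capture_fps
        # Index of the latest source frame displayed at or before time t.
        idx = bisect_right(source_times, t) - 1
        if idx >= 0:
            captured.append(idx)
    unique = set(captured)
    missed = len(source_times) - len(unique)       # frames never captured
    duplicates = len(captured) - len(unique)       # repeated captures
    return missed, duplicates


if __name__ == "__main__":
    # Hypothetical variable-frame-rate source lasting one second.
    times = [0.00, 0.11, 0.19, 0.26, 0.52, 0.61, 0.93]
    for fps in (5, 20):
        print(fps, "fps ->", simulate_capture(times, fps, 1.0))
```

With these example timestamps, the low rate loses unique frames while the high rate captures every frame at the cost of many duplicates, which then have to be identified and removed.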
Audio and video recordings are typically viewed as amplitude relative to
time. This is perceived as volume with audio, and as brightness with
video. Another perspective is to view a recording as frequency relative
to time using a Fast Fourier Transform (FFT). An FFT domain filter
enables the analyst to more easily detect content tampering or to remove
transitory audio or visual defects that obscure details. FFT filters are
extremely effective and forensic software can automate the process to remove
judgment-based errors. Even so, the analyst must remain vigilant because
any filter that remedies one issue will have a negative impact on the
remaining data, even if in nearly imperceptible ways. This is a perfect
example of Locard's exchange principle, where each action that affects
something will also leave some trace behind. If the evidence being examined was
tampered with, that action will also leave some trace, which the analyst
can use to determine what actions occurred.
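To make the frequency view concrete, the sketch below converts a short block of samples from amplitude-over-time into magnitude-per-frequency. It uses the slow textbook discrete Fourier transform so that it needs only the standard library. An FFT computes the same result far faster, and forensic software would apply it to successive windows to build a spectrogram. The function names are illustrative.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Naive discrete Fourier transform: magnitudes for bins 0..N/2."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n)
            for t in range(n)
        )
        magnitudes.append(abs(acc))
    return magnitudes


def dominant_frequency(samples, sample_rate_hz):
    """Frequency (Hz) of the strongest non-DC bin in the block."""
    magnitudes = dft_magnitudes(samples)
    peak_bin = max(range(1, len(magnitudes)), key=magnitudes.__getitem__)
    return peak_bin * sample_rate_hz / len(samples)


if __name__ == "__main__":
    rate = 800  # samples per second, chosen small to keep the demo fast
    tone = [math.sin(2 * math.pi * 100 * t / rate) for t in range(200)]
    print(dominant_frequency(tone, rate), "Hz")
```

A steady tone that is hard to spot in the amplitude view stands out as a single strong bin here, which is what makes it possible to target a defect without touching the rest of the recording more than necessary.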
It is expected that anyone performing authentication testing on audio, video
or image files will follow a standardized suite of tests (e.g. the industry
standardized MAT form). Some authentication tests are conclusive (e.g.
proprietary, structural metadata), while others (e.g. critical analysis, DCT)
must be weighted in order to form a final opinion. For example, Video Error
Level Analysis (VELA) can draw attention to a cropped video or a removed
object, but it can also produce a false positive if you don't understand the
contrast correlation in the results. Another example is the existence of a
subsonic audio impulse found below the frequency range of the recording
microphone. This impulse could have been caused by someone pausing the
recorder, or it could have been caused by an electrical issue.
The analyst needs a strong understanding of each test and the possible
results, and this variance is why their summary opinion may range from a
reasonable to a definitive level of confidence, but will never be
expressed as a scientific certainty.
There are numerous articles and classes available for becoming skilled at
enhancement, so I will not belabor that point here. The actual results of audio
or video enhancement will depend upon the analyst's methods, applied
software tools, and the quality of their vision and hearing (both of
which should be routinely tested). The lack of industry enhancement
standardization stems from the extensive variance of capturing equipment
and recording devices being used. Those variances, and the
manufacturers' propensity to maintain their own proprietary compression
methods, prevent the development of a one-size-fits-all enhancement
guideline or solution.
As a forensic analyst, you are expected to understand and follow the
rules of evidence applicable to the jurisdiction of your case. You will be expected to maintain data integrity through the use of hash values and/or chain-of-custody control, and to keep detailed notes of your activity on each case. If you calculate the hash value for each file, then everyone can use this value to validate evidentiary integrity regardless of how those files are shared from that point forward. Although you will communicate with, and work at the direction of, whoever hires you, you work solely for the evidence and in accordance with the highest ethics.
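Calculating a hash value takes only a few lines. This is a minimal sketch using Python's standard `hashlib` module. SHA-256 is chosen here as an assumption, since the algorithm you record should be whichever one your lab or jurisdiction expects, and the function name and chunk size are illustrative.

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks.

    The file is opened read-only in binary mode, so its contents are
    never modified by the calculation.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Record the digest in your case notes when the evidence is received. Anyone who later recalculates it on their copy and gets the same value holds a bit-for-bit identical file.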
It is important for your report and testimony to detail all of your tests and results, including those that may be in conflict with each
other or the objectives of your engagement. You have significant discretion as to which steps are performed and exhibits
that you produce, but you are expected to fully disclose and support your choices. If you want your expert work and opinions to survive a Frye challenge, they must be your own
opinions. If you want your CV to survive a Daubert challenge, then it must support your qualifications to form
your opinions. It is your job to only draw opinions from within your area of expertise, and it
will be the presiding court that will determine if your opinions will be entered into the record.
Let’s say that you are hired to determine someone’s height and you only have a single camera view to work with.
Within that video, you must find a reference object of a definable size
(e.g. a doorway of known height that the subject walks through) and a
video still depicting when the subject walks through that doorway. Multiply the
height of that doorway by the pixel height of the person in your still,
and then divide that result by the pixel height of the doorway to
determine the actual height of the person in question. Using a reference
object to measure the size of people or things in the scene is called
Photogrammetry, just as measuring speed or acceleration is called
Videogrammetry.
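The proportion described above reduces to one line of arithmetic. This is a minimal sketch, assuming the subject is standing upright in the same plane as the reference object and that lens distortion has already been corrected. The function name and the example numbers are hypothetical.

```python
def estimate_height(reference_height, reference_pixels, subject_pixels):
    """Scale the subject's pixel height by the reference object's known size.

    The result is in the same unit as reference_height.
    """
    return reference_height * subject_pixels / reference_pixels


if __name__ == "__main__":
    # Hypothetical doorway: 80 inches tall, spanning 400 pixels in the still.
    # Hypothetical subject: spanning 350 pixels in the same still.
    print(estimate_height(80.0, 400, 350), "inches")
```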
The court will expect measurement work to be scientific, and for that you must
include a margin-of-error. For the above example, the margin-of-error is
± half the real-world size represented by one pixel of video resolution at the
person's location, after compensating for the camera's viewing geometry of the
target person. Thus, if you calculated the person as 5’10¼” with a ¼”
margin-of-error, then you can be 68% (1 sigma, or one standard deviation)
confident that the person's height is between 5’10” and 5’10½”, or 99.7%
(3 sigma, or three standard deviations) confident that the person's height is
between 5’9½” and 5’11”.
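The ranges in this example can be reproduced as follows. This is a minimal sketch that, like the example above, treats the margin-of-error as one standard deviation. The 95% (2 sigma) row comes from the same normal-distribution rule and is added only for completeness. The function names and the doorway numbers are hypothetical.

```python
def half_pixel_margin(reference_height, reference_pixels):
    """Half of the real-world size represented by one pixel."""
    return 0.5 * reference_height / reference_pixels


def confidence_ranges(estimate, margin):
    """Ranges at 1, 2 and 3 standard deviations around the estimate."""
    return {
        "68%": (estimate - margin, estimate + margin),
        "95%": (estimate - 2 * margin, estimate + 2 * margin),
        "99.7%": (estimate - 3 * margin, estimate + 3 * margin),
    }


if __name__ == "__main__":
    # 5'10 1/4" is 70.25 inches, with a 1/4 inch margin-of-error.
    for level, (low, high) in confidence_ranges(70.25, 0.25).items():
        print(level, low, "to", high, "inches")
    # Hypothetical doorway: 80 inches tall, spanning 400 pixels.
    print("half-pixel margin:", half_pixel_margin(80.0, 400), "inches")
```

In inches, 70.0 to 70.5 corresponds to 5’10” to 5’10½”, and 69.5 to 71.0 corresponds to 5’9½” to 5’11”.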
Isolating unique data or artifacts can help the forensic analyst find
new information. For example, the ever-changing, silent electrical network frequency (ENF) generated by our nation’s power grid has been documented for decades. Isolating the ENF from a recording can therefore help determine approximately when and where the recording originated, and those details can then be compared to known case facts and the file’s metadata to assess evidentiary authenticity. Even the interfering noise embedded within a file can be used to identify the specific equipment or handling that produced the recording.
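The first step of an ENF examination, isolating how the hum frequency drifts over time, can be sketched as below. This is a simplified illustration under stated assumptions: a 60 Hz nominal grid (many countries use 50 Hz), one-second windows, and a coarse 0.1 Hz search step. The function names and parameters are hypothetical, and the sketch does not include the essential comparison against a documented grid frequency record.

```python
import math


def tone_power(samples, sample_rate_hz, freq_hz):
    """Strength of the correlation between the samples and one frequency."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate_hz)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
             for i, s in enumerate(samples))
    return re * re + im * im


def enf_track(samples, sample_rate_hz, nominal_hz=60.0,
              span_hz=0.5, step_hz=0.1, window_s=1.0):
    """Strongest frequency near nominal_hz for each consecutive window."""
    window = int(window_s * sample_rate_hz)
    steps = int(round(2 * span_hz / step_hz))
    candidates = [nominal_hz - span_hz + i * step_hz for i in range(steps + 1)]
    track = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        best = max(candidates,
                   key=lambda f: tone_power(chunk, sample_rate_hz, f))
        track.append(round(best, 1))
    return track


if __name__ == "__main__":
    rate = 400  # low sample rate keeps the pure-Python demo fast
    hum = [math.sin(2 * math.pi * 59.9 * i / rate) for i in range(rate)]
    hum += [math.sin(2 * math.pi * 60.1 * i / rate) for i in range(rate)]
    print(enf_track(hum, rate))
```

Real recordings contain speech and noise on top of a very faint hum, so practical ENF work uses narrow band-pass filtering, longer recordings, and finer frequency resolution than this sketch.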
ENF, Photogrammetry and Videogrammetry are just a few examples of how a
skilled forensic analyst can extract new facts from existing evidence in
a truly scientific method, and thus follow an established formulaic
process. By contrast, the processes of enhancement and identification
cannot produce a measurable scientific error rate, and thus oversight
comes from the expert's opinions, methods and qualifications being
objectively reviewed by the courts and other experts.
Throughout your career you will come across manipulated recordings. Here
is software (mostly free) that can help you in making that determination
(assuming you are working in Windows®). Start by right-clicking on the
recording and selecting Properties to verify that the creation and modified
dates, and the simple metadata, match the expected facts. Next use MediaInfo
and ExifTool to dig deeper into the metadata, including the GPS coordinates and
originating hardware and software, to see if they match the expected facts.
You can use a hex editor to read the file header's plain text (up to the first
4 kB) to locate additional file facts. An easy way to document your findings is
to use the industry's standardized MAT form, which covers the recognized
authentication tests.
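If a hex editor is not at hand, the header's plain text can also be listed with a short script. This is a minimal sketch, assuming that runs of four or more printable ASCII characters in the first 4 kB are worth reviewing. The function name and both thresholds are illustrative, and the script should be run against a working copy rather than the original evidence.

```python
import re


def header_strings(path, header_bytes=4096, min_length=4):
    """List printable ASCII runs found in the first bytes of a file."""
    with open(path, "rb") as handle:      # read-only, binary mode
        header = handle.read(header_bytes)
    # Printable ASCII spans the byte values 0x20 (space) to 0x7e (~).
    pattern = rb"[\x20-\x7e]{%d,}" % min_length
    return [match.decode("ascii") for match in re.findall(pattern, header)]
```

Container signatures and encoder names often appear in this list, and they can then be compared against what the claimed recording device is known to write.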
Nearly all videos will be in an open format and may be the result of video
or image manipulation software. If the video in question requires proprietary
playing software, then it is likely trustworthy because only that
manufacturer can create such a file. However, just because you required
the use of specialized playing software does not mean that the file is
in a proprietary format. As a general rule, an open format is one that
can be viewed using VLC or by uploading the video to a social media site.
If you still question the authenticity of the video, then it is time to
dig deeper. Use VideoCleaner to perform DCT and VELA tests, which are highly
effective at detecting if content was added or removed. A more expansive suite
is available with VideoExpert. For images, you can perform ELA or use a far
more extensive testing suite ($30). For deeper budgets, consider Authenticate.
To test audio files, or the audio track of a video, use GIMP
to test the spectral view or a trial copy of Forensics.
Your career may also include voice and facial matching, which is a
specialty unto itself. Such work relies heavily upon software,
which returns a confidence level above 0% and below 100%.
Unlike authentication testing, voice/face matching results may differ
between software brands due to their proprietary algorithms.
As an audio video forensic analyst, you are tasked with using industry
accepted technology, understanding the limitations of that technology,
using peer-reviewed methodology, and understanding the strength of any opinions that can be formed. No enhancement
process can achieve the fantasy expectations depicted on television, which is why you
always want to apply a soft and realistic hand when attempting to enhance a recording. For example, while attempting to improve the clarity of subtle motion,
you should avoid excessive sharpening of high energy details, or adding
excessive brightness, as you may actually destroy the details that you
intended to improve. Even the simple task of opening a recorded file can alter its metadata, which is why you
must always work from an exact copy.
Never forget that your opinions may deeply affect someone's life and
your impartiality is critical. For this reason, you must avoid
forming a bias. If you are enhancing an audio file, do not read the
transcript or learn the expected wording until after your enhancement work is
complete. If asked to clarify a face, use some other known object as your working
reference. For example, when I was asked to enhance the head of George Zimmerman, I instead enhanced the badge of the
nearby officer so as not to enhance to a preconception. Most importantly, if
you are asked to support an indefensible position, consider walking away because
your integrity is your most valuable asset and once it is gone, so is
your credibility.
If you want to learn more about becoming a certified Audio Video Forensic Analyst,
consider additional reading, on-line training, and the accredited certification.