The Semantic Forensics (SemaFor) program will develop technologies to automatically detect, attribute, and characterize falsified multi-modal media assets (text, audio, image, video) to defend against large-scale, automated disinformation attacks.
Statistical detection techniques have been successful, but media generation and manipulation technology is advancing rapidly.
Purely statistical detection methods are quickly becoming insufficient for detecting falsified media assets.
Detection techniques that rely on statistical fingerprints can often be fooled with limited additional resources (algorithm development, data, or compute).
However, existing automated media generation and manipulation algorithms rely heavily on purely data-driven approaches and are prone to making semantic errors.
For example, GAN-generated faces may have semantic inconsistencies such as mismatched earrings.
These semantic failures provide an opportunity for defenders to gain an asymmetric advantage.
A comprehensive suite of semantic inconsistency detectors would dramatically increase the burden on media falsifiers: they would have to get every semantic detail correct, while defenders would need to find only one or a few inconsistencies.
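The asymmetry can be illustrated with a small calculation. The sketch below is not part of the SemaFor program. It assumes, purely for illustration, that each detector flags a given falsified asset independently with some fixed probability; the function names and the probabilities are hypothetical.

```python
def evasion_probability(catch_probs):
    """Chance that a falsified asset slips past every detector.

    Assumption (illustrative only): detectors act independently, and
    catch_probs[i] is the probability that detector i flags the asset.
    """
    p_evade = 1.0
    for p in catch_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
        # The falsifier must get past this detector as well as all earlier ones.
        p_evade *= 1.0 - p
    return p_evade


def detection_probability(catch_probs):
    """Chance that at least one detector flags the asset."""
    return 1.0 - evasion_probability(catch_probs)


if __name__ == "__main__":
    # Ten hypothetical detectors, each weak on its own (20% catch rate).
    suite = [0.2] * 10
    print(f"evasion:   {evasion_probability(suite):.3f}")
    print(f"detection: {detection_probability(suite):.3f}")
```

Under these assumptions, ten detectors that each catch a fake only 20% of the time would together flag it roughly 89% of the time, because the falsifier has to evade all ten while the defender needs just one hit. Real detectors are unlikely to be fully independent, so this should be read as an intuition aid rather than a performance estimate.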
SemaFor seeks to develop innovative semantic technologies for analyzing media.
Semantic detection algorithms will determine whether media has been generated or manipulated.
Attribution algorithms will infer whether media originates from a particular organization or individual.
Characterization algorithms will reason about whether media was generated or manipulated for malicious purposes.
These SemaFor technologies will help identify, deter, and understand adversary disinformation campaigns.