Army researchers have developed an artificial intelligence and machine learning technique that produces a visible face image from a thermal image of a person’s face captured in low-light or nighttime conditions. This development could lead to enhanced real-time biometrics and post-mission forensic analysis for covert nighttime operations.
Thermal cameras, such as Forward Looking Infrared (FLIR) sensors, are actively deployed on aerial and ground vehicles, in watchtowers and at checkpoints for surveillance purposes. More recently, thermal cameras have become available as body-worn cameras. The ability to perform automatic face recognition at night using such thermal cameras is beneficial for informing a Soldier that an individual is someone of interest, such as someone who may be on a watch list.
The motivations for this technology – developed by Drs. Benjamin S. Riggan, Nathaniel J. Short and Shuowen “Sean” Hu, from the U.S. Army Research Laboratory – are to enhance both automatic and human-matching capabilities.
“This technology enables matching between thermal face images and existing biometric face databases/watch lists that only contain visible face imagery,” said Riggan, a research scientist. “The technology provides a way for humans to visually compare visible and thermal facial imagery through thermal-to-visible face synthesis.”
He said that under nighttime and low-light conditions, there is insufficient light for a conventional camera to capture facial imagery for recognition without active illumination, such as a flash or spotlight, which would give away the position of such surveillance cameras. Thermal cameras, which capture the heat signature naturally emanating from living skin tissue, are ideal for such conditions.
“When using thermal cameras to capture facial imagery, the main challenge is that the captured thermal image must be matched against a watch list or gallery that only contains conventional visible imagery from known persons of interest,” Riggan said. “Therefore, the problem becomes what is referred to as cross-spectrum, or heterogeneous, face recognition. In this case, facial probe imagery acquired in one modality is matched against a gallery database acquired using a different imaging modality.”
This approach leverages advanced domain adaptation techniques based on deep neural networks. It is composed of two key parts: a non-linear regression model that maps a given thermal image into a corresponding visible latent representation, and an optimization problem that projects that latent representation back into the image space.
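The two parts described above can be illustrated with a deliberately tiny sketch. It is not the ARL implementation: the linear feature extractor `W`, the regression weights `A` and `b`, and the function names are all hypothetical stand-ins chosen so the example runs with the Python standard library alone. Stage 1 maps thermal features to a visible latent vector with a toy non-linear regression; stage 2 runs gradient descent on the pixels until their visible features match that latent vector.

```python
import math

# Hypothetical visible-domain feature extractor f(x) = W x.
# A real system would use a deep network; these weights are made up.
W = [[1.0, 0.5, 0.0],
     [0.0, 1.0, -0.5]]

def visible_features(image):
    """Apply the toy linear feature extractor to a flat list of pixels."""
    return [sum(w * p for w, p in zip(row, image)) for row in W]

def thermal_to_latent(thermal_feats, A, b):
    """Stage 1 (assumed form): non-linear regression z = tanh(A t + b)."""
    return [math.tanh(sum(a * t for a, t in zip(row, thermal_feats)) + bi)
            for row, bi in zip(A, b)]

def synthesize(target_latent, n_pixels, steps=500, lr=0.1):
    """Stage 2: minimize 0.5 * ||W x - z||^2 over the pixels x."""
    x = [0.0] * n_pixels
    for _ in range(steps):
        resid = [f - z for f, z in zip(visible_features(x), target_latent)]
        # Gradient of the objective with respect to x is W^T * resid.
        grad = [sum(W[i][j] * resid[i] for i in range(len(W)))
                for j in range(n_pixels)]
        x = [xj - lr * g for xj, g in zip(x, grad)]
    return x

# Hypothetical regression weights and thermal features, for illustration only.
A = [[0.5, -0.2], [0.1, 0.3]]
b = [0.0, 0.1]
latent = thermal_to_latent([1.0, 2.0], A, b)
image = synthesize(latent, n_pixels=3)
```

In this toy setting the recovered `image` is simply a pixel vector whose features reproduce `latent`. The point is the division of labor: the regression never produces pixels directly, and the image comes from solving an optimization problem against the visible-domain representation.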
Details of this work were presented in March in a technical paper, “Thermal to Visible Synthesis of Face Images using Multiple Regions,” at the IEEE Winter Conference on Applications of Computer Vision, or WACV, in Lake Tahoe, Nevada, a technical conference that brings together scholars and scientists from academia, industry and government.
At the conference, Army researchers demonstrated that combining global information, such as features from across the entire face, with local information, such as features from discriminative fiducial regions like the eyes, nose and mouth, enhanced the discriminability of the synthesized imagery. They showed how the thermal-to-visible mapped representations from both global and local regions in the thermal face signature could be used in conjunction to synthesize a refined visible face image.
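One way to picture the combination of global and local information is as a single objective with a whole-face term plus one term per fiducial region. The sketch below is an assumption-laden illustration, not the formulation from the paper: it compares raw pixels instead of learned representations, and the names `joint_loss`, `local_targets` and `local_weight` are hypothetical.

```python
def crop(image, box):
    """Return the sub-image inside box = (top, left, height, width)."""
    top, left, height, width = box
    return [row[left:left + width] for row in image[top:top + height]]

def flatten(patch):
    """Turn a 2D list of pixels into a flat list."""
    return [p for row in patch for p in row]

def squared_error(a, b):
    """Sum of squared differences between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def joint_loss(image, global_target, local_targets, local_weight=1.0):
    """Whole-face term plus a weighted term for each local region.

    local_targets maps a region name to (box, target_pixels).
    Raw pixels stand in for learned features (an assumption).
    """
    loss = squared_error(flatten(image), global_target)
    for box, target in local_targets.values():
        loss += local_weight * squared_error(flatten(crop(image, box)), target)
    return loss

# Toy 2x2 "face" with a single hypothetical local region on the top row.
face = [[1.0, 2.0],
        [3.0, 4.0]]
regions = {"eyes": ((0, 0, 1, 2), [0.0, 0.0])}
loss = joint_loss(face, [1.0, 2.0, 3.0, 4.0], regions, local_weight=0.5)
```

Raising `local_weight` makes a synthesized image pay more for mismatches around the eyes, nose or mouth, while the global term keeps the overall face shape in check.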
The optimization problem for synthesizing an image attempts to jointly preserve the shape of the entire face and the appearance of the local fiducial details. Using the synthesized thermal-to-visible imagery and existing visible gallery imagery, they performed face verification experiments using a common open-source deep neural network architecture for face recognition. The architecture used is explicitly designed for visible-based face recognition. The most surprising result is that their approach achieved better verification performance than a generative adversarial network-based approach, which previously showed photo-realistic properties.
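For readers unfamiliar with face verification, the basic decision can be sketched as follows. This is a generic illustration rather than the experimental protocol used by the researchers: the article does not say which similarity measure or threshold was used, so cosine similarity, the 0.5 threshold and the example embedding vectors are all assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(probe_embedding, gallery_embedding, threshold=0.5):
    """Accept the pair as the same person if similarity meets the threshold."""
    return cosine_similarity(probe_embedding, gallery_embedding) >= threshold

# Hypothetical embeddings: one from a synthesized image, two from a gallery.
synthesized = [0.9, 0.1, 0.2]
same_person = [0.8, 0.2, 0.1]
other_person = [-0.1, 0.9, -0.3]

match = verify(synthesized, same_person)
non_match = verify(synthesized, other_person)
```

Because the recognition network is designed for visible imagery, the synthesized image is fed through it exactly as a gallery photo would be, and verification performance then depends on how well the synthesis preserved identity.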
Riggan attributed this result to the fact that the game-theoretic objective for GANs seeks to generate imagery that is sufficiently similar in dynamic range and photo-like appearance to the training imagery, while sometimes neglecting to preserve identifying characteristics. The approach developed by ARL preserves identity information to enhance discriminability, yielding, for example, increased recognition accuracy for both automatic face recognition algorithms and human adjudication.