Over the past few years, artificial intelligence has achieved many feats once thought possible only for humans. After the announcement that a Search and Rescue AI system could generate a prediction of its surroundings, it's clear that improvements in deep learning and neural networks will allow AI to perform many tasks that, for now, only the human brain can.
Speech2Face is one such AI. The system – a neural network – can generate a rendition of what it believes a person's face looks like based solely on their voice.
The system – which requires only a short piece of audio to base its model on – is said to look for certain markers in the speaker's voice that help it pinpoint what the person may look like. The AI was trained on YouTube videos, allowing the system to learn from over 100,000 faces and their respective voices.
The results – published on arXiv – are a somewhat generalised depiction of the speaker, with their age range, ethnicity and gender predicted by the system. The AI still appears to get confused in particular situations: when the speaker switches languages, for example, the ethnicity of the AI's rendition changes as well. The pitch of the speaker's voice also appears to dictate the predicted gender.
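That pitch dependence can be caricatured in a few lines of code. To be clear, this is a deliberately crude illustrative sketch, not the published model: the 165 Hz cut-off and the function name are assumptions chosen for the example, and the real system learns far richer voice features than fundamental frequency alone.

```python
def guess_gender_from_pitch(f0_hz: float) -> str:
    """Toy heuristic mirroring the pitch/gender bias described above.

    Typical adult speaking fundamental frequency (F0) runs roughly
    85-180 Hz for men and 165-255 Hz for women; the single threshold
    used here is an illustrative assumption, not part of Speech2Face.
    """
    return "male" if f0_hz < 165.0 else "female"


# A lower-pitched voice is labelled "male", a higher-pitched one "female" -
# exactly the kind of coarse correlation the researchers caution about.
print(guess_gender_from_pitch(120.0))  # a typical lower speaking pitch
print(guess_gender_from_pitch(210.0))  # a typical higher speaking pitch
```

A heuristic like this makes the failure mode obvious: any speaker whose pitch sits on the "wrong" side of the learned boundary will be misclassified, which is one reason the team stresses that the system's outputs are generalised depictions rather than portraits.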
The scientists behind the project have noted that the generalised nature of the system's predictions means it can't produce truly accurate faces of individuals – yet.
Despite these problems and initial limitations, the system has so far shown promising results and points to an interesting direction for artificial intelligence to progress in as the technology behind it improves in accuracy.