
The beauty of creating a (virtual) human

How do you create a believable virtual human? That’s the research focus of Zerrin Yumak, assistant professor in the Human-Centred Computing Group at Utrecht University and director of the Motion Capture and Virtual Reality Lab. Zerrin is a keynote speaker at the XR Day on 3 July. “I’m trying to generate all the nuances of non-verbal communication.”

“What truly fascinates me is how my 2-year-old is growing and developing. How fast babies learn and pick up language compared to AI algorithms is an eye-opener.”

“My research group tries to mimic human behaviour, and humans are so complex. That has fascinated me about this field from the beginning: you learn a lot about human social behaviour too. What also truly fascinates me right now is how my 2-year-old child is growing and developing. How fast babies learn and pick up language compared to AI algorithms is an eye-opener. Machines are advanced calculators, but cannot learn the way humans do.”

Autonomous machines

Zerrin was born in Turkey, where she studied industrial engineering, but the courses on computer science were the ones she most enjoyed. “At the time, I read Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig, the most famous book on AI. I became intrigued by the idea of autonomous machines.”

Zerrin Yumak at work behind her computer

"I was interested in the visual aspect because it turns AI from something abstract into something embodied.”

“I wanted to learn more, and I like to dive deep into things. So I looked for PhD positions in this field. I thought going abroad would expand my horizons, and I found this nice lab at the University of Geneva. It was perfect for me because it combined computer graphics, animation and AI. I was interested in the visual aspect because it turns AI from something abstract into something embodied.”

Direct impact on people 

In Geneva, Zerrin also worked on an EU-funded project around humanoid social robots that walked around a museum and interacted with visitors. “Very interdisciplinary work: it required engineering and computer science, as well as social sciences. I could see the direct impact on people of the algorithms I was developing. That’s how it all started.”

After completing her PhD in computer science, she worked consecutively as a postdoc at the Swiss Federal Institute of Technology in Lausanne and as a research fellow at Nanyang Technological University in Singapore. She moved to the Netherlands in the spring of 2015. Her international career gave her a taste of the differences in academic cultures. “A country’s research agenda is often defined by the values of that specific culture. There is a lot of emphasis on technology and engineering in the East and the US. In Europe, there is more focus on social and public values, and how technology touches the life of people.” 

“The Netherlands should not be just a consumer of AI technology. Developing our own technology and incorporating public values into our AI products is the sustainable way to go.”

“I don’t think one approach is better than the other. Balance is key. Developing AI technology without thoroughly thinking about the impact on users’ lives can be detrimental. Focusing only on social aspects and neglecting technological development is also not a sustainable path, because then you become just a consumer of the technology. Developing your own technology also means that you can incorporate your own values into the AI products.”

Social and emotional behaviours 

This intersection of humans and technology is precisely Zerrin’s research focus. Her current research in Utrecht is on believable virtual humans and social robots. She works on computational models of social and emotional behaviours and expressive character animation, combining methods from computer graphics, artificial intelligence, and human-computer interaction.

Zerrin Yumak attaches sensors on a person dressed in black at the motion capture lab

At the Motion Capture and Virtual Reality Lab, actors have natural conversations and researchers capture their facial expressions and body movements with sensors.

“Of course, robots have some hardware limitations: they can’t move around freely, and not everybody can own one. So in Utrecht, I’m focusing more on 3D digital humans, because they are more easily deployable and thus reach a larger audience and make more impact than social robots. But they share similar algorithms.”

Pushing the boundaries

These 3D digital humans, similar to game characters, come to life through a VR headset. “We are really pushing the boundaries to create realistic-looking characters. The Motion Capture and Virtual Reality Lab at Utrecht University is unique in the Netherlands. Here, actors have natural conversations and we capture their facial expressions and body movements with sensors. We then use this data to regenerate the motions in the computer, using deep learning algorithms.”

“We can capture everything: facial expression, head, body, hand and finger movement, and the accompanying audio. When people are talking, they send all these little non-verbal signals that affect our perception and how we communicate. These movements are also related to the context, and to the mood and personality of a person. I am invested in capturing all the nuances of non-verbal communication.”

Portrait of Zerrin Yumak

"In recent years we have made rapid progress thanks to the developments in deep learning and generative AI algorithms.”

Holy grail

One of her research aims is generating digital motion by using only audio input: audio in, motion out. “That is the beauty of AI: it can learn this correlation between audio and how we move our lips, eyebrows, cheeks, head, and hands. In recent years we have made rapid progress thanks to the developments in deep learning and generative AI algorithms.”
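To make “audio in, motion out” concrete, here is a minimal, hypothetical sketch in PyTorch of a model that learns to map a sequence of audio features to per-frame facial motion. The architecture, feature dimensions, and blendshape output below are illustrative assumptions, not the lab’s actual method.

```python
# Hypothetical sketch: learning a mapping from audio features to per-frame
# facial motion (e.g. blendshape weights). All sizes and layer choices are
# illustrative assumptions, not the research group's actual model.
import torch
import torch.nn as nn

class AudioToFaceMotion(nn.Module):
    def __init__(self, n_audio_feats=40, n_blendshapes=52, hidden=256):
        super().__init__()
        # Convolutions summarise short windows of audio features (e.g. MFCCs).
        self.encoder = nn.Sequential(
            nn.Conv1d(n_audio_feats, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # A recurrent layer keeps the generated motion temporally coherent.
        self.temporal = nn.GRU(hidden, hidden, batch_first=True)
        # Per-frame regression head: one weight per facial blendshape.
        self.head = nn.Linear(hidden, n_blendshapes)

    def forward(self, audio_feats):          # (batch, time, n_audio_feats)
        x = self.encoder(audio_feats.transpose(1, 2)).transpose(1, 2)
        x, _ = self.temporal(x)
        return self.head(x)                  # (batch, time, n_blendshapes)

# Toy training step on random stand-in data; real training data would come
# from synchronised audio and motion-capture recordings like the lab's.
model = AudioToFaceMotion()
audio = torch.randn(8, 100, 40)      # 8 clips, 100 frames, 40 features each
motion = torch.randn(8, 100, 52)     # captured blendshape weights per frame
loss = nn.functional.mse_loss(model(audio), motion)
loss.backward()
```

The point of such a model is exactly the correlation Zerrin describes: given enough paired recordings, the network learns which lip, cheek, and brow movements tend to accompany which sounds.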

“My group is running multiple studies in which we analyse the emotional cues in audio and map them to a person’s talking style. But the holy grail in our field would be to eventually generate varying facial expressions based on a textual description. Not only categorical emotions such as happy and sad, but much richer descriptions of emotions, extracted for instance from novels: how can we map these to facial expressions?”

“You could revive a historical figure like Einstein and ask him questions. This can be very useful for experiential and blended learning.”

“There is a lot to discover. The complex question of facial expressions alone takes a PhD. Another topic I am focusing on is the connection between hand gestures and the semantic information in text. For example, when describing a route, you might point left or right. These kinds of rich, meaningful gestures are not generated well by current AI algorithms. A final focus area is making characters’ faces and clothes more realistic, and modelling interactions between humans, and between humans and objects. Each of these topics is a really big research question.”

Lack of 3D data

One of the important challenges in advancing this field is the lack of data, Zerrin explains. “For 2D generative AI applications such as Midjourney and ChatGPT, there is a vast amount of image and text data. When it comes to 3D, we don’t have this data yet. Virtual reality environments are still niche. We generate data with motion capture at the moment, but that’s expensive, and you need a special lab.”

“I think the public needs to be trained on how AI works: in the end, it’s just algorithms and data.”

Another challenge is that deep learning algorithms are mostly applied in the 2D domain, and there is not enough benchmarking yet for 3D applications. “These are black-box algorithms: you don’t know precisely what is going on. Why is this particular cheek or eyebrow moving? It requires a lot of debugging, trial and error, and training time to discover the parameters.”

Virtual teachers

The application areas for this technology in daily life are very broad, according to Zerrin. “In education, digital humans can be used as a kind of peer or teacher who can think along or answer questions, enabling students to learn at their own pace. You can also use digital humans to populate simulations, a historical event for example, or take students to space. Or revive historical figures such as Einstein and ask them questions. These things can be very useful for experiential and blended learning.”

Zerrin Yumak in the motion capture lab with an actor wearing a headset with a mobile phone attached, on which he can see his facial expression

Cars are dangerous too

Zerrin often gets asked about the risks of the technology, and there is a lot of discussion on the ethical dimensions, such as privacy, security, and the danger of deepfakes. “Our research always passes through the university’s ethics committee. I personally don’t think that there is much to worry about. I see AI as just a tool. The way people use it should be checked, though: data and algorithms should not be biased, and you have to make it explicit that it’s not a real human that you see.”

“I think the public needs to be educated on how AI works. Those dog robots by Boston Dynamics may look a bit scary, but in the end, even they are just an algorithm. Misuse is possible, but the same goes for many things. Cars are dangerous too. It’s part of human development, and we need to get it right.”

Text: Josje Spinhoven
Photos: Jelmer de Haas

Video: Making of the Motion Capture and Virtual Reality Lab
