"I had to spend 20 years explaining what I was doing"
AI lets scientists work magic with language
"I can't remember a time when I wasn't interested in language," says linguist Antal van den Bosch. "It is an instrument with wondrous properties. When I learnt a new language in high school, it gave me a kind of key to another world." Nowadays, he finds the magic in artificial intelligence.
His work could hardly be more topical: as a professor at Utrecht University, Antal van den Bosch specialises in large language models such as ChatGPT. At the same time, he leads the Social Sciences and Humanities domain at NWO.
It is fair to call it a successful career, though not one without headwinds, Van den Bosch says. "As a southerner, I was confronted with the second-class status that is unconsciously assigned to non-standard variants of Dutch. Language can connect, but it can also divide."
Extreme variant
Given his interest in language, enrolling in Language and Literature Studies was a logical choice for Van den Bosch. As a student in Tilburg, he first came across artificial intelligence. "In AI, there has always been a lot of focus on language. One of the oldest ambitions is the Turing Test: that you, as a human, no longer notice that you are communicating with a computer."
"When I was a student, there was a sudden breakthrough in AI: machine learning made unprecedented things possible. We then started applying it to language in Tilburg, with an ever-growing research group. The most extreme variant of this kind of machine learning is a large language model that is simply fed gigantic amounts of text, works out which patterns it contains, and can then generate text of its own."
"That model has always been my goal. I've actually been working on it since the beginning of this millennium, when I started my own group and got my own grants."
Education as a firm foundation
In hindsight, that looks like a smart choice, given the success of large language models like ChatGPT. "Well, for 20 years I had to explain what I was doing. Many people thought it was a very extreme approach. Grants dried up regularly, but the advantage of academia is that degree programmes run for a very long time. As a result, we have managed to retain a steady base of people for science. Some of them have been teaching AI for 25 years or more."
"You could end up in a doomsday scenario where more and more science is artificially generated"
Access to adequate computing facilities also helps. Van den Bosch himself uses Snellius, the national supercomputer at SURF, to train and test large language models.
One hundred per cent transparent
In 2024, SURF wants to use this national capacity to develop its own GPT-NL. Does Van den Bosch see merit in that? "Yes, definitely. There is currently a kind of disorganised race between language models. Big tech companies are doing their best to produce better multilingual models, but at the same time all kinds of non-profits are involved, as well as clever tinkerers at government institutions and, of course, scientists."
"What SURF can do, together with TNO, the Netherlands Forensic Institute and plenty of input from academia, is be one hundred per cent transparent and critical about the composition of the training set right from the start. That would be a truly unique feature." After all, one of the biggest problems in AI is the lack of clarity about the material used to train systems.
"But in a few years, using that same approach, you will have to develop the next version of GPT-NL. Otherwise, something more efficient and therefore faster will overtake you on the hard shoulder. You see this in every digital infrastructure, but in AI the pace is extremely high."
Recognising quality
Meanwhile, Van den Bosch also leads other AI projects, such as BETTER-MODS. "The news site Nu.nl has a sizeable team of moderators that for years has had its hands full removing toxic comments under articles in time. At the same time, they have to pick out a few good, constructive contributions from what can be a thousand comments, and those are then shown first. BETTER-MODS should help with this."
Recognising quality is no easy task for an AI tool, but the team managed it. "Not 100 per cent, but tests show that we can mimic the behaviour of human moderators reasonably well."
How does it work? "Our best system pays particular attention to the form of the comments. Good contributions tend to be long, because they contain reasoning or well-developed thoughts. Like human moderators, the system also looks at who posted the comment. An online community has emerged in which some people express their thoughts in a way that fits well with journalistic content. Moderators and fellow users recognise those users in that role. You can see this on all kinds of platforms, because even in the online world it is still about people: how familiar they are and how connected they are. So it is important for an AI tool to recognise this too."
"As a scientist, you are mostly dealing with failure"
Doomsday scenario
You can leave a lot to AI, but for Van den Bosch there are limits. One of them is the reviewing of conference papers. "Researchers found that in 2023, many of those reviews appeared to have been written by computers. Reviewers apparently ran out of time and enlisted the help of AI. What was striking: suddenly many reviews were very positive, and their wording was typical of ChatGPT."
It worries him. "Highly specialised language models are now appearing that claim to be very good in, for example, the legal domain. What happens when papers are generated by that AI while the reviewers use it as well? Then you end up in a doomsday scenario of circular processes, where more and more science is artificially generated."
From his position at NWO, which is responsible for a large share of research funding in the Netherlands, Van den Bosch considers it important to reflect on this. "We cannot and do not want to stop researchers from using AI, even when they write a grant application, as long as they are transparent about it. But in the assessment process, we really don't want it."
Societal challenges
What else does Van den Bosch consider important for NWO? "Interdisciplinary research! Within 'my' domain and with other domains. That is perhaps my most pronounced ambition. Fortunately, I am not the only one who wants that."
"In the past, ministries and companies found it difficult to find their way to the social sciences and humanities, but I think the coronavirus pandemic awakened something. After all, it started as a medical problem, but very quickly questions arose that other disciplines were better placed to answer, such as language and communication sciences, psychology, sociology, organisational sciences, economics, law, history, ethics, philosophy... In fact, you need the social sciences and humanities for every major societal challenge. Because ultimately, science is also something by people, for people."
Magic formula
And yet... if you ask Van den Bosch what fascinates him most about his work, the engineer in him comes to the fore. "That an idea in IT really works if you program it properly. Like a magic formula that turns out to work. Because in practice, as a scientist you are mostly dealing with failure. The same goes for the people at OpenAI: they tinkered for years before, on the hundredth attempt so to speak, they came up with the technical idea that made ChatGPT possible. When it finally does succeed, that's really great."
Antal van den Bosch (1969)
- 1997-2011: Successively postdoc, (senior) lecturer and professor at Tilburg University
- 2011-2019: Professor at Radboud University
- 2012: Appointed member of the KNAW
- 2017-2022: Director of the Meertens Institute
- 2020: Appointed extraordinary professor of Language and Artificial Intelligence at the University of Amsterdam
- 2022: Appointed professor at Utrecht University
Text: Aad van de Wijngaart
Photos: Jelmer de Haas