Artificial intelligence is a hot topic that opens doors to new and exciting possibilities. By harnessing the power of machines and algorithms, many time-consuming tasks can be completed quickly and efficiently. Incorporating AI solutions in your workflow can help you make good progress in numerous areas. But how applicable is AI to archival materials? Meemoo, the Flemish Institute for Archives, along with its cultural and government partners, set out to explore this question.
Source reference: face recognition used on a video from LIBERAS
Over the past century, our lives have been captured in moving images for the first time. In Flanders, we have accumulated a wealth of audiovisual archival material through filming and recording. Managing and preserving these digital files for the future, as well as preparing them for reuse, is the essence of digital archiving. Digital repositories now contain a mass of digitised or born-digital materials. But a lack of proper descriptions often makes this valuable content difficult to search through.
How can artificial intelligence assist in managing such an archive? With the goal of making the wealth of Flemish audiovisual archives more accessible in an efficient and reliable way, meemoo took on the challenge of enriching the metadata for videos and audio clips from 125 cultural and government organisations.
There are numerous professional archivists and domain experts who perform important work by adding descriptions with excellent knowledge of the subject matter. These descriptions include details such as titles, people in the footage, genre, dates, brief summaries, and much more. Metadata enrichment – and so also the path to accessible archives – is an ongoing process that applies to all kinds of materials.
But what are the benefits of artificial intelligence? Manually adding descriptions is incredibly time-consuming, and archivists do not always have enough time available, or their time is too valuable to spend on such tasks. Automating some of these tasks gives archivists a helpful assistant. Artificial intelligence may not solve every problem, but it can be a good way to quickly add a large amount of uniform data. Enriching metadata in the same way across organisations also keeps descriptions consistent between archives.
Meemoo is applying three different AI approaches in the GIVE metadata project: face detection and recognition, speech recognition, and entity recognition. These techniques can provide solutions to questions such as:
Whenever possible, the generated names of individuals and recognised entities are linked to authoritative sources, such as Wikidata. These authorities act as a database of reliable information that:
enriches the new metadata with additional information;
helps to avoid confusion – after all, for example, Eddy Wally is also known as Eduard Van De Walle;
ensures uniformity across archives.
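To make the idea of an authority concrete, here is a minimal Python sketch of how several name variants could resolve to one record. It is an illustration only, not the GIVE project's implementation: the `AuthorityRecord` class, the helper functions and the identifier `EXAMPLE-ID-1` are all hypothetical, and the identifier is a placeholder rather than a real Wikidata ID.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorityRecord:
    """One entry in a (hypothetical) authority file."""
    authority_id: str          # placeholder value, not a real Wikidata identifier
    preferred_name: str
    aliases: tuple = ()


def normalise(name: str) -> str:
    """Lower-case the name and collapse repeated whitespace."""
    return " ".join(name.casefold().split())


def build_index(records: list) -> dict:
    """Map every known spelling (preferred name and aliases) to its record."""
    index = {}
    for record in records:
        for name in (record.preferred_name, *record.aliases):
            index[normalise(name)] = record
    return index


def link_name(name: str, index: dict) -> AuthorityRecord | None:
    """Return the authority record for a generated name, or None if unknown."""
    return index.get(normalise(name))


# Example data taken from the article's own example of a person with two names.
RECORDS = [
    AuthorityRecord("EXAMPLE-ID-1", "Eddy Wally", ("Eduard Van De Walle",)),
]
INDEX = build_index(RECORDS)
```

With this index, `link_name("Eduard Van De Walle", INDEX)` and `link_name("Eddy Wally", INDEX)` return the same record, so both spellings end up under one identifier, while an unknown name returns `None` and can be left unlinked.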
Applying these techniques to archival content is slightly more complex than described above, however.
It is clear that AI offers many possibilities, but there are also concerns. You can’t simply unleash artificial intelligence on any old content without careful consideration first. Face recognition, for example, involves sensitive biometric data and can potentially lead to discrimination. It is therefore crucial that the development and use of AI systems comply with existing regulatory frameworks (such as the GDPR), which is why meemoo is ensuring that everything is done as correctly and safely as possible before, during and after the project.
In the GIVE metadata project, we paid attention to the protection of personal data (GDPR), for example by only identifying faces of public figures and conducting a Data Protection Impact Assessment (DPIA). All the servers used are located within the European Union, and all AI-generated metadata is labelled to ensure transparency and avoid confusion in case of any incorrect descriptions.
We discussed the ethical aspects extensively under the guidance of the Knowledge Centre Data & Society. We also held workshops to consider the processes and tools in detail with all stakeholders, including individuals who could be recognised in videos, technicians implementing the processes, and archivists wanting to use the data for access. This ensured that every voice was heard and resulted in an approach that everyone could agree on.
Research has also shown that it is crucial to train facial detection and recognition models on a diverse dataset to minimise bias or prejudices related to gender, age and skin colour. To ensure that the system used in the GIVE project could handle this as effectively as possible, the existing open source models were checked using a sample.
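One simple way to check a model on a sample is to compare how often faces are detected in each group. The Python sketch below illustrates that idea under stated assumptions: the article does not say how the GIVE check was carried out, and the function names, the group labels and the sample data are all hypothetical.

```python
from collections import defaultdict


def detection_rate_by_group(sample):
    """Compute the share of detected faces per group.

    `sample` is an iterable of (group_label, was_detected) pairs, where the
    group label comes from a manually annotated test set (an assumption).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, detected in sample:
        totals[group] += 1
        if detected:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


def max_rate_gap(rates):
    """Return the difference between the best and worst group rate."""
    return max(rates.values()) - min(rates.values())


# Invented sample for illustration only: four faces in group "A", two in "B".
SAMPLE = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", True), ("B", False),
]
RATES = detection_rate_by_group(SAMPLE)
GAP = max_rate_gap(RATES)
```

A large gap between groups would be a signal to look more closely at the model or its training data before using it on an archive. What counts as an acceptable gap is a policy decision, not something the code can settle.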
Parliamentary debates in the Flemish Parliament, recordings of lectures by Jan Hoet at SMAK (Municipal Museum of Contemporary Art), documentaries preserved by Letterenhuis (House of Literature), and much, much more. Flanders is home to a wealth of audiovisual materials. And, with the help of artificial intelligence, a total of 126 cultural and government organisations are taking a big step towards having well-described and accessible archival content. Here’s an overview:
3.3 million detected individuals, 500 million transcribed words, and 6.5 million entities. Impressive results, but it would be a shame to let the story end here. Due to a tight schedule, there was only room for the audiovisual collections of cultural and government organisations in this project. But by the end of 2023, it will be the turn of the VRT and regional broadcaster archives. Meemoo also receives new materials in its archival system almost every day, which is why the established systems will remain in use after the GIVE project ends, so that each new object can be enriched directly.
This project was made possible with support from the European Regional Development Fund and is part of the Flemish Government’s Resilience Recovery Plan.