Wikidata Embedding Project Launches As Open Source Alternative To Big Tech AI


Wikimedia Deutschland has opened the Wikidata Embedding Project to developers worldwide, providing a free and transparent alternative to Big Tech’s closed AI systems by grounding generative AI in verifiable, multilingual knowledge.

Wikimedia Deutschland has announced the global public release of the Wikidata Embedding Project, making Wikidata’s structured knowledge directly usable for generative AI. The new vector database is free and publicly accessible at: https://wd-vectordb.toolforge.org

Positioned as an open-source alternative to Big Tech’s proprietary AI infrastructures, the project lets developers connect large language models (LLMs) to Wikidata, the world’s largest open knowledge graph with 119 million entries, to build more transparent, verifiable, and trustworthy applications. Developers and community members can also join a free webinar on October 9: https://www.wikidata.org/wiki/Event:Embedding_Project_Webinar

The project uses vector database technology for semantic search, enabling AI systems to retrieve concepts by meaning rather than keywords. It supports the Model Context Protocol (MCP), bridging LLMs with Wikidata’s structured data. The infrastructure is built on Jina.AI’s multilingual embedding model and DataStax’s Astra DB, with initial support for English, French, and Arabic, and more languages planned.
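To illustrate what retrieving "by meaning rather than keywords" involves, here is a minimal sketch of embedding-based search using cosine similarity. It does not call the project's actual API or the Jina embedding model. The item identifiers, labels, and three-dimensional vectors are invented for illustration; real embedding models produce vectors with hundreds of dimensions or more.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical items: IDs, labels, and vectors are made up for this sketch.
# In a real system the vectors would come from an embedding model.
ITEMS = {
    "item-a": ("river in Europe", [0.9, 0.1, 0.1]),
    "item-b": ("financial institution", [0.1, 0.9, 0.2]),
    "item-c": ("stream of water", [0.8, 0.2, 0.1]),
}


def semantic_search(query_vector, items, top_k=2):
    """Rank items by similarity to the query vector and keep the top_k."""
    scored = [
        (cosine_similarity(query_vector, vector), item_id, label)
        for item_id, (label, vector) in items.items()
    ]
    scored.sort(key=lambda entry: entry[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Assumed embedding of the query "waterway". It shares no keyword with
    # any label, yet it lies close to the two water-related items.
    query = [0.85, 0.15, 0.1]
    for score, item_id, label in semantic_search(query, ITEMS):
        print(f"{item_id}  {label}  similarity={score:.3f}")
```

A keyword search for "waterway" would match none of the three labels. The vector comparison still ranks the two water-related items above the unrelated one, because their embeddings point in a similar direction to the query's.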

Applications include fact-checking, classification, named entity disambiguation, zero-shot classification, semantic visualisation of the knowledge graph, GraphRAG, and reference linking for source attribution. Beyond generative AI, the system extends into knowledge assistants, semantic exploration, research, and education.

Lydia Pintscher, Wikidata Portfolio Lead, said: “We’re building infrastructure that empowers everyone to develop generative AI using verifiable, freely accessible data — an essential step toward a digital future where technology serves the public good by default, not exception.”

Philippe Saadé, Wikidata AI Project Manager, added: “This Embedding Project launch shows that powerful AI doesn’t have to be controlled by a handful of companies — it can be open, collaborative, and built to serve everyone.”
