Google has hit the headlines again by announcing a new search engine that lets you search for datasets. The new platform is based on the open standard schema.org. Anyone who publishes data can describe their datasets using this open standard.
Researchers in the open data and science communities will clearly benefit from this platform. Google has also developed guidelines for dataset providers. You can read more, including an example of a dataset search, in Google’s announcement.
Open data repositories used
The Dataset Search platform, launched on Wednesday, searches through millions of open data repositories available on the web for the datasets you want. It scans publisher sites, digital libraries, and authors’ personal web pages, among other places. However, it relies on dataset publishers to label their datasets correctly with the appropriate information, or metadata tags, as they are otherwise known.
To create Dataset Search, Google developed guidelines that let dataset providers describe their data in a way that helps Google (and other search engines) better understand the content of their pages. These guidelines cover salient information about a dataset: who created it, when it was published, how the data was collected, what the terms are for using the data, and so on.
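As a rough illustration of what such a description might look like, the sketch below builds a schema.org Dataset record as JSON-LD and wraps it in the script tag commonly used to embed JSON-LD in a web page. The property names (name, description, creator, datePublished, license, measurementTechnique) are schema.org terms; the helper function build_dataset_markup and all of the example values are hypothetical and are not taken from Google's guidelines.

```python
import json


def build_dataset_markup(name, description, creator, date_published,
                         license_url, measurement_technique):
    """Return a schema.org Dataset description as a JSON-LD string.

    Hypothetical helper for illustration only.
    """
    markup = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Who created the dataset
        "creator": {"@type": "Organization", "name": creator},
        # When it was published
        "datePublished": date_published,
        # Terms for using the data
        "license": license_url,
        # How the data was collected
        "measurementTechnique": measurement_technique,
    }
    return json.dumps(markup, indent=2)


# All values below are made-up placeholders.
example_jsonld = build_dataset_markup(
    name="Daily rainfall measurements",
    description="Daily rainfall totals from a network of rain gauges.",
    creator="Example Weather Institute",
    date_published="2018-09-05",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    measurement_technique="Manual rain gauge readings",
)

# Embed the JSON-LD in a page by placing it inside a script tag.
script_tag = (
    '<script type="application/ld+json">\n' + example_jsonld + "\n</script>"
)
print(script_tag)
```

A publisher would place the resulting script tag in the HTML of the page that describes the dataset, so that a crawler can read the metadata alongside the human-readable content.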
Open standard based approach
The search platform then collects and links this information, analyses where different versions of the same dataset might be, and finds publications that may describe or discuss the dataset. The approach is based on an open standard for describing this information (schema.org), and anybody who publishes data can describe their dataset this way. Google also encourages dataset providers, large and small, to adopt this common standard so that all datasets are part of this robust ecosystem.
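To give a feel for how copies of the same dataset could be linked once providers use a common vocabulary, here is a minimal sketch that groups dataset records by the schema.org sameAs property. This is not Google's actual method, which the announcement does not detail; the function group_dataset_copies, the record layout, and the example URLs are all assumptions made for illustration.

```python
from collections import defaultdict


def group_dataset_copies(records):
    """Group dataset page URLs by the canonical URL they point to.

    Each record is a dict with a "url" key and an optional "sameAs" key
    naming the canonical copy. A record without "sameAs" is treated as
    its own canonical copy. Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for record in records:
        canonical = record.get("sameAs") or record["url"]
        groups[canonical].append(record["url"])
    return dict(groups)


# Made-up records: the second is a mirror of the first.
records = [
    {"url": "https://repo-a.example/rainfall", "name": "Daily rainfall"},
    {
        "url": "https://repo-b.example/mirror/rainfall",
        "name": "Daily rainfall",
        "sameAs": "https://repo-a.example/rainfall",
    },
    {"url": "https://repo-c.example/tides", "name": "Tide gauge readings"},
]

copies = group_dataset_copies(records)
for canonical, urls in copies.items():
    print(canonical, "->", urls)
```

The sketch only works when mirrors declare their canonical copy, which is one reason a shared standard matters: without consistent metadata, a search engine has far less to go on when linking versions.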
The Google Dataset Search beta website (available in multiple languages) can be found here.
Simply enter what you are looking for, and Google will guide you to the published dataset on the repository provider’s site.