The idea of this site is to collect various file formats used in materials science and chemistry, describe them with metadata, and provide links to software projects that can parse them.
This registry uses the Datatractor Schemas (repository) and enables programmatic use and discovery of extractors, as shown in our reference implementation.
By providing this data through a web API, we hope that users can discover new extractors more easily, and that metadata standards describing the output of extractors will be developed, enabling schemas to proliferate throughout the field.
The proof of concept for this project was devised as part of the MaRDA Metadata Extractors Working Group, and the work continues as the Datatractor project. Further discussions and instructions on how to get involved can be found at the Datatractor discussions page.
You can find out more about the project in the preprint: "Datatractor: Metadata, automation, and registries for extractor interoperability in the chemical and materials sciences", Matthew L. Evans, Gian-Marco Rignanese, David Elbert & Peter Kraus, arXiv:2410.18839 (2024).
The registry and the data within it are available under the terms of the MIT license.
Anyone can contribute file type and extractor entries to this registry by following the instructions in the contributing guidelines on GitHub.
After submitting a pull request, your contributions will be reviewed and validated. The entries are listed in this registry once your pull request is merged and deployed.