Importing EPMC data to Wikidata (or similar)

I’m aiming to import metadata about every article whose PMCID appears in Wikipedia into librarybase.wmflabs.org. This is a Wikibase installation, which is the same software that wikidata.org runs.

Metadata are obtained from the Europe PMC RESTful API using a custom Python library that I wrote. It is available at https://github.com/tarrow/epmclib.
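For readers who don’t want to use epmclib, here is a minimal standard-library sketch of the kind of request involved. It is not epmclib’s API. It assumes the public `search` endpoint with `format=json` and `resultType=core`. The response fields it reads (`resultList`, `authorList`, `authorId`, `fullName`) are assumed from the core JSON format, so check them against a live response.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Europe PMC RESTful search endpoint (assumed; see lead-in).
EPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(pmcid: str) -> str:
    """Search URL for a single article by PMCID, asking for full (core) JSON."""
    params = {"query": f"PMCID:{pmcid}", "format": "json", "resultType": "core"}
    return f"{EPMC_SEARCH}?{urlencode(params)}"


def parse_article(response: dict) -> Optional[dict]:
    """Pick out a few fields from a search response; None if nothing matched."""
    results = response.get("resultList", {}).get("result", [])
    if not results:
        return None
    record = results[0]
    authors = []
    for author in record.get("authorList", {}).get("author", []):
        author_id = author.get("authorId", {})
        # Keep the ORCID only when Europe PMC has linked one to this author.
        orcid = author_id.get("value") if author_id.get("type") == "ORCID" else None
        authors.append({"name": author.get("fullName"), "orcid": orcid})
    return {
        "pmcid": record.get("pmcid"),
        "doi": record.get("doi"),
        "title": record.get("title"),
        "authors": authors,
    }


def fetch_article(pmcid: str) -> Optional[dict]:
    """Fetch and parse metadata for one PMCID (makes a network request)."""
    with urlopen(build_search_url(pmcid), timeout=30) as resp:
        return parse_article(json.load(resp))
```

Parsing is kept separate from fetching so it can be tested against a saved response without hitting the network.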

The metadata are then pushed to the Wikibase installation by a custom script (also written by me) built on pywikibot, an external Python library for interacting with Wikibase installations, available at https://github.com/wikimedia/pywikibot-core. My script(s) will be available at https://github.com/tarrow/librarybase-pwb.
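As an illustration of what gets pushed, this sketch builds the JSON that Wikibase’s `wbeditentity` action accepts, with an English label and string-valued statements for one article. It is not the librarybase-pwb code. The property IDs `P1` and `P2` are hypothetical placeholders, because librarybase’s real property IDs will differ.

```python
# Hypothetical property IDs: librarybase's real IDs will differ.
PMCID_PROP = "P1"
DOI_PROP = "P2"


def string_statement(prop: str, value: str) -> dict:
    """A Wikibase statement whose main snak is a plain string value."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": prop,
            "datavalue": {"value": value, "type": "string"},
        },
        "type": "statement",
        "rank": "normal",
    }


def article_entity_data(article: dict) -> dict:
    """Build wbeditentity-style data (label plus claims) for one article."""
    claims = [string_statement(PMCID_PROP, article["pmcid"])]
    if article.get("doi"):  # not every article has a DOI
        claims.append(string_statement(DOI_PROP, article["doi"]))
    return {
        "labels": {"en": {"language": "en", "value": article["title"]}},
        "claims": claims,
    }
```

With pywikibot configured for librarybase (a family file is assumed), a dict like this could be passed to `ItemPage.editEntity` on a fresh `pywikibot.ItemPage(repo)` to create the item in a single edit.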

The script also queries a SPARQL endpoint at sparql.librarybase.wmflabs.org, which keeps a triple store of the data in librarybase.wmflabs.org. That address gives a 404; follow it through to get to the splash page. The code for the SPARQL endpoint is at https://github.com/wikimedia/wikidata-query-rdf.

Keeping the triple store up to date is currently a problem. The updater periodically fails, and a dump of triples has to be side-loaded manually to get past the point of failure. Talk to me if you’re interested in setting one up. I’m waiting for the updater to be stable before I write up the documentation.
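This is a sketch of the kind of lookup the script needs from the endpoint: given a batch of PMCIDs, which ones already have an item? It makes several assumptions. The WDQS-style query path in `SPARQL_URL` is a guess, and the URL scheme is unverified. The `wdt:` prefix is assumed to be declared for librarybase’s properties. `P1` is a hypothetical PMCID property. Result parsing follows the standard SPARQL 1.1 JSON results format.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: a WDQS-style query path on the librarybase endpoint.
SPARQL_URL = "https://sparql.librarybase.wmflabs.org/bigdata/namespace/wdq/sparql"
PMCID_PROP = "P1"  # hypothetical property ID for PMCID


def pmcid_query(pmcids: list) -> str:
    """SPARQL asking which of these PMCIDs already have an item."""
    # json.dumps quotes each ID; adequate as a SPARQL string literal for plain IDs.
    values = " ".join(json.dumps(p) for p in pmcids)
    return (
        "SELECT ?item ?pmcid WHERE {\n"
        f"  VALUES ?pmcid {{ {values} }}\n"
        f"  ?item wdt:{PMCID_PROP} ?pmcid .\n"
        "}"
    )


def parse_bindings(results: dict) -> dict:
    """Map PMCID -> item URI from SPARQL 1.1 JSON results."""
    return {
        row["pmcid"]["value"]: row["item"]["value"]
        for row in results["results"]["bindings"]
    }


def existing_items(pmcids: list) -> dict:
    """Query the endpoint (network request) for items matching these PMCIDs."""
    url = f"{SPARQL_URL}?{urlencode({'query': pmcid_query(pmcids)})}"
    req = Request(url, headers={"Accept": "application/sparql-results+json"})
    with urlopen(req, timeout=60) as resp:
        return parse_bindings(json.load(resp))
```

Batching PMCIDs into one `VALUES` clause keeps the number of round trips down. It also means a stale triple store causes whole batches to be misjudged, which is why a failing updater matters.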

Data can’t really be pushed until the endpoint is functional. It is needed to find which articles already exist, and to link authors who have an ORCID to all of their articles. I’m waiting to hear back from the maintainer of wikidata-query-rdf about how to stop the updater from breaking.
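This sketch shows why the endpoint has to come first. Suppose lookups from the triple store are available (PMCID → item and ORCID → item). A batch can then be split into the articles that still need creating and the ORCIDs that need a new author item. Each new author item is created only once, however many articles in the batch share that author. The function name and input shapes are hypothetical and are not taken from librarybase-pwb.

```python
def plan_import(articles: list, existing_articles: dict, existing_authors: dict):
    """Split a batch into articles to create and ORCIDs needing new author items.

    articles: dicts with "pmcid" and "authors" (each author has an optional "orcid").
    existing_articles: PMCID -> item ID already in the wikibase (from SPARQL).
    existing_authors: ORCID -> item ID already in the wikibase (from SPARQL).
    """
    # Skip articles that already have an item, rather than duplicating them.
    to_create = [a for a in articles if a["pmcid"] not in existing_articles]

    new_orcids = []
    seen = set(existing_authors)  # ORCIDs that already have (or will get) an item
    for article in to_create:
        for author in article["authors"]:
            orcid = author.get("orcid")
            if orcid and orcid not in seen:
                seen.add(orcid)
                new_orcids.append(orcid)
    return to_create, new_orcids
```

Authors without an ORCID are left out here, because without one there is no reliable way to tell whether two names are the same person.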