November 2015 – Naive Logic

Plans for the first week of December

This week will be my 4th week at EuropePMC and I hope to achieve a huge amount.

I have two strands to the work I'll be doing: 1) taking data from the Wikimedia community about EuropePMC and the papers contained within it; 2) taking data from EuropePMC and trying to make it more available to the Wikimedia community.

I shall be further analysing the mwcites data, in particular trying to resolve the IDs it extracted. To do this I'll revamp the crude epmclib utility I wrote earlier this month so that by default it caches the data it downloads, and then use it to resolve every ID found. It may even be integrated into mwcites so that this happens automatically when analysing the dumps; I'll have to have a think about it.
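A minimal sketch of the "resolve once, cache forever" idea, assuming mwcites labels its IDs with the types `pmc` and `pmid` and that a Europe PMC search returning at least one hit counts as "resolved". `CachedResolver`, `to_query` and `fetch_hit_count` are hypothetical names for illustration, not epmclib's actual API, and the `PMCID:`/`EXT_ID:`/`SRC:MED` query syntax is my reading of the Europe PMC search fields, worth checking against their docs.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Optional

# Europe PMC's REST search endpoint; the query syntax below is an assumption
# about its search fields and should be checked against the documentation.
SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def fetch_hit_count(query: str) -> int:
    """Return how many Europe PMC records match `query` (needs network access)."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    with urllib.request.urlopen(f"{SEARCH_URL}?{params}") as resp:
        return int(json.load(resp)["hitCount"])


def to_query(id_type: str, value: str) -> str:
    """Build a search query for an mwcites-style (type, id) pair."""
    value = value.strip()
    if id_type == "pmc":
        # Accept PMCIDs with or without the "PMC" prefix.
        digits = value[3:] if value.upper().startswith("PMC") else value
        return f"PMCID:PMC{digits}"
    if id_type == "pmid":
        return f"EXT_ID:{value} AND SRC:MED"
    raise ValueError(f"unsupported id type: {id_type}")


class CachedResolver:
    """Hypothetical epmclib-style helper: each distinct query hits the API once."""

    def __init__(self, fetch: Callable[[str], int] = fetch_hit_count,
                 cache_path: Optional[str] = None):
        self.fetch = fetch
        self.cache_path = cache_path
        self.cache: Dict[str, bool] = {}
        if cache_path:
            try:
                with open(cache_path) as f:
                    self.cache = json.load(f)
            except FileNotFoundError:
                pass  # no cache on disk yet; save() will create it

    def resolves(self, id_type: str, value: str) -> bool:
        # Key on the normalised query so "123" and "PMC123" share one entry.
        query = to_query(id_type, value)
        if query not in self.cache:
            self.cache[query] = self.fetch(query) > 0
        return self.cache[query]

    def save(self) -> None:
        """Write the cache to disk so the next run skips known IDs."""
        if self.cache_path:
            with open(self.cache_path, "w") as f:
                json.dump(self.cache, f)
```

Passing `fetch` in as a parameter keeps the caching logic testable offline, and makes it easy to swap in a batched or rate-limited fetcher later.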

Hopefully I can then judge whether the majority of the citations found are legitimate. If so, I'll make some nice annotated plot.ly graphs showing when each citation was first added to Wikipedia.
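The "when was this first cited" data behind such a graph could be computed roughly like this. It assumes the mwcites output is a tab-separated file with a header row containing `page_id`, `page_title`, `rev_id`, `timestamp`, `type` and `id`, and that timestamps are ISO 8601 strings (so string comparison orders them correctly); `read_mwcites`, `first_cited` and `first_cited_per_year` are hypothetical helper names.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, TextIO, Tuple


def read_mwcites(tsv: TextIO) -> Iterable[Dict[str, str]]:
    """Yield one dict per citation row (assumes a header line is present)."""
    return csv.DictReader(tsv, delimiter="\t", quoting=csv.QUOTE_NONE)


def first_cited(rows: Iterable[Dict[str, str]]) -> Dict[Tuple[str, str], str]:
    """Map each (type, id) pair to the earliest revision timestamp citing it."""
    earliest: Dict[Tuple[str, str], str] = {}
    for row in rows:
        key = (row["type"], row["id"])
        ts = row["timestamp"]
        # ISO 8601 timestamps sort lexically in chronological order.
        if key not in earliest or ts < earliest[key]:
            earliest[key] = ts
    return earliest


def first_cited_per_year(earliest: Dict[Tuple[str, str], str]) -> Counter:
    """Count how many IDs were first cited in each year."""
    return Counter(ts[:4] for ts in earliest.values())
```

The resulting `Counter` maps years to counts, which is exactly the x/y pair a bar chart needs; the plotting itself is left out here.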

I'll also be trying to get a continually updated SPARQL endpoint for librarybase working so that it can be queried quickly. If this is possible, I should be able to finish the pywikibot script so that it can start putting all the PMCIDs and PMIDs that appear on Wikipedia (according to mwcites) into librarybase.
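One thing a SPARQL endpoint makes easy is checking which IDs are already in librarybase before the bot adds anything, so it doesn't create duplicates. A rough sketch: the endpoint URL, the property prefix and the property ID `P123` are all hypothetical placeholders, but the request follows the standard SPARQL protocol and the parsing follows the W3C SPARQL JSON results format.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterable, List, Set

# Hypothetical placeholders: the real endpoint, prefix and property will differ.
ENDPOINT = "https://example.org/librarybase/sparql"
PROP_PREFIX = "https://example.org/prop/direct/"
PMCID_PROPERTY = "P123"

QUERY = f"""PREFIX wdt: <{PROP_PREFIX}>
SELECT ?pmcid WHERE {{ ?item wdt:{PMCID_PROPERTY} ?pmcid . }}"""


def run_query(query: str, endpoint: str = ENDPOINT) -> dict:
    """Send a SPARQL query via HTTP GET and return the JSON results (needs network)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    req = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def bound_values(results: dict, var: str) -> Set[str]:
    """Collect the values bound to `var` across all result rows."""
    return {b[var]["value"] for b in results["results"]["bindings"] if var in b}


def still_missing(found_on_wikipedia: Iterable[str],
                  already_in_base: Set[str]) -> List[str]:
    """IDs seen on Wikipedia (e.g. via mwcites) that librarybase lacks, sorted."""
    return sorted(set(found_on_wikipedia) - already_in_base)
```

Usage would be along the lines of `still_missing(mwcites_pmcids, bound_values(run_query(QUERY), "pmcid"))`, with only the result handed to the bot.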

Finally, after discussions on Friday with Joe Wass from Crossref, I hope to roll out a live feed of citations containing PMIDs/PMCIDs from Wikipedia's recent changes.
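The core of such a feed is spotting IDs that an edit *adds*, by comparing the old and new revision text. The sketch below doesn't cover subscribing to the recent-changes feed itself. Its regexes are my own assumptions covering `pmid =` / `pmc =` template parameters and bare `PMC123` strings. They are not the patterns mwcites uses.

```python
import re
from typing import Set, Tuple

# Assumed patterns: citation-template parameters such as "| pmid = 123" or
# "| pmc = PMC456", plus bare PMCIDs like "PMC456" anywhere in the text.
PMID_PARAM = re.compile(r"\|\s*pmid\s*=\s*(\d+)", re.IGNORECASE)
PMC_PARAM = re.compile(r"\|\s*pmc\s*=\s*(?:PMC)?(\d+)", re.IGNORECASE)
BARE_PMC = re.compile(r"\bPMC(\d+)\b")


def extract_ids(wikitext: str) -> Set[Tuple[str, str]]:
    """Return the set of ("pmid" | "pmc", digits) pairs found in wikitext."""
    ids = {("pmid", m) for m in PMID_PARAM.findall(wikitext)}
    ids |= {("pmc", m) for m in PMC_PARAM.findall(wikitext)}
    ids |= {("pmc", m) for m in BARE_PMC.findall(wikitext)}
    return ids


def added_ids(old_text: str, new_text: str) -> Set[Tuple[str, str]]:
    """IDs present after an edit that were not present before it."""
    return extract_ids(new_text) - extract_ids(old_text)
```

Working with sets means an ID that was merely moved within a page doesn't show up as a new citation, which seems like the right behaviour for a feed of new citations.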

Work already done

So far I have been working on two things: taking data from Wikimedia about EuropePMC, and looking at ways to make data held by EuropePMC accessible to the Wikimedia community.

For the former I have been using the excellent mwcites utility created by Aaron Halfaker (and some output he kindly generated with it). I have been doing some simple analysis of the PMCIDs it found in English Wikipedia dumps and making some rather nice graphs with plot.ly.
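As an example of the kind of simple analysis that feeds those graphs, here is a sketch that ranks PMCIDs by how many distinct pages cite them. It assumes a tab-separated mwcites file with a header row including `page_id`, `type` and `id`, and that PMCIDs carry the type `pmc`; the function names are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Set, TextIO, Tuple


def pages_per_pmcid(tsv: TextIO) -> Dict[str, Set[str]]:
    """Map each PMCID to the set of page IDs that cite it."""
    pages: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.DictReader(tsv, delimiter="\t", quoting=csv.QUOTE_NONE):
        if row["type"] == "pmc":
            pages[row["id"]].add(row["page_id"])
    return pages


def most_cited(pages: Dict[str, Set[str]], n: int = 10) -> List[Tuple[str, int]]:
    """Top-n PMCIDs by number of citing pages (ties broken by ID)."""
    counts = [(pmcid, len(p)) for pmcid, p in pages.items()]
    counts.sort(key=lambda item: (-item[1], item[0]))
    return counts[:n]
```

Counting distinct pages rather than raw rows stops one heavily edited article from inflating a paper's count, since the same citation reappears in many revisions.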

While trying to annotate this work I discovered that some fraction of the citations found by the utility were not correct (the IDs did not resolve as PMCIDs), which needs further investigation. The fraction may be very small or it may be non-negligible.
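One cheap way to tell "very small" from "non-negligible" without resolving every ID is to resolve a random sample and put a confidence interval on the failure rate. The sketch below uses the Wilson score interval, a standard technique I'm swapping in here, not something from mwcites. `resolves` stands for any ID-checking function (for example a cached Europe PMC lookup) and is an assumption.

```python
import math
import random
from typing import Callable, Hashable, Sequence, Tuple


def wilson_interval(k: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion k/n."""
    if n == 0:
        raise ValueError("need at least one sample")
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def estimate_unresolved(ids: Sequence[Hashable],
                        resolves: Callable[[Hashable], bool],
                        n: int, seed: int = 0) -> Tuple[float, Tuple[float, float]]:
    """Resolve a random sample of IDs; return the failure rate and its interval."""
    rng = random.Random(seed)
    sample = rng.sample(list(ids), min(n, len(ids)))
    k = sum(1 for i in sample if not resolves(i))
    return k / len(sample), wilson_interval(k, len(sample))
```

The Wilson interval behaves sensibly even when zero failures turn up in the sample. In that case it still reports a non-zero upper bound (about 3.7% for 100 samples) instead of claiming the error rate is exactly zero.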

I have also been writing a script using pywikibot to push data held by EuropePMC into a Wikibase repository (Wikibase is the same software that runs Wikidata).
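Much of such a script is mapping a Europe PMC record onto Wikibase's JSON entity format. Here is a sketch of that mapping step, assuming a record dict with `title`, `pmid`, `pmcid` and `doi` keys (as in Europe PMC search results). The property IDs `P1`/`P2`/`P3` are hypothetical placeholders, and the function names are made up for illustration.

```python
from typing import Dict

# Hypothetical property IDs in the target Wikibase; the real ones will differ.
PROPERTY_FOR: Dict[str, str] = {"pmid": "P1", "pmcid": "P2", "doi": "P3"}


def string_claim(prop: str, value: str) -> dict:
    """A single string-valued statement in Wikibase JSON form."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": prop,
            "datavalue": {"value": value, "type": "string"},
        },
        "type": "statement",
        "rank": "normal",
    }


def record_to_entity(record: dict, lang: str = "en") -> dict:
    """Turn a Europe PMC-style record into label + identifier claims."""
    claims = [
        string_claim(PROPERTY_FOR[field], str(record[field]))
        for field in ("pmid", "pmcid", "doi")
        if record.get(field)  # skip identifiers the record doesn't have
    ]
    return {
        "labels": {lang: {"language": lang, "value": record["title"]}},
        "claims": claims,
    }
```

A dict like this is the shape of payload Wikibase's `wbeditentity` API takes, so it could then be handed to something like pywikibot's `ItemPage.editEntity`. Keeping the mapping as a pure function means it can be checked without touching a live wiki.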

A Classic First Post

As you'd imagine for any new blog, I'm publishing that most stereotypical of posts: the first post.

Here I discuss what this blog is about and so on.

I'll be talking here about a variety of things, but principally about the work I am currently undertaking at the EBI, where I am an intern/trainee in the EuropePMC group run by Jo McEntyre.

I'm principally looking at making links and collaborations between EuropePMC and the Wikimedia movement. In particular, I'm looking at ways to make some of the data held by EuropePMC accessible through Wikidata, and also at analysing how prevalent papers held by EuropePMC are among the citations in Wikipedia.