query-9779963127e93cba563a2c9db5e37c78
Let’s save that query. """ } FILTER(CONTAINS(?commonName, "í") && CONTAINS(?commonName, " ")). ?item wdt:P1843 ?commonName. { WHERE SELECT ?item """ = queryWe also now need the page generators package, so let’s import it: ;pg as pagegenerators import pywikibot fromAnd get a SPARQL browser for our query: )site ,query(WikidataSPARQLPageGenerator.pg = generatorAnd let’s first just browse the query results: loop over the iterator, print the taxon common name. )text.()getTarget.claim(print ]:"P1843"]["claims"()[get.item in claim for :generator in item forThe results look good, they all look like labels that we want to fix. So let’s try that. But first, we need to recreate the generator, because it’s now exhausted and we can’t iterate over it a second time )site ,query(WikidataSPARQLPageGenerator.pg = generatorAnd then, edit the items as before – but this time in the loop! break ))text.target ,text ,item(format."{} to {} to change {}"Edited (print )"Fix broken taxon common name: see https://www.wikidata.org/wiki/User:TweetsFactsAndQueries/Darwin%C3%ADs_Fox"=summary(editEntity.item )"ä" ,"‰"(replace.)"’" ,"í"(replace.text = text.target :text in "í" if text.target = text ()getTarget.claim = target ]:"P1843"]["claims"()[get.item in claim for :generator in item for), so this actually does edit all the items. Turns out that’s not too bad, because my account doesn’t have the bot flag, so pywikibot has to sleep for about ten seconds after each edit to prevent running into rate limits. If the edits had turned out to be bad, I could’ve just stopped the program (with the big button at the top of the notebook) and reverted the bad edits manually. But they look good, so I just let the thing run until it’s done. The SPARQL query returned 336 results earlier, so at ten seconds per edit, we can estimate that this will take about an hour. for item in generator), not the outer loop (for claim in ... in there because automated editing is still scary, so I wanted to edit only one item at first. But I forgot that this only breaks the inner loop (breakI put that Alright – an hour later, that’s done. We’re not done, though: Many of those items also have aliases that are broken in the same way. We start again with Darwin’s fox, and look at its aliases: ]"aliases"()[get.darwins_fox = aliases method.) editAnd then we fix them in a loop. (Remember, this won’t do anything until we call some )"ä" ,"‰"(replace.)"’" ,"í"(replace.alias = ]index][lang[aliases :alias in "í" if ]):lang[aliases(enumerate in alias ,index for :aliases in lang for method. editAliasesWe do the edit – this time, with the )"See https://www.wikidata.org/wiki/User:TweetsFactsAndQueries/Darwin%C3%ADs_Fox"=summary(editEntity.darwins_fox. (Q757402)Tanimbar Corella There is another complication. In some items, the fixed alias is already added – as in, they have both “somethíng” and “someth’ng” as alias. What happens in this case with the aliases? Let’s try it out with one sample item, )"See https://www.wikidata.org/wiki/User:TweetsFactsAndQueries/Darwin%C3%ADs_Fox"=summary ,aliases(editAliases.goffins_cockatoo # prints True: those two aliases are now the same ]2]["en"[aliases == ]1]["en"[aliases )"ä" ,"‰"(replace.)"’" ,"í"(replace.alias = ]index][lang[aliases :alias in "í" if ]):lang[aliases(enumerate in alias ,index for :aliases in lang for ]"aliases"()[get.goffins_cockatoo = aliases )"Q757402" ,repo(ItemPage.pwb = goffins_cockatoo present twice: it looks like either pywikibot or the Wikidata API quietly removed the duplicate alias. That’s exactly what we want, so no need to worry here! notA quick look at the item page confirms that the alias is now Let’s move on to the mass edit, then! Here’s a SPARQL query that finds all items with an alias that contains broken characters and, when fixed, is identical to the taxon common name of the item.
Use at
- https://query.wikidata.org/sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?item
WHERE
{
?item wdt:P1843 ?commonName;
skos:altLabel ?alias.
FILTER(CONTAINS(?alias, "í") && CONTAINS(?alias, " ")).
FILTER(REPLACE(REPLACE(STR(?alias), "í", "’"), "‰", "ä") = STR(?commonName)).
}