query-3aae288a0053db4ca8d57394a840fd38

Commons category property does not match Commons category sitelink

I'm basically trying to find items where P373 != the Commons sitelink, restricted to sitelinks that start with "Category:". This mostly works, except that when the sitelink contains a non-Latin character the encodings don't match up, so the item comes back as a false positive. Any suggestions for fixing this, or for going about it another way, please?

Use at

PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX schema: <http://schema.org/>

SELECT ?item ?commonscat ?sitelink (replace(str(replace(str(?sitelink), ".*Category:", "")), "_", " ") as ?sitelink2) WHERE {
  ?item wdt:P373 ?commonscat.
  ?sitelink schema:about ?item.
  ?sitelink schema:isPartOf <https://commons.wikimedia.org/>.
  FILTER (CONTAINS(str(?sitelink),'Category:')) .
  FILTER(?commonscat != (replace(str(replace(str(?sitelink), ".*Category:", "")), "_", " "))) .
}
LIMIT 20
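
One possible way around the false positives, sketched on the assumption that the mismatch arises because the sitelink IRI is percent-encoded while P373 holds a plain string: the Wikidata Query Service offers a `wikibase:decodeUri` function that reverses percent-encoding. The variant below decodes the sitelink before comparing. The `BIND` step and the `?sitelinkTitle` variable name are illustrative choices rather than part of the original query, and the sketch has not been run against the live endpoint.

```sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>

SELECT ?item ?commonscat ?sitelink ?sitelinkTitle WHERE {
  ?item wdt:P373 ?commonscat.
  ?sitelink schema:about ?item.
  ?sitelink schema:isPartOf <https://commons.wikimedia.org/>.
  FILTER(CONTAINS(STR(?sitelink), "Category:"))
  # Decode percent-escapes first (the assumed cause of the mismatch), then
  # strip everything up to "Category:" and turn underscores into spaces.
  BIND(REPLACE(REPLACE(wikibase:decodeUri(STR(?sitelink)), ".*Category:", ""), "_", " ") AS ?sitelinkTitle)
  FILTER(?commonscat != ?sitelinkTitle)
}
LIMIT 20
```

Binding the cleaned title once also avoids repeating the nested `replace` expression in both the projection and the filter.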

Query found at

graph TD
  classDef projected fill:lightgreen;
  classDef literal fill:orange;
  classDef iri fill:yellow;
  v1("?commonscat"):::projected
  v3("?item"):::projected
  v2("?sitelink"):::projected
  v4("?sitelink2")
  c9([https://commons.wikimedia.org/]):::iri
  f0[["?commonscat != replace(str(replace(str(?sitelink),'.*Category:','')),'_',' ')"]]
  f0 --> v1
  f0 --> v2
  f1[["contains(str(?sitelink),'Category:')"]]
  f1 --> v2
  v3 --"wdt:P373"--> v1
  v2 --"schema:about"--> v3
  v2 --"schema:isPartOf"--> c9
  bind2[/"replace(str(replace(str(?sitelink),'.*Category:','')),'_',' ')"/]
  v2 --o bind2
  bind2 --as--o v4