technical structure of the split

Do I understand correctly that the way you are splitting is by the subject item in a triple? So ?a ?b ?c is in the scholarly subgraph if ?a is an instance of scholarly article, otherwise it's in the main subgraph? ArthurPSmith (talk) 19:36, 9 February 2024 (UTC)

@ArthurPSmith: At the moment of writing subclasses of scholarly articles are included. Proof: https://query-scholarly-experimental.wikidata.org/#select%20%3Finst%20%3Flabel%20%3Fcount%0Awith%20%7B%0A%20%20select%20%3Finst%20%28count%28%2a%29%20as%20%3Fcount%29%20where%20%7B%0A%20%20%20%20%5B%5D%20wdt%3AP31%20%3Finst%20.%0A%20%20%7D%20group%20by%20%3Finst%0A%7D%20as%20%25i%0Awhere%20%7B%0A%20%20include%20%25i%0A%20%20service%20%3Chttps%3A%2F%2Fquery.wikidata.org%2Fsparql%3E%20%7B%0A%20%20%20%20%3Finst%20rdfs%3Alabel%20%3Flabel%20.%20filter%28lang%28%3Flabel%29%3D%22en%22%29%0A%20%20%7D%0A%7D%20order%20by%20desc%28%3Fcount%29%0A

There seem to be some bad subclasses listed. Errors in the classification will obviously cause items to end up in the wrong graph, removing results from queries. Since it takes a month to initially populate a graph, this issue could be bad if left unaddressed. Infrastruktur (talk) 23:40, 12 February 2024 (UTC)

I'm not sure if items that are instances of, say, "scholarly article" and "parchment" end up in one of the graphs or get duplicated in both; the latter seems safer. Infrastruktur (talk) 23:55, 12 February 2024 (UTC)

Guess they are duplicated: https://query-main-experimental.wikidata.org/#select%20%3Fitem%20%3Flabel%20%0Awhere%20%7B%0A%20%20bind%28wd%3AQ24669646%20as%20%3Fitem%29%0A%20%20service%20%3Chttps%3A%2F%2Fquery.wikidata.org%2Fsparql%3E%20%7B%0A%20%20%20%20%3Fitem%20rdfs%3Alabel%20%3Flabel%20.%20filter%28lang%28%3Flabel%29%3D%22en%22%29%0A%20%20%7D%0A%7D Infrastruktur (talk) 00:25, 13 February 2024 (UTC)

Please disregard. My conclusions were incorrect. Infrastruktur (talk) 07:08, 13 February 2024 (UTC)

Hi @ArthurPSmith, good to hear from you! And thanks @Infrastruktur for the cool queries.

There are routines that pull the Wikidata dumps into HDFS, producing about 15 billion quads for the full graph - context, subject, predicate, object. The underlying transformations are composed of several parts, but I'll try to focus on the essential pieces.

Those quads undergo extraction in ScholarlyArticleSplitter.scala to create the split. I'll see if we can get some time on a Meet to go through it, but will try to summarize here what happens.

Scholarly graph:
1. Find subjects whose P31 is a scholarly article (Q13442814) and find all quads whose context matches those subjects.
2. Find the references for the elements in 1.
3. Get the values for the elements in 1 and 2.
4. Add those together to produce the scholarly graph.

Main ("non-scholarly") graph:
5. From the full graph, subtract the items identified in 1.
6. Then remove from 5 the references and values that are only attached to the scholarly graph, but keep any other references or values - some references and values are used in both graphs.

ABaso (WMF) (talk) 03:34, 13 February 2024 (UTC)

@ABaso (WMF): Thanks. By "values" (item 3) where the value is an item, are any triples/quads associated with that item also included, or only the item id/URI? I'm assuming properties are fully present in both graphs. Are lexemes only in the main graph? And do you understand what's going on with the timeouts that I and others seem to be seeing on this (see the Phabricator task comments from the last few days)? Is that something that can be fixed, or are we not querying correctly? ArthurPSmith (talk) 14:57, 13 February 2024 (UTC)

@ArthurPSmith Thanks. I'll try to respond piece by piece here, and appreciate your help getting down to specifics.

I think this may help in part on the question on triples (in HDFS, quads) associated with a value - here are a couple examples of what you might see in the scholarly graph:
https://w.wiki/9AFq
https://w.wiki/9AG2

For the timeouts: I haven't looked closely yet, but I think this may be the result of the summed times, particularly when BlazeGraph is exhibiting a sequenced behavior, exceeding 60 seconds, which is the base timeout value on the given endpoint. One way to mitigate this could be to bump the timeout on these experimental endpoints up to 2 minutes, to try to sidestep this for this experimental period. This way if one side takes one second and the other takes 60 seconds it's okay for now. We would probably want to examine this more closely later, as we wouldn't really want to just allow queries directed at one graph to get more time-expensive. Now, I have heard that there can be challenges when the results from a federated target are too big - and it's possible what we're seeing may be the symptom of that sort of thing manifesting in the merger of the results between the graphs for federated queries. Are folks seeing that, for the case of a federated query, the queries issued in isolation against their specific graph are both taking under 60 seconds apiece?

I'm seeing lexemes only in the main graph, at least based on looking for triples with a predicate of ontolex:lexicalForm. Regarding properties, is the question about whether the same set of wikibase:propertyType predicate triples exist in both graphs? If so, yes, that's the case, and the actual triples employed for a given property depend on item-to-property assignments. This said, is there a different aspect of this question to consider? Would you have an example in mind, though? ABaso (WMF) (talk) 23:23, 13 February 2024 (UTC)

@ABaso (WMF): thanks - some comments, a little out of order:

- on properties - I guess my main concern was with the auto-complete in finding properties by name, but it looks like auto-complete is searching both graphs for properties and items so no issue there.

For an example - say I want a list of all the authors, Wikidata ID and English name, on a particular paper. On the scholarly graph if I do this https://w.wiki/9AcH - I get the IDs as labels. If I query for any triple with author as subject I get nothing. So authors are present in the scholarly graph only as item IDs, with no statements of their own. The same query on the main graph gives nothing (that's what I would expect). If I try the same query using federation: https://w.wiki/9AcP - the labels work, and the list of author items is good (and very quick - 203 ms).

So great. But then what if I want some other data on these authors? If I try this:

select ?author ?b ?c WHERE {
  SERVICE <https://query-scholarly-experimental.wikidata.org/sparql> {
    SELECT ?author WHERE {
      wd:Q22683203 wdt:P50 ?author .
    }
  }
  ?author ?b ?c .
} LIMIT 1

I get a timeout. Am I doing federation wrong, or is something broken here? ArthurPSmith (talk) 15:38, 14 February 2024 (UTC)

It's a join order thing. ?author ?b ?c . hint:Prior hint:runLast true . Infrastruktur (talk) 16:05, 14 February 2024 (UTC)

Like this?
Use at
- https://query.wikidata.org/sparql
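PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX hint: <http://www.bigdata.com/queryHints#>
SELECT ?author ?b ?c
WHERE {
  # Blazegraph hint: evaluate this pattern last, once the SERVICE call has bound ?author
  ?author ?b ?c . hint:Prior hint:runLast true .
  SERVICE <https://query-scholarly-experimental.wikidata.org/sparql> {
    SELECT ?author
    WHERE {
      wd:Q22683203 wdt:P50 ?author .
    }
  }
}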
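
The six split steps that ABaso (WMF) lists earlier in this thread can be illustrated with a small Python sketch over toy quads. This is not the logic of ScholarlyArticleSplitter.scala, only a rough model of the description. The context labels `ctx:reference` and `ctx:value`, the node identifiers, and the rule that a reference or value is "attached" when it appears as the object of a quad are all assumptions made for illustration.

```python
from typing import NamedTuple, Iterable, Set, Tuple


class Quad(NamedTuple):
    context: str
    subject: str
    predicate: str
    obj: str


SCHOLARLY_ARTICLE = "wd:Q13442814"
REF_CONTEXT = "ctx:reference"  # hypothetical context for shared reference nodes
VALUE_CONTEXT = "ctx:value"    # hypothetical context for shared value nodes


def split_graph(all_quads: Iterable[Quad]) -> Tuple[Set[Quad], Set[Quad]]:
    """Return (scholarly, main) following steps 1-6 as described in the thread."""
    quads = set(all_quads)
    ref_nodes = {q.subject for q in quads if q.context == REF_CONTEXT}
    value_nodes = {q.subject for q in quads if q.context == VALUE_CONTEXT}

    def refs_of(source: Set[Quad]) -> Set[str]:
        return {q.obj for q in source if q.obj in ref_nodes}

    def values_of(source: Set[Quad]) -> Set[str]:
        return {q.obj for q in source if q.obj in value_nodes}

    # Step 1: subjects with P31 = scholarly article, plus every quad in their context.
    items = {q.subject for q in quads
             if q.predicate == "wdt:P31" and q.obj == SCHOLARLY_ARTICLE}
    item_quads = {q for q in quads if q.context in items}
    # Step 2: references used by those quads.
    s_refs = refs_of(item_quads)
    ref_quads = {q for q in quads
                 if q.context == REF_CONTEXT and q.subject in s_refs}
    # Step 3: values used by the quads from steps 1 and 2.
    s_values = values_of(item_quads | ref_quads)
    value_quads = {q for q in quads
                   if q.context == VALUE_CONTEXT and q.subject in s_values}
    # Step 4: the scholarly graph is the union.
    scholarly = item_quads | ref_quads | value_quads

    # Step 5: full graph minus the scholarly items.
    remainder = quads - item_quads
    # Step 6: drop references/values used only by the scholarly side.
    main_entities = {q for q in remainder
                     if q.context not in (REF_CONTEXT, VALUE_CONTEXT)}
    m_refs = refs_of(main_entities)
    m_ref_quads = {q for q in remainder
                   if q.context == REF_CONTEXT and q.subject in m_refs}
    m_values = values_of(main_entities | m_ref_quads)
    only_scholarly = (s_refs - m_refs) | (s_values - m_values)
    main = {q for q in remainder
            if not (q.context in (REF_CONTEXT, VALUE_CONTEXT)
                    and q.subject in only_scholarly)}
    return scholarly, main


# Toy data: one paper (Q1) with an author (Q2); one reference is shared.
DEMO_QUADS = [
    Quad("wd:Q1", "wd:Q1", "wdt:P31", SCHOLARLY_ARTICLE),
    Quad("wd:Q1", "wd:Q1", "wdt:P50", "wd:Q2"),
    Quad("wd:Q1", "wds:s1", "prov:wasDerivedFrom", "ref:shared"),
    Quad("wd:Q1", "wds:s2", "prov:wasDerivedFrom", "ref:only_scholarly"),
    Quad("wd:Q2", "wd:Q2", "wdt:P31", "wd:Q5"),
    Quad("wd:Q2", "wds:s3", "prov:wasDerivedFrom", "ref:shared"),
    Quad(REF_CONTEXT, "ref:shared", "pr:P248", "wd:Q3"),
    Quad(REF_CONTEXT, "ref:only_scholarly", "prv:P813", "val:v1"),
    Quad(VALUE_CONTEXT, "val:v1", "wikibase:timeValue", "2024-02-13"),
]
```

In this toy model the author `wd:Q2` shows up in the scholarly graph only as the object of the paper's P50 triple, with none of its own statements, which matches what ArthurPSmith observed; the shared reference ends up in both graphs.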