Shortcuts: WD:RAQ, w.wiki/LX

Wikidata:Request a query

Help with WDGS

Hi, I have a number of queries written as part of a project, Wikidata:WikiProject LSEThesisProject, and will need to rewrite them due to the graph split. My SPARQL knowledge is basic, and the queries were produced by trial and error, by modifying others' queries, and with kind help from the community. In preparation for learning how I might rewrite those queries I tried, using the Federation Guide, to write federated queries which would pick up all research outputs produced by an academic. This includes not only scholarly articles but also book chapters, versions, editions or translations, blog posts and articles. In the main graph as it was, all of these can be picked up in one query, https://w.wiki/B6Ct, but I'm failing to rewrite this for the scholarly graph. I've tried

SELECT ?item ?itemLabel ?itemType ?itemTypeLabel
WHERE
{
  ?item wdt:P50 wd:Q17508688.
  SERVICE wdsubgraph:wikidata_main {
    ?item wdt:P50 wd:Q17508688.
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],mul,en". } # Helps get the label in your language, if not, then default for all languages, then en language
}

This gives me no results.


And I've tried

SELECT ?item ?itemLabel ?itemType ?itemTypeLabel
WHERE
{
  ?item wdt:P50 wd:Q17508688.
  UNION
  { SERVICE wdsubgraph:wikidata_main { ?item wdt:P50 wd:Q17508688} }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],mul,en". } # Helps get the label in your language, if not, then default for all languages, then en language
}

Which gives an error message and says the query is malformed at UNION.

Would someone be able to point out what I'm doing wrong and show me how to produce these queries.

Thanks HelsKRW (talk) 08:40, 4 September 2024 (UTC)[reply]

@HelsKRW The UNION requires the parts to be wrapped with curly brackets:
  { ?item wdt:P50 wd:Q17508688. } 
  UNION 
  { SERVICE wdsubgraph:wikidata_main { ?item wdt:P50 wd:Q17508688}  }
Here below should be your query rewritten (to run on https://query-main.wikidata.org/):
SELECT ?item ?itemLabel ?itemType ?itemTypeLabel WHERE {
  VALUES (?author) {(wd:Q17508688)}
  {
    # get the publications from the scholarly subgraph 
    SERVICE wdsubgraph:scholarly_articles {
      ?item wdt:P50 ?author ;
            wdt:P31 ?itemType
      # Instruct the label service to gather the label of the publication
      # The label for ?itemType will be fetched in the host query, the type is probably part of the main graph
      BIND(?itemLabel AS ?itemLabel)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
  } UNION {
    # Union them with the publications in the main graph (blogs, articles...)
    ?item wdt:P50 ?author ;
          wdt:P31 ?itemType
  }  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
DCausse (WMF) (talk) 11:28, 4 September 2024 (UTC)
Thank you very much for your help. I've modified the query I'd written for the scholarly graph which is now working and I can see that the longer query you've written for the main graph is also working. Could you tell me more about how to know when the query should be written on the scholarly graph or the main graph? And would you be able to tell me more about the VALUES, BIND and UNION commands in the query you've written for the main graph. Using this query I've tried modifying some other queries, but I'm hitting up against a series of error messages and despite reading the federated guide am struggling to understand or get to grips with how to write a federated query. Thanks HelsKRW (talk) 10:25, 5 September 2024 (UTC)[reply]
Unfortunately, while writing Wikidata:SPARQL_query_service/WDQS_graph_split/Internal_Federation_Guide I could not find a reasonable and comprehensive set of characteristics to determine whether it's better to use query-main or query-scholarly for the host query. Generally both are doable, but for certain queries using one or the other greatly impacts the complexity of the query.
What I would suggest is perhaps using query-main first (this is the one I most often used when writing Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples) and consider using query-scholarly if the query happens to be difficult to write. I hope that with more examples we can improve the guide over time.
  • VALUES is a SPARQL feature that binds a variable to one or more given values. I used it to avoid repeating wd:Q17508688 in the two clauses around UNION, so that you can change it in a single place when you want to see the publications of another author.
  • BIND(?itemLabel AS ?itemLabel) is a trick we use to make wikibase:label understand that we want to keep the label of the item; this is explained at Wikidata:SPARQL_query_service/WDQS_graph_split/Internal_Federation_Guide#Misplacing_the_label_service. In general, though, BIND creates a variable: for instance, in place of VALUES (?author) {(wd:Q17508688)} I could have written BIND(wd:Q17508688 AS ?author).
  • UNION collects the results of multiple expressions: { EXPRESSION1 } UNION { EXPRESSION2 }. In the query above, EXPRESSION1 extracts the scientific publications (?item) and their labels (?itemLabel) from the scholarly subgraph, and EXPRESSION2 collects the other publications (blogs, articles) from the host service (here serving the wikidata_main graph). DCausse (WMF) (talk) 13:11, 5 September 2024 (UTC)
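As a small illustration of the VALUES point above, the sketch below binds ?author to two authors at once, so the rest of the pattern is written only once. wd:Q42 (Douglas Adams) is an assumption added purely for illustration, and the query only looks at the graph served by the endpoint it runs on, without any federation.

```sparql
# Minimal sketch: VALUES binds ?author to each listed item in turn.
# wd:Q42 is an illustrative second author, not part of the original question.
SELECT ?author ?item WHERE {
  VALUES ?author { wd:Q17508688 wd:Q42 }
  ?item wdt:P50 ?author .
}
LIMIT 50
```

Swapping or adding authors then only means editing the VALUES line.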
Thank you, In practice I seem to be struggling with the UNION command - I've tried it in multiple queries and always get an error message, whatever combination of curly brackets I try!
If I take this query from my thesis project https://w.wiki/5aHL which gives me a list of LSE’s doctoral theses with author links to Wikipedia pages where available, and try to re-write it for the new main graph... I edit it to include the hint optimizer,  the SERVICE scholarly graph and BIND – the query runs, but gives me no results   https://w.wiki/B7Fj
So I try to add in the UNION command, but whatever I do with curly bracket combinations I get an error message so can’t run the query
SELECT ?thesis ?thesisDescription ?thesisLabel ?author ?authorLabel ?authorwp ?lse_url WHERE {
  hint:Query hint:optimizer "None" .
  SERVICE wdsubgraph:scholarly_articles {
  
  ?thesis wdt:P31/wdt:P279* wd:Q1266946 ;
   wdt:P953 ?lse_url.
  
    BIND(?thesisLabel AS ?thesisLabel)
     SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  }
  } UNION {
   # Union them with the publications in the main graph (blogs, articles...)
    ?thesis wdt:P31/wdt:P279* wd:Q1266946 ;
   wdt:P953 ?lse_url.
  } 
  OPTIONAL {
   ?thesis wdt:P50 ?author.
   OPTIONAL {
     ?authorwp schema:about ?author;
      schema:isPartOf https://en.wikipedia.org/.
   }
  }
FILTER(STRSTARTS(STR(?lse_url), http://etheses.lse.ac.uk))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY (?thesisDescription)
Are you able to advise what I’m doing wrong on this one?  HelsKRW (talk) 10:10, 6 September 2024 (UTC)[reply]
@HelsKRW Your query is syntactically incorrect because the opening and closing curly brackets are not balanced. With complicated queries like this I strongly suggest using a proper wikipedia:Indentation_style so that you can quickly identify where the problem is.
Every time a curly bracket is opened, indent the next line by 2 spaces to the right; when closing one, remove 2 spaces. Open or close only one curly bracket per line. With your query you could perhaps have identified that the problem is right before the UNION, where you have an extra closing curly bracket.
Similarly when not repeating the subject in the patterns (when using ;) try to align the predicates like this:
?thesis wdt:P31/wdt:P279* wd:Q1266946 ;
        wdt:P953 ?lse_url .
So that it's clearer that the wdt:P953 applies to the ?thesis.
Beyond that, several other things were incorrect:
Please see below your query rewritten with federation (to run on query-main) and some explanations in the comments:
SELECT
  ?thesis
  ?thesisDescription
  ?thesisLabel
  (COALESCE(IF(BOUND(?author), ?author, 'N/A')) AS ?author)
  ?authorLabel
  (COALESCE(IF(BOUND(?authorwp), ?authorwp, 'N/A')) AS ?authorwp)
  ?lse_url
WHERE {
  hint:Query hint:optimizer "None" .
  # Ideally we want to select thesis with: ?thesis wdt:P31/wdt:P279* wd:Q1266946
  # This property path might require navigating triples in the two subgraphs and thus we can't use it
  # We extract ?thesisType first so that we will match it with a simple pattern ?thesis wdt:P31 ?thesisType
  ?thesisType wdt:P279* wd:Q1266946 .
  {
    SERVICE wdsubgraph:scholarly_articles {
      SELECT ?thesis ?thesisLabel ?thesisDescription ?thesisType ?lse_url (COALESCE(IF(BOUND(?author), ?author, 'N/A')) AS ?author) { 
        ?thesis wdt:P31 ?thesisType ;
                wdt:P953 ?lse_url.
        FILTER(STRSTARTS(STR(?lse_url), "http://etheses.lse.ac.uk"))
        # We return a variable bound in an OPTIONAL clause, we have to be careful here 
        # see https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Internal_Federation_Guide#Returning_variables_bound_by_OPTIONAL
        OPTIONAL { ?thesis wdt:P50 ?author. }
        # No need to use the BIND(?thesisLabel AS ?thesisLabel)/BIND(?thesisDescription AS ?thesisDescription) trick here since we wrap our federated query
        # with a SELECT to workaround issues with the optionally bound ?author variable
        SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
      }    
    }    
  } UNION {
    # Union them with the publications in the main graph (blogs, articles...)
    ?thesis wdt:P31 ?thesisType ;
            wdt:P953 ?lse_url.
    FILTER(STRSTARTS(STR(?lse_url), "http://etheses.lse.ac.uk"))
    OPTIONAL { ?thesis wdt:P50 ?author. }
  }
  OPTIONAL {
    ?authorwp schema:about ?author;
              schema:isPartOf <https://en.wikipedia.org/> .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY (?thesisDescription)
DCausse (WMF) (talk) 13:37, 6 September 2024 (UTC)[reply]
Thank you for this, and all the extra detail to help my learning, which I'm just working through. I've tried on a couple of days to save the query on the main graph, but get a message to say URL shortening failed...and I'm getting that with one other query on the main graph today, though have been able to get shortened URLs for plenty of other queries - is this the place to report that, or somewhere else? Thanks! HelsKRW (talk) 11:22, 12 September 2024 (UTC)[reply]
Unfortunately it is a known limitation that I face myself. I'm not sure how others work around it, but for my part I simply copy/paste the whole URL into the wikitext. If I want to show the query in the page I sadly have to repeat it twice:
- once with mw:Extension:SyntaxHighlight using lang="sparql"
- once by copy/pasting the full URL into an external link like: [https://query-main.wikidata.org/#AWFULLY%20LONG%20AND%20UNREADABLE%20URL%20PARAMETERS Try it!]
<syntaxhighlight lang="sparql">
SELECT * {?s ?p ?o} LIMIT 1
</syntaxhighlight>
[https://query-main.wikidata.org/#SELECT%20%2a%20%7B%3Fs%20%3Fp%20%3Fo%7D%20LIMIT%201 Try It!]
Template:SPARQL does not yet support query-main nor query-scholarly but if it does at some point I suppose this might be quite handy. DCausse (WMF) (talk) 06:48, 13 September 2024 (UTC)[reply]
Thank you! HelsKRW (talk) 10:18, 13 September 2024 (UTC)[reply]

Labels for scholarly articles

I took my very simplest query to try to get my head round federated queries. I am looking simply for the count of different types of thesis at an institution. I'm not getting the labels for the type of thesis, even though I think those labels must be in the scholarly subgraph, what am I doing wrong?

SELECT ?thesisType ?thesisTypeLabel (COUNT(?thesisType) AS ?count) WHERE {
  ?thesis wdt:P4101 wd:Q1048626;
    wdt:P31 ?thesisType.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?thesisType ?thesisTypeLabel
ORDER BY DESC (?count)

DrThneed (talk) 23:26, 4 September 2024 (UTC)[reply]

No, the labels for the types are in the main graph. So this works:
SELECT ?thesisType ?thesisTypeLabel (COUNT(?thesisType) AS ?count) WHERE {
 hint:Query hint:optimizer "None" .
 ?thesis wdt:P4101 wd:Q1048626;
         wdt:P31 ?thesisType.
  SERVICE wdsubgraph:wikidata_main { ?thesisType rdfs:label ?thesisTypeLabel .
    FILTER (LANG(?thesisTypeLabel) = 'en')  
  }
}  
GROUP BY ?thesisType ?thesisTypeLabel ORDER BY DESC (?count)
Although the test link doesn't work. We need to update that one to specify the scholarly query service. Here's a working short link: https://w.wiki/B6j4 Ainali (talk) 09:32, 5 September 2024 (UTC)[reply]
Oh I should have thought of that. Thanks Jan. *Individual theses* would have a label in the scholarly subgraph, but not the subclasses, right? DrThneed (talk) 20:30, 5 September 2024 (UTC)[reply]
OK I thought that was OK on first glance but now I see the counts are completely different!
The query is returning 1654 master's theses for Lincoln University on the main graph and 94980 on the scholarly subgraph! The 1654 is the correct figure (and the numbers look to be correct for the initial query I posted without labels). What's going on? DrThneed (talk) 21:10, 5 September 2024 (UTC)[reply]
My fault, I should have counted the distinct theses when getting the labels. This gives your expected result with labels:
SELECT ?thesisType ?thesisTypeLabel (COUNT(DISTINCT ?thesis) AS ?count) 
WHERE {
 hint:Query hint:optimizer "None" .
 ?thesis wdt:P4101 wd:Q1048626;
         wdt:P31 ?thesisType.
  SERVICE wdsubgraph:wikidata_main { ?thesisType rdfs:label ?thesisTypeLabel .
    FILTER (LANG(?thesisTypeLabel) = 'en')  
  }
}  
GROUP BY ?thesisType ?thesisTypeLabel ORDER BY DESC (?count)
Real shortlink: [1] Ainali (talk) 22:05, 5 September 2024 (UTC)[reply]
Thanks Jan - needed a space after COUNT (https://w.wiki/B72w) but otherwise works! I'd like to understand why adding labels requires a 'distinct' here, when it doesn't for the same query on the main graph, is that something you can explain? DrThneed (talk) 22:21, 5 September 2024 (UTC)[reply]
@DrThneed The reason is a limitation of federation and blazegraph. In Wikidata:SPARQL_query_service/WDQS_graph_split/Federation_Limits we explain that federation can happen in two different ways:
  • the host service sending data to the federated service (least efficient)
  • the host service receiving data from the federated service
In your query, federation works by sending the publications to the wikidata_main subgraph endpoint. Because there are many publications, it makes multiple requests (sending them in chunks), but the types it asks about are likely the same each time, so it retrieves the same label multiple times. Blazegraph is unable to determine that these are the same types, so they remain as duplicates.
I think that a better way to do what you want is using query-main and pulling the publications from the scholarly subgraph:
SELECT ?thesisType ?thesisTypeLabel (COUNT(?thesis) AS ?count) 
WHERE {
 hint:Query hint:optimizer "None" .
 SERVICE wdsubgraph:scholarly_articles {
  ?thesis wdt:P4101 wd:Q1048626;
          wdt:P31 ?thesisType
 }
 ?thesisType rdfs:label ?thesisTypeLabel .
 FILTER (LANG(?thesisTypeLabel) = 'en')
}  
GROUP BY ?thesisType ?thesisTypeLabel ORDER BY DESC (?count)
DCausse (WMF) (talk) 09:18, 6 September 2024 (UTC)
Thank you for the explanation @DCausse (WMF), that's really helpful. So much to learn! DrThneed (talk) 04:52, 7 September 2024 (UTC)[reply]

humans without source ?

Hello! I'd like to see a list of humans, on the Dutch wikipedia, that don't have any source listed. Preferably with some info like birth date and place, gender, the wikidata description. Thanks! 81.164.2.207 21:30, 7 September 2024 (UTC)[reply]

Olympic medalists

Hi folks! How would you approach getting a list of Olympic medalists? I tried using instance of (P31)/subclass of (P279) : Olympic medalist (Q58826204), but there are very, very few results. Strainu (talk) 11:29, 10 September 2024 (UTC)[reply]

inferring narrower occupations

Problem: we have large numbers of people with a sole occupation of "researcher" and a description either "researcher" or based on an ORCID. This makes disambiguation really hard.

Proposed solution: Most journals have a main subject, many of which are linked by a P3095 to an occupation, so we can link a human through articles to journals then topics and occupations. If the person has 10 articles in wikidata, picking the most common occupation linked to them should be a good approximation of their occupation.

Problem: So far the query I've got times out. How do I make it go faster so it doesn't time out? And how do I ignore people who have an occupation of "researcher" AND another occupation?

SELECT ?occupation ?author (COUNT(?article) AS ?count)  WHERE
    {
        ?topic wdt:P3095 ?occupation .
        ?journal wdt:P921 ?topic .
        ?article wdt:P1433 ?journal ; wdt:P31 wd:Q13442814 ; wdt:P50 ?author .
        ?author wdt:P31 wd:Q5 ; wdt:P106 wd:Q1650915 .
    } 
GROUP BY  ?occupation ?author 
HAVING (?count > 10) LIMIT 5

Secondary problem: how do I find academic journals without P921's and P3095's?

Stuartyeates (talk) 10:18, 11 September 2024 (UTC)[reply]

Islands

A list of islands whose name (in English) begins with a letter A-H.

Thank you! — Martin (MSGJ · talk) 13:04, 11 September 2024 (UTC)[reply]

Something like this...
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q23442 . # may want subclass as well
  ?item rdfs:label ?itemLabel .
  FILTER (LANG(?itemLabel) = "en" ) 
  FILTER REGEX(?itemLabel, "^[A-H]" ) # regular expression: the start of the string is in the range A...H
}

Piecesofuk (talk) 14:53, 11 September 2024 (UTC)[reply]

Slightly different results after federating a query

I noticed slightly different numbers in the results between my ordinary query and my query rewritten for WDGS. What's going on? (Probably I did something wrong!) The query counts the types of things that the main subjects of my theses are. The original query:

#defaultView:BubbleChart
SELECT DISTINCT ?instanceLabel (count(?instanceLabel) as ?count)
WHERE
{
  ?entity wdt:P5008 wd:Q111645234; wdt:P31/wdt:P279* wd:Q1266946 .
  ?entity p:P921 ?prop .
  OPTIONAL { ?prop ps:P921 ?value }
  OPTIONAL { ?value wdt:P31 ?instance }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?instanceLabel
ORDER BY DESC(?count)

The rewritten query:

#defaultView:BubbleChart
SELECT DISTINCT ?instanceLabel (count(?instanceLabel) as ?count)
WHERE {
  {
    SERVICE wdsubgraph:scholarly_articles {
      ?thesis wdt:P5008 wd:Q111645234 ;
              wdt:P921 ?mainsubject.
    }
  }
  OPTIONAL {
    ?mainsubject wdt:P31 ?instance
    BIND(?instanceLabel as ?instanceLabel)
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?instanceLabel
ORDER BY DESC(?count)

DrThneed (talk) 22:21, 11 September 2024 (UTC)[reply]

Oh - I realised it probably means there is some publication(s) in the thesis project that isn't in the scholarly subgraph for some reason and so its main subjects are the reason for the difference. We have a few things like reports, papers, etc, but I would have thought they all fell into the scholarly subgraph. How can I figure out which publication(s) that is? DrThneed (talk) 22:38, 11 September 2024 (UTC)[reply]
OK, never mind - reviewed the list of types of things in the project. I suspect there is a qualification or similar thing that falls within the project and has a main subject statement on, but isn't a publication. DrThneed (talk) 23:18, 11 September 2024 (UTC)[reply]

Slice, how does it work?

Hi there! Some time ago, I've received some help to run a large query, where the solution was to slice the results. Here's the relevant snippet:

SERVICE bd:slice {
      ?item p:P569 [].
  bd:serviceParam bd:slice.offset 0 . # Start at item number (not to be confused with QID)
  bd:serviceParam bd:slice.limit 100000 . # List this many items
}

That worked OK! However, when I changed the element used (p:P569) I got different results (and a different number of items). So I'd like to understand better how it works and how I can use it. Does the element selected for the slice affect the results? I couldn't find any documentation or details about it. Pruna.ar (talk) 21:01, 13 September 2024 (UTC)

It's well hidden, but there you go: https://blazegraph.com/database/apidocs/com/bigdata/rdf/sparql/ast/eval/SliceServiceFactory.html
Of all the triples in the database, it returns those that match the basic graph pattern you provide, starting from the offset you provide and returning at most limit triples.
There are two use-cases for the slicing service, either you want predictable pagination or it is used to optimize a slow query. When it is used to optimize a slow query it is because an intermediate join is too large. Normally the SPARQL optimizer will try to order the joins such that it starts with the smallest possible set, but this doesn't always work.
And yes, you will get different results based on the BGP you choose for the slice service, as you artificially restrict that set. Suppose we have two sets, A: items matching ten male given names, and B: the set of all humans that have ever lived. If we restrict set A to 100 items and AND that set with set B, you might expect the resulting set to also have 100 items. For the second try, let's restrict set B to 100 random humans and AND those sets together. We will probably get fewer than 10 items in the resulting set, depending on how common the names are. Does that make sense? Infrastruktur (talk) 22:27, 13 September 2024 (UTC)
Thanks @Infrastruktur for your fast & instructive response! So the slice occurs before the joins, right?
I plan to use the different time properties (in different queries), so I'd need to first get a total count of elements for each and slice each time property depending on the total number of elements it has. Am I thinking about this correctly? Perhaps an example is needed to explain myself better? Pruna.ar (talk) 00:11, 14 September 2024 (UTC)
Yes, the slice occurs before the join. You also shouldn't need to worry about the size of the input sets. If one of the sets is 100 items long and we request a slice of 10 and keep increasing the offset by 10 each time, this will produce all the different output combinations. It shouldn't matter which set you choose, as I think combined they will all produce the same output set, but each iteration might look different and be a different size. Another way of saying it is that you walk through all of the subsets of B, which combined is the whole of set B, and so you effectively take the intersection between set A and B. Hope the notation doesn't make any mathematicians cry. Infrastruktur (talk) 06:32, 14 September 2024 (UTC)
Let me share some examples about the size. Let's assume I'm looking for peace treaties with location and time.
An initial query is:
PREFIX schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT
       ?item ?itemLabel ?itemDescription
       ?Sdate ?SdatePrecision
       ?where ?whereLabel
WHERE {
  ?item wdt:P31 wd:Q625298 . # peace treaty
  ?item wdt:P276 ?where.

  ?item p:P585/psv:P585 ?SdateNode. # point in time
  ?SdateNode wikibase:timeValue ?Sdate.
  ?SdateNode wikibase:timePrecision ?SdatePrecision.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
That brings 272 results, using point in time (P585). Then there are other time options to change time, like start time (P580) that shows 3 results, or publication date (P577) that shows 4.
Now let's assume for a moment that I need to slice it, so I change and add it for an intermediate part:
PREFIX schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT
       ?item ?itemLabel ?itemDescription
       ?Sdate ?SdatePrecision
       ?where ?whereLabel # WHERE
WHERE {
  ?item wdt:P31 wd:Q625298 . # peace treaty
  ?item wdt:P276 ?where. # location
  SERVICE bd:slice {
    ?item p:P585 [].
    bd:serviceParam bd:slice.offset 1000000 . # Start at item number (not to be confused with QID)
    bd:serviceParam bd:slice.limit 1000000 . # List this many items
  }

  ?item p:P585/psv:P585 ?SdateNode. # point in time
  ?SdateNode wikibase:timeValue ?Sdate.
  ?SdateNode wikibase:timePrecision ?SdatePrecision.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
As you explained, that will show only a portion of the results, and that's ok. If I run the other slices, I'll get the whole results.
But, if I change it to the next time property P580 I'll get an error: Unknown error: offset is out of range. That's what I mean about taking care of the size. As I change the property used for slicing, I must check the total size and avoid exceeding it. Pruna.ar (talk) 19:42, 14 September 2024 (UTC)[reply]
@Infrastruktur about your references to size, I'm not sure how to "measure" it.
For example, if I try to count the elements, it shows only the results, not the possible triples. Here is a sample query I tried in order to identify the quantity for one of the examples I shared:
PREFIX schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT (COUNT(*) AS ?count)
WHERE {
  ?item wdt:P31 wd:Q625298 . # peace treaty

  ?item p:P585/psv:P585 ?SdateNode. # point in time
  ?SdateNode wikibase:timeValue ?Sdate.
  ?SdateNode wikibase:timePrecision ?SdatePrecision.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
Any ideas on how to calculate the limits for each element I use for slices?
Thanks! Pruna.ar (talk) 02:06, 18 September 2024 (UTC)[reply]
It's inconvenient to check the sizes of input sets. On stock WDQS I would CTRL+left click on the Wikidata icon to bring up another browser tab to avoid messing up my query, then copy&paste in the basic graph pattern whose size I would like to see: 'SELECT (COUNT(*) AS ?count) WHERE { ?item wdt:P31 wd:Q625298 . }'. To check the size of a join I would write that join between 2 BGPs, but we have no idea in what order the query engine will do the joins, so for that I would have to run an EXPLAIN query to see what is going on under the hood. For your initial query it would look something like this: [2]. This also conveniently gives you the sizes of all of the sets and the sizes of all the joins in the order that they happen, and a metric ton of other information. If you spot any joins that result in more than 100 000 items, that would be a red flag, more so if it happens early. The value for the slice limit could be a lot bigger though; it is the size of the resulting intermediate set(s) that ends up impacting performance. Infrastruktur (talk) 09:21, 18 September 2024 (UTC)
Thanks @Infrastruktur. As you mention, that EXPLAIN has tons of info, and I couldn't understand most of it. However, in a couple of specific lines in the "Query Evaluation Statistics" section, where the predSummary column refers to the date property (something like SPOPredicate[3](?item, Vocab(18)[3]:XSDUnsignedShort(585), ?--pp-anon-80770bc8-6329-4fb2-adb9-b85b9f33ae6b)), I can see that fastRangeCount is a little over 1M (so my slice worked OK), and if I change it to property P580 it's 815k, so an offset of 1M raised an error. Pruna.ar (talk) 21:14, 18 September 2024 (UTC)
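A possible way to check this before choosing an offset, assuming the slice pattern is a single triple pattern as in the examples in this thread, is to count the triples matching that pattern on its own. This is only a sketch: the count it returns is not guaranteed to be identical to the fastRangeCount estimate shown by EXPLAIN, and the rule in the comment is inferred from the error reported in this thread rather than from documentation.

```sparql
# Sketch: size of the set that SERVICE bd:slice would iterate over for P580.
# Judging by the "offset is out of range" error reported in this thread,
# the slice offset should stay below this number.
SELECT (COUNT(*) AS ?count) WHERE {
  ?item p:P580 [] .
}
```

Replacing p:P580 with the property used in the slice pattern gives the corresponding size for that property.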
Hi @Infrastruktur & community, me again :-)
As I evolve in this review, I was trying an alternative to search for any document (Q49848). I then used a very short query where I'm extending the search all over the hierarchy by using (wdt:P31|wdt:P279)+, so the code looks like:
SELECT DISTINCT
  ?item ?itemLabel ?itemDescription ?Sdate
WHERE {
  ?item (wdt:P31|wdt:P279)+ wd:Q49848 .
  ?item wdt:P585 ?Sdate.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
In this scenario, even the EXPLAIN can't show the results. And as the idea is to look for every instance of or subclass of, it grows a lot. Slicing by time might not help much (as I was planning to do before), and I'm not sure if it's possible to slice when there's that "or". Any ideas to consider here? Pruna.ar (talk) 21:40, 19 September 2024 (UTC)

List of persons whose age is a multiple of 25

I would like a list of people who are celebrating a milestone birthday this year (25, 50, 75, 100, 125, 150, etc.). I've got this far and now I can't manage to filter them:

SELECT DISTINCT ?subject ?subjectLabel ?subjectDescription (YEAR(?birthdate) AS ?year) ?age WHERE {
  ?subject ((wdt:P19|wdt:P551|wdt:P20|wdt:P1321|wdt:P937)/(wdt:P131*)) wd:Q12713;
    wdt:P569 ?birthdate.
  BIND((YEAR(NOW())) - (YEAR(?birthdate)) AS ?age)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
}
ORDER BY (?birthdate)

Rerumscriptor (talk) 16:36, 14 September 2024 (UTC)[reply]

Adding this filter should work: FILTER (?age - (25 * xsd:integer( ?age / 25 )) = 0) Piecesofuk (talk) 17:32, 14 September 2024 (UTC)[reply]
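For reference, one place such a filter could go, sketched against the query in the question. The FLOOR form used here is an assumed equivalent of the suggested cast for non-negative ages and is not from the reply above; the extra ?age > 0 condition (to leave out people born this year) and the ORDER BY on ?birthdate are likewise assumptions.

```sparql
SELECT DISTINCT ?subject ?subjectLabel ?subjectDescription (YEAR(?birthdate) AS ?year) ?age WHERE {
  ?subject ((wdt:P19|wdt:P551|wdt:P20|wdt:P1321|wdt:P937)/(wdt:P131*)) wd:Q12713;
    wdt:P569 ?birthdate.
  BIND((YEAR(NOW())) - (YEAR(?birthdate)) AS ?age)
  # Keep only ages that are exact multiples of 25.
  # SPARQL has no modulo operator, so compare the age with its rounded-down multiple.
  FILTER(?age > 0 && FLOOR(?age / 25) * 25 = ?age)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
}
ORDER BY ?birthdate
```

Changing both occurrences of 25 would give other intervals, for example 10 for round decades.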
Thank you very much, that was really helpful! Rerumscriptor (talk) 16:02, 15 September 2024 (UTC)[reply]
@Rerumscriptor: I've put together a query that shows the people of Dresden who have a milestone anniversary: User:Stefan_Kühn/Dresden#Personen_mit_Bezug_zu_Dresden,_die_heute_ein_Jubiläum_haben. Maybe that helps you too. --sk (talk) 13:42, 16 September 2024 (UTC)
@Stefan Kühn Thank you very much for the pointer! That's exactly what I was looking for. With so many possibilities, it is sometimes simply hard to find the right one. Rerumscriptor (talk) 19:13, 17 September 2024 (UTC)

Query to find all Renaissance Artists born in Italy

Hi, I am totally new to Wikidata and SPARQL. I am studying, but an example to start with would be awesome! Can I get all the names of artists from the Renaissance movement who were born in Italy? Is that sufficient information to create a query? Thank you! 93.151.230.93 20:13, 15 September 2024 (UTC)

#-----------------------------------------------------------------------
# Artists of Renaissance born in Italy
#-----------------------------------------------------------------------
#defaultView:table
select ?item ?itemLabel ?occupationLabel ?image
where {
  ?item wdt:P31 wd:Q5.                          # is human
  ?movement wdt:P361 wd:Q4692.                  # part of Renaissance
  ?item wdt:P135 ?movement.                     # movement is part of Renaissance
  # optional {?item wdt:P106 ?occupation. }       # occupation of this person
  ?item wdt:P19 ?place_of_birth.                # place of birth
  ?place_of_birth wdt:P17 wd:Q38.               # place of birth in Italy
  OPTIONAL { ?item wdt:P18 ?image. }            # image
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
limit 200
So this is my solution. It shows only people for whom there is information about the movement. Another approach might be to go by birth date. --sk (talk) 13:37, 16 September 2024 (UTC)

List of cyclists and URLs to Wikipedia in different languages

Hi, using the Wikidata Query Service, I've managed to get a list of Wikidata entries with a ProCyclingStats page.

SELECT DISTINCT ?item ?itemLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
  {
    SELECT DISTINCT ?item WHERE {
      ?item p:P1663 ?statement0.
      ?statement0 (ps:P1663) _:anyValueP1663.
    }
    LIMIT 100
  }
}

What I'd now like, is to have the URLs to the English language article, and let's say the Spanish and French one (if they exist). Any idea if this is at all possible? Yannick1 (talk) 10:33, 16 September 2024 (UTC)[reply]

SELECT DISTINCT ?item ?itemLabel ?article_de ?article_en ?article_es WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
  {
    SELECT DISTINCT ?item WHERE {
      ?item p:P1663 ?statement0.
      ?statement0 (ps:P1663) _:anyValueP1663.
    }
    LIMIT 100
  }
  OPTIONAL {
    ?article_de schema:about ?item.
    ?article_de schema:isPartOf <https://de.wikipedia.org/>.
  }
  OPTIONAL {
    ?article_en schema:about ?item.
    ?article_en schema:isPartOf <https://en.wikipedia.org/>.
  }  
  OPTIONAL {
    ?article_es schema:about ?item.
    ?article_es schema:isPartOf <https://es.wikipedia.org/>.
  }    
}

Is this what you want? --sk (talk) 13:16, 16 September 2024 (UTC)[reply]

That is perfect, thank you very much! Yannick1 (talk) 15:02, 16 September 2024 (UTC)[reply]
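Since the question mentioned English, Spanish and French, the same pattern can presumably be repeated for French. The sketch below reuses the query from the reply with the languages swapped; ?article_fr is a variable name chosen here for illustration, and the compact OPTIONAL form is only a stylistic variant.

```sparql
SELECT DISTINCT ?item ?itemLabel ?article_en ?article_es ?article_fr WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
  {
    # Same subquery as above: items that have a ProCyclingStats statement.
    SELECT DISTINCT ?item WHERE {
      ?item p:P1663 ?statement0.
      ?statement0 (ps:P1663) _:anyValueP1663.
    }
    LIMIT 100
  }
  # One OPTIONAL per language, so a missing article leaves the column empty.
  OPTIONAL { ?article_en schema:about ?item ; schema:isPartOf <https://en.wikipedia.org/> . }
  OPTIONAL { ?article_es schema:about ?item ; schema:isPartOf <https://es.wikipedia.org/> . }
  OPTIONAL { ?article_fr schema:about ?item ; schema:isPartOf <https://fr.wikipedia.org/> . }
}
```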

Filter by instance of country doesn't work for Bosnia and Herzegovina

Hi there,

I'm relatively new to SPARQL, so there's a good chance I'm missing something obvious.

I try to get a List of all countries:

SELECT ?countryLabel WHERE {
  ?country wdt:P31 wd:Q6256;
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ?countryLabel

In the result, Bosnia and Herzegovina is missing, although it has the statement instance of country. If I compare the Wikidata entry of Bosnia and Herzegovina with United States of America I notice, that the country statement has another color. Maybe this is the difference why Bosnia and Herzegovina isn't found by my query?

Many thanks in advance! ChakeMH (talk) 09:16, 18 September 2024 (UTC)

Yes. "wdt:" returns best-values, so if one statement is marked with preferred rank it will not match the other claims. This looks like improper use of ranking however, I don't see why "sovereign state" is somehow more correct than "country". Infrastruktur (talk) 10:16, 18 September 2024 (UTC)[reply]
Thank you for giving the right hints! I was able to change the query to match every instance of country statement. ChakeMH (talk) 11:10, 18 September 2024 (UTC)[reply]
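For readers with the same problem, a sketch of one way to match the statement regardless of rank: go through the statement node (p: and ps:) instead of wdt:. Excluding deprecated statements is an added assumption about what is usually wanted, not something from the reply above, and this is not necessarily the query the original poster ended up with.

```sparql
SELECT DISTINCT ?country ?countryLabel WHERE {
  # p:/ps: reach every "instance of" statement, not only the best-ranked ones.
  ?country p:P31 ?statement .
  ?statement ps:P31 wd:Q6256 ;
             wikibase:rank ?rank .
  # Assumption: deprecated statements should still be ignored.
  FILTER(?rank != wikibase:DeprecatedRank)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ?countryLabel
```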