Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unreliable ordering of datastore results #2317

Closed
aliceh75 opened this issue Feb 23, 2015 · 8 comments · Fixed by servercode/ckan#1 or #6237
Closed

Unreliable ordering of datastore results #2317

aliceh75 opened this issue Feb 23, 2015 · 8 comments · Fixed by servercode/ckan#1 or #6237
Assignees

Comments

@aliceh75
Copy link
Contributor

Hi,

In the absence of an "ORDER BY" clause, PostgreSQL does not make any promises regarding the ordering of rows. In particular if you use "OFFSET/LIMIT" clauses the ordering might actually change between two subsequent queries. This is from the PostgreSQL documentation:

The query planner takes LIMIT into account when generating a query plan, so you are very likely to get different plans (yielding different row orders) depending on what you use for LIMIT and OFFSET. Thus, using different LIMIT/OFFSET values to select different subsets of a query result will give inconsistent results unless you enforce a predictable result ordering with ORDER BY. This is not a bug; it is an inherent consequence of the fact that SQL does not promise to deliver the results of a query in any particular order unless ORDER BY is used to constrain the order.

http://www.postgresql.org/docs/current/static/sql-select.html

This issue affects datastore queries (I have witnessed it). While this is not technically a CKAN bug (the problem can be moved upwards to the API consumers), it is an unexpected behaviour. I would suggest the best place to address this is in the datastore itself, to always ensure queries include an ORDER BY term on a unique key (eg. _id)

I can do a pull request if people agree this is a good thing to do.

@aliceh75
Copy link
Contributor Author

Note that this can be fixed in an extension. (Though I really think it should be fixed in ckan - data scientists using the API shouldn't have to know about issues like these!)

@amercader amercader self-assigned this Feb 24, 2015
@amercader
Copy link
Member

@aliceh75 dev meeting agrees with you :)

Just to confirm, is the _id field created by us and indexed?

@aliceh75
Copy link
Contributor Author

Yes, the datastore always creates a _id field which is a primary key. See:
https://github.com/ckan/ckan/blob/master/ckanext/datastore/db.py#L295

This cannot be overriden by users as fields that start with an underscore are forbidden via the API:
https://github.com/ckan/ckan/blob/master/ckanext/datastore/db.py#L81

Postgresql always indexes the primary key as far as I understand:

Adding a primary key will automatically create a unique btree index on the column or group of columns used in the primary key.
http://www.postgresql.org/docs/9.1/static/ddl-constraints.html

@aliceh75
Copy link
Contributor Author

So I will create a PR (next week, I'm away until Monday!)

@Aaron-M
Copy link

Aaron-M commented Feb 24, 2015

ref #2309 - in CKAN 2.3 sometimes _id field is displayed in preview, and other times it is missing (and on the same dataset).

@rossjones
Copy link
Contributor

Not sure why this got closed by me merging a remote PR into a remote repo :( I've re-opened and submitted the PR into CKAN itself.

@rossjones rossjones reopened this Sep 11, 2018
@wrinklenose
Copy link
Contributor

Is there any progress according to this issue?

We've stumbled into the same problem and would be happy about a fix. The PR seems to be pretty close - where it's getting stuck?

@wardi
Copy link
Contributor

wardi commented Jul 7, 2021

We just need a new PR to review

wrinklenose added a commit to wrinklenose/ckan that referenced this issue Jul 9, 2021
wrinklenose added a commit to wrinklenose/ckan that referenced this issue Jul 9, 2021
wrinklenose added a commit to wrinklenose/ckan that referenced this issue Jul 12, 2021
wrinklenose added a commit to wrinklenose/ckan that referenced this issue Jul 12, 2021
smotornyuk pushed a commit that referenced this issue Aug 24, 2021
smotornyuk pushed a commit that referenced this issue Aug 24, 2021
smotornyuk pushed a commit that referenced this issue Aug 24, 2021
fostermh added a commit to cioos-siooc/ckan that referenced this issue May 27, 2022
* [docs-only] Override click version to avoid conflict

* [docs-only] Remove click version from other req files

* [docs-only] Remove click override

* Remove erroneous argument

`--config` isn't a valid option to `rebuild`

* Catch TypeError from invalid thrown by dateutils

Also provide better logs so that it is possible to identify the issue.

* Update search index date tests to throw TypeErrors

* Revert "Catch TypeError from invalid dates thrown by dateutils"

* Update version for 2.9.4b

* [ckan#5597] Fix DataPusher error during resource_update

Create local session in the DataPusher submit action and pass that to
task_status_update to avoid interferring with the overall
resource_update transaction.

See ckan#6137 for details

* Improve documentation: resource_patch

When someone reads "Uploading a new version of a resource file" is not expected that deletes all extras.

* [ckan#6113] Fix user email validator when using name as id parameter

CKAN 2.9 introduced the `email_is_unique` validator to prevent duplicates. But the
current implementation only works when the id param contains the actual UUID not
the name.

* Update ckan (paster) commands for ckan 2.9.x

* Update ckan (paster) commands for ckan 2.9.x

* [I18N] [FIX ckan#5586] Don't cache license translations across requests

* Licenses are read and cached in the package model
* Translate the license in the helper for each request.

* Use CKANRequest object implicitly in templates

* Test to ensure that we are using CKANRequest in templates

* fix test

* fix pep8 test...

* [ckan#5998] db init error in alembic

* ckan#6086 fix sqlalchemy configuration for datastore

* render activity timestamps with title= attribute

* Fixture for plugin DB migrations

* Remove unnecessary imports

* Add pending-migrations

* Drop extra tables after tests

* update argument

* Fix py3 security alerts

* Remove blogspam link from the documentation footer

* [ckan#6162] fix pagination links for custom org types

* [ckan#6179] Fix exception when using solr_user and solr_password on Py3

Fixes ckan#6179

Use the base64 module to encode the Solr Basic Auth header to avoid
bytes/str exception on Python 3

* [ckan#6179] Encoding definition

* [ckan#6181] Fix page render errors when search facets are not defined

* fix guard clause on has_more_facets, ckan#6190

* restore Page import so that other modules can refer to it via 'h.Page', ckan#6190

* [ckan#6207] Fix for g.__timer

* Remove not accessed user object

* Restore user object to _get_action

* Remove not accessed variables

* [ckan#2317] fix default ordering consuming resource content from datastore

* [ckan#2317] added changelog fragement

* [ckan#2317] made default sort by _id less aggressive

* [ckan#2317] restrict default order to non distinct case

* [2317] accepted proposed change in changelog

Co-authored-by: Ian Ward <[email protected]>

* [ckan#6159] Add and Update Docstring for existing functions

* Add doctring to `chained_action` function
* Update `chained_auth` docstring
* update `chained_helper` docstring

* solve conflict

* fix pep8 error

* update `chained_helper` and `chain_action` doctsting

* Fix documentation typo

* [ckan#6270] Typo in default_package_sort config example

* DataPusher and XLoader documented in Source docs

* Update install-from-source.rst

* fixes HTTP basic auth cred handling

* [ckan#6319] Fix datetime formatting when listing user tokens on py2.

* Fix typo

* coerce query string keys/values before passing to quote()

* ensure fallback behaviour matches default

* remove unused import

* reversed function used

* failing test case for validation error list position

* Revert "[ckan#4801] Simplify even more the erros handling in navl"

This reverts commit 7b6702b.

* Revert "[ckan#4801] Refactor processing of errors in NAVL"

This reverts commit a632078.

* Revert "[ckan#4801] Refactor processing of errors in NAVL"

This reverts commit a632078.

* Remove valid lists from error output, add tests

* [ckan#6340] Handle Traceback Exception for HTTP and HTTP status Code in logging

The traceback for HTTP Exception will be added only when debug is true

* [ckan#6340] Add the changelog

* [ckan#6340] Coding Standard

* [ckan#6340] Change the error level to debug/info

* fix for render_datetime

* [ckan#6051] Allow UTF-8 in JS translations

See issue ckan#6051

* Fix runaway preview height

* Fix unpriviledged users being able to access bulk process

Any user was able to access and view the bulk process page for organization
management. This fix checks access and returns unauthorized exception
if theuser shouldn't be there.

* Move auth check to _prepare method

Previously implementation only worked for GET requests, now works with
GET and POST requests

* Use bulk_update_public to check access

Was using group_update. Also removed duplicate check from POST request

* Update tests

* fixing access atom feed for a deleted organization

* fixing access atom feed for a deleted group

* Call to defaul action to allow links to be opened in new tab

* Switch to JQuery recommended method

* [ckan#6051] Py2/py3 compatible version of open

* Remove leftover paste import

* Coding standards

* Rebuild requirements.txt to sync it with requirements.in

* Fix io.open call in lib/i18n.py

* Pin watchdog on py3 so it installs a compatible version

* Use codecs instead of io

* Fix bad merge of ckan#6149

* Rename lib/io.py module which was giving problems

* Rename __init__.py file in extension

* Show job title on job start/finish log messages

To make it easier to debug background job calls.

Before:

```
INFO  [ckan.lib.jobs] Worker rq:worker:f0792c8bd67344f288b5704d39c43124 starts job 2baa42e5-4582-4103-92e5-b4a384d0b1da from queue "default"
```

After:

```
INFO  [ckan.lib.jobs] Worker rq:worker:f0792c8bd67344f288b5704d39c43124 starts job 2baa42e5-4582-4103-92e5-b4a384d0b1da (Process data fields) from queue "default"
```

* Add missing __init__.py file

* String literals

* snippet names rendered in non-debug mode

* Update changelog for 2.9.4

* Build frontend

* [i18n] Pull po files from Transifex

* [i18n] Compile mo files

* Upgrade version for 2.9.4

* Update version for 2.9.5b

* Consistent cli behavior

* pep8

* Py2 compatible fix for ckan#6135

* [ckan#6390] fix user create/edit email validators

* Allow strict types for user/group uploads

CKAN 2.9 specific changes when cherry-picking:

* Replace f-strings with .format()
* Don't use faker / Pillow for tests, as there is no faker fixture in
the Python 2 version

* Add changelog entry for group image types

* Move type verification into upload method

* Fix APIToken CLI test

* Update docs

* Link to config options from changelog

* Allow children for select2

* Fix children type

* [ckan#6531] Py2/py3 compatible version of open

* Add select2 features

* Undo change

* Replace f-string

* Fix standards

* [ckan#6530] Add Solr 8 support

* Set logging level to error in error mail handler

* Add RootPathMiddleware to flask stack to support non-root installs running on python 3

* Add previously removed RootPathMiddleware back to common middleware as it is still needed

* Added utility functions for common CKAN admin commands.

* Use correct auth function when editing organizations

* [ckan#5820] fix invite user with existing email error

* Fix regression when validating resource subfields (by @TomeCirun)

* [ckan#6408] Add timeout param to request get calls (by @EricSoroos)

* [ckan#6408] Document new options

* Accept empty string in one of validator

* Negate empty string check

* Fix pep8

* [i18n] Pull translation from Transifex

* [i18n] Compile mo files

* Compile frontend

* Small fix adding virtual env path to ckan command.

* Update changelog before 2.9.5

* Include the Solr 8 schema file in the 2.9 branch

* Update version for 2.9.5

* Updates to ckan_utils.sh.

* move spatial harvester into ckanext-cioos_harvest extension and allow POST requests to the spatial search api endpoint in the spatial api

* document spatil harvester config

* update submodules

* create dev branch in submodules and update

* add compile css command

* upgrade solr to 8.11.1

* update

* update all submodules to latest dev version

* update submodules again

* update schema

* Updated submodule contrib/docker/src/ckanext-cioos_theme

* Updated submodule contrib/docker/src/ckanext-spatial

* fix a few integration bugs

* update

* Updated submodule contrib/docker/src/ckanext-cioos_theme

* update translations

* fix bugs, update logos

* Updated submodule contrib/docker/src/ckanext-scheming

* add atlantic eov icons

* Updated submodule contrib/docker/src/ckanext-cioos_theme

* add collapsed option to indicate how truncated fields initially load

* add wasRevisionOf to schema.org profile output

* release all submodules -  merge dev into main

* remove geoview pip file from dockerfile

Co-authored-by: amercader <[email protected]>
Co-authored-by: Chris Wood <[email protected]>
Co-authored-by: Ken Tsang <[email protected]>
Co-authored-by: Jari Voutilainen <[email protected]>
Co-authored-by: Guillermo Ferrer Bosque <[email protected]>
Co-authored-by: Carlo Cancellieri <[email protected]>
Co-authored-by: Eric Soroos <[email protected]>
Co-authored-by: Alexandr Cherniavskyi <[email protected]>
Co-authored-by: etj <[email protected]>
Co-authored-by: chris48s <[email protected]>
Co-authored-by: Sergey Motornyuk <[email protected]>
Co-authored-by: robbi5 <[email protected]>
Co-authored-by: ThrawnCA <[email protected]>
Co-authored-by: pdelboca <[email protected]>
Co-authored-by: Christian Buck <[email protected]>
Co-authored-by: wrinklenose <[email protected]>
Co-authored-by: Ian Ward <[email protected]>
Co-authored-by: steveoni <[email protected]>
Co-authored-by: Andres Vazquez <[email protected]>
Co-authored-by: Gauravp-NEC <[email protected]>
Co-authored-by: Erin Doherty <[email protected]>
Co-authored-by: Niall McAndrew <[email protected]>
Co-authored-by: Sunny-NEC <[email protected]>
Co-authored-by: Shubham Mahajan <[email protected]>
Co-authored-by: Tome Cirun <[email protected]>
Co-authored-by: jescgom <[email protected]>
Co-authored-by: franz osorio <[email protected]>
Co-authored-by: Dan Mihaila <[email protected]>
Co-authored-by: Francesco Frassinelli <[email protected]>
Co-authored-by: Teemu Erkkola <[email protected]>
Co-authored-by: Jeff Cullis <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants