-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unreliable ordering of datastore results #2317
Comments
Note that this can be fixed in an extension. (Though I really think it should be fixed in ckan - data scientists using the API shouldn't have to know about issues like these!) |
@aliceh75 dev meeting agrees with you :) Just to confirm, is the |
Yes, the datastore always creates a This cannot be overriden by users as fields that start with an underscore are forbidden via the API: Postgresql always indexes the primary key as far as I understand:
|
So I will create a PR (next week, I'm away until Monday!) |
ref #2309 - in CKAN 2.3 sometimes _id field is displayed in preview, and other times it is missing (and on the same dataset). |
Not sure why this got closed by me merging a remote PR into a remote repo :( I've re-opened and submitted the PR into CKAN itself. |
Is there any progress according to this issue? We've stumbled into the same problem and would be happy about a fix. The PR seems to be pretty close - where it's getting stuck? |
We just need a new PR to review |
* [docs-only] Override click version to avoid conflict * [docs-only] Remove click version from other req files * [docs-only] Remove click override * Remove erroneous argument `--config` isn't a valid option to `rebuild` * Catch TypeError from invalid thrown by dateutils Also provide better logs so that it is possible to identify the issue. * Update search index date tests to throw TypeErrors * Revert "Catch TypeError from invalid dates thrown by dateutils" * Update version for 2.9.4b * [ckan#5597] Fix DataPusher error during resource_update Create local session in the DataPusher submit action and pass that to task_status_update to avoid interferring with the overall resource_update transaction. See ckan#6137 for details * Improve documentation: resource_patch When someone reads "Uploading a new version of a resource file" is not expected that deletes all extras. * [ckan#6113] Fix user email validator when using name as id parameter CKAN 2.9 introduced the `email_is_unique` validator to prevent duplicates. But the current implementation only works when the id param contains the actual UUID not the name. * Update ckan (paster) commands for ckan 2.9.x * Update ckan (paster) commands for ckan 2.9.x * [I18N] [FIX ckan#5586] Don't cache license translations across requests * Licenses are read and cached in the package model * Translate the license in the helper for each request. * Use CKANRequest object implicitly in templates * Test to ensure that we are using CKANRequest in templates * fix test * fix pep8 test... * [ckan#5998] db init error in alembic * ckan#6086 fix sqlalchemy configuration for datastore * render activity timestamps with title= attribute * Fixture for plugin DB migrations * Remove unnecessary imports * Add pending-migrations * Drop extra tables after tests * update argument * Fix py3 security alerts * Remove blogspam link from the documentation footer * [ckan#6162] fix pagination links for custom org types * [ckan#6179] Fix exception when using solr_user and solr_password on Py3 Fixes ckan#6179 Use the base64 module to encode the Solr Basic Auth header to avoid bytes/str exception on Python 3 * [ckan#6179] Encoding definition * [ckan#6181] Fix page render errors when search facets are not defined * fix guard clause on has_more_facets, ckan#6190 * restore Page import so that other modules can refer to it via 'h.Page', ckan#6190 * [ckan#6207] Fix for g.__timer * Remove not accessed user object * Restore user object to _get_action * Remove not accessed variables * [ckan#2317] fix default ordering consuming resource content from datastore * [ckan#2317] added changelog fragement * [ckan#2317] made default sort by _id less aggressive * [ckan#2317] restrict default order to non distinct case * [2317] accepted proposed change in changelog Co-authored-by: Ian Ward <[email protected]> * [ckan#6159] Add and Update Docstring for existing functions * Add doctring to `chained_action` function * Update `chained_auth` docstring * update `chained_helper` docstring * solve conflict * fix pep8 error * update `chained_helper` and `chain_action` doctsting * Fix documentation typo * [ckan#6270] Typo in default_package_sort config example * DataPusher and XLoader documented in Source docs * Update install-from-source.rst * fixes HTTP basic auth cred handling * [ckan#6319] Fix datetime formatting when listing user tokens on py2. * Fix typo * coerce query string keys/values before passing to quote() * ensure fallback behaviour matches default * remove unused import * reversed function used * failing test case for validation error list position * Revert "[ckan#4801] Simplify even more the erros handling in navl" This reverts commit 7b6702b. * Revert "[ckan#4801] Refactor processing of errors in NAVL" This reverts commit a632078. * Revert "[ckan#4801] Refactor processing of errors in NAVL" This reverts commit a632078. * Remove valid lists from error output, add tests * [ckan#6340] Handle Traceback Exception for HTTP and HTTP status Code in logging The traceback for HTTP Exception will be added only when debug is true * [ckan#6340] Add the changelog * [ckan#6340] Coding Standard * [ckan#6340] Change the error level to debug/info * fix for render_datetime * [ckan#6051] Allow UTF-8 in JS translations See issue ckan#6051 * Fix runaway preview height * Fix unpriviledged users being able to access bulk process Any user was able to access and view the bulk process page for organization management. This fix checks access and returns unauthorized exception if theuser shouldn't be there. * Move auth check to _prepare method Previously implementation only worked for GET requests, now works with GET and POST requests * Use bulk_update_public to check access Was using group_update. Also removed duplicate check from POST request * Update tests * fixing access atom feed for a deleted organization * fixing access atom feed for a deleted group * Call to defaul action to allow links to be opened in new tab * Switch to JQuery recommended method * [ckan#6051] Py2/py3 compatible version of open * Remove leftover paste import * Coding standards * Rebuild requirements.txt to sync it with requirements.in * Fix io.open call in lib/i18n.py * Pin watchdog on py3 so it installs a compatible version * Use codecs instead of io * Fix bad merge of ckan#6149 * Rename lib/io.py module which was giving problems * Rename __init__.py file in extension * Show job title on job start/finish log messages To make it easier to debug background job calls. Before: ``` INFO [ckan.lib.jobs] Worker rq:worker:f0792c8bd67344f288b5704d39c43124 starts job 2baa42e5-4582-4103-92e5-b4a384d0b1da from queue "default" ``` After: ``` INFO [ckan.lib.jobs] Worker rq:worker:f0792c8bd67344f288b5704d39c43124 starts job 2baa42e5-4582-4103-92e5-b4a384d0b1da (Process data fields) from queue "default" ``` * Add missing __init__.py file * String literals * snippet names rendered in non-debug mode * Update changelog for 2.9.4 * Build frontend * [i18n] Pull po files from Transifex * [i18n] Compile mo files * Upgrade version for 2.9.4 * Update version for 2.9.5b * Consistent cli behavior * pep8 * Py2 compatible fix for ckan#6135 * [ckan#6390] fix user create/edit email validators * Allow strict types for user/group uploads CKAN 2.9 specific changes when cherry-picking: * Replace f-strings with .format() * Don't use faker / Pillow for tests, as there is no faker fixture in the Python 2 version * Add changelog entry for group image types * Move type verification into upload method * Fix APIToken CLI test * Update docs * Link to config options from changelog * Allow children for select2 * Fix children type * [ckan#6531] Py2/py3 compatible version of open * Add select2 features * Undo change * Replace f-string * Fix standards * [ckan#6530] Add Solr 8 support * Set logging level to error in error mail handler * Add RootPathMiddleware to flask stack to support non-root installs running on python 3 * Add previously removed RootPathMiddleware back to common middleware as it is still needed * Added utility functions for common CKAN admin commands. * Use correct auth function when editing organizations * [ckan#5820] fix invite user with existing email error * Fix regression when validating resource subfields (by @TomeCirun) * [ckan#6408] Add timeout param to request get calls (by @EricSoroos) * [ckan#6408] Document new options * Accept empty string in one of validator * Negate empty string check * Fix pep8 * [i18n] Pull translation from Transifex * [i18n] Compile mo files * Compile frontend * Small fix adding virtual env path to ckan command. * Update changelog before 2.9.5 * Include the Solr 8 schema file in the 2.9 branch * Update version for 2.9.5 * Updates to ckan_utils.sh. * move spatial harvester into ckanext-cioos_harvest extension and allow POST requests to the spatial search api endpoint in the spatial api * document spatil harvester config * update submodules * create dev branch in submodules and update * add compile css command * upgrade solr to 8.11.1 * update * update all submodules to latest dev version * update submodules again * update schema * Updated submodule contrib/docker/src/ckanext-cioos_theme * Updated submodule contrib/docker/src/ckanext-spatial * fix a few integration bugs * update * Updated submodule contrib/docker/src/ckanext-cioos_theme * update translations * fix bugs, update logos * Updated submodule contrib/docker/src/ckanext-scheming * add atlantic eov icons * Updated submodule contrib/docker/src/ckanext-cioos_theme * add collapsed option to indicate how truncated fields initially load * add wasRevisionOf to schema.org profile output * release all submodules - merge dev into main * remove geoview pip file from dockerfile Co-authored-by: amercader <[email protected]> Co-authored-by: Chris Wood <[email protected]> Co-authored-by: Ken Tsang <[email protected]> Co-authored-by: Jari Voutilainen <[email protected]> Co-authored-by: Guillermo Ferrer Bosque <[email protected]> Co-authored-by: Carlo Cancellieri <[email protected]> Co-authored-by: Eric Soroos <[email protected]> Co-authored-by: Alexandr Cherniavskyi <[email protected]> Co-authored-by: etj <[email protected]> Co-authored-by: chris48s <[email protected]> Co-authored-by: Sergey Motornyuk <[email protected]> Co-authored-by: robbi5 <[email protected]> Co-authored-by: ThrawnCA <[email protected]> Co-authored-by: pdelboca <[email protected]> Co-authored-by: Christian Buck <[email protected]> Co-authored-by: wrinklenose <[email protected]> Co-authored-by: Ian Ward <[email protected]> Co-authored-by: steveoni <[email protected]> Co-authored-by: Andres Vazquez <[email protected]> Co-authored-by: Gauravp-NEC <[email protected]> Co-authored-by: Erin Doherty <[email protected]> Co-authored-by: Niall McAndrew <[email protected]> Co-authored-by: Sunny-NEC <[email protected]> Co-authored-by: Shubham Mahajan <[email protected]> Co-authored-by: Tome Cirun <[email protected]> Co-authored-by: jescgom <[email protected]> Co-authored-by: franz osorio <[email protected]> Co-authored-by: Dan Mihaila <[email protected]> Co-authored-by: Francesco Frassinelli <[email protected]> Co-authored-by: Teemu Erkkola <[email protected]> Co-authored-by: Jeff Cullis <[email protected]>
Hi,
In the absence of an "ORDER BY" clause, PostgreSQL does not make any promises regarding the ordering of rows. In particular if you use "OFFSET/LIMIT" clauses the ordering might actually change between two subsequent queries. This is from the PostgreSQL documentation:
http://www.postgresql.org/docs/current/static/sql-select.html
This issue affects datastore queries (I have witnessed it). While this is not technically a CKAN bug (the problem can be moved upwards to the API consumers), it is an unexpected behaviour. I would suggest the best place to address this is in the datastore itself, to always ensure queries include an ORDER BY term on a unique key (eg. _id)
I can do a pull request if people agree this is a good thing to do.
The text was updated successfully, but these errors were encountered: