Add an example application about how to properly deal with stale documents on the vector database #612

@eolivelli

None of our current example applications show how to deal with these two common issues:

Shorter pages

When you re-index a website, the new version of a page may be shorter and so produce fewer chunks.
Writing the new chunks overwrites the old ones with lower ids, but the old chunks with higher ids remain in the database.
We need to show how to remove these stale chunks.
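
A minimal sketch of the idea, not tied to any particular vector database: it assumes chunks are keyed by page URL plus a sequential chunk index, and uses a plain dictionary as a hypothetical stand-in for the collection. The names `Store` and `reindex_page` are made up for illustration. After overwriting the new chunks, it deletes every chunk of the same page whose index is at or beyond the new chunk count.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for a vector database collection,
# keyed by (page URL, chunk index). A real store would also hold embeddings.
Store = Dict[Tuple[str, int], str]


def reindex_page(store: Store, url: str, chunks: List[str]) -> List[int]:
    """Overwrite the chunks of one page and delete leftover higher-numbered ones.

    Returns the chunk ids that were removed as stale.
    """
    # Overwrite (or create) chunks 0 .. len(chunks) - 1.
    for chunk_id, text in enumerate(chunks):
        store[(url, chunk_id)] = text

    # Any chunk of this page with an id at or beyond the new length is stale.
    stale = sorted(
        chunk_id
        for (page, chunk_id) in store
        if page == url and chunk_id >= len(chunks)
    )
    for chunk_id in stale:
        del store[(url, chunk_id)]
    return stale
```

In a real database the same step would probably be a single delete with a filter on the page identifier and a chunk index greater than or equal to the new count, provided the store supports filtering on those fields.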

Pages that disappeared

This is trickier. When you know that you are re-indexing the whole corpus of documents (for instance, a whole website), you should drop the documents that are no longer available. Otherwise you risk keeping outdated documents, or duplicate content when a page has been renamed.
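
One possible approach, sketched under the same assumptions (a dictionary keyed by page URL and chunk index standing in for the vector database; `drop_missing_pages` is a hypothetical helper): collect the set of URLs seen during the full crawl, then delete every chunk whose page is not in that set.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical in-memory stand-in for a vector database collection,
# keyed by (page URL, chunk index).
Store = Dict[Tuple[str, int], str]


def drop_missing_pages(store: Store, crawled_urls: Iterable[str]) -> Set[str]:
    """Delete every chunk whose page was not seen in the latest full crawl.

    Returns the set of page URLs that were dropped.
    """
    seen = set(crawled_urls)
    # Pages present in the store but absent from the crawl have disappeared.
    missing = {page for (page, _chunk_id) in store if page not in seen}
    # Materialize the keys first so the dict is not mutated while iterating.
    for key in [k for k in store if k[0] in missing]:
        del store[key]
    return missing
```

This is only safe when the crawl really covered the whole corpus: after a partial or failed crawl, the same step would delete pages that still exist. A renamed page shows up as one missing URL plus one new URL, so dropping the missing one also removes the duplicate content.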

Status: In Progress
