[Feat]: Support on-premise reranker API #1212

@1saac-k

Description

Is your feature request related to a problem?

Currently, MemMachine supports local rerankers but does not support API-based on-premise rerankers.

When you specify ce_ranker_id in configurations.yml, it internally calls the local reranker using Hugging Face's sentence-transformers library.

Similar to on-premise LLMs that use OpenAI-compatible APIs, it would be great if rerankers could also support on-premise APIs.

vLLM provides rerank APIs compatible with the Jina and Cohere APIs, and we expect other inference engines (e.g., SGLang) to support similar features.

https://docs.vllm.ai/en/latest/serving/openai_compatible_server/
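As a rough sketch of what an API-based reranker call looks like, the snippet below posts to vLLM's `/v1/rerank` endpoint using only the standard library. The base URL, model name, and exact response shape are assumptions based on vLLM's documented Jina/Cohere-compatible rerank API, not MemMachine code:

```python
# Hedged sketch: calling an on-premise rerank server over HTTP.
# Endpoint path and payload shape follow vLLM's /v1/rerank API;
# base_url and model name below are placeholders.
import json
import urllib.request


def build_rerank_payload(model: str, query: str, documents: list[str]) -> dict:
    # Request body shared by the Jina- and Cohere-style rerank APIs.
    return {"model": model, "query": query, "documents": documents}


def rerank(base_url: str, model: str, query: str, documents: list[str]) -> list[dict]:
    # POST the payload to the on-premise server and return the scored results.
    payload = build_rerank_payload(model, query, documents)
    req = urllib.request.Request(
        f"{base_url}/v1/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # vLLM returns entries like {"index": ..., "relevance_score": ...}.
    return body["results"]
```

Usage would look like `rerank("http://localhost:8000", "BAAI/bge-reranker-base", query, docs)`, assuming a vLLM server is serving a reranker model at that address.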

Regarding terminology: strictly speaking, this is not an "OpenAI-compatible" API, since OpenAI does not offer a reranker service. It would more accurately be called a "Jina-compatible" or "Cohere-compatible" reranker API. However, vLLM does not appear to use those expressions either, so the term "OpenAI compatible" also seems acceptable.

Describe the solution you'd like

For embeddings, both the openai_embedder and openai_compatible_embedder options internally use the openai.AsyncOpenAI SDK. The only difference is that openai_compatible_embedder changes the base_url.

For rerankers, it would be good to follow the same approach as embeddings.
Since cohere_reranker_id is currently provided, adding a cohere_compatible_reranker_id option seems intuitive.
The current cohere_reranker_id uses the cohere.ClientV2 SDK, which also appears to provide a base_url option.
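To make the proposal concrete, a configuration fragment might look like the following. The key names and structure are hypothetical, extrapolated from the existing `cohere_reranker_id` / `openai_compatible_embedder` naming convention, and do not reflect MemMachine's actual schema:

```yaml
# Hypothetical configurations.yml fragment (key names are assumptions).
reranker:
  cohere_compatible_reranker_id: onprem-reranker

onprem-reranker:
  base_url: http://localhost:8000   # e.g., a vLLM server exposing a rerank API
  model: BAAI/bge-reranker-base
  api_key: EMPTY
```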

Describe alternatives you've considered

No response

Additional context

We’ve already begun work on supporting this feature and will upload a draft PR shortly.

Metadata

Labels

priority: high (Issue is urgent or highly impactful. Needs to be addressed as soon as possible.)
