
[Feature request] Add more automatic configuration for LangChain LLMs for Python processors #707

@devinbost


Since 0.4.3, information specified in the configuration YAML is automatically available through the context object in Python processors.
This feature is extremely helpful.
However, it doesn't cover every case.
For example, switching between OpenAI and Azure OpenAI for the LLM in a LangChain implementation requires a different wrapper class, configured through different environment variables:

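```python
import os

# Azure OpenAI reads its endpoint settings from these environment variables
# ("..." marks values elided in this example).
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_VERSION"] = "2023-05-15"
os.environ["OPENAI_API_BASE"] = "..."
os.environ["OPENAI_API_KEY"] = "..."

# Azure needs its own wrapper class; deployment_name is the name of the
# model deployment created in the Azure resource.
from langchain.llms import AzureOpenAI
llm = AzureOpenAI(
    deployment_name="td2",
    model_name="text-davinci-002",
)
```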

(https://python.langchain.com/docs/integrations/llms/azure_openai#deployments)

We run into a similar issue with WatsonX: the IBM SDK documentation (https://ibm.github.io/watson-machine-learning-sdk/fm_extensions.html) shows that a WatsonxLLM(model=model) wrapper must be declared.

In these cases, to configure the env variables, we still need to either:

  • map them in the pipeline YAML file (which is a potential source of mapping errors)
  • set them with os.environ in Python (which is worse: it is subject to the same human error, adds more code to maintain, and is harder to validate at build time)
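Whichever option is used, a check for missing variables can at least run before the first LLM call instead of failing inside it. The sketch below assumes a hand-maintained table of required variables per provider; the Azure entries come from the example above and the OpenAI entry is the standard OPENAI_API_KEY. The names `REQUIRED_ENV_VARS` and `missing_env_vars` are hypothetical and not part of LangStream or LangChain.

```python
import os
from typing import Mapping

# Hypothetical table of the environment variables each provider needs.
# The Azure entries mirror the example above; extend it per provider.
REQUIRED_ENV_VARS: dict[str, tuple[str, ...]] = {
    "openai": ("OPENAI_API_KEY",),
    "azure": (
        "OPENAI_API_TYPE",
        "OPENAI_API_VERSION",
        "OPENAI_API_BASE",
        "OPENAI_API_KEY",
    ),
}


def missing_env_vars(provider: str, environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or empty for `provider`."""
    try:
        required = REQUIRED_ENV_VARS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return [name for name in required if not environ.get(name)]


# Example: check an explicit mapping rather than the real environment.
print(missing_env_vars("azure", {"OPENAI_API_KEY": "...", "OPENAI_API_TYPE": "azure"}))
# → ['OPENAI_API_VERSION', 'OPENAI_API_BASE']
```

Passing the mapping explicitly keeps the check testable; at processor startup it would be called with the default `os.environ`.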

Then, we also need to ensure that the correct LLM wrapper is instantiated in the Python code.
It would be quite useful if LangStream could simplify this by providing something like:
```python
llm = context.buildLLM()
```

LangChain now has a small hierarchy of LLM classes, but most of them appear to extend BaseLanguageModel, typically through either BaseLLM (completion models) or BaseChatModel (chat models).

For example (A < B means A is a subclass of B):

  • AzureChatOpenAI < ChatOpenAI < BaseChatModel < BaseLanguageModel[BaseMessage]
  • AzureOpenAI < BaseOpenAI < BaseLLM < BaseLanguageModel[str]
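The two chains above can be checked mechanically to find the class that provider-agnostic code could target. This sketch encodes only the parent links listed above, with generic parameters such as [str] dropped, in a plain dictionary; it does not import LangChain, and `PARENT`, `ancestors`, and `common_base` are illustrative names.

```python
from typing import Optional

# Parent links copied from the two chains above (generic parameters dropped).
PARENT = {
    "AzureChatOpenAI": "ChatOpenAI",
    "ChatOpenAI": "BaseChatModel",
    "BaseChatModel": "BaseLanguageModel",
    "AzureOpenAI": "BaseOpenAI",
    "BaseOpenAI": "BaseLLM",
    "BaseLLM": "BaseLanguageModel",
}


def ancestors(cls: str) -> list[str]:
    """Return cls followed by each parent up to the root."""
    chain = [cls]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def common_base(a: str, b: str) -> Optional[str]:
    """Return the nearest class that both a and b descend from, if any."""
    seen = set(ancestors(b))
    return next((c for c in ancestors(a) if c in seen), None)


print(common_base("AzureChatOpenAI", "AzureOpenAI"))  # BaseLanguageModel
```

With the real classes imported, the built-in issubclass answers the same question directly; the dictionary only restates the hierarchy given in this issue.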

Different implementations require different environment variables, so keeping track of them all by hand is tedious and error-prone, especially when switching between providers.

Perhaps LangStream could leverage these shared base classes to offer additional no-code LLM configuration.
