[Question]: multiple worrying inconsistencies with openai apis and docs #17066

Open
xucian opened this issue Nov 26, 2024 · 2 comments

Labels
question (Further information is requested)

Comments

@xucian (Contributor)

xucian commented Nov 26, 2024

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

Basically, I was just confused about strict=True not being included with structured outputs with OpenAI, but I've found many other inconsistencies while writing this.
I see structured function calling (docs) needs strict: True:

        "function": {
            "name": "get_weather",
            "strict": True,
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "unit": {"type": "string", "enum": ["c", "f"]},
                },
                "required": ["location", "unit"],
                "additionalProperties": False,
            },
        },

inspect:
(screenshot from 2024-11-25: debugger view of the outgoing request, with no strict: True in the payload)

Is this a bug, or am I missing something?
I also think there should be an error for structured outputs if the pydantic models don't have the following:

    class Config:
        # Disallow additional properties. Needed for OpenAI pydantic program mode
        extra = 'forbid'  

That's required for "additionalProperties": False to be added to the JSON schema, which in turn is required to enable structured outputs.
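
(For illustration, a minimal pydantic v2 sketch of the same thing; the Weather model is hypothetical:)

    from pydantic import BaseModel, ConfigDict

    class Weather(BaseModel):
        # v2 equivalent of `class Config: extra = 'forbid'`
        model_config = ConfigDict(extra="forbid")

        location: str
        unit: str

    # "additionalProperties": False only appears in the schema when extra is forbidden
    print(Weather.model_json_schema())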

--
I just recently discovered that I might not need function calling at all and that response_format is enough (the frequency of function-calling mentions in the docs is pretty annoying imo, as most people are looking for a basic structured response).

While writing this I discovered OpenAI's strict param. I set it to True, but when inspecting the call stack, strict: True is still not added. I imagine it was supposed to enter this function, but my debugger doesn't hit it:

    def _prepare_chat_with_tools(
        self,
        tools: List["BaseTool"],
        user_msg: Optional[Union[str, ChatMessage]] = None,
        chat_history: Optional[List[ChatMessage]] = None,
        verbose: bool = False,
        allow_parallel_tool_calls: bool = False,
        tool_choice: Union[str, dict] = "auto",
        strict: Optional[bool] = None,
        **kwargs: Any,
    ) -> Dict[str, Any]:
        """Predict and call the tool."""
        tool_specs = [tool.metadata.to_openai_tool() for tool in tools]

        # if strict is passed in, use it; else default to the class-level attribute
        if strict is not None:
            strict = strict
        else:
            strict = self.strict
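
(For reference, and this is a paraphrase rather than verbatim library code, the rest of that function is roughly where strict would get injected into each tool spec:)

        # paraphrased continuation (my reading of the source, not a verbatim quote)
        if strict:
            for tool_spec in tool_specs:
                if tool_spec["type"] == "function":
                    tool_spec["function"]["strict"] = strict
                    tool_spec["function"]["parameters"]["additionalProperties"] = False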

This is how I'm constructing the LLM:

        self.llm = OpenAIFixedForStructuredOutput(
        # self.llm = OpenAI(
            model=self.persona.meta.llm_model,
            temperature=temperature,
            max_tokens=max_output_tokens,
            api_key=OpenAiPersonaLlm.TOKEN,
            # timeout=300
            timeout=60,
            # This is important to use openai's native structured output
            # pydantic_program_mode=PydanticProgramMode.DEFAULT,
            pydantic_program_mode=PydanticProgramMode.OPENAI,
            # pydantic_program_mode=PydanticProgramMode.FUNCTION,
            
            # Update: Llamaindex just disregards strict here, might be because of PydanticProgramMode.OPENAI; TODO investigate
            # IMPORTANT FOR STRUCTURED OUTPUT, OTHERWISE THIS MODE DOESN'T SEEM TO BE "ACTIVATED" 
            #  AT OPENAI'S SIDE AND IT SILENTLY CONTINUES WITH NORMAL MODE
            strict=True
        )

...

# And the custom OpenAI llm class (provided for completeness, but I don't think it's relevant to the issue).
# Imports added here for completeness (paths per recent llama-index packages):
from typing import Any, Dict, Generator, List, Optional, Union

from pydantic import BaseModel

from llama_index.core.instrumentation import get_dispatcher
from llama_index.core.llms.function_calling import FunctionCallingLLM
from llama_index.core.types import Model, PydanticProgramMode
from llama_index.llms.openai import OpenAI

dispatcher = get_dispatcher(__name__)


class OpenAIFixedForStructuredOutput(OpenAI):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert self.pydantic_program_mode == PydanticProgramMode.OPENAI, "Pydantic program mode must be OPENAI for this fixed version of OpenAI llm class"

    @dispatcher.span
    def structured_predict(
        self, *args: Any, llm_kwargs: Optional[Dict[str, Any]] = None, **kwargs: Any
    ) -> BaseModel:
        assert not llm_kwargs, "llm_kwargs not supported in this fixed version of OpenAI llm class"
        llm_kwargs = llm_kwargs or {}
        # super(FunctionCallingLLM, self) skips both OpenAI and FunctionCallingLLM
        #  in the MRO (OpenAI's override is what we're trying to bypass)
        # TODO consider monkey patching as it might be more "pythonic"
        return super(FunctionCallingLLM, self).structured_predict(*args, llm_kwargs=llm_kwargs, **kwargs)
    
    # Check base class for what we are overriding
    @dispatcher.span
    async def astructured_predict(
        self, *args: Any, llm_kwargs: Optional[Dict[str, Any]] = None, **kwargs: Any
    ) -> BaseModel:
        assert not llm_kwargs, "llm_kwargs not supported in this fixed version of OpenAI llm class"
        llm_kwargs = llm_kwargs or {}
        # super(FunctionCallingLLM, self) skips both OpenAI and FunctionCallingLLM
        #  in the MRO (OpenAI's override is what we're trying to bypass)
        # TODO consider monkey patching as it might be more "pythonic"
        return await super(FunctionCallingLLM, self).astructured_predict(*args, llm_kwargs=llm_kwargs, **kwargs)
    
    @dispatcher.span
    def stream_structured_predict(
        self, *args: Any, llm_kwargs: Optional[Dict[str, Any]] = None, **kwargs: Any
    ) -> Generator[Union[Model, List[Model]], None, None]:
        assert not llm_kwargs, "llm_kwargs not supported in this fixed version of OpenAI llm class"
        llm_kwargs = llm_kwargs or {}
        # super(FunctionCallingLLM, self) skips both OpenAI and FunctionCallingLLM
        #  in the MRO (OpenAI's override is what we're trying to bypass)
        # TODO consider monkey patching as it might be more "pythonic"
        return super(FunctionCallingLLM, self).stream_structured_predict(*args, llm_kwargs=llm_kwargs, **kwargs)
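
(Side note on the super(FunctionCallingLLM, self) trick above: a minimal, self-contained sketch of the MRO-skipping pattern, with illustrative class names:)

    class A:
        def m(self) -> str:
            return "A.m"

    class B(A):
        def m(self) -> str:
            return "B.m"

    class C(B):
        def m(self) -> str:
            # super(B, self) starts the MRO lookup after B, so this call
            # skips B's override and resolves to A.m
            return super(B, self).m()

    print(C().m())  # prints "A.m"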

Also, just while writing this, I've seen:

@deprecated("Please use `FunctionCallingProgram` instead.")
class OpenAIPydanticProgram(BaseLLMFunctionProgram[LLM]):

but OPENAI here is not marked as deprecated:

class PydanticProgramMode(str, Enum):
    """Pydantic program mode."""

    DEFAULT = "default"
    OPENAI = "openai"
    LLM = "llm"
    FUNCTION = "function"
    GUIDANCE = "guidance"
    LM_FORMAT_ENFORCER = "lm-format-enforcer"

And again while writing this: llamaindex's docs appear to contradict openai's docs, and actually suggest always using function calling where available.
At this point my conclusion is that llamaindex needs a docs update, plus some new issues created here on GitHub.
It took me around 1-2h of research, so it's not like I didn't do anything, but I don't have the energy left to create PRs (I'm also coming off a night of 4h of sleep, and pretty much regretting having had the pretension to expect llamaindex would let me move and ship fast).
In other words, the (apparently arbitrary) preference for function calling by default made me waste 2 days of an already extremely tight schedule during my launch week.

Moving on, I then tried pydantic_program_mode=PydanticProgramMode.LLM, hoping this one would take care of adding response_format and strict: True to the openai api call.
And guess what: it doesn't happen.

Next, I searched the repo for openai's response_format param, and... no references whatsoever. I don't even know what to say; something just doesn't add up.

So how come I have to write double or triple the amount of code to get a structured output, compared to a native api call through openai's sdk? This is how I have to call the thing (which now works):

        resp: Question = await pllm.llm.astructured_predict(
            Question, PromptTemplate(prompt),
            llm_kwargs=pllm.get_additional_kwargs_for_structured_output(Question)
        )

    # Llamaindex doesn't do this:
    # https://platform.openai.com/docs/guides/structured-outputs#structured-outputs-vs-json-mode
    def get_additional_kwargs_for_structured_output(self, output_model: Type[BaseModel]):
        return {
            'response_format': {
                'type': 'json_schema',
                'json_schema': {
                    # TODO consider pascal_to_snake to have snake-case IF needed
                    'name': output_model.__name__,
                    'description': output_model.model_config.get('description', output_model.__doc__),
                    'strict': True,
                    'schema': output_model.model_json_schema()
                }
            }
        }
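
(For comparison, the native path through openai's sdk is a single call; a sketch assuming openai-python >= 1.40 and an illustrative Question model:)

    from openai import OpenAI
    from pydantic import BaseModel

    class Question(BaseModel):
        text: str

    client = OpenAI()
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini",  # illustrative
        messages=[{"role": "user", "content": "Ask me a trivia question."}],
        response_format=Question,  # the sdk builds the strict json_schema payload itself
    )
    resp: Question = completion.choices[0].message.parsed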
@xucian added the question label Nov 26, 2024

dosubot bot commented Nov 26, 2024

Hey @xucian! I'm here to assist you with any bugs, questions, or contribution inquiries. Let's work together to resolve your issue.

The issues you're encountering seem to stem from a few misunderstandings and potential gaps in the LlamaIndex implementation regarding OpenAI's structured outputs and the use of the strict parameter.

  1. Strict Parameter: The strict parameter is intended to enforce strict adherence to the specified schema in structured outputs. If setting strict=True is not reflecting in your API calls, it might be due to how the parameter is being handled or passed within the LlamaIndex or your custom implementation. It's crucial to ensure that this parameter is correctly integrated into the API call setup.

  2. Pydantic Program Mode: The pydantic_program_mode in LlamaIndex determines how structured outputs are handled. The OPENAI mode should ideally align with OpenAI's structured output requirements, but if it's not working as expected, it might be worth checking if the mode is correctly implemented or if there are any updates needed in the LlamaIndex codebase.

  3. Response Format: The lack of references to response_format in the LlamaIndex repository suggests that this feature might not be fully supported or documented. This could be a limitation if you're relying on this parameter for structured outputs.

  4. Documentation and Implementation Gaps: Your observation about the discrepancies between LlamaIndex's documentation and OpenAI's guidelines indicates a need for updates or clarifications in the LlamaIndex documentation. This could help users better understand when to use function calling versus response format.

  5. Custom Implementations: Your custom class OpenAIFixedForStructuredOutput seems to be an attempt to address some of these issues. Ensure that any custom logic aligns with OpenAI's API requirements and that parameters like strict and response_format are correctly passed and utilized.

Given these points, it might be beneficial to review the LlamaIndex code handling these parameters and consider contributing to the documentation or codebase if you identify specific areas for improvement.

To continue talking to Dosu, mention @dosu.

@logan-markewich (Collaborator)

Looking into how to best clean this up, totally get the confusion.

FYI, you would have gotten the results you wanted with strict=True and PydanticProgramMode.FUNCTION (I might have mentioned that in another issue).
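
(A minimal sketch of that combination; model name illustrative, import paths assuming current llama-index packages:)

    from pydantic import BaseModel

    from llama_index.core.prompts import PromptTemplate
    from llama_index.core.types import PydanticProgramMode
    from llama_index.llms.openai import OpenAI

    class Question(BaseModel):
        text: str

    llm = OpenAI(
        model="gpt-4o-mini",  # illustrative
        strict=True,
        pydantic_program_mode=PydanticProgramMode.FUNCTION,
    )
    resp = llm.structured_predict(Question, PromptTemplate("Ask me a trivia question."))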

Beyond that, tl;dr:

  • The pydantic program modes are confusingly named, deprecated, and not documented well
  • The docs probably do emphasize function calling/tool use a bit too much for some use cases (even though response_format is the same as function calling with tool_choice in openai)
  • The openai program should either be updated to use the latest openai apis, or be removed in favour of the function calling program
