OCR_ERROR : OCR failed on image / Details: socket hang up (hosted version)

**Describe the bug**
We currently (and consistently) need to wait a long time (~10 minutes or longer) for even simple files to finish parsing. This is a new phenomenon and does **not** happen when using the vendor multimodal model option (for example GPT-4o).

When looking at the related job details inside cloud.llamaindex.ai, we see the following error messages:

```
Page 3 [warning] - OCR_ERROR : OCR failed on image /home/user/dist/worker/pipeline/../../../tmp/8c1b928c-9066-44b4-a7cd-795dc09edbe8/img/img_p2_2.png. Details: socket hang up
Page 3 [warning] - OCR_ERROR : OCR failed on image /home/user/dist/worker/pipeline/../../../tmp/8c1b928c-9066-44b4-a7cd-795dc09edbe8/img/img_p2_1.png. Details: socket hang up
Page 6 [warning] - OCR_ERROR : OCR failed on image /home/user/dist/worker/pipeline/../../../tmp/8c1b928c-9066-44b4-a7cd-795dc09edbe8/img/img_p5_5.png. Details: socket hang up
```

**Files**
One of the documents we've been testing with is the public AirBnB pitch deck, attached here:

[AirBnB-Deck.pdf](https://github.com/user-attachments/files/17804720/AirBnB-Deck.pdf)

**Job ID**
- 31513454-e41c-41e9-8117-4aed16203ea5
- b7964a02-c44e-435e-99a3-6231d98dff8f
- 5ed77e71-a68b-421b-9856-809569f84311
(and many more)

**Client:**
 - API

**Additional context**
Does not happen in multimodal mode.

This is the data we submit to your API via POST:

```ts
const formData = createFormData({
  file,
  parsing_instruction,
  do_not_cache: true,
  invalidate_cache: true,
  ...(richContent && {
    use_vendor_multimodal_model: true,
    vendor_multimodal_model_name: 'openai-gpt4o',
    vendor_multimodal_api_key: OPENAI_API_KEY,
  }),
})
```
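To make the difference between the two modes explicit, here is a minimal sketch of which fields the conditional spread above produces. `buildFields` and the placeholder values are hypothetical and only for illustration. `file` and our `createFormData` helper are left out so the snippet runs on its own.

```ts
// Hypothetical helper that mirrors the conditional spread in our request code.
function buildFields(richContent: boolean, apiKey: string) {
  return {
    parsing_instruction: "example instruction", // placeholder value
    do_not_cache: true,
    invalidate_cache: true,
    // Spreading `false` adds no keys, so the vendor fields exist only when richContent is true.
    ...(richContent && {
      use_vendor_multimodal_model: true,
      vendor_multimodal_model_name: "openai-gpt4o",
      vendor_multimodal_api_key: apiKey,
    }),
  };
}

// Default mode: the configuration where we see the OCR_ERROR warnings.
const defaultMode = Object.keys(buildFields(false, "sk-placeholder"));
// Multimodal mode: the configuration where the problem does not occur.
const multimodalMode = Object.keys(buildFields(true, "sk-placeholder"));

console.log(defaultMode);
console.log(multimodalMode);
```

So the failing requests carry only the file, the parsing instruction, and the two cache flags. None of the vendor multimodal fields are sent.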
