The OCR Software Blog

Automatic OCR Language Detection

Fri, 17 Jan 2025 00:00:00 +0100

The January 2025 OCR API update not only added six new OCR languages, but also added language auto-detection.

To use the new language autodetect feature, select the option “Autodetect OCR Language” on our Online OCR page:

In the OCR API call, you can enable OCR language autodetection with the parameter language="auto" instead of, for example, language="eng" or language="chs".

How to use OCR language autodetection

To ensure full backward compatibility, language autodetection is not enabled by default. To enable it, you need to use language="auto" in your OCR API call. Remember to also select OCR Engine 2 with the ocrengine=2 parameter.

So if you want to detect the languages in the images automatically and extract the detected text, use OCR Engine 2. The OCR software can also detect multiple languages within a single image or PDF. This is done automatically once language auto-detection is enabled.
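A minimal sketch of such a call in Python, using only the standard library. The endpoint path, the `apikey` header, the `url` field and the helper names are assumptions that this post does not state; please check them against the documentation at https://ocr.space/ocrapi. The `language` and engine values come from the text above. The post writes the engine parameter as `ocrengine`; the sketch uses the `OCREngine` spelling that appears elsewhere on this blog.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; not stated in this post.
API_URL = "https://api.ocr.space/parse/image"


def build_autodetect_fields(image_url, engine=2):
    """Return the form fields for an OCR call with language auto-detection."""
    return {
        "url": image_url,          # assumed field name for a remote image
        "language": "auto",        # enables language auto-detection
        "OCREngine": str(engine),  # the post asks for OCR Engine 2
    }


def run_ocr(image_url, api_key):
    """POST the fields and return the decoded JSON response."""
    body = urllib.parse.urlencode(build_autodetect_fields(image_url)).encode()
    request = urllib.request.Request(
        API_URL,
        data=body,                    # a request with data is sent as POST
        headers={"apikey": api_key},  # assumed header name
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `run_ocr("https://example.com/scan.png", "YOUR_API_KEY")` would send the request; both arguments are placeholders.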

Six New OCR Languages

Thu, 16 Jan 2025 00:00:00 +0100

With the January 2025 OCR API update we added support for six new OCR languages to our OCR Engine 2:

Korean OCR
Japanese OCR
Russian OCR
Ukrainian OCR
Thai OCR
Vietnamese OCR

We also added language auto-detection and support for vertical text OCR.

Vertical Text OCR

The new OCR Engine2 version can also read vertical text. This is especially useful for languages that often use vertical writing, such as Japanese or Chinese. Here is an example that uses a Japanese manga as the input image:

To test the new OCR languages and vertical OCR, select the option to “Autodetect OCR Language” on the OCR.space website:

In the OCR API call, you can enable OCR language autodetection with the parameter language="auto".

OCR API runs on 100% renewable energy

Sat, 11 Nov 2023 00:00:00 +0100

Taking responsibility for the environment means there is an increasing need to obtain energy from renewable sources. We are happy to report that the OCR.space cloud OCR API uses only energy from renewable sources to power its servers.

Datacenter 1:

Our servers in the German data center use hydropower. The power provider is Energiedienst AG, a TÜV-certified company that generates green energy from 100 percent carbon dioxide-free, environmentally friendly hydropower.

Datacenter 2:

The data center in Finland uses wind and hydropower for all of its energy needs. The power provider is the Finnish energy company Oomi Oy.

Datacenter 3:

In our data center in France, the energy also carries a 100% hydraulic Guarantee of Origin (GO).

Hindi, Thai and Vietnamese OCR

Fri, 02 Sep 2022 00:00:00 +0200

We added new OCR languages to our experimental OCR Engine3. This includes the often-requested Hindi OCR, Thai OCR and Vietnamese OCR support.

All available OCR languages are listed in the updated OCR languages tables.

Improved Chinese OCR

Mon, 08 Aug 2022 00:00:00 +0200

Many users like the OCR quality of our OCR Engine2 for Western character sets/languages.

With our new OCR Engine5 we bring the same high OCR quality to languages with non-Latin characters. It starts with support for Chinese. Simplified characters (as used in China) and traditional characters (as used in Taiwan, Hong Kong and Singapore) are both supported.

Engine5 copes well with text on complex backgrounds, with low contrast, with uneven lighting, or with distortion. As an unintended side effect of these improvements, we noticed that the new OCR engine can also read simple captchas quite well.

So if you have a text that cannot be read well by OCR Engine1 or 2, try it with OCR Engine5. But the same goes the other way around, too. Engine5 is not always better. There are documents where Engine1 or 2 give you better OCR results.
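One way to act on this advice is to run the same image through several engines and keep the best result. The sketch below is only an illustration: `best_result`, the `ocr` callable you pass in, and the default scoring rule (counting non-whitespace characters) are all hypothetical and not part of the OCR API. A real comparison would need a quality measure that fits your documents.

```python
def best_result(image, ocr, engines=(1, 2, 5), score=None):
    """Run ocr(image, engine) for each engine and return the best (engine, text).

    `ocr` is any callable supplied by the caller that returns the recognized
    text for the given engine number.
    """
    if score is None:
        # Crude assumption: more non-whitespace characters means a better read.
        def score(text):
            return sum(1 for ch in text if not ch.isspace())

    results = [(engine, ocr(image, engine)) for engine in engines]
    # On a tie, max() keeps the engine that was tried first.
    return max(results, key=lambda pair: score(pair[1]))
```

Passing your own `score` function, for example one that checks for expected keywords, is usually a better choice than the default.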

Document solutions vendor Nitro Software acquires PDFpen software

Mon, 28 Jun 2021 00:00:00 +0200

The document productivity vendor Nitro Software has acquired the document software technology PDFpen for $6 million. PDFpen is a suite of PDF productivity applications for Apple Mac, iPhone and iPad devices, including digital signatures, Optical Character Recognition (OCR), PDF editing and cloud storage.

The technology was acquired from US-based Smile Inc., which also sells communications software platform TextExpander. This is a funny coincidence, since, without knowing about the acquisition, our RPA software team reviewed TextExpander earlier today in its Mac Desktop Automation blog post.

Nitro said PDFpen “strategically expands” its Nitro Productivity Platform to Apple PCs and mobile devices, reaching more knowledge workers globally.

The deal is also Nitro’s first since getting listed on the Australian Securities Exchange in late 2019.

The acquisition is subject to customary closing conditions, which Nitro expects to meet by 9 July 2021.

Improved Searchable PDF Generation

Sat, 27 Jun 2020 00:00:00 +0200

This update brings the ability to generate searchable PDFs to OCR Engine2. Searchable PDFs are sometimes called Sandwich PDFs because they contain two layers: the original PDF and a second layer with the OCR’ed text.

This update also improves searchable PDF generation in general. The new searchable PDF software now supports mixed format documents as input. Mixed format documents are scans that contain both landscape and portrait pages within one scan PDF.

Since the format of the returned OCR API result (JSON) is identical for both OCR engines, you can easily switch between both engines as needed. Just change the OCREngine parameter from “1” to “2”. If you have any questions about using Engine 1 or 2, please ask in our UI.Vision OCR API Forum.
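As a rough sketch of what this means for client code: only one request field changes, and the same parsing function can handle the result from either engine. The helper names are hypothetical, and the JSON field names (`IsErroredOnProcessing`, `ErrorMessage`, `ParsedResults`, `ParsedText`) are assumptions not stated in this post; please verify them against the API documentation.

```python
def with_engine(fields, engine):
    """Return a copy of the request fields with only OCREngine changed."""
    if engine not in (1, 2):
        raise ValueError("this sketch only covers Engine 1 and Engine 2")
    updated = dict(fields)  # leave the caller's dict untouched
    updated["OCREngine"] = str(engine)
    return updated


def extract_text(result):
    """Join the text of all parsed pages; the same code serves both engines."""
    if result.get("IsErroredOnProcessing"):  # assumed field name
        raise RuntimeError(str(result.get("ErrorMessage")))
    pages = result.get("ParsedResults") or []  # assumed field name
    return "\n".join(page.get("ParsedText", "") for page in pages)
```

For example, `with_engine({"language": "eng", "OCREngine": "1"}, 2)` returns the same fields with `OCREngine` set to `"2"`.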

Updated On-Premise OCR engine (Offline OCR)

Wed, 10 Jun 2020 00:00:00 +0200

Our popular on-premise OCR software received a major update: OCR Engine 2 is now integrated in the offline version of the OCR.Space OCR API.

The offline OCR is 100% the same as the hosted OCR service that we offer at https://ocr.space, and the documentation that you find on https://ocr.space/ocrapi applies to the on-premise version as well.

Alpha-numeric OCR engine gets PDF OCR support

Sat, 29 Feb 2020 00:00:00 +0100

We updated our OCR Engine 2 with full PDF OCR support, including auto-rotation and receipt OCR. And while we were at it, we also improved the processing speed of Engine 2.

How to test the PDF OCR Features of Engine2

You can test the PDF OCR and performance update with the free online OCR feature on our front page. You can switch between both OCR engines and compare the results. Please note that the free online OCR uses our free OCR API, so the performance is that of the free OCR API endpoint api.ocr.space.

Engine 1 continues to be the engine with the most language options, including Arabic OCR, Chinese OCR, Korean OCR, Japanese OCR, Russian OCR.

OCR engine2 is best suited for alphanumeric OCR with Latin characters.

Both engines are fast, but OCR engine1 is still about twice as fast. Excluding upload times, typical conversion speeds on the PRO endpoints are ~0.3-0.5s for engine1 and ~0.6-0.8s for engine2. Our free OCR API endpoint is sometimes slower due to the much higher load on the free OCR API servers. We run the free OCR servers with a much higher load to keep the cost of offering free OCR software (SaaS) reasonable.

The format of the returned OCR API result (JSON) is identical for both engines. Thus you can easily switch between both engines as needed. Just change the OCREngine parameter from “1” to “2”. If you have any questions about using Engine 1 or 2, please ask in our UI.Vision OCR API Forum.

Next on our to-do list is to add the option to generate searchable PDFs to engine2 and to make OCR Engine2 available as on-premise OCR software, too.

WebP OCR support added

Fri, 28 Feb 2020 00:00:00 +0100

The February 2020 update adds WebP to the list of supported image files for image OCR. The supported image formats are now PNG, JPG, GIF, TIFF and WebP. And of course PDF for document OCR. But wait… what is WebP?

What is WebP?

A WEBP file is an image saved in the WebP (pronounced “Weppy”) raster image format developed by Google for web graphics. The WebP format reduces file size more than standard JPEG compression while maintaining similar or better image quality. It supports both lossy and lossless compression and includes an alpha channel for transparency, similar to .PNG.

As of 2020, WEBP files are supported by most web browsers, except for Apple Safari and Microsoft Internet Explorer.

You can view WEBP files with Chrome, Firefox, Opera, or Microsoft Edge. Many graphics editors can open and save WebP files, such as Adobe Photoshop, GIMP, ImageMagick, and IrfanView.

Below you find a WEBP demo image file for testing WebP OCR:
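If you want to check a file before uploading it, a small client-side helper can do that. The sketch below is an illustration with hypothetical helper names and is not part of the OCR API. The extension set contains only the formats listed in this post, and the byte check relies on the WebP container layout: a file starts with "RIFF", followed by a 4-byte size, followed by "WEBP".

```python
import os

# Formats named in this post; PDF is for document OCR.
# Aliases such as ".jpeg" or ".tif" are left out because the post does not list them.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".gif", ".tiff", ".webp", ".pdf"}


def is_supported(filename):
    """True if the file extension is one of the listed formats."""
    return os.path.splitext(filename)[1].lower() in SUPPORTED_EXTENSIONS


def is_webp(data):
    """True if the bytes start with a RIFF container tagged as WEBP."""
    return len(data) >= 12 and data[0:4] == b"RIFF" and data[8:12] == b"WEBP"
```

Reading the first 12 bytes of a file with `open(path, "rb").read(12)` is enough input for `is_webp`.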

Screenshot image in WebP format

If you see this image, then your browser supports the WebP format. It is a screenshot from our RPA software product homepage.