The OCR Software BlogBlog posts about computer vision, cloud vision, OCR, OCR API, web scraping, selected tech news and our own software
https://ocr.space/blog/blog/
Sun, 19 Jan 2025 13:04:47 +0100Sun, 19 Jan 2025 13:04:47 +0100Jekyll v3.1.6Automatic OCR Language Detection<p>The January 2025 <a href="https://ocr.space/ocrapi">OCR API</a> update not only added <a href="https://ocr.space/blog/ocr-api-six-new-ocr-languages/">six new OCR languages</a>, but also
added language auto-detection.</p>
<p>To use the new language autodetect feature, select the option “Autodetect OCR Language” on our <a href="https://ocr.space/">Online OCR</a> page:</p>
<figure class="full-width caption">
<img src="/blog/images/posts/autodetect-ocr.jpg" alt="Autodetect OCR language" />
</figure>
<p>In the OCR API call, you can enable the
OCR language autodetection with the parameter <code class="highlighter-rouge">language="auto"</code> instead of e.g.
<code class="highlighter-rouge">language="eng"</code> or <code class="highlighter-rouge">language="chs"</code>.</p>
<!--more-->
<h2>How to use OCR language autodetection</h2>
<p>To ensure full backward compatibility, language autodetection is not enabled by default. To enable it, you need
to use <code class="highlighter-rouge">language="auto"</code> in your <a href="https://ocr.space/ocrapi">OCR API</a> call. Remember to also select the OCR Engine
<strong>2</strong> with the <code class="highlighter-rouge">ocrengine=2</code> parameter.</p>
<p>So if you want to detect the languages in the images automatically and extract the detected text, use OCR Engine <strong>2</strong>.
The OCR software can also detect multiple languages within a single image or PDF. This is done automatically once language auto-detection
is enabled.</p>
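<p>The call described above can be sketched as follows. This is a minimal sketch, not an official client: the endpoint path <code class="highlighter-rouge">/parse/image</code> and the field names <code class="highlighter-rouge">apikey</code> and <code class="highlighter-rouge">url</code> are assumptions, and the engine field is spelled <code class="highlighter-rouge">OCREngine</code> here. Only <code class="highlighter-rouge">language="auto"</code> and engine <strong>2</strong> come from this post.</p>

```python
from urllib.parse import urlencode

# Assumed endpoint; check the OCR API documentation for the real one.
API_URL = "https://api.ocr.space/parse/image"


def build_autodetect_request(image_url, api_key):
    """Return the form fields for an OCR call with language autodetection."""
    return {
        "apikey": api_key,    # assumed field name
        "url": image_url,     # assumed field name
        "language": "auto",   # enables language autodetection
        "OCREngine": "2",     # autodetection needs OCR Engine 2
    }


fields = build_autodetect_request("https://example.com/scan.png", "YOUR_API_KEY")
body = urlencode(fields)  # form-encoded body that would be POSTed to API_URL
print(body)
```

<p>The sketch only builds the request body, so it runs without network access. Sending it is a plain HTTP POST with any HTTP client.</p>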
Fri, 17 Jan 2025 00:00:00 +0100
https://ocr.space/blog/blog/ocr-api-language-autodetect/
https://ocr.space/blog/blog/ocr-api-language-autodetect/OCRSix New OCR Languages<p>With the January 2025 <a href="https://ocr.space/ocrapi">OCR API</a> update we added support for six new OCR languages
to our OCR Engine <strong>2</strong>:</p>
<ul>
<li>Korean OCR</li>
<li>Japanese OCR</li>
<li>Russian OCR</li>
<li>Ukrainian OCR</li>
<li>Thai OCR</li>
<li>Vietnamese OCR</li>
</ul>
<p>We also added <a href="https://ocr.space/blog/ocr-api-language-autodetect/">language auto-detection</a> and support for vertical text OCR.</p>
<!--more-->
<h2>Vertical Text OCR</h2>
<p>The new OCR Engine2 version can read vertical text, too.
This is especially useful for languages that often use
vertical writing, such as Japanese or Chinese. Here is an example that uses a Japanese Manga as input image:</p>
<figure class="full-width caption">
<img src="/blog/images/posts/manga-ocr-vertical.png" alt="Vertical OCR, for example for Japanese
Manga Reading" />
</figure>
<p>To test the new OCR languages and vertical OCR, select the option to “Autodetect OCR Language” on the
OCR.space website:</p>
<figure class="full-width caption">
<img src="/blog/images/posts/autodetect-ocr.jpg" alt="Autodetect OCR language" />
</figure>
<p>In the <a href="https://ocr.space/ocrapi">OCR API</a> call, you can enable OCR language autodetection with the parameter <code class="highlighter-rouge">language="auto"</code>.</p>
Thu, 16 Jan 2025 00:00:00 +0100
https://ocr.space/blog/blog/ocr-api-six-new-ocr-languages/
https://ocr.space/blog/blog/ocr-api-six-new-ocr-languages/languagesOCR API runs on 100% renewable energy<p>Taking responsibility for the environment means there is an increasing need to obtain energy from renewable sources. We are happy to report that the OCR.space cloud <a href="https://ocr.space/ocrapi">OCR API</a> uses only energy from renewable sources to power its servers.</p>
<!--more-->
<h1>Datacenter 1:</h1>
<p>Our servers in the German data center use hydropower. The power provider is Energiedienst AG, a TÜV-certified company that generates green energy from 100 percent carbon dioxide-free and environmentally friendly hydropower.</p>
<h1>Datacenter 2:</h1>
<p>The data center in Finland uses wind and hydropower for all of its energy needs. The power provider is the Finnish energy company Oomi Oy.</p>
<h1>Datacenter 3:</h1>
<p>In our data center in France, the energy carries a 100% hydropower Guarantee of Origin (GO), too.</p>
Sat, 11 Nov 2023 00:00:00 +0100
https://ocr.space/blog/blog/ocr-api-runs-on-100-renewable-energy/
https://ocr.space/blog/blog/ocr-api-runs-on-100-renewable-energy/SustainabilityHindi, Thai and Vietnamese OCR<p>We added new OCR languages to our experimental <a href="https://ocr.space/ocrapi#ocrengine3">OCR Engine3</a>. This includes the often requested Hindi OCR, Thai OCR and Vietnamese OCR support.</p>
<p>All available OCR languages are listed in the updated <a href="https://ocr.space/#ocrlanguages">OCR languages tables</a>.</p>
Fri, 02 Sep 2022 00:00:00 +0200
https://ocr.space/blog/blog/ocr-thai-hindi-vietnamese/
https://ocr.space/blog/blog/ocr-thai-hindi-vietnamese/Thai OCRHindi OCRVietnamese OCRImproved Chinese OCR<p>Many users like the OCR quality of our OCR Engine2 for <a href="https://ocr.space/latin">Western character sets/languages</a>.</p>
<p>With our <a href="https://ocr.space/ocrapi#ocrengine5">new OCR Engine5</a> we bring the same high OCR quality to languages with non-Latin characters. It starts with support for Chinese. Simplified characters (as used in China) and traditional characters (as used in Taiwan, Hong Kong and Singapore) are both supported.</p>
<p>Engine5 copes well with text on complex backgrounds, low contrast, uneven lighting and distortion. As an unintended side effect of these improvements, we noticed that the new OCR engine can also read simple captchas quite well.</p>
<p>So if you have a text that cannot be read well by OCR Engine1 or 2, try it with OCR Engine5. But the same goes the other way around, too: Engine5 is not always better. There are documents where Engine1 or 2 gives you better OCR results.</p>
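<p>One way to act on this advice is to try the engines in turn and keep the first usable result. The sketch below is an illustration only: <code class="highlighter-rouge">run_ocr</code> is a hypothetical callable that you would back with a real OCR API call, and "non-empty text" is a crude stand-in for "read well".</p>

```python
def first_usable_result(run_ocr, engines=(1, 2, 5), min_chars=1):
    """Try each engine in order and return (engine, text) for the first
    result with at least `min_chars` non-whitespace-trimmed characters.

    `run_ocr` is a hypothetical callable: engine number -> recognized text.
    Returns None if no engine produced a usable result.
    """
    for engine in engines:
        text = run_ocr(engine).strip()
        if len(text) >= min_chars:
            return engine, text
    return None


# Stand-in for real API calls: engines 1 and 2 find nothing, engine 5 does.
def fake_ocr(engine):
    return {1: "", 2: "   ", 5: "你好"}[engine]


result = first_usable_result(fake_ocr)
print(result)
```

<p>Because no engine is always better, the order in <code class="highlighter-rouge">engines</code> is a choice per document type. Pass <code class="highlighter-rouge">engines=(5, 2, 1)</code> to try Engine5 first.</p>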
Mon, 08 Aug 2022 00:00:00 +0200
https://ocr.space/blog/blog/improved-chinese-ocr/
https://ocr.space/blog/blog/improved-chinese-ocr/Chinese OCRDocument solutions vendor Nitro Software acquires PDFpen software<p>The document productivity vendor Nitro Software has acquired the document software technology PDFpen for $6 million. PDFpen is a suite of PDF productivity applications
for Apple Mac, iPhone and iPad devices, including digital signatures, <a href="https://ocr.space/">Optical Character Recognition (OCR)</a>, PDF editing and cloud storage.</p>
<p>The technology was acquired from US-based Smile Inc., which also sells communications software platform TextExpander. This is a funny coincidence, since,
without knowing about the acquisition, our <a href="https://ui.vision/rpa">RPA software</a> team reviewed TextExpander earlier today in its <a href="https://ui.vision/blog/mac-desktop-automation/">Mac Desktop Automation</a> blog post.</p>
<!--more-->
<p>Nitro said PDFpen “strategically expands” its Nitro Productivity Platform to Apple PCs and mobile devices, reaching more knowledge workers globally.</p>
<p>The deal is also Nitro’s first since getting listed on the Australian Securities Exchange in late 2019.</p>
<p>The acquisition is subject to customary closing conditions, which Nitro expects to meet by 9 July 2021.</p>
Mon, 28 Jun 2021 00:00:00 +0200
https://ocr.space/blog/blog/document-solutions-vendor-nitro-software-acquires-pdfpen-software/
https://ocr.space/blog/blog/document-solutions-vendor-nitro-software-acquires-pdfpen-software/PDF OCRImproved Searchable PDF Generation<p>This update brings the ability to <a href="https://ocr.space/searchablepdf">generate searchable PDFs</a>
to <a href="https://ocr.space/ocrapi#ocrengine">OCR Engine2</a>. Searchable PDFs are sometimes called sandwich PDFs because they contain two layers: the original PDF and a second layer with the OCR’ed text.</p>
<!--more-->
<p>This update also improves searchable PDF generation in general. The new searchable PDF software now supports mixed-format documents as input. Mixed-format documents are scans that contain both landscape and portrait pages within one scanned PDF.</p>
<p>Since the format of the returned OCR API result (JSON) is identical for both OCR engines, you can easily switch between the engines as needed. Just change the <a href="https://ocr.space/ocrapi#selectocrengine">OCREngine parameter</a> from “1” to “2”. If you have any questions about using Engine 1 or 2, please ask in our UI.Vision <a href="https://forum.ui.vision/c/ocr-api">OCR API Forum</a>.</p>
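<p>A request for a searchable PDF might be assembled as in the sketch below. The field names <code class="highlighter-rouge">apikey</code>, <code class="highlighter-rouge">url</code> and <code class="highlighter-rouge">isCreateSearchablePdf</code> are assumptions to be checked against the OCR API documentation. The point it illustrates is that switching engines changes a single field.</p>

```python
def searchable_pdf_fields(file_url, api_key, engine=2):
    """Return form fields for a searchable-PDF request (field names assumed)."""
    if engine not in (1, 2):
        raise ValueError("this sketch only covers OCR Engine 1 and 2")
    return {
        "apikey": api_key,                # assumed field name
        "url": file_url,                  # assumed field name
        "OCREngine": str(engine),         # "1" or "2"
        "isCreateSearchablePdf": "true",  # assumed flag name
    }


with_engine1 = searchable_pdf_fields("https://example.com/scan.pdf", "YOUR_API_KEY", engine=1)
with_engine2 = searchable_pdf_fields("https://example.com/scan.pdf", "YOUR_API_KEY", engine=2)

# Only the engine field differs between the two requests.
changed = [key for key in with_engine1 if with_engine1[key] != with_engine2[key]]
print(changed)
```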
Sat, 27 Jun 2020 00:00:00 +0200
https://ocr.space/blog/blog/searchable-pdf-update/
https://ocr.space/blog/blog/searchable-pdf-update/OCR APIUpdated On-Premise OCR engine (Offline OCR)<p>Our popular <a href="https://ocr.space/OCRAPI#local">on-premise OCR software</a> received a major update: <a href="https://ocr.space/ocrapi#ocrengine">OCR Engine 2</a> is now integrated in the offline version of the OCR.Space <a href="https://ocr.space/ocrapi">OCR API</a>.</p>
<!--more-->
<p>The <a href="https://ocr.space/OCRAPI#local">offline OCR</a> is 100% the same as the hosted OCR service that we offer at <a href="https://ocr.space">https://ocr.space</a>, and the documentation that you find on <a href="https://ocr.space/ocrapi">https://ocr.space/ocrapi</a> applies to the on-premise version as well.</p>
Wed, 10 Jun 2020 00:00:00 +0200
https://ocr.space/blog/blog/offline-ocr-e2/
https://ocr.space/blog/blog/offline-ocr-e2/OCR APIAlpha-numeric OCR engine gets PDF OCR support<p>We updated our <a href="https://ocr.space/ocrapi#ocrengine">OCR Engine 2</a> with full PDF OCR support, including <a href="https://ocr.space/ocrapi#detectorientation">auto-rotation</a> and <a href="https://ocr.space/receiptscanning">receipt OCR</a>. And while we were at it, we also improved the processing speed of Engine 2.</p>
<!--more-->
<h1>How to test the PDF OCR Features of Engine2</h1>
<p>You can test the PDF OCR and performance update with the free online OCR feature on our front page. You can switch between both OCR engines and compare the results.
Please note that the free online OCR uses our free OCR API, so the performance is that of the free OCR API endpoint api.ocr.space.</p>
<p><a href="https://ocr.space/ocrapi#ocrengine">Engine 1</a> continues to be the engine with the most language options, including <a href="https://ocr.space/arabic">Arabic OCR</a>, <a href="https://ocr.space/Chinese">Chinese OCR</a>, <a href="https://ocr.space/korean">Korean OCR</a>, <a href="https://ocr.space/japanese">Japanese OCR</a> and <a href="https://ocr.space/russian">Russian OCR</a>.</p>
<p><a href="https://ocr.space/ocrapi#ocrengine">OCR engine2</a> is best suited for alphanumeric OCR with Latin characters.</p>
<p>Both engines are fast, but <a href="https://ocr.space/ocrapi#ocrengine">OCR engine1</a> is still about twice as fast. Excluding upload times, typical conversion times are ~0.3-0.5s for engine1 and ~0.6-0.8s for engine2 on the <a href="https://ocr.space/OCRAPI#pro">PRO endpoints</a>. Our <a href="https://ocr.space/OCRAPI">free OCR API</a> endpoint is sometimes slower due to the much higher load on the free OCR API servers. We run the free OCR servers with a much higher load to keep the cost of offering free OCR software (SaaS) reasonable.</p>
<p><strong>The format of the returned OCR API result (JSON) is identical for both engines</strong>. Thus you can easily switch between the engines as needed. Just change the <a href="https://ocr.space/ocrapi#selectocrengine">OCREngine parameter</a> from “1” to “2”. If you have any questions about using Engine 1 or 2, please ask in our UI.Vision <a href="https://forum.ui.vision/c/ocr-api">OCR API Forum</a>.</p>
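<p>Because the result format is the same for both engines, one parsing routine can serve both. The sketch below assumes a response shape with the keys <code class="highlighter-rouge">ParsedResults</code>, <code class="highlighter-rouge">ParsedText</code>, <code class="highlighter-rouge">IsErroredOnProcessing</code> and <code class="highlighter-rouge">ErrorMessage</code>. These names and the sample payload are illustrative, not captured API output.</p>

```python
import json


def extract_text(response_json):
    """Join the recognized text of all pages from an OCR API JSON result.

    The key names used here are assumptions about the response format.
    """
    data = json.loads(response_json)
    if data.get("IsErroredOnProcessing"):
        raise RuntimeError(str(data.get("ErrorMessage")))
    pages = data.get("ParsedResults") or []
    return "\n".join(page.get("ParsedText", "") for page in pages)


# Hand-written sample, shaped like the assumed response of either engine.
sample = json.dumps({
    "ParsedResults": [
        {"ParsedText": "Invoice 42"},
        {"ParsedText": "Total: 10 EUR"},
    ],
    "IsErroredOnProcessing": False,
})

print(extract_text(sample))
```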
<p>Next on our todo list is to add the option to <a href="https://ocr.space/searchablepdf">generate searchable PDF</a> to engine2 and make OCR Engine2 available as <a href="https://ocr.space/OCRAPI#local">on-premise OCR software</a>, too.</p>
Sat, 29 Feb 2020 00:00:00 +0100
https://ocr.space/blog/blog/pdf-ocr-alphanumeric/
https://ocr.space/blog/blog/pdf-ocr-alphanumeric/OCR APIWebP OCR support added<p>The February 2020 update adds WebP to the list of supported image files for image OCR. The supported image formats are now PNG, JPG, GIF, TIFF and WebP. And of course PDF for document OCR. But wait… what is WebP?</p>
<!--more-->
<h1>What is WebP?</h1>
<p>A WEBP file is an image saved in the WebP (pronounced “Weppy”) raster image format developed by Google for web graphics. The WebP format reduces file size more than standard JPEG compression while maintaining similar or better image quality. It supports both lossy and lossless compression and includes an alpha channel for transparency, similar to .PNG.</p>
<p>As of 2020, WEBP files are supported by most web browsers, except for Apple Safari and Microsoft Internet Explorer.</p>
<p>You can view WEBP files with Chrome, Firefox, Opera, or Microsoft Edge. Many graphics editors can open and save WebP files, such as Adobe Photoshop, GIMP, ImageMagick, and IrfanView.</p>
<p>Below is a WEBP demo image file for testing WebP OCR:</p>
<figure class="full-width caption">
<img src="/blog/images/posts/webp-example.webp" alt="" />
<figcaption class="caption-text">Screenshot image in WebP format</figcaption>
</figure>
<p>If you see this image, then your browser supports the WebP format. It is a screenshot from our <a href="https://ui.vision/">RPA software</a> product homepage.</p>
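<p>Before uploading a file, you can check locally whether it is in one of the formats listed above by looking at its first bytes. The sketch below uses the well-known file signatures. A WebP file is a RIFF container whose bytes 8 to 11 read <code class="highlighter-rouge">WEBP</code>. The function name is our own, and the check says nothing about how the OCR API itself validates uploads.</p>

```python
def detect_format(header):
    """Guess the file format from its first bytes, or return None."""
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if header.startswith(b"\xff\xd8\xff"):
        return "JPG"
    if header.startswith((b"GIF87a", b"GIF89a")):
        return "GIF"
    if header.startswith((b"II*\x00", b"MM\x00*")):
        return "TIFF"
    # WebP: "RIFF", 4-byte little-endian size, then "WEBP".
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "WebP"
    if header.startswith(b"%PDF-"):
        return "PDF"
    return None


# Synthetic WebP header: RIFF tag, a dummy size field, WEBP tag, chunk id.
webp_header = b"RIFF" + (1234).to_bytes(4, "little") + b"WEBP" + b"VP8 "
print(detect_format(webp_header))
```

<p>For a real file, read the first 12 bytes with <code class="highlighter-rouge">open(path, "rb").read(12)</code> and pass them to the function.</p>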
Fri, 28 Feb 2020 00:00:00 +0100
https://ocr.space/blog/blog/webp-ocr/
https://ocr.space/blog/blog/webp-ocr/Online OCRWebPOCR API