# Introduction
Tesseract is an open source [text recognition (OCR)](https://en.wikipedia.org/wiki/Optical_character_recognition) engine, available under the [Apache 2.0 license](http://www.apache.org/licenses/LICENSE-2.0). It can be used directly or, for programmers, through an [API](https://github.com/tesseract-ocr/tesseract/blob/main/include/tesseract/baseapi.h) to extract printed text from images. It supports a wide variety of languages.
Tesseract doesn't have a built-in GUI, but there are several available from the [3rdParty](User-Projects-%E2%80%93-3rdParty.md) page.
# Installation
There are two parts to install: the engine itself and the traineddata for the languages.
Tesseract is available directly from many Linux distributions. The package is generally called **'tesseract'** or **'tesseract-ocr'** - search your distribution's repositories to find it.
Packages for over 130 languages and over 35 scripts are also available directly from the Linux distributions. The language traineddata packages are called **'tesseract-ocr-langcode'** and **'tesseract-ocr-script-scriptcode'**, where `langcode` is a three-letter language code and `scriptcode` is a four-letter script code.
**Examples:** tesseract-ocr-eng (**English**), tesseract-ocr-ara (**Arabic**), tesseract-ocr-chi-sim (**Simplified Chinese**), tesseract-ocr-script-latn (**Latin Script**), tesseract-ocr-script-deva (**Devanagari script**), etc.
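The naming scheme above can be sketched as a small shell helper. The function name `tess_pkg_name` and its `lang`/`script` arguments are hypothetical, introduced only for illustration; the actual package names depend on your distribution.

```shell
# Hypothetical helper: build a traineddata package name from a code,
# following the naming scheme described above.
tess_pkg_name() {
    kind=$1   # "lang" or "script"
    code=$2   # e.g. eng, ara, latn, deva
    case "$kind" in
        lang)   printf 'tesseract-ocr-%s\n' "$code" ;;
        script) printf 'tesseract-ocr-script-%s\n' "$code" ;;
        *)      echo "unknown kind: $kind" >&2; return 1 ;;
    esac
}

tess_pkg_name lang eng      # prints tesseract-ocr-eng
tess_pkg_name script deva   # prints tesseract-ocr-script-deva
```

On an apt-based distribution, the result could then be passed to the package manager, for example `sudo apt install "$(tess_pkg_name lang ara)"`.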
**FOR EXPERTS ONLY.**
If you are experimenting with OCR Engine modes, you will need to manually install language training data beyond what is available in your Linux distribution.
[Various types of training data](Data-Files) can be found on [GitHub](https://github.com/tesseract-ocr/). Unpack and copy the .traineddata file into a 'tessdata' directory. The exact directory depends on both the type of training data and your Linux distribution. Possible locations include `/usr/share/tesseract-ocr/tessdata`, `/usr/share/tessdata`, and `/usr/share/tesseract-ocr/4.00/tessdata`.
Training data for obsolete Tesseract versions ([<= 3.02](https://sourceforge.net/projects/tesseract-ocr-alt/files/?source=navbar)) resides in another location.
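Because the tessdata directory varies between distributions, it can help to probe the candidate locations before copying anything. The following is a minimal sketch; the helper name `first_existing_dir` is hypothetical, and the list contains only the directories mentioned above, so your system may use a different one.

```shell
# Hypothetical helper: print the first argument that is an existing directory.
first_existing_dir() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Candidate locations taken from the text above; this list is not exhaustive.
tessdata_dir=$(first_existing_dir \
    /usr/share/tesseract-ocr/4.00/tessdata \
    /usr/share/tesseract-ocr/tessdata \
    /usr/share/tessdata) || tessdata_dir=""

if [ -n "$tessdata_dir" ]; then
    echo "tessdata directory: $tessdata_dir"
    # A downloaded file could then be copied there, for example:
    # sudo cp eng.traineddata "$tessdata_dir/"
else
    echo "no known tessdata directory found" >&2
fi
```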
## Platforms
If Tesseract is not available for your distribution, or you want to use a newer version than it offers, you can [compile your own](Compiling).
### Ubuntu
You can install Tesseract and its developer tools on Ubuntu by simply running:
```
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev
```
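After installing, a quick check can confirm that the engine is on your `PATH` and show which languages it can find. This is a sketch only; the function name `check_tesseract` is hypothetical, while `--version` and `--list-langs` are options of the `tesseract` command-line tool.

```shell
# Hypothetical helper: report the installed version and available languages.
check_tesseract() {
    if command -v tesseract >/dev/null 2>&1; then
        # Show only the first line of the version banner.
        tesseract --version 2>&1 | head -n 1
        # List the traineddata languages Tesseract can find.
        tesseract --list-langs
    else
        echo "tesseract not found on PATH" >&2
        return 1
    fi
}

check_tesseract || echo "install tesseract-ocr first" >&2
```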
**Note for Ubuntu users**: If `apt` is unable to find the package, try adding the `universe` entry to the `sources.list` file as shown below.
```
sudo vi /etc/apt/sources.list
# Copy the first line, "deb http://archive.ubuntu.com/ubuntu bionic main",
# and paste it on the next line with "main" changed to "universe", as shown below.
# If you are using a different release of Ubuntu, replace "bionic" with that release's name.
deb http://archive.ubuntu.com/ubuntu bionic universe
```
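The manual edit above can also be sketched without opening an editor. The helper name `universe_line` is hypothetical; it only prints the line for a given release name, and the commented commands that append it assume you have root access and have checked that the line is not already present.

```shell
# Hypothetical helper: print the universe entry for a given Ubuntu release.
universe_line() {
    release=$1   # e.g. bionic
    printf 'deb http://archive.ubuntu.com/ubuntu %s universe\n' "$release"
}

universe_line bionic   # prints deb http://archive.ubuntu.com/ubuntu bionic universe

# To append the line and refresh the package index (requires root):
# universe_line bionic | sudo tee -a /etc/apt/sources.list
# sudo apt update
```

If `lsb_release` is installed, `lsb_release -cs` prints the release name of the running system, which can be passed to the helper instead of a hard-coded value.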
### Debian packages
* [Tesseract 4](https://notesalexp.org/tesseract-ocr/packages/)
* [Tesseract 5](https://notesalexp.org/tesseract-ocr/packages5/)
* [Tesseract 5 (devel)](https://notesalexp.org/tesseract-ocr/packages-dev/)
### Raspbian packages
* [Tesseract 4](https://notesalexp.org/tesseract-ocr/packages/)
* [Tesseract 5](https://notesalexp.org/tesseract-ocr/packages5/)
* [Tesseract 5 (devel)](https://notesalexp.org/tesseract-ocr/packages-dev/)
### Ubuntu packages
* [Tesseract 4](https://notesalexp.org/tesseract-ocr/packages/)
* [Tesseract 5](https://notesalexp.org/tesseract-ocr/packages5/)
* [Tesseract 5 (devel)](https://notesalexp.org/tesseract-ocr/packages-dev/)
### Ubuntu ppa
* [Tesseract 4](https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr)
* [Tesseract 5](https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr5)
* [Tesseract 5 (devel-daily)](https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr-daily)
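Launchpad archive pages like the ones listed above are usually added with `add-apt-repository` using a `ppa:owner/name` identifier. The sketch below derives that identifier from the page URL; the helper name `ppa_from_url` is hypothetical, and the mapping assumes the usual Launchpad URL layout.

```shell
# Hypothetical helper: turn a Launchpad archive URL into a ppa: identifier.
# A URL that does not match the expected layout is printed unchanged.
ppa_from_url() {
    printf '%s\n' "$1" |
        sed -E 's#^https://launchpad\.net/~([^/]+)/\+archive/ubuntu/([^/]+)/?$#ppa:\1/\2#'
}

ppa_from_url "https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr5"
# prints ppa:alex-p/tesseract-ocr5

# Adding the PPA and installing from it (requires root):
# sudo add-apt-repository "$(ppa_from_url "https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr5")"
# sudo apt update
# sudo apt install tesseract-ocr
```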
### RHEL/CentOS/Scientific Linux, Fedora, openSUSE packages
* [Tesseract 4](https://build.opensuse.org/project/show/home:Alexander_Pozdnyakov)
* [Tesseract 5](https://build.opensuse.org/project/show/home:Alexander_Pozdnyakov:tesseract5)
See the [Installation on openSUSE](InstallationOpenSuse.md) page for detailed instructions.
### AppImage
_Instructions_
1. Download the AppImage from the [releases](https://github.com/AlexanderP/tesseract-appimage/releases) page.
2. Open your terminal application, if it is not already open.
3. Browse to the location of the AppImage.
4. Make the AppImage executable:
   `chmod a+x tesseract*.AppImage`
5. Run it:
   `./tesseract*.AppImage -l eng page.tif page.txt`
_AppImage compatibility_
* Debian: ≥ 10
* Fedora: ≥ 29
* Ubuntu: ≥ 18.04
* CentOS: ≥ 8
* openSUSE Tumbleweed
_Included traineddata files_
* deu - German
* eng - English
* fin - Finnish
* fra - French
* osd - Script and orientation
* por - Portuguese
* rus - Russian
* spa - Spanish
### snap
For distributions supported by snapd, you can also install the prebuilt `tesseract` binaries by running the following command ([Don't have snapd installed?](https://snapcraft.io/docs/core/install)):

    sudo snap install --channel=edge tesseract

The traineddata is currently not shipped with the snap package and must be placed manually in `~/snap/tesseract/current`.
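Placing the traineddata for the snap package can be sketched as a plain file copy. The helper name `install_traineddata` is hypothetical, and the example assumes that `eng.traineddata` has already been downloaded into the current directory.

```shell
# Hypothetical helper: copy a .traineddata file into an existing directory.
install_traineddata() {
    src=$1    # path to a downloaded .traineddata file
    dest=$2   # destination directory, which must already exist
    if [ ! -d "$dest" ]; then
        echo "destination directory not found: $dest" >&2
        return 1
    fi
    cp "$src" "$dest/"
}

# Example for the snap package (file name is an assumption):
# install_traineddata eng.traineddata "$HOME/snap/tesseract/current"
```

The helper refuses to create the destination, since a missing `~/snap/tesseract/current` most likely means the snap has not been installed or run yet.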
### macOS
You can install Tesseract using either [MacPorts](https://www.macports.org/) or [Homebrew](http://brew.sh).
A macOS wrapper for the Tesseract API is also available at [Tesseract macOS](https://github.com/scott0123/Tesseract-macOS).
#### MacPorts
To install Tesseract run this command:
```
sudo port install tesseract
```
To install any language data, run:
```
sudo port install tesseract-