The steps in this article were carried out:
- under Windows 7 Home Premium.
- with CMake 2.8
- with Qt 5.1 and its MinGW 4.8.
- with a standalone MinGW installation (without Qt).
Tesseract OCR source code
Download tesseract-ocr-3.02.02.tar.gz and extract it.
Leptonica library
From the Leptonica web site:
Leptonica is a pedagogically-oriented open source site containing software that is broadly useful for image processing and image analysis applications.
Leptonica is quite tedious to build because of all its dependencies. Fortunately, someone did this work for us.
Here is the link to his repository:
https://github.com/zdenop/tesseract-mingw
Many thanks to zdenop for saving us time!
Download the following libraries from the bin folder:
- libgif-4.dll
- libjbig-1.dll
- libjpeg-8.dll
- liblept-3.dll : the Leptonica library.
- libpng15-15.dll
- libtiff-3.dll
- libtiffxx-3.dll
- libwebp-2.dll
- zlib1.dll
You may have noticed that a libtesseract-3.dll is also available. I tried to use it in my projects but it didn't work, which is why I decided to build Tesseract myself.
You also need the Leptonica source code, for its header files. I didn't use the headers in zdenop's repository (you could try them); I used the original headers from Leptonica 1.69.
Extract the Leptonica archive, create a bin directory inside the extracted folder, then copy all the DLLs listed above into it.
CMake
I use CMake version 2.8.
MinGW
Installing MinGW is outside the scope of this article; many tutorials cover it.
I use MinGW 4.8 as shipped with Qt 5.1, which already includes all the necessary tools.
If you don't have Qt installed and don't need it, you'll have to download the MinGW C/C++ development packages to build the project.
Environment batch file
We'll name it env.bat. It adds the MinGW bin directory to the PATH environment variable, then opens a new command prompt that inherits this PATH.
@ECHO off
SET PATH=c:\your\path\to\mingw\bin;%PATH%
START %SYSTEMROOT%\system32\cmd
Example:
SET PATH=c:\mingw\bin;%PATH%
Another example, for Qt users:
SET PATH=D:\Programs\Qt\Qt5.1.0\Tools\mingw48_32\bin;%PATH%
CMake batch files
If you use the MinGW shipped with Qt: cmake.bat.
@ECHO OFF
rmdir /s /q CMakeFiles
del /f /q CMakeCache.txt
cmake ^
  -G "Unix Makefiles" ^
  .
If you use a MinGW installation outside Qt: cmake_noqt.bat.
@ECHO OFF
rmdir /s /q CMakeFiles
del /f /q CMakeCache.txt
cmake ^
  -G "Unix Makefiles" ^
  -D"CMAKE_MAKE_PROGRAM:PATH=C:/MinGW/bin/mingw32-make.exe" ^
  .
Both batch files assume that the CMake bin directory (containing cmake.exe) is in your PATH. If it isn't, add a line to your env.bat.
For example:
SET PATH=C:\CMake 2.8\bin;%PATH%
CMakeLists.txt file
If you are not familiar with CMake, simply think of CMakeLists.txt as a project file.
In this section, we won't go through the whole file, only the lines you need to understand and possibly adapt.
#_-_-_-_-_-_SOME DIRECTORIES_-_-_-_-_-_
set(OCR_DIR D:/prog/ocr)
set(MINGW_DIR D:/Programs/Qt/Qt5.1.0/Tools/mingw48_32/i686-w64-mingw32)
set(MINGW_LIB_DIR ${MINGW_DIR}/lib)
set(LEPTONICA_DIR ${OCR_DIR}/leptonica-1.69)
- OCR_DIR : base directory for my OCR tools.
- MINGW_DIR : parent of the MinGW lib directory; use C:/MinGW if you don't use Qt (written with forward slashes, like the other paths).
- MINGW_LIB_DIR : needed to find the winsock2 library.
- LEPTONICA_DIR : directory where the Leptonica archive was extracted.
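These paths are specific to each machine, and a wrong one typically only shows up later as a missing header or an unresolved library. A minimal sketch of an early check, placed after the SOME DIRECTORIES block (this loop is not part of the author's CMakeLists.txt; REQUIRED_DIR is a hypothetical loop variable):

```cmake
# Hypothetical addition, placed after the SOME DIRECTORIES block:
# stop at configure time if one of the hand-edited paths is wrong.
foreach(REQUIRED_DIR ${OCR_DIR} ${MINGW_LIB_DIR} ${LEPTONICA_DIR} ${LEPTONICA_DIR}/bin)
  if(NOT IS_DIRECTORY "${REQUIRED_DIR}")
    message(FATAL_ERROR "Directory not found: ${REQUIRED_DIR}")
  endif()
endforeach()
```

IS_DIRECTORY expects full paths, which is the case here since all four directories are absolute.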
set(CMAKE_BINARY_DIR ../${PROJECT_NAME}_output)
The build output directory.
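Assigning CMAKE_BINARY_DIR by hand does not change where CMake generates its build files (here the source directory itself, since the batch files run cmake on .); it only affects the later lines of the file that read the variable. If your copy of the file does not already send the outputs there, a minimal sketch, assuming you want the DLL and the executables collected in that directory (OUTPUT_DIR is a hypothetical name):

```cmake
# Hypothetical sketch: collect the built DLL and executables in ../<project>_output.
# Resolve the relative path against the source directory so the result
# does not depend on where CMake was started.
get_filename_component(OUTPUT_DIR "${CMAKE_SOURCE_DIR}/../${PROJECT_NAME}_output" ABSOLUTE)

# On Windows, DLLs and .exe files are RUNTIME outputs;
# import libraries (.dll.a) are ARCHIVE outputs.
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${OUTPUT_DIR})
set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY ${OUTPUT_DIR})
```

These two variables only initialize the output directories of targets created after them, so they must come before the add_library() and add_executable() calls.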
set(WINDLL_NAME \"lib${TARGET_LIB_TESSERACT}.dll\")
add_definitions(-D_tagBLOB_DEFINED
-D__BLOB_T_DEFINED
-DUSE_STD_NAMESPACE
-DWINDLLNAME=${WINDLL_NAME})
Here, we add preprocessor definitions.
- _tagBLOB_DEFINED : avoids conflicting declarations between wtypes.h (MinGW) and platform.h (Tesseract) when you use Qt's MinGW.
- __BLOB_T_DEFINED : same purpose when your MinGW installation is not part of Qt.
- WINDLLNAME : the DLL file name, used by the ccutil source files. The escaped quotes in WINDLL_NAME make it expand to a C string literal.
- USE_STD_NAMESPACE : I haven't looked into its exact purpose, but it must be defined.
#_-_-_-_-_-_LINKING_-_-_-_-_-_
set(CMAKE_FIND_LIBRARY_SUFFIXES .a ${CMAKE_FIND_LIBRARY_SUFFIXES})
This puts .a first in the list of suffixes that find_library() tries, because we want to link against a static library.
find_library(LEPTONICA_LIB NAMES lept
lept-3
liblept
liblept-3
PATHS ${LEPTONICA_DIR}/bin)
Finds the Leptonica library to link against, trying each of the listed names in ${LEPTONICA_DIR}/bin.
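find_library() does not stop the configuration when nothing matches; it stores LEPTONICA_LIB-NOTFOUND in the variable and processing continues. A small check right after the call gives a clearer error message (a hypothetical addition, not in the author's file):

```cmake
# Hypothetical addition, right after find_library(LEPTONICA_LIB ...):
# a failed search leaves LEPTONICA_LIB set to LEPTONICA_LIB-NOTFOUND,
# which if() treats as false.
if(NOT LEPTONICA_LIB)
  message(FATAL_ERROR
    "Leptonica library not found in ${LEPTONICA_DIR}/bin. "
    "Check that liblept-3.dll was copied there.")
endif()
```

The same test can be applied to WS2_32_LIB.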
find_library(WS2_32_LIB NAMES libws2_32.a
PATHS ${MINGW_LIB_DIR}
NO_DEFAULT_PATH
NO_SYSTEM_ENVIRONMENT_PATH)
Finds the winsock2 static library, searching only in MINGW_LIB_DIR.
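The two find_library() calls only locate the files; the linking itself happens when the variables are passed to target_link_libraries(). A minimal sketch, assuming the library target is named ${TARGET_LIB_TESSERACT} as in WINDLL_NAME above, and that TESSERACT_SOURCES (hypothetical) holds the list of source files:

```cmake
# Hypothetical sketch: TESSERACT_SOURCES stands for the .cpp files
# gathered earlier in CMakeLists.txt.
add_library(${TARGET_LIB_TESSERACT} SHARED ${TESSERACT_SOURCES})

# This is where the libraries found above are actually linked in.
target_link_libraries(${TARGET_LIB_TESSERACT}
  ${LEPTONICA_LIB}   # Leptonica, found in ${LEPTONICA_DIR}/bin
  ${WS2_32_LIB})     # static winsock2 library from ${MINGW_LIB_DIR}
```

With MinGW, CMake gives shared libraries a lib prefix and a .dll suffix, so this target produces lib${TARGET_LIB_TESSERACT}.dll, the same name passed to the code through WINDLLNAME.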
Final steps
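- Copy the CMakeLists.txt into the tesseract-ocr source directory, next to configure, eurotext.tif, etc.
- Copy env.bat and cmake.bat (or cmake_noqt.bat) into the parent directory of tesseract-ocr.
- Launch env.bat.
- In the command prompt it opens, enter the tesseract-ocr directory.
- Launch CMake by running cmake.bat (or cmake_noqt.bat) from there: both files configure the current directory (the trailing . on the cmake command line).
- Build with MinGW's make tool, mingw32-make.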
- Wait a few minutes...
You should end up with a tesseract_output directory containing:
- libtesseract3.02.02.dll
- svpaint.exe
- tesseract.exe
Batch files and CMakeLists.txt can be downloaded from my repository:
https://github.com/broija/tesseract_ocr_mingw