- under Windows 7 Home Premium.
- with CMake 2.8
- with Qt 5.1 and its MinGW 4.8.
- with basic MinGW (without Qt).
Tesseract OCR source code
Download tesseract-ocr-3.02.02.tar.gz and extract it.
Leptonica library
From the Leptonica web site: "Leptonica is a pedagogically-oriented open source site containing software that is broadly useful for image processing and image analysis applications."
Leptonica is quite tedious to build because of all its dependencies. Fortunately, someone did this work for us.
Here is the link to his repository: https://github.com/zdenop/tesseract-mingw .
Many thanks to zdenop for saving us time!
Download the following libraries from the bin folder:
- libgif-4.dll
- libjbig-1.dll
- libjpeg-8.dll
- liblept-3.dll : the Leptonica library.
- libpng15-15.dll
- libtiff-3.dll
- libtiffxx-3.dll
- libwebp-2.dll
- zlib1.dll
You must also get the Leptonica source code. I didn't use the header files in zdenop's repo, but you could try them. I used the original headers from Leptonica version 1.69.
Extract the Leptonica archive, create a bin directory in the new folder, then copy all the libraries listed above into it.
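Forgetting one of the DLLs is easy, so a quick check can help. The script below is only a sketch: its file name (check_leptonica_bin.cmake) is hypothetical, it is not part of the original project, and the default path is an assumption taken from the directory layout used in the CMakeLists.txt of this article.

```cmake
# check_leptonica_bin.cmake (hypothetical helper, not part of the original project)
# Run with: cmake -DLEPTONICA_DIR=D:/prog/ocr/leptonica-1.69 -P check_leptonica_bin.cmake

if(NOT LEPTONICA_DIR)
  # Assumed default: the Leptonica extraction directory used in this article.
  set(LEPTONICA_DIR D:/prog/ocr/leptonica-1.69)
endif()

# The libraries listed above, expected in the bin directory we created.
set(REQUIRED_DLLS
  libgif-4.dll
  libjbig-1.dll
  libjpeg-8.dll
  liblept-3.dll
  libpng15-15.dll
  libtiff-3.dll
  libtiffxx-3.dll
  libwebp-2.dll
  zlib1.dll
)

# Collect every DLL that is not present.
set(MISSING_DLLS)
foreach(dll ${REQUIRED_DLLS})
  if(NOT EXISTS "${LEPTONICA_DIR}/bin/${dll}")
    list(APPEND MISSING_DLLS ${dll})
  endif()
endforeach()

if(MISSING_DLLS)
  message(FATAL_ERROR "Missing in ${LEPTONICA_DIR}/bin: ${MISSING_DLLS}")
else()
  message(STATUS "All Leptonica DLLs found in ${LEPTONICA_DIR}/bin")
endif()
```

Note that the -D option has to come before -P on the command line, otherwise the variable is not visible to the script.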
CMake
I use CMake version 2.8.
MinGW
Installation of MinGW is out of the scope of this article. There are many tutorials about this.
I use MinGW version 4.8 supplied by Qt 5.1. All the necessary tools are already installed.
If you don't already have Qt installed and don't need it, you'll have to download MinGW C/C++ development packages to build the project.
Environment batch file
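We'll name it env.bat. It adds the MinGW bin directory to the PATH environment variable, then opens a new command prompt.
@ECHO off
SET PATH=c:\your\path\to\mingw\bin;%PATH%
START %SYSTEMROOT%\system32\cmd
Examples of the SET PATH line, for a basic MinGW installation and for the MinGW supplied by Qt 5.1 respectively:
SET PATH=c:\mingw\bin;%PATH%
SET PATH=D:\Programs\Qt\Qt5.1.0\Tools\mingw48_32\bin;%PATH%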
CMake batch files
If you code with Qt: cmake.bat.
@ECHO OFF
rmdir /s /q CMakeFiles
del /f /q CMakeCache.txt
cmake^
 -G "Unix Makefiles"^
 .
If you don't use Qt: cmake_noqt.bat.
@ECHO OFF
rmdir /s /q CMakeFiles
del /f /q CMakeCache.txt
cmake^
 -G"Unix Makefiles"^
 -D"CMAKE_MAKE_PROGRAM:PATH=C:/MinGW/bin/mingw32-make.exe"^
 .
The CMake bin directory must also be in the PATH. For example:
SET PATH="C:\CMake 2.8\bin";%PATH%
CMakeLists.txt file
If you are not familiar with CMake, simply consider CMakeLists.txt as a project file.
In this section, we won't analyze the whole file but only the lines you will have to understand.
#_-_-_-_-_-_SOME DIRECTORIES_-_-_-_-_-_
set(OCR_DIR D:/prog/ocr)
set(MINGW_DIR D:/Programs/Qt/Qt5.1.0/Tools/mingw48_32/i686-w64-mingw32)
set(MINGW_LIB_DIR ${MINGW_DIR}/lib)
set(LEPTONICA_DIR ${OCR_DIR}/leptonica-1.69)
- OCR_DIR : base directory for my OCR tools.
- MINGW_DIR : parent directory of the MinGW lib directory (C:\MinGW if you don't use Qt).
- MINGW_LIB_DIR : this one is needed to link against winsock2 library.
- LEPTONICA_DIR : Leptonica extraction directory.
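These paths are specific to my machine. As a sketch of one possible variation (not what the downloadable CMakeLists.txt does), the two root directories can be declared as cache variables so they can be overridden from the command line, and the directories can be checked before going any further. The default values are the ones shown above; the description strings are mine.

```cmake
# Same directories as above, but overridable from the command line, e.g.:
#   cmake -G "Unix Makefiles" -DOCR_DIR=C:/ocr -DMINGW_DIR=C:/MinGW .
set(OCR_DIR D:/prog/ocr CACHE PATH "Base directory for the OCR tools")
set(MINGW_DIR D:/Programs/Qt/Qt5.1.0/Tools/mingw48_32/i686-w64-mingw32
    CACHE PATH "Parent directory of the MinGW lib directory")

# Derived directories stay ordinary variables.
set(MINGW_LIB_DIR ${MINGW_DIR}/lib)
set(LEPTONICA_DIR ${OCR_DIR}/leptonica-1.69)

# Stop early with a readable message if one of the directories is wrong.
foreach(dir ${OCR_DIR} ${MINGW_LIB_DIR} ${LEPTONICA_DIR})
  if(NOT IS_DIRECTORY "${dir}")
    message(FATAL_ERROR "Directory not found: ${dir}")
  endif()
endforeach()
```

A value passed with -D takes precedence over the default given to set(... CACHE ...). Since the batch files delete CMakeCache.txt before each run, the defaults apply again whenever no -D option is given.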
set(CMAKE_BINARY_DIR ../${PROJECT_NAME}_output)
set(WINDLL_NAME \"lib${TARGET_LIB_TESSERACT}.dll\")
add_definitions(-D_tagBLOB_DEFINED -D__BLOB_T_DEFINED -DUSE_STD_NAMESPACE -DWINDLLNAME=${WINDLL_NAME})
- _tagBLOB_DEFINED : to avoid conflicting declarations between wtypes.h (MinGW) and platform.h (tesseract) if you work with Qt.
- __BLOB_T_DEFINED : same as above if your MinGW installation is not part of Qt.
- WINDLLNAME : used by ccutil files.
- USE_STD_NAMESPACE : I have not looked into its exact purpose, but it must be defined.
#_-_-_-_-_-_LINKING_-_-_-_-_-_
set(CMAKE_FIND_LIBRARY_SUFFIXES .a ${CMAKE_FIND_LIBRARY_SUFFIXES})
find_library(LEPTONICA_LIB NAMES lept lept-3 liblept liblept-3 PATHS ${LEPTONICA_DIR}/bin)
find_library(WS2_32_LIB NAMES libws2_32.a PATHS ${MINGW_LIB_DIR} NO_DEFAULT_PATH NO_SYSTEM_ENVIRONMENT_PATH)
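When find_library() fails, it does not stop the configuration: it stores a value ending in -NOTFOUND in the variable, and the problem only shows up later. The fragment below is a sketch of a check that could be placed right after the two calls above. It is my addition, not necessarily part of the downloadable CMakeLists.txt, and the loop variable name lib_var is arbitrary.

```cmake
# To be placed right after the find_library() calls above.
foreach(lib_var LEPTONICA_LIB WS2_32_LIB)
  # if() treats a value ending in -NOTFOUND as false.
  if(NOT ${lib_var})
    message(FATAL_ERROR "${lib_var} was not found; check the PATHS given to find_library()")
  endif()
  # Print the full path that was found.
  message(STATUS "${lib_var} = ${${lib_var}}")
endforeach()
```

find_library() stores its result in the cache, so after fixing a path the search is only redone once CMakeCache.txt has been removed, which the batch files above do on every run.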
Final steps
- Copy our CMakeLists.txt into the tesseract-ocr source code directory, next to configure, eurotext.tif, etc.
- Copy env.bat and cmake.bat into the tesseract-ocr parent directory.
- Launch env.bat.
- Enter the tesseract dir:
cd tesseract-ocr
- Launch CMake:
..\cmake.bat
or
..\cmake_noqt.bat
- Build:
mingw32-make
- Wait a few minutes...
You should end up with a tesseract_output directory containing:
- libtesseract3.02.02.dll
- svpaint.exe
- tesseract.exe
Batch files and CMakeLists.txt can be downloaded from my repository:
https://github.com/broija/tesseract_ocr_mingw
Thanks for your tutorial, but I have some problems when I compile it.
1- C:\Qt\Qt5.4.2\Tools\mingw491_32\i686-w64-mingw32\include\wtypesbase.h:385: erreur : conflicting declaration 'typedef struct tagBLOB BLOB'
} BLOB;
^
2-C:\Qt\Qt5.4.2\Tools\mingw491_32\i686-w64-mingw32\include\wtypesbase.h:386: erreur : conflicting declaration 'typedef struct tagBLOB* LPBLOB'
typedef struct tagBLOB *LPBLOB;
and a lot of warnings:
C:\Qt\Qt5.4.2\Tools\mingw491_32\i686-w64-mingw32\include\combaseapi.h:153: In file included from C:/Qt/Qt5.4.2/Tools/mingw491_32/i686-w64-mingw32/include/combaseapi.h:153:0,
C:\Qt\Qt5.4.2\Tools\mingw491_32\i686-w64-mingw32\include\objbase.h:14: from C:/Qt/Qt5.4.2/Tools/mingw491_32/i686-w64-mingw32/include/objbase.h:14,
.......
Hi,
Sorry for the late answer. This tutorial was written using Qt 5.1 and MinGW 4.8.
The "_tagBLOB_DEFINED" definition was intended to prevent exactly this problem: it avoids conflicting declarations between wtypes.h (MinGW) and platform.h (tesseract) if you work with Qt.
I can't remember if I faced any warning during compilation. Since I changed my computer, I can't rebuild the old environment.
Regards,
I need help compiling Tesseract on Windows 7 with the MinGW compiler, using the CMake GUI.
Hi,
This post is 20 months old now. Unless you're trying to use the very same versions used in this post, in which case I'd be glad to help you, I'm sorry to say that I don't have enough time to dig into different versions.
Could you please be more specific?
Hi,
I followed your commands, and everything seems to go smoothly, without errors or warnings. But after the mingw32-make command, there is no output.
In CMakeLists.txt, I tried to fix the binary output directory, but there is still no output.
Could you please help me?
Thanks.
JackNguyen
Hello,
If it is not too late, could you please detail exactly what you've used (which Windows OS, CMake version, etc.)? I'm not sure the versions used in this post are still easy to find.
I'm still using Win7 by the way, so I won't be of any help if you're working on a more recent one.
Regards,