ONLINE HELP
WINDEV, WEBDEV AND WINDEV MOBILE

PDFToText (Function)
Extracts text from a PDF file.
New in version 2025
Android, Android Widget: The syntax for manipulating pdfDocument variables is now available.
Example
// Extract the text of all pages from a PDF file identified by its path
MaChaîne is string
MaChaîne = PDFToText("C:\Temp\MonDocument.pdf")
// Display in a multiline edit control
SAI_SaisieTexteMulti = MaChaîne

// Extract the text of pages 1 to 2 from a pdfDocument variable
MonPDF is pdfDocument = PDFOpen("test.pdf")
MaChaîne = PDFToText(MonPDF, "1-2")
// Display in a multiline edit control
SAI_SaisieTexteMulti = MaChaîne
Syntax

Extracting the content of a PDF using the file path

<Result> = PDFToText(<PDF file> [, <Pages to extract> [, <Password> [, <Options>]]])
<Result>: Character string
Text of the PDF file.
<PDF file>: Character string
Name and path of the PDF file to be analyzed.
<Pages to extract>: Optional character string
Range of pages from which the text will be extracted. The format is the same as in standard print dialog boxes: individual page numbers or page ranges separated by semicolons. For example, "1;3;4;6-10;12" means that the text of pages 1, 3, 4, 6 to 10, and 12 will be extracted.
If this parameter is not specified or is an empty string (""), all pages are extracted.
<Password>: Optional string or Secret string
Password required to open the file if the PDF file is password protected.
New in version 2025
Secret strings: If you use the secret string vault, the type of secret string used for this parameter must be "ANSI or Unicode string".
To learn more about secret strings and how to use the vault, see Secret string vault.
<Options>: Integer constant
Text splitting mode:
  • pttCompatibleSplit: Splits the PDF text using the algorithm from versions 24 and earlier.
  • pttDefault (default value): Splits the PDF text using an optimized algorithm. The result may differ from the one produced by previous versions.
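
As a minimal sketch of this syntax, the call below extracts a few pages from a password-protected file using the compatibility splitting mode. The file path, the page range, the password and the variable names are hypothetical placeholders. Only PDFToText and the pttCompatibleSplit constant come from this page.

```wlanguage
// Hypothetical file path and password, for illustration only
sPDFFile is string = "C:\Temp\MonDocument.pdf"
sPassword is string = "MyPassword"
sText is string

// Extract pages 1, 3 and 6 to 10 with the algorithm from versions 24 and earlier
sText = PDFToText(sPDFFile, "1;3;6-10", sPassword, pttCompatibleSplit)

// An empty result is treated here as "nothing to display" (assumption)
IF sText = "" THEN
	Info("No text was extracted from the selected pages.")
ELSE
	Trace(sText)
END
```

Omitting the last parameter is equivalent to passing pttDefault, which selects the optimized algorithm.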

Extracting the content of a PDF document present in a pdfDocument variable

<Result> = PDFToText(<PDF document> [, <Pages to extract>])
<Result>: Character string
Text of the PDF file.
<PDF document>: pdfDocument variable
Name of the pdfDocument variable to be used.
<Pages to extract>: Optional character string
Range of pages from which the text will be extracted. The format is the same as in standard print dialog boxes: individual page numbers or page ranges separated by semicolons. For example, "1;3;4;6-10;12" means that the text of pages 1, 3, 4, 6 to 10, and 12 will be extracted.
If this parameter is not specified or is an empty string (""), all pages are extracted.
Remarks

Converting PDF to text

  • When converting a PDF to text, the document formatting is lost.
  • Text is extracted in the order in which the PDF commands appear and is written sequentially in the resulting string. Text blocks and paragraphs are preserved (as well as carriage returns).
  • Unicode characters are not returned.
  • Data from a PDF form is not extracted (this data is not stored in the PDF file).
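
Because the formatting is lost but the carriage returns are kept, the result can be processed one line at a time. The sketch below illustrates this, assuming that the line breaks in the returned string can be split with the CR constant. The file path and the variable names are hypothetical.

```wlanguage
// Hypothetical file path, for illustration only
sText is string = PDFToText("C:\Temp\MonDocument.pdf")
nLine is int = 0

// Go through the extracted text one line at a time
// (assumption: lines in the result are separated by CR)
FOR EACH STRING sLine OF sText SEPARATED BY CR
	nLine++
	// NoSpace removes the leading and trailing spaces of the line
	Trace(NumToString(nLine) + ": " + NoSpace(sLine))
END
```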

Special cases

  • PDFIsProtected indicates whether a password is required to open a PDF file.
  • PDFNumberOfPages returns the total number of pages in a PDF file.
  • Android: Starting with version 28, this function is supported on 32-bit ARM processors only if the pttCompatibleSplit constant is used. The new PDF features require a 64-bit execution mode.
    If an application must run on devices with 32-bit ARM processors, it must be generated with WINDEV Mobile 27.
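
The two companion functions mentioned above can be combined with PDFToText, for example to extract a document page by page. This is only a sketch: it assumes that PDFNumberOfPages accepts the password as its second parameter and that an empty password is ignored for an unprotected file. The file path, the password and the variable names are hypothetical.

```wlanguage
// Hypothetical file path, for illustration only
sPDFFile is string = "C:\Temp\MonDocument.pdf"
sPassword is string = ""
sPageText is string
nPage is int

// Supply a password only when the file requires one
IF PDFIsProtected(sPDFFile) THEN
	sPassword = "MyPassword"	// hypothetical: would normally be entered by the user
END

// Extract the text of each page separately
nPageCount is int = PDFNumberOfPages(sPDFFile, sPassword)
FOR nPage = 1 TO nPageCount
	sPageText = PDFToText(sPDFFile, NumToString(nPage), sPassword)
	Trace("Page " + NumToString(nPage) + ": " + NumToString(Length(sPageText)) + " characters")
END
```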
Business / UI classification: Business Logic
Component: wd300wdpdf.dll
Minimum version required
  • Version 14
Last update: 05/16/2025
