PDFToText (Function)
In French: PDFVersTexte
Extracts text from a PDF file.
New in version 2025
// Extract the text of an entire PDF file identified by its path
MaChaîne is string
MaChaîne = PDFToText("C:\Temp\MonDocument.pdf")
SAI_SaisieTexteMulti = MaChaîne

// Extract the text of pages 1 to 2 from a pdfDocument variable
MonPDF is pdfDocument = PDFOpen("test.pdf")
MaChaîne = PDFToText(MonPDF, "1-2")
SAI_SaisieTexteMulti = MaChaîne
Syntax
Extracting the content of a PDF using the file path
<Result> = PDFToText(<PDF file> [, <Pages to extract> [, <Password> [, <Options>]]])
<Result>: Character string
Text of the PDF file.
<PDF file>: Character string
Name and path of the PDF file to be analyzed.
<Pages to extract>: Optional character string
Range of pages that the text will be extracted from. The format is the same as in standard print dialog boxes: individual page numbers or page ranges separated by semicolons. For example, "1;3;4;6-10;12" means that the text of pages 1, 3, 4, 6 to 10, and 12 will be extracted. If this parameter is not specified or is an empty string (""), all pages are extracted.
<Password>: Optional string or Secret string
Password required to open the file if the PDF file is password protected.
New in version 2025
Secret strings: If you use the secret string vault, the type of secret string used for this parameter must be "ANSI or Unicode string". To learn more about secret strings and how to use the vault, see Secret string vault.
<Options>: Integer constant
Text splitting mode:
- pttCompatible: Split PDF text using the algorithm from versions 24 and earlier.
- pttDefault (default value): Split PDF text using an optimized algorithm. The result of this splitting may differ from previous versions.
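The following is a minimal sketch of the first syntax with all four parameters supplied. The file path and the password are hypothetical placeholders, not values from this page; in real code the password would normally come from user input or the secret string vault rather than a literal.

```wlanguage
// Sketch only: the path and password below are hypothetical placeholders
sText is string

// Extract pages 1, 3 and 6 to 10 of a password-protected PDF,
// using the splitting algorithm from versions 24 and earlier
sText = PDFToText("C:\Temp\Protected.pdf", "1;3;6-10", "MyPassword", pttCompatible)
```

Passing pttDefault instead, or omitting the last parameter, selects the optimized splitting algorithm.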
Extracting the content of a PDF document present in a pdfDocument variable
<Result> = PDFToText(<PDF document> [, <Pages to extract>])
<Result>: Character string
Text of the PDF file.
<PDF document>: pdfDocument variable
Name of the pdfDocument variable to be used.
<Pages to extract>: Optional character string
Range of pages that the text will be extracted from. The format is the same as in standard print dialog boxes: individual page numbers or page ranges separated by semicolons. For example, "1;3;4;6-10;12" means that the text of pages 1, 3, 4, 6 to 10, and 12 will be extracted. If this parameter is not specified or is an empty string (""), all pages are extracted.
Remarks
Converting PDF to text
- When converting a PDF to text, the document formatting is lost.
- Text is extracted in the order in which the PDF commands appear and is written sequentially in the resulting string. Text blocks and paragraphs are preserved (as well as carriage returns).
- Unicode characters are not returned.
- Data from a PDF form is not extracted (this data is not stored in the PDF file).
Special cases
- PDFIsProtected indicates whether a password is required to open a PDF file.
- PDFNumberOfPages returns the total number of pages in a PDF file.
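The two functions above can be combined with PDFToText, for example to extract only the last page of an unprotected file. This is a sketch under stated assumptions: the file path is a hypothetical placeholder, PDFIsProtected is assumed to return a boolean, and PDFNumberOfPages is assumed to accept the file path directly, as their descriptions suggest. Check their own reference pages for the exact syntax.

```wlanguage
// Sketch only: the file path is a hypothetical placeholder
sFile is string = "C:\Temp\MonDocument.pdf"
sText is string
nPages is int

IF PDFIsProtected(sFile) THEN
	// A password is required: supply it through the <Password> parameter of PDFToText
	Info("This PDF file requires a password.")
ELSE
	// Assumption: PDFNumberOfPages takes the file path and returns the page count
	nPages = PDFNumberOfPages(sFile)
	// The page range is a string, so the page number is converted first
	sText = PDFToText(sFile, NumToString(nPages))
END
```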
Starting with version 28, this function is supported on 32-bit ARM processors only if the pttCompatible constant is used. New PDF features require a 64-bit execution mode. If an application is to be run on devices with 32-bit ARM processors, it must be generated with WINDEV Mobile 27.
Business / UI classification: Business Logic
Component: wd300wdpdf.dll