Search and full-text index

Overview
How to perform a "full-text" search?
How to create a full-text index?
Remarks
Defining the stop words
Defining the synonyms
How to create a query used to perform a full-text search?
Full-text search: Query editor
Full-text search: Query in SQL code
Syntax of the search value
Specific case of words separated by a dash
Analyzing the result of a full-text query
Managing the full-text indexes programmatically

Overview

HFSQL provides a fast search for character strings in the data. This search (called "full-text") is used to find a word or a set of words.

This allows you to index, without programming, the texts found in an HFSQL database. These texts can be stored in "Text" items or in "Text memo" items.

An index can cover one or more items. Index creation can take formatted texts (RTF, HTML) into account: the tags of these formats are ignored during indexing. This allows you to find words stored in RTF or HTML documents.

The results of a full-text search are returned in order of relevance ("ranking").

Caution: the "full-text" search is not a simple "contains" search: punctuation is not taken into account.
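
The difference from a "contains" search can be pictured with a short Python sketch. It is only an illustration: the helper names are hypothetical, and treating every non-alphanumeric character as a word separator is an assumption, not a description of the HFSQL engine.

```python
import re

def words(text):
    """Split text into lowercase words, dropping punctuation (assumed rule)."""
    return re.findall(r"\w+", text.lower())

def contains(text, value):
    """Plain substring test, as a 'contains' search would do."""
    return value.lower() in text.lower()

def word_match(text, value):
    """Word-based test: every word of the value must be a word of the text."""
    indexed = set(words(text))
    return all(w in indexed for w in words(value))

sample = "Hello, world! Full-text search."
# The comma breaks the substring test but not the word-based test.
print(contains(sample, "hello world"), word_match(sample, "hello world"))
# A fragment of a word passes the substring test but is not a word.
print(contains(sample, "wor"), word_match(sample, "wor"))
```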

How to perform a "full-text" search?

To perform a "full-text" search, you must:

Create a full-text index in the data files affected by this search.
Create a query to perform the full-text search.
Analyze and display the result of the query.

How to create a full-text index?

A full-text index is created in the data model editor.

To create a full-text index:

In the data model editor, display the description of the items of the relevant data file ("Description of items" in the context menu).
Click the button for creating a full-text index. The window for defining the full-text index is displayed. Select the items that make up the full-text index. Only the "Text" items and the "Text memo" items found in the file description are proposed. Validate.
The "Full-text index" item is displayed in the list of items.
Select the item and modify its characteristics if necessary:
- in the list of items, you can rename the full-text index.
- in the right section of the window, select the parameters of the full-text index:
  - Minimum size of indexed words (option "Do not index words whose size is less than or equal to"): only words longer than this size are indexed. This size is set to 2 characters by default.
  - RTF tag management: the "My text may contain RTF (do not index the RTF tags)" option specifies whether the text may contain RTF tags. If it is checked, the RTF tags are ignored when indexing.
  - HTML tag management: the "My text may contain HTML (don't index the HTML tags)" option specifies whether the text may contain HTML. If it is checked, the HTML tags are ignored when indexing.
  - Unicode punctuation management: this option specifies whether the text contains Unicode-specific punctuation characters (curved quotation marks, etc.).
  - Case management: if the "Case sensitive" option is checked, the case of words (uppercase/lowercase) is taken into account during indexing.
  - Accent management: if the "Accent sensitive" option is checked, accented characters are taken into account during indexing. This option is required if stemming is requested.
  - Stop word management: allows you to define stop words, which are ignored by the indexing. Stop words are most often articles, linking words, etc. The "Edit stop words" button is used to define them. For more details, see Defining the stop words.
  - Synonym management: allows you to define which words are to be considered synonyms. For more details, see Defining the synonyms.
  - Stemming management: if synonym management is not enabled, stemming can be enabled. Stemming is used to perform searches on a full-text index while taking the variations of a given word into account. This feature is useful for knowledge bases: for example, it finds words in both singular and plural form. Stemming depends on the language used.
    If this option is enabled, select the language used.
Validate the description window of items.
Save and generate the analysis. When the analysis is generated, the description of the data files is modified and the full-text index is created (file with a ".FTX" extension).
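
To make the effect of these parameters more concrete, here is a minimal Python sketch of which words might end up in an index. It is a simplified model, not the HFSQL indexing code: the `indexed_words` function, its parameter names and the tokenization rule are all assumptions made for the illustration.

```python
import re
import unicodedata

def strip_accents(word):
    """Remove accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def indexed_words(text, min_size=2, case_sensitive=False,
                  accent_sensitive=False, stop_words=()):
    """Return the words a full-text index could keep (illustrative model)."""
    result = []
    for word in re.findall(r"\w+", text):
        if not case_sensitive:
            word = word.lower()
        if not accent_sensitive:
            word = strip_accents(word)
        # Words whose size is less than or equal to min_size are not indexed.
        if len(word) <= min_size:
            continue
        if word in stop_words:
            continue
        result.append(word)
    return result

print(indexed_words("The café is in Montpellier", stop_words={"the"}))
print(indexed_words("Le café", accent_sensitive=True))
```

With the default minimum size of 2, "is", "in" and "Le" are dropped because of their length, while "the" is dropped only because it is listed as a stop word.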

Remarks

To create a full-text index directly from a text item or a text memo item, select the item in the description window of the items and click the "Create a full-text index on the item" link.
Full-text indexes can also be managed in data files described programmatically: use a variable of type FullTextIndex Description to describe the index, then call HDescribeFullTextIndex to validate the index creation.
You can modify the composition of an existing full-text index at any time: click the "Edit index" button on the left-hand side of the item description in the data model editor.
Creating a full-text index item requires generating the analysis and performing an automatic data modification.
Stemming: due to certain grammatical irregularities (third-group verbs in French, irregular verbs in English, etc.), some conjugations may not be grouped together (e.g. a search for "peux" will not return records containing "pouvons") and some plurals may not be grouped with their singular (e.g. "feux"/"feu").

Defining the stop words

If the "Stop words" option is selected, the stop words are defined as follows:

Click "Edit stop words".
The list of stop words used by default is displayed.
You can:
- Add stop words: click the "Add" button. In the window that is displayed, enter the list of stop words to add. These words can be separated by spaces, tabs, carriage returns or the ",", ";" and ":" characters.
- Delete one or more stop words: select the stop word(s) to delete from the list and click the "Delete" button.
- Import stop words: click the "Import" button and select the text file containing the stop words. These words can be separated by spaces, tabs, carriage returns or the ",", ";" and ":" characters.
- Paste stop words: copy the words to the clipboard (Ctrl + C), separated by spaces, tabs, carriage returns or the ",", ";" and ":" characters, then click the "Paste" button.
Validate.
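
The separator rule for added, imported or pasted stop words can be sketched in Python as follows. This is a hypothetical helper written for illustration; it only mirrors the list of separators given above and is not the editor's actual parser.

```python
import re

def parse_stop_words(raw):
    """Split on spaces, tabs, carriage returns, ',', ';' and ':'."""
    return [w for w in re.split(r"[\s,;:]+", raw) if w]

print(parse_stop_words("the, a; an:of\tto\r\nand or"))
```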

Defining the synonyms

If the "Synonyms" option is selected, the synonyms are defined as follows:

Click "Edit synonyms".
In the window that is displayed, enter the synonyms to take into account, one group of synonyms per line. For example:
house chalet villa condominium townhouse
Validate.
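
One way to picture such a synonym list is as a lookup table in which every word of a line points to the whole line. The Python sketch below assumes that the words of a group are separated by spaces, as in the example above; the `parse_synonyms` helper and the second sample line are hypothetical and do not describe how HFSQL stores synonyms.

```python
def parse_synonyms(raw):
    """Map each word to the set of words found on the same line."""
    groups = {}
    for line in raw.splitlines():
        line_words = line.split()
        for word in line_words:
            groups[word] = set(line_words)
    return groups

table = parse_synonyms(
    "house chalet villa condominium townhouse\n"
    "car automobile"
)
# Searching for "villa" would then also consider the other words of its line.
print(sorted(table["villa"]))
```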

How to create a query used to perform a full-text search?

Full-text searches are performed via queries, created either in the query editor or in SQL code. The search condition is entered when creating the query.

Full-text search: Query editor

To create a query that performs a full-text search, simply add a "Full-text index" item to the query. The description window of a full-text search is automatically displayed.

This window allows you to specify:

The search value. This value can be entered directly (a wizard helps you build the search string) or it can correspond to a parameter. For more details, see "Syntax of the search value".
The sort options of the result.

When this window is validated, the "full-text" item is automatically added to the query elements. This item corresponds to the relevance of the search result. A condition is automatically defined: only records whose relevance is greater than 0 are displayed in the query result.

Full-text search: Query in SQL code

To create a query that performs a full-text search in SQL code, use the following syntax:

MATCH (<Index>) AGAINST <Condition>

where:

<Index> corresponds to the list of items found in the index separated by commas (the order of items is not important).
<Condition> corresponds to the sought string. For more details, see "Syntax of the search value".

Example: in this example, SAI_Rechercher is an edit control and IDUserConnecté is a variable.

// Data source that will receive the query result
REQ_Rech is Data Source

// SQL code: MATCH lists the items of the full-text index,
// AGAINST receives the value typed in the SAI_Rechercher edit control
MaRequête is string = [
	SELECT * FROM Contacts
	WHERE MATCH(Contacts.Nom, Contacts.Prenom, Contacts.CommentaireHTML, 
	Contacts.CommentaireTexteBrut, Contacts.Commentaires, 
	Contacts.Telephone, Contacts.Bureau, 
	Contacts.Portable, Contacts.Mail, Contacts.MSN, Contacts.Site_internet, 
	Contacts.Pays, Contacts.NumFax, Contacts.Ville)
	AGAINST ('
]
MaRequête = MaRequête + SAI_Rechercher + [
')  
AND Contacts.IDUtilisateur =  
]
MaRequête = MaRequête + IDUserConnecté + [
ORDER BY Nom DESC
]

// Run the query and add each record found to the Table control
HExecuteSQLQuery(REQ_Rech, hQueryDefault, MaRequête)
FOR EACH REQ_Rech
	TableAddLine(TABLE_Contact_par_catégorie, ...
		REQ_Rech.IDContact, REQ_Rech.IDCategorie, IDUserConnecté, ...
		REQ_Rech.Nom, REQ_Rech.Prenom) 
END
// Error handling: display the HFSQL error details
CASE ERROR: 
	Error(HErrorInfo())
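
Because the example above concatenates the content of the edit control directly into the SQL text, a search value containing an apostrophe would end the quoted string too early. The Python sketch below shows one way to assemble the same kind of MATCH ... AGAINST clause while escaping quotes. The `build_fulltext_query` function is hypothetical, and doubling single quotes is the standard SQL convention, assumed here to apply; check it against your HFSQL version.

```python
def build_fulltext_query(table, items, search_value):
    """Build a SELECT with a MATCH ... AGAINST condition (illustrative only)."""
    # The items of the full-text index, separated by commas.
    columns = ", ".join(f"{table}.{item}" for item in items)
    # Assumption: single quotes are escaped by doubling them.
    escaped = search_value.replace("'", "''")
    return (f"SELECT * FROM {table} "
            f"WHERE MATCH({columns}) AGAINST ('{escaped}')")

print(build_fulltext_query("Contacts", ["Nom", "Prenom"], "O'Brien"))
```

Where the search value is passed as a query parameter, as in the {ParamFullText} form, this manual escaping is not needed.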

Note: Query with a parameter on a full-text index: how to ignore the parameter?
To ignore the parameter, the MATCH statement must appear in the WHERE clause of the query and not in the query result. Indeed, if the relevance is included in the result, the parameter must be specified in order to compute it.

For a query created with the query editor to have the MATCH statement in the WHERE clause, the relevance must not be displayed in the result.

Example with relevance:

SELECT
MATCH(XX, YY, ZZ) AGAINST({ParamFullText}) AS PertinenceFullText
FROM
TABLE
WHERE
<Parameters>
AND
PertinenceFullText > 0
ORDER BY
PertinenceFullText DESC

Example without relevance:

SELECT *
FROM
TABLE
WHERE
<Parameters>
AND MATCH(XXX, YYY, ZZZ) AGAINST({ParamFullText}) > 0

Syntax of the search value

The search value can contain the following elements:


Element	Meaning
A single word	The specified word will be sought. The relevance will be increased if the text contains this word. Example: "WINDEV" searches for "WINDEV".
Two words separated by a space character	Searches for one of the words. Example: "WINDEV WEBDEV" searches for the texts containing either "WINDEV" or "WEBDEV".
A word preceded by the "+" sign	The specified word is mandatory. Example: "+WINDEV" searches for the texts that necessarily contain "WINDEV".
A word preceded by the "-" sign	The text must not contain the specified word. Example: "-Index" searches for the texts that do not contain "Index".
A word preceded by the "~" sign	If the text contains the specified word, the relevance will be reduced.
One or more words enclosed in quotes	The specified words are searched as a group and in order. Warning: if the minimum size of indexed words is set to a value other than 0, quoted words that are not longer than this size will not be searched.
A word followed by the "*" sign	Performs a "starts with" search on the specified word.

Warning: a full-text search returns no result if the parameter passed is empty or null.
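
The elements of the table can be combined when assembling a search value from user input. The following Python sketch is a hypothetical helper: its name, its parameters and the order in which it emits the elements are assumptions, and only the operators themselves come from the table above. It refuses to build an empty value, since an empty search returns nothing.

```python
def build_search_value(optional=(), required=(), excluded=(),
                       downranked=(), phrases=(), prefixes=()):
    """Assemble a full-text search value from lists of words."""
    parts = []
    parts += [f"+{w}" for w in required]      # word is mandatory
    parts += [f"-{w}" for w in excluded]      # word must not appear
    parts += [f"~{w}" for w in downranked]    # word reduces the relevance
    parts += [f'"{p}"' for p in phrases]      # words as a group, in order
    parts += [f"{w}*" for w in prefixes]      # "starts with" search
    parts += list(optional)                   # plain words: any of them
    value = " ".join(parts)
    if not value:
        raise ValueError("the search value must not be empty")
    return value

print(build_search_value(required=["WINDEV"], excluded=["Index"],
                         optional=["WEBDEV"]))
```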

Specific case of words separated by a dash

If the text contains words separated by a dash ('-'), each section of the word is indexed independently in a full-text index.

Example:

The text contains 'multi-platform': the two words "multi" and "platform" will be indexed independently.
During the search, the documents containing "multi" and/or "platform" will be found.
To search for "multi-platform", you need to enclose the word in quotation marks: "multi-platform".
Warning: the search will return all documents containing the words "multi" and "platform" consecutively. The separator between the two words is not necessarily a hyphen: it can be any separator (space, +, period, comma, carriage return, tab, etc.).
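
This behavior can be modeled in a few lines of Python. The sketch below is an illustration under the assumption that any non-alphanumeric character acts as a separator; the function names are hypothetical and the code does not reproduce the HFSQL engine.

```python
import re

def split_words(text):
    """Index each part of a hyphenated word independently (assumed rule)."""
    return re.findall(r"\w+", text.lower())

def phrase_match(text, phrase):
    """True if the words of the phrase appear consecutively and in order."""
    doc = split_words(text)
    wanted = split_words(phrase)
    n = len(wanted)
    return n > 0 and any(doc[i:i + n] == wanted
                         for i in range(len(doc) - n + 1))

for sample in ("a multi-platform tool", "multi platform",
               "multi+platform", "platform multi"):
    print(sample, "->", phrase_match(sample, "multi-platform"))
```

The first three samples match because the separator is ignored; the last one does not because the words are not in the right order.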

Analyzing the result of a full-text query

The result of a full-text query gives, for each record found in the data file, the relevance of the record in relation to the search value.

This relevance depends on several factors:

the number of times the sought word is found in the record.
the number of words in the record and their number of repetitions.
the ratio between the records that contain the sought words and the records that do not. Indeed, the more records contain the sought word, the lower the relevance.
...

The result of a full-text query can then be processed like any other query result: for example, the result can be displayed in a table, sorted by relevance, etc.
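
The exact relevance formula is not given here. As a rough illustration of how the factors listed above can interact, the Python sketch below uses a classic TF-IDF style score: the share of the record made up of the sought word, weighted by how rare the word is across records. This is a stand-in technique chosen for the example, not the HFSQL formula, and the sample records are invented.

```python
import math
import re

def relevance(records, word):
    """Score each record for one word with a TF-IDF style formula."""
    word = word.lower()
    tokenized = [re.findall(r"\w+", r.lower()) for r in records]
    containing = sum(1 for tokens in tokenized if word in tokens)
    if containing == 0:
        return [0.0] * len(records)
    # The more records contain the word, the lower this weight.
    rarity = math.log(1 + len(records) / containing)
    scores = []
    for tokens in tokenized:
        # Occurrences of the word relative to the size of the record.
        frequency = tokens.count(word) / len(tokens) if tokens else 0.0
        scores.append(frequency * rarity)
    return scores

records = [
    "WINDEV and WINDEV again",
    "WINDEV is one word among many other words here",
    "nothing relevant",
]
scores = relevance(records, "windev")
# Keep only records with a relevance greater than 0, best match first.
ranked = sorted(((s, r) for s, r in zip(scores, records) if s > 0),
                reverse=True)
for score, record in ranked:
    print(round(score, 3), record)
```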

Managing the full-text indexes programmatically

Several WLanguage functions are used to manage the full-text indexes:


HDescribeFullTextIndex	Describes a full-text index of a data file created programmatically.
HListFullTextIndex	Returns the list of full-text indexes of a file (a query or a view) recognized by the HFSQL engine.
HListStopWord	Returns the list of stop words used by a full-text index.
HListSynonym	Returns the list of synonyms used by a full-text index.

Remarks:

Creating a full-text index in a data file created through programming can be done via a FullTextIndex Description variable.
HIndex is used to re-index full-text indexes.