- Overview
- How to perform a "full-text" search?
- How to create a full-text index?
- Remarks
- Defining the stop words
- Defining the synonyms
- How to create a query used to perform a full-text search?
- Full-text search: Query editor
- Full-text search: Query in SQL code
- Syntax of the search value
- Specific case of words separated by a dash
- Analyzing the result of a full-text query
- Managing the full-text indexes programmatically
Search and full-text index
Available only with these kinds of connection
HFSQL proposes a quick search for character strings in the data. This search (called "full-text") is used to find a word or a set of words. This allows you to index, without programming, the texts found in an HFSQL database. These texts can be found in "Text" items or in "Text memo" items. An index can index one or more different items. The creation of the index can take the formatted texts into account (RTF, HTML): the tags in these formats will be ignored when indexing. This allows you to find words stored in RTF or HTML documents. The results of the full-text indexing are proposed according to a relevance order ("ranking"). Caution: the "Full-text" search is not a simple search of "contains" type: the punctuation is not taken into account. How to perform a "full-text" search? To perform a "full-text" search, you must: - Create a full-text index in the data files affected by this search.
- Create a query to perform the full-text search.
- Study and display the result of the query.
How to create a full-text index? Creating a full-text index is performed in the data model editor. To create a full-text index: - In the data model editor, display the description of the items found in the relevant file ("Description of items" from the popup menu).
- Click . The window for defining the full-text index is displayed. Select the items taking part in the composition of the full-text index. Only the "Text" items and the "Text memo" items found in the file description are proposed. Validate.
- The "Full-text index" item is displayed in the list of items.
- Select the item and modify its characteristics if necessary:
- in the list of items, you have the ability to modify the name of the full-text index.
- in the right section of the window, select the parameters of the full-text index:
- minimum size of indexed words (option "Do not index words whose size is less than or equal to"): only words longer than this size will be indexed. This size is set to 2 characters by default.
- support for RTF tags: the option "My text may contain RTF
(do not index the RTF tags)" allows you to specify if the text contains text in RTF. In this case, the RTF tags are ignored when indexing.
- support for HTML tags: the option "My text may contain HTML
(don't index the HTML tags)" allows you to specify if the text contains text in HTML format. In this case, the HTML tags are ignored when indexing.
- Manage UNICODE punctuation: this option is used to specify whether the text contains punctuation characters specific to Unicode (curved quotes, etc.).
- sensitivity to the case: If the option "Case sensitive" is checked, the case (upper/lower case) will be taken into account when indexing.
- sensitivity to accented characters: If the option "Accent sensitive" is checked, accented characters will be taken into account when indexing. This option is required if the stemming is requested.
- management of stop words: The management of stop words allows you to define the stop words that will be ignored during the index operation. In most cases, the stop words are the articles, the connecting words, ... The button "Edit stop words" is used to define the stop words that must be taken into account. For more details, see Defining the stop words.
- management of synonyms: The management of synonyms is used to define the words that will be considered as synonyms. For more details, see Defining the synonyms.
- management of stemming: If the management of synonyms is not enabled, you have the ability to enable the stemming. The stemming is used to perform searches on a full-text index while taking the variations of a given word into account. This feature is interesting for the knowledge databases and it allows you to find the words in singular or plural form. The stemming depends on the language used.
If this option is enabled, select the language used.
- Validate the description window of items.
- Save and generate the analysis. When generating the analysis, the description of the data files is modified and the full-text index is created (file with a ".FTX" extension).
Remarks - To directly create a full-text index from a text item or from a text memo item, select the item in the description window of the items and click the link "Create a full-text index on the item".
- Full-text indexes can be managed in data files described programmatically: simply use a variable of type FullTextIndex Description to describe the index, then HDescribeFullTextIndex to validate the new index.
- The composition of an existing full-text index can be modified at any time: to do so, click the "Edit the index" button found in the left section of the item description in the data model editor.
- To create a Full-text Index item, a generation of the analysis and an automatic data modification must necessarily be performed.
- Stemming: Because of grammatical exceptions (irregular verbs in English, verbs of third group in French, etc.), some conjugations may not be grouped together (for example, a search for "fly" will not found the records containing "flown") or some plural forms may not be grouped with their singular form (foot/feet for example).
Defining the stop words If the option "Stop words" is selected, simply click "Edit stop words" to define the stop words. - Click "Edit stop words".
- A list of stop words used by default is displayed.
- You can:
- Add stop words: click the "Add" button. In the window that is displayed, enter the list of stop words to add. These words can be separated by spaces, tabulations, carriage returns or the ",", ";" and ":" characters.
- Delete one or more stop words: select the stop words to delete from the list and click the "Delete" button.
- Import stop words: click the "Import" button and select the text file containing the stop words. These words can be separated by spaces, tabulations, carriage returns or the ",", ";" and ":" characters.
- Paste stop words: to paste stop words available in the clipboard (and separated by spaces, tabs, carriage returns or the characters ",", ";" and ":"), copy the words in the clipboard (Ctrl + C) and click "Paste".
- Validate.
Defining the synonyms If the option "Synonyms" is selected, simply click "Edit synonyms" to define the synonyms. - Click "Edit synonyms".
- In the window that is displayed, specify line by line the synonyms that will be taken into account. For example:
house chalet villa condominium townhouse - Validate.
How to create a query used to perform a full-text search? The full-text searches are performed via queries: queries created in the query editor or queries created in SQL code. The search condition is entered when creating the query. Full-text search: Query editor To create a query performing a full-text search, all you have to do is add a "Full-text index" item to the query. The description window of a full-text search is automatically displayed. This window allows you to specify: - The search value. This value can be entered directly (a wizard is used to build the sought string) or it can correspond to a parameter. See "Syntax of the search value" for more details.
- The sort options of the result.
When validating this window, the "full-text" item is automatically added into the query elements. This item corresponds to the relevance of the search result. A condition was automatically defined: only the records whose relevance is greater than 0 will be displayed in the query result. Full-text search: Query in SQL code To create a query that performs a full-text search in SQL code, use the following syntax: MATCH (<Index>) AGAINST <Condition> where: - <Index> corresponds to the list of items found in the index separated by commas (the order of the items is not important).
- <Condition> corresponds to the sought string. See "Syntax of the search value" for more details.
Example: In this example, EDT_Find is an edit control and ConnectedUserID is a variable. QRY_Find is Data Source
MyQuery is string = [
SELECT * FROM Contacts
WHERE MATCH(Contacts.LastName, Contacts.FirstName, Contacts.HTMLComment,
Contacts.RoughTextComment, Contacts.Comments,
Contacts.Phone, Contacts.Office,
Contacts.Cell, Contacts.Email, Contacts.MSN, Contacts.Internet_site,
Contacts.Country, Contacts.FaxNum, Contacts.City)
AGAINST ('
]
MyQuery = MyQuery + EDT_Find + [
')
AND Contacts.UserID =
]
MyQuery = MyQuery + ConnectedUserID + [
ORDER BY LastName DESC
]
HExecuteSQLQuery(QRY_SRCH, hQueryDefault, MyQuery)
FOR EACH QRY_SRCH
TableAddLine(TABLE_Contact_by_category, ...
QRY_SRCH.ContactID, QRY_SRCH.CategoryID, ConnectedUserID, ...
QRY_SRCH.LastName, QRY_SRCH.FirstName)
END
CASE ERROR:
Error(HErrorInfo())
Remark: Query with parameter on a full-text index: how to ignore the parameter? To ignore the parameter, the "MATCH" statement of the query must not be found in the query result but in the WHERE statement. Indeed, if the pertinence must be included in the result, the parameter must be specified to evaluate the result. In order for a query created with the query editor to have the MATCH statement included in the WHERE statement, the pertinence must not be displayed in the result. Example with relevance: SELECT
MATCH(XX, YY, ZZ) AGAINST({ParamFullText}) AS PertinenceFullText
FROM
TABLE
WHERE
<Parameters>
AND
PertinenceFullText > 0
ORDER BY
PertinenceFullText DESC Example without relevance: SELECT *
FROM
TABLE
WHERE
<Parameters>
AND MATCH(XXX, YYY, ZZZ) AGAINST({ParamFullText}) > 0 Syntax of the search value The search value can contain the following elements: | | Element | Meaning | A single word | The specified word will be sought. The relevance will be increased if the text contains this word. Example: "WINDEV" searches for "WINDEV". | Two words separated by a space character | Searches for one of the words. Example: "WINDEV WEBDEV" searches for the texts containing either "WINDEV" or "WEBDEV". | A word preceded by the "+" sign | The specified word is mandatory. Example: "+WINDEV" searches for the texts that necessarily contain "WINDEV". | Word preceded by the "-" sign | The text must not contain the specified word. Example: "-Index" searches for the texts that do no contain "Index". | A word preceded by the "~" sign | If the text contains the specified word, the relevance will be reduced. | One or more words enclosed in quotes | The specified words are searched in group and in order. Caution: if "Ignore the words less than " differs from 0, the words enclosed in quotes less than the specified size will not be sought. | A word followed by the "*" sign | The type of the search performed is "Starts with" the specified word. |
Caution: The full-text search returns no result if the parameter passed is empty or null. Specific case of words separated by a dash If the text contains words separated by a dash ('-'), each section of the word is indexed independently in a full-text index. Example: - The text contains "multi-platform": the two words "multi" and "platform" will be indexed independently.
- During the search, the documents containing "multi" and/or "platform" will be found.
- To find "multi-platform", the word must be enclosed between double quotes: "multi-platform".
Caution: the search will return all the documents that contain "multi" and "platform" consecutively. The separator between the two words may not be the dash: it may be any separator (space, +, dot, comma, carriage return, tabulation, etc.).
Analyzing the result of a full-text query The result of a full-text query gives, for each record found in the data file, the relevance of the record in relation to the search value. This relevance depends on several factors: - the number of times the sought word is found in the record.
- the number of words in the record and their number of repetitions.
- the ratio between the records that contain the sought words and the records that do not contain the sought words. Indeed, the more the sought word is found in all the records, the less the relevance will be important.
- ...
Then, the result of a "full-text" search can be processed like any query result: you can for example display the result in a table, perform a sort according to the relevance, ... Managing the full-text indexes programmatically Several WLanguage functions are used to manage the full-text indexes:
| | HDescribeFullTextIndex | Describes a full-text index of data file created through programming. | HListFullTextIndex | Returns the list of full-text indexes of a file (a query or a view) recognized by the HFSQL engine. | HListStopWord | Returns the list of stop words used by a full-text index. | HListSynonym | Returns the list of synonyms used by a full-text index. |
Remarks: - Creating a full-text index in a data file created through programming can be done via a FullTextIndex Description variable.
- HIndex is used to re-index full-text indexes.
This page is also available for…
|
|
|