Introduction
(sqlite)The module fts1 (see also FtsTwo and FtsUsage ) adds full-text indexing capabilities to SQLite. It is not yet included in the SQLite binary distribution; you can find it in the SQLite CVS tree in the Building fts1 fts1 can be built either as a standalone shared library, or statically linked into the SQLite library. As a shared library To build the module as a shared library, compile all source files in the fts1 directory into a shared library (.so or .dll) on your platform. (Sorry - there's no makefile checked in yet. Coming soon.) Statically linked To statically link fts1 into SQLite, add all .c files from the fts1 directory to the Makefile you use to compile SQLite so that they will be linked into the SQLite image. You must define the preprocessor symbols SQLITE_CORE and SQLITE_ENABLE_FTS1 when compiling these files. Initializing fts1 When fts1 is built as a shared library, you can load it into SQLite using the ".load" shell command. sqlite> .load fts1
Or you can load it using a SELECT statement: SELECT load_extension('fts1');
Note that you may need to call sqlite3_enable_load_extension before loading the extension; see the SQLite LoadableExtensions documentation. In a static build, fts1 is always available; there's no need to load or otherwise initialize it. Using fts1 Full-text tables store one or more columns of fully indexed text. You can create a full-text table using the CREATE VIRTUAL TABLE statement. For example, the following creates a table with columns name and ingredients : sqlite>create virtual table recipe using fts1(name, ingredients);
You can insert rows into a full-text table in the same way as into an ordinary table with columns of type TEXT: sqlite>insert into recipe (name, ingredients) values ('broccoli stew', 'broccoli peppers cheese tomatoes');
The MATCH operator performs a full-text match on a column in a full-text table: sqlite> select rowid, name, ingredients from recipe where name match 'pie';
As can be seen in the preceding output, every row in a full-text table has a unique rowid, just as in any other SQLite table. A query may contain multiple terms, in which case it will return only documents containing all of the terms: sqlite> select name, ingredients from recipe where ingredients match 'onions cheese';
OR queries Inside a query, the OR operator may be used to retrieve documents containing either of two terms: sqlite> select name, ingredients from recipe where ingredients match 'onions OR cheese';
Note that the OR in this query must be capitalized. The OR operator binds more tightly than the implicit AND between two adjacent terms. Thus, the query 'onions OR cheese pumpkin' matches text which contains either "onions" or "cheese", and also contains "pumpkin": sqlite> select name, ingredients from recipe where ingredients match 'onions OR cheese pumpkin';
fts1 does not currently provide any grouping operator (i.e. parentheses) for overriding this default precedence. Excluding terms The - operator excludes any documents containing the term which follows it: sqlite> select name, ingredients from recipe where ingredients match 'onions -celery';
Note that a query must contain at least one non-excluded term: sqlite> select name, ingredients from recipe where ingredients match '-celery'; /* invalid! */
Phrase searches Phrases may be enclosed in double quotes: sqlite> select name, ingredients from recipe where ingredients match '"broccoli cheese"';
Querying multiple columns To query all columns in a full-text table simultaneously, use the table's name on the left-hand side of the MATCH operator: sqlite> select name, ingredients from recipe where recipe match 'pie';
When an all-column query contains multiple terms, a row will match even if the terms appear in different columns in the row: sqlite> select name, ingredients from recipe where recipe match 'sugar pie';
Any term in a query string may be preceded by the name of a particular column to use for matching that term: sqlite> select name, ingredients from recipe where recipe match 'name:pie ingredients:onions';
The following are entirely equivalent: sqlite> select name from recipe where ingredients match 'sugar';
When a specific column name appears to the left of the MATCH operator, that column is used for matching any term without an explicit column qualifier. Thus, the following are equivalent: sqlite> select name from recipe where recipe match 'name:pie ingredients:onions';
Note that fts1 currently requires a query to contain at most one instance of the MATCH operator. This means that to match multiple specific columns you must use field specifiers as described above; the following won't work: sqlite> select name, ingredients from recipe
Retrieving offset information When returning full-text matches, fts1 can return the character offsets of individual term matches. To retrieve offset information in a query, call the offsets function and pass it the name of your full-text table: sqlite> select name, ingredients, offsets(recipe) from recipe where recipe match 'sugar pie';
offsets returns a string containing a series of integers separated by spaces. Each match is represented by a series of four consecutive integers:
For example, in the query above two matches are returned, which we can decode as follows:
Generating snippets fts1 can generate snippets of text surrounding each match returned in a full-text query. To produce a snippet, call the snippet function and pass it the name of your full-text table: sqlite> create virtual table poem using fts1(name, text);
By default, each matching term in a snippet is surrounded with the delimiters "<b>" and "</b>", and an ellipsis "..." indicates text not included in a snippet. (Note that the ellipsis is itself surrounded by delimiters "<b>" and "</b>".) You can specify your own term delimiters and ellipsis text by specifying extra arguments to the snippet function. Its arguments are as follows:
For example: sqlite> select name, snippet(poem, '[', ']', '%%') from poem where text match 'land';
Joining full-text data A full-text table stores only full-text-indexed strings. To store full-text-indexed values along with other values, it's convenient to use both a full-text table and an ordinary SQLite table, joined by rowid. For example, suppose that you'd like to store a set of messages; each message has a sender and priority, which are not full-text indexed, and a subject and body, which are full-text indexed. You can use the following schema: create table email(sender text, priority integer);
You can insert a new message as follows: insert into email (sender, priority) values ('ashley@foo.com', 4);
To find the sender of all messages containing the word "jam" you can issue a query joining the email and email_text tables: select sender, subject from email join email_text on email.rowid = email_text.rowid
Tokenization As the module indexes a piece of text, it converts the text to a sequence of tokens. Each token becomes a term in the index and can be matched using a full-text query. The module currently uses the following generic tokenization mechanism. A token is a contiguous sequence of alphanumeric ASCII characters (A-Z, a-z and 0-9). All non-ASCII characters are ignored. Each token is converted to lowercase before it is stored in the index, so all full-text searches are case-insensitive. The module does not perform stemming of any sort. Soon, we hope to allow applications to define their own tokenizers (we in fact already have a generic tokenizer mechanism in our code; we just have yet to expose it to the outside world). Performance There are two steps you can take to greatly improve performance when inserting documents into a full-text index. First, you can set the synchronous pragma to OFF: sqlite>pragma synchronous = off;
In our testing we've found that this dramatically increases indexing speed. Of course, you should study the SQLite documentation (see http://www.sqlite.org/pragma.html ) to understand the safety/speed tradeoff which this pragma controls and to determine the setting which is right for your application. Secondly, you can index more than one document per transaction. In our testing, we've found throughput to be best when we index at least 50 or so documents in each transaction; this dramatically improves performance over the one-document-per-transaction case. We're still in the process of assessing and improving performance. Internals Every full-text table contains a column with the same name as the table itself. This self-named column is used in multi-column queries as described above, and will appear in queries for * : sqlite> create virtual table foo using fts1(name, address);
This column's value is private to fts1. When querying a full-text table, you probably want to specify column names explicitly rather than using *, to avoid seeing this self-named column. When you create a full-text table, fts1 creates two backing tables which hold the table's contents and full-text index: sqlite> .tables
You should not access the backing tables directly; their format is internal to fts1 and is subject to change. Caveats Please note that the full-text database format is subject to change at any time. We are not planning to implement backward compatibility in updates in the near future, so new code releases may fail spectacularly with old databases. Of course this will change at some future point once our data structures become more stable. Missing features The full-text module is still in an early development phase. The following features are missing but hopefully coming soon:
Requests/ideas/suggestions
|