ra-index and ra-retrieve make up the Savant search engine, an information retrieval engine designed as a
back-end for the Remembrance Agent (RA). Given a collection of the user's accumulated email, usenet news
articles, papers, saved HTML files and other text notes, the RA attempts to find those documents which
are most relevant to the user's current context. That is, it searches this collection of text for the
documents which bear the highest word-for-word similarity to the text the user is currently editing, in
the hope that they will also bear high conceptual similarity and thus be useful to the user's current
work. With the Emacs front-end, these suggestions are continuously displayed in a small buffer at the
bottom of the user's window. If a suggestion looks useful, the full text can be retrieved with a single
command.
The Remembrance Agent works in two stages. First, the user's collection of text documents is indexed
into a database saved in a vector format. After the database is created, the other stage of the
Remembrance Agent is run from emacs, where it periodically takes a sample of text from the working buffer
and finds those documents from the collection that are most similar. It summarizes the top documents in
a small emacs window and allows you to retrieve the entire text of any one with a keystroke. See the
README file for information on using the Emacs front-end.
The RA is primarily designed as a proactive information provider that continually gives you information
that might be relevant to your current environment. In this mode, ra-retrieve is run by a front end and
its output is parsed into a more human-readable format. However, Savant can also be used as a standard
text and information retrieval search engine.
USAGE
The one argument to ra-retrieve is <base-dir>, which is the directory containing the index files created
by ra-index. This starts an interactive process that handles queries and returns documents that most
match that query. When running with the -v argument, a menu and other information is printed. Without
the -v option it is assumed that ra-retrieve has been run from a front-end, and only minimal information
is printed. The following commands are available:
query<num-lines>
Find the <num-lines> most relevant documents to a query. Default is 5. Enter the text of the
query, followed by a ^D (ASCII 04). If the query matches a predefined template then fields are
parsed and separately. For example, in emacs mail mode the from, subject, date and body of the
message are all individually parsed. If no template matches, the query is assumed to be plain
text.
A query will output up to n summary lines followed by a period on a line by itself. Each summary
line will contain a line number, relevance number, document number, and a series of fields
describing the document. The final field is a comma-separated list of words from the query that
most contributed to this document being chosen.
retrieve<documentnumber>
Retrieve and print the document with the given document number. Document number is the third
field outputed by the query. The full text of the document is displayed. If the document is a
part of a larger file, such as in an email in an archive file, only that one document is shown.
loc-retrieve<documentnumber>
Retrieve and print the location of the document with the given document number. Three values are
displayed, each on their own separate line. The first is the character offset to where the
beginning of the document is found. The second is the character offset for the end of the
document. The third line contains the fully expanded filename for the document itself. This is
primarily so front-ends can load the document and display them with their own formatting.
info Display version and database info.
quit Quit.
? Display menu.
OPTIONS-v Verbose mode. Print a menu and other info for running without a front-end.
-d Debug mode. Print not-so-useful information.
[--docnum<docnum>]
Print the contents of the specified document number and exit. This option doesn't use as much
memory as interactive mode, and is useful for scripts that call this program.