This manual describes the C API of the Hspell Hebrew spellchecker. Please refer to hspell(1) for a
description of the Hspell project, its spelling standard, and how it works.
The hspell_init() function must be called first to initialize the Hspell library. It sets up some global
structures (see the CAVEATS section) and then reads the necessary dictionary files (whose default locations
are fixed when the library is built). The 'dictp' parameter is a pointer to a struct dict_radix* object,
which is modified to point to a newly allocated dictionary. A typical hspell_init() call therefore looks
like this:
    struct dict_radix *dict;
    hspell_init(&dict, flags);
Note that the (struct dict_radix*) type is an opaque pointer - the library user has no access to the
separate fields in this structure.
The 'flags' parameter is a bitwise OR of several flags that modify Hspell's default behavior. Turning on
HSPELL_OPT_HE_SHEELA allows Hspell to recognize the interrogative He prefix (he ha-she'ela).
HSPELL_OPT_DEFAULT is a synonym for turning on no special flag, i.e., it evaluates to 0.
hspell_init() returns 0 on success, or negative numbers on errors. Currently, the only error is -1,
meaning the dictionary files could not be read.
The hspell_uninit() function undoes the effects of hspell_init(), freeing any memory that was allocated
during initialization.
The hspell_check_word() function checks whether a certain word is a correct Hebrew word (possibly with
prefix particles attached in a syntactically correct manner). It returns 1 if the word is correct, or 0
if it is incorrect.
The 'word' parameter should be a single Hebrew word, in the iso8859-8 encoding, possibly containing the
ASCII quote or double-quote characters (signifying the geresh and gershayim used in Hebrew for
abbreviations, acronyms, and a few foreign sounds). If the calling program works with another encoding,
it must convert the word to iso8859-8 first. In particular, the cp1255 (MS-Windows Hebrew encoding)
extensions to iso8859-8, such as niqqud characters and the dedicated geresh and gershayim characters, are
currently not recognized and must be removed from the word before calling hspell_check_word().
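The conversion step described above can be sketched for a caller that holds UTF-8 text. This is a minimal
sketch, not part of Hspell: the function name utf8_to_iso8859_8() is hypothetical. It relies only on the
fact that the Hebrew letters U+05D0..U+05EA occupy bytes 0xE0..0xFA in iso8859-8. Mapping the Unicode
geresh and gershayim to the ASCII quote characters, rather than dropping them, is a design choice of this
sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Niqqud points in the Unicode Hebrew block (they have no iso8859-8 byte). */
static int is_niqqud(unsigned cp)
{
    return (cp >= 0x05B0 && cp <= 0x05BD) || cp == 0x05BF ||
           cp == 0x05C1 || cp == 0x05C2 || cp == 0x05C7;
}

/*
 * Hypothetical helper (not an Hspell function): convert a UTF-8 word to
 * iso8859-8. ASCII is copied unchanged, Hebrew letters are mapped to
 * 0xE0..0xFA, geresh/gershayim become ' and ", and niqqud is dropped.
 * Returns 0 on success, or -1 if the input contains any other character,
 * is malformed, or does not fit in 'out' (including the terminating NUL).
 */
int utf8_to_iso8859_8(const char *in, char *out, size_t outsize)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    assert(in != NULL && out != NULL);
    if (outsize == 0)
        return -1;

    while (*p != '\0') {
        int c;                              /* byte to emit, or -1 for none */

        if (*p < 0x80) {
            c = *p++;                       /* plain ASCII */
        } else if ((*p & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80) {
            /* Two-byte UTF-8 sequence: decode the code point. */
            unsigned cp = ((unsigned)(*p & 0x1F) << 6) | (unsigned)(p[1] & 0x3F);
            p += 2;
            if (cp >= 0x05D0 && cp <= 0x05EA)
                c = (int)(cp - 0x05D0 + 0xE0);   /* alef..tav */
            else if (cp == 0x05F3)
                c = '\'';                   /* Hebrew geresh */
            else if (cp == 0x05F4)
                c = '"';                    /* Hebrew gershayim */
            else if (is_niqqud(cp))
                c = -1;                     /* drop vowel points */
            else
                return -1;                  /* not representable here */
        } else {
            return -1;                      /* longer or malformed sequence */
        }

        if (c >= 0) {
            if (n + 1 >= outsize)
                return -1;                  /* no room for byte plus NUL */
            out[n++] = (char)c;
        }
    }
    out[n] = '\0';
    return 0;
}
```

The resulting buffer is in the form that the 'word' parameter expects. A caller that starts from cp1255
rather than UTF-8 would need a different, byte-level filter.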
The function writes into the 'preflen' parameter the number of characters it recognized as a prefix
particle; the rest of the 'word' is a stand-alone word. Because Hebrew words can typically be read in
several different ways, this feature (getting just one prefix from one possible reading) is usually
not very useful, and it is likely to be removed in a future version.
The hspell_enum_splits() function provides a way to get all possible splittings of the given 'word' into
an optional prefix particle and a stand-alone word. For each possible (and legal, as some words cannot
accept certain prefixes) split, a user-defined callback function is called. This callback function is
given the whole word, the length of the prefix, the stand-alone word, and a bitfield which describes what
types of words this prefix can get. Note that in some cases, a word beginning with the letter waw has
this waw doubled when a prefix is attached, so sometimes strlen(word)!=strlen(baseword)+preflen.
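A callback of this kind might look like the sketch below, which records each reported split and notes
whether the doubled-waw case applies. The names record_split, struct split, splits and nsplits are
hypothetical. The parameter order (word, baseword, preflen, prefspec) and the int return type are
assumptions that should be checked against the callback type declared in hspell.h, and this page does
not say how the return value is used.

```c
#include <assert.h>
#include <string.h>

#define MAX_SPLITS 16

/* One possible reading of a word, as reported to the callback. */
struct split {
    int preflen;      /* number of characters in the prefix particle */
    int prefspec;     /* bitfield: types of words this prefix can get */
    int waw_doubled;  /* 1 if strlen(word) != strlen(baseword) + preflen */
};

static struct split splits[MAX_SPLITS];
static int nsplits;

/*
 * Hypothetical callback. ASSUMPTION: the argument order and return type
 * match the callback type expected by hspell_enum_splits().
 */
int record_split(const char *word, const char *baseword,
                 int preflen, int prefspec)
{
    assert(word != NULL && baseword != NULL && preflen >= 0);

    if (nsplits >= MAX_SPLITS)
        return 0;                 /* table full: ignore further splits */

    splits[nsplits].preflen = preflen;
    splits[nsplits].prefspec = prefspec;
    /* The lengths disagree exactly when a leading waw was doubled. */
    splits[nsplits].waw_doubled =
        strlen(word) != strlen(baseword) + (size_t)preflen;
    nsplits++;
    return 0;
}
```

With the library linked, the caller would reset nsplits to 0 and then pass record_split as the callback
argument of hspell_enum_splits(), after which splits[0..nsplits-1] holds every reading that was reported.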
The hspell_trycorrect() function tries to find a list of possible corrections for an incorrect word.
Because in Hebrew the word density is high (a random string of letters, especially if short, has a high
probability of being a correct word), this function bases its corrections on the assumption of a spelling
error (replacement of letters that sound alike, missing or spurious immot qri'a) rather than a typo
(slipped finger on the keyboard, etc.) - see also CAVEATS.
hspell_trycorrect() stores the correction list in a structure of type struct corlist. This structure
must first be initialized with a call to corlist_init() and subsequently freed with corlist_free(). The
corlist_n() macro returns the number of words held in an initialized corlist, and corlist_str() returns
the i'th word. Accordingly, here is an example usage of hspell_trycorrect():
    struct corlist cl;
    int i;

    printf("Found misspelled word %s. Possible corrections:\n", w);
    corlist_init(&cl);
    hspell_trycorrect(dict, w, &cl);
    for (i = 0; i < corlist_n(&cl); i++) {
        printf("%s\n", corlist_str(&cl, i));
    }
    corlist_free(&cl);  /* release the list once it is no longer needed */
The hspell_is_canonic_gimatria() function checks whether the given word is a canonic gimatria - i.e., the
proper way to write in gimatria the number it represents. The caller might want to accept canonic
gimatria as proper Hebrew words, even if hspell_check_word() previously reported such a word to be
non-existent. hspell_is_canonic_gimatria() returns the number represented as gimatria in 'word' if it
is indeed proper gimatria (in canonic form), or 0 otherwise.
hspell_init() normally reads the dictionary files from a path compiled into the library. This makes sense
when the library's code and the dictionaries are distributed together, but in some scenarios the library
user might want to use the Hspell dictionaries that are already present on the system in an arbitrary
path. The function hspell_set_dictionary_path() can be used to set this path, and should be called before
hspell_init(). The given path is that of the word list; the other input files are found at that same path
with a suffix appended. hspell_get_dictionary_path() can be used to find the current path. On many
installations, this defaults to "/usr/local/share/hspell/hebrew.wgz".