libextractor - meta-information extraction library 1.0.0
Availability
You can obtain the original author's latest version from http://www.gnu.org/software/libextractor/.
GNU libextractor 1.0.0 Sept 4, 2012 LIBEXTRACTOR(3)
Bugs
A couple of file-formats (on the order of 10^3) are not recognized...
Description
GNU libextractor is a simple library for keyword extraction. libextractor does not support all formats
but supports a simple plugging mechanism such that you can quickly add extractors for additional formats,
even without recompiling libextractor. libextractor typically ships with dozens of plugins that can be
used to obtain meta data from common file-types. If you want to write your own plugin for some filetype,
all you need to do is write a little library that implements a single method with this signature:
    void EXTRACTOR_XXX_extract_method (struct EXTRACTOR_ExtractContext *ec);
ec contains function pointers for reading, seeking, getting the overall file size and returning meta
data. There is also a field with options for the plugin. New plugins will be automatically located and
used once they are installed in the respective directory (typically something like
/usr/lib/libextractor/).
The application extract gives an example of how to use libextractor.
The basic use of libextractor is to load the plugins (for example with EXTRACTOR_plugin_add_defaults),
then to extract the keyword list using EXTRACTOR_extract, and finally to unload the plugins (with
EXTRACTOR_plugin_remove_all).
Textual meta data obtained from libextractor is supposed to be UTF-8 encoded if the text encoding is
known. Plugins are supposed to convert meta-data to UTF-8 if necessary. The EXTRACTOR_meta_data_print
function converts the UTF-8 keywords to the character set from the current locale before printing them.
Legal Notice
libextractor is released under the GPL and is a GNU package (http://www.gnu.org/).
Name
libextractor - meta-information extraction library 1.0.0
See Also
extract(1)
Synopsis
#include <extractor.h>
const char *EXTRACTOR_metatype_to_string (enum EXTRACTOR_MetaType type);
const char *EXTRACTOR_metatype_to_description (enum EXTRACTOR_MetaType type);
enum EXTRACTOR_MetaType EXTRACTOR_metatype_get_max (void);
struct EXTRACTOR_PluginList *EXTRACTOR_plugin_add_defaults (enum EXTRACTOR_Options flags);
struct EXTRACTOR_PluginList *EXTRACTOR_plugin_add (struct EXTRACTOR_PluginList *prev, const char *library, const char *options, enum EXTRACTOR_Options flags);
struct EXTRACTOR_PluginList *EXTRACTOR_plugin_add_last (struct EXTRACTOR_PluginList *prev, const char *library, const char *options, enum EXTRACTOR_Options flags);
struct EXTRACTOR_PluginList *EXTRACTOR_plugin_add_config (struct EXTRACTOR_PluginList *prev, const char *config, enum EXTRACTOR_Options flags);
struct EXTRACTOR_PluginList *EXTRACTOR_plugin_remove (struct EXTRACTOR_PluginList *prev, const char *library);
void EXTRACTOR_plugin_remove_all (struct EXTRACTOR_PluginList *plugins);
void EXTRACTOR_extract (struct EXTRACTOR_PluginList *plugins, const char *filename, const void *data, size_t size, EXTRACTOR_MetaDataProcessor proc, void *proc_cls);
int EXTRACTOR_meta_data_print (void *handle, const char *plugin_name, enum EXTRACTOR_MetaType type, enum EXTRACTOR_MetaFormat format, const char *data_mime_type, const char *data, size_t data_len);
EXTRACTOR_VERSION
