HsHyperEstraier-0.1: HyperEstraier binding for Haskell
Text.HyperEstraier.Database
Contents
Types
Opening and closing databases
Manipulating database
Getting documents in and out
Statistics of databases
Searching for documents
Description
An interface to functions to manipulate databases.
Synopsis
data Database
data EstError
= InvalidArgument
| AccessForbidden
| LockFailure
| DatabaseProblem
| IOProblem
| NoSuchItem
| MiscError
data AttrIndexType
= SeqIndex
| StrIndex
| NumIndex
data OptimizeOption
= NoPurge
| NoDBOptimize
data RemoveOption = CleaningRemove
data PutOption
= CleaningPut
| WeightStatically
data GetOption
= NoAttributes
| NoText
| NoKeywords
data OpenMode
= Reader [ReaderOption]
| Writer [WriterOption]
data ReaderOption = ReadLock LockingMode
data WriterOption
= Create [CreateOption]
| Truncate [CreateOption]
| WriteLock LockingMode
data LockingMode
= NoLock
| NonblockingLock
data CreateOption
= Analysis AnalysisOption
| Index IndexTuning
| Score [ScoreOption]
data AnalysisOption
= PerfectNGram
| CharCategory
data IndexTuning
= Small
| Large
| Huge
| Huge2
| Huge3
data ScoreOption
= Nullified
| StoredAsInt
| OnlyToBeStored
withDatabase :: FilePath -> OpenMode -> (Database -> IO a) -> IO a
openDatabase :: FilePath -> OpenMode -> IO (Either EstError Database)
closeDatabase :: Database -> IO ()
addAttrIndex :: Database -> String -> AttrIndexType -> IO ()
flushDatabase :: Database -> Int -> IO ()
syncDatabase :: Database -> IO ()
optimizeDatabase :: Database -> [OptimizeOption] -> IO ()
mergeDatabase :: Database -> FilePath -> [RemoveOption] -> IO ()
setCacheSize :: Database -> Int -> Int -> Int -> Int -> IO ()
putDocument :: Database -> Document -> [PutOption] -> IO ()
removeDocument :: Database -> DocumentID -> [RemoveOption] -> IO ()
updateDocAttrs :: Database -> Document -> IO ()
getDocument :: Database -> DocumentID -> [GetOption] -> IO Document
getDocAttr :: Database -> DocumentID -> String -> IO (Maybe String)
getDocIdByURI :: Database -> URI -> IO DocumentID
getDatabaseName :: Database -> IO String
getNumOfDocs :: Database -> IO Int
getNumOfWords :: Database -> IO Int
getDatabaseSize :: Database -> IO Integer
hasFatalError :: Database -> IO Bool
searchDatabase :: Database -> Condition -> IO [DocumentID]
searchDatabase' :: Database -> Condition -> IO ([DocumentID], [(String, Int)])
metaSearch :: [Database] -> Condition -> IO [(Database, DocumentID)]
metaSearch' :: [Database] -> Condition -> IO ([(Database, DocumentID)], [(String, Int)])
scanDocument :: Database -> Document -> Condition -> IO Bool
Types
data Database
Database is an opaque object representing a HyperEstraier database.
data EstError
EstError represents an error that occurred during an operation. It is usually thrown as a DynException.
Constructors
InvalidArgument: An argument passed to the function was invalid.
AccessForbidden: The operation is forbidden.
LockFailure: Failed to lock the database.
DatabaseProblem: The database has a problem.
IOProblem: An I/O operation failed.
NoSuchItem: An object you specified does not exist.
MiscError: Errors for other reasons.
Instances
Eq EstError
Show EstError
Typeable EstError
data AttrIndexType
AttrIndexType represents an index type for an attribute.
Constructors
SeqIndex: Map from a document ID to an attribute value. This type of index increases the efficiency of, say, getDocAttr.
StrIndex: Map from an attribute value to a document ID. This increases the search speed when you search for documents by an attribute value.
NumIndex: This is similar to StrIndex but for attributes whose value is a number.
data OptimizeOption
OptimizeOption is an option for the optimizeDatabase action.
Constructors
NoPurge: Omit the process which purges the garbage left by removed documents.
NoDBOptimize: Omit the process which optimizes the database file.
data RemoveOption
RemoveOption is an option for the mergeDatabase action and the removeDocument action.
Constructors
CleaningRemove: Clean up the region in the database where the removed documents were placed.
data PutOption
PutOption is an option for the putDocument action.
Constructors
CleaningPut: If the new document overwrites an old one, clean up the region in the database where the old document was placed.
WeightStatically: Statically apply the "@weight" attribute of the document.
data GetOption
GetOption is an option for the getDocument action.
Constructors
NoAttributes: Don't retrieve the attributes of the document.
NoText: Don't retrieve the body of the document.
NoKeywords: Don't retrieve the keywords of the document.
data OpenMode
OpenMode represents how to open a database.
Constructors
Reader [ReaderOption]: Open the database in read-only mode. You can specify ReaderOption to modify the behavior of the database.
Writer [WriterOption]: Open the database in writable mode. You can specify WriterOption to modify the behavior of the database.
data ReaderOption
ReaderOption is an option for the Reader constructor.
Constructors
ReadLock LockingMode: Specify how to lock the database.
data WriterOption
WriterOption is an option for the Writer constructor.
Constructors
Create [CreateOption]: Create a database if an old one doesn't exist. You can specify CreateOption to modify the behavior of the database.
Truncate [CreateOption]: Always create a new database even if an old one already exists. You can specify CreateOption to modify the behavior of the database.
WriteLock LockingMode: Specify how to lock the database.
data LockingMode
LockingMode represents how to lock the database.
Constructors
NoLock: Perform no exclusive access control at all. This option is very unsafe.
NonblockingLock: Use non-blocking locking. (The author of this module doesn't know what happens if this option is in effect. See the manual and the source code of HyperEstraier and QDBM.)
data CreateOption
CreateOption is an option for the Create constructor.
Constructors
Analysis AnalysisOption: Specify the word analysis method.
Index IndexTuning: Specify the prospective size of the database.
Score [ScoreOption]: Specify how to handle scores of the documents.
data AnalysisOption
AnalysisOption is an option for the Analysis constructor.
Constructors
PerfectNGram: Use the perfect N-gram analyzer.
CharCategory: Use the character category analyzer.
data IndexTuning
IndexTuning is an option for the Index constructor.
Constructors
Small: Predict the database will have fewer than 50,000 documents.
Large: Predict the database will have fewer than 300,000 documents.
Huge: Predict the database will have fewer than 1,000,000 documents.
Huge2: Predict the database will have fewer than 5,000,000 documents.
Huge3: Predict the database will have more than 10,000,000 documents.
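The thresholds listed for IndexTuning can be turned into a small selection helper. The following is only a sketch: tuningFor is a hypothetical name, the IndexTuning type is a local copy of the declaration in the Synopsis so that the snippet compiles without the binding, and the choice of Huge3 for the undocumented range between 5,000,000 and 10,000,000 documents is an assumption.

```haskell
-- Local copy of the IndexTuning declaration from the Synopsis, so that this
-- sketch compiles on its own. The deriving clause is an assumption.
data IndexTuning = Small | Large | Huge | Huge2 | Huge3
    deriving (Eq, Show)

-- Pick a tuning from the expected number of documents (hypothetical helper).
tuningFor :: Int -> IndexTuning
tuningFor n
    | n < 50000   = Small
    | n < 300000  = Large
    | n < 1000000 = Huge
    | n < 5000000 = Huge2
    | otherwise   = Huge3  -- assumption: also used for 5,000,000..10,000,000
```

With the real package, the result would presumably be passed as Index (tuningFor n) inside a Create or Truncate option.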
data ScoreOption
ScoreOption is an option for the Score constructor.
Constructors
Nullified: Nullify anything about the score of documents.
StoredAsInt: Store the scores for documents into the database as 32-bit integers.
OnlyToBeStored: Store the scores for documents into the database but don't use them during the search operation.
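The option types above nest inside one another, which is easier to see in a concrete value. The sketch below uses local copies of the declarations from the Synopsis so that it compiles without the binding (with the real package one would import Text.HyperEstraier.Database instead). The deriving clauses and the names newSmallDatabase, readOnly and mayCreate are assumptions made for illustration.

```haskell
-- Local copies of the declarations from the Synopsis. The deriving clauses
-- are assumptions; the instances of the real types are not listed here.
data OpenMode       = Reader [ReaderOption] | Writer [WriterOption] deriving (Eq, Show)
data ReaderOption   = ReadLock LockingMode deriving (Eq, Show)
data WriterOption   = Create [CreateOption] | Truncate [CreateOption]
                    | WriteLock LockingMode deriving (Eq, Show)
data LockingMode    = NoLock | NonblockingLock deriving (Eq, Show)
data CreateOption   = Analysis AnalysisOption | Index IndexTuning
                    | Score [ScoreOption] deriving (Eq, Show)
data AnalysisOption = PerfectNGram | CharCategory deriving (Eq, Show)
data IndexTuning    = Small | Large | Huge | Huge2 | Huge3 deriving (Eq, Show)
data ScoreOption    = Nullified | StoredAsInt | OnlyToBeStored deriving (Eq, Show)

-- Open for writing; if no database exists yet, create a small one that uses
-- the perfect N-gram analyzer and stores scores as integers.
newSmallDatabase :: OpenMode
newSmallDatabase =
    Writer [Create [Analysis PerfectNGram, Index Small, Score [StoredAsInt]]]

-- Open read-only with no extra options.
readOnly :: OpenMode
readOnly = Reader []

-- True when the mode contains a Create or Truncate option (hypothetical helper).
mayCreate :: OpenMode -> Bool
mayCreate (Reader _)    = False
mayCreate (Writer opts) = any creating opts
  where
    creating (Create _)    = True
    creating (Truncate _)  = True
    creating (WriteLock _) = False
```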
Opening and closing databases
withDatabase :: FilePath -> OpenMode -> (Database -> IO a) -> IO a
withDatabase fpath mode f opens a database at fpath and runs f on it. When the action f finishes or throws an exception, the database is closed automatically. If withDatabase fails to open the database, it throws an EstError. See openDatabase.
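The open, run, always-close behaviour described for withDatabase is the usual bracket pattern. The sketch below is not the binding's implementation: withHandle, OpenError and demo are hypothetical names, the opener and closer are stand-ins that only record events, and the error type is given a modern Exception instance rather than being thrown as a DynException.

```haskell
import Control.Exception (Exception, bracket, throwIO, try)
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Hypothetical stand-in for EstError; only one constructor is mirrored.
data OpenError = AccessForbidden deriving (Eq, Show)
instance Exception OpenError

-- Open a handle, run the action, and close the handle afterwards even if the
-- action throws. A Left from the opener is rethrown as an exception.
withHandle :: Exception e => IO (Either e h) -> (h -> IO ()) -> (h -> IO a) -> IO a
withHandle open close = bracket (open >>= either throwIO return) close

-- Record the order of events when the action itself fails.
demo :: IO [String]
demo = do
    logRef <- newIORef []
    let say msg = modifyIORef logRef (msg :)
        open    = say "open" >> return (Right () :: Either OpenError ())
        close _ = say "close"
    r <- try (withHandle open close (\_ -> say "run" >> throwIO AccessForbidden))
    case r of
        Left e   -> say ("caught " ++ show (e :: OpenError))
        Right () -> say "no exception"
    fmap reverse (readIORef logRef)
```

In this sketch the "close" event is recorded before the exception reaches the caller, which mirrors the guarantee stated above.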
openDatabase :: FilePath -> OpenMode -> IO (Either EstError Database)

openDatabase fpath mode opens a database at fpath. If it succeeds it returns Right Database, otherwise it returns Left EstError.

The Database can be shared by multiple threads, but the current implementation of HyperEstraier itself has one important limitation: a single process can NOT open the same database twice simultaneously. Such an attempt results in AccessForbidden.
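Because openDatabase reports failure through Either rather than an exception, callers can pattern match on the result. The following sketch uses a local copy of the EstError declaration so that it compiles without the binding. describeOpen is a hypothetical helper, and the wording of the messages is illustrative only.

```haskell
-- Local copy of the EstError declaration from the Synopsis.
data EstError
    = InvalidArgument
    | AccessForbidden
    | LockFailure
    | DatabaseProblem
    | IOProblem
    | NoSuchItem
    | MiscError
    deriving (Eq, Show)

-- Turn the result of an open attempt into a short message. The database type
-- is left polymorphic because this sketch never touches the handle.
describeOpen :: Either EstError db -> String
describeOpen (Right _)              = "opened"
describeOpen (Left AccessForbidden) =
    "access forbidden (is the database already open in this process?)"
describeOpen (Left LockFailure)     = "could not lock the database"
describeOpen (Left err)             = "failed: " ++ show err
```

With the real package this would presumably be used as openDatabase fpath mode >>= putStrLn . describeOpen, followed by closeDatabase on the Right branch.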

closeDatabase :: Database -> IO ()
closeDatabase db closes the database db. If db has already been closed, this operation does nothing.
Manipulating database
addAttrIndex :: Database -> String -> AttrIndexType -> IO ()
addAttrIndex db attr idxType creates an index of type idxType for the attribute attr in the database db.
flushDatabase :: Database -> Int -> IO ()
flushDatabase db numWords flushes at most numWords index words in the cache of the database db. If numWords <= 0 all the index words will be flushed.
syncDatabase :: Database -> IO ()
Synchronize a database to the disk.
optimizeDatabase :: Database -> [OptimizeOption] -> IO ()
Optimize a database.
mergeDatabase :: Database -> FilePath -> [RemoveOption] -> IO ()
mergeDatabase db fpath opts merges another database at fpath (the source) into db (the destination). The flags of the two databases must be the same. If any documents in the source database have the same URI as documents in the destination, those documents in the destination will be overwritten.
setCacheSize
:: Database  -- The database.
-> Int       -- Maximum size of the index cache. (default: 64 MiB)
-> Int       -- Maximum records of cached attributes. (default: 8192 records)
-> Int       -- Maximum number of cached document text. (default: 1024 documents)
-> Int       -- Maximum number of the cached search results. (default: 256 records)
-> IO ()
Change the size of various caches of a database. Passing negative values leaves the old values unchanged.
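The rule that negative arguments keep the old values can be modelled in pure code. This is only a model of the documented behaviour, not a call into the binding: CacheSizes, defaultCacheSizes and updateCacheSizes are hypothetical names, and treating the index cache size as a byte count is an assumption.

```haskell
-- Hypothetical record holding the four cache settings.
data CacheSizes = CacheSizes
    { indexCacheBytes :: Int  -- assumption: measured in bytes
    , attrRecords     :: Int
    , textDocuments   :: Int
    , resultRecords   :: Int
    } deriving (Eq, Show)

-- The defaults quoted in the argument descriptions of setCacheSize.
defaultCacheSizes :: CacheSizes
defaultCacheSizes = CacheSizes (64 * 1024 * 1024) 8192 1024 256

-- Apply new settings; a negative argument leaves the old value in place.
updateCacheSizes :: Int -> Int -> Int -> Int -> CacheSizes -> CacheSizes
updateCacheSizes size attrs texts results old = CacheSizes
    { indexCacheBytes = keepIfNegative size (indexCacheBytes old)
    , attrRecords     = keepIfNegative attrs (attrRecords old)
    , textDocuments   = keepIfNegative texts (textDocuments old)
    , resultRecords   = keepIfNegative results (resultRecords old)
    }
  where
    keepIfNegative new previous
        | new < 0   = previous
        | otherwise = new
```

Under this model, updateCacheSizes (-1) 16384 (-1) (-1) changes only the attribute cache, which corresponds to a call such as setCacheSize db (-1) 16384 (-1) (-1).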
Getting documents in and out
putDocument :: Database -> Document -> [PutOption] -> IO ()
Put a document into a database. The document must have an "@uri" attribute. If the database already has a document whose URI is the same as that of the new document, the old one will be overwritten. See setURI and updateDocAttrs.
removeDocument :: Database -> DocumentID -> [RemoveOption] -> IO ()
Remove a document from a database.
updateDocAttrs :: Database -> Document -> IO ()
Update the attributes of a document in a database. The document to be updated is determined by its document ID. It is an error to change the URI of the document to that of another existing document. Note that the document body will not be updated. See putDocument.
getDocument :: Database -> DocumentID -> [GetOption] -> IO Document
Find a document in a database by an ID.
getDocAttr :: Database -> DocumentID -> String -> IO (Maybe String)
Get an attribute of a document in a database.
getDocIdByURI :: Database -> URI -> IO DocumentID
Find a document in a database by a URI and return its ID.
Statistics of databases
getDatabaseName :: Database -> IO String
Get the name of a database.
getNumOfDocs :: Database -> IO Int
Get the number of documents in a database.
getNumOfWords :: Database -> IO Int
Get the number of words in a database.
getDatabaseSize :: Database -> IO Integer
Get the size of a database.
hasFatalError :: Database -> IO Bool
Return True iff the database has a fatal error.
Searching for documents
searchDatabase :: Database -> Condition -> IO [DocumentID]
Search for documents in a database by a condition.
searchDatabase' :: Database -> Condition -> IO ([DocumentID], [(String, Int)])
Search for documents in a database by a condition. The second item of the resulting tuple is a map from each search word to the number of documents that match the word.
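Since the second item of the tuple is an association list, the Prelude function lookup is enough to query it. The sketch below works on a hand-written value shaped like the result of searchDatabase'. hitsFor and sampleResult are hypothetical, and document IDs are shown as plain Ints only because the DocumentID type is not described on this page.

```haskell
-- Number of documents matched by one search word, or Nothing if the word is
-- not among the reported search words. The ID type is left polymorphic.
hitsFor :: String -> (ids, [(String, Int)]) -> Maybe Int
hitsFor word (_, hits) = lookup word hits

-- Hypothetical result value with made-up IDs and counts.
sampleResult :: ([Int], [(String, Int)])
sampleResult = ([3, 7, 12], [("haskell", 3), ("binding", 2)])
```

For example, hitsFor "binding" sampleResult evaluates to Just 2, and a word that was not part of the query yields Nothing.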
metaSearch :: [Database] -> Condition -> IO [(Database, DocumentID)]
Search for documents in many databases at once.
metaSearch' :: [Database] -> Condition -> IO ([(Database, DocumentID)], [(String, Int)])
Search for documents in many databases at once. The second item of the resulting tuple is a map from each search word to the number of documents that match the word.
scanDocument :: Database -> Document -> Condition -> IO Bool

Check if a document matches every phrase in a condition.

To be honest with you, the author of this binding doesn't really know what est_db_scan_doc() does. Its documentation is way too ambiguous across the board. Moreover, the symbols of HyperEstraier are very badly named. Can you imagine what, say, est_db_out_doc() does? How about the constant named ESTCONDSURE? The author got tired of examining the commentless source code over and over again to write this binding. Its functionality is awesome though...

Produced by Haddock version 0.8