Product Reviews
Lightning Searching for Text Data
Search your own system or use the dtSearch engine in your products.
dtSearch makes a fast text search engine that's been around
for over a decade now. I took a look at the Desktop member of their product
line, which also includes a network version, a version to add searching
to your web site, and a developer version of the search engine (more on
that later). To use dtSearch, you choose the data that you're interested
in and turn it loose to build its own index. Indexing about 1.5 gigabytes
of stuff, including lots of files, two huge Outlook stores, and a couple
of web sites, took almost exactly ten hours on a fast machine. The
index was nearly the size of the source data when the software finished
building it.
The difference, of course, is that finding things with the index is dramatically
faster than finding them without it. A search on "guinea fowl" on my desktop,
for example, takes less than a second to pull out the 48 documents containing
those silly birds from the nearly 100,000 that I indexed. The dtSearch
Desktop interface then allows browsing through the found documents, displaying
them in its own interface or letting you launch external viewers, with
the search text highlighted. Supported search options include Boolean,
stemmed, fuzzy, synonym, phonic, phrase, and "near" searches. dtSearch
can also search unindexed documents, though this slows it down substantially.
You can build multiple indexes and search them all at once with the FindPlus
feature, which also enables a desktop user to make use of a network index
for additional searching. This opens the possibility of distributed search
indexing.
The program understands quite a few file formats, having no
trouble pulling information out of Word, Excel, Access, or Outlook files,
as well as common formats such as RTF or PDF. I looked for, but could
not find, a complete list of supported formats. You'll also want to use
some care in deciding what to index. By default, the indexer uses a list
of file extensions to decide what NOT to index, but the default list is
hardly complete -- it doesn't block any of the common video file formats,
for example. The result can be an index full of nonsense words made from
traipsing through binary file formats. You can extend the list of blocked
extensions yourself, supply your own list of specific extensions to index
while ignoring everything else (this is where the list of supported formats
would have come in handy), or organize your hard drive to keep documents
separated from other stuff.
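To make the two filtering strategies above concrete, here is a minimal Python sketch of a block list versus an allow list of file extensions. It is only an illustration of the idea, not dtSearch's code or configuration format: the extension lists and the function names are hypothetical, made up for this example.

```python
from pathlib import PurePath

# Hypothetical lists for illustration only; these are NOT dtSearch's defaults.
BLOCKED_EXTENSIONS = {".exe", ".dll", ".avi", ".mpg", ".mov", ".wmv"}
ALLOWED_EXTENSIONS = {".doc", ".xls", ".mdb", ".pst", ".rtf", ".pdf", ".txt", ".htm"}


def extension(path):
    """Return the lower-cased file extension, including the leading dot."""
    return PurePath(path).suffix.lower()


def should_index_blocklist(path, blocked=BLOCKED_EXTENSIONS):
    """Block-list strategy: index everything except known-bad extensions."""
    return extension(path) not in blocked


def should_index_allowlist(path, allowed=ALLOWED_EXTENSIONS):
    """Allow-list strategy: index only known document extensions."""
    return extension(path) in allowed


def select_files(paths, predicate):
    """Keep only the paths that the chosen strategy says to index."""
    return [p for p in paths if predicate(p)]


if __name__ == "__main__":
    candidates = ["report.doc", "clip.AVI", "budget.xls", "data.bin"]
    print(select_files(candidates, should_index_blocklist))
    print(select_files(candidates, should_index_allowlist))
```

With these made-up lists, an unlisted binary such as data.bin slips through the block list but is rejected by the allow list, which is the trade-off described above: a block list is only as good as its coverage, while an allow list requires knowing which formats the indexer actually supports.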
I also took a look at the dtSearch engine from a programmer's point of
view. You can incorporate dtSearch's index and search technology within
your own application through either a C++ API or through supplied ActiveX
objects. Either way you have access to the entire range of indexing and
searching functionality. There are a variety of ways to license the engine,
including a single server license ($999), royalty-based licenses starting
at $2,500, or royalty-free licenses starting at $9,995. The sample code
that I looked at worked well from VB.
If you're buried in documents and need to find things quickly, and have
plenty of hard drive space, dtSearch Desktop offers a straightforward
interface and impressive speed. If you need wide-ranging search capabilities
in your own application, their Text Retrieval Engine package is definitely
worth considering.
About the Author
Mike Gunderloy, MCSE, MCSD, MCDBA, is a former MCP columnist and the author of numerous development books.