luceneSearch
Lucene Full Text Indexing



Module

urn:org:ten60:netkernel:mod:lucene

Definition

Active URI Base

active:luceneSearch

Format

<instr>
  <type>luceneSearch</type>
  <operator>
    <luceneSearch>
      <index>ffcpl:/org/ten60/test/myIndex/</index>
      <query>red cat</query>
      <unique />
    </luceneSearch>
  </operator>
</instr>

Syntax

Argument   Rules       Description
type       Mandatory   luceneSearch
operator   Mandatory   An operator document containing an index URI and the search criteria
target     Optional    A results document containing all the locations where matches were found
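
The arguments above can be combined as in the following minimal sketch, which adds the optional target to the instruction shown under Format. The target URI var:results is an assumed, hypothetical variable name used only for illustration; substitute whatever target your process uses.

```xml
<instr>
  <type>luceneSearch</type>
  <operator>
    <luceneSearch>
      <!-- index URI taken from the Format example -->
      <index>ffcpl:/org/ten60/test/myIndex/</index>
      <query>red cat</query>
    </luceneSearch>
  </operator>
  <!-- assumption: var:results is a hypothetical target name -->
  <target>var:results</target>
</instr>
```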

Purpose

Lucene is a full-text indexing and searching technology developed under the Apache Jakarta project. It provides low-level text indexing and searching facilities. This accessor adds a layer over Lucene to support indexing the content of XML documents while preserving the XPath locations of that content. This approach allows content to be located down to the element level across multiple documents.

The luceneSearch accessor supports searching over a single Lucene index.

The <unique/> tag in the operator document filters the search results so that only the best match per indexed docid is returned.

Query Syntax

By default, the search looks for complete words in the text content of the document. Multiple words can be specified, and these are 'OR'ed together (matches containing all of them score highest). 'AND' can be used to find only matches that contain all of the keywords.

Examples:

  • cow - only find documents with the word cow mentioned
  • blue cow - only find documents with the word cow or the word blue mentioned
  • blue AND cow - only find documents with both the words cow and blue mentioned
  • cow AND basis:/animal/name - only find documents with the word cow in elements with the path /animal/name
  • docid:addressbook.xml - only find matches in the document indexed under the id addressbook.xml

This is not necessarily the complete query syntax; the Lucene documentation may describe further options.
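
As an illustration of how one of the example queries fits into an instruction, here is a minimal sketch that restricts matches to a particular element path. The index URI is reused from the Format example; the query is copied from the examples above, and nothing else is assumed.

```xml
<instr>
  <type>luceneSearch</type>
  <operator>
    <luceneSearch>
      <index>ffcpl:/org/ten60/test/myIndex/</index>
      <!-- match the word cow only inside elements with the path /animal/name -->
      <query>cow AND basis:/animal/name</query>
      <!-- keep only the best match per indexed docid -->
      <unique />
    </luceneSearch>
  </operator>
</instr>
```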

Search result document

Example result document:

<luceneQuery>
  <match>
    <basis>/root/name</basis>
    <xpath>/root/name[1]</xpath>
    <uri>ffcpl:/org/ten60/ura/lucene/test/doc1.xml</uri>
    <docid>doc1.xml</docid>
    <score>1.0</score>
  </match>
  <match>
    <basis>/root</basis>
    <xpath>/root</xpath>
    <uri>ffcpl:/org/ten60/ura/lucene/test/doc1.xml</uri>
    <docid>doc1.xml</docid>
    <score>0.53795576</score>
  </match>
</luceneQuery>

<basis> contains a basis XPath expression that describes the effective element type. Multiple elements may have the same basis.
<xpath> contains an XPath expression that locates a single unique element within the document.
<uri> contains the URI of the originally indexed document.
<docid> contains the id that the document was indexed under.
<score> contains a score for the match, normalized between zero and one, where one is a perfect match. The score is lower if the match is found within a larger body of text, or if not all of the keywords matched.
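
One way to consume a result document of this shape is with a small XSLT 1.0 stylesheet. The sketch below is not part of the accessor; it assumes the result structure shown above, and the output element names (hits, hit) and the 0.6 score threshold are hypothetical choices made for illustration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Start from the root element of the search result document -->
  <xsl:template match="/luceneQuery">
    <!-- hits/hit are hypothetical output element names -->
    <hits>
      <!-- keep only matches at or above the assumed 0.6 threshold -->
      <xsl:for-each select="match[score &gt;= 0.6]">
        <hit uri="{uri}" xpath="{xpath}" docid="{docid}"/>
      </xsl:for-each>
    </hits>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the example result document above, only the /root/name[1] match (score 1.0) passes the filter; the /root match (score 0.53795576) is dropped.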

References

Apache Jakarta Lucene Homepage


(C) 2003, 1060 Research Limited
© 2003-2005, 1060 Research Limited. 1060 registered trademark, NetKernel trademark of 1060 Research Limited.