Library:Apache Lucene 2.0.0  
Package
org.apache.lucene.search.spans
Overview
Members
Books
SinceNot specified.
VersionNot specified.
AuthorNot specified.
The calculus of spans.

A span is a <doc,startPosition,endPosition> tuple.

The following span query operators are implemented:

  • A SpanTermQuery matches all spans containing a particular Term.
  • A SpanNearQuery matches spans which occur near one another, and can be used to implement things like phrase search (when constructed from SpanTermQueries) and inter-phrase proximity (when constructed from other SpanNearQueries).
  • A SpanOrQuery merges spans from a number of other SpanQueries.
  • A SpanNotQuery removes spans matching one SpanQuery which overlap another. This can be used, e.g., to implement within-paragraph search.
  • A SpanFirstQuery matches spans matching q whose end position is less than n. This can be used to constrain matches to the first part of the document.
In all cases, output spans are minimally inclusive. In other words, a span formed by matching a span in x and y starts at the lesser of the two starts and ends at the greater of the two ends.

For example, a span query which matches "John Kerry" within ten words of "George Bush" within the first 100 words of the document could be constructed with:

SpanQuery john   = new SpanTermQuery(new Term("content", "john"));
SpanQuery kerry  = new SpanTermQuery(new Term("content", "kerry"));
SpanQuery george = new SpanTermQuery(new Term("content", "george"));
SpanQuery bush   = new SpanTermQuery(new Term("content", "bush"));

SpanQuery johnKerry =
   new SpanNearQuery(new SpanQuery[] {john, kerry}, 0, true);

SpanQuery georgeBush =
   new SpanNearQuery(new SpanQuery[] {george, bush}, 0, true);

SpanQuery johnKerryNearGeorgeBush =
   new SpanNearQuery(new SpanQuery[] {johnKerry, georgeBush}, 10, false);

SpanQuery johnKerryNearGeorgeBushAtStart =
   new SpanFirstQuery(johnKerryNearGeorgeBush, 100);

Span queries may be freely intermixed with other Lucene queries. So, for example, the above query can be restricted to documents which also use the word "iraq" with:

Query query = new BooleanQuery();
query.add(johnKerryNearGeorgeBushAtStart, true, false);
query.add(new TermQuery("content", "iraq"), true, false);
Wiki javadoc Use textile entry format.
Add your comments here.