The previous two posts describe a longstanding limitation of Lucene graph queries and an enhancement that attempts to address it. This post enumerates some of the downstream possibilities that this enhancement could facilitate.
Restores functionality for index-time multi-term synonyms
Restoring this functionality in turn allows synonym generation to incorporate contextual analysis, as opposed to query-time synonym expansion, which generally has much less context available.
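As a rough illustration of what an index-time multi-term synonym looks like, the following Java sketch models a token graph in which each token records a start position and a position length (how many positions it spans), loosely analogous to Lucene's analysis-time `PositionIncrementAttribute` and `PositionLengthAttribute`. The `Token` record, the `paths` and `flatten` helpers, and the example terms are hypothetical and use no Lucene APIs. Enumerating paths shows that the graph encodes both "wi fi network" and "wifi network"; forcing every position length to 1 (a crude stand-in for discarding position length) instead yields the spurious "wifi fi network" and loses "wifi network".

```java
import java.util.ArrayList;
import java.util.List;

public class TokenGraphPaths {
    // Hypothetical token: a term, the position (graph node) it starts at, and
    // how many positions it spans. Not a Lucene class.
    record Token(String term, int position, int positionLength) {
        int end() {
            return position + positionLength;
        }
    }

    // Recursively enumerate every path of tokens leading from node `from` to node `last`.
    // Assumes every positionLength is at least 1, so recursion always moves forward.
    static List<String> paths(List<Token> tokens, int from, int last) {
        List<String> result = new ArrayList<>();
        if (from == last) {
            result.add("");
            return result;
        }
        for (Token t : tokens) {
            if (t.position() == from) {
                for (String rest : paths(tokens, t.end(), last)) {
                    result.add(rest.isEmpty() ? t.term() : t.term() + " " + rest);
                }
            }
        }
        return result;
    }

    // Crude stand-in for discarding position length: every token spans one position.
    static List<Token> flatten(List<Token> tokens) {
        List<Token> flat = new ArrayList<>();
        for (Token t : tokens) {
            flat.add(new Token(t.term(), t.position(), 1));
        }
        return flat;
    }

    public static void main(String[] args) {
        // "wi fi network", with a multi-term synonym "wifi" spanning two positions (0 and 1).
        List<Token> graph = List.of(
                new Token("wi", 0, 1),
                new Token("wifi", 0, 2),
                new Token("fi", 1, 1),
                new Token("network", 2, 1));
        System.out.println(paths(graph, 0, 3));          // [wi fi network, wifi network]
        System.out.println(paths(flatten(graph), 0, 3)); // [wi fi network, wifi fi network]
    }
}
```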
CJK text and orthographic variation
CJK text indexing in particular could benefit from support for indexed token graphs and complete graph queries.
As described in the previous post, enhancements to various Lucene components have increasingly invalidated some assumptions in the SpanNearQuery graph query implementation, leading to buggy and/or unpredictable query behavior. This post presents a high-level overview of the approach used to implement a candidate fix for this bug. The fixes/enhancements described have been running in production for several months as the backend to the University of Pennsylvania Libraries’ main catalog interface.
SpanNearQuery implementation of graph queries is incomplete
Lucene's SpanNearQuery identifies valid paths through the token graphs that represent the content of each document. As the component concerned with discovering the "edges" linking query subclause "nodes", SpanNearQuery is arguably the essential component of graph query in Lucene. But SpanNearQuery is not a complete graph query implementation; accurate matching depends on restrictive assumptions about the match-length variability of individual subclauses.
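To make "restrictive assumptions about match-length variability" concrete, the following Java sketch uses a deliberately simplified model, not Lucene's actual SpanNearQuery internals: each subclause is reduced to the list of position intervals it matches, and an ordered, zero-slop sequence of subclauses must chain end-to-start. The `Span` record and all matcher methods are hypothetical. `completeMatch` tracks every reachable end position, whereas `shortestOnlyMatch` commits to a single (shortest) match length per subclause start. For the document "wi fi network" and a query equivalent to ("wi" OR "wi fi") followed by "network", only the complete matcher finds the match.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SpanChainSketch {
    // Hypothetical match interval [start, end) over token positions.
    record Span(int start, int end) {}

    // Complete matcher for an ordered, zero-slop sequence: keep every end position
    // reachable after each clause, so all variable-length alternatives are explored.
    static boolean completeMatch(List<List<Span>> clauses) {
        Set<Integer> frontier = new HashSet<>();
        for (Span s : clauses.get(0)) {
            frontier.add(s.end());
        }
        for (int i = 1; i < clauses.size(); i++) {
            Set<Integer> next = new HashSet<>();
            for (Span s : clauses.get(i)) {
                if (frontier.contains(s.start())) {
                    next.add(s.end());
                }
            }
            frontier = next;
        }
        return !frontier.isEmpty();
    }

    // Shortest span of a clause starting exactly at `start`, or null if none.
    static Span shortestStartingAt(List<Span> spans, int start) {
        Span best = null;
        for (Span s : spans) {
            if (s.start() == start && (best == null || s.end() < best.end())) {
                best = s;
            }
        }
        return best;
    }

    // Restrictive matcher: at each step, commit to one match length (the shortest)
    // and never backtrack to try a longer alternative.
    static boolean shortestOnlyMatch(List<List<Span>> clauses) {
        for (Span candidate : clauses.get(0)) {
            Span current = shortestStartingAt(clauses.get(0), candidate.start());
            boolean ok = true;
            for (int i = 1; i < clauses.size() && ok; i++) {
                Span next = shortestStartingAt(clauses.get(i), current.end());
                if (next == null) {
                    ok = false;
                } else {
                    current = next;
                }
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Document "wi fi network": positions 0, 1, 2.
        // Clause 0 is ("wi" OR "wi fi"); clause 1 is "network".
        List<List<Span>> query = List.of(
                List.of(new Span(0, 1), new Span(0, 2)),
                List.of(new Span(2, 3)));
        System.out.println("complete:      " + completeMatch(query));     // true
        System.out.println("shortest-only: " + shortestOnlyMatch(query)); // false
    }
}
```

Real span matching also has to handle slop, unordered clauses, and lazy iteration over positions, so this models only the failure mode, not the approach taken by the enhancement.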
For many common use cases (e.