Michael Gibney

Core Java developer, musician, person

Senior Application Developer

University of Pennsylvania Libraries

Sort by SKG/relatedness over high-cardinality fields

Jan 1, 2019

One of the most intriguing uses for the Solr JSON facet API’s SKG/relatedness function is for sorting facet buckets by relatedness; when sorting buckets by relatedness, the FacetFieldProcessor must calculate relatedness for every bucket. The current FacetFieldProcessorByArray implementation (as of Solr 7.6) uses a standard uninverted approach (either docValues or UnInvertedField) to calculate facet counts over the domain base docSet, and then uses that initial pass as a pre-filter for a second-pass, inverted approach of fetching docSets for each relevant term (i.
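As a rough illustration of the kind of request this excerpt is about, the sketch below builds parameters for a JSON facet terms request whose buckets are sorted by a `relatedness($fore,$back)` statistic. The field name `subject_facet`, the foreground query `title:lucene`, and the facet and helper names are hypothetical placeholders, not taken from the post; the code only assembles the parameters and does not contact a Solr server.

```python
import json


def build_relatedness_facet_params(field, fore_query, back_query="*:*", limit=10):
    """Build request parameters for a terms facet sorted by relatedness.

    Hypothetical helper: the facet name "by_relatedness" and the statistic
    name "r1" are arbitrary labels chosen for this sketch.
    """
    json_facet = {
        "by_relatedness": {
            "type": "terms",
            "field": field,
            "limit": limit,
            # Sort buckets by the nested relatedness statistic, highest first.
            "sort": {"r1": "desc"},
            "facet": {
                # $fore and $back refer to the request parameters set below.
                "r1": "relatedness($fore,$back)",
            },
        }
    }
    return {
        "q": "*:*",
        "rows": 0,
        "fore": fore_query,   # foreground set
        "back": back_query,   # background set
        "json.facet": json.dumps(json_facet),
    }


params = build_relatedness_facet_params("subject_facet", "title:lucene")
print(params["json.facet"])
```

With a high-cardinality field in place of `subject_facet`, every distinct term becomes a candidate bucket whose relatedness has to be computed before the sort can be resolved, which is the cost the post is concerned with.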

Lucene graph queries: potential applications

Sep 9, 2018

The previous two posts describe a longstanding limitation of Lucene graph queries and an enhancement that attempts to address it. This post enumerates some of the downstream possibilities that this enhancement could facilitate. Restores functionality for index-time multi-term synonyms: this in turn allows synonym generation to incorporate contextual analysis, as opposed to query-time synonym expansion, which generally has much less context available. CJK text and orthographic variation: CJK text indexing in particular could stand to benefit from support for indexed graphs and complete graph query support.

Complete graph query support in Lucene: candidate implementation

Sep 9, 2018

As described in the previous post, enhancements to various Lucene components have increasingly invalidated some assumptions in the SpanNearQuery graph query implementation, leading to buggy and/or unpredictable query behavior. This post presents a high-level overview of the approach used to implement a candidate fix for this bug. The fixes/enhancements described have been running in production for several months as the backend to the University of Pennsylvania Libraries’ main catalog interface.

Lucene graph query support is incomplete

Sep 9, 2018

SpanNearQuery implementation of graph queries is incomplete: Lucene SpanNearQuery identifies valid paths through token-graphs that represent the content of each document. As the component concerned with discovering the “edges” linking query subclause “nodes”, SpanNearQuery is arguably the essential component of graph query in Lucene. But SpanNearQuery is not a complete graph query implementation; accurate matching depends on restrictive assumptions about the match-length variability of individual subclauses. For many common use cases (e.
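To make the idea of "paths through a token graph" concrete, here is a toy model, not Lucene's implementation. It assumes each token is a `(term, position, position_length)` tuple and enumerates every term sequence that connects a start position to an end position. The sample tokens (a hypothetical multi-term synonym, `wifi` alongside `wi fi`) are invented for illustration.

```python
from collections import defaultdict


def enumerate_paths(tokens, start, end):
    """Return every term sequence leading from position `start` to `end`.

    Each token is (term, position, position_length): it begins at `position`
    and spans `position_length` positions, so it ends at their sum.
    """
    # Index outgoing edges by the position at which each token begins.
    by_start = defaultdict(list)
    for term, position, length in tokens:
        by_start[position].append((term, position + length))

    paths = []

    def walk(position, prefix):
        if position == end:
            paths.append(prefix)
            return
        for term, next_position in by_start.get(position, []):
            walk(next_position, prefix + [term])

    walk(start, [])
    return paths


# Hypothetical document: "wifi" (length 2) overlaps "wi" + "fi" (length 1 each).
tokens = [
    ("fast", 0, 1),
    ("wi", 1, 1),
    ("fi", 2, 1),
    ("wifi", 1, 2),
    ("network", 3, 1),
]

for path in enumerate_paths(tokens, 0, 4):
    print(" ".join(path))
```

The two paths cover the same span using a different number of tokens. Variable match length of this kind is what the excerpt identifies as the source of difficulty for matching.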