Monday, February 20, 2012

NEAR syntax in Full-Text Search

While tuning our search engine, which runs against SQL Server 2005, I'm trying to understand the NEAR operator in my CONTAINSTABLE query. I'm running a query like the following:

select * FROM CONTAINSTABLE(Catalog, *,
'FORMSOF(INFLECTIONAL,"class") OR
FORMSOF(INFLECTIONAL,"calendar") OR
("class" near "calendar")', 1000)

Entries that have "class" and "calendar" directly next to each other are ranked higher, as expected. But entries of the form "class <word> calendar" get exactly the same rank as they would with just the two inflectional ORs and no NEAR clause.

I thought the NEAR syntax was supposed to be useful up to 50 words apart?

"Useful up to 50 words" means that if the two words are more than 50 words apart, the ranking score of that OR clause will be 0.

Ranking is rather complicated. In your example, the rank is affected by all three OR clauses. It also depends on other factors, such as the document's length, the average document length in the corpus, and the frequency of the terms both in the document and across the corpus. So the word distance in the NEAR clause may or may not show up in your final ranking score.
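
One way to check how much the NEAR clause contributes is to rank it on its own and put that next to the combined rank. The query below is only a sketch: it assumes Catalog is the full-text indexed table from the question, and the aliases combined and near_only are names made up for illustration. It relies on nothing beyond the KEY and RANK columns that CONTAINSTABLE returns.

```sql
-- Sketch only: "Catalog" is the table from the question;
-- "combined" and "near_only" are illustrative aliases.
SELECT combined.[KEY],
       combined.[RANK]  AS combined_rank,   -- rank from all three OR clauses
       near_only.[RANK] AS near_rank        -- rank from the NEAR clause alone
FROM CONTAINSTABLE(Catalog, *,
       'FORMSOF(INFLECTIONAL,"class") OR
        FORMSOF(INFLECTIONAL,"calendar") OR
        ("class" NEAR "calendar")', 1000) AS combined
LEFT JOIN CONTAINSTABLE(Catalog, *,
       '"class" NEAR "calendar"') AS near_only
       ON near_only.[KEY] = combined.[KEY]  -- NULL near_rank: row did not match NEAR
ORDER BY combined.[RANK] DESC;
```

If near_rank changes with the distance between the two words while combined_rank stays flat, the proximity signal is probably being swamped by the two FORMSOF clauses rather than ignored.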
