...

Clustering documents, and then repurposing a few documents from each cluster as test searches, gives only a starting point for grading.  A search born from a particular cluster assigns a default grade of "B" to all of its remaining siblings in that cluster, and a default grade of "F" to the documents in every other cluster.  It's obvious that this method will produce numerous grading errors.  Documents within a cluster are likely more relevant to some of their siblings than to others, and at least a few of those documents are likely related to documents in other clusters.  This becomes even clearer if you recluster the documents, or use a different clustering algorithm.  One fix is to have humans double-check the work, perhaps scanning the default grades in some patches and correcting them where needed.  But this brings back the M x N mathematics, although double-checking grades may be much faster than creating them from scratch.  There could even be a means of tracking, and then predicting, which patches need the most attention.  But again, all of these deviate from "perfection".
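
To make the default-grading step concrete, here is a minimal sketch in Python.  The function and data names are hypothetical, invented for illustration, not taken from any actual tool:

    # Seed default grades from document clusters (hypothetical sketch).
    def default_grades(clusters, query_doc_id, query_cluster_id):
        """Grade 'B' for the query's remaining siblings, 'F' for everything else."""
        grades = {}
        for cluster_id, doc_ids in clusters.items():
            for doc_id in doc_ids:
                if doc_id == query_doc_id:
                    continue  # this document was repurposed as the test search itself
                grades[doc_id] = "B" if cluster_id == query_cluster_id else "F"
        return grades

    # Example: document "a1" from cluster "A" becomes a test search.
    clusters = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"], "C": ["c1"]}
    print(default_grades(clusters, "a1", "A"))
    # -> {'a2': 'B', 'a3': 'B', 'b1': 'F', 'b2': 'F', 'c1': 'F'}

Every sibling gets the same "B" and every outsider the same "F", which is exactly why the grading errors described above are unavoidable without some human review.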

...

The third optimization example mentioned above, using multiple search engines and checking results well down the list, past where the last relevant document was found, also has flaws.  Since many search engines share well-known algorithms, they may tend to make the same mistakes.  Perhaps a highly relevant document uses an important term many times, spelled out as two words, but the test search combined the term into a single word, so you might have "metadata" vs. "meta data" or "doorbell" vs. "door bell".  Some modern engines can catch this, especially if the test search uses the hyphenated form such as "meta-data".  The problem is compounded in other languages such as German, where compound words are very common.  Or perhaps the test searches were written using American English spellings, whereas a critical document was written in British English, so many words are a mismatch, such as color vs. colour, check vs. cheque, etc.  Here again, modern search engines often support using a thesaurus, but that brings other problems: the thesaurus might not be turned on; if it is turned on, it might actually harm the "precision" measurement (a common problem); or the open source engine might not have a licensed, up-to-date thesaurus for the subject matter being tested.  Or perhaps one engine makes one mistake and the other two engines make a different mistake, but they all still miss some critical documents.  The point is that this optimization is very likely to miss some matches.
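
One partial mitigation is to probe the pooled engines with spelling and compounding variants of each test search before trusting the pooled results.  Here is a minimal Python sketch of that idea; the variant table is a hypothetical example, not a real linguistic resource:

    # Probe for compound-word and regional-spelling blind spots (hypothetical sketch).
    VARIANTS = {
        "metadata": ["meta data", "meta-data"],
        "doorbell": ["door bell", "door-bell"],
        "color": ["colour"],
        "check": ["cheque"],
    }

    def query_variants(query):
        """Return the original query plus copies with one term swapped for a variant."""
        results = {query}
        for term, alts in VARIANTS.items():
            if term in query:
                for alt in alts:
                    results.add(query.replace(term, alt))
        return results

    for q in sorted(query_variants("doorbell color sensor")):
        print(q)
    # Prints the original query plus its "door bell" / "door-bell" / "colour" variants.

Running each variant against every engine and pooling all of the results widens the net, though it still cannot guarantee that no critical document is missed.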

Other forms of Relevancy Assertions

...

Order vs. grade

Delta grading

Web index

Domain disambiguation