...

  • If the search corpus contains "bald" Latin, simply use SerbianNormalizationFilterFactory.
  • If the search corpus has only Cyrillic or regular Latin text, and users can be expected to enter Cyrillic or regular Latin, use SerbianNormalizationFilterFactory with the parameter haircut="regular".
  • If the search corpus has only Cyrillic or regular Latin text, but users can be expected to search with "bald" Latin, there are two solutions:
    • Simply use SerbianNormalizationFilterFactory in its default ("bald") mode and accept slightly worse results.
    • Use two indices: one index should use SerbianNormalizationFilterFactory and the other should use SerbianNormalizationFilterFactory with haircut="regular" (you can use the copyField directive, described under Copy Fields in SchemaXml, to copy from one to the other). Then, if a user enters a query that contains a Cyrillic letter or any of 'č', 'ć', 'š', 'ž' or 'đ' (regexp: [абвгдђежзијклљмнњопрстћуфхцчџшčćđšž]), search only the regular index; otherwise (the query might be "bald"), search the "bald" index.
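
The query routing in the two-index approach can be sketched on the client side as follows. This is a minimal illustration, not part of Solr: the field names text_sr_regular and text_sr_bald and the function choose_field are hypothetical, and the sketch matches the whole Cyrillic Unicode block (U+0400 to U+04FF) instead of enumerating the Serbian Cyrillic letters one by one, which is slightly broader than the regexp given above.

```python
import re

# Any Cyrillic character (U+0400-U+04FF) or a Serbian Latin letter with a
# diacritic. ASSUMPTION: using the whole Cyrillic block is a simplification
# of the enumerated character class in the text. IGNORECASE makes uppercase
# letters such as 'Č' or 'Đ' match as well.
REGULAR_SCRIPT = re.compile("[\u0400-\u04ffčćđšž]", re.IGNORECASE)

# Hypothetical field names; use whatever your schema actually defines.
REGULAR_FIELD = "text_sr_regular"  # analyzed with haircut="regular"
BALD_FIELD = "text_sr_bald"        # analyzed with the default "bald" mode


def choose_field(query: str) -> str:
    """Pick the field to search, based on the characters in the query."""
    if REGULAR_SCRIPT.search(query):
        # Cyrillic or diacritics present: the user typed "regular" text.
        return REGULAR_FIELD
    # No Cyrillic and no diacritics: the query might be "bald" Latin.
    return BALD_FIELD
```

The returned name would then be passed as the field to query (for example as the default field of the request), so that queries typed with diacritics or in Cyrillic get the more precise index while "bald" queries still find matches.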

Background

The Serbian language is unusual in that it uses two alphabets, Cyrillic and Latin; while the Cyrillic alphabet is considered primary, the Latin alphabet is also common. Texts might contain both alphabets, and users might enter queries in either one, so it is important to be able to search both at the same time.

...