Introduced in version 3.4, these MongoDB settings influence how data is selected and the order in which it is returned. This post will demonstrate the collation settings used and the results of those settings. I have a sample set of data, queries, and results, along with some comments, to support these conclusions.

We had the following use case on a recent project using MongoDB (version 4.2):

- User wants to find data in a collection based on a user-entered string value.
- The property to search in the collection is string data.
- Data should be found regardless of character “case”; for example, searching for “e” should also match “E”.
- Use an “includes” type search instead of an exact match; for example, searching for “e” should match both “e” and “yes”.
- Ignore diacritics when matching data; for example, searching for “e” should also match “é”.
- The sort order of results should be case-insensitive.

To satisfy the use case presented, we did the following:

- Used the MongoDB aggregation pipeline pattern with “allowDiskUse” to support large data sets, millions of items.
- Used a case-insensitive regular expression query.
- Used a collation to retrieve the results in case-insensitive order.

Below is the information we used to make these implementation decisions along with sample data and queries to demonstrate how various settings change the results. This is the information that influenced our testing and decisions.

Collations

Collation allows users to specify language-specific rules for string comparison, such as rules for letter case and accent marks. You can specify collation for a collection or a view, an index, or specific operations that support collation.

Standards: International Components for Unicode (ICU)

Strength: The level of comparison to perform, which conforms to the ICU Comparison Levels.

Strength 1: Primary level of comparison. Collation performs comparisons of the base characters only, ignoring other differences such as diacritics and case.

Primary Level: Typically, this is used to denote differences between base characters (for example, “a” < “b”). For example, dictionaries are divided into different sections by base character. This is also called the level-1 strength.

Strength 2: Secondary level of comparison. Collation performs comparisons up to secondary differences, such as diacritics. That is, collation performs comparisons of base characters (primary differences) and diacritics (secondary differences). Differences between base characters take precedence over secondary differences.

Secondary Level: Accents in the characters are considered secondary differences (for example, “as” < “às” < “at”). Other differences between letters can also be considered secondary differences, depending on the language. A secondary difference is ignored when there is a primary difference anywhere in the strings. This is also called the level-2 strength.

Note: In some languages (such as Danish), certain accented letters are considered to be separate base characters. In most languages, however, an accented letter only has a secondary difference from the unaccented version of that letter.
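To make the approach concrete, here is a minimal sketch in Python of the three steps listed in this post: an aggregation pipeline run with allowDiskUse, a case-insensitive regular expression match, and a collation applied to the sort. It is not the project's actual code. The field name "name", the "en" locale, the choice of strength 2, and the helper names build_search_pipeline and run_search are all assumptions made for illustration, and the collection argument is assumed to be a PyMongo Collection.

```python
import re

# Hypothetical field name; the post does not say which property was searched.
SEARCH_FIELD = "name"

# Assumed collation: English locale at strength 2, which compares base
# characters and diacritics but ignores case.
CASE_INSENSITIVE_COLLATION = {"locale": "en", "strength": 2}


def build_search_pipeline(user_text, field=SEARCH_FIELD):
    """Build a $match + $sort pipeline for an "includes" style search."""
    # Escape the input so regex metacharacters typed by the user match literally.
    pattern = re.escape(user_text)
    return [
        # Unanchored pattern = "includes" match; the "i" option ignores case.
        {"$match": {field: {"$regex": pattern, "$options": "i"}}},
        # The sort order is governed by the collation passed to aggregate().
        {"$sort": {field: 1}},
    ]


def run_search(collection, user_text, field=SEARCH_FIELD):
    """Run the pipeline on a collection (assumed to be a PyMongo Collection)."""
    return collection.aggregate(
        build_search_pipeline(user_text, field),
        allowDiskUse=True,  # lets large sorts spill to disk
        collation=CASE_INSENSITIVE_COLLATION,
    )
```

In this sketch the collation only affects the ordering of the results; the regular expression handles case through its "i" option. Diacritic-insensitive matching is not covered here, because the steps listed above do not say how that requirement was met.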
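The difference between primary and secondary comparison can be tried out without a database. The sketch below is a rough approximation in plain Python, not ICU and not what MongoDB runs internally: it uses Unicode decomposition to separate base characters from accents, and the helper names primary_key and secondary_key are made up for this example. It does not model language-specific rules such as the Danish case.

```python
import unicodedata


def _split(text):
    """Split text into (base characters, combining marks) via NFD."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    marks = "".join(ch for ch in decomposed if unicodedata.combining(ch))
    return base, marks


def primary_key(text):
    """Approximate strength 1: base characters only, no accents, no case."""
    base, _ = _split(text)
    return base.casefold()


def secondary_key(text):
    """Approximate strength 2: base characters first, then accents."""
    base, marks = _split(text)
    # Tuple ordering makes any base-character difference win over accents.
    return (base.casefold(), marks)


if __name__ == "__main__":
    words = ["at", "às", "as"]
    print(sorted(words, key=secondary_key))
    print(primary_key("às") == primary_key("as"))
```

Sorting on a tuple mirrors the rule that a secondary difference only matters when the strings have no primary difference anywhere. Both keys ignore case, which matches the behavior described for strengths 1 and 2.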