| | 1039 | =item uninorm |
| | 1040 | |
| | 1041 | our Str multi method uninorm ( Str $string: Bool :$canonical = Bool::True, Bool :$recompose = Bool::False ) is export |
| | 1042 | |
| | 1043 | Performs a Unicode "normalization" operation on the string. This involves |
| | 1044 | decomposing the string into its most basic combining elements, and potentially |
| | 1045 | re-composing it. Full detail on the process of decomposing and |
| | 1046 | re-composing strings in a normalized form is covered in Sections 3.7, |
| | 1047 | Decomposition, and 3.11, Canonical Ordering Behavior, of the Unicode |
| | 1048 | Standard, 4.0. |
| | 1049 | |
| | 1050 | The named parameters |
| | 1051 | select the type of normalization. Aliases are provided that correspond to the |
| | 1052 | I<Unicode Standard Annex #15: Unicode Normalization Forms> document's |
| | 1053 | names for the various normalization forms: |
| | 1054 | |
| | 1055 | our Str multi method uninorm_NFD ( Str $string: ) is export { |
| | 1056 | $string.uninorm(:canonical, :!recompose); |
| | 1057 | } |
| | 1058 | our Str multi method uninorm_NFC ( Str $string: ) is export { |
| | 1059 | $string.uninorm(:canonical, :recompose); |
| | 1060 | } |
| | 1061 | our Str multi method uninorm_NFKD ( Str $string: ) is export { |
| | 1062 | $string.uninorm(:!canonical, :!recompose); |
| | 1063 | } |
| | 1064 | our Str multi method uninorm_NFKC ( Str $string: ) is export { |
| | 1065 | $string.uninorm(:!canonical, :recompose); |
| | 1066 | } |
| | 1067 | |
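The mapping from the two flags to the four normalization forms can be sketched outside Perl 6. The following is only an illustration in Python, using the standard C<unicodedata> module as a stand-in; the C<uninorm> function defined here is a hypothetical helper that mimics the signature above, not an existing API.

```python
import unicodedata


def uninorm(string: str, canonical: bool = True, recompose: bool = False) -> str:
    """Hypothetical Python stand-in for the uninorm signature described above."""
    # Build the form name the same way the four aliases do:
    # canonical -> "NF", compatibility -> "NFK"; recompose -> "C", otherwise "D".
    form = "NF" + ("" if canonical else "K") + ("C" if recompose else "D")
    return unicodedata.normalize(form, string)


# The defaults (canonical, no recomposition) select NFD.
decomposed = uninorm("\u00e9")                  # U+00E9 LATIN SMALL LETTER E WITH ACUTE
composed = uninorm("e\u0301", recompose=True)   # "e" + U+0301 COMBINING ACUTE ACCENT
```

Under these assumptions, C<uninorm(s)> corresponds to C<uninorm_NFD>, and passing C<recompose=True> corresponds to C<uninorm_NFC>.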
| | 1068 | Decomposing strings first allows |
| | 1069 | Unicode strings to be compared in binary form. Without decomposing first, two |
| | 1070 | Unicode strings may contain the same text, but not the same binary |
| | 1071 | data. The decomposition of a string is performed according to tables |
| | 1072 | in the Unicode standard, and should be compatible with decompositions |
| | 1073 | performed by any other conforming system. |
| | 1074 | |
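As a small illustration of why decomposition matters for binary comparison, the sketch below compares two spellings of the same word. It uses Python's standard C<unicodedata.normalize> as a stand-in for the method described here; the variable names are illustrative only.

```python
import unicodedata

precomposed = "caf\u00e9"    # "é" stored as the single codepoint U+00E9
combining = "cafe\u0301"     # "e" followed by U+0301 COMBINING ACUTE ACCENT

# Same text to a reader, but different codepoints and different UTF-8 bytes.
same_raw = precomposed == combining
same_bytes = precomposed.encode("utf-8") == combining.encode("utf-8")

# After decomposing both sides (NFD), the strings compare equal.
same_nfd = (unicodedata.normalize("NFD", precomposed)
            == unicodedata.normalize("NFD", combining))
```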
| | 1075 | The C<:canonical> flag controls the use of "compatibility decompositions", |
| | 1076 | which are applied only when the flag is false. For example, in canonical mode, |
| | 1077 | "ﬁ" (U+FB01 LATIN SMALL LIGATURE FI) is left unaffected because it has no |
| | 1078 | canonical decomposition. However, in compatibility mode, it will be replaced |
| | 1079 | with the two letters "fi". Decomposed sequences will be ordered in a canonical way in either mode. |
| | 1080 | |
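The difference between the two modes, and the canonical ordering that applies in both, can be sketched as follows. This again assumes Python's standard C<unicodedata> module as a stand-in, with "NFD" playing the role of canonical mode and "NFKD" the role of compatibility mode.

```python
import unicodedata

ligature = "\ufb01"   # U+FB01 LATIN SMALL LIGATURE FI

# Canonical mode leaves the ligature alone; compatibility mode splits it.
canonical_result = unicodedata.normalize("NFD", ligature)
compatibility_result = unicodedata.normalize("NFKD", ligature)

# Canonical ordering: combining marks are sorted by combining class in
# either mode. U+0307 DOT ABOVE has class 230, U+0323 DOT BELOW has class 220.
unordered = "q\u0307\u0323"
ordered_nfd = unicodedata.normalize("NFD", unordered)
ordered_nfkd = unicodedata.normalize("NFKD", unordered)
```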
| | 1081 | The C<:recompose> flag controls the re-composition of decomposed forms. |
| | 1082 | That is, when it is set, a combining sequence will be re-composed into the |
| | 1083 | canonical composite where possible. |
| | 1084 | |
| | 1085 | These de-compositions and re-compositions are performed recursively, |
| | 1086 | until there is no further work to be done. |
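A character whose decomposition itself contains a decomposable character shows the recursive behaviour. The sketch below uses Python's standard C<unicodedata> module as a stand-in, with U+1E69 as the example character. Its decomposition mapping is U+1E63 followed by U+0307, and U+1E63 in turn decomposes to "s" followed by U+0323.

```python
import unicodedata

composite = "\u1e69"   # LATIN SMALL LETTER S WITH DOT BELOW AND DOT ABOVE

# Decomposition is applied repeatedly until only base letter and marks remain.
fully_decomposed = unicodedata.normalize("NFD", composite)

# Re-composition works back up from the marks to the single composite.
recomposed = unicodedata.normalize("NFC", fully_decomposed)
```

Here the decomposed form has three codepoints, and re-composing it yields the original single codepoint.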