Changeset 13790 for docs/Perl6/Spec

Timestamp:
10/02/06 16:39:49 (2 years ago)
Author:
ajs
Message:

Unicode normalization as promised on IRC. This is not final, and may not even remain. Some determination should be made about comparison and smart matching (do they use normalization / do they compare normalized forms?)

Files:
1 modified

  • docs/Perl6/Spec/Functions.pod

    r13789 r13790  
    10371037Performs a Unicode "titlecase" operation on the first character of the string. 
    10381038 
     1039=item uninorm 
     1040 
     1041 our Str multi method uninorm ( Str $string: Bool :$canonical = Bool::True, Bool :$recompose = Bool::False ) is export 
     1042 
     1043Performs a Unicode "normalization" operation on the string. This involves 
     1044decomposing the string into its most basic combining elements, and potentially 
     1045re-composing it. Full detail on the process of decomposing and 
     1046re-composing strings in a normalized form is covered in Sections 3.7, 
     1047"Decomposition", and 3.11, "Canonical Ordering Behavior", of the Unicode 
     1048Standard, version 4.0. 
     1049 
     1050Named parameters 
     1051affect the type of normalization. There are aliases that map to the 
     1052I<Unicode Standard Annex #15: Unicode Normalization Forms> document's 
     1053names for the various modes of normalization: 
     1054 
     1055 our Str multi method uninorm_NFD ( Str $string: ) is export { 
     1056   $string.uninorm(:canonical, :!recompose); 
     1057 } 
     1058 our Str multi method uninorm_NFC ( Str $string: ) is export { 
     1059   $string.uninorm(:canonical, :recompose); 
     1060 } 
     1061 our Str multi method uninorm_NFKD ( Str $string: ) is export { 
     1062   $string.uninorm(:!canonical, :!recompose); 
     1063 } 
     1064 our Str multi method uninorm_NFKC ( Str $string: ) is export { 
     1065   $string.uninorm(:!canonical, :recompose); 
     1066 } 
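
The mapping from the two flags to the four UAX #15 forms can be sketched outside Perl 6. The following is a minimal illustration in Python, assuming the standard `unicodedata` module as a stand-in for the proposed method; the Python function `uninorm` and its keyword arguments are hypothetical names chosen to mirror the signature above, not part of any existing API.

```python
import unicodedata


def uninorm(string, canonical=True, recompose=False):
    """Hypothetical analogue of the proposed uninorm method.

    canonical=False selects the compatibility ("K") forms;
    recompose=True selects the composed ("C") forms.
    """
    form = "NF" + ("" if canonical else "K") + ("C" if recompose else "D")
    return unicodedata.normalize(form, string)


composed = "\u00e9"                                # e-acute as a single code point
decomposed = uninorm(composed)                     # defaults correspond to NFD
recomposed = uninorm(decomposed, recompose=True)   # corresponds to NFC
```

With the defaults, the call behaves like `uninorm_NFD`; setting both flags to their non-default values corresponds to `uninorm_NFKC`.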
     1067 
     1068Decomposing a string can be used to compare 
     1069Unicode strings in a binary form. Without decomposing first, two 
     1070Unicode strings may contain the same text, but not the same binary 
     1071data. The decomposition of a string is performed according to tables 
     1072in the Unicode standard, and should be compatible with decompositions 
     1073performed by any system. 
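
To illustrate the comparison problem described here, the sketch below uses Python's standard `unicodedata` module (an assumption made for the sake of a runnable example; the changeset itself defines no Python API). Two spellings of the same word differ as raw data but compare equal once both are decomposed.

```python
import unicodedata

precomposed = "caf\u00e9"    # ends in U+00E9, a single precomposed code point
combining = "cafe\u0301"     # ends in "e" followed by U+0301 COMBINING ACUTE ACCENT

# Same text to a reader, but different code points and different bytes.
raw_equal = precomposed == combining
bytes_equal = precomposed.encode("utf-8") == combining.encode("utf-8")

# After decomposition (NFD) both strings hold the same sequence.
nfd_equal = (
    unicodedata.normalize("NFD", precomposed)
    == unicodedata.normalize("NFD", combining)
)
```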
     1074 
     1075The C<:canonical> flag controls the use of "compatibility decompositions". 
     1076For example, in canonical mode, "ﬁ" is left unaffected because it is 
     1077not a composition. However, in compatibility mode, it will be replaced 
     1078with "fi". Decomposed sequences will be ordered in a canonical way 
     1079in either mode. 
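
The difference between the two modes, and the canonical ordering that applies in both, can be checked with a short sketch. It again assumes Python's standard `unicodedata` module as a stand-in for the proposed method; the sample characters are illustrative choices, not taken from the changeset.

```python
import unicodedata

ligature = "\ufb01"   # LATIN SMALL LIGATURE FI, a single code point

canonical_result = unicodedata.normalize("NFD", ligature)       # left unchanged
compatibility_result = unicodedata.normalize("NFKD", ligature)  # becomes "f" + "i"

# Canonical ordering: combining marks are sorted by combining class.
# U+0323 (dot below, class 220) sorts before U+0307 (dot above, class 230).
above_then_below = "q\u0307\u0323"
below_then_above = "q\u0323\u0307"
ordered_a = unicodedata.normalize("NFD", above_then_below)
ordered_b = unicodedata.normalize("NFKD", below_then_above)
```

Both `ordered_a` and `ordered_b` end up with the dot below first, regardless of the input order or the mode used.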
     1080 
     1081The C<:recompose> flag controls the re-composition of decomposed forms. 
     1082That is, a combining sequence will be re-composed into the canonical 
     1083composite where possible. 
     1084 
     1085These de-compositions and re-compositions are performed recursively, 
     1086until there is no further work to be done. 
    10391087 
    10401088=item capitalize 
     
    18901938a general laundry list, please separate messages by topic. 
    18911939 
     1940