Changeset 19479 for docs/Perl6/Spec

Timestamp:
01/14/08 21:44:39 (10 months ago)
Author:
lwall
Message:

[Functions] did away with the Uni type

Files:
1 modified

  • docs/Perl6/Spec/Functions.pod

--- docs/Perl6/Spec/Functions.pod (r19478)
+++ docs/Perl6/Spec/Functions.pod (r19479)
@@ -14,5 +14,5 @@
  Date:          12 Mar 2005
  Last Modified: 14 Jan 2008
- Version:       17
+ Version:       18
 
 This document attempts to document the list of builtin functions in Perl 6.
@@ -94,5 +94,5 @@
 =item Grapheme (language-independent graphemes)
 
-=item Codepoint or Uni (Unicode codepoints)
+=item Codepoint
 
 =item Byte
@@ -103,6 +103,9 @@
 
 The short name for C<Grapheme> is typically C<Char> since that's the
-default Unicode level.  A grapheme is defined as a base C<Uni> plus
-any subsequent "combining" C<Uni>s that apply to that base C<Uni>.
+default Unicode level.  A grapheme is defined as a base codepoint plus
+any subsequent "combining" codepoints that apply to that base codepoint.
+Graphemes are always assigned a unique integer id which, in the case of
+a grapheme that has a precomposed codepoint, happens to be the same as
+that codepoint.
 
 There is no short name for C<CharLingua> because the type is meaningless
@@ -1531,32 +1534,35 @@
 =item ord
 
- role Uni {
-     our Uni multi chr( Uni $codepoint )
-     our Uni multi ord( Uni $character )
- }
- multi method ord( Str $string: ) is export
-
-These functions are available for purposes of backward compatibility.
-C<chr> takes a C<Uni> and returns the exact same value with no change.
-This is because, in Perl 6, a C<Uni> is both an integer codepoint when
-numified and a single character when stringified. Thus, C<chr> is just:
-
- our Uni multi chr( Uni $codepoint) { $codepoint; }
-
-C<ord> is almost the same, but it also has a form that takes a string.
-In a scalar context, the return value is the C<Uni> representing
-the first codepoint in the string. In a list context, the return
-value is the list of C<Uni>s representing the entire string.
-
-An integer can be passed to C<chr>, but it will automatically
-be upgraded to a C<Uni> (by interpreting it as a Unicode codepoint).
-
-Be aware that the stringification of certain C<Uni>s will
-fail because they have no stand-alone stringified interpretation.
-Similarly, the creation of a C<Uni> from an integer might fail
-due to the integer being out of range. If that
-happens, an undefined C<Uni> is always returned. Similarly,
-C<chr(undef)> or C<ord(undef)> will force the return of an
-undefined C<Uni>.
+ multi Char method chr( Int $grid: ) is export
+ multi Char sub chr( Int *@grid )
+ multi Int method ord( Str $string: ) is export
+
+C<chr> takes zero or more integer grapheme ids and returns the
+corresponding characters as a string.  If any grapheme id is used
+that represents a higher abstraction level than the current
+lexical scope supports, that grapheme is converted to the
+corresponding lower-level string of codepoints/bytes that would
+be appropriate to the current context, just as any other Str
+would be downgraded in context.
+
+C<ord> goes the other direction; it takes a string value and returns
+character values as integers.  In a scalar context, the return value
+is just the integer value of the first character in the string. In
+a list context, the return value is the list of integers representing
+the entire string.  The definition of character is context dependent.
+Normally it's a grapheme id, but under codepoints or bytes scopes,
+the string is coerced to the appropriate low-level view and interpreted
+as codepoints or bytes.  Hence, under "use bytes" you will never see a
+value larger than 256, and under "use codepoints" you will never see a
+value larger than 0x10ffff.  The only guarantee under "use graphemes"
+(the default) is that the number returned will correspond to the
+codepoint of the precomposed codepoint representing the grapheme, if
+there is such a codepoint.  Otherwise, the implementation is free to
+return any unique id that is larger than 0x10ffff.  (The C<chr> function
+will know how to backtranslate such ids properly to codepoints or
+bytes in any context.  Note that we are assuming that every codepoints
+context knows its normalization preferences, and every bytes context
+also knows its encoding preferences. (These are knowable in the
+lexical scope via the $?NF and $?ENC compile-time constants).)
 
 =item list
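
The grapheme-id scheme that the revised C<ord>/C<chr> text describes can be modelled outside Perl 6. The Python sketch below is not part of the changeset and is only one possible reading of it. The names GraphemeTable, graphemes, ord_graphemes, chr_graphemes, ord_codepoints and ord_bytes are hypothetical. It assumes that a grapheme is a base codepoint plus trailing codepoints with a nonzero canonical combining class (a simplification of real grapheme segmentation), that ids for graphemes without a precomposed codepoint are handed out sequentially starting at 0x110000, and that the nf and enc parameters stand in for the $?NF and $?ENC constants.

```python
import unicodedata

MAX_CODEPOINT = 0x10FFFF  # largest Unicode codepoint


def graphemes(text):
    """Split text into a base codepoint plus any following combining codepoints.

    Simplification: only the canonical combining class is consulted.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch      # attach combining mark to the current base
        else:
            clusters.append(ch)     # start a new grapheme
    return clusters


class GraphemeTable:
    """Hypothetical registry of ids for graphemes lacking a precomposed codepoint."""

    def __init__(self):
        self._id_of = {}    # composed grapheme text -> synthetic id
        self._text_of = {}  # synthetic id -> composed grapheme text

    def grapheme_id(self, grapheme):
        composed = unicodedata.normalize("NFC", grapheme)
        if len(composed) == 1:
            # A precomposed codepoint exists, so the id is that codepoint.
            return ord(composed)
        if composed not in self._id_of:
            # Assumed allocation policy: next unused integer above 0x10FFFF.
            new_id = MAX_CODEPOINT + 1 + len(self._id_of)
            self._id_of[composed] = new_id
            self._text_of[new_id] = composed
        return self._id_of[composed]

    def ord_graphemes(self, text):
        """Grapheme-level view: one integer id per grapheme."""
        return [self.grapheme_id(g) for g in graphemes(text)]

    def chr_graphemes(self, *ids):
        """Translate ids back to text, expanding synthetic ids to codepoints."""
        return "".join(
            self._text_of[i] if i > MAX_CODEPOINT else chr(i) for i in ids
        )


def ord_codepoints(text, nf="NFC"):
    """Codepoint-level view under an assumed normalization form."""
    return [ord(c) for c in unicodedata.normalize(nf, text)]


def ord_bytes(text, enc="utf-8"):
    """Byte-level view under an assumed encoding."""
    return list(text.encode(enc))
```

Under these assumptions, "e" followed by U+0301 maps to the precomposed id 0xE9, while "q" followed by U+0301 has no precomposed form and receives the first synthetic id, 0x110000. Passing both ids to chr_graphemes yields the original text again, which is the backtranslation role the changeset assigns to C<chr>.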