| 1533 | | role Uni { |
| 1534 | | our Uni multi chr( Uni $codepoint ) |
| 1535 | | our Uni multi ord( Uni $character ) |
| 1536 | | } |
| 1537 | | multi method ord( Str $string: ) is export |
| 1538 | | |
| 1539 | | These functions are available for purposes of backward compatibility. |
| 1540 | | C<chr> takes a C<Uni> and returns the exact same value with no change. |
| 1541 | | This is because, in Perl 6, a C<Uni> is both an integer codepoint when |
| 1542 | | numified and a single character when stringified. Thus, C<chr> is just: |
| 1543 | | |
| 1544 | | our Uni multi chr( Uni $codepoint) { $codepoint; } |
| 1545 | | |
| 1546 | | C<ord> is almost the same, but it also has a form that takes a string. |
| 1547 | | In a scalar context, the return value is the C<Uni> representing |
| 1548 | | the first codepoint in the string. In a list context, the return |
| 1549 | | value is the list of C<Uni>s representing the entire string. |
| 1550 | | |
| 1551 | | An integer can be passed to C<chr>, but it will automatically |
| 1552 | | be upgraded to a C<Uni> (by interpreting it as a Unicode codepoint). |
| 1553 | | |
| 1554 | | Be aware that the stringification of certain C<Uni>s will |
| 1555 | | fail because they have no stand-alone stringified interpretation. |
| 1556 | | Similarly, the creation of a C<Uni> from an integer might fail |
| 1557 | | due to the integer being out of range. If that |
| 1558 | | happens, an undefined C<Uni> is always returned. Similarly, |
| 1559 | | C<chr(undef)> or C<ord(undef)> will force the return of an |
| 1560 | | undefined C<Uni>. |
| | 1536 | multi Char method chr( Int $grid: ) is export |
| | 1537 | multi Char sub chr( Int *@grid ) |
| | 1538 | multi Int method ord( Str $string: ) is export |
| | 1539 | |
| | 1540 | C<chr> takes zero or more integer grapheme ids and returns the |
| | 1541 | corresponding characters as a string. If any grapheme id is used |
| | 1542 | that represents a higher abstraction level than the current |
| | 1543 | lexical scope supports, that grapheme is converted to the |
| | 1544 | corresponding lower-level string of codepoints/bytes that would |
| | 1545 | be appropriate to the current context, just as any other Str |
| | 1546 | would be downgraded in context. |
| | 1547 | |
| | 1548 | C<ord> goes the other direction; it takes a string value and returns |
| | 1549 | character values as integers. In a scalar context, the return value |
| | 1550 | is just the integer value of the first character in the string. In |
| | 1551 | a list context, the return value is the list of integers representing |
| | 1552 | the entire string. The definition of character is context dependent. |
| | 1553 | Normally it's a grapheme id, but under codepoints or bytes scopes, |
| | 1554 | the string is coerced to the appropriate low-level view and interpreted |
| | 1555 | as codepoints or bytes. Hence, under "use bytes" you will never see a |
| | 1556 | value larger than 255, and under "use codepoints" you will never see a |
| | 1557 | value larger than 0x10ffff. The only guarantee under "use graphemes" |
| | 1558 | (the default) is that the number returned will correspond to the |
| | 1559 | codepoint of the precomposed codepoint representing the grapheme, if |
| | 1560 | there is such a codepoint. Otherwise, the implementation is free to |
| | 1561 | return any unique id that is larger than 0x10ffff. (The C<chr> function |
| | 1562 | will know how to backtranslate such ids properly to codepoints or |
| | 1563 | bytes in any context.) Note that we are assuming that every codepoints |
| | 1564 | context knows its normalization preferences, and every bytes context |
| | 1565 | also knows its encoding preferences. (These are knowable in the |
| | 1566 | lexical scope via the $?NF and $?ENC compile-time constants.) |
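
The three abstraction levels that the new C<ord> text describes can be illustrated with a rough Python analogy. This is only a sketch, not Perl 6 and not the specified implementation: the helper name `ords` and its `level` argument are hypothetical, UTF-8 stands in for whatever `$?ENC` would name, and NFC composition stands in for the "precomposed codepoint representing the grapheme" case. Graphemes that have no precomposed codepoint are not modeled, because Python has no equivalent of the implementation-defined ids above 0x10ffff.

```python
import unicodedata


def ords(text, level="graphemes"):
    """Return integer 'character' values for text at one abstraction level.

    Hypothetical helper for illustration only; it is not part of any
    Perl 6 implementation.
    """
    if level == "bytes":
        # Assumption: the bytes view is UTF-8. Every value is in 0..255.
        return list(text.encode("utf-8"))
    if level == "codepoints":
        # One integer per Unicode codepoint, never above 0x10FFFF.
        return [ord(ch) for ch in text]
    if level == "graphemes":
        # Approximation: NFC composes a base letter and its combining
        # marks whenever a precomposed codepoint exists.
        return [ord(ch) for ch in unicodedata.normalize("NFC", text)]
    raise ValueError(f"unknown level: {level!r}")


# "e" followed by U+0301 COMBINING ACUTE ACCENT: one grapheme,
# two codepoints, three UTF-8 bytes.
sample = "e\u0301"
bytes_view = ords(sample, "bytes")            # [101, 204, 129]
codepoints_view = ords(sample, "codepoints")  # [101, 769]
graphemes_view = ords(sample, "graphemes")    # [233]
```

The same string yields a different list at each level, which is the sense in which "the definition of character is context dependent". Going back through `chr(233)` gives the precomposed "é", matching the guarantee that the grapheme-level value is the precomposed codepoint when one exists.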