Changeset 10050
- Timestamp: 04/21/06 17:39:01 (3 years ago)
- Location: docs/Perl6
- Files: 2 added, 5 modified
  - Makefile.PL (modified) (2 diffs)
  - Spec/Functions.pod (added)
  - Spec/Object.pod (modified) (2 diffs)
  - Spec/Rule.pod (modified) (109 diffs)
  - Spec/Subroutine.pod (modified) (5 diffs)
  - Spec/Syntax.pod (modified) (3 diffs)
  - Spec/Threads.pod (added)
docs/Perl6/Makefile.PL
--- docs/Perl6/Makefile.PL (r9216)
+++ docs/Perl6/Makefile.PL (r10050)
@@ -2,4 +2,5 @@
 use lib "../..", "../../inc";
 use inc::Module::Install prefix => '../../inc';
+use File::Path;
 
 name ('Perl6-Doc');
@@ -7,4 +8,46 @@
 license ('perl');
 
+
+my $svn_path = "/usr/bin/svn";
+my $location = "http://svn.perl.org/perl6/doc/trunk/design/syn";
+my $synopsis_svn = "synopsis-svn";
+
+my %commands = ( 'checkout' => "$svn_path co $location $synopsis_svn",
+                 'update'   => "$svn_path update $synopsis_svn"
+               );
+
+my %table = ( 'S01' => 'Overview',
+              'S02' => 'Syntax',
+              'S03' => 'Operator',
+              'S04' => 'Block',
+              'S05' => 'Rule',
+              'S06' => 'Subroutine',
+#             'S07' => '',
+#             'S08' => '',
+              'S09' => 'Structure',
+              'S10' => 'Package',
+              'S11' => 'Module',
+              'S12' => 'Object',
+              'S13' => 'Overload',
+              'S17' => 'Threads',
+              'S29' => 'Functions'
+            );
+
+print "Checkout the newest Synopsis from $location\n";
+if ( -e $svn_path ) {
+    if( -d $synopsis_svn && ! -f $synopsis_svn) {
+#       system( $commands{'update'} );
+    } else {
+        system( $commands{'checkout'} );
+    }
+}
+print "Moving newest synopsis to Spec\n";
+for (keys %table) {
+    rename "$synopsis_svn/$_.pod", "Spec/$table{$_}.pod";
+}
+print "Removing temporary download directory.\n";
+rmtree $synopsis_svn, 0, 0;
+
+
 install_script( 'p6doc' );
 makemaker_args( PMLIBDIRS => [ grep { -d } glob("[A-Z]*") ]);
docs/Perl6/Spec/Object.pod
--- docs/Perl6/Spec/Object.pod (r9989)
+++ docs/Perl6/Spec/Object.pod (r10050)
@@ -222,5 +222,5 @@
     .doit ()  # ILLEGAL (two terms in a row)
     .doit.()  # okay, no arguments, same as .doit()
-    .doit .()  # okay, no arguments, same as .doit()
+    .doit. .()  # okay, no arguments, same as .doit() (long dot form)
 
 However, you can turn any of the legal forms above into a list
@@ -231,5 +231,5 @@
     .doit (): 1,2,3  # ILLEGAL (two terms in a row)
     .doit.(1): 2,3  # okay, same as .doit(1,2,3)
-    .doit .(1,2): 3  # okay, same as .doit(1,2,3)
+    .doit. .(1,2): 3  # okay, same as .doit(1,2,3)
 
 In particular, this allows us to pass a closure in addition to the
docs/Perl6/Spec/Rule.pod
--- docs/Perl6/Spec/Rule.pod (r9989)
+++ docs/Perl6/Spec/Rule.pod (r10050)
@@ -12,14 +12,15 @@
 =head1 VERSION
 
-  Maintainer: Patrick Michaud <pmichaud@pobox.com>
+  Maintainer: Patrick Michaud <pmichaud@pobox.com> and
+              Larry Wall <larry@wall.org>
   Date: 24 Jun 2002
-  Last Modified: 6 Apr 2006
+  Last Modified: 20 Apr 2006
   Number: 5
-  Version: 15
+  Version: 18
 
 This document summarizes Apocalypse 5, which is about the new regex
-syntax. We now try to call them "rules" because they haven't been
-regular expressions for a long time. (The term "regex" is still
-acceptable.)
+syntax. We now try to call them I<regex> because they haven't been
+regular expressions for a long time. When referring to their use in
+a grammar, the term I<rule> is preferred.
 
 =head1 New match state and capture variables
@@ -31,6 +32,6 @@
 C<$1>, etc.) are just elements of C<$/>.
 
-By the way, the numbered capture variables now start at C<$0>, C<$1>,
-C<$2>, etc. See below.
+By the way, the numbered capture variables now start at C<$0> rather than
+C<$1>. See below.
 
 =head1 Unchanged syntactic features
@@ -69,4 +70,6 @@
 
 The extended syntax (C</x>) is no longer required...it's the default.
+(In fact, it's pretty much mandatory--the only way to get back to
+the old syntax is with the C<:Perl5>/C<:P5> modifier.)
 
 =item *
@@ -79,5 +82,9 @@
 There is no C</e> evaluation modifier on substitutions; instead use:
 
-    s/pattern/{ code() }/
+    s/pattern/{ doit() }/
+
+Instead of C</ee> say:
+
+    s/pattern/{ eval doit() }/
 
 =item *
@@ -88,6 +95,7 @@
 
 Every modifier must start with its own colon. The delimiter must be
-separated from the final modifier by a colon or whitespace if it would
-be taken as an argument to the preceding modifier.
+separated from the final modifier by whitespace if it would be taken
+as an argument to the preceding modifier (which is true for any
+bracketing character).
 
 =item *
@@ -120,5 +128,5 @@
 Since this is implicitly anchored to the position, it's suitable for
 building parsers and lexers. The pattern you supply to a Perl macro's
-"is parsed" trait has an implicit C<:p> modifier.
+C<is parsed> trait has an implicit C<:p> modifier.
 
 Note that
@@ -128,16 +136,10 @@
 is roughly equivalent to
 
-    m:p/.*? pattern/
-
-=item *
-
-The new C<:once> modifier replaces the Perl 5 C<?...?> syntax:
-
-    m:once/ pattern /  # only matches first time
-
-=item *
-
-[Note: We're still not sure if :w is ultimately going to work exactly
-as described below. But this is how it works for now.]
+    m:p/.*? <( pattern )> /
+
+Also note that any regex called as a subrule is implicitly anchored to the
+current position anyway.
+
+=item *
 
 The new C<:w> (C<:words>) modifier causes whitespace sequences to be
@@ -164,4 +166,7 @@
 C<< <?ws> >> can't decide what to do until it sees the data. It still does
 the right thing. If not, define your own C<< <?ws> >> and C<:w> will use that.
+
+In general you don't need to use C<:w> within grammars because
+the parser rules automatically handle whitespace policy for you.
 
 =item *
@@ -178,7 +183,7 @@
 =item *
 
-The new C<:perl5> modifier allows Perl 5 regex syntax to be used instead:
-
-    m:perl5/(?mi)^[a-z]{1,2}(?=\s)/
+The new C<:Perl5> modifier allows Perl 5 regex syntax to be used instead:
+
+    m:Perl5/(?mi)^[a-z]{1,2}(?=\s)/
 
 (It does not go so far as to allow you to put your modifiers at
@@ -195,14 +200,14 @@
 general form. So
 
-    s:4x { (<?ident>) = (\N+) $$}{$0 => $1};
+    s:4x [ (<?ident>) = (\N+) $$] [$0 => $1];
 
 is the same as:
 
-    s:x(4) { (<?ident>) = (\N+) $$}{$0 => $1};
+    s:x(4) [ (<?ident>) = (\N+) $$] [$0 => $1];
 
 which is almost the same as:
 
     $_.pos = 0;
-    s:c { (<?ident>) = (\N+) $$}{$0 => $1} for 1..4;
+    s:c [ (<?ident>) = (\N+) $$] [$0 => $1] for 1..4;
 
 except that the string is unchanged unless all four matches are found.
@@ -231,5 +236,5 @@
 =item *
 
-With the new C<:ov> (C<:overlap>) modifier, the current rule will
+With the new C<:ov> (C<:overlap>) modifier, the current regex will
 match at all possible character positions (including overlapping)
 and return all matches in a list context, or a disjunction of matches
@@ -239,10 +244,10 @@
 
     if $str ~~ m:overlap/ a (.*) a / {
-        @substrings = $/.matches();  # bracadabr cadabr dabr br
-    }
-
-=item *
-
-With the new C<:ex> (C<:exhaustive>) modifier, the current rule will match
+        @substrings = @;();  # bracadabr cadabr dabr br
+    }
+
+=item *
+
+With the new C<:ex> (C<:exhaustive>) modifier, the current regex will match
 every possible way (including overlapping) and return all matches in a list
 context, or a disjunction of matches in a scalar context.
@@ -251,12 +256,17 @@
 
     if $str ~~ m:exhaustive/ a (.*) a / {
-        @substrings = $/.matches();  # br brac bracad bracadabr
-                                     # c cad cadabr d dabr br
-    }
-
-
-=item *
-
-The new C<:rw> modifier causes this rule to "claim" the current
+        say "@()";  # br brac bracad bracadabr c cad cadabr d dabr br
+    }
+
+Note that the C<~~> above can return as soon as the first match is found,
+and the rest of the matches may be performed lazily by C<@()>.
+
+[Conjecture: the C<:exhaustive> modifier should have an optional argument
+specifying how many seconds to run before giving up, since it's trivially
+easy to ask for the heat death of the universe to happen first.]
+
+=item *
+
+The new C<:rw> modifier causes this regex to I<claim> the current
 string for modification rather than assuming copy-on-write semantics.
 All the bindings in C<$/> become lvalues into the string, such
@@ -269,5 +279,5 @@
 =item *
 
-The new C<:keepall> modifier causes this rule and all invoked subrules
+The new C<:keepall> modifier causes this regex and all invoked subrules
 to remember everything, even if the rules themselves don't ask for
 their subrules to be remembered. This is for forcing a grammar that
@@ -276,6 +286,23 @@
 =item *
 
-The C<:i>, C<:w>, C<:perl5>, and Unicode-level modifiers can be
-placed inside the rule (and are lexically scoped):
+The new C<:ratchet> modifier causes this regex to not backtrack by default.
+(Generally you do not use this modifier directly, since it's implied by
+C<token> and C<rule> declarations.) The effect of this modifier is
+to imply a C<:> after every construct that could backtrack, including
+bare C<*>, C<+>, and C<?> quantifiers, as well as alternations.
+
+=item *
+
+The new C<:panic> modifier causes this regex and all invoked subrules
+to try to backtrack on any rules that would otherwise default to
+not backtracking because they have C<:ratchet> set. Never panic
+unless you're desperate and want the pattern matcher to do a lot of
+unnecessary work. If you have an error in your grammar, it's almost
+certainly a bad idea to fix it by backtracking.
+
+=item *
+
+The C<:i>, C<:w>, C<:Perl5>, and Unicode-level modifiers can be
+placed inside the regex (and are lexically scoped):
 
     m/:w alignment = [:i left|right|cent[er|re]] /
@@ -298,5 +325,4 @@
 
     m:fuzzy (pattern);
-    m:fuzzy:(pattern);
 
 or you'll end up with:
@@ -347,5 +373,8 @@
 =item *
 
-An unescaped C<#> now always introduces a comment.
+An unescaped C<#> now always introduces a comment. If followed
+by an opening bracket character (and if not in the first column),
+it introduces an embedded comment that terminates with the closing
+bracket. Otherwise the comment terminates at the newline.
 
 =item *
@@ -367,6 +396,6 @@
 =item *
 
-C<.> matches an "anything", while C<\N> matches an "anything except
-newline". (The C</s> modifier is gone.) In particular, C<\N> matches
+C<.> matches an I<anything>, while C<\N> matches an I<anything except
+newline>. (The C</s> modifier is gone.) In particular, C<\N> matches
 neither carriage return nor line feed.
 
@@ -401,5 +430,5 @@
 =item *
 
-You can call Perl code as part of a rule match by using a closure.
+You can call Perl code as part of a regex match by using a closure.
 Embedded code does not usually affect the match--it is only used
 for side-effects:
@@ -424,5 +453,5 @@
 with a corresponding C<**{...}?> for minimal matching. Space is
 allowed on either side of the asterisks. The curlies are taken to
-be a closure returning a number or a range.
+be a closure returning an Int or a Range object.
 
     / value was (\d ** {1..6}?) with ([\w]**{$m..$n}) /
@@ -432,5 +461,5 @@
     / [foo]**{1,3} /
 
-(At least, it fails in the absence of "C<use rx :listquantifier>",
+(At least, it fails in the absence of C<use rx :listquantifier>,
 which is likely to be unimplemented in Perl 6.0.0 anyway).
 
@@ -439,10 +468,10 @@
 a closure that must be run in the general case, so you can use
 it to generate a range on the fly based on the earlier matching.
-(Of course, bear in mind the closure is run I<before> attempting to
+(Of course, bear in mind the closure must be run I<before> attempting to
 match whatever it quantifies.)
 
 =item *
 
-C<< <...> >> are now extensible metasyntax delimiters or "assertions"
+C<< <...> >> are now extensible metasyntax delimiters or I<assertions>
 (i.e. they replace Perl 5's crufty C<(?...)> syntax).
 
@@ -455,9 +484,9 @@
 =item *
 
-In Perl 6 rules, variables don't interpolate.
-
-=item *
-
-Instead they're passed "raw" to the rule engine, which can then decide
+In Perl 6 regexes, variables don't interpolate.
+
+=item *
+
+Instead they're passed I<raw> to the regex engine, which can then decide
 how to handle them (more on that below).
 
@@ -474,5 +503,8 @@
     / \Q$var\E /
 
-(To get rule interpolation use an assertion - see below)
+However, if C<$var> contains a Regex object, rather attempting to
+convert it to a string, it is called as a subrule, as if you said
+C<< <$var> >>. (See assertions below.) This form does not capture,
+and it fails if C<$var> is tainted.
 
 =item *
@@ -487,5 +519,8 @@
 
 
-As with a scalar variable, each element is matched as a literal.
+As with a scalar variable, each element is matched as a literal
+unless it happens to be a Regex object, in which case it is matched
+as a subrule. As with scalar subrules, a tainted subrule always fails.
+All values pay attention to the current C<:ignorecase> setting.
 
 =item *
@@ -504,9 +539,24 @@
 =item *
 
-If it is a string or rule object, it is executed as a subrule.
-
-=item *
-
-If it has the value 1, nothing special happens beyond the match.
+If the value is a string, it is matched literally, starting after where
+the key left off matching. As a natural consequence, if the value is
+C<"">, nothing special happens except that the key match succeeds.
+
+=item *
+
+If it is a Regex object, it is executed as a subrule, with an initial
+position I<after> the matched key. As with scalar subrules, a tainted
+subrule always fails, and no capture is attempted.
+
+=item *
+
+If the value is a number, the key is rematched ignoring any keys
+longer than the number. (This is measured in the default Unicode
+level in effect where the hash was declared, usually graphemes. If
+the current Unicode level is lower, the results are as if the string
+to be matched had been upconverted to the hash's Unicode level. If
+the current Unicode level is higher, the results are undefined if the
+string contains any characters whose interpretation would be changed
+by the higher Unicode level, such as language-dependent ligatures.)
 
 =item *
@@ -516,4 +566,7 @@
 =back
 
+All hash keys, and values that are strings, pay attention to the
+C<:ignorecase> setting. (Subrules maintain their own case settings.)
+
 =back
 
@@ -524,5 +577,5 @@
 =item *
 
-The first character after C<< < >> determines the behaviour of the assertion.
+The first character after C<< < >> determines the behavior of the assertion.
 
 =item *
@@ -540,5 +593,5 @@
     / <after pattern> /  # was /(?<pattern)/
 
-    / <ws> /  # match whitespace by :w rules
+    / <ws> /  # match whitespace by :w policy
 
     / <sp> /  # match a space char
@@ -548,4 +601,9 @@
 It is illegal to do lookbehind on a pattern that cannot be reversed.
 
+Note: the effect of a forward-scanning lookbehind at the top level
+can be achieved with:
+
+    / .*? prestuff <( mainpat )> /
+
 =item *
 
@@ -557,31 +615,82 @@
     / <?ident> <?ws> /  # nothing captured
 
-=item *
-
-A leading C<$> indicates an indirect rule. The variable must contain
-either a hard reference to a rule, or a string containing the rule.
-
-=item *
-
-A leading C<::> indicates a symbolic indirect rule:
-
-    / <::($somename)>
-
-The variable must contain the name of a rule.
-
-=item *
-
-A leading C<@> matches like a bare array except that each element
-is treated as a rule (string or hard ref) rather than as a literal.
-
-=item *
-
-A leading C<%> matches like a bare hash except that each key
-is treated as a rule (string or hard ref) rather than as a literal.
-
-=item *
-
-A leading C<{> indicates code that produces a rule to be interpolated
-into the pattern at that point:
+The non-capturing behavior may be overridden with a C<:keepall>.
+
+=item *
+
+A leading C<$> indicates an indirect subrule. The variable must contain
+either a Regex object, or a string to be compiled as the regex. The
+string is never matched literally.
+
+By default C<< <$foo> >> is captured into C<< $<foo> >>, but you can
+use the C<< <?$foo> >> form to suppress capture, and you can always say
+C<< $<$foo> := <$foo> >> if you prefer to include the sigil in the key.
+
+=item *
+
+A leading C<::> indicates a symbolic indirect subrule:
+
+    / <::($somename)> /
+
+The variable must contain the name of a subrule. By the rules of
+single method dispatch this is first searched for in the current
+grammar and its ancestors. If this search fails an attempt is made
+to dispatch via MMD, in which case it can find subrules defined as
+multis rather than methods. This form is not captured by default.
+
+=item *
+
+A leading C<@> matches like a bare array except that each element is
+treated as a subrule (string or Regex object) rather than as a literal.
+That is, a string is forced to be compiled as a subrule rather than
+matched literally. (There is no difference for a Regex object.)
+
+By default C<< <@foo> >> is captured into C<< $<foo> >>, but you can
+use the C<< <?@foo> >> form to suppress capture, and you can always say
+C<< $<@foo> := <@foo> >> if you prefer to include the sigil in the key.
+
+=item *
+
+A leading C<%> matches like a bare hash except that each value is
+always treated as a subrule, even if it is a string that must be compiled
+to a regex at match time.
+
+By default C<< <%foo> >> is captured into C<< $<foo> >>, but you can
+use the C<< <?%foo> >> form to suppress capture, and you can always say
+C<< $<%foo> := <%foo> >> if you prefer to include the sigil in the key.
+
+With both bare hash and hash in angles, the key is always skipped
+over before calling any subrule in the value. That subrule may, however,
+magically access the key anyway as if the subrule had started before the
+key and matched with C<< <KEY> >> assertion. That is, C<< $<KEY> >>
+will contain the keyword or token that this subrule was looked up under,
+and that value will be returned by the current match object even if
+you do nothing special with it within the match. (This also works
+for the name of a macro as seen from an C<is parsed> regex, since
+internally that turns into a hash lookup.)
+
+As with bare hash, the longest key matches according to the venerable
+I<longest token rule>, but in addition, you may combine multiple hashes
+under the same longest-token consideration like this:
+
+    <%statement|%prefix|%term>
+
+This means that, despite being in a later hash, C<< %term<food> >>
+will be selected in preference to C<< %prefix<foo> >> because it's
+the longer token. However, if there is a tie, the earlier hash wins,
+so C<< %statement<if> >> hides any C<< %prefix<if> >> or C<< %term<if> >>.
+
+In contrast, if you say
+
+    [ <%prefix> | <%term> ]
+
+a C<< %prefix<foo> >> would be selected in preference to a C<< %term<food> >>.
+(Which is not what you usually want if your language is to do longest-token
+consistently.)
+
+=item *
+
+A leading C<{> indicates code that produces a regex to be interpolated
+into the pattern at that point as a subrule:
 
     / (<?ident>) <{ %cache{$0} //= get_body($0) }> /
@@ -590,8 +699,8 @@
 
 As with an ordinary embedded closure, an B<explicit> return from a
-rule closure binds the I<result object> for this match, ignores the
-rest of the current rule, and reports success:
-
-    / (\d) <{ return $0.sqrt }> NotReached /;
+regex closure binds the I<result object> for this match, ignores the
+rest of the current regex, and reports success:
+
+    / (\d) <{ return $0.sqrt }> NotReached /;
 
 This has the effect of capturing the square root of the numified string,
@@ -604,5 +713,5 @@
 
 A leading C<&> interpolates the return value of a subroutine call as
-a rule. Hence
+a regex. Hence
 
     <&foo()>
@@ -614,6 +723,6 @@
 =item *
 
-In any case of rule interpolation, if the value already happens to be
-a rule object, it is not recompiled. If it is a string, the compiled
+In any case of regex interpolation, if the value already happens to be
+a Regex object, it is not recompiled. If it is a string, the compiled
 form is cached with the string so that it is not recompiled next
 time you use it unless the string changes. (Any external lexical
@@ -621,5 +730,5 @@
 interpolated with unbalanced bracketing. An interpolated subrule
 keeps its own inner C<$/>, so its parentheses never count toward the
-outer rules groupings. (In other words, parenthesis numbering is always
+outer regexes groupings. (In other words, parenthesis numbering is always
 lexically scoped.)
 
@@ -654,8 +763,8 @@
     / <after foo> \d+ <before bar> /
 
-except that the scan for "foo" can be done in the forward direction,
-while a lookbehind assertion would presumably scan for \d+ and then
-match "foo" backwards. The use of C<< <(...)> >> affects only the
-meaning of the "result object" and the positions of the beginning and
+except that the scan for "C<foo>" can be done in the forward direction,
+while a lookbehind assertion would presumably scan for C<\d+> and then
+match "C<foo>" backwards. The use of C<< <(...)> >> affects only the
+meaning of the I<result object> and the positions of the beginning and
 ending of the match. That is, after the match above, C<$()> contains
 only the digits matched, and C<.pos> is pointing to after the digits.
@@ -663,4 +772,6 @@
 through C<$/>.
 
+It is a syntax error to use an unbalanced C<< <( >> or C<< )> >>.
+
 =item *
 
@@ -718,4 +829,15 @@
     / <!before _ > /  # We aren't before an _
 
+Note that C<< <!alpha> >> is different from C<< <-alpha> >> because the
+latter matches C</./> when it is not an alpha.
+
+=item *
+
+Conjecture: Multiple opening angles are matched by a corresponding
+number of closing angles, and otherwise function as single angles.
+This can be used to visually isolate unmatched angles inside:
+
+    <<<Ccode: a >> 1>>>
+
 =back
 
@@ -732,5 +854,5 @@
 
 The C<\L...\E>, C<\U...\E>, and C<\Q...\E> sequences are gone. In the
-rare cases that need them you can use C<< <{ lc $rule }> >> etc.
+rare cases that need them you can use C<< <{ lc $regex }> >> etc.
 
 =item *
@@ -800,5 +922,5 @@
 =back
 
-=head1 Regexes are rules
+=head1 Regexes really are regexes now
 
 =over
@@ -812,… +934,… @@
 The Perl 6 equivalents are:
 
-    rule { pattern }  # always takes {...} as delimiters
-    rx / pattern /  # can take (almost any) chars as delimiters
+    regex { pattern }  # always takes {...} as delimiters
+    rx / pattern /  # can take (almost any) chars as delimiters
 
 You may not use whitespace or alphanumerics for delimiters. Space is
 optional unless needed to distinguish from modifier arguments or
 function parens. So you may use parens as your C<rx> delimiters,
-but only if you interpose a colon or whitespace:
-
-    rx:( pattern )  # okay
+but only if you interpose whitespace:
+
     rx ( pattern )  # okay
     rx( 1,2,3 )  # tries t
