| 1 | =pod |
|---|
| 2 | |
|---|
| 3 | =head1 NAME |
|---|
| 4 | |
|---|
| 5 | Pugs Apocryphon 2 - Overview of Pugs Internals |
|---|
| 6 | |
|---|
| 7 | =head1 DATE |
|---|
| 8 | |
|---|
| 9 | This document could get out of date very quickly. If it seems that more than |
|---|
| 10 | a week has passed between the time there was an update to the time you read |
|---|
| 11 | these words, prod someone on C<#perl6>, or read |
|---|
| 12 | L<http://use.perl.org/~autrijus/journal> and see if there's been any big |
|---|
| 13 | change. |
|---|
| 14 | |
|---|
| 15 | The current copy was last revised on 2005-06-03. |
|---|
| 16 | |
|---|
| 17 | =head1 PRELUDE |
|---|
| 18 | |
|---|
| 19 | Pugs is written in the Haskell language. Before dabbling with Pugs internals |
|---|
| 20 | it may be wise to study a bit of Haskell. |
|---|
| 21 | |
|---|
| 22 | =head1 INTRODUCTION |
|---|
| 23 | |
|---|
| 24 | Pugs is a versatile project, tapping into the power of many other projects. |
|---|
| 25 | Pugs itself fits into a star topology, optionally using these projects to |
|---|
| 26 | gain more features. |
|---|
| 27 | |
|---|
| 28 | Each will be discussed later in more detail. |
|---|
| 29 | |
|---|
| 30 | =head1 PUGS RUNTIME OVERVIEW |
|---|
| 31 | |
|---|
| 32 | Pugs's own core is also componentized. The separation roughly coincides with |
|---|
| 33 | Perl 6's runtime and compile time. However, it is notable that the parts are |
|---|
| 34 | intermixed, since Perl 6 is a very dynamic language. |
|---|
| 35 | |
|---|
| 36 | The two parts are roughly: |
|---|
| 37 | |
|---|
| 38 | =head2 The Parser |
|---|
| 39 | |
|---|
| 40 | This part takes a string of Perl 6 and converts it into the AST, or Abstract |
|---|
| 41 | Syntax Tree. The AST represents the program's structure, which the evaluator |
|---|
| 42 | later executes. |
|---|
| 43 | |
|---|
| 44 | This part is responsible for the compilation of Perl 6 source code. |
|---|
| 45 | |
|---|
| 46 | =head2 The Evaluator |
|---|
| 47 | |
|---|
| 48 | The evaluator combines the program's AST with what is known as the I<Env>, |
|---|
| 49 | or roughly speaking, the current state of the program's execution. |
|---|
| 50 | |
|---|
| 51 | It walks the nodes of the tree, reducing them into values. Of course, the |
|---|
| 52 | interesting part is what happens during the reduction - this is the actual |
|---|
| 53 | execution of the code. |
|---|
| 54 | |
|---|
| 55 | This part is reponsible for the runtime - the execution of compiled Perl 6 |
|---|
| 56 | ASTs. |
|---|
| 57 | |
|---|
| 58 | =head1 SOURCE TREE OVERVIEW |
|---|
| 59 | |
|---|
| 60 | This section does not discuss the files in detail. Pugs is documented with |
|---|
| 61 | Haddock, and for reference that is the place to look. |
|---|
| 62 | |
|---|
| 63 | What this section B<does> provide is an overview of the responsibilities each |
|---|
| 64 | part has in overall structure of Pugs. |
|---|
| 65 | |
|---|
| 66 | =head2 F<src/Pugs/AST.hs> |
|---|
| 67 | |
|---|
| 68 | This file contains the definitions of the AST's types. |
|---|
| 69 | |
|---|
| 70 | It is more or less a description of how Perl 6 programs can look after |
|---|
| 71 | compilation. |
|---|
| 72 | |
|---|
| 73 | =head2 F<src/Pugs/Parser.hs> |
|---|
| 74 | |
|---|
| 75 | This file contains the parser for Perl 6 code. It is written using the Parsec |
|---|
| 76 | library. |
|---|
| 77 | |
|---|
| 78 | It produces Syn and Exp structures as defined in F<AST.hs>, and puts them in |
|---|
| 79 | the envBody of the env. |
|---|
| 80 | |
|---|
| 81 | =head2 F<src/Pugs/Eval.hs> |
|---|
| 82 | |
|---|
| 83 | This file implements the evaluation logic for the AST. Its main job is |
|---|
| 84 | reducing Exps into Vals. Most Exps require applying VCode objects, which |
|---|
| 85 | represent closures (blocks, subroutines, operators...), looking up |
|---|
| 86 | variables, or other basic features Perl 6 provides, and this is where most |
|---|
| 87 | of this is coded. |
|---|
| 88 | |
|---|
| 89 | =head2 F<src/Pugs/Prim.hs> |
|---|
| 90 | |
|---|
| 91 | This file contains the implementations of many of the primitive operators. |
|---|
| 92 | For example, the addition operator, C<< &infix:<+> >> is defined here. It |
|---|
| 93 | converts the two Perl values it gets into Haskell Nums, applies Haskell's |
|---|
| 94 | builtin addition operator to these, and then makes a Perl value out of the |
|---|
| 95 | result. |
|---|
| 96 | |
|---|
| 97 | The various operators and builtin functions are implemented using the opN |
|---|
| 98 | function, and the definition of their Perlish behavior is defined in the |
|---|
| 99 | table at the bottom. |
|---|
| 100 | |
|---|
| 101 | The table basically says whether the builtin is infix or not, how many |
|---|
| 102 | parameters it accepts, and so forth. |
|---|
| 103 | |
|---|
| 104 | =head2 F<src/Pugs/Run.hs> |
|---|
| 105 | |
|---|
| 106 | This is the file that ties it all together, it takes a Perl 6 file, slurps |
|---|
| 107 | the string out of it, hands it to the parser, then takes the AST out and |
|---|
| 108 | sends its envBody into the evaluator. |
|---|
| 109 | |
|---|
| 110 | =head1 A PROGRAM'S LIFE CYCLE IN DETAIL |
|---|
| 111 | |
|---|
| 112 | Earlier we discussed how eventually what the parser emits is fed to the |
|---|
| 113 | evaluator. Now we'll look at the details and special cases more closely. |
|---|
| 114 | |
|---|
| 115 | As we've seen before, the runtime calls the parser on the Perl code, and it, |
|---|
| 116 | in turn, generates an AST. Most parsed things result in trivial structures -- |
|---|
| 117 | just a representation of the program in something a bit more manipulable |
|---|
| 118 | than a string of source code. |
|---|
| 119 | |
|---|
| 120 | This basic structure, a node of the AST, is called an C<Exp> - an expression. |
|---|
| 121 | It represents the combination of values and operation, and the evaluator |
|---|
| 122 | knows to boil it down into a C<Val>. |
|---|
| 123 | |
|---|
| 124 | Matters get a little more complex when the code not only I<is> something at |
|---|
| 125 | compile time, but actually I<does> something, like declarations of variables |
|---|
| 126 | which create the variables, or C<BEGIN> blocks which execute code at compile |
|---|
| 127 | time. |
|---|
| 128 | |
|---|
| 129 | =head2 Enter unsafe unsafeEvalExp |
|---|
| 130 | |
|---|
| 131 | The parser is pure in that it does not affect the outside world when it does |
|---|
| 132 | its thing. It constructs the AST, but not much more. |
|---|
| 133 | |
|---|
| 134 | In order to execute things like C<BEGIN> blocks there are exceptions to |
|---|
| 135 | this. |
|---|
| 136 | |
|---|
| 137 | BEGIN { print "compile time" }; |
|---|
| 138 | |
|---|
| 139 | This operation has side effects - it causes the world outside the pugs |
|---|
| 140 | interpreter to change. However, it must happen within the "pure" parser, and |
|---|
| 141 | Haskell does not normally allow these things. |
|---|
| 142 | |
|---|
| 143 | The C<unsafe> in the name denotes that an effort was made to not care about |
|---|
| 144 | that bit of safety, and do IO in the pure parser anyway. |
|---|
| 145 | |
|---|
| 146 | But it does not strictly mean IO - what C<unsafeEvalExp> is just a short |
|---|
| 147 | circuit from the parser to the evaluator, allowing code to run at compile |
|---|
| 148 | time. |
|---|
| 149 | |
|---|
| 150 | C<BEGIN> blocks are evaluated by calling C<unsafeEvalExp> on the resulting |
|---|
| 151 | C<Exp> immediately after the block finished parsing, and then replacing that |
|---|
| 152 | point in the syntax tree with a the value the block was reduced to. |
|---|
| 153 | |
|---|
| 154 | Declarations of sorts create a node in the syntax tree called a C<Syn>. |
|---|
| 155 | C<Syn>s represents syntactic constructs of sorts, amongst which are variable |
|---|
| 156 | declarations. When evaluated, variable declarations create a type of C<Exp> |
|---|
| 157 | that will modify the C<Env>, adding a new symbol, and then roll back the |
|---|
| 158 | change later. They are also evaluated immediately using C<unsafeEvalExp>. |
|---|
| 159 | |
|---|
| 160 | Other C<Syn>s include control flow structure, and various keywords, but they |
|---|
| 161 | will be discussed later. |
|---|
| 162 | |
|---|
| 163 | =head2 reduce :: Exp -> Eval Val |
|---|
| 164 | |
|---|
| 165 | The heading of this section is the type declaration for the evaluator's |
|---|
| 166 | C<reduce> function. |
|---|
| 167 | |
|---|
| 168 | Let's break it down. |
|---|
| 169 | |
|---|
| 170 | The C<Exp> means that the single argument C<reduce> accepts is an expression. |
|---|
| 171 | The C<Eval> is the monadic fudgeting of the C<Val> type, indicating that the |
|---|
| 172 | reduction process of the C<Val> from the C<Exp> is controlled by the C<Eval> |
|---|
| 173 | monad. |
|---|
| 174 | |
|---|
| 175 | Lets try to explain this with an example: |
|---|
| 176 | |
|---|
| 177 | reduce (Val v) = reduceVal v |
|---|
| 178 | reduceVal v = retVal v |
|---|
| 179 | |
|---|
| 180 | This form of reduce takes the expression that is just a value, like C<3> or |
|---|
| 181 | C<"foo"> and encapsulates the data into the C<Eval> monad using C<retVal>. |
|---|
| 182 | |
|---|
| 183 | reduce (Var name) = reduceVar name |
|---|
| 184 | reduceVar name = do |
|---|
| 185 | v <- findVar name |
|---|
| 186 | case v of |
|---|
| 187 | Just var -> evalRef var |
|---|
| 188 | _ | (':':rest) <- name -> return $ VType (mkType rest) |
|---|
| 189 | _ -> retError "Undeclared variable" name |
|---|
| 190 | |
|---|
| 191 | This reduction takes an expression like C<$a> and reduces it into a value. Here |
|---|
| 192 | the C<Eval> monad's purpose comes into play a bit more clearly. |
|---|
| 193 | |
|---|
| 194 | The first line finds the container named by C<name> using the C<findVar> |
|---|
| 195 | function. |
|---|
| 196 | |
|---|
| 197 | The C<Eval> monad is in use because such an operation might fail - in this |
|---|
| 198 | case, the variable does not exist. The second line throws an exception when |
|---|
| 199 | that happens. |
|---|
| 200 | |
|---|
| 201 | Lastly, if everything went OK, the container is dereferenced to return the |
|---|
| 202 | value it contains. Here is the type signature of C<evalRef>, for reference: |
|---|
| 203 | |
|---|
| 204 | evalRef :: VRef -> Eval Val |
|---|
| 205 | |
|---|
| 206 | =head2 Many different reductions |
|---|
| 207 | |
|---|
| 208 | F<src/Pugs/Eval.hs> contains 11 different variants of reduce, which dispatch |
|---|
| 209 | to over 60 sub-variations. Each one serves a different purpose, and most are |
|---|
| 210 | pretty straightforward. |
|---|
| 211 | |
|---|
| 212 | For example C<reduce (Syn "env" [])> is the reduction that takes care of |
|---|
| 213 | variable declaration using C<VControl>, while C<reduce (Cxt cxt exp)> forces |
|---|
| 214 | the subexpression C<exp> to be evaluated in the context C<cxt>. |
|---|
| 215 | |
|---|
| 216 | Let's look at some of the more interesting C<reduce>s. My personal favourite is |
|---|
| 217 | C<for>. |
|---|
| 218 | |
|---|
| 219 | It is defined in the C<reduceSyn name exps> declaration, which is the |
|---|
| 220 | reduction of the various syntatic constructs. It uses Haskell's pattern |
|---|
| 221 | matching to invoke the appropriate reduction variant for the values of C<name>. |
|---|
| 222 | Here's the case C<for>, annotated: |
|---|
| 223 | |
|---|
| 224 | reduceSyn "for" [list, body] = do |
|---|
| 225 | |
|---|
| 226 | This takes the two expressions to the C<for> syntax thingy, the list part, and |
|---|
| 227 | the body. C<for (@list) { print "i'm the body" }>. |
|---|
| 228 | |
|---|
| 229 | The body is actually a subroutine; we'll look at that in a bit. After that |
|---|
| 230 | line are some details which we don't care about right now. Let's pretend they |
|---|
| 231 | don't exist and jump down to |
|---|
| 232 | |
|---|
| 233 | let arity = max 1 $ length (subParams sub) |
|---|
| 234 | |
|---|
| 235 | That part determines how many elements to take out of C<list> for each |
|---|
| 236 | iteration of C<body>. After that comes a lexically scoped function |
|---|
| 237 | definition, C<runBody>. Let's analyze it. |
|---|
| 238 | |
|---|
| 239 | runBody [] _ = retVal undef |
|---|
| 240 | |
|---|
| 241 | This takes care the case where there are no more elements in C<list>. |
|---|
| 242 | Contrast it with: |
|---|
| 243 | |
|---|
| 244 | runBody vs sub' = do |
|---|
| 245 | let (these, rest) = arity `splitAt` vs |
|---|
| 246 | genSymCC "&next" $ \symNext -> do |
|---|
| 247 | genSymPrim "&redo" (const $ runBody vs sub') $ \symRedo -> do |
|---|
| 248 | apply (updateSubPad sub' (symRedo . symNext)) Nothing $ |
|---|
| 249 | map (Val . VRef . MkRef) these |
|---|
| 250 | runBody rest sub' |
|---|
| 251 | |
|---|
| 252 | Which matches C<vs> that isn't an empty list (the first C<runBody> matched |
|---|
| 253 | that case). The C<splitAt> takes C<arity> elements out of C<vs>, that is the |
|---|
| 254 | number of parameters the body subroutine wants, and puts them in C<these>. The |
|---|
| 255 | rest go into C<rest>. |
|---|
| 256 | |
|---|
| 257 | The lines after that set the C<&redo> and C<&next> variables so that they |
|---|
| 258 | contain functions which will control the flow of the current iteration of the |
|---|
| 259 | loop. |
|---|
| 260 | |
|---|
| 261 | The line starting with C<apply> applies the subroutine currently in C<sub'>, |
|---|
| 262 | and gives it C<these> as its parameters on the line starting with C<map>. |
|---|
| 263 | |
|---|
| 264 | Lastly, after the subroutine is applied, C<runBody> is run again on C<rest>. |
|---|
| 265 | |
|---|
| 266 | genSymCC "&last" $ \symLast -> do |
|---|
| 267 | let munge sub | subParams sub == [defaultArrayParam] = |
|---|
| 268 | munge sub{ subParams = [defaultScalarParam] } |
|---|
| 269 | munge sub = updateSubPad sub symLast |
|---|
| 270 | |
|---|
| 271 | Outside of C<runBody>'s definition, the C<&last> variable is also defined. It |
|---|
| 272 | controls the whole loop, not only a single step, so it doesn't need to be in |
|---|
| 273 | C<&runBody>. |
|---|
| 274 | |
|---|
| 275 | When all the auxiliary functions have been defined, we can run the body with |
|---|
| 276 | the list passed into the for (munging into C<elms> omitted): |
|---|
| 277 | |
|---|
| 278 | runBody elms $ munge sub |
|---|
| 279 | |
|---|
| 280 | =head2 VCode application |
|---|
| 281 | |
|---|
| 282 | Now that we've seen a nice example of how a subroutine (which might be |
|---|
| 283 | masquerading as a simple block) is used, lets see how C<VCode>, the value |
|---|
| 284 | representing closures (subroutines, blocks, coroutines, etc) is called. |
|---|
| 285 | |
|---|
| 286 | |
|---|
| 287 | Subroutine application can be very simple, in the case of a C<Prim>. At other |
|---|
| 288 | times it involves entering a lexical scope, due to block open. Sometimes |
|---|
| 289 | parameter binding is involved too. |
|---|
| 290 | |
|---|
| 291 | But have no fear, we will soon see that like most parts of pugs, these |
|---|
| 292 | things are actually pretty simple. |
|---|
| 293 | |
|---|
| 294 | =cut |
|---|