root/docs/notes/laziness.txt

Revision 10213, 10.8 kB (checked in by fglock, 3 years ago)

docs/notes - added more details to generator/lazy array

  • Property svn:mime-type set to text/plain; charset=UTF-8
  • Property svn:eol-style set to native
Line 
1= Laziness
2
3A draft on the runtime view of how Lazy things (like Arrays) work.
4
5# by fglock
6
7The text currently has 2 main sections:
8- a review of the Perl 6 Bible, with some interpretation and guessing
9- implementation notes.
10
11Each section has 3 subsections:
12- Native data - about the internal representation of data types
13- Values - about the basic Perl 6 "OO" things
14- Containers - about the glue that hold things together in data structures
15
16== TODO
17- include drawings
18- comparing with Ruby/Python should help explaining some concepts
19- this text is supposed to talk about all these things:
20  Scalar / Array / Hash Container
21  Tuple / List / Generator / Iterator / Range
22  file iterator
23  Signature / Argument List
24  Array / Hash slice
25  enum
26  tie / proxy / rw / env
27  Constrained Types (eg: Int / Even / 0..100 where Even)
28  Piddle
29  lazy string / int / num
30
31
32= Part I - Perl 6 Bible Readings
33
34References:
35- S02 Bits and Pieces - for "native data"
36- S09 Data Structures - for Object types
37- A bit of each of the other AES
38
39= Native data
40
41Native data is built with a thin, (possibly non-OO) library over the virtual
42machine data types.
43
44For example, you may need a wrapper to implement 1-bit values, if the
45virtual machine only provides bytes.
46
47- native "singular" value
48
49Native values are: num, int, uint, bit, complex, ref.
50Native values can have a bit-size specification: int1, int16.
51
52A "native container" is whatever a "ref" value points to.
53
54A "num" value must support IEEE floating point conventions - which means there
55are native "Inf" and "NaN".
56Native values don't need to be able to represent "undef".
57
58- native "plural" value
59
60Native plural values are: array, str, hash.
61
62A "str" is an array of "int".
63
64There may be separate implementations for fixed-size and for
65auto-extending allocation.
66
67These are fast, in-memory structures provided by the virtual machine.
68These structures are assumed to be one-dimensional.
69
70Native hash is mostly used internally, for holding "namespaces": Classes, variables, labels.
71
72There may be separate implementations for string-indexed and for object-indexed hashes.
73
74= Value Class
75
76Values can be "singular" or "plural".
77
78Singular Values are: Str, Int, Num, Pair, Ref, Junction,
79Code, Block, Class, Grammar, Macro ...
80
81Plural Values are things that look like single-dimensional arrays:
82List, Generator, Iterator.
83
84-- List, Lazy List, Tuple classes
85
86List is implemented as a native Array, or as a contiguous Stack area if this
87is supported by the virtual machine.
88
89List is set of things.
90List is a subclass of "Value".
91List is one-dimensional.
92
93A List may be built with a mix of Values and Containers.
94
95A List might exist in 4 "states":
96
97  @a = ( 7, 20..30 );
98    - this is the Array that will be used in the examples
99  @a = (
100         ( (Cell = 7), ),
101         ( Generator 20..30)
102       );
103    - this is more-or-less what it looks like internally
104
105- unflattened List
106
107  ( 1, 5..10, @a )
108    - a List built from a Scalar, a Range, and an Array
109
110- lazily flattened with "*" operator
111
112  ( 1, 5..10, (Scalar := @a[0]), (Generator (Scalar := @a[1])..(Scalar := @a[10])) )
113    - the Array was "flattened", but we still have generators.
114
115- non-lazily flattened (eager) with "**" operator
116
117  ( 1, 5, 6, 7, 8, 9, 10, (Scalar := @a[0]), (Scalar := @a[1]), (Scalar := @a[2]),  ..... )
118    - everything is flattened
119
120- fully evaluated (Tuple)
121
122  ( 1, 5, 6, 7, 8, 9, 10, @a[0].value, @a[1].value, @a[2].value,  ..... )
123    - fully evaluated and immutable
124
125List only implements a subset of the Array API.
126A List slice is a new List - elements are shallow copied.
127List don't support adding, removing or substituting elements.
128
129List has no Cells - thus you can't bind a Container to a List element or slice. ???
130
131  $a := (1,2,3)[2];  # compile error - or is this creating a Constant ???
132
133If a List is built with Generators, it is a Lazy List.
134
135-- Generator / Range Classes
136
137Generator is an object that creates Values on demand.
138Generator is "functional" rather than "physical memory"
139
140Range is a special case, in which there is a start, an end, and a step.
141
142#At runtime, you only access Generator elements through Containers.
143
144Generator also implements "match":
145
146  5 ~~ 1..10
147
148The "*" operator flattens a Generator eagerly.
149
150TODO - detail the behaviour of "Str Range"
151
152-- Iterator Class
153
154TODO
155
156  =<>
157
158The "*" operator flattens an Iterator eagerly.
159
160-- List Slice
161
162A List Slice is a shallow copy of a part of a List.
163
164A List Slice is just a List - there is not a separate Class for it.
165
166= Container Class
167
168Containers are the glue that hold Values together in data structures.
169
170The container classes are: Scalar, Array, Hash.
171
172Array, like List, is Lazy by default.
173Hash and Scalar can be lazy too, but they are not Lazy by default.
174
175== Scalar Container Class
176
177Scalar is a "singular Value" Container.
178
179One of the reasons to use a container is that you can easily change
180it's semantics by replacing what's inside it - that is, a Container implements
181a "variable":
182
183   value '1' - always means 'one'
184   
185   container ('1') - it means 'one' too, but you can increment it to mean 'two'
186
187== Array Container Class
188
189The Array class defines all objects that implement the Array API.
190
191Array is a subclass of "Container"
192Array contains an ordered set of Cells.
193Array is multi-dimensional.
194
195Array is not meant to be subclassed - use "roles" to make different implementations.
196
197The actual implementation depends on the "Eager" or the "Lazy" roles.
198Other implementations can be specified.
199
200--- Eager Role
201
202An Array object that wraps a native array.
203
204--- Lazy Role
205
206An Array object that generates Cells on demand.
207This is the default implementation.
208
209--- Shape (Role or parameter? trait?)
210
211--- Compact Array Role (piddle)
212
213TODO - piddle is different from Compact
214
215-- Container Slice
216
217A Container Slice is a specialized Array/Hash, in which all the elements
218are bound to objects inside other Containers (Array, Hash or Scalar).
219
220A Lazy Container Slice contains pointers to Generators that are inside other
221Containers.
222
223Container Slice is multi-dimensional.
224
225== Hash
226
227The Hash class defines all objects that implement the hash API.
228
229Hash is like an Array, in which the Cell indexes can be of any type
230(not just numbers).
231
232Hash is multi-dimensional.
233
234
235= Part II - Internals
236
237== Generator Internals
238
239- TODO
240
241
242== Generator API
243
244- match
245
246This operator is used in a match statement:
247
248    5 ~~ 1..10
249
250- shift / pop
251
252TODO - not sure if "Generator" is an iterator (read-only) or uses destructive
253read (shift/pop)
254
255- sum / reverse / map / zip
256
257- flatten
258
259Takes all Values from the Generator, and returns an Array (possibly Native).
260
261- elems
262
263The number of elements
264
265=== This part of the Generator API is only used by the runtime internals.
266
267- head(n)
268
269This operator is used by the Array splice operator.
270
271Takes "n" Values from the Generator, like "shift" does.
272The result can be a Generator (when n is big), an Array (possibly Native),
273a single Value (when n == 1), or Void (when n <= 0).
274
275TODO - not sure if "Generator" is an iterator (read-only) or uses destructive
276read (head/tail)
277
278- tail(n)
279
280This operator is used by the Array splice operator.
281
282Takes "n" Values from the back of the Generator, like "pop" does.
283The result can be a Generator (when n is big), an Array (possibly Native),
284a single Value (when n == 1), or Void (when n <= 0).
285
286TODO - check if "Generator" is an iterator (read-only) or uses destructive
287read (head/tail)
288
289- is_contiguous
290
291This is used for the optimization of Slice, when the sequence is used
292as a slice index.
293
294is_contiguous() is True is the generated sequence is like "2 .. 10".
295
296
297= Array internals
298
299== Array (base Class)
300
301TODO
302
303== Eager (Role)
304
305An Eager Array contains a Native Array.
306
307TODO - explain "Cell"
308
309   Contents:  1 2 3
310   Internals: [ 1, 2, 3 ]
311
312== Lazy (Role)
313
314A Lazy Array contains a "specs" Array, which contains Generators, or Native Arrays.
315
316   Contents:  1 2 3
317   Internals: [
318                [ 1, 2, 3 ]
319              ]
320
321   Contents:  10 20 30 100..10000 50 400..900
322   Internals: [
323                [ 10, 20, 30 ],
324                Generator( 100, 10000 ),
325                [ 50 ],
326                Generator( 400, 900 )
327              ]
328
329
330= Array API
331
332- TODO
333Array can do "zip", "map", "grep", splice, push, pop, store, fetch.
334Array can be "bound" and "tied".
335
336
337== map()
338
339map() applies a 'Code' to an Array state:
340
341  1:   @a = 1..100;            # 1, 2, 3, ...
342  2:   @b = @a.map:{ $_ * 2 }; # 2, 4, 6, ...
343  3: 
344  4:   @a[2] = 0;              # 1, 0, 3, ...  - original has changed
345  5:   say @b[1..3];           # 2, 4, 6, ...  - result didn't change
346
347Line 1:
348
349   Container: @a
350   Internals: [
351                Generator( 1, 100 ),
352              ]
353
354Line 2:
355
356   Container: @_TEMP_
357   Internals: [
358                Generator( 1, 100 ),
359              ]
360
361   Container: @a
362   Internals: [
363                RefArray( @_TEMP_ ),
364              ]
365
366   Container: @b
367   Internals: [
368                MapArray( @_TEMP_, { $_ * 2 } ),
369              ]
370
371Line 4:
372
373   Container: @_TEMP_
374   Internals: [
375                [ 1, 2 ]
376                Generator( 3, 100 ),
377              ]
378
379   Container: @a
380   Internals: [
381                [ 1, 0 ],
382                SpliceArray( @_TEMP_, 2 ),
383              ]
384
385   Container: @b
386   Internals: [
387                MapArray( @_TEMP_, { $_ * 2 } ),
388              ]
389
390== splice()
391
392This pseudo-code implements splice().
393It assumes that the internal representation of Array can do .head(n) and .tail(n) -
394see "Generator" for details.
395
396    sub splice ( @array, $offset? = 0, $length? = Inf, @list? = () ) {
397        my ( @head, @body, @tail );
398        if ( $offset >= 0 ) {
399            @head = @array.head( $offset );
400            if ( $length >= 0 ) {
401                @body = @array.head( $length );
402                @tail = @array;
403            }
404            else {
405                @tail = @array.tail( -$length );
406                @body = @array;
407            }
408        }
409        else {
410            @tail = @array.tail( -$offset );
411            @head = @array;
412            if ( $length >= 0 ) {
413                @body = $tail.head( $length );
414            }
415            else {
416                @body = @tail;
417                @tail = $body.tail( -$length );
418            }
419        };
420        @array = ( @head, @list, @tail );
421        return @body;
422    }
423
424= Hash internals
425
426A Hash contains a Native Hash.
427
428Hash keys can be objects, but the objects must be immutable
429
430   Contents:  a:1 b:2 c:3
431   Internals: { a:1 b:2 c:3 }
432
433A Hash may contain Lazy items.
434This structure contains a Native Array, which contains Generators or Native Hashes.
435This structure permits near O(1) complexity for most operations.
436
437   Contents:  a:1 b:2 c:3
438   Internals: [
439                { a:1, b:2, c:3 }
440              ]
441
442   Contents:  a:10 b:20 c:30 (1:100 .. 10000:10100) d:50
443   Internals: [
444                { a:10 b:20 c:30 },
445                Generator( 1:100, 10000:10100 ),
446                { d:50 },
447              ]
448
449
450= Hash API
451
452- TODO
453
454
455= Slice API
456
457- TODO
458
459
460= Slice Internals
461
462- TODO - hash slice / array slice
463
464
465__END__
Note: See TracBrowser for help on using the browser.