Previous ... Next

Caching on Mutable objects

(1/5) Cache invalidation

The main advantage of caching methods of immutable objects is that the cache stays valid. L42 can also cache methods of mutable objects, and discovers on its own when the cache needs to be invalidated. Consider this trivial variation of the «» example above where the fields can be updated:

When a setter for «» or «» is invoked, then the two «» methods are recomputed. In other programming languages, this behaviour can be encoded by making the fields private and customizing the implementations of the setters to recompute the distance when needed. This pattern can grow very complex very fast. L42 guarantees that a cached value is always structurally equivalent to the value that would be returned by calling the method again. Moreover, for «», L42 also guarantees that if the computation was re-run then it would terminate without errors. Thus, when «» is used to emulate invariants, those invariants are guaranteed to hold for all observable objects, that is, all objects where the annotated method could possibly be called. This is possible thanks to the strong L42 type system, and we believe this property can not be broken. That is, we believe this property to hold even in the presence of exceptions, errors, aliasing, input output and non deterministic behaviour. It is possible to make L42 work together with Java or even with (possibly broken) native code, and we believe our property will continue to hold.

(2/5) Deeply mutable objects

As discussed above, a deeply mutable object is a mutable object with some mutable fields. Also deeply mutable objects can support «», but such objects must have encapsulated state, as we have seen before for the class «».

The field «» is an encapsulated mutable field. It can be accessed as «» by doing «», but can not be directly accessed as «». However, we can write capsule mutator methods by using «». Similarly to «», a class method can be annotated with «» and can take parameters representing the object's fields. In addition, more parameters can be present encoding extra arguments. To clarify, consider this richer example, where our «» has an invariant and another capsule mutator method:
maxD (
        maxI:=i
        maxD:=currentD
        )
      )
    if maxI!=I"-1" path.remove(maxI)
    )

  @Cache.Now class method
  Void invariant(read Points path, Point location) = 
    if path.contains(location) error X"..."
  }
]]>
We added a method to remove the farthest away point if it is over a certain distance. As you can see, the parameters «» and «» corresponds to fields, while the parameter «» is extra needed information. When we call «» we pass only «»; the other parameters are passed automatically.
As an invariant, we require that the current location is not in the «». This code, in the current form, has a bug; can you spot it? Look carefully at the method «»:
Here we first set up the location, then we remove it from the path. The method «» recomputes the invariant twice: one after the field setter and one after the call to the «» method. This first check is going to fail, since the leftmost element of the path has not been removed yet. In this case we can solve the problem by swapping the lines:
However, this kind of solution does not scale in the general case. Next, we will see a programming pattern that allows the invariant checks (and more generally the recomputation of «» methods) to be delayed in a controlled way.

Cache.Lazy and Cache.LazyRead

As we have seen before, we can annotate «» and «» methods with «» so that the result will be computed once, the first time that the method is called. We can also annotate «» methods in the same way. However, the cache is now stored in the actual objects and not in the normalized versions. This happens because a «» reference can refer to either mutable or immutable objects, and only immutable objects can have normalized versions. If anything in the ROG of the «» object is mutated, then the cache is invalidated, and the result will be recomputed the next time the method is called. Indeed this annotation enables lazy caching on mutable data-structures, where the cache is automatically invalidated and removed when a «» method terminates. Finally, since the type system can not track when the ROG from «» fields is mutated, a «» can only be applied to classes whose fields are all «», «» or «»; that is, their instance all have encapsulated state.

If a class has «» fields, but those are not actually used to compute the cached value, we can apply the «» annotation to an opportune «» method instead. Every method annotated as «» could instead be annotated as «». This annotation is a point in the middle between «» and «»; it produces the same behaviour as «» but works similarly to «»: it is applied to «» methods whose parameters represent fields, and «» generates a correspondent no-arg «» method. «», «» and «» methods all behave as if they where recomputing the result, but with a different performance.

Cache invalidation is considered one of the great challenges in writing correct programs; L42 can handle it correctly and automatically. However, there is a cost: you have to encode the algorithm so that the type system accepts your code and so that the caching annotations can be applied.

(3/5) Box patten

As we have seen, in order to write mutable objects with encapsulated state, we need to design them well, using «» to initialize the mutable data, using «» to mutate such state, and «» for the invariant. However, we can also program a naive deeply mutable object and box it up as a second step. This can require a little more code, but it is more intuitive, and works very well for arbitrarily complex cases. Consider the following code:

As you can see, the «» is a deeply mutable class, designed with no attention to correctness: if the programmer is not careful, the same «» may end up being used for multiple bikes at the same time. Also, the method called «» only represents a programmer intention, but it is not enforced in any way, so it could be silently broken. We can easy create a «» class containing and encapsulating, such a «»:
As you can see, no matter how complex some class code is, we can simply wrap it into a box and apply «» and «» on top of it. In this way we end up with two types: «», that does not offers any guarantee, and «», ensuring the invariant and encapsulating the state. The methods «», «» and «» will check for the invariant exactly one time, at the end of their execution. Following this pattern, we can perform an arbitrarily long computation before the checks are triggered. When writing other classes, we can choose to use «» or «», depending on the specific details of our code. If we chose to use «» as a field of another class, we can still check the «» invariant inside the invariant of the composite class:

As we have seen, with the box pattern we can have the flexibility of temporarily open invariants without any of the drawbacks. Of course, programmers will need to keep in mind which values are protected by invariants and which values are unsupervised by invariants. In 42 this is under the control of the type system: a value of type «» has no special guarantees, while a value of type «» will ensure the invariant.

(4/5) Controlling the ROG shape

An attentive reader may have notice that we would allow for fields «» and «» to point to the same «» object. A Java programmer may be tempted to just add «» in the invariant, but this would just use the user defined «» operator, that on classes created using «» is likely to check for structural equality instead of pointer equality. AdamsTowel offers «» to check for reference equality, but this method only works for «» objects. The wheels are indeed «» objects, but the invariant method takes a «» receiver; thus we can only see the wheels as «». In this case, the inability to use pointer equality is actually a good thing, since it does not correspond to what we really wanted to express: what if the two wheels are different objects but they share the same «» object? What we want is to check that the mutable objects are not aliased in physically unreasonable ways. Generally, what we often want is to ensure the tree shape of the mutable part of the object graph. In 42, we can create classes where all of the instances are guaranteed to follow this property, by making all fields either «» types of classes that recursively respect this property or «»/«». However, according to what we have seen up to now, «» fields can only be mutated by defining «» methods, and those methods will be unable to mutate any other «» field. Consider the following code:

Here the «» can rub onto the wheel, damaging it. The parameter of method «» is «». This guarantees that the object graphs of the chain and the wheel will not be mangled together by the method «». Can we assemble a «» while being able to both use «» and guaranteeing the tree structure? The trick is to declare «» exposers manually:
That is, by declaring lent exposers manually we gain the possibility of writing methods that mutate the capsule fields without using the «» pattern. In exchange for this extra flexibility, those fields do not count as «» fields for the sake of «», «» or «» annotations. However, we can still use through the box pattern, as show before.

(5/5) Summary

Representation invariants and lazy caching can be applied to mutable objects as well as immutable ones. Proving properties on mutable objects requires us to know and apply various patterns. Historically, in software verification, representation invariants where small units of code, mostly focusing on the direct content of the fields and mostly relying on either pointer equality or the binary value of primitive datatypes, where the invariants could be deliberately broken while the program was still able to observe the broken object. None of this is possible in 42; in particular, 42 guarantees that no broken objects can be observed. The box pattern allows us to divide the value into two types: the one with an enforced invariant and the raw object state. This recovers the flexibility of temporarily open invariants without any of the drawbacks. Note that a sound language with normalization and caching/invariants can not offer pointer equality tests on immutable objects. Consider the example of a list whose invariant requires all of its elements to have a distinct pointer values. A list with such an invariant may contain two structurally equal but not pointer equal elements. Suppose such a list became immutable and was normalized. Now those two elements would be pointer equal, and the invariant would have been broken by normalization.

It can be hard to remember the differences between all of the «» annotations. Below, we show a table summarizing them.

  annotation     recType     parameters     transformedInto     storage     timing  
Cache.Lazy class zero class first call
Cache.Lazy imm zero norm first call
Cache.Lazy read* zero instance invalidation
Cache.Eager imm* zero norm parallel
Cache.LazyRead class fields read0 instance invalidation
Cache.Now class fields read0 instance invalidation+
Cache.Clear class fields+ mut+ instance- when called

Notes:

  • imm* = applicable only on classes with all fields «»/«» and not «».
  • read* = applicable only on classes with all fields «»/«».
  • fields = method parameters are field names. Capsule fields are seen as «».
  • fields+ = the first parameter is a «» field, seen as «». Additional parameters can be «»,«» or «».
  • invalidation = the method is executed on the first call, or the first call after invalidation.
  • invalidation+ = the method is executed during the factory, and immediately after any invalidation.
  • instance- = caches in the instance are invalidated.
  • read0 = a «» method with zero parameters.
  • mut+ = a «» method with the additional parameters as for fields+.
  • parallel = executed in a parallel worker starting after the factory.

Digressions / Expansions

As we have seen, parallel programming can be viewed as a form of caching. In some cases, we need parallel programming on mutable data. In our experience, this is not very common; the cost of copying data around is much smaller that most programmers assume. Let us repeat this very clearly: there are many other ways to optimize software, and they are much easier and much, much more rewarding than avoiding copying the few mutable parts of your data structure a couple of times. We think that only highly skilled and motivated programmers can discover and hunt down all of those other much more pressing algorithmic issues that often spoil performance. Only after approaching the performance limits of «» with good algorithms, it could make sense to adopt parallel programming on mutable data to avoid some extra clones. However, if you are in such a situation, you can use the annotation «» as shown below. Again, the 42 type system will ensure that parallelism is not observable.

Like other «» annotations, «» is translated by «» into an actual implementation. «» works only on methods whose body is exactly a round parenthesis block. The initialization expressions for «», «», and «» are run in parallel, and the final expression is run only when all of the initialization expressions are completed. The method itself can take any kind of parameters, and they can all be used in the final expression, but the initialization expressions need to fit one of the following three safe parallel patterns:

Non-Mutable computation

In this pattern, none of the initialization expressions can use «» or «» parameters. In this way nothing can be mutated while the forkjoin is open, thus parallelism is safe. This is more expressive than «» since it allows us to run parallel code on «» references of mutable objects.

Single-Mutable computation

In this pattern, a single initialization expression can use any kind of parameter, while the other ones can not use «», «» or «» parameters. This pattern allows the initialization expression that can use «» to recursively explore a complex mutable data structure and to command updates to immutable elements arbitrarily nested inside of it. Consider for example this code computing in parallel new immutable string values for all of the entries in a mutable list:

As you can see, we do not need to ever copy the whole list. We can update the elements in place one by one. If the operation «» is complex enough, running it in parallel could be beneficial. As you can see, it is trivial to adapt that code to explore other kinds of collections, like for example a binary tree. Or, in other words, if you are unsure on how to adapt that code to work on a tree, you should stick with «» and accept that you are not (yet) one of the few elite programmers with enough skill to take advantage of the 42 fork join. AdamsTowel could use metaprogramming features to define code that parallelises user defined operations on lists, maps and other common datastructures. However, we think this is beyond the responsibility of AdamsTowel, and should instead be handled by some user defined library.

This-Mutable computation

In this pattern, the 'this' variable is considered specially. The method must be declared «», and the initialization expressions can not use «», «» or «» parameters. However, the parameter «» can be used to directly call «» methods. As we have seen before, «» methods can mutate a «» field; thus different initialization expressions must use «» methods updating different «» fields. In this way, we can make parallel computation processing arbitrary complex mutable objects inside well encapsulated data structures . Consider the following example, where «»s could be arbitrarily complex; containing complex (possibly circular) graphs of mutable objects.

This pattern relies on the fact that using «» fields we can define arbitrary complex data structures composed of disjointed mutable object graphs. Note that «» aliases to parts of the data structure can be visible outside. This is accepted since we can not access them when the forkjoin is open. The declarations can not use «» parameters.
      Previous ... Next