Previous ... Next

Caching

(1/5) Normalization

One of the big advantages of deeply immutable objects is that two structurally identical objects are referentially transparent, that is, you can not distinguish whether they are represented in memory by one object or two. This means that it is possible to reuse the same objects to save memory. While in other languages the programmer would have to implement some specific support to reuse objects, in L42 this is supported directly in the language, by a process called normalization . An immutable object can be turned into its normalized version using «».

As you can see, objects are born not normalized and can be normalized manually by the programmer. Normalization is cheap but it is not free, so it is not convenient to apply it on short lived objects. The programmer can express that all objects of a class are normalized during creation by passing parameters to «»:
Normalization starts by checking if another structurally equivalent object has ever been normalized. Normalization normalizes all the sub objects and also supports circular objects.
Consider the following richer example:
Normalizing an object normalizes the whole ROG. In the example, normalizing the two dogs also normalizes their owners, to the same normalized object. All of the dogs' fields are then replaced with the normalized versions, so the two dogs now share an owner object. Note that bob1 and bob2 are still two different objects.

The logic needed for normalization is the same needed to check if two arbitrary objects are structurally equal, to print an object to a readable string and to clone objects. Thus data allows for all of those operations indirectly relying on normalization. Those are all operations requiring to scan the whole ROG of the object, so the cost of normalization is acceptable in context.

(2/5) Lazy Caching

Some methods may take a long time to compute, but they are deterministic, and thus we could cache the result and reuse it many times. A typical example is Fibonacci:

This Fibonacci implementation would take a very long time to run, since it would require recomputing the same results an exponential amount of times. This tweaked implementation relying on caching is much faster.
As you can see, instead of a method with parameters we can declare a class with fields and an empty named method doing the actual computation. «» is an annotation recognized by «» that works only on «» or «» methods with no arguments and with an «» result. That is, 42 does not directly recognize the annotation «». Decorating the surrounding library with «» translates «» into an actual implementation. «» is a computation object : an imm object whose only goal is to support one (or more) computationally intense methods. Thanks to normalization, the cache of computation objects is centrally stored, and thus recursive calls computing Fibonacci will be able to reuse the cache from other objects. That is, the method result is cached on the normalized version of the receiver. In this way, all the redundant Fibonacci calls are avoided.

As you can see, the caching is is completely handled by the language and is not connected with the specific algorithm. This pattern is general enough to support any method from immutable data to an immutable result.

(3/5) Automatic parallelism

When decorated by «», «» caches the results of methods after they have been called the first time. However, why wait for the methods to be called? Once the receiver object is created, the method could be computed eagerly in a separate worker, so that when we call the method, we may get the result without waiting at all. That is, if we use «» we can get automatic parallelism: the language will handle a set of parallel workers to execute such method bodies.

An important consideration here is that both «» and «» are unobservable; that is, the observed result of the code is not impacted by lazy or eager cache. Consider the following code:

Here we declare a Task object with a string field and two eager methods: one will check if the text in the string is polite and another will check if the string is grammatically correct. This can take quite a while. By using eager methods, it is sufficient to just create the objects to start those computations in parallel. When we need the results, we can just iterate on those task objects and call the methods. Objects with eager methods are automatically normed during creation, as if we used «» instead of «». As you can see, in 42, parallelism and caching are just two sides of the same coin.

(4/5) Invariants and derived fields

We have seen that cached behaviour can be computed lazily or eagerly on immutable objects. But we can bring caching even earlier and compute some behaviour at the same time as object instantiation. This allows us to encode derived fields: fields whose values are completely determined by other values in the same object. Consider the following example:

where the class «» has 3 fields, but the value of the third one should depend only on the other two. In 42, the code above would simply define a class with three unrelated fields, and while we are offering a factory that conveniently takes «» and «» and initialize the third field with the computed value, the user could easy create invalid instances by calling the factory method with three arguments. As we will see later, in 42 we can prevent this from happening by making such a method private. However, we would still be able to create an invalid «» inside of other «» methods. Ideally, we would like to truly have only two fields, and have the third one as a precomputed derived value. In 42, we can encode such concept using «»:
The «» class defined above has a single factory method taking just «» and «». In this way there is no need to have multiple ways to build the object and then hide the dangerous ones after the fact. The method «» is computed when a «» object is created. Moreover, «» adds a method «», allowing us to read the computed/cached value as we would if it were a field. Note that «» makes a «» method calling the «» one. If the method «» leaks an error, it will be propagated out as if the method were manually called during object construction. This means that any time you receive a «», it has a valid distance.

We can build on this behaviour to encode class invariants: «» methods with «» return type designed to simply throw error if the state is incorrect. For example, consider this updated version of «»:

=0Double && y>=0Double) error X"""%
      | Invalid state:
      | x = %x
      | y = %y
      """
  }
]]>
Now, every time user code receives a «», they can rely on the fact that «» and «» are non-negative and not NaN.

(5/5) Summary

In 42 immutable objects, can be normalized in order to save memory. This works also on circular object graphs. In case you are interested in the details, it relies on a variation of DFA normalization. As a special case, objects without fields (immutable or not) are always represented in memory as a single object. Results of «» and «» are attached to the normalized version of an object, thus making it possible to recover them simply by building a structurally identical object.

There are three kinds of caching, depending on the time the caching behaviour activates:

  • «» computes the cached value when the annotated method is first called. It works on «» and «» no-arg methods. An obvious workaround for the no-arg limitation is to define computation objects; this also works well with normalization: computation objects will retrieve the cached results of any structurally equivalent object.
  • «» computes the cached value in a separate parallel worker, starting when the object is created. It only works on «» no-arg methods of classes whose objects are all deeply immutable. Those classes will automatically normalize their instances upon creation.
  • «» computes the cached value during object construction. Since the object does not exist yet, the annotation can only be placed on a «» method whose parameters represent the needed object fields. This annotation does influence the observable behaviour. If there is no error computing the «» methods, then the fully initialized object is returned. But, if an error is raised computing the cache, instead of returning the broken object, the error is leaked during object construction.
    This, in turn, allows us to encode class invariants and to provide a static guarantee that users of a class can rely upon.

      Previous ... Next