Running | Data Representation

This tutorial will pick up where the introduction left off. It will discuss the data representation aspects of the miniboxing transformation, as explained in the Unifying Data Representation Transformations paper.

In the introduction, we created a class C:

class C[@miniboxed T](val t: T)

and noticed that its specialized variants C_L and C_J included a weird Tsp @storage[Long] type in the minibox-inject compiler phase, which was later transformed to Long in minibox-commit. Let us look at this process.

Compiler Phases

First of all, there are several miniboxing phases in the compiler pipeline:

$ mb-scalac -Xshow-phases
    phase name  id  description
    ----------  --  -----------
           ...  ..  ...
       uncurry  13  uncurry, translate function values to anonymous classes
minibox-inject  14
minibox-coerce  15
minibox-commit  16
     tailcalls  17  replace tail calls by jumps
           ...  ..  ...

Here we see the three main phases introduced by the miniboxing plugin. Three more phases (pretyper, posttyper and hijacker) exist for purely technical reasons, to maintain compatibility with the rest of the compiler. The three main phases map directly to the phases of the data representation mechanism:

minibox-inject duplicates methods and classes and adds the @storage annotation
minibox-coerce introduces explicit coercions between boxed and miniboxed values
minibox-commit gives the final semantics to annotated types and coercions

Motivation

This sounds like a lot of work for an otherwise trivial task: transforming a type T to Long. To see why this is necessary, let us take an example:

object DR1 extends App {
  def foo[@miniboxed T](t: T): Unit = {
    val a: Any = t
    println(a)
  }
  foo(3.14)
}

Compiling this code will produce two versions of the method: foo, the generic variant, and foo_n_J, which encodes primitive values in a long integer. The last call in the object, foo(3.14), will be rewritten to use foo_n_J.

Yet, the more interesting part is how the val a: Any = t statement is translated. If we simply replace T by Long, a call to foo would not print 3.14 as expected, but the long integer encoding of the floating-point number, which is not desirable.
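The problem can be reproduced without the plugin. The following is a minimal sketch, assuming the long integer encoding of a Double is its raw IEEE 754 bit pattern. The names EncodingSketch, encode and decode are hypothetical and are not part of the miniboxing runtime.

```scala
object EncodingSketch {
  // Assumption: a Double is stored in a Long as its raw IEEE 754 bit pattern.
  def encode(d: Double): Long = java.lang.Double.doubleToRawLongBits(d)

  // Inverse of encode: reinterpret the bits as a Double again.
  def decode(l: Long): Double = java.lang.Double.longBitsToDouble(l)

  def main(args: Array[String]): Unit = {
    val t: Long = encode(3.14)

    // What a naive "replace T by Long" translation would store in `a`:
    val naive: Any = t
    println(naive)   // prints a large integer, not 3.14

    // What we actually want: convert back before widening to Any.
    val coerced: Any = decode(t)
    println(coerced) // prints 3.14
  }
}
```

The second println is the behavior the generic foo has, so the specialized variant needs an explicit conversion at exactly this point.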

Let us see how the miniboxing transformation handles this case (output simplified for readability):

$ mb-scalac DR1.scala -Xprint:uncurry,minibox
warning: 'minibox' selects 3 phases
[[syntax trees at end of                   uncurry]] // DR1.scala
package <empty> {
  object DR1 extends Object with App {
    def foo[@miniboxed T](t: T): Unit = {
      val a: Any = t;
      println(a)
    };
    DR1.this.foo[Double](3.14)
  }
}

[[syntax trees at end of            minibox-inject]] // DR1.scala
package <empty> {
  object DR1 extends Object with App {
    def foo[@miniboxed T](t: T): Unit = {
      val a: Any = t;
      println(a)
    };
    def foo_n_J[T](T_TypeTag: Byte, t: T @storage[Long]): Unit = {
      val a: Any = t;
      println(a)
    };
    DR1.this.foo_n_J[Double](8, 3.14)
  }
}

[[syntax trees at end of            minibox-coerce]] // DR1.scala
package <empty> {
  object DR1 extends Object with App {
    def foo[@miniboxed T](t: T): Unit = {
      val a: Any = t;
      println(a)
    };
    def foo_n_J[T](T_TypeTag: Byte, t: T @storage[Long]): Unit = {
      val a: Any = marker_minibox2box[T, Long](t);
      println(a)
    };
    DR1.this.foo_n_J[Double](8, marker_box2minibox[Double, Long](3.14))
  }
}

[[syntax trees at end of            minibox-commit]] // DR1.scala
package <empty> {
  object DR1 extends Object with App {
    def foo[@miniboxed T](t: T): Unit = {
      val a: Any = t;
      println(a)
    };
    def foo_n_J[T](T_TypeTag: Byte, t: Long): Unit = {
      val a: Any = MiniboxConversions.this.minibox2box[T](t, T_TypeTag);
      println(a)
    };
    DR1.this.foo_n_J[Double](8, MiniboxConversions.this.double2minibox(3.14))
  }
}

The first phase, minibox-inject, creates the two versions of foo and redirects the call foo(3.14) to foo_n_J(DOUBLE, 3.14). At this point in the transformation, the signature of foo_n_J includes T @storage[Long]. This annotation signals that the type will later be represented as a long integer but, at this stage, it remains a generic type T. Therefore the code val a: Any = t is still correct, since T @storage[Long] is compatible with Any, the top type in the Scala hierarchy. So far so good...

But the minibox-coerce phase makes annotated types incompatible with their direct counterparts, which, in turn, requires the introduction of explicit coercions between the two. Specifically, the code val a: Any = t is rewritten to val a: Any = marker_minibox2box[T, Long](t), since at this stage of the transformation, T @storage[Long] is no longer a subtype of Any, which is not annotated. As the name suggests, the coercion introduced at this point is a marker, not the final coercion.

The minibox-commit phase commits to the actual alternative representation, which, in this case, is Long. The signature of foo_n_J becomes def foo_n_J[T](T_TypeTag: Byte, t: Long) and the marker coercion is replaced by MiniboxConversions.this.minibox2box[T](t, T_TypeTag).
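To make the committed code concrete, here is a hedged sketch of what the conversions could look like at runtime. It is not the plugin's actual MiniboxConversions implementation: the object CoercionSketch, the helper int2minibox, the constant names and the tag value used for Int are assumptions made for illustration. Only the tag value 8 for Double is taken from the compiler output above.

```scala
object CoercionSketch {
  // Tag 8 for Double appears in the compiler output above.
  final val DoubleTag: Byte = 8
  // Hypothetical tag value for Int, chosen only for this sketch.
  final val IntTag: Byte = 5

  // Assumption: doubles are encoded as their raw IEEE 754 bit pattern.
  def double2minibox(d: Double): Long = java.lang.Double.doubleToRawLongBits(d)

  // Hypothetical helper: ints are simply widened to Long.
  def int2minibox(i: Int): Long = i.toLong

  // Decode the long integer according to the type tag and return it boxed.
  def minibox2box[T](l: Long, tag: Byte): T = {
    val boxed: Any = tag match {
      case DoubleTag => java.lang.Double.longBitsToDouble(l)
      case IntTag    => l.toInt
      case other     => throw new IllegalArgumentException(s"unknown type tag: $other")
    }
    boxed.asInstanceOf[T]
  }

  // Hand-written counterpart of the committed foo_n_J shown above.
  def foo_n_J[T](T_TypeTag: Byte, t: Long): Unit = {
    val a: Any = minibox2box[T](t, T_TypeTag)
    println(a)
  }

  def main(args: Array[String]): Unit = {
    foo_n_J[Double](DoubleTag, double2minibox(3.14)) // prints 3.14
  }
}
```

The type tag is what lets a single Long-based method body serve several primitive types: the value itself carries no type information once it is encoded.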

This three-stage transformation allows the miniboxing plugin to robustly, correctly and optimally transform any code, from simple examples to very complex collection library code that uses the full range of language features, such as higher-kinded types, closures and implicits.

Object Methods

...

To start, it is crucial to understand that simply transforming Tsp to Long in a specialized variant of a class is not a trivial transformation, since coercions (conversions from one representation to the other) need to be introduced correctly and optimally:

def foo[@miniboxed T](t: T): Unit = {
  println(t.toString)
}

Since the miniboxed version of the code, where T is replaced by Long, can be used for all primitive types, including Double, simply printing t would not produce the double-precision floating-point number we expect, but its long integer encoding. This is why there is a need for a more refined translation for the miniboxed variant foo_J:

def foo_J(T_Tag: Byte, t: Long): Unit = {
  println(/* what should be here? */)
}

To test the miniboxing plugin, we need to wrap the foo method in an object:

object DR1 {
  def foo[@miniboxed T](t: T): Unit = {
    println(t.toString)
  }
}

Compiling this example with -Xprint:minibox will produce (the output has been simplified to improve readability):

$ mb-scalac DR1.scala -Xprint:minibox
warning: 'minibox' selects 3 phases
[[syntax trees at end of            minibox-inject]] // DR1.scala
package <empty> {
  object DR1 extends Object {
    ...
    def foo[@miniboxed T](t: T): Unit = println(t.toString());
    def foo_n_J[T](T_TypeTag: Byte, t: T @storage[Long]): Unit = println(t.toString())
  }
}

[[syntax trees at end of            minibox-coerce]] // DR1.scala
package <empty> {
  object DR1 extends Object {
    ...
    def foo[@miniboxed T](t: T): Unit = println(t.toString());
    def foo_n_J[T](T_TypeTag: Byte, t: T @storage[Long]): Unit = println(marker_minibox2box[T, Long](t).toString())
  }
}

[[syntax trees at end of            minibox-commit]] // DR1.scala
package <empty> {
  object DR1 extends Object {
    ...
    def foo[@miniboxed T](t: T): Unit = scala.this.Predef.println(t.toString());
    def foo_n_J[T](T_TypeTag: Byte, t: Long): Unit = println(MiniboxDispatch.mboxed_toString(t, T_TypeTag))
  }
}

It is now clear that miniboxing works in three steps:

minibox-inject duplicates the method and adds the @storage[Long] annotation to types that need to be later transformed into Long
minibox-coerce introduces explicit marker coercions such as marker_minibox2box[T, Long]
minibox-commit replaces T @storage[Long] by Long and rewrites the marker coercion followed by toString into a call to MiniboxDispatch.mboxed_toString, an optimized toString implementation
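The benefit of the last rewrite is that the value never has to be boxed just to call toString on it. A minimal sketch of such a dispatcher follows. It is an illustration under stated assumptions, not the real MiniboxDispatch code: the object DispatchSketch, the constant names, the tag value for Int and the bit-pattern encoding of doubles are all hypothetical. The argument order (value, then type tag) follows the compiler output above.

```scala
object DispatchSketch {
  // Tag 8 for Double appears in the compiler output earlier in this tutorial.
  final val DoubleTag: Byte = 8
  // Hypothetical tag value for Int, chosen only for this sketch.
  final val IntTag: Byte = 5

  // Pick the right toString based on the type tag, without boxing the value.
  def mboxed_toString(t: Long, tag: Byte): String = tag match {
    case DoubleTag => java.lang.Double.toString(java.lang.Double.longBitsToDouble(t))
    case IntTag    => java.lang.Integer.toString(t.toInt)
    case other     => throw new IllegalArgumentException(s"unknown type tag: $other")
  }

  // Hand-written counterpart of the committed foo_n_J.
  def foo_n_J[T](T_TypeTag: Byte, t: Long): Unit =
    println(mboxed_toString(t, T_TypeTag))

  def main(args: Array[String]): Unit = {
    // Assumption: doubles are encoded as their raw IEEE 754 bit pattern.
    foo_n_J[Double](DoubleTag, java.lang.Double.doubleToRawLongBits(3.14)) // prints 3.14
  }
}
```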

So far, this example has shown that miniboxing is indeed structured according to the Data Representation Mechanism, into three phases, which gradually introduce and interpret conversions between different representations.

Next Steps

You can continue with the following resources:
