Running | Data Representation

This tutorial will pick up where the introduction left off. It will discuss the data representation aspects of the miniboxing transformation, as explained in the Unifying Data Representation Transformations paper.

In the introduction, we created a class C:

class C[@miniboxed T](val t: T)

and noticed that its specialized variants C_L and C_J included a weird Tsp @storage[Long] type in the minibox-inject compiler phase, which was later transformed to Long in minibox-commit. Let us look at this process.

First of all, there are several miniboxing phases in the compiler pipeline:

$ mb-scalac -Xshow-phases
    phase name  id  description
    ----------  --  -----------
           ...  ..  ...
       uncurry  13  uncurry, translate function values to anonymous classes
minibox-inject  14
minibox-coerce  15
minibox-commit  16
     tailcalls  17  replace tail calls by jumps
           ...  ..  ...

Here we see the three main phases introduced by the miniboxing plugin. Another three phases (pretyper, posttyper and hijacker) exist for purely technical reasons, to maintain compatibility with the rest of the compiler. The three main phases map exactly to the phases of the data representation mechanism:

minibox-inject duplicates methods and classes and adds the @storage annotation
minibox-coerce introduces explicit coercions between boxed and miniboxed values
minibox-commit gives the final semantics to annotated types and coercions
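The division of labor between the three phases can be modeled on a toy expression tree. This is only a sketch in plain Scala: the Tree type and every name in it are hypothetical and are not part of the miniboxing plugin or of the Scala compiler.

```scala
object ThreePhaseSketch {
  // A deliberately tiny expression language; all names are hypothetical.
  sealed trait Tree
  final case class Param(name: String, storedAsLong: Boolean) extends Tree
  final case class CallToString(receiver: Tree) extends Tree
  final case class Minibox2Box(arg: Tree) extends Tree // coercion marker
  final case class TagToString(arg: Tree) extends Tree // tag-directed call

  // Phase 1 (inject): annotate the parameter as stored in a Long.
  def inject(tree: Tree): Tree = tree match {
    case Param(n, _)     => Param(n, storedAsLong = true)
    case CallToString(r) => CallToString(inject(r))
    case other           => other
  }

  // Phase 2 (coerce): a method call needs a boxed receiver, so wrap
  // Long-stored values in an explicit coercion marker.
  def coerce(tree: Tree): Tree = tree match {
    case CallToString(p @ Param(_, true)) => CallToString(Minibox2Box(p))
    case other                            => other
  }

  // Phase 3 (commit): give the marker its final meaning. Here the
  // box-then-toString pattern collapses into one tag-directed call.
  def commit(tree: Tree): Tree = tree match {
    case CallToString(Minibox2Box(p)) => TagToString(p)
    case other                        => other
  }

  def main(args: Array[String]): Unit = {
    val source = CallToString(Param("t", storedAsLong = false))
    println(commit(coerce(inject(source))))
  }
}
```

Keeping the coercion marker separate from its final interpretation is what lets the last step recognize a whole pattern (box, then call toString) and replace it with something cheaper than actually boxing.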

This sounds like a lot of work for an otherwise trivial task: transforming a type T to Long. To see why this is necessary, let us take an example:

object DR1 extends App {
  def foo[@miniboxed T](t: T): Unit = {
    val a: Any = t
    println(a)
  }
  foo(3.14)
}

Compiling this code will produce two versions of the method: foo, the generic variant, and foo_J, which encodes primitive types in a long integer. The last call in the object, foo(3.14), will be rewritten to use foo_J.

Yet, the more interesting part is how the val a: Any = t statement is translated. If we simply replaced T by Long, a call to foo would not print 3.14 as expected, but the long integer encoding of the floating-point number.
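The effect of such a naive replacement can be reproduced in plain Scala, without the plugin. The sketch below assumes that a Double is stored in a Long by reusing its raw IEEE 754 bits; the encode and decode helpers are hypothetical names introduced for this illustration.

```scala
object EncodingDemo {
  // Assumption: the Long holds the raw IEEE 754 bits of the Double.
  def encode(d: Double): Long = java.lang.Double.doubleToRawLongBits(d)
  def decode(bits: Long): Double = java.lang.Double.longBitsToDouble(bits)

  def main(args: Array[String]): Unit = {
    val t: Long = encode(3.14)

    // Naive translation: the Long itself is boxed into Any,
    // so the integer encoding is printed instead of 3.14.
    val naive: Any = t
    println(naive)

    // Correct translation: convert back to Double before boxing.
    val a: Any = decode(t)
    println(a) // prints 3.14
  }
}
```

The second println only works because the code knows that the Long holds a Double. A miniboxed method receives that information at run time, which is why the translation must insert a conversion rather than merely change a type.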

Replacing Tsp with Long in a specialized variant is therefore not a trivial substitution: coercions (conversions from one representation to the other) must be introduced correctly and optimally. Consider a simpler version of foo:

def foo[@miniboxed T](t: T): Unit = {
  println(t.toString)
}

The miniboxed version of the code, where T is replaced by Long, is used for all primitive types, including Double. Simply printing t would therefore not produce the double-precision floating-point number we expect, but its long integer encoding. This is why the miniboxed variant foo_J needs a more refined translation:

def foo_J(T_Tag: Byte, t: Long): Unit = {
  println(/* what should be here? */)
}

To test the miniboxing plugin, we need to wrap the foo method in an object:

object DR1 {
  def foo[@miniboxed T](t: T): Unit = {
    println(t.toString)
  }
}

Compiling this example with -Xprint:minibox will produce (the output has been simplified to improve readability):

$ mb-scalac DR1.scala -Xprint:minibox
warning: 'minibox' selects 3 phases
[[syntax trees at end of            minibox-inject]] // DR1.scala
package <empty> {
  object DR1 extends Object {
    ...
    def foo[@miniboxed T](t: T): Unit = println(t.toString());
    def foo_n_J[T](T_TypeTag: Byte, t: T @storage[Long]): Unit = println(t.toString())
  }
}

[[syntax trees at end of            minibox-coerce]] // DR1.scala
package <empty> {
  object DR1 extends Object {
    ...
    def foo[@miniboxed T](t: T): Unit = println(t.toString());
    def foo_n_J[T](T_TypeTag: Byte, t: T @storage[Long]): Unit = println(marker_minibox2box[T, Long](t).toString())
  }
}

[[syntax trees at end of            minibox-commit]] // DR1.scala
package <empty> {
  object DR1 extends Object {
    ...
    def foo[@miniboxed T](t: T): Unit = scala.this.Predef.println(t.toString());
    def foo_n_J[T](T_TypeTag: Byte, t: Long): Unit = println(MiniboxDispatch.mboxed_toString(t, T_TypeTag))
  }
}

It is now clear that miniboxing works in three steps:

minibox-inject duplicates the method and adds the @storage[Long] annotation to the types that will later be transformed into Long
minibox-coerce introduces explicit coercions such as marker_minibox2box[T, Long]
minibox-commit rewrites these coercions, here turning the boxed toString call into MiniboxDispatch.mboxed_toString, an optimized toString implementation
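To make the role of the type tag concrete, here is a hand-written stand-in for a tag-directed toString. It is a sketch, not the plugin's implementation: the tag values, the mboxedToString helper and the fooJ method are all hypothetical, and the real MiniboxDispatch.mboxed_toString may differ.

```scala
object TagDispatchSketch {
  // Hypothetical tag values, chosen for this example only.
  final val IntTag: Byte = 0
  final val LongTag: Byte = 1
  final val DoubleTag: Byte = 2

  // Decode the Long according to the tag, then convert to a String.
  def mboxedToString(bits: Long, tag: Byte): String = tag match {
    case IntTag    => bits.toInt.toString
    case LongTag   => bits.toString
    case DoubleTag => java.lang.Double.longBitsToDouble(bits).toString
    case other     => throw new IllegalArgumentException(s"unknown tag: $other")
  }

  // Hand-written analogue of the specialized method.
  def fooJ(tTag: Byte, t: Long): Unit = println(mboxedToString(t, tTag))

  def main(args: Array[String]): Unit = {
    fooJ(DoubleTag, java.lang.Double.doubleToRawLongBits(3.14)) // prints 3.14
    fooJ(IntTag, 42L)                                           // prints 42
  }
}
```

Dispatching on the tag avoids allocating a boxed object only to call toString on it, which is the kind of saving the commit step aims for.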

So far, this example has shown that miniboxing is indeed structured according to the Data Representation Mechanism, into three phases, which gradually introduce and interpret conversions between different representations.

Next Steps

You can continue with the following resources:

Welcome to the **Miniboxing** **Wiki**
