Data Representation
This tutorial picks up where the introduction left off. It discusses the data representation aspects of the miniboxing transformation, as explained in the paper Unifying Data Representation Transformations.
In the introduction, we created a class C:
class C[@miniboxed T](val t: T)
We then noticed that its specialized variants, C_L and C_J, included an unusual Tsp @storage[Long] type in the minibox-inject compiler phase, which was later transformed to Long in minibox-commit. Let us look at this process in more detail.
First of all, there are several miniboxing phases in the compiler pipeline:
$ mb-scalac -Xshow-phases
phase name id description
---------- -- -----------
... .. ...
uncurry 13 uncurry, translate function values to anonymous classes
minibox-inject 14
minibox-coerce 15
minibox-commit 16
tailcalls 17 replace tail calls by jumps
... .. ...
Here we see the three main phases introduced by the miniboxing plugin. There are three more phases (pretyper, posttyper and hijacker), which are introduced for purely technical reasons, to maintain compatibility with the rest of the compiler. The three main phases map exactly to the phases of the data representation mechanism:
- minibox-inject duplicates methods and classes and adds the @storage annotation
- minibox-coerce introduces explicit coercions between boxed and miniboxed values
- minibox-commit gives the final semantics to annotated types and coercions
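To make the division of labor between the three phases concrete, here is a toy model written as a small Scala sketch. It is not the plugin's implementation: the types Repr, Expr, Ref, ToBoxed and ToMiniboxed, and the strings produced by commit, are hypothetical. Here inject marks a parameter as miniboxed (standing in for the @storage[Long] annotation), coerce wraps an expression whose representation differs from the one its context expects, and commit turns the remaining markers into calls.

```scala
object PhasesSketch {
  // Hypothetical representations a value of type T can have in this toy model.
  sealed trait Repr
  case object Boxed extends Repr     // plain T
  case object Miniboxed extends Repr // T @storage[Long]

  // A tiny expression language: references plus the two coercion markers.
  sealed trait Expr { def repr: Repr }
  final case class Ref(name: String, repr: Repr) extends Expr
  final case class ToBoxed(e: Expr) extends Expr { def repr: Repr = Boxed }         // like marker_minibox2box
  final case class ToMiniboxed(e: Expr) extends Expr { def repr: Repr = Miniboxed } // reverse direction

  // "inject": in the duplicated method, the parameter is stored as a Long.
  def inject(param: String): Expr = Ref(param, Miniboxed)

  // "coerce": insert a marker only where the representations disagree.
  def coerce(e: Expr, expected: Repr): Expr =
    if (e.repr == expected) e
    else expected match {
      case Boxed     => ToBoxed(e)
      case Miniboxed => ToMiniboxed(e)
    }

  // "commit": give the markers their final meaning (call names are illustrative only).
  def commit(e: Expr): String = e match {
    case Ref(name, _)   => name
    case ToBoxed(x)     => s"minibox2box(${commit(x)}, T_TypeTag)"
    case ToMiniboxed(x) => s"box2minibox(${commit(x)})"
  }
}
```

For example, PhasesSketch.commit(PhasesSketch.coerce(PhasesSketch.inject("t"), PhasesSketch.Boxed)) yields "minibox2box(t, T_TypeTag)", much like t.toString receives a coercion in the compiler output later in this tutorial. Keeping coercion insertion separate from commit means the coerce step only needs to know where representations mismatch, not how each primitive type is decoded.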
This sounds like a lot of work for an otherwise trivial task: transforming a type T to Long. To see why this is necessary, let us take an example:
object DR1 extends App {
def foo[@miniboxed T](t: T): Unit = {
val a: Any = t
println(a)
}
foo(3.14)
}
Compiling this code will produce two versions of the method: foo, the generic variant, and foo_J, which encodes primitive types in a long integer. The last call in the object, foo(3.14), will be rewritten to use foo_J.
Yet the more interesting part is how the statement val a: Any = t is translated. If we simply replaced T by Long, the call foo(3.14) would not print 3.14 as expected, but the long integer encoding of the floating-point number.
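The problem can be reproduced by hand, without the plugin. This sketch assumes that a Double is stored in a Long as its raw IEEE 754 bit pattern (using java.lang.Double.doubleToRawLongBits); the object NaiveReplacement and its helpers are illustrative and not part of the plugin.

```scala
object NaiveReplacement {
  // Assumption: a Double is kept in a Long as its raw IEEE 754 bits.
  def encodeDouble(d: Double): Long = java.lang.Double.doubleToRawLongBits(d)
  def decodeDouble(l: Long): Double = java.lang.Double.longBitsToDouble(l)

  def main(args: Array[String]): Unit = {
    val stored: Long = encodeDouble(3.14)

    // Naive translation of `val a: Any = t`: the Long itself is boxed and printed.
    val naive: Any = stored
    println(naive) // a large integer, not 3.14

    // What the translation must do instead: decode according to the original type first.
    val correct: Any = decodeDouble(stored)
    println(correct) // 3.14
  }
}
```

Decoding requires knowing that the Long holds a Double, which is exactly the information the miniboxed code has to carry alongside the value.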
To start, it is crucial to understand that transforming Tsp to Long in a specialized variant of a class is not enough on its own: coercions (conversions from one representation to the other) need to be introduced correctly and optimally. Consider a variant of foo that calls toString on its argument:
def foo[@miniboxed T](t: T): Unit = {
println(t.toString)
}
Since the miniboxed version of the code, where T is replaced by Long, is used for all primitive types, including Double, simply printing t would not produce the double-precision floating-point number we expect, but its long integer encoding. This is why the miniboxed variant foo_J needs a more refined translation:
def foo_J(T_Tag: Byte, t: Long): Unit = {
println(/* what should be here? */)
}
To test the miniboxing plugin, we need to wrap the foo method in an object:
object DR1 {
def foo[@miniboxed T](t: T): Unit = {
println(t.toString)
}
}
Compiling this example with -Xprint:minibox produces the following output (simplified to improve readability):
$ mb-scalac DR1.scala -Xprint:minibox
warning: 'minibox' selects 3 phases
[[syntax trees at end of minibox-inject]] // DR1.scala
package <empty> {
object DR1 extends Object {
...
def foo[@miniboxed T](t: T): Unit = println(t.toString());
def foo_n_J[T](T_TypeTag: Byte, t: T @storage[Long]): Unit = println(t.toString())
}
}
[[syntax trees at end of minibox-coerce]] // DR1.scala
package <empty> {
object DR1 extends Object {
...
def foo[@miniboxed T](t: T): Unit = println(t.toString());
def foo_n_J[T](T_TypeTag: Byte, t: T @storage[Long]): Unit = println(marker_minibox2box[T, Long](t).toString())
}
}
[[syntax trees at end of minibox-commit]] // DR1.scala
package <empty> {
object DR1 extends Object {
...
def foo[@miniboxed T](t: T): Unit = scala.this.Predef.println(t.toString());
def foo_n_J[T](T_TypeTag: Byte, t: Long): Unit = println(MiniboxDispatch.mboxed_toString(t, T_TypeTag))
}
}
It is now clear that miniboxing works in three steps:
- minibox-inject duplicates the method and adds the @storage[Long] annotation to types that will later be transformed into Long
- minibox-coerce introduces explicit coercions, such as marker_minibox2box[T, Long]
- minibox-commit rewrites these coercions, here into MiniboxDispatch.mboxed_toString, an optimized toString implementation
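The dispatch performed by the committed code can be approximated by hand. In this sketch, the object MiniboxDispatchSketch, its tag constants, and its encode and mboxedToString methods are hypothetical stand-ins: the plugin's actual MiniboxDispatch.mboxed_toString and its tag values may differ. The idea is the same, though: the Long alone does not say what it encodes, so the type tag passed alongside it selects how to decode the value before calling toString.

```scala
object MiniboxDispatchSketch {
  // Hypothetical type tags; the plugin's real tag values may differ.
  final val BOOLEAN: Byte = 0
  final val CHAR: Byte = 1
  final val INT: Byte = 2
  final val DOUBLE: Byte = 3

  // Hypothetical encoder: store a primitive in a Long and remember its tag.
  def encode(v: Any): (Long, Byte) = v match {
    case i: Int     => (i.toLong, INT)
    case d: Double  => (java.lang.Double.doubleToRawLongBits(d), DOUBLE)
    case b: Boolean => (if (b) 1L else 0L, BOOLEAN)
    case c: Char    => (c.toLong, CHAR)
    case other      => throw new IllegalArgumentException(s"not handled in this sketch: $other")
  }

  // Hypothetical analogue of mboxed_toString: decode using the tag, then call toString.
  def mboxedToString(t: Long, tag: Byte): String = tag match {
    case INT     => t.toInt.toString
    case DOUBLE  => java.lang.Double.longBitsToDouble(t).toString
    case BOOLEAN => (t != 0L).toString
    case CHAR    => t.toChar.toString
    case _       => throw new IllegalArgumentException(s"unknown tag: $tag")
  }

  // Hand-written counterpart of the committed foo_n_J from the output above.
  def foo_J(T_TypeTag: Byte, t: Long): Unit = println(mboxedToString(t, T_TypeTag))
}
```

For instance, encoding 3.14 with encode and passing the resulting tag and Long to foo_J prints 3.14 rather than the raw bit pattern.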
So far, this example has shown that miniboxing is indeed structured according to the data representation mechanism, into three phases that gradually introduce and interpret conversions between different representations.
You can continue with the following resources: