Running | Data Representation
This tutorial will pick up where the introduction left off. It will discuss the data representation aspects of the miniboxing transformation, as explained in the Unifying Data Representation Transformations paper.
In the introduction, we created a class C:
class C[@miniboxed T](val t: T)
and noticed that its specialized variants C_L and C_J included a weird Tsp @storage[Long] type in the minibox-inject compiler phase, which was later transformed to Long in minibox-commit. Let us look at this process.
First of all, there are several miniboxing phases in the compiler pipeline:
$ mb-scalac -Xshow-phases
phase name id description
---------- -- -----------
... .. ...
uncurry 13 uncurry, translate function values to anonymous classes
minibox-inject 14
minibox-coerce 15
minibox-commit 16
tailcalls 17 replace tail calls by jumps
... .. ...
Here we see the three main phases introduced by the miniboxing plugin (another three are introduced for purely technical reasons, to maintain compatibility with the rest of the compiler: pretyper, posttyper and hijacker). The three main phases map directly to the phases of the data representation mechanism:
- minibox-inject duplicates methods and classes and adds the @storage annotation
- minibox-coerce introduces explicit coercions between boxed and miniboxed values
- minibox-commit gives the final semantics to annotated types and coercions
This sounds like a lot of work for an otherwise trivial task: transforming a type T to Long. To see why this is necessary, let us take an example:
object DR1 extends App {
def foo[@miniboxed T](t: T): Unit = {
val a: Any = t
println(a)
}
foo(3.14)
}
Compiling this code produces two versions of the method: foo, the generic variant, and foo_n_J, which encodes primitive types in a long integer. The last call in the object, foo(3.14), is rewritten to use foo_n_J.
Yet, the more interesting part is how the val a: Any = t statement is translated. If we simply replace T by Long, a call to foo would not print 3.14 as expected, but the long integer encoding of the floating-point number, which is not desirable.
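To make the problem concrete, here is a minimal sketch of what "the long integer encoding" of a Double looks like. It assumes the encoding reinterprets the IEEE 754 bits of the Double as a Long; the object EncodingDemo and its encode and decode helpers are hypothetical names for illustration, not part of the miniboxing runtime.

```scala
object EncodingDemo {
  // Assumption: a Double is stored by reinterpreting its IEEE 754 bits as a Long.
  def encode(d: Double): Long = java.lang.Double.doubleToRawLongBits(d)

  // The inverse step: reinterpret the Long bits as a Double again.
  def decode(bits: Long): Double = java.lang.Double.longBitsToDouble(bits)

  def main(args: Array[String]): Unit = {
    val bits = encode(3.14)
    println(bits)         // prints a large integer, not 3.14
    println(decode(bits)) // prints 3.14 once the bits are decoded
  }
}
```

Printing the Long directly yields the raw encoding, which is why the translation has to convert the value back before it reaches println.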
Let us see how the miniboxing transformation handles this case (output simplified for readability):
$ mb-scalac DR1.scala -Xprint:uncurry,minibox
warning: 'minibox' selects 3 phases
[[syntax trees at end of uncurry]] // DR1.scala
package <empty> {
object DR1 extends Object with App {
def foo[@miniboxed T](t: T): Unit = {
val a: Any = t;
println(a)
};
DR1.this.foo[Double](3.14)
}
}
[[syntax trees at end of minibox-inject]] // DR1.scala
package <empty> {
object DR1 extends Object with App {
def foo[@miniboxed T](t: T): Unit = {
val a: Any = t;
println(a)
};
def foo_n_J[T](T_TypeTag: Byte, t: T @storage[Long]): Unit = {
val a: Any = t;
println(a)
};
DR1.this.foo_n_J[Double](8, 3.14)
}
}
[[syntax trees at end of minibox-coerce]] // DR1.scala
package <empty> {
object DR1 extends Object with App {
def foo[@miniboxed T](t: T): Unit = {
val a: Any = t;
println(a)
};
def foo_n_J[T](T_TypeTag: Byte, t: T @storage[Long]): Unit = {
val a: Any = marker_minibox2box[T, Long](t);
println(a)
};
DR1.this.foo_n_J[Double](8, marker_box2minibox[Double, Long](3.14))
}
}
[[syntax trees at end of minibox-commit]] // DR1.scala
package <empty> {
object DR1 extends Object with App {
def foo[@miniboxed T](t: T): Unit = {
val a: Any = t;
println(a)
};
def foo_n_J[T](T_TypeTag: Byte, t: Long): Unit = {
val a: Any = MiniboxConversions.this.minibox2box[T](t, T_TypeTag);
println(a)
};
DR1.this.foo_n_J[Double](8, MiniboxConversions.this.double2minibox(3.14))
}
}
The first phase, minibox-inject, creates the two versions of foo and redirects the call foo(3.14) to foo_n_J(DOUBLE, 3.14), where the DOUBLE type tag appears as 8 in the output. At this point in the transformation, the signature of foo_n_J includes T @storage[Long]. This annotation signals that the type will later be represented as a long integer but, at this stage, remains a generic type T. Therefore the code val a: Any = t is still correct, since T @storage[Long] is compatible with Any, the top type in the Scala hierarchy. So far so good...
But the minibox-coerce phase makes annotated types incompatible with their direct counterparts, which, in turn, requires the introduction of explicit coercions between the two. Specifically, the code val a: Any = t is rewritten to val a: Any = marker_minibox2box[T, Long](t), since at this stage of the transformation, T @storage[Long] is no longer a subtype of Any, which is not annotated. As the name suggests, the coercion introduced at this point is a marker, not the final coercion.
The minibox-commit phase commits to the actual alternative representation, which, in this case, is Long. The signature of foo_n_J becomes def foo_n_J[T](T_TypeTag: Byte, t: Long) and the marker coercion is replaced by MiniboxConversions.this.minibox2box[T](t, T_TypeTag).
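The committed code can be approximated by hand, which helps to see why the type tag is passed along with the Long. The sketch below is not the plugin's runtime: CommitSketch and its helper bodies are assumptions written for illustration, and IntTag = 5 is a hypothetical value (only the tag 8 for Double appears in the output above).

```scala
object CommitSketch {
  // Hypothetical tag values; only 8 (Double) is taken from the compiler output.
  final val IntTag: Byte = 5
  final val DoubleTag: Byte = 8

  // Assumed encodings of primitives into the Long representation.
  def double2minibox(d: Double): Long = java.lang.Double.doubleToRawLongBits(d)
  def int2minibox(i: Int): Long = i.toLong

  // Rebuilds the boxed value from its Long encoding, guided by the runtime tag.
  def minibox2box(l: Long, tag: Byte): Any = tag match {
    case IntTag    => l.toInt
    case DoubleTag => java.lang.Double.longBitsToDouble(l)
    case other     => throw new IllegalArgumentException(s"unsupported tag: $other")
  }

  // Hand-written counterpart of the committed foo_n_J.
  def foo_n_J(tag: Byte, t: Long): Unit = {
    val a: Any = minibox2box(t, tag)
    println(a)
  }

  def main(args: Array[String]): Unit = {
    foo_n_J(DoubleTag, double2minibox(3.14)) // prints 3.14
    foo_n_J(IntTag, int2minibox(42))         // prints 42
  }
}
```

The same Long parameter serves both calls; only the tag tells the conversion how to interpret the bits.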
This three-stage transformation allows the miniboxing plugin to robustly, correctly and optimally transform any code, from simple examples to very complex collection library code that uses all the language features, such as higher-kinded types, closures and implicits.
...
To start, it is crucial to understand that simply transforming Tsp to Long in a specialized variant of a class is not a trivial transformation, since coercions (conversions from one representation to the other) need to be introduced correctly and optimally:
def foo[@miniboxed T](t: T): Unit = {
println(t.toString)
}
Since the miniboxed version of the code, where T is replaced by Long, can be used for all primitive types, including Double, simply printing t would not produce the double-precision floating-point number we expect, but its long integer encoding. This is why the miniboxed variant foo_J needs a more refined translation:
def foo_J(T_Tag: Byte, t: Long): Unit = {
println(/* what should be here? */)
}
To test the miniboxing plugin, we need to wrap the foo method in an object:
object DR1 {
def foo[@miniboxed T](t: T): Unit = {
println(t.toString)
}
}
Compiling this example with -Xprint:minibox will produce (the output has been simplified to improve readability):
$ mb-scalac DR1.scala -Xprint:minibox
warning: 'minibox' selects 3 phases
[[syntax trees at end of minibox-inject]] // DR1.scala
package <empty> {
object DR1 extends Object {
...
def foo[@miniboxed T](t: T): Unit = println(t.toString());
def foo_n_J[T](T_TypeTag: Byte, t: T @storage[Long]): Unit = println(t.toString())
}
}
[[syntax trees at end of minibox-coerce]] // DR1.scala
package <empty> {
object DR1 extends Object {
...
def foo[@miniboxed T](t: T): Unit = println(t.toString());
def foo_n_J[T](T_TypeTag: Byte, t: T @storage[Long]): Unit = println(marker_minibox2box[T, Long](t).toString())
}
}
[[syntax trees at end of minibox-commit]] // DR1.scala
package <empty> {
object DR1 extends Object {
...
def foo[@miniboxed T](t: T): Unit = scala.this.Predef.println(t.toString());
def foo_n_J[T](T_TypeTag: Byte, t: Long): Unit = println(MiniboxDispatch.mboxed_toString(t, T_TypeTag))
}
}
It is now clear that miniboxing works in three steps:
- minibox-inject duplicates the method and adds the @storage[Long] annotation to types that need to be later transformed into Long
- minibox-coerce introduces explicit coercions such as marker_minibox2box[T, Long]
- minibox-commit rewrites those coercions, here to MiniboxDispatch.mboxed_toString, which is an optimized toString implementation
So far, this example has shown that miniboxing is indeed structured according to the Data Representation Mechanism, into three phases, which gradually introduce and interpret conversions between different representations.
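As a rough illustration of what a tag-dispatched toString could look like, the sketch below converts the Long encoding straight to a String without first creating a boxed object. It is an assumption about the general technique, not the source of MiniboxDispatch.mboxed_toString; ToStringSketch, toStringByTag and the IntTag value are hypothetical.

```scala
object ToStringSketch {
  // Hypothetical tag values; only 8 (Double) is taken from the compiler output.
  final val IntTag: Byte = 5
  final val DoubleTag: Byte = 8

  // Chooses the primitive toString that matches the runtime tag,
  // so no boxed Integer or Double object is allocated.
  def toStringByTag(t: Long, tag: Byte): String = tag match {
    case IntTag    => java.lang.Integer.toString(t.toInt)
    case DoubleTag => java.lang.Double.toString(java.lang.Double.longBitsToDouble(t))
    case other     => throw new IllegalArgumentException(s"unsupported tag: $other")
  }

  def main(args: Array[String]): Unit = {
    val encoded = java.lang.Double.doubleToRawLongBits(3.14)
    println(toStringByTag(encoded, DoubleTag)) // prints 3.14
  }
}
```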
You can continue with the following resources: