This package provides a querying and transformation facility for abstract syntax trees (ASTs). The language, XForm, loosely follows the syntax and semantics of the XPath 2.0 XML query language, but with added support for AST transformations. However, there are some key differences between the XForm language and XPath, so a careful reading of this document is essential.
Also note that example XForm queries can be found here.
Additional examples can be found in the xform/samples
subdirectory.
XForm is a declarative language that can be used to query or transform an AST. An XForm script, or query, is composed of a series of one or more comma-separated expressions, each returning a sequence of tree items. If the query is composed of a single expression, its result is the value of the expression. However, if a query is composed of more than one expression, its result will be a list of the values of each expression (a list of lists). Note that internally, the query engine implicitly flattens all lists of lists.
It is also important to note that the result of each expression in a list of expressions provides the focus (discussed below) for the expression that follows itAll XForm expressions generate ordered sets of tree items, called sequences
A tree item may be a node within the AST, a string, or null
. Nested
sequences are not permitted.
The most basic expression in XForm is the path expression. A path expression produces a sequence of all items in an AST that conform to a specified template. The format of a path expression template is:
Path expression<- "/" RelativePathExpression? / "inside_out"? "//" RelativePathExpression / RelativePathExpression RelativePathExpression <- RelativePathExpression ("/" / "//") StepExpression / StepExpression StepExpression <- ItemTest PredicateList / ".." ItemTest <- PrimaryExpression / Identifier / "*" PredicateList <- Predicate* Predicate <- "[" (IntegerLiteral / Expression / FunctionCall ) "]" PrimaryExpression <- "." / StringLiteral / VariableReference / FunctionCall / ParenthesizedExpression
The template begins with an optional path specifier - the initial focus of
an expression. The focus of an expression can be thought of as the environment
that an expression is evaluated in - it represents the context for each step
of an expression. So in E1/E2
or E1 [E2]
, the value
of E1
is computed, which then acts as the focus for computing the
value of E2
.
If the specifier is /
, then the search will take place relative
to the AST root. For example, the template/FOO/BAR
will return
all BAR
nodes whose parent is a FOO
node and whose
grandparent is the AST root node. If the specifier is //
, then
all items that satisfy the template will be considered (so //FOO
will collect all FOO
nodes in the AST). If the keyword inside_out
is prepended to a path expression using //
, items will be chosen
based on a bottom-up, breadth-first traversal of the AST. If the path specifier
is omitted, then the nodes collected will have to be relative to the first step
expression in the template (so $x/FOO
will collect all FOO
children of the nodes in the sequence $x
). If a step expressions
are separated by//
, the latter expression will be matched against
all descendents of the former, rather than just its children. (So, //FOO//BAR
will collect all FOO
descendents of all BAR
nodes).
The individual steps of a path expression are known as step expressions. There
are two different kinds of step expressions -forward and reverse. A forward
step expression filters the children or the descendents of the current focus,
whereas the reverse step expression, ..
, moves back up the tree,
to the level of the focus' parents. (E.g.. //FOO/..
would collect
the parents of every FOO
node in the AST).
Of the forward step expressions, tests include matching against the names of
an item (as in /FOO/BAR
or/FOO/"j"
), matching all
nodes (/FOO/*
), or matching against a primary expression.
A primary expression represents a value. There are five kinds.
.
, which represents the current tree item
being processed (so //FOO/.
would resolve to all FOO
nodes in the tree)."baz"
.$stat
. Variable references begin
with a dollar-sign, and represent a previously bound sequence of items.test($a,$b)
, which can be used to
call an external function, and whose value and parameters are all sequences.
Instructions on how to add external functions programmed in Java are described
later.(//CompoundStatement)
whose
value is that of the expressions it contains, concatenated together.A step expression may contain one or more predicates. A predicate takes the
form of [ Expression ]
, and represents a sequence. Each predicate
will intersect its sequence with the current focus, producing a new focus for
the evaluation of any subsequent predicates. The value of the final intersection
is the value of the step expression.
A predicate containing an integer literal (beginning at one) represents an
index into the current focus. So //FOO [1]
would return the first
FOO
node in the AST.
When using predicates, it is important to note that within a predicate, the
focus becomes that of the current step expression. So, for example, the
value of //FOO/BAR [BAZ]
, is a sequence containing all BAR
nodes with FOO
parents and BAZ
children.
Also note that tree traversals are done in a depth-first, in-order manner.
If you would like to do an "inner" traversal, that is, traverse the tree in
a bottom-up manner, you may use the inner
keyword. For example,
replace inner //ForStatement with FOO<>
will replace all ForStatement
nodes in the AST with FOO
nodes, but the order of replacement will be inside-out.
In addition to searching an AST, XForm provides facilities for manipulating sequences. These facilities include binding variables, looping through sequences, conditionals, creating new items and evaluating logical operations over sequences.
To bind a sequence to a variable for later use, one can use a let
expression. The syntax of a let expression is as follows:
LetExpression <- "let" VariableBindingList "return" SingleExpression VariableBindingList <- VariableBinding ("," VariableBinding)* VariableBinding <- VariableReference "be" SingleExpression
A let expression binds one or more sequences to variables for the duration of its single expression. For example,
let $f be //ForStatement, $cs be //CompoundStatement return ($f, $cs)would return a sequence composed of all the for statements and all the compound statements in an AST (a parenthesized expression concatenates the results of the expressions within).
Two options exist to iterate through sequences - for
and cfor
.
Their formats are:
ForExpression <- "for" IterativeBindingList "return" SingleExpression CForExpression <- "cfor" IterativeBindingList "return" SingleExpression IterativeBindingList <- IterativeBinding ("," IterativeBinding)* IterativeBinding <- VariableReference "in" SingleExpression
A for statement iterates through one or more sequences, binding the resultant singletons to the specified variables (the bindings hold for the duration of the expression's single expression) and evaluating its single expression. If more than one sequence is specified to iterate through, the iterations will be implicitly nested. So,
for $x in //FOO, $y in //BAR return $x union $yis equivalent to
for $x in //FOO return for $y in //BAR return $x union $y
A cfor statement acts like a for statement, except that it iterates each of its bound variables concurrently, halting when one of them reaches its end. In either case, the resulting sequence of a looping expression is the concatenation of each sequence that it returns.
Conditional expressions in XForm take the following form:
IfExpression <- "if" SingleExpression "then" SingleExpression "else" SingleExpression
Conditional expressions evaluate the first single expression. If that expression is a non-empty sequence, the second single expression is evaluated and its value is returned. Otherwise, the third single expression is evaluated and its value is returned.
Set operations XForm take the following form:
UnionExpression <- UnionExpression "union" IntersectionExpression IntersectionExpression <- IntersectionExpression "intersect" PathExpression DifferExpression <- DifferExpression "differ" LogicalExpression
A union expression returns the union of two sequences, whereas an intersection expression returns their intersection and the differ expression returns their difference. Note that these operations are based on identity (so union is not a concatenation). If you wish to concatenate two single expressions, wrap them in a parenthesized expression
or
and .
An or
expression allows you to group together a series of pathexpressions
and select the value of the first path expression that returns a non-null sequence.
For example, the query:
//Foo or //Bar
will select the sequence returned by //Foo
if that sequence is non-null.
Should the sequence be null, the results of //Bar
will be used.
An and
expression also lets you group together a series of pathexpressions.
If the value of each path expression in the series is non-null, all of the values
are concatenated together into a single sequence and then returned. Otherwise,
a null value is returned. For example, the query:
//Foo and //Bar
will return each Foo
and Bar
node in the tree, should the tree contain both Foo
and Bar
nodes. Otherwise,
it will return a null value.
Note that these logical operations can be embedded inside of a pathexpression
by way of parenthesized expressions. For example,
//Foo/(Bar or Zoo)
will return all Bar
children of Foo
nodes, should any
exist. Otherwise, it will attempt to return any Zoo
children of Foo
nodes. Whereas,
//Foo/(Bar and Zoo)
will return any Bar
nodes with Foo
parents and Zoo
siblings, and any Zoo
nodes with Foo
parents and Bar
siblings, should both sequences be non-null.
New tree items may be generated with a new item expression. A newitem expression creates a singleton sequence containing a new tree item. It takes the following form:
NewItemExpression <- "null" / StringLiteral / NewNodeExpression NewNodeExpression <- Identifier "<" Children ">" Children <- Child ("," Child)* Child <- "null" / NewNodeExpression / SingleExpression / /* empty */
Note that the last type of child is blank, which means that the newly created
node will have no children (which is not the same as having a null child).
So FOO<>
would create a FOO
node with no children,
whereas FOO<null>
would create a FOO
node with
a single null child.
Modifications to abstract syntax trees are done using replacement expressions, removal expressions, addition expressions, or insertion expressions. Users transform ASTs in one of two ways. The first is to query the AST for items to replace and then replacing them with new or existing items. The second way is to query the AST for insertion position markers and then inserting new or existing items.
The format of a replacement expression is:
ReplacementExpression <- "replace" SingleExpression "with" SingleExpression
The value of a replacement expression is a sequence containing the items that have been inserted (or moved) within the tree, otherwise, if no replacements have been made, it's an empty sequence. For example,
for $f in //ForStatement replace $f with CompoundStatement<>would replace all of the for statements in an AST with new compoundstatements, whereas
for $f in //ForStatement replace $f with //CompoundStatement [1]would replace each for statement in the tree with one the tree's first compound statement. Note that a sequence need not be bound to a variable for its items to be replaced, so
replace //BAR with FOO<>would replace any
BAR
nodes in the AST with FOO
nodes.
Note that in the case of a replacement expression, in the context of a for expression, you can omit the "return".
The format of a remove Expression is:
"remove" SingleExpression
For example, remove //IfStatement
removes all IfStatements from
the AST.
The format of an add Expression is:
"add" SingleExpression "to" SingleExpression
For example, add Foo<> to //Bar
adds a Foo node (as the last child)
to all Bar nodes.
The format of an insert expression is:
insert SingleExpression before SingleExpression
insert SingleExpression after SingleExpression
For example, insert Foo<> before //Bar<>
would create
and insert new Foo<> items before every Bar node in the AST.
The value of an insertion operation is a sequence containing the items inserted,
or an empty sequence if no insertions are made. insert expression operate only on
none empty sequences. To illustrate, insert Foo<> before //Block/*[1]
adds a Foo<> node before the first child in a Block node. It will not, however, add
a Foo<> node to a an empty Block. This behaviour, if desired, can be implemented with
addtional commands; for example, replace empty(//Block<>) with Block< Foo<> >
A user may add functionality to XForm by way of external functions. To add an external function to XForm, one must do the following:
getName
method return the name of the function
as it will be called, and have the apply
method return a list
of items.import
statement
at the beginning of your query. The import statement takes a list of quoted
class names, and loads the named classes. For example,
import "xtc.xform.IsNullFunction","xtc.xform.TestFunction"
would load
the external function classes"IsNullFunction" and "TestFunction".
Example0: Using Xform queries to find empty IfStatements
1. //example0.java
2. class example1{
3. public void bar(){
4. Boolean foo = true;
5. if( foo ){
6. //empty!
7. }
8. int i = 0;
9. if( foo ) //no braces!
10. i++; //not empty
11. if( !foo ); //empty!!
12. }
13. }
In example0.java above, the IfStatements
on lines 5 and
6 are empty. To find them using xform, we must first identify the AST structure
of empty IfStatement
s. Using the xform driver's -preAST
function, we deduce that empty IfStatements
come in two flavours:
either an IfStatement
item with an empty Block child or an
IfStatement
item with an EmptyStatement child. The query
1. #find all the emptyIfStatements
2. empty(//IfStatement/Block)/.. union //IfStatement[ EmptyStatement ]
will return all emptyIfStatement
. The XForm library function empty
returns sequence items that have no children so empty(//IfStatement/Block)
will return all empty blocks belong to IfStatement
. Adding /..
will return the parents of those blocks. //IfStatement[ EmptyStatement
]
returns all IfStatement items that have an EmptyStatement
child. The union of the two expressions gives all Statements in the program.
For additional feedback, the library function lines() can be used to report the line and column information of the items identified. Executing the with query
1. #example0.xform
2. #find all the emptyIfStatements
3. lines( empty(//IfStatement/Block)/.. union //IfStatement[ EmptyStatement ] )
with java xtc.xform.Driver -java example0.xform example0.java would display:
example0.java: 5:7
example0.java: 11:7
Example1: Using XForm transformations to ensure that all IfStatements in a Java program have braces.
1. //example1.java
2. class example1{
3. public void bar(){
4. Boolean foo = true;
5 if( foo ){
6. //this one is good
7. }
8. int i = 0;
9. if( foo ) //no braces!
10. i++;
11. }
12. }
In example1.java above the IfStatement on line 8 violates the coding
convention that states all if statements should have curly braces. To correct
this using xform transformations, we must first query the program's AST to find
all braceless IfStatement
. Using the xform Driver's preAST command
line option (or alternately figuring it out from the grammar), we realize that
the AST representation of a braceless if Statement does not have a Block enclosing
it's body.
The IfStatement on line 5 is represented as
IfStatement<
PrimaryIdentifier<"foo">,
Block<>
>
while the one on line 8 is represented as
IfStatement<
PrimaryIdentifier<"foo">,
ExpressionStatement<
PostfixExpression<
PrimaryIdentifier<"i">,
PostincrementTail<>
>
>
>
We can find braceless IfStatements
with the XForm query
//IfStatement differ IfStatement[ Block ]
The complete query to add missing braces is
1. #example1.xform
2 #add missing braces toIfStatement
3. for $i in //IfStatement differ //IfStatement[ Block ] return
4. replace $i with IfStatement< $i/*[1], Block< $i/*[2] > >
Line 3 in the addifbrace.xform iterates over each Blockless if statement
in the AST Line 4 creates a new IfStatement item with two children and replaces
the original. The first child in the new item is the same as the first child
of the original IfStatement - meaning the if( foo )
part of the
statement is rewritten/preserved as if( foo )
. The second child
is a new Block item containing the second child of the original IfStatement.
The command java xtc.xform.Driver -java -printJava example1.xform
example1.java
produces the output
1. class example1{
2. public void bar(){
3. Boolean foo = true;
4 if ( foo ){
5. }
6. int i = 0;
7. if ( foo ){
8. i++;
9. }
10. }
11. }
Example2: Transforming for loops to enforce 'Must Have Braces' rule.
The procedure for transforming ForStatement
s similar to the IfStatement
transformation in Example1. The difference, however, is that the first 3 children
of each Blockless ForStatement
must be preserved in the rewrite.
This can be done with the query
1. #add missing forStatement braces
2. for $f in //ForStatement differ //ForStatement[ Block ] return
3. replace $f with ForStatement< $f/*[1], $f/*[2], $f/*[3], Block< $f/*[4]> >
or by using XForms subsequence()
library function in the query
1. #example2.xform
2. for $f in //ForStatement differ //ForStatement[ Block ] return
3. replace $f with ForStatement< subsequence( $f/*, 1, 3 ), Block< $f/*[4] > >
1. //example2.java
2. class example2{3. public void bar(){
4. int j = 0;
5. for( int i = 0; i < 10; i++ )
6. j++;
7.
8. for(;;)
9. j--;
10. }
11.}
On the file example2.java (shown above), the query example2.xform produces the following transformation.
1. class example2{
2. public void bar(){
3. int j = 0;
4. for( int i = 0; i < 10; i++ ){
5. j++;
6. }
7. for (;;){
8. j--;
9. }
10. }
11.}
Example3: Using XTC and Xform to create an simple extension to Java
(JavaProperty), that adds C# like properties. Note that this example does not
have the full set of C# property features. Note also that all code in this example
can be found in xform/samples/javaproperty
In our extension, we wish to write property declarations in the form "property int foo". A desugaring transformation convert the JavaProperty code to Java code that contains generate accessors methods for this variable in this declaration. For example the JavaProperty code:
1. //sample.jprop2. class sample{3. property int foo;4. }
should desugared to the following Java code
1. //sample.java2. class sample{
3. private int foo;
4. public void setfoo( int val ){
5. foo = val;
6. }
7. public int getfoo(){
8. return foo;
9. }
10. }
First we use xtc to define a grammar for the extension. This involves adding
"property" as a keyword and as a Modifer.
The JavaPropertyKW.Rats file below adds 'property' as a keyword to the existing
list of Java keywords.
1. //JavaPropertyKW.rats
2. module xtc.xform.samples.javaproperty.JavaPropertyKW;
3. import xtc.lang.JavaIdentifier(xtc.lang.JavaSymbol, xtc.util.Spacing);
4.
5. body {
6. static{
7. add(JAVA_KEYWORDS, new Object[] { "property" });
8. }
9. }
The JavaProperty.Rats file below modifies the existing Java core grammar (JavaCore)
to add 'property' as a Modifier. For more details on grammar modification see:
Rats!
1. //JavaProperty.Rats
2. module xtc.xform.samples.javaproperty.JavaProperty;
3. instantiate xtc.lang.JavaType(xtc.lang.JavaIdentifier, xtc.lang.JavaSymbol);
4. instantiate xtc.lang.JavaConstant(xtc.lang.JavaIdentifier, xtc.util.Spacing);
5. instantiate xtc.lang.JavaIdentifier(xtc.lang.JavaSymbol, xtc.util.Spacing);
6. instantiate xtc.util.Symbol(xtc.util.Spacing);
7. instantiate xtc.lang.JavaSymbol(xtc.util.Symbol);
8. instantiate xtc.util.Spacing;
9. import xtc.xform.samples.javaproperty.JavaPropertyKW;
10.
11. modify xtc.lang.JavaCore(xtc.lang.JavaType, xtc.lang.JavaConstant, xtc.lang.JavaIdentifier, xtc.lang.JavaSymbol,xtc.util.Spacing, xtc.util.Null );
12. option withLocation, constant, parser(xtc.xform.JavaPropertyParser);
13.
14. String Modifier += <Strictfp> ... / "property":Word;
We can now generate a parser for our JavaProperty grammar by typing 'make'
in xform/samples/javaproperty
to execute the following commands:
java xtc.parser.Rats -in ~/java/src JavaProperty.rats
javac -source 1.4 -d ~/java/classes -sourcepath ~java/src JavaPropertyParser.java
Finally, using the Xform query JPTrans.xform shown below, we can transform
JavaProperty code to Java code with the command:
java xtc.xform.Driver -printJava -parser xtc.xform.samples.javaproperty.JavaPropertyParser
JPTrans.xform sample.jprop
This command line instructs the XForm engine to 1. parse sample.jprop
using the JavaPropertyParser
2. Execute the query JPTrans.xform
3. Pretty print the resulting code using xtc's Java PrettyPrinter
JPTrans.xform is explained as follows: Line 5 finds all property declarations. Lines 6-18 inserts getter methods after each property declaration. Line 7 makes the method public. Line 8 sets the methods return type to be the same as the field type. Line 9 sets the method name as the concatenation of "get" and the field's name. Lines 14-16 add a return statement with the field name as an identifier. Lines 21-45 add a setter method after each property declaration. This is similar to the xform code for adding getter methods, the main differences are the return types and an assignment statement instead of a Return statement. Last, lines 49-55 rewrite all property declarations to private field declarations with the same name and type.
1. #JPTrans.xform 2. #XFrom desugaring from JavaProperty to pure Java. 3. 4. #add a getter method for the field 5. for $i in //FieldDeclaration/Modifiers/"property"/../.. return 6. insert MethodDeclaration<7. Modifiers<"public">, 8. $i/*[2], 9. concat( "get", $i/Declarators/Declarator/*[1] ), 10. FormalParameters<>, 11. null, 12. null, 13. Block< 14. ReturnStatement< 15. PrimaryIdentifier<$i/Declarators/Declarator/*[1]> 16. > 17. > 18. > after $i, 19. 20. #add a setter method for the field 21. for $i in //FieldDeclaration/Modifiers/"property"/../.. return 22. insert MethodDeclaration< 23. Modifiers<"public">, 24. VoidTypeSpecifier<>, 25. concat( "set", $i/Declarators/Declarator/*[1] ) , 26. FormalParameters< 27. FormalParameter< 28. null, 29. $i/*[2], 30. "val", 31. null 32. > 33. >, 34. null, 35. null, 36. Block< 37. ExpressionStatement< 38. Expression< 39. PrimaryIdentifier<$i/Declarators/Declarator/*[1]>, 40. "=", 41. PrimaryIdentifier<"val"> 42. > 43. > 44. > 45. > after $i, 46. 47. #replace the property declaration with a private field declaration 48. for $i in //FieldDeclaration/Modifiers/"property"/../.. return 49. replace $i with FieldDeclaration< 50. Modifiers<"private">, 51. $i/*[2], 52. Declarators< 53. Declarator< $i/Declarators/Declarator/*[1], null, null > 54. > 55. > 56.