Tree automaton
A tree automaton is a type of state machine. Tree automata deal with tree structures, rather than the strings of more conventional state machines.
The following article deals with branching tree automata, which correspond to regular languages of trees.
As with classical automata, finite tree automata (FTA) can be either a deterministic automaton or not. According to how the automaton processes the input tree, finite tree automata can be of two types: (a) bottom up, (b) top down. This is an important issue, as although nondeterministic (ND) topdown and ND bottomup tree automata are equivalent in expressive power, deterministic topdown automata are strictly less powerful than their deterministic bottomup counterparts, because tree properties specified by deterministic topdown tree automata can only depend on path properties. (Deterministic bottomup tree automata are as powerful as ND tree automata.)
Definitions
A bottomup finite tree automaton over F is defined as a tuple (Q, F, Q_{f}, Δ), where Q is a set of states, F is a ranked alphabet (i.e., an alphabet whose symbols have an associated arity), Q_{f} ⊆ Q is a set of final states, and Δ is a set of transition rules of the form f(q_{1}(x_{1}),...,q_{n}(x_{n})) → q(f(x_{1},...,x_{n})), for an nary f ∈ F, q, q_{i} ∈ Q, and x_{i} variables denoting subtrees. That is, members of Δ are rewrite rules from nodes whose childs' roots are states, to nodes whose roots are states. Thus the state of a node is deduced from the states of its children.
For n=0, that is, for a constant symbol f, the above transition rule definition reads f() → q(f()); often the empty parentheses are omitted for convenience: f → q(f). Since these transition rules for constant symbols (leaves) do not require a state, no explicitly defined initial states are needed. A bottomup tree automaton is run on a ground term over F, starting at all its leaves simultaneously and moving upwards, associating a run state from Q with each subterm. The term is accepted if its root is associated to an accepting state from Q_{f}.^{[1]}
A topdown finite tree automaton over F is defined as a tuple (Q, F, Q_{i}, Δ), with two differences with bottomup tree automata. First, Q_{i} ⊆ Q, the set of its initial states, replaces Q_{f}; second, its transition rules are oriented conversely: q(f(x_{1},...,x_{n})) → f(q_{1}(x_{1}),...,q_{n}(x_{n})), for an nary f ∈ F, q, q_{i} ∈ Q, and x_{i} variables denoting subtrees. That is, members of Δ are here rewrite rules from nodes whose roots are states to nodes whose childs' roots are states. A topdown automaton starts in some of its initial states at the root and moves downward along branches of the tree, associating along a run a state with each subterm inductively. A tree is accepted if every branch can be gone through this way.^{[2]}
A tree automaton is called deterministic (abbreviated DFTA) if no two rules from Δ have the same left hand side; otherwise it is called nondeterministic (NFTA).^{[3]} Nondeterministic topdown tree automata have the same expressive power as nondeterministic bottomup ones;^{[4]} the transition rules are simply reversed, and the final states become the initial states.
In contrast, deterministic topdown tree automata^{[5]} are less powerful than their bottomup counterparts, because in a deterministic tree automaton no two transition rules have the same lefthand side. For tree automata, transition rules are rewrite rules; and for topdown ones, the lefthand side will be parent nodes. Consequently, a deterministic topdown tree automaton will only be able to test for tree properties that are true in all branches, because the choice of the state to write into each child branch is determined at the parent node, without knowing the child branches contents.
Examples
Bottomup automaton accepting boolean lists
Employing coloring to distinguish members of F and Q, and using the ranked alphabet F={ false,true,nil,cons(.,.) }, with cons having arity 2 and all other symbols having arity 0, a bottomup tree automaton accepting the set of all finite lists of boolean values can be defined as (Q, F, Q_{f}, Δ) with Q={ Bool,BList }, Q_{f}={ BList }, and Δ consisting of the rules
false  →  Bool(false)  (1), 
true  →  Bool(true)  (2), 
nil  →  BList(nil)  (3), and 
cons(Bool(x_{1}),BList(x_{2}))  →  BList(cons(x_{1},x_{2}))  (4). 
In this example, the rules can be understood intuitively as assigning to each term its type in a bottomup manner; e.g. rule (4) can be read as "A term cons(x_{1},x_{2}) has type BList, provided x_{1} and x_{2} has type Bool and BList, respectively". An accepting example run is
cons(  false,  cons(  true,  nil  ))  
⇒  cons(  false,  cons(  true,  BList(nil)  ))  by (3) 
⇒  cons(  false,  cons(  Bool(true),  BList(nil)  ))  by (2) 
⇒  cons(  false,  BList(cons(  true,  nil  )))  by (4) 
⇒  cons(  Bool(false),  BList(cons(  true,  nil  )))  by (1) 
⇒  BList(cons(  false,  cons(  true,  nil  )))  by (4), accepted. 
Cf. the derivation of the same term from a regular tree grammar corresponding to the automaton, shown at Regular tree grammar#Examples.
An rejecting example run is
cons(  false,  true  )  
⇒  cons(  false,  Bool(true)  )  by (1) 
⇒  cons(  Bool(false),  Bool(true)  )  by (2), no further rule applicable. 
Intuivitvely, this corresponds to the term cons(false,true) not being welltyped.
Topdown automaton accepting multiples of 3 in binary notation
(A)  (B)  (C)  (D)  

String grammar rules 
String automaton transitions 
Tree automaton transitions 
Tree grammar rules 






Using the same colorization as above, this example shows how tree automata generalize ordinary string automata. The finite deterministic string automaton shown in the picture accepts all strings of binary digits that denote a multiple of 3. Using the notions from Deterministic finite automaton#Formal definition, it is defined by:
 the set Q of states being { S_{0}, S_{1}, S_{2} },
 the input alphabet being { 0, 1 },
 the initial state being S_{0},
 the set of final states being { S_{0} }, and
 the transitions being as shown in column (B) of the table.
In the tree automaton setting, the input alphabet is changed such that the symbols 0 and 1 are both unary, and a nullary symbol, say nil is used for tree leaves. For example, the binary string "110" in the string automaton setting corresponds to the term "1(1(0(nil)))" in the tree automaton setting; this way, strings can be generalized to trees, or terms. The topdown finite tree automaton accepting the set of all terms corresponding to multiples of 3 in binary string notation is then defined by:
 the set Q of states being still { S_{0}, S_{1}, S_{2} },
 the ranked input alphabet being { 0, 1, nil }, with Arity(0)=Arity(1)=1 and Arity(nil)=0, as explained,
 the set of initial states being { S_{0} }, and
 the transitions being as shown in column (C) of the table.
For example, the tree "1(1(0(nil)))" is accepted by the following tree automaton run:
S_{0}(  1(  1(  0(  nil  ))))  
⇒  1(  S_{1}(  1(  0(  nil  ))))  by 2  
⇒  1(  1(  S_{0}(  0(  nil  ))))  by 4  
⇒  1(  1(  0(  S_{0}(  nil  ))))  by 1  
⇒  1(  1(  0(  nil  )))  by 0 
In contrast, the term "1(0(nil))" leads to following nonaccepting automaton run:
⇒ S_{0}(  1(  0(  nil  )))  
⇒  1(  S_{1}(  0(  nil  ))))  by 2  
⇒  1(  0(  S_{2}(  nil  ))))  by 3, no further rule applicable 
Since there are no other initial states than S_{0} to start an automaton run with, the term "1(0(nil))" is not accepted by the tree automaton.
For comparison purposes, the table gives in column (A) and (D) a (right) regular (string) grammar, and a regular tree grammar, respectively, each accepting the same language as its automaton counterpart.
Properties
Recognizability
For a bottomup automaton, a ground term t (that is, a tree) is accepted if there exists a reduction that starts from t and ends with q(t), where q is a final state. For a topdown automaton, a ground term t is accepted if there exists a reduction that starts from q(t) and ends with t, where q is an initial state.
The tree language L(A) accepted, or recognized, by a tree automaton A is the set of all ground terms accepted by A. A set of ground terms is recognizable if there exists a tree automaton that accepts it.
A linear (that is, aritypreserving) tree homomorphism preserves recognizability.^{[6]}
Completeness and reduction
A nondeterministic finite tree automaton is complete if there is at least one transition rule available for every possible symbolstates combination. A state q is accessible if there exists a ground term t such that there exists a reduction from t to q(t). An NFTA is reduced if all its states are accessible.^{[7]}
Pumping lemma
Every sufficiently large^{[8]} ground term t in a recognizable tree language L can be vertically tripartited^{[9]} such that arbitrary repetition ("pumping") of the middle part keeps the resulting term in L.^{[10]}^{[11]}
For the language of all finite lists of boolean values from the above example, all terms beyond the height limit k=2 can be pumped, since they need to contain an occurrence of cons. For example,
cons(false,  cons(true,nil)  )  , 
cons(false,cons(false,  cons(true,nil)  ))  , 
cons(false,cons(false,cons(false,  cons(true,nil)  )))  , ... 
all belong to that language.
Closure
The class of recognizable tree languages is closed under union, under complementation, and under intersection.^{[12]}
Myhill–Nerode theorem
A congruence on the set of all trees over a ranked alphabet F is an equivalence relation such that u_{1} ≡ v_{1} and ... and u_{n} ≡ v_{n} implies f(u_{1},...,u_{n}) ≡ f(v_{1},...,v_{n}), for every f ∈ F. It is of finite index if its number of equivalenceclasses is finite.
For a given treelanguage L, a congruence can be defined by u ≡_{L} v if C[u] ∈ L ⇔ C[v] ∈ L for each context C.
The Myhill–Nerode theorem for tree automaton states that the following three statements are equivalent:^{[13]}
 L is a recognizable tree language
 L is the union of some equivalence classes of a congruence of finite index
 the relation ≡_{L} is a congruence of finite index
See also
 Courcelle's theorem  an application of tree automata to prove an algorithmic metatheorem about graphs
 Tree transducers  extend tree automata in the same way that word transducers extend word automata.
 Alternating tree automata
 Infinitetree automata
Notes
 ^ Comon et al. 2008, sect. 1.1, p. 20.
 ^ Comon et al. 2008, sect. 1.6, p. 38.
 ^ Comon et al. 2008, sect. 1.1, p. 23.
 ^ Comon et al. 2008, sect. 1.6, theorem 1.6.1, p. 38.
 ^ In a strict sense, deterministic topdown automata are not defined by Comon et al. (2008) but they are used there (sect. 1.6, proposition 1.6.2, p. 38). They accept the class of pathclosed tree languages (sect. 1.8, exercise 1.6, p. 4344).
 ^ The notion in Comon et al. (2008, sect. 1.4, theorem 1.4.3, p. 3132) of tree homomorphism is more general than that of the article "tree homomorphism".
 ^ Comon et al. 2008, sect. 1.1, p. 2324.
 ^ Formally: height(t) > k, with k > 0 depending only on L, not on t
 ^ Formally: there is a context C[.], a nontrivial context C’[.], and a ground term u such that t = C[C’[u]]. A "context" C[.] is a tree with one hole (or, correspondingly, a term with one occurrence of one variable). A context is called "trivial" if the tree consists only of the hole node (or, correspondingly, if the term is just the variable). The notation C[t] means the result of inserting the tree t into the hole of C[.] (or, correspondingly, instantiating the variable to t). Comon et al. 2008, p. 17, gives a formal definition.
 ^ Formally: C[C’^{n}[u]] ∈ L for all n ≥ 0. The notation C^{n}[.] means the result of stacking n copies of C[.] one in another, cf. Comon et al. 2008, p. 17.
 ^ Comon et al. 2008, sect. 1.2, p. 29.
 ^ Comon et al. 2008, sect. 1.3, theorem 1.3.1, p. 30.
 ^ Comon et al. 2008, sect. 1.5, p .36.
References
 Comon, Hubert; Dauchet, Max; Gilleron, Rémi; Jacquemard, Florent; Lugiez, Denis; Löding, Christof; Tison, Sophie; Tommasi, Marc (November 2008). Tree Automata Techniques and Applications (PDF). Retrieved 11 February 2014.
 Hosoya, Haruo (4 November 2010). Foundations of XML Processing: The TreeAutomata Approach. Cambridge University Press. ISBN 9781139492362.
External links
Implementations
 Grappa  ranked and unranked tree automata libraries (OCaml)
 Timbuk  tools for reachability analysis and tree automata calculations (OCaml)
 LETHAL  library for working with finite tree and hedge automata (Java)
 Machinechecked tree automata library (Isabelle [OCaml, SML, Haskell])
 VATA  a library for efficient manipulation of nondeterministic tree automata (C++)