Unordered Tree Edit Distance

Implements the constrained unordered tree edit distance of Zhang (1996) and its backtracing in cython.

edist.uted.adjmat_()

Converts an adjacency list into an int array

edist.uted.munkres()

This calls the Munkres algorithm to find a minimal matching for the given cost matrix C. Note that this function is more for debugging purposes. If you want to call this algorithm from Python, you are better served by calling scipy.optimize.linear_sum_assignment.

Parameters:C (ndarray) – An m x m cost matrix.
Returns:pi – An m-element array where pi[i] is the index to which i is assigned.
Return type:ndarray
edist.uted.uted()
Computes the constrained, unordered tree edit distance between the
trees x and y, each described by a list of nodes and an adjacency list

adj, where adj[i] is a list of indices pointing to children of node i.

Note that we assume a proper depth-first-search order of adj, i.e. for every node i, the following indices are all part of the subtree rooted at i until we hit the index of i’s right sibling or the end of the tree.

Parameters:
  • x_nodes (list or tuple) – a list of nodes for tree x OR a tuple of the form (x_nodes, x_adj).
  • x_adj (list or tuple) – an adjacency list for tree x OR a tuple of the form (y_nodes, y_adj).
  • y_nodes (list (default = x_adj[0])) – a list of nodes for tree y.
  • y_adj (list (default = x_adj[1])) – an adjacency list for tree y.
  • delta (function (default = None)) – a function that takes two nodes as inputs and returns their pairwise distance, where delta(x, None) should be the cost of deleting x and delta(None, y) should be the cost of inserting y. If undefined, this method uses unit costs.
Returns:

d – the tree edit distance between x and y according to delta.

Return type:

float

edist.uted.uted_backtrace()

Computes the unordered tree edit distance between the trees x and y, each described by a list of nodes and an adjacency list adj, where adj[i] is a list of indices pointing to children of node i. This function returns an alignment representation of the distance.

Note that we assume a proper depth-first-search order of adj, i.e. for every node i, the following indices are all part of the subtree rooted at i until we hit the index of i’s right sibling or the end of the tree.

Parameters:
  • x_nodes (list or tuple) – a list of nodes for tree x OR a tuple of the form (x_nodes, x_adj).
  • x_adj (list or tuple) – an adjacency list for tree x OR a tuple of the form (y_nodes, y_adj).
  • y_nodes (list (default = x_adj[0])) – a list of nodes for tree y.
  • y_adj (list (default = x_adj[1])) – an adjacency list for tree y.
  • delta (function (default = None)) – a function that takes two nodes as inputs and returns their pairwise distance, where delta(x, None) should be the cost of deleting x and delta(None, y) should be the cost of inserting y. If undefined, this method calls standard_ted instead.
Returns:

alignment – A co-optimal alignment between x and y according to the unordered tree edit distance.

Return type:

class alignment.Alignment