Tree Utilities

Provides general utility functions to process trees.

edist.tree_utils.check_dfs_structure(adj, i=0)

Verifies that a given adjacency list is in depth-first-search order, in the sense that the descendants of a node i are all indices i+1, i+2, … orl[i], where orl[i] is the outermost right leaf of i.

We verify this by performing a depth-first search over the tree and checking whether the indices are equivalent

Parameters:adj (list) – a directed graph in adjacency list format.
Returns:size – the size of the input tree, i.e. the last DFS index.
Return type:int
Raises:ValueError – if the given adjacency list is not in DFS format.
edist.tree_utils.check_tree_structure(adj)

Verifies that a given adjacency list describes a tree.

In particular, we build a parent representation of the tree and throw an exception if that fails. This is valid because trees are the subclass of directed graphs with a unique parent.

Parameters:adj (list) – a directed graph in adjacency list format.
Raises:ValueError – if the given adjacency list does not form a tree.
edist.tree_utils.dataset_from_json(path)

Reads trees in node list/adjacency list format from all JSON files in the given directory.

Parameters:path (str) – a path to a directory which contains JSON files.
Returns:
  • X (list) – A list of tuples in node list/adjacency list format.
  • filenames (list) – A list of filenames from which we read the trees.
Raises:Exception – if the file access does not work, if some JSON data is malformed, if a parsed data is not a tree, or if a parsed tree is not in depth first search order.
edist.tree_utils.from_json(filename)

Loads a tree in node list/adjacency list format from a JSON file.

Parameters:filename (str) – A JSON filename containing tree data as written by the to_json method.
Returns:
  • nodes (list) – The node list of the tree.
  • adj (list) – The adjacency list of the tree.
Raises:Exception – if the given file is not accessible, if the JSON data is malformed, if the parsed data is not a tree, or if the parsed tree is not in depth first search order.
edist.tree_utils.parents(adj)

Returns the parent representation of the tree with the given adjacency list.

Parameters:adj (list) – The adjacency list of the tree.
Returns:par – a numpy integer array with len(adj) elements, where the ith element contains the index of the parent of the ith node. Nodes without children contain the entry -1.
Return type:int array
edist.tree_utils.root(adj)

Returns the root of a tree and raises an error if the input adjacency matrix does not correspond to a tree.

Parameters:adj (list) – a directed graph in adjacency list format.
Returns:root – the index of the root of the tree.
Return type:int
Raises:ValueError – if the given adjacency list does not form a tree.
edist.tree_utils.subtree(nodes, adj, i)

Returns the subtree rooted at node i in node list/adjacency list format.

Parameters:
  • nodes (list) – The node list of the tree.
  • adj (list) – The adjacency list of the tree.
  • i (int) – The index of the desired subtree.
Returns:

  • nodes_sub (list) – The node list of the subtree rooted at i.
  • adj_sub (list) – The adajency list of the subtree rooted at i.

edist.tree_utils.to_dfs_structure(nodes, adj)

Re-orders a tree to conform to a depth-first search structure.

Note that this method performs a copy and leaves the original tree untouched.

Parameters:
  • nodes (list) – A node list.
  • adj (list) – An adjacency list.
Raises:

ValueError – if the input is not a tree.

edist.tree_utils.to_json(filename, nodes, adj)

Writes a tree in node list/adjacency list format to a JSON file.

Parameters:
  • filename (str) – The filename for the resulting JSON file.
  • nodes (list) – The node list of the tree.
  • adj (list) – The adjacency list of the tree.
Raises:

Exception – if the file is not accessible or the JSON writeout fails.

edist.tree_utils.tree_to_string(nodes, adj, indent=False, with_indices=False)

Prints a tree in node list/adjacency list format as string.

Parameters:
  • nodes (list) – The node list of the tree.
  • adj (list) – The adjacency list of the tree.
  • indent (bool (default = False)) – A boolean flag; if True, each node is printed on a new line.
  • with_indices (bool (default = False)) – A boolean flag; if True, each node is printed with its index.
Raises:

ValueError – if the adjacency list does not correspond to a tree.