Set Edit Distance

Provides a set edit distance to compare two sets (each represented as lists for computational ease).

edist.seted.seted()

Computes the set edit distance between the input set x and the input set y, given the element-wise distance function delta.

In more detail, this function finds an alignment M = {(i, j)} subset of {1, …, len(x)} x {1, …, len(y)}, such that the following loss is minimized:

\[\sum_{(i, j) \in M} \delta(x_i, y_j) + \sum_{i \notin M} \delta(x_i, -) + \sum_{j \notin M} \delta(-, y_j)\]

where i is said to be not in M if there exists no tuple (i, j) in M for any j, and j is said to be not in M if there exists no tuple (i, j) in M for any i.

This problem can be solved via the Hungarian or Munkres algorithm in O([len(x) + len(y)]^3). So be advised that this method can become very slow for large input sets.

Parameters:
  • x (list-like) – a set of objects.
  • y (list-like) – another set of objects.
  • delta (function (default = None)) – a function that takes an element of x as first and an element of y as second input and returns the distance between them. If not given, standard_seted is called instead.
Returns:

d – the set edit distance between x and y according to delta.

Return type:

float

edist.seted.seted_backtrace()

Computes a co-optimal alignment between the two input sequences x and y, given the element-wise distance function delta. This mechanism is deterministic and depends on the implementation of scipy.optimize.linear_sum_assignment for the choice of co-optimal alignment.

Parameters:
  • x (list-like) – a set of objects.
  • y (list-like) – another set of objects.
  • delta (function (default = None)) – a function that takes an element of x as first and an element of y as second input and returns the distance between them. If None, this method calls standard_seted_backtrace instead.
Returns:

alignment – A co-optimal alignment between x and y according to the set edit distance.

Return type:

class alignment.Alignment

edist.seted.standard_seted()

Computes the standard set edit distance between the input set x and the input set y according to the Kronecker distance, i.e. the replacement cost is zero if two elements are equal and one otherwise.

In more detail, this function finds an alignment M = {(i, j)} subset of {1, …, len(x)} x {1, …, len(y)}, such that the following loss is minimized:

\[\sum_{(i, j) \in M} \delta(x_i, y_j) + |{i \notin M}| + |{j \notin M}|\]

where i is said to be not in M if there exists no tuple (i, j) in M for any j, and j is said to be not in M if there exists no tuple (i, j) in M for any i.

This problem can be solved via the Hungarian or Munkres algorithm in O([len(x) + len(y)]^3). So be advised that this method can become very slow for large input sets.

Parameters:
  • x (list-like) – a set of objects.
  • y (list-like) – another set of objects.
Returns:

d – the set edit distance between x and y according to delta.

Return type:

float

edist.seted.standard_seted_backtrace()

Computes a co-optimal alignment between the two input sequences x and y, given the element-wise distance function delta. This mechanism is deterministic and depends on the implementation of scipy.optimize.linear_sum_assignment for the choice of co-optimal alignment.

Parameters:
  • x (list-like) – a set of objects.
  • y (list-like) – another set of objects.
Returns:

alignment – A co-optimal alignment between x and y according to the standard set edit distance.

Return type:

class alignment.Alignment