Sequence Edit Distance

Implements the sequence edit distance of Levenshtein (1965) and its backtracing in cython.

edist.sed.sed()

Computes the sequence edit distance between the input sequence x and the input sequence y, given the element-wise distance function delta.

Parameters:
  • x (list) – a sequence of objects.
  • y (list) – another sequence of objects.
  • delta (function (default = None)) – a function that takes an element of x as first and an element of y as second input and returns the distance between them. If None, this method calls standard_sed instead.
Returns:

d – the sequence edit distance between x and y according to delta.

Return type:

float

edist.sed.sed_backtrace()

Computes a co-optimal alignment between the two input sequences x and y, given the element-wise distance function delta. This mechanism is deterministic and will always prefer replacements over other options.

Parameters:
  • x (list) – a sequence of objects.
  • y (list) – another sequence of objects.
  • delta (function (default = None)) – a function that takes an element of x as first and an element of y as second input and returns the distance between them. If None, this method calls standard_sed_backtrace instead.
Returns:

alignment – A co-optimal alignment between x and y according to the sequence edit distance.

Return type:

class alignment.Alignment

edist.sed.sed_backtrace_matrix()

Computes a matrix, summarizing all co-optimal alignments between x and y in a matrix P, where entry P[i, j] specifies the fraction of co-optimal alignments in which node x[i] has been aligned with node y[j].

Parameters:
  • x (list) – a sequence of objects.
  • y (list) – another sequence of objects.
  • delta (function (default = None)) – a function that takes an element of x as first and an element of y as second input and returns the distance between them. If None, this method calls standard_sed_backtrace_matrix instead.
Returns:

  • P (array_like) – a matrix, where entry P[i, j] specifies the fraction of co-optimal alignments in which node x[i] has been aligned with node y[j]. P[i, n] contains the fraction of deletions of node x[i] and P[m, j] the fraction of insertions of node y[j].
  • K (array_like) – a matrix that contains the counts for all co-optimal alignments in which node x[i] has been aligned with node y[j].
  • k (int) – the number of co-optimal alignments overall, such that P = K / k.

edist.sed.sed_backtrace_stochastic()

Computes a co-optimal alignment between the two input sequences x and y, given the element-wise distance function delta. This mechanism is stochastic and will return a random co-optimal alignment.

Note that the randomness does _not_ produce a uniform distribution over all co-optimal alignments because reandom choices at the start of the alignment process dominate. If you wish to characterize the overall distribution accurately, use sed_backtrace_matrix instead.

Parameters:
  • x (list) – a sequence of objects.
  • y (list) – another sequence of objects.
  • delta (function (default = None)) – a function that takes an element of x as first and an element of y as second input and returns the distance between them. If None, this method calls standard_sed_backtrace_stochastic instead.
Returns:

alignment – A co-optimal alignment between x and y according to the sequence edit distance.

Return type:

class alignment.Alignment

edist.sed.sed_string()

Computes the standard sequence edit distance/Levenshtein distance between two input strings x and y, using the Kronecker distance as element-wise distance measure.

Parameters:
  • x (str) – a string.
  • y (str) – another string.
Returns:

d – the standard sequence edit distance between x and y.

Return type:

int

edist.sed.standard_sed()

Computes the standard sequence edit distance/Levenshtein distance between the input sequence x and the input sequence y.

Parameters:
  • x (list) – a sequence of objects.
  • y (list) – another sequence of objects.
Returns:

d – the standard sequence edit distance between x and y.

Return type:

int

edist.sed.standard_sed_backtrace()

Computes a co-optimal alignment between the two input sequences x and y. This mechanism is deterministic and will always prefer replacements over other options.

Parameters:
  • x (list) – a sequence of objects.
  • y (list) – another sequence of objects.
Returns:

alignment – A co-optimal alignment between x and y according to the sequence edit distance.

Return type:

class alignment.Alignment

edist.sed.standard_sed_backtrace_matrix()

Computes a matrix, summarizing all co-optimal alignments between x and y in a matrix P, where entry P[i, j] specifies the fraction of co-optimal alignments in which node x[i] has been aligned with node y[j].

Parameters:
  • x (list) – a sequence of objects.
  • y (list) – another sequence of objects.
Returns:

  • P (array_like) – a matrix, where entry P[i, j] specifies the fraction of co-optimal alignments in which node x[i] has been aligned with node y[j]. P[i, n] contains the fraction of deletions of node x[i] and P[m, j] the fraction of insertions of node y[j].
  • K (array_like) – a matrix that contains the counts for all co-optimal alignments in which node x[i] has been aligned with node y[j].
  • k (int) – the number of co-optimal alignments overall, such that P = K / k.

edist.sed.standard_sed_backtrace_stochastic()

Computes a co-optimal alignment between the two input sequences x and y, given the element-wise distance function delta. This mechanism is stochastic and will return a random co-optimal alignment.

Note that the randomness does _not_ produce a uniform distribution over all co-optimal alignments because reandom choices at the start of the alignment process dominate. If you wish to characterize the overall distribution accurately, use sed_backtrace_matrix instead.

Parameters:
  • x (list) – a sequence of objects.
  • y (list) – another sequence of objects.
Returns:

alignment – A co-optimal alignment between x and y according to the sequence edit distance.

Return type:

class alignment.Alignment