Algebraic Dynamic Programming

Implements algebraic dynamic programming for edit distances.

class edist.adp.Grammar

Models an ADP grammar, consisting of a starting nonterminal, a set of accepting nonterminals, a set of permitted edit operations and a set of rules of the form

A -> delta B

where A and B are nonterminals and delta is an edit operation, either a replacement, a deletion, or an insertion. We define the set of possible edit scripts permitted by a given grammar inductively as all scripts that can be produced by starting from the self._start, replacing the current nonterminal with the right-hand-side of a matching rule arbitrary many times, and then deleting the current nonterminal if it is accepting.

_nonterminals

A list of nonterminals of this grammar. Defaults to the union of accepting and start.

Type:list (default = _accepting + {start})
_start

The starting nonterminal, which should be in _nonterminals.

Type:str
_accepting

A list of accepting nonterminals, all of which should be in self._nonterminals.

Type:list
_reps

A list of names of possible replacement operations. Defaults to an empty list.

Type:list (default = [])
_dels

A list of names of possible deletion operations. Defaults to an empty list.

Type:list (default = [])
_inss

A list of names of possible insertion operations. Defaults to an empty list.

Type:list (default = [])
_rules

A mapping of nonterminal symbols to RuleEntries. Refer to The documentation above for more details on RuleEntries. Defaults to an empty map.

Type:dictionary (default = {})
adjacency_lists

Returns the adjacency list format of this grammar.

Returns:
  • start_idx (int) – The index of the starting nonterminal symbol.
  • accpt_idxs (list) – A list of indices for all accepting nonterminals.
  • rep_adj (list) – An adjacency list covering all replacement operations, i.e. a list where the i-th entry contains all rules of the form A -> rep B, where A is the i-th nonterminal and the right-hand-sides are represented as tuples of operation indices and nonterminal indices.
  • del_adj (list) – An adjacency list covering all deletion operations, i.e. a list where the i-th entry contains all rules of the form A -> del B, where A is the i-th nonterminal and the right-hand-sides are represented as tuples of operation indices and nonterminal indices.
  • ins_adj (list) – An adjacency list covering all insertion operations, i.e. a list where the i-th entry contains all rules of the form A -> ins B, where A is the i-th nonterminal and the right-hand-sides are represented as tuples of operation indices and nonterminal indices.
append_deletion

Appends a rule to this grammar for a deletion operation, i.e. a rule of the form A -> del B.

Parameters:
  • source (str) – The left-hand-side nonterminal A of the new rule. If not in self._nonterminals yet, it is appended automatically.
  • target (str) – The right-hand-side nonterminal B of the new rule. If not in self._nonterminals yet, it is appended automatically.
  • operation (str) – The name of the deletion operation del. If not in self._dels yet, it is appended automatically.
append_insertion

Appends a rule to this grammar for a deletion operation, i.e. a rule of the form A -> ins B.

Parameters:
  • source (str) – The left-hand-side nonterminal A of the new rule. If not in self._nonterminals yet, it is appended automatically.
  • target (str) – The right-hand-side nonterminal B of the new rule. If not in self._nonterminals yet, it is appended automatically.
  • operation (str) – The name of the insertion operation ins. If not in self._inss yet, it is appended automatically.
append_replacement

Appends a rule to this grammar for a replacement operation, i.e. a rule of the form A -> rep B.

Parameters:
  • source (str) – The left-hand-side nonterminal A of the new rule. If not in self._nonterminals yet, it is appended automatically.
  • target (str) – The right-hand-side nonterminal B of the new rule. If not in self._nonterminals yet, it is appended automatically.
  • operation (str) – The name of the replacement operation rep. If not in self._reps yet, it is appended automatically.
inverse_adjacency_lists

Returns the inverse adjacency list format of this grammar.

Returns:
  • start_idx (int) – The index of the starting nonterminal symbol.
  • accpt_idxs (list) – A list of indices for all accepting nonterminals.
  • rep_adj (list) – An adjacency list covering all replacement operations, i.e. a list where the i-th entry contains all rules of the form A -> rep B, where B is the i-th nonterminal and (A, rep) is represented as a tuple tuples of operation index and nonterminal index.
  • del_adj (list) – An adjacency list covering all deletion operations, i.e. a list where the i-th entry contains all rules of the form A -> del B, where B is the i-th nonterminal and (A, del) is represented as a tuple tuples of operation index and nonterminal index.
  • ins_adj (list) – An adjacency list covering all insertion operations, i.e. a list where the i-th entry contains all rules of the form A -> ins B, where B is the i-th nonterminal and (A, ins) is represented as a tuple tuples of operation index and nonterminal index.
nonterminals

Returns the list of nonterminals of this grammar.

Returns:nonts – the list of nonterminals of this grammar.
Return type:list
size

Returns the number of nonterminals in this grammar.

Returns:size – the number of nonterminals in this grammar.
Return type:int
start

Returns the starting nonterminal of this grammar.

Returns:start – the starting nonterminal of this grammar.
Return type:str
validate

Ensures that this grammar is compatible with the given algebra deltas.

Parameters:deltas (dictionary) – An algebra, i.e. a mapping from operation names to distance functions.
Raises:ValueError – If any of the operations of this grammar is not supported by the given algebra.
class edist.adp.RuleEntry

Models the set of all grammar rules for a single nonterminal symbol in an ADP grammar.

_reps
A list of possible replacements, stored as tuples of the form

(rep, B), where rep is the name of a replacement operation and B is the nonterminal we obtain after applying rep.

Type:list
_dels
A list of possible deletions, stored as tuples of the form

(del, B), where del is the name of a deletion operation and B is the nonterminal we obtain after applying del.

Type:list
_inss
A list of possible insertions, stored as tuples of the form

(ins, B), where ins is the name of a insertion operation and B is the nonterminal we obtain after applying ins.

Type:list
edist.adp.backtrace()

Computes a co-optimal alignment between the two input sequences x and y, given the given ADP grammar and algebra. This mechanism is deterministic and will always prefer replacements over other options.

Parameters:
  • x (list) – A list of objects.
  • y (list) – Another list of objects.
  • grammar (class adp.Grammar) – An ADP grammar. Refer to the documentation above for more information.
  • deltas (dictionary) – An algebra, i.e. a mapping from operation names to distance functions OR a single distance function if the grammar supports only a single replacement, deletion, and insertion operation.
Returns:

alignment – a co-optimal alignment between x and y.

Return type:

class alignment.Alignment

edist.adp.backtrace_matrix()

Computes three tensors, P_rep, P_del, and P_ins, which summarize all co-optimal alignments between x and y.

In particular, P_rep[k, i, j] specifies the fraction of co-optimal alignments in which node x[i] has been replaced with node y[j] using operation grammar._reps[k]. Accordingly, P_del[k, i] specifies the fraction of co-optimal alignments in which x[i] has been deleted using operation grammar._dels[k], and P_ins[k, j] specifies the fraction of co-optimal alignments in which y[j] has been inserted using operation grammar._inss[k].

Parameters:
  • x (list) – A list of objects.
  • y (list) – Another list of objects.
  • grammar (class adp.Grammar) – An ADP grammar. Refer to the documentation above for more information.
  • deltas (dictionary) – An algebra, i.e. a mapping from operation names to distance functions OR a single distance function if the grammar supports only a single replacement, deletion, and insertion operation.
Returns:

  • P_rep (array_like) – a tensor where P_rep[k, i, j] specifies the fraction of co-optimal alignments in which node x[i] has been replaced with node y[j] using operation grammar._reps[k].
  • P_del (array_like) – a matrix where P_del[k, i] specifies the fraction of co-optimal alignments in which x[i] has been deleted using operation grammar._dels[k].
  • P_ins (array_like) – a matrix where P_ins[k, j] specifies the fraction of co-optimal alignments in which y[j] has been inserted using operation grammar._inss[k].
  • k (int) – the number of co-optimal alignments.

edist.adp.backtrace_stochastic()

Computes a co-optimal alignment between the two input sequences x and y, given the given ADP grammar and algebra. This mechanism is stochastic and will return a random alignment.

Note that the randomness does _not_ produce a uniform distribution over all co-optimal alignments because random choices at the start of the alignment process dominate. If you wish to characterize the overall distribution accurately, use backtrace_matrix instead.

Parameters:
  • x (list) – A list of objects.
  • y (list) – Another list of objects.
  • grammar (class adp.Grammar) – An ADP grammar. Refer to the documentation above for more information.
  • deltas (dictionary) – An algebra, i.e. a mapping from operation names to distance functions OR a single distance function if the grammar supports only a single replacement, deletion, and insertion operation.
Returns:

alignment – a co-optimal alignment between x and y.

Return type:

class alignment.Alignment

edist.adp.edit_distance()

Computes the edit distance between two sequences x and y, based on the given ADP grammar and the given algebra.

Parameters:
  • x (list) – A list of objects.
  • y (list) – Another list of objects.
  • grammar (class adp.Grammar) – An ADP grammar. Refer to the documentation above for more information.
  • deltas (dictionary) – An algebra, i.e. a mapping from operation names to distance functions OR a single distance function if the grammar supports only a single replacement, deletion, and insertion operation.
Returns:

d – The edit distance between x and y.

Return type:

float

edist.adp.string_to_index_list()

Converts a list of objects to an index list, given a index mapping.

Parameters:
  • lst (list) – A list of objects [x_1, …, x_m].
  • dct (dictionary) – A mapping from objects to indices.
Returns:

idx_list – The list [dct[x_1], …, dct[x_m]]

Return type:

list

edist.adp.string_to_index_map()

Inverts a list of objects, i.e. converts a list of objects to a map from objects to indices in the list.

Parameters:lst (list) – A list of objects [x_1, …, x_m].
Returns:dct – A map where dct[x_i] = i for all i.
Return type:dictionary
edist.adp.string_to_index_tuple_list()

Converts a list of tuples to a list of tuple-indices.

Parameters:
  • lst (list) – A list of tuples [(x_1, y_1), …, (x_m, y_m)].
  • op_dct (dictionary) – A mapping from x-objects to indices.
  • nont_dct (dictionary) – A mapping from y-objects to indices.
Returns:

idx_list – The list [(op_dct[x_1], nont_dct[y_1]), …, (op_dct[x_m], nont_dct[y_m])]

Return type:

list