Algebraic Dynamic Programming¶
Implements algebraic dynamic programming for edit distances.
-
class
edist.adp.
Grammar
¶ Models an ADP grammar, consisting of a starting nonterminal, a set of accepting nonterminals, a set of permitted edit operations and a set of rules of the form
A -> delta B
where A and B are nonterminals and delta is an edit operation, either a replacement, a deletion, or an insertion. We define the set of possible edit scripts permitted by a given grammar inductively as all scripts that can be produced by starting from the self._start, replacing the current nonterminal with the right-hand-side of a matching rule arbitrary many times, and then deleting the current nonterminal if it is accepting.
-
_nonterminals
¶ A list of nonterminals of this grammar. Defaults to the union of accepting and start.
Type: list (default = _accepting + {start})
-
_start
¶ The starting nonterminal, which should be in _nonterminals.
Type: str
-
_accepting
¶ A list of accepting nonterminals, all of which should be in self._nonterminals.
Type: list
-
_reps
¶ A list of names of possible replacement operations. Defaults to an empty list.
Type: list (default = [])
-
_dels
¶ A list of names of possible deletion operations. Defaults to an empty list.
Type: list (default = [])
-
_inss
¶ A list of names of possible insertion operations. Defaults to an empty list.
Type: list (default = [])
-
_rules
¶ A mapping of nonterminal symbols to RuleEntries. Refer to The documentation above for more details on RuleEntries. Defaults to an empty map.
Type: dictionary (default = {})
-
adjacency_lists
¶ Returns the adjacency list format of this grammar.
Returns: - start_idx (int) – The index of the starting nonterminal symbol.
- accpt_idxs (list) – A list of indices for all accepting nonterminals.
- rep_adj (list) – An adjacency list covering all replacement operations, i.e. a list where the i-th entry contains all rules of the form A -> rep B, where A is the i-th nonterminal and the right-hand-sides are represented as tuples of operation indices and nonterminal indices.
- del_adj (list) – An adjacency list covering all deletion operations, i.e. a list where the i-th entry contains all rules of the form A -> del B, where A is the i-th nonterminal and the right-hand-sides are represented as tuples of operation indices and nonterminal indices.
- ins_adj (list) – An adjacency list covering all insertion operations, i.e. a list where the i-th entry contains all rules of the form A -> ins B, where A is the i-th nonterminal and the right-hand-sides are represented as tuples of operation indices and nonterminal indices.
-
append_deletion
¶ Appends a rule to this grammar for a deletion operation, i.e. a rule of the form A -> del B.
Parameters: - source (str) – The left-hand-side nonterminal A of the new rule. If not in self._nonterminals yet, it is appended automatically.
- target (str) – The right-hand-side nonterminal B of the new rule. If not in self._nonterminals yet, it is appended automatically.
- operation (str) – The name of the deletion operation del. If not in self._dels yet, it is appended automatically.
-
append_insertion
¶ Appends a rule to this grammar for a deletion operation, i.e. a rule of the form A -> ins B.
Parameters: - source (str) – The left-hand-side nonterminal A of the new rule. If not in self._nonterminals yet, it is appended automatically.
- target (str) – The right-hand-side nonterminal B of the new rule. If not in self._nonterminals yet, it is appended automatically.
- operation (str) – The name of the insertion operation ins. If not in self._inss yet, it is appended automatically.
-
append_replacement
¶ Appends a rule to this grammar for a replacement operation, i.e. a rule of the form A -> rep B.
Parameters: - source (str) – The left-hand-side nonterminal A of the new rule. If not in self._nonterminals yet, it is appended automatically.
- target (str) – The right-hand-side nonterminal B of the new rule. If not in self._nonterminals yet, it is appended automatically.
- operation (str) – The name of the replacement operation rep. If not in self._reps yet, it is appended automatically.
-
inverse_adjacency_lists
¶ Returns the inverse adjacency list format of this grammar.
Returns: - start_idx (int) – The index of the starting nonterminal symbol.
- accpt_idxs (list) – A list of indices for all accepting nonterminals.
- rep_adj (list) – An adjacency list covering all replacement operations, i.e. a list where the i-th entry contains all rules of the form A -> rep B, where B is the i-th nonterminal and (A, rep) is represented as a tuple tuples of operation index and nonterminal index.
- del_adj (list) – An adjacency list covering all deletion operations, i.e. a list where the i-th entry contains all rules of the form A -> del B, where B is the i-th nonterminal and (A, del) is represented as a tuple tuples of operation index and nonterminal index.
- ins_adj (list) – An adjacency list covering all insertion operations, i.e. a list where the i-th entry contains all rules of the form A -> ins B, where B is the i-th nonterminal and (A, ins) is represented as a tuple tuples of operation index and nonterminal index.
-
nonterminals
¶ Returns the list of nonterminals of this grammar.
Returns: nonts – the list of nonterminals of this grammar. Return type: list
-
size
¶ Returns the number of nonterminals in this grammar.
Returns: size – the number of nonterminals in this grammar. Return type: int
-
start
¶ Returns the starting nonterminal of this grammar.
Returns: start – the starting nonterminal of this grammar. Return type: str
-
validate
¶ Ensures that this grammar is compatible with the given algebra deltas.
Parameters: deltas (dictionary) – An algebra, i.e. a mapping from operation names to distance functions. Raises: ValueError
– If any of the operations of this grammar is not supported by the given algebra.
-
-
class
edist.adp.
RuleEntry
¶ Models the set of all grammar rules for a single nonterminal symbol in an ADP grammar.
-
_reps
¶ - A list of possible replacements, stored as tuples of the form
(rep, B), where rep is the name of a replacement operation and B is the nonterminal we obtain after applying rep.
Type: list
-
_dels
¶ - A list of possible deletions, stored as tuples of the form
(del, B), where del is the name of a deletion operation and B is the nonterminal we obtain after applying del.
Type: list
-
_inss
¶ - A list of possible insertions, stored as tuples of the form
(ins, B), where ins is the name of a insertion operation and B is the nonterminal we obtain after applying ins.
Type: list
-
-
edist.adp.
backtrace
()¶ Computes a co-optimal alignment between the two input sequences x and y, given the given ADP grammar and algebra. This mechanism is deterministic and will always prefer replacements over other options.
Parameters: - x (list) – A list of objects.
- y (list) – Another list of objects.
- grammar (class adp.Grammar) – An ADP grammar. Refer to the documentation above for more information.
- deltas (dictionary) – An algebra, i.e. a mapping from operation names to distance functions OR a single distance function if the grammar supports only a single replacement, deletion, and insertion operation.
Returns: alignment – a co-optimal alignment between x and y.
Return type: class alignment.Alignment
-
edist.adp.
backtrace_matrix
()¶ Computes three tensors, P_rep, P_del, and P_ins, which summarize all co-optimal alignments between x and y.
In particular, P_rep[k, i, j] specifies the fraction of co-optimal alignments in which node x[i] has been replaced with node y[j] using operation grammar._reps[k]. Accordingly, P_del[k, i] specifies the fraction of co-optimal alignments in which x[i] has been deleted using operation grammar._dels[k], and P_ins[k, j] specifies the fraction of co-optimal alignments in which y[j] has been inserted using operation grammar._inss[k].
Parameters: - x (list) – A list of objects.
- y (list) – Another list of objects.
- grammar (class adp.Grammar) – An ADP grammar. Refer to the documentation above for more information.
- deltas (dictionary) – An algebra, i.e. a mapping from operation names to distance functions OR a single distance function if the grammar supports only a single replacement, deletion, and insertion operation.
Returns: - P_rep (array_like) – a tensor where P_rep[k, i, j] specifies the fraction of co-optimal alignments in which node x[i] has been replaced with node y[j] using operation grammar._reps[k].
- P_del (array_like) – a matrix where P_del[k, i] specifies the fraction of co-optimal alignments in which x[i] has been deleted using operation grammar._dels[k].
- P_ins (array_like) – a matrix where P_ins[k, j] specifies the fraction of co-optimal alignments in which y[j] has been inserted using operation grammar._inss[k].
- k (int) – the number of co-optimal alignments.
-
edist.adp.
backtrace_stochastic
()¶ Computes a co-optimal alignment between the two input sequences x and y, given the given ADP grammar and algebra. This mechanism is stochastic and will return a random alignment.
Note that the randomness does _not_ produce a uniform distribution over all co-optimal alignments because random choices at the start of the alignment process dominate. If you wish to characterize the overall distribution accurately, use backtrace_matrix instead.
Parameters: - x (list) – A list of objects.
- y (list) – Another list of objects.
- grammar (class adp.Grammar) – An ADP grammar. Refer to the documentation above for more information.
- deltas (dictionary) – An algebra, i.e. a mapping from operation names to distance functions OR a single distance function if the grammar supports only a single replacement, deletion, and insertion operation.
Returns: alignment – a co-optimal alignment between x and y.
Return type: class alignment.Alignment
-
edist.adp.
edit_distance
()¶ Computes the edit distance between two sequences x and y, based on the given ADP grammar and the given algebra.
Parameters: - x (list) – A list of objects.
- y (list) – Another list of objects.
- grammar (class adp.Grammar) – An ADP grammar. Refer to the documentation above for more information.
- deltas (dictionary) – An algebra, i.e. a mapping from operation names to distance functions OR a single distance function if the grammar supports only a single replacement, deletion, and insertion operation.
Returns: d – The edit distance between x and y.
Return type: float
-
edist.adp.
string_to_index_list
()¶ Converts a list of objects to an index list, given a index mapping.
Parameters: - lst (list) – A list of objects [x_1, …, x_m].
- dct (dictionary) – A mapping from objects to indices.
Returns: idx_list – The list [dct[x_1], …, dct[x_m]]
Return type: list
-
edist.adp.
string_to_index_map
()¶ Inverts a list of objects, i.e. converts a list of objects to a map from objects to indices in the list.
Parameters: lst (list) – A list of objects [x_1, …, x_m]. Returns: dct – A map where dct[x_i] = i for all i. Return type: dictionary
-
edist.adp.
string_to_index_tuple_list
()¶ Converts a list of tuples to a list of tuple-indices.
Parameters: - lst (list) – A list of tuples [(x_1, y_1), …, (x_m, y_m)].
- op_dct (dictionary) – A mapping from x-objects to indices.
- nont_dct (dictionary) – A mapping from y-objects to indices.
Returns: idx_list – The list [(op_dct[x_1], nont_dct[y_1]), …, (op_dct[x_m], nont_dct[y_m])]
Return type: list