Algebraic Dynamic Programming¶

Implements algebraic dynamic programming for edit distances.

class edist.adp.Grammar¶

Models an ADP grammar, consisting of a starting nonterminal, a set of accepting nonterminals, a set of permitted edit operations and a set of rules of the form

A -> delta B

where A and B are nonterminals and delta is an edit operation, either a replacement, a deletion, or an insertion. We define the set of possible edit scripts permitted by a given grammar inductively as all scripts that can be produced by starting from the self._start, replacing the current nonterminal with the right-hand-side of a matching rule arbitrary many times, and then deleting the current nonterminal if it is accepting.

_nonterminals¶

A list of nonterminals of this grammar. Defaults to the union of accepting and start.

Type:	list (default = _accepting + {start})

_start¶

The starting nonterminal, which should be in _nonterminals.

Type:	str

_accepting¶

A list of accepting nonterminals, all of which should be in self._nonterminals.

Type:	list

_reps¶

A list of names of possible replacement operations. Defaults to an empty list.

Type:	list (default = [])

_dels¶

A list of names of possible deletion operations. Defaults to an empty list.

Type:	list (default = [])

_inss¶

A list of names of possible insertion operations. Defaults to an empty list.

Type:	list (default = [])

_rules¶

A mapping of nonterminal symbols to RuleEntries. Refer to The documentation above for more details on RuleEntries. Defaults to an empty map.

Type:	dictionary (default = {})

adjacency_lists¶

Returns the adjacency list format of this grammar.

Returns:

start_idx (int) – The index of the starting nonterminal symbol.
accpt_idxs (list) – A list of indices for all accepting nonterminals.
rep_adj (list) – An adjacency list covering all replacement operations, i.e. a list where the i-th entry contains all rules of the form A -> rep B, where A is the i-th nonterminal and the right-hand-sides are represented as tuples of operation indices and nonterminal indices.
del_adj (list) – An adjacency list covering all deletion operations, i.e. a list where the i-th entry contains all rules of the form A -> del B, where A is the i-th nonterminal and the right-hand-sides are represented as tuples of operation indices and nonterminal indices.
ins_adj (list) – An adjacency list covering all insertion operations, i.e. a list where the i-th entry contains all rules of the form A -> ins B, where A is the i-th nonterminal and the right-hand-sides are represented as tuples of operation indices and nonterminal indices.

append_deletion¶

Appends a rule to this grammar for a deletion operation, i.e. a rule of the form A -> del B.

Parameters:	source (str) – The left-hand-side nonterminal A of the new rule. If not in self._nonterminals yet, it is appended automatically. target (str) – The right-hand-side nonterminal B of the new rule. If not in self._nonterminals yet, it is appended automatically. operation (str) – The name of the deletion operation del. If not in self._dels yet, it is appended automatically.

append_insertion¶

Appends a rule to this grammar for a deletion operation, i.e. a rule of the form A -> ins B.

Parameters:	source (str) – The left-hand-side nonterminal A of the new rule. If not in self._nonterminals yet, it is appended automatically. target (str) – The right-hand-side nonterminal B of the new rule. If not in self._nonterminals yet, it is appended automatically. operation (str) – The name of the insertion operation ins. If not in self._inss yet, it is appended automatically.

append_replacement¶

Appends a rule to this grammar for a replacement operation, i.e. a rule of the form A -> rep B.

Parameters:

source (str) – The left-hand-side nonterminal A of the new rule. If not in self._nonterminals yet, it is appended automatically.
target (str) – The right-hand-side nonterminal B of the new rule. If not in self._nonterminals yet, it is appended automatically.
operation (str) – The name of the replacement operation rep. If not in self._reps yet, it is appended automatically.

inverse_adjacency_lists¶

Returns the inverse adjacency list format of this grammar.

Returns:

start_idx (int) – The index of the starting nonterminal symbol.
accpt_idxs (list) – A list of indices for all accepting nonterminals.
rep_adj (list) – An adjacency list covering all replacement operations, i.e. a list where the i-th entry contains all rules of the form A -> rep B, where B is the i-th nonterminal and (A, rep) is represented as a tuple tuples of operation index and nonterminal index.
del_adj (list) – An adjacency list covering all deletion operations, i.e. a list where the i-th entry contains all rules of the form A -> del B, where B is the i-th nonterminal and (A, del) is represented as a tuple tuples of operation index and nonterminal index.
ins_adj (list) – An adjacency list covering all insertion operations, i.e. a list where the i-th entry contains all rules of the form A -> ins B, where B is the i-th nonterminal and (A, ins) is represented as a tuple tuples of operation index and nonterminal index.

nonterminals¶

Returns the list of nonterminals of this grammar.

Returns:	nonts – the list of nonterminals of this grammar.
Return type:	list

size¶

Returns the number of nonterminals in this grammar.

Returns:	size – the number of nonterminals in this grammar.
Return type:	int

start¶

Returns the starting nonterminal of this grammar.

Returns:	start – the starting nonterminal of this grammar.
Return type:	str

validate¶

Ensures that this grammar is compatible with the given algebra deltas.

Parameters:	deltas (dictionary) – An algebra, i.e. a mapping from operation names to distance functions.
Raises:	`ValueError` – If any of the operations of this grammar is not supported by the given algebra.

class edist.adp.RuleEntry¶

Models the set of all grammar rules for a single nonterminal symbol in an ADP grammar.

_reps¶

A list of possible replacements, stored as tuples of the form

(rep, B), where rep is the name of a replacement operation and B is the nonterminal we obtain after applying rep.

Type:	list

_dels¶

A list of possible deletions, stored as tuples of the form

(del, B), where del is the name of a deletion operation and B is the nonterminal we obtain after applying del.

Type:	list

_inss¶

A list of possible insertions, stored as tuples of the form

(ins, B), where ins is the name of a insertion operation and B is the nonterminal we obtain after applying ins.

Type:	list

edist.adp.backtrace()¶

Computes a co-optimal alignment between the two input sequences x and y, given the given ADP grammar and algebra. This mechanism is deterministic and will always prefer replacements over other options.

Parameters:	x (list) – A list of objects. y (list) – Another list of objects. grammar (class adp.Grammar) – An ADP grammar. Refer to the documentation above for more information. deltas (dictionary) – An algebra, i.e. a mapping from operation names to distance functions OR a single distance function if the grammar supports only a single replacement, deletion, and insertion operation.
Returns:	alignment – a co-optimal alignment between x and y.
Return type:	class alignment.Alignment

edist.adp.backtrace_matrix()¶

Computes three tensors, P_rep, P_del, and P_ins, which summarize all co-optimal alignments between x and y.

In particular, P_rep[k, i, j] specifies the fraction of co-optimal alignments in which node x[i] has been replaced with node y[j] using operation grammar._reps[k]. Accordingly, P_del[k, i] specifies the fraction of co-optimal alignments in which x[i] has been deleted using operation grammar._dels[k], and P_ins[k, j] specifies the fraction of co-optimal alignments in which y[j] has been inserted using operation grammar._inss[k].

Parameters:

x (list) – A list of objects.
y (list) – Another list of objects.
grammar (class adp.Grammar) – An ADP grammar. Refer to the documentation above for more information.
deltas (dictionary) – An algebra, i.e. a mapping from operation names to distance functions OR a single distance function if the grammar supports only a single replacement, deletion, and insertion operation.

Returns:

P_rep (array_like) – a tensor where P_rep[k, i, j] specifies the fraction of co-optimal alignments in which node x[i] has been replaced with node y[j] using operation grammar._reps[k].
P_del (array_like) – a matrix where P_del[k, i] specifies the fraction of co-optimal alignments in which x[i] has been deleted using operation grammar._dels[k].
P_ins (array_like) – a matrix where P_ins[k, j] specifies the fraction of co-optimal alignments in which y[j] has been inserted using operation grammar._inss[k].
k (int) – the number of co-optimal alignments.

edist.adp.backtrace_stochastic()¶

Computes a co-optimal alignment between the two input sequences x and y, given the given ADP grammar and algebra. This mechanism is stochastic and will return a random alignment.

Note that the randomness does _not_ produce a uniform distribution over all co-optimal alignments because random choices at the start of the alignment process dominate. If you wish to characterize the overall distribution accurately, use backtrace_matrix instead.

Parameters:	x (list) – A list of objects. y (list) – Another list of objects. grammar (class adp.Grammar) – An ADP grammar. Refer to the documentation above for more information. deltas (dictionary) – An algebra, i.e. a mapping from operation names to distance functions OR a single distance function if the grammar supports only a single replacement, deletion, and insertion operation.
Returns:	alignment – a co-optimal alignment between x and y.
Return type:	class alignment.Alignment

edist.adp.edit_distance()¶

Computes the edit distance between two sequences x and y, based on the given ADP grammar and the given algebra.

Parameters:	x (list) – A list of objects. y (list) – Another list of objects. grammar (class adp.Grammar) – An ADP grammar. Refer to the documentation above for more information. deltas (dictionary) – An algebra, i.e. a mapping from operation names to distance functions OR a single distance function if the grammar supports only a single replacement, deletion, and insertion operation.
Returns:	d – The edit distance between x and y.
Return type:	float

edist.adp.string_to_index_list()¶

Converts a list of objects to an index list, given a index mapping.

Parameters:	lst (list) – A list of objects [x_1, …, x_m]. dct (dictionary) – A mapping from objects to indices.
Returns:	idx_list – The list [dct[x_1], …, dct[x_m]]
Return type:	list

edist.adp.string_to_index_map()¶

Inverts a list of objects, i.e. converts a list of objects to a map from objects to indices in the list.

Parameters:	lst (list) – A list of objects [x_1, …, x_m].
Returns:	dct – A map where dct[x_i] = i for all i.
Return type:	dictionary

edist.adp.string_to_index_tuple_list()¶

Converts a list of tuples to a list of tuple-indices.

Parameters:	lst (list) – A list of tuples [(x_1, y_1), …, (x_m, y_m)]. op_dct (dictionary) – A mapping from x-objects to indices. nont_dct (dictionary) – A mapping from y-objects to indices.
Returns:	idx_list – The list [(op_dct[x_1], nont_dct[y_1]), …, (op_dct[x_m], nont_dct[y_m])]
Return type:	list

Algebraic Dynamic Programming¶

edist

Navigation

Related Topics