Parallel Pairwise Edit Distance Computation
Provides general utility functions to compute pairwise edit distances in parallel.
edist.multiprocess.pairwise_backtraces(Xs, Ys, dist_backtrace, delta=None, num_jobs=8)
Computes the pairwise backtraces between the objects in Xs and the objects in Ys. Each object in Xs and Ys needs to be a valid input for the given backtrace function, i.e. a sequence or a tree.
Optionally, a component-wise distance function delta can be specified, which is then forwarded to dist_backtrace.
Parameters:
- Xs (list) – a list of sequences or trees.
- Ys (list) – another list of sequences or trees.
- dist_backtrace (function) – a function that takes an element of Xs as first and an element of Ys as second input and returns a backtrace between them, which can be an arbitrary object.
- delta (function (default = None)) – a function that takes two elements of the input sequences or trees as inputs and returns their pairwise distance, where delta(x, None) should be the cost of deleting x and delta(None, y) should be the cost of inserting y. If this is not None, dist_backtrace needs to accept an optional argument ‘delta’ as well. Defaults to None.
- num_jobs (int (default = 8)) – The number of jobs to be used for parallel processing. Defaults to 8.
Returns: B – a len(Xs) x len(Ys) list of lists of pairwise backtraces.
Return type: list
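The contract of pairwise_backtraces can be illustrated with a small, self-contained sketch. It is not the edist implementation: levenshtein_backtrace is a hypothetical backtrace function that returns an optimal Levenshtein alignment as a list of (i, j) index pairs (None on one side marks a deletion or an insertion), and pairwise_backtraces_sketch is a hypothetical stand-in that only mirrors the documented signature. For simplicity it uses a thread pool from concurrent.futures; for CPU-bound pure-Python backtrace functions, a process pool would be needed for an actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def unit_delta(a, b):
    # Hypothetical default: 0 for a match, 1 for a mismatch, a deletion
    # (b is None) or an insertion (a is None).
    return 0.0 if a == b else 1.0


def levenshtein_backtrace(x, y, delta=unit_delta):
    """Hypothetical backtrace function: an optimal alignment of x and y
    as a list of (i, j) pairs; (i, None) deletes x[i], (None, j) inserts y[j]."""
    m, n = len(x), len(y)
    # D[i][j] is the edit distance between x[:i] and y[:j].
    D = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = D[i - 1][0] + delta(x[i - 1], None)
    for j in range(1, n + 1):
        D[0][j] = D[0][j - 1] + delta(None, y[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(D[i - 1][j - 1] + delta(x[i - 1], y[j - 1]),
                          D[i - 1][j] + delta(x[i - 1], None),
                          D[i][j - 1] + delta(None, y[j - 1]))
    # Walk back from D[m][n], preferring replacements, then deletions.
    alignment, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + delta(x[i - 1], y[j - 1]):
            alignment.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + delta(x[i - 1], None):
            alignment.append((i - 1, None))
            i -= 1
        else:
            alignment.append((None, j - 1))
            j -= 1
    return alignment[::-1]


def pairwise_backtraces_sketch(Xs, Ys, dist_backtrace, delta=None, num_jobs=8):
    # Forward delta as a keyword argument only if one was given.
    fun = dist_backtrace if delta is None else partial(dist_backtrace, delta=delta)
    pairs = [(x, y) for x in Xs for y in Ys]
    with ThreadPoolExecutor(max_workers=num_jobs) as pool:
        flat = list(pool.map(lambda p: fun(*p), pairs))
    # Reshape the row-major flat list into len(Xs) rows of len(Ys) entries.
    k = len(Ys)
    return [flat[r * k:(r + 1) * k] for r in range(len(Xs))]


B = pairwise_backtraces_sketch(["ab", "b"], ["ab", "ba"], levenshtein_backtrace)
print(B[0][0])  # [(0, 0), (1, 1)]
print(B[1][0])  # [(None, 0), (0, 1)]
```

Because the result is a plain list of lists, B[i][j] holds whatever object dist_backtrace returned for Xs[i] and Ys[j], which matches the "arbitrary object" wording of the return value.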
edist.multiprocess.pairwise_distances(Xs, Ys, dist, delta=None, num_jobs=8)
Computes the pairwise edit distances between the objects in Xs and the objects in Ys. Each object in Xs and Ys needs to be a valid input for the given distance function, i.e. a sequence or a tree.
Optionally, a component-wise distance function delta can be specified, which is then forwarded to dist.
Parameters:
- Xs (list) – a list of sequences or trees.
- Ys (list) – another list of sequences or trees.
- dist (function) – a function that takes an element of Xs as first and an element of Ys as second input and returns a scalar distance value between them.
- delta (function (default = None)) – a function that takes two elements of the input sequences or trees as inputs and returns their pairwise distance, where delta(x, None) should be the cost of deleting x and delta(None, y) should be the cost of inserting y. If this is not None, dist needs to accept an optional argument ‘delta’ as well. Defaults to None.
- num_jobs (int (default = 8)) – The number of jobs to be used for parallel processing. Defaults to 8.
Returns: D – a len(Xs) x len(Ys) matrix of pairwise edit distance values.
Return type: array_like
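A similar sketch for pairwise_distances, again with hypothetical names that are not part of edist: levenshtein is a delta-aware dist function that follows the convention described for the delta parameter (delta(x, None) is the deletion cost, delta(None, y) the insertion cost), and pairwise_distances_sketch evaluates one row of the result per task. It returns a list of lists, which is array_like, rather than whatever array type edist itself produces.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def levenshtein(x, y, delta=None):
    """Hypothetical dist function: edit distance between sequences x and y.
    delta(a, None) is the deletion cost, delta(None, b) the insertion cost
    and delta(a, b) the replacement cost; unit costs if delta is None."""
    if delta is None:
        delta = lambda a, b: 0.0 if a == b else 1.0
    # Keep only one row of the dynamic programming table at a time.
    prev = [0.0]
    for b in y:
        prev.append(prev[-1] + delta(None, b))
    for a in x:
        curr = [prev[0] + delta(a, None)]
        for j, b in enumerate(y, start=1):
            curr.append(min(prev[j - 1] + delta(a, b),       # replace or match
                            prev[j] + delta(a, None),         # delete a
                            curr[j - 1] + delta(None, b)))    # insert b
        prev = curr
    return prev[-1]


def pairwise_distances_sketch(Xs, Ys, dist, delta=None, num_jobs=8):
    # Forward delta as a keyword argument only if one was given.
    fun = dist if delta is None else partial(dist, delta=delta)
    with ThreadPoolExecutor(max_workers=num_jobs) as pool:
        # One task per row: row i holds dist(Xs[i], y) for every y in Ys.
        return list(pool.map(lambda x: [fun(x, y) for y in Ys], Xs))


def cheap_gaps(a, b):
    # Hypothetical delta: deletions and insertions cost 0.5 each.
    if a is None or b is None:
        return 0.5
    return 0.0 if a == b else 1.0


print(pairwise_distances_sketch(["ab"], ["ba"], levenshtein))                    # [[2.0]]
print(pairwise_distances_sketch(["ab"], ["ba"], levenshtein, delta=cheap_gaps))  # [[1.0]]
```

With cheap_gaps, deleting the leading "a" and re-inserting it at the end (0.5 + 0.5) becomes cheaper than two replacements, which shows how delta changes the result without changing dist itself.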
edist.multiprocess.pairwise_distances_symmetric(Xs, dist, delta=None, num_jobs=8)
Computes the pairwise edit distances between the objects in Xs, assuming that the distance measure is symmetric. Each object in Xs needs to be a valid input for the given distance function, i.e. a sequence or a tree. Because of symmetry, only about half of the pairs have to be evaluated, so this method is roughly twice as fast as pairwise_distances.
Optionally, a component-wise distance function delta can be specified, which is then forwarded to dist.
Parameters:
- Xs (list) – a list of sequences or trees.
- dist (function) – a function that takes two elements of Xs as inputs and returns a scalar distance value between them.
- delta (function (default = None)) – a function that takes two elements of the input sequences or trees as inputs and returns their pairwise distance, where delta(x, None) should be the cost of deleting x and delta(None, y) should be the cost of inserting y. If this is not None, dist needs to accept an optional argument ‘delta’ as well. Defaults to None.
- num_jobs (int (default = 8)) – The number of jobs to be used for parallel processing. Defaults to 8.
Returns: D – a symmetric len(Xs) x len(Xs) matrix of pairwise edit distance values.
Return type: array_like
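The speedup of the symmetric variant comes from evaluating each unordered pair only once and mirroring the value into the other triangle. The sketch below additionally assumes, which the documentation does not state, that the diagonal is zero (d(x, x) = 0) and skips it, so n objects need n(n - 1)/2 calls instead of n^2. The names pairwise_distances_symmetric_sketch, levenshtein and counted are hypothetical, and the real edist implementation may handle the diagonal differently.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from itertools import combinations


def levenshtein(x, y):
    # Hypothetical unit-cost edit distance, one DP row at a time.
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, start=1):
        curr = [i]
        for j, b in enumerate(y, start=1):
            curr.append(min(prev[j - 1] + (a != b), prev[j] + 1, curr[j - 1] + 1))
        prev = curr
    return prev[-1]


def pairwise_distances_symmetric_sketch(Xs, dist, delta=None, num_jobs=8):
    fun = dist if delta is None else partial(dist, delta=delta)
    n = len(Xs)
    # Only pairs with i < j are evaluated; the diagonal is ASSUMED to be 0.
    pairs = list(combinations(range(n), 2))
    with ThreadPoolExecutor(max_workers=num_jobs) as pool:
        values = list(pool.map(lambda p: fun(Xs[p[0]], Xs[p[1]]), pairs))
    D = [[0.0] * n for _ in range(n)]
    for (i, j), d in zip(pairs, values):
        # Mirror each value into the lower triangle.
        D[i][j] = D[j][i] = d
    return D


calls = []


def counted(x, y):
    # Records every evaluation so the number of dist calls can be checked.
    calls.append((x, y))
    return levenshtein(x, y)


D = pairwise_distances_symmetric_sketch(["kitten", "sitting", "mitten"], counted)
print(D[0][1], D[0][2])  # 3 1
print(len(calls))        # 3, versus 9 for the full 3 x 3 grid
```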