For Developers#

Models#

List of Tapqir models.

class tapqir.models.model.Model(S=1, K=2, Q=None, device='cpu', dtype='double', priors=None)[source]#

Bases: object

Base class for tapqir models.

Derived models must implement the methods

model()
guide()
init_parameters()
TraceELBO()

Parameters:

S (int) – Number of distinct molecular states for the binder molecules.
K (int) – Maximum number of spots that can be present in a single image.
Q (Optional[int]) – Number of fluorescent dyes.
device (str) – Computation device (cpu or gpu).
dtype (str) – Floating point precision.
priors (Optional[dict]) – Dictionary of parameters of prior distributions.

to(device, dtype='double')[source]#

Change tensor device and dtype.

Parameters:

device (str) – Computation device, either “gpu” or “cpu”.
dtype (str) – Floating point precision, either “double” or “float”.

Return type:

None

load(path, data_only=True)[source]#

Load data and optionally parameters from a specified path

Parameters:

path (Union[str, Path]) – Path to Tapqir analysis folder.
data_only (bool) – Load only data or both data and parameters.

Return type:

None

model()[source]#: Generative Model

guide()[source]#: Variational Distribution

TraceELBO(jit)[source]#: A trace implementation of ELBO-based SVI.

init_parameters()[source]#: Initialize variational parameters.

init(lr=0.005, nbatch_size=5, fbatch_size=512, jit=False)[source]#

Initialize SVI object.

Parameters:

lr (float) – Learning rate.
nbatch_size (int) – AOI batch size.
fbatch_size (int) – Frame batch size.
jit (bool) – Use JIT compiler.

Return type:

None

run(num_iter=0, progress_bar=<class 'tqdm.std.tqdm'>)[source]#

Run inference procedure for a specified number of iterations. If num_iter equals zero then run till model converges.

Parameters:: num_iter (int) – Number of iterations.
Return type:: None

save_checkpoint(writer=None)[source]#

Save checkpoint.

Parameters:: writer (Optional[SummaryWriter]) – SummaryWriter object.

load_checkpoint(path=None, param_only=False, warnings=False)[source]#

Load checkpoint.

Parameters:

path (Union[str, Path, None]) – Path to model checkpoint.
param_only (bool) – Load only parameters.
warnings (bool) – Give warnings if loaded model has not been fully trained.

compute_stats(CI=0.95, save_matlab=False)[source]#

Compute credible regions (CI) and other stats.

Parameters:

CI (float) – credible region.
save_matlab (bool) – Save output in Matlab format as well.

cosmos#

class tapqir.models.cosmos.cosmos(S=1, K=2, Q=None, device='cpu', dtype='double', use_pykeops=True, priors={'background_mean_std': 1000.0, 'background_std_std': 100.0, 'gain_std': 50.0, 'height_std': 10000.0, 'lamda_rate': 1.0, 'proximity_rate': 1.0, 'width_max': 2.25, 'width_min': 0.75})[source]#

Bases: Model

Multi-Color Time-Independent Colocalization Model

Reference:

Ordabayev YA, Friedman LJ, Gelles J, Theobald DL. Bayesian machine learning analysis of single-molecule fluorescence colocalization images. eLife. 2022 March. doi: 10.7554/eLife.73860.

Parameters:

K (int) – Maximum number of spots that can be present in a single image.
device (str) – Computation device (cpu or gpu).
dtype (str) – Floating point precision.
use_pykeops (bool) – Use pykeops as backend to marginalize out offset.
priors (dict) – Dictionary of parameters of prior distributions.

model()[source]#

Generative Model

Model parameters:

Parameter	Shape	Description
`gain` - $g$	(1,)	camera gain
`proximity` - $\sigma^{xy}$	(1,)	proximity
`lamda` - $\lambda$	(1,)	average rate of target-nonspecific binding
`pi` - $\pi$	(1,)	average binding probability of target-specific binding
`background` - $b$	(N, F)	background intensity
`z` - $z$	(N, F)	target-specific spot presence
`theta` - $\theta$	(N, F)	target-specific spot index
`m` - $m$	(K, N, F)	spot presence indicator
`height` - $h$	(K, N, F)	spot intensity
`width` - $w$	(K, N, F)	spot width
`x` - $x$	(K, N, F)	spot position on x-axis
`y` - $y$	(K, N, F)	spot position on y-axis
`data` - $D$	(N, F, P, P)	observed images

Full joint distribution:

\[\begin{split}\begin{aligned} p(D, \phi) =~&p(g) p(\sigma^{xy}) p(\pi) p(\lambda) \prod_{\mathsf{AOI}} \left[ p(\mu^b) p(\sigma^b) \prod_{\mathsf{frame}} \left[ \vphantom{\prod_{F}} p(b | \mu^b, \sigma^b) p(z | \pi) p(\theta | z) \vphantom{\prod_{\substack{\mathsf{pixelX} \\ \mathsf{pixelY}}}} \cdot \right. \right. \\ &\prod_{\mathsf{spot}} \left[ \vphantom{\prod_{F}} p(m | \theta, \lambda) p(h) p(w) p(x | \sigma^{xy}, \theta) p(y | \sigma^{xy}, \theta) \right] \left. \left. \prod_{\substack{\mathsf{pixelX} \\ \mathsf{pixelY}}} \sum_{\delta} p(\delta) p(D | \mu^I, g, \delta) \right] \right] \end{aligned}\end{split}\]

$z$ and $\theta$ marginalized joint distribution:

\[\begin{split}\begin{aligned} \sum_{z, \theta} p(D, \phi) =~&p(g) p(\sigma^{xy}) p(\pi) p(\lambda) \prod_{\mathsf{AOI}} \left[ p(\mu^b) p(\sigma^b) \prod_{\mathsf{frame}} \left[ \vphantom{\prod_{F}} p(b | \mu^b, \sigma^b) \sum_{z} p(z | \pi) \sum_{\theta} p(\theta | z) \vphantom{\prod_{\substack{\mathsf{pixelX} \\ \mathsf{pixelY}}}} \cdot \right. \right. \\ &\prod_{\mathsf{spot}} \left[ \vphantom{\prod_{F}} p(m | \theta, \lambda) p(h) p(w) p(x | \sigma^{xy}, \theta) p(y | \sigma^{xy}, \theta) \right] \left. \left. \prod_{\substack{\mathsf{pixelX} \\ \mathsf{pixelY}}} \sum_{\delta} p(\delta) p(D | \mu^I, g, \delta) \right] \right] \end{aligned}\end{split}\]

guide()[source]#: Variational Distribution

\[\begin{split}\begin{aligned} q(\phi \setminus \{z, \theta\}) =~&q(g) q(\sigma^{xy}) q(\pi) q(\lambda) \cdot \\ &\prod_{\mathsf{AOI}} \left[ q(\mu^b) q(\sigma^b) \prod_{\mathsf{frame}} \left[ \vphantom{\prod_{F}} q(b) \prod_{\mathsf{spot}} q(m) q(h | m) q(w | m) q(x | m) q(y | m) \right] \right] \end{aligned}\end{split}\]

init_parameters()[source]#: Initialize variational parameters.

TraceELBO(jit=False)[source]#: A trace implementation of ELBO-based SVI that supports - exhaustive enumeration over discrete sample sites, and - local parallel sampling over any sample site in the guide.

property z_probs: Tensor#: Probability of there being a target-specific spot $p(z=1)$

property theta_probs: Tensor#: Posterior target-specific spot probability $q(\theta = k)$ for $k \in \{1, \dots, K\}$.

property m_probs: Tensor#: Posterior spot presence probability $q(m=1)$.

property pspecific: Tensor#: Probability of there being a target-specific spot $p(\mathsf{specific})$

hmm#

class tapqir.models.hmm.hmm(S=1, K=2, device='cpu', dtype='double', use_pykeops=True, vectorized=True, priors={'background_mean_std': 1000.0, 'background_std_std': 100.0, 'gain_std': 50.0, 'height_std': 10000.0, 'lamda_rate': 1.0, 'proximity_rate': 1.0, 'width_max': 2.25, 'width_min': 0.75})[source]#

Bases: cosmos

Multi-Color Hidden Markov Colocalization Model

EXPERIMENTAL This model relies on Funsor backend.

Parameters:

K (int) – Maximum number of spots that can be present in a single image.
device (str) – Computation device (cpu or gpu).
dtype (str) – Floating point precision.
use_pykeops (bool) – Use pykeops as backend to marginalize out offset.
vectorized (bool) – Vectorize time-dimension.
priors (dict) – Dictionary of parameters of prior distributions.

model()[source]#: Generative Model

guide()[source]#: Variational Distribution

init_parameters()[source]#: Initialize variational parameters.

TraceELBO(jit=False)[source]#: A trace implementation of ELBO-based SVI that supports - exhaustive enumeration over discrete sample sites, and - local parallel sampling over any sample site in the guide.

property z_probs: Tensor#: Probability of there being a target-specific spot $p(z=1)$

property theta_probs: Tensor#: Posterior target-specific spot probability $q(\theta = k, z=z_\mathsf{MAP})$.

property pspecific: Tensor#: Probability of there being a target-specific spot $p(\mathsf{specific})$

property m_probs: Tensor#: Posterior spot presence probability $q(m=1, z=z_\mathsf{MAP})$.

Distributions#

KSMOGN#

class tapqir.distributions.ksmogn.KSMOGN(height, width, x, y, target_locs, background, gain, offset_samples, offset_logits, P, m=None, alpha=None, use_pykeops=True, validate_args=None)[source]#

Bases: TorchDistribution

K-Spots Marginalized Offset Gamma Noise Image Distribution.

\[\mu^S_{\mathsf{pixelX}(i), \mathsf{pixelY}(j)} = \dfrac{m \cdot h}{2 \pi w^2} \exp{\left( -\dfrac{(i-x-x^\mathsf{target})^2 + (j-y-y^\mathsf{target})^2}{2 w^2} \right)}\]

\[\mu^I = b + \sum_{\mathsf{spot}} \mu^S\]

\[p(D|\mu^I, g) = \sum_\delta p(\delta) p(D|\mu^I, g, \delta) = \sum_\delta \delta_\mathsf{weights} \cdot \mathrm{Gamma}(D - \delta_\mathsf{samples} | \mu^I, g)\]

Reference:

Ordabayev YA, Friedman LJ, Gelles J, Theobald DL. Bayesian machine learning analysis of single-molecule fluorescence colocalization images. bioRxiv. 2021 Oct. doi: 10.1101/2021.09.30.462536.

Parameters:

height (Tensor) – Integrated spot intensity. Should be broadcastable to batch_shape + (K,).
width (Tensor) – Spot width. Should be broadcastable to batch_shape + (K,).
x (Tensor) – Spot center on x-axis. Should be broadcastable to batch_shape + (K,).
y (Tensor) – Spot center on y-axis. Should be broadcastable to batch_shape + (K,).
target_locs (Tensor) – Target location. Should have the rightmost size 2 correspondnig to locations on x- and y-axes, and be broadcastable to batch_shape + (2,).
background (Tensor) – Background intensity. Should be broadcastable to batch_shape.
gain (Tensor) – Camera gain.
offset_samples (Tensor) – Offset samples from the empirical distribution.
offset_logits (Tensor) – Offset log weights corresponding to the offset samples.
P (int) – Number of pixels along the axis.
m (Optional[Tensor]) – Spot presence indicator. Should be broadcastable to batch_shape + (K,).
alpha (Optional[Tensor]) – Signal cross-talk coefficient matrix. Should be broadcastable to (Q, C).
use_pykeops (bool) – Use pykeops as backend to marginalize out offset.

rsample(sample_shape=torch.Size([]))[source]#: Generates a sample_shape shaped reparameterized sample or sample_shape shaped batch of reparameterized samples if the distribution parameters are batched.

log_prob(value)[source]#

Returns the log of the probability density/mass function evaluated at value.

Args:: value (Tensor):

Distribution utility functions#

tapqir.distributions.util.gaussian_spots(height, width, x, y, target_locs, P, m=None)[source]#

Calculates ideal shape of the 2D-Gaussian spots given spot parameters and target positions.

\[\mu^S_{\mathsf{pixelX}(i), \mathsf{pixelY}(j)} = \dfrac{m \cdot h}{2 \pi w^2} \exp{\left( -\dfrac{(i-x-x^\mathsf{target})^2 + (j-y-y^\mathsf{target})^2}{2 w^2} \right)}\]

Parameters:

height (Tensor) – Integrated spot intensity. Should be broadcastable to batch_shape.
width (Tensor) – Spot width. Should be broadcastable to batch_shape.
x (Tensor) – Spot center on x-axis. Should be broadcastable to batch_shape.
y (Tensor) – Spot center on y-axis. Should be broadcastable to batch_shape.
target_locs (Tensor) – Target location. Should have the rightmost size 2 correspondnig to locations on x- and y-axes, and be broadcastable to batch_shape + (2,).
P (int) – Number of pixels along the axis.
m (Optional[Tensor]) – Spot presence indicator. Should be broadcastable to batch_shape.

Return type:

Tensor

Returns:

A tensor of a shape batch_shape + (P, P) representing 2D-Gaussian spots.

tapqir.distributions.util.truncated_poisson_probs(lamda, K)[source]#

Probability of the number of non-specific spots.

\[\begin{split}\mathbf{TruncatedPoisson}(\lambda, K) = \begin{cases} 1 - e^{-\lambda} \sum_{i=0}^{K-1} \dfrac{\lambda^i}{i!} & \textrm{if } k = K \\ \dfrac{\lambda^k e^{-\lambda}}{k!} & \mathrm{otherwise} \end{cases}\end{split}\]

Parameters:

lamda (Tensor) – Average rate of target-nonspecific binding.
K (int) – Maximum number of spots that can be present in a single image.

Return type:

Tensor

Returns:

A tensor of a shape lamda.shape + (K+1,) of probabilities.

tapqir.distributions.util.probs_m(lamda, K)[source]#

Prior spot presence probability $p(m | \theta, \lambda)$.

\[\begin{split}p(m_{\mathsf{spot}(k)} | \theta, \lambda) = \begin{cases} \mathbf{Bernoulli}(1) & \text{$\theta = k$} \\ \mathbf{Bernoulli} \left( \sum_{l=1}^K \dfrac{l \cdot \mathbf{TruncPoisson}(l; \lambda, K)}{K} \right) & \text{$\theta = 0$} \rule{0pt}{4ex} \\ \mathbf{Bernoulli} \left( \sum_{l=1}^{K-1} \dfrac{l \cdot \mathbf{TruncPoisson}(l; \lambda, K-1)}{K-1} \right) & \text{otherwise} \rule{0pt}{4ex} \end{cases}\end{split}\]

Parameters:

lamda (Tensor) – Average rate of target-nonspecific binding.
K (int) – Maximum number of spots that can be present in a single image.

Return type:

Tensor

Returns:

A tensor of a shape lamda.shape + (1 + K, K) of probabilities.

tapqir.distributions.util.expand_offtarget(probs)[source]#

Expand state probability probs (e.g., $\pi$ or $A$) to off-target AOIs.

\[\begin{split}p(\mathsf{state}) = \begin{cases} \mathbf{Categorical}\left( \mathsf{probs} \right) & \textrm{if on-target} \\ \mathbf{Categorical}\left( \left[ 1, 0, \dots, 0 \right] \right) & \textrm{if off-target} \end{cases}\end{split}\]

Parameters:: probs (Tensor) – Probability of target-specific states.
Return type:: Tensor
Returns:: A tensor of a shape probs.shape + (2,) of probabilities for off-target (0) and on-target (1) AOI.

tapqir.distributions.util.probs_theta(K, device)[source]#

Prior probability for target-specific spot index $p(\theta | z)$.

\[\begin{split}p(\theta | z) = \begin{cases} \mathbf{Categorical}\left( \begin{bmatrix} 0 & 1/K & \dots & 1/K \end{bmatrix} \right) & z > 0 \\ \mathbf{Categorical}\left( \begin{bmatrix} 1 & 0 & \dots & 0 \end{bmatrix} \right) & z = 0 \end{cases}\end{split}\]

Parameters:: K (int) – Maximum number of spots that can be present in a single image.
Return type:: Tensor
Returns:: A tensor of a shape (2, 1 + K) of $\theta$ probabilities for spot-absent (0) and spot-present (1) cases.

Utility functions#

class tapqir.utils.dataset.OffsetData(samples, weights)[source]#

class tapqir.utils.dataset.CosmosDataset(images, xy, is_ontarget, mask=None, labels=None, offset_samples=None, offset_weights=None, device=device(type='cpu'), time1=None, ttb=None, name=None, channels=None)[source]#

Cosmos Dataset

property N: int#: Number of on-target AOIs.

property Nc: int#: Number of off-target AOIs.

property Nt: int#: Total number of AOIs.

property F: int#: Number of frames.

property C: int#: Number of color-channels.

property P: int#: Number of pixels.

property x: Tensor#: Target location on the x-axis.

property y: Tensor#: Target location on the y-axis.

tapqir.utils.imscroll.count_intervals(labels)[source]#

Count binding interval data.

Co-localization and absent intervals are coded as -3 and -2 respectively when they are the first (or only) interval in a record, 3 and 2 when they are the last interval in a record and 1 and 0 elsewhere.

Reference:

@article{friedman2015multi,
  title={Multi-wavelength single-molecule fluorescence analysis of transcription mechanisms},
  author={Friedman, Larry J and Gelles, Jeff},
  journal={Methods},
  volume={86},
  pages={27--36},
  year={2015},
  publisher={Elsevier}
}

tapqir.utils.imscroll.time_to_first_binding(labels)[source]#

Measure the time elapsed prior to the first binding.

Time-to-first binding for a binary data:

$\mathrm{ttfb} = \sum_{f=1}^{F-1} f z_{n,f} \prod_{f^\prime=0}^{f-1} (1 - z_{n,f^\prime}) + F \prod_{f^\prime=0}^{F-1} (1 - z_{n,f^\prime})$

Expected value of the time-to-first binding:

$\mathbb{E}[\mathrm{ttfb}] = \sum_{f=1}^{F-1} f q(z_{n,f}=1) \prod_{f^\prime=f-1}^{f-1} q(z_{n,f^\prime}=0) + F \prod_{f^\prime=0}^{F-1} q(z_{n,f^\prime}=0)$

Reference:

@article{friedman2015multi,
  title={Multi-wavelength single-molecule fluorescence analysis of transcription mechanisms},
  author={Friedman, Larry J and Gelles, Jeff},
  journal={Methods},
  volume={86},
  pages={27--36},
  year={2015},
  publisher={Elsevier}
}

tapqir.utils.imscroll.association_rate(labels)[source]#: Compute the on-rate from the binary data assuming a two-state HMM model.

tapqir.utils.imscroll.dissociation_rate(labels)[source]#: Compute the off-rate from the binary data assuming a two-state HMM model.

tapqir.utils.imscroll.bootstrap(samples, estimator, repetitions=1000, probs=0.68)[source]#: Estimate the confidence interval of an estimator by constructing approximating distributions using the bootstrap method (resampling with replacement).

tapqir.utils.imscroll.posterior_estimate(dist, estimator, repetitions=1000, probs=0.68)[source]#: A version of bootstrapping method where samples are first drawn from a distribution and then resampled with replacement.

tapqir.utils.imscroll.sample_and_bootstrap(dist, estimator, preprocess=None, repetitions=1000, probs=0.68)[source]#: A version of bootstrapping method where samples are first drawn from a distribution and then resampled with replacement.

Parameter	Shape	Description
`gain` - \(g\)	(1,)	camera gain
`proximity` - \(\sigma^{xy}\)	(1,)	proximity
`lamda` - \(\lambda\)	(1,)	average rate of target-nonspecific binding
`pi` - \(\pi\)	(1,)	average binding probability of target-specific binding
`background` - \(b\)	(N, F)	background intensity
`z` - \(z\)	(N, F)	target-specific spot presence
`theta` - \(\theta\)	(N, F)	target-specific spot index
`m` - \(m\)	(K, N, F)	spot presence indicator
`height` - \(h\)	(K, N, F)	spot intensity
`width` - \(w\)	(K, N, F)	spot width
`x` - \(x\)	(K, N, F)	spot position on x-axis
`y` - \(y\)	(K, N, F)	spot position on y-axis
`data` - \(D\)	(N, F, P, P)	observed images