For Developers#

Models#

List of Tapqir models.

class tapqir.models.model.Model(S=1, K=2, Q=None, device='cpu', dtype='double', priors=None)[source]#

Bases: object

Base class for tapqir models.

Derived models must implement the methods

Parameters:
  • S (int) – Number of distinct molecular states for the binder molecules.

  • K (int) – Maximum number of spots that can be present in a single image.

  • Q (Optional[int]) – Number of fluorescent dyes.

  • device (str) – Computation device (cpu or gpu).

  • dtype (str) – Floating point precision.

  • priors (Optional[dict]) – Dictionary of parameters of prior distributions.

to(device, dtype='double')[source]#

Change tensor device and dtype.

Parameters:
  • device (str) – Computation device, either “gpu” or “cpu”.

  • dtype (str) – Floating point precision, either “double” or “float”.

Return type:

None

load(path, data_only=True)[source]#

Load data and optionally parameters from a specified path

Parameters:
  • path (Union[str, Path]) – Path to Tapqir analysis folder.

  • data_only (bool) – Load only data or both data and parameters.

Return type:

None

model()[source]#

Generative Model

guide()[source]#

Variational Distribution

TraceELBO(jit)[source]#

A trace implementation of ELBO-based SVI.

init_parameters()[source]#

Initialize variational parameters.

init(lr=0.005, nbatch_size=5, fbatch_size=512, jit=False)[source]#

Initialize SVI object.

Parameters:
  • lr (float) – Learning rate.

  • nbatch_size (int) – AOI batch size.

  • fbatch_size (int) – Frame batch size.

  • jit (bool) – Use JIT compiler.

Return type:

None

run(num_iter=0, progress_bar=<class 'tqdm.std.tqdm'>)[source]#

Run inference procedure for a specified number of iterations. If num_iter equals zero then run till model converges.

Parameters:

num_iter (int) – Number of iterations.

Return type:

None

save_checkpoint(writer=None)[source]#

Save checkpoint.

Parameters:

writer (Optional[SummaryWriter]) – SummaryWriter object.

load_checkpoint(path=None, param_only=False, warnings=False)[source]#

Load checkpoint.

Parameters:
  • path (Union[str, Path, None]) – Path to model checkpoint.

  • param_only (bool) – Load only parameters.

  • warnings (bool) – Give warnings if loaded model has not been fully trained.

compute_stats(CI=0.95, save_matlab=False)[source]#

Compute credible regions (CI) and other stats.

Parameters:
  • CI (float) – credible region.

  • save_matlab (bool) – Save output in Matlab format as well.

cosmos#

class tapqir.models.cosmos.cosmos(S=1, K=2, Q=None, device='cpu', dtype='double', use_pykeops=True, priors={'background_mean_std': 1000.0, 'background_std_std': 100.0, 'gain_std': 50.0, 'height_std': 10000.0, 'lamda_rate': 1.0, 'proximity_rate': 1.0, 'width_max': 2.25, 'width_min': 0.75})[source]#

Bases: Model

Multi-Color Time-Independent Colocalization Model

Reference:

  1. Ordabayev YA, Friedman LJ, Gelles J, Theobald DL. Bayesian machine learning analysis of single-molecule fluorescence colocalization images. eLife. 2022 March. doi: 10.7554/eLife.73860.

Parameters:
  • K (int) – Maximum number of spots that can be present in a single image.

  • device (str) – Computation device (cpu or gpu).

  • dtype (str) – Floating point precision.

  • use_pykeops (bool) – Use pykeops as backend to marginalize out offset.

  • priors (dict) – Dictionary of parameters of prior distributions.

model()[source]#

Generative Model

Model parameters:

Parameter

Shape

Description

gain - \(g\)

(1,)

camera gain

proximity - \(\sigma^{xy}\)

(1,)

proximity

lamda - \(\lambda\)

(1,)

average rate of target-nonspecific binding

pi - \(\pi\)

(1,)

average binding probability of target-specific binding

background - \(b\)

(N, F)

background intensity

z - \(z\)

(N, F)

target-specific spot presence

theta - \(\theta\)

(N, F)

target-specific spot index

m - \(m\)

(K, N, F)

spot presence indicator

height - \(h\)

(K, N, F)

spot intensity

width - \(w\)

(K, N, F)

spot width

x - \(x\)

(K, N, F)

spot position on x-axis

y - \(y\)

(K, N, F)

spot position on y-axis

data - \(D\)

(N, F, P, P)

observed images

Full joint distribution:

\[\begin{split}\begin{aligned} p(D, \phi) =~&p(g) p(\sigma^{xy}) p(\pi) p(\lambda) \prod_{\mathsf{AOI}} \left[ p(\mu^b) p(\sigma^b) \prod_{\mathsf{frame}} \left[ \vphantom{\prod_{F}} p(b | \mu^b, \sigma^b) p(z | \pi) p(\theta | z) \vphantom{\prod_{\substack{\mathsf{pixelX} \\ \mathsf{pixelY}}}} \cdot \right. \right. \\ &\prod_{\mathsf{spot}} \left[ \vphantom{\prod_{F}} p(m | \theta, \lambda) p(h) p(w) p(x | \sigma^{xy}, \theta) p(y | \sigma^{xy}, \theta) \right] \left. \left. \prod_{\substack{\mathsf{pixelX} \\ \mathsf{pixelY}}} \sum_{\delta} p(\delta) p(D | \mu^I, g, \delta) \right] \right] \end{aligned}\end{split}\]

\(z\) and \(\theta\) marginalized joint distribution:

\[\begin{split}\begin{aligned} \sum_{z, \theta} p(D, \phi) =~&p(g) p(\sigma^{xy}) p(\pi) p(\lambda) \prod_{\mathsf{AOI}} \left[ p(\mu^b) p(\sigma^b) \prod_{\mathsf{frame}} \left[ \vphantom{\prod_{F}} p(b | \mu^b, \sigma^b) \sum_{z} p(z | \pi) \sum_{\theta} p(\theta | z) \vphantom{\prod_{\substack{\mathsf{pixelX} \\ \mathsf{pixelY}}}} \cdot \right. \right. \\ &\prod_{\mathsf{spot}} \left[ \vphantom{\prod_{F}} p(m | \theta, \lambda) p(h) p(w) p(x | \sigma^{xy}, \theta) p(y | \sigma^{xy}, \theta) \right] \left. \left. \prod_{\substack{\mathsf{pixelX} \\ \mathsf{pixelY}}} \sum_{\delta} p(\delta) p(D | \mu^I, g, \delta) \right] \right] \end{aligned}\end{split}\]
guide()[source]#

Variational Distribution

\[\begin{split}\begin{aligned} q(\phi \setminus \{z, \theta\}) =~&q(g) q(\sigma^{xy}) q(\pi) q(\lambda) \cdot \\ &\prod_{\mathsf{AOI}} \left[ q(\mu^b) q(\sigma^b) \prod_{\mathsf{frame}} \left[ \vphantom{\prod_{F}} q(b) \prod_{\mathsf{spot}} q(m) q(h | m) q(w | m) q(x | m) q(y | m) \right] \right] \end{aligned}\end{split}\]
init_parameters()[source]#

Initialize variational parameters.

TraceELBO(jit=False)[source]#

A trace implementation of ELBO-based SVI that supports - exhaustive enumeration over discrete sample sites, and - local parallel sampling over any sample site in the guide.

property z_probs: Tensor#

Probability of there being a target-specific spot \(p(z=1)\)

property theta_probs: Tensor#

Posterior target-specific spot probability \(q(\theta = k)\) for \(k \in \{1, \dots, K\}\).

property m_probs: Tensor#

Posterior spot presence probability \(q(m=1)\).

property pspecific: Tensor#

Probability of there being a target-specific spot \(p(\mathsf{specific})\)

hmm#

class tapqir.models.hmm.hmm(S=1, K=2, device='cpu', dtype='double', use_pykeops=True, vectorized=True, priors={'background_mean_std': 1000.0, 'background_std_std': 100.0, 'gain_std': 50.0, 'height_std': 10000.0, 'lamda_rate': 1.0, 'proximity_rate': 1.0, 'width_max': 2.25, 'width_min': 0.75})[source]#

Bases: cosmos

Multi-Color Hidden Markov Colocalization Model

EXPERIMENTAL This model relies on Funsor backend.

Parameters:
  • K (int) – Maximum number of spots that can be present in a single image.

  • device (str) – Computation device (cpu or gpu).

  • dtype (str) – Floating point precision.

  • use_pykeops (bool) – Use pykeops as backend to marginalize out offset.

  • vectorized (bool) – Vectorize time-dimension.

  • priors (dict) – Dictionary of parameters of prior distributions.

model()[source]#

Generative Model

guide()[source]#

Variational Distribution

init_parameters()[source]#

Initialize variational parameters.

TraceELBO(jit=False)[source]#

A trace implementation of ELBO-based SVI that supports - exhaustive enumeration over discrete sample sites, and - local parallel sampling over any sample site in the guide.

property z_probs: Tensor#

Probability of there being a target-specific spot \(p(z=1)\)

property theta_probs: Tensor#

Posterior target-specific spot probability \(q(\theta = k, z=z_\mathsf{MAP})\).

property pspecific: Tensor#

Probability of there being a target-specific spot \(p(\mathsf{specific})\)

property m_probs: Tensor#

Posterior spot presence probability \(q(m=1, z=z_\mathsf{MAP})\).

Distributions#

KSMOGN#

class tapqir.distributions.ksmogn.KSMOGN(height, width, x, y, target_locs, background, gain, offset_samples, offset_logits, P, m=None, alpha=None, use_pykeops=True, validate_args=None)[source]#

Bases: TorchDistribution

K-Spots Marginalized Offset Gamma Noise Image Distribution.

\[\mu^S_{\mathsf{pixelX}(i), \mathsf{pixelY}(j)} = \dfrac{m \cdot h}{2 \pi w^2} \exp{\left( -\dfrac{(i-x-x^\mathsf{target})^2 + (j-y-y^\mathsf{target})^2}{2 w^2} \right)}\]
\[\mu^I = b + \sum_{\mathsf{spot}} \mu^S\]
\[p(D|\mu^I, g) = \sum_\delta p(\delta) p(D|\mu^I, g, \delta) = \sum_\delta \delta_\mathsf{weights} \cdot \mathrm{Gamma}(D - \delta_\mathsf{samples} | \mu^I, g)\]

Reference:

  1. Ordabayev YA, Friedman LJ, Gelles J, Theobald DL. Bayesian machine learning analysis of single-molecule fluorescence colocalization images. bioRxiv. 2021 Oct. doi: 10.1101/2021.09.30.462536.

Parameters:
  • height (Tensor) – Integrated spot intensity. Should be broadcastable to batch_shape + (K,).

  • width (Tensor) – Spot width. Should be broadcastable to batch_shape + (K,).

  • x (Tensor) – Spot center on x-axis. Should be broadcastable to batch_shape + (K,).

  • y (Tensor) – Spot center on y-axis. Should be broadcastable to batch_shape + (K,).

  • target_locs (Tensor) – Target location. Should have the rightmost size 2 correspondnig to locations on x- and y-axes, and be broadcastable to batch_shape + (2,).

  • background (Tensor) – Background intensity. Should be broadcastable to batch_shape.

  • gain (Tensor) – Camera gain.

  • offset_samples (Tensor) – Offset samples from the empirical distribution.

  • offset_logits (Tensor) – Offset log weights corresponding to the offset samples.

  • P (int) – Number of pixels along the axis.

  • m (Optional[Tensor]) – Spot presence indicator. Should be broadcastable to batch_shape + (K,).

  • alpha (Optional[Tensor]) – Signal cross-talk coefficient matrix. Should be broadcastable to (Q, C).

  • use_pykeops (bool) – Use pykeops as backend to marginalize out offset.

rsample(sample_shape=torch.Size([]))[source]#

Generates a sample_shape shaped reparameterized sample or sample_shape shaped batch of reparameterized samples if the distribution parameters are batched.

log_prob(value)[source]#

Returns the log of the probability density/mass function evaluated at value.

Args:

value (Tensor):

Distribution utility functions#

tapqir.distributions.util.gaussian_spots(height, width, x, y, target_locs, P, m=None)[source]#

Calculates ideal shape of the 2D-Gaussian spots given spot parameters and target positions.

\[\mu^S_{\mathsf{pixelX}(i), \mathsf{pixelY}(j)} = \dfrac{m \cdot h}{2 \pi w^2} \exp{\left( -\dfrac{(i-x-x^\mathsf{target})^2 + (j-y-y^\mathsf{target})^2}{2 w^2} \right)}\]
Parameters:
  • height (Tensor) – Integrated spot intensity. Should be broadcastable to batch_shape.

  • width (Tensor) – Spot width. Should be broadcastable to batch_shape.

  • x (Tensor) – Spot center on x-axis. Should be broadcastable to batch_shape.

  • y (Tensor) – Spot center on y-axis. Should be broadcastable to batch_shape.

  • target_locs (Tensor) – Target location. Should have the rightmost size 2 correspondnig to locations on x- and y-axes, and be broadcastable to batch_shape + (2,).

  • P (int) – Number of pixels along the axis.

  • m (Optional[Tensor]) – Spot presence indicator. Should be broadcastable to batch_shape.

Return type:

Tensor

Returns:

A tensor of a shape batch_shape + (P, P) representing 2D-Gaussian spots.

tapqir.distributions.util.truncated_poisson_probs(lamda, K)[source]#

Probability of the number of non-specific spots.

\[\begin{split}\mathbf{TruncatedPoisson}(\lambda, K) = \begin{cases} 1 - e^{-\lambda} \sum_{i=0}^{K-1} \dfrac{\lambda^i}{i!} & \textrm{if } k = K \\ \dfrac{\lambda^k e^{-\lambda}}{k!} & \mathrm{otherwise} \end{cases}\end{split}\]
Parameters:
  • lamda (Tensor) – Average rate of target-nonspecific binding.

  • K (int) – Maximum number of spots that can be present in a single image.

Return type:

Tensor

Returns:

A tensor of a shape lamda.shape + (K+1,) of probabilities.

tapqir.distributions.util.probs_m(lamda, K)[source]#

Prior spot presence probability \(p(m | \theta, \lambda)\).

\[\begin{split}p(m_{\mathsf{spot}(k)} | \theta, \lambda) = \begin{cases} \mathbf{Bernoulli}(1) & \text{$\theta = k$} \\ \mathbf{Bernoulli} \left( \sum_{l=1}^K \dfrac{l \cdot \mathbf{TruncPoisson}(l; \lambda, K)}{K} \right) & \text{$\theta = 0$} \rule{0pt}{4ex} \\ \mathbf{Bernoulli} \left( \sum_{l=1}^{K-1} \dfrac{l \cdot \mathbf{TruncPoisson}(l; \lambda, K-1)}{K-1} \right) & \text{otherwise} \rule{0pt}{4ex} \end{cases}\end{split}\]
Parameters:
  • lamda (Tensor) – Average rate of target-nonspecific binding.

  • K (int) – Maximum number of spots that can be present in a single image.

Return type:

Tensor

Returns:

A tensor of a shape lamda.shape + (1 + K, K) of probabilities.

tapqir.distributions.util.expand_offtarget(probs)[source]#

Expand state probability probs (e.g., \(\pi\) or \(A\)) to off-target AOIs.

\[\begin{split}p(\mathsf{state}) = \begin{cases} \mathbf{Categorical}\left( \mathsf{probs} \right) & \textrm{if on-target} \\ \mathbf{Categorical}\left( \left[ 1, 0, \dots, 0 \right] \right) & \textrm{if off-target} \end{cases}\end{split}\]
Parameters:

probs (Tensor) – Probability of target-specific states.

Return type:

Tensor

Returns:

A tensor of a shape probs.shape + (2,) of probabilities for off-target (0) and on-target (1) AOI.

tapqir.distributions.util.probs_theta(K, device)[source]#

Prior probability for target-specific spot index \(p(\theta | z)\).

\[\begin{split}p(\theta | z) = \begin{cases} \mathbf{Categorical}\left( \begin{bmatrix} 0 & 1/K & \dots & 1/K \end{bmatrix} \right) & z > 0 \\ \mathbf{Categorical}\left( \begin{bmatrix} 1 & 0 & \dots & 0 \end{bmatrix} \right) & z = 0 \end{cases}\end{split}\]
Parameters:

K (int) – Maximum number of spots that can be present in a single image.

Return type:

Tensor

Returns:

A tensor of a shape (2, 1 + K) of \(\theta\) probabilities for spot-absent (0) and spot-present (1) cases.

Utility functions#

class tapqir.utils.dataset.OffsetData(samples, weights)[source]#
class tapqir.utils.dataset.CosmosDataset(images, xy, is_ontarget, mask=None, labels=None, offset_samples=None, offset_weights=None, device=device(type='cpu'), time1=None, ttb=None, name=None, channels=None)[source]#

Cosmos Dataset

property N: int#

Number of on-target AOIs.

property Nc: int#

Number of off-target AOIs.

property Nt: int#

Total number of AOIs.

property F: int#

Number of frames.

property C: int#

Number of color-channels.

property P: int#

Number of pixels.

property x: Tensor#

Target location on the x-axis.

property y: Tensor#

Target location on the y-axis.

tapqir.utils.imscroll.count_intervals(labels)[source]#

Count binding interval data.

Co-localization and absent intervals are coded as -3 and -2 respectively when they are the first (or only) interval in a record, 3 and 2 when they are the last interval in a record and 1 and 0 elsewhere.

Reference:

@article{friedman2015multi,
  title={Multi-wavelength single-molecule fluorescence analysis of transcription mechanisms},
  author={Friedman, Larry J and Gelles, Jeff},
  journal={Methods},
  volume={86},
  pages={27--36},
  year={2015},
  publisher={Elsevier}
}
tapqir.utils.imscroll.time_to_first_binding(labels)[source]#

Measure the time elapsed prior to the first binding.

Time-to-first binding for a binary data:

\(\mathrm{ttfb} = \sum_{f=1}^{F-1} f z_{n,f} \prod_{f^\prime=0}^{f-1} (1 - z_{n,f^\prime}) + F \prod_{f^\prime=0}^{F-1} (1 - z_{n,f^\prime})\)

Expected value of the time-to-first binding:

\(\mathbb{E}[\mathrm{ttfb}] = \sum_{f=1}^{F-1} f q(z_{n,f}=1) \prod_{f^\prime=f-1}^{f-1} q(z_{n,f^\prime}=0) + F \prod_{f^\prime=0}^{F-1} q(z_{n,f^\prime}=0)\)

Reference:

@article{friedman2015multi,
  title={Multi-wavelength single-molecule fluorescence analysis of transcription mechanisms},
  author={Friedman, Larry J and Gelles, Jeff},
  journal={Methods},
  volume={86},
  pages={27--36},
  year={2015},
  publisher={Elsevier}
}
tapqir.utils.imscroll.association_rate(labels)[source]#

Compute the on-rate from the binary data assuming a two-state HMM model.

tapqir.utils.imscroll.dissociation_rate(labels)[source]#

Compute the off-rate from the binary data assuming a two-state HMM model.

tapqir.utils.imscroll.bootstrap(samples, estimator, repetitions=1000, probs=0.68)[source]#

Estimate the confidence interval of an estimator by constructing approximating distributions using the bootstrap method (resampling with replacement).

tapqir.utils.imscroll.posterior_estimate(dist, estimator, repetitions=1000, probs=0.68)[source]#

A version of bootstrapping method where samples are first drawn from a distribution and then resampled with replacement.

tapqir.utils.imscroll.sample_and_bootstrap(dist, estimator, preprocess=None, repetitions=1000, probs=0.68)[source]#

A version of bootstrapping method where samples are first drawn from a distribution and then resampled with replacement.