Title: | Quantify Disease Transmission Within and Between Population Groups |
---|---|
Description: | A simple tool to quantify the amount of transmission of an infectious disease of interest occurring within and between population groups. 'bumblebee' uses counts of observed directed transmission pairs, identified phylogenetically from deep-sequence data or from epidemiological contacts, to quantify transmission flows within and between population groups accounting for sampling heterogeneity. Population groups might include: geographical areas (e.g. communities, regions), demographic groups (e.g. age, gender) or arms of a randomized clinical trial. See the 'bumblebee' website for statistical theory, documentation and examples <https://magosil86.github.io/bumblebee/>. |
Authors: | Lerato E Magosi [aut] |
Maintainer: | Lerato E Magosi |
License: | MIT + file LICENSE |
Version: | 0.1.0 |
Built: | 2025-02-01 03:42:17 UTC |
Source: | https://github.com/magosil86/bumblebee |
Counts of directed HIV transmission pairs observed within and between intervention and control communities in the 30-community BCPP/Ya Tsie HIV prevention trial in Botswana (2013-2018). The BCPP/Ya Tsie trial was a pair-matched, community-randomized trial that evaluated the effect of a universal HIV test-and-treat intervention on population-level HIV incidence. For further details see the references and https://magosil86.github.io/bumblebee/.
counts_hiv_transmission_pairs
A data frame:
Name of population group 1
Name of population group 2
Number of observed directed transmission pairs between samples from population groups 1 and 2
https://magosil86.github.io/bumblebee/
Magosi LE, et al., Deep-sequence phylogenetics to quantify patterns of HIV transmission in the context of a universal testing and treatment trial – BCPP/ Ya Tsie trial. To submit for publication, 2021.
estimate_c_hat
Estimates probability of clustering
This function estimates c_hat, the probability that a randomly selected pathogen sequence in one population group links to at least one pathogen sequence in another population group.
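Under the simplifying assumption that a sequence from population group 1 links to each of the $N_2$ individuals in population group 2 independently, each with probability p_hat, the probability of at least one link is $1 - (1 - \hat{p})^{N_2}$. The base-R sketch below computes that quantity. It does not call bumblebee, the independence assumption is ours, and the helper name sketch_c_hat() is hypothetical; consult the bumblebee website for the estimator the package actually uses.

```r
# Hypothetical helper, not part of bumblebee: probability of clustering under the
# assumption that a sequence links to each of the N individuals in the other
# population group independently, each with probability p_hat.
sketch_c_hat <- function(p_hat, number_hosts_population_group_2) {
  stopifnot(all(p_hat >= 0 & p_hat <= 1),
            all(number_hosts_population_group_2 >= 0))
  # P(at least one link) = 1 - P(no links) = 1 - (1 - p_hat)^N
  1 - (1 - p_hat)^number_hosts_population_group_2
}

# Illustrative, made-up inputs for two group pairings
sketch_c_hat(p_hat = c(0.001, 0.0005),
             number_hosts_population_group_2 = c(1000, 2000))
```

Because the expression is increasing in $N_2$, a small per-pair linkage probability can still yield a sizeable probability of clustering in a large population group.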
    estimate_c_hat(df_counts_and_p_hat, ...)

    ## Default S3 method:
    estimate_c_hat(df_counts_and_p_hat, ...)
df_counts_and_p_hat |
A data.frame returned by the function: |
... |
Further arguments. |
Returns a data.frame containing:
H1_group, Name of population group 1
H2_group, Name of population group 2
number_hosts_sampled_group_1, Number of individuals sampled from population group 1
number_hosts_sampled_group_2, Number of individuals sampled from population group 2
number_hosts_population_group_1, Estimated number of individuals in population group 1
number_hosts_population_group_2, Estimated number of individuals in population group 2
max_possible_pairs_in_sample, Number of distinct possible transmission pairs between individuals sampled from population groups 1 and 2
max_possible_pairs_in_population, Number of distinct possible transmission pairs between individuals in population groups 1 and 2
num_linked_pairs_observed, Number of observed directed transmission pairs between samples from population groups 1 and 2
p_hat, Probability that pathogen sequences from two individuals randomly sampled from their respective population groups are linked
c_hat, Probability that a randomly selected pathogen sequence in one population group links to at least one pathogen sequence in another population group i.e. probability of clustering
default: Estimates probability of clustering
Magosi LE, et al., Deep-sequence phylogenetics to quantify patterns of HIV transmission in the context of a universal testing and treatment trial – BCPP/ Ya Tsie trial. To submit for publication, 2021.
Carnegie, N.B., et al., Linkage of viral sequences among HIV-infected village residents in Botswana: estimation of linkage rates in the presence of missing data. PLoS Computational Biology, 2014. 10(1): p. e1003430.
See estimate_p_hat() to prepare input data to estimate c_hat.
    library(bumblebee)
    library(dplyr)

    # Estimate the probability of clustering between individuals from two population groups of interest

    # We shall use the data of HIV transmissions within and between intervention and control
    # communities in the BCPP/Ya Tsie HIV prevention trial. To learn more about the data
    # ?counts_hiv_transmission_pairs, ?sampling_frequency and ?estimated_hiv_transmission_flows

    # Load and view data
    #
    # The input data comprises counts of observed directed HIV transmission pairs within and
    # between intervention and control communities in the BCPP/Ya Tsie trial, sampling
    # information and the probability of linkage between individuals sampled from
    # intervention and control communities (i.e. p_hat)
    #
    # See ?estimate_p_hat() for details on estimating p_hat
    results_estimate_p_hat <- estimated_hiv_transmission_flows[, c(1:10)]

    results_estimate_p_hat

    # Estimate c_hat
    results_estimate_c_hat <- estimate_c_hat(df_counts_and_p_hat = results_estimate_p_hat)

    # View results
    results_estimate_c_hat
estimate_multinom_ci
Estimates confidence intervals for transmission flows
This function computes simultaneous confidence intervals at the 5% significance level (i.e. 95% simultaneous confidence intervals) for estimated transmission flows. Available methods for computing confidence intervals are: Goodman, Goodman with a continuity correction, Sison-Glaz and Quesenberry-Hurst.
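For orientation, Goodman's (1965) simultaneous intervals for multinomial proportions have a closed form that uses a Bonferroni-adjusted chi-square quantile with one degree of freedom. The base-R sketch below implements that textbook formula for a vector of counts, one per population group pairing, at the 95% level. The helper name sketch_goodman_ci() is hypothetical, and the sketch does not reproduce how bumblebee prepares the counts it passes to its own interval routines; for a maintained implementation see DescTools::MultinomCI() with method = "goodman".

```r
# Hypothetical helper, not part of bumblebee: Goodman (1965) simultaneous
# confidence intervals for multinomial proportions.
sketch_goodman_ci <- function(counts, conf_level = 0.95) {
  stopifnot(all(counts >= 0), sum(counts) > 0)
  k <- length(counts)   # number of categories (population group pairings)
  n <- sum(counts)      # total count across categories
  alpha <- 1 - conf_level
  # Bonferroni-adjusted chi-square quantile with 1 degree of freedom
  a <- qchisq(1 - alpha / k, df = 1)
  root_term <- sqrt(a * (a + 4 * counts * (n - counts) / n))
  data.frame(
    est = counts / n,
    lwr_ci = (a + 2 * counts - root_term) / (2 * (n + a)),
    upr_ci = (a + 2 * counts + root_term) / (2 * (n + a))
  )
}

# Illustrative, made-up counts for four group pairings
sketch_goodman_ci(c(25, 5, 8, 40))
```

Each interval is the set of proportions $\pi_i$ satisfying $(x_i - n\pi_i)^2 \le A\,n\,\pi_i(1 - \pi_i)$, so it always contains the point estimate $x_i / n$ and stays within $[0, 1]$.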
    estimate_multinom_ci(df_theta_hat, ...)

    ## Default S3 method:
    estimate_multinom_ci(df_theta_hat, detailed_report = FALSE, ...)
df_theta_hat |
A data.frame returned by the function: |
... |
Further arguments. |
detailed_report |
A boolean value to produce detailed output of the analysis (Default is FALSE). |
Returns a data.frame containing:
H1_group, Name of population group 1
H2_group, Name of population group 2
number_hosts_sampled_group_1, Number of individuals sampled from population group 1
number_hosts_sampled_group_2, Number of individuals sampled from population group 2
number_hosts_population_group_1, Estimated number of individuals in population group 1
number_hosts_population_group_2, Estimated number of individuals in population group 2
max_possible_pairs_in_sample, Number of distinct possible transmission pairs between individuals sampled from population groups 1 and 2
max_possible_pairs_in_population, Number of distinct possible transmission pairs between individuals in population groups 1 and 2
num_linked_pairs_observed, Number of observed directed transmission pairs between samples from population groups 1 and 2
p_hat, Probability that pathogen sequences from two individuals randomly sampled from their respective population groups are linked
est_linkedpairs_in_population, Estimated transmission pairs between population groups 1 and 2
theta_hat, Estimated transmission flows or relative probability of transmission within and between population groups 1 and 2 adjusted for sampling heterogeneity. More precisely, the conditional probability that a pair of pathogen sequences is from a specific population group pairing given that the pair is linked.
obs_trm_pairs_est_goodman, Point estimate, Goodman method confidence intervals for observed transmission pairs
obs_trm_pairs_lwr_ci_goodman, Lower bound of Goodman confidence interval
obs_trm_pairs_upr_ci_goodman, Upper bound of Goodman confidence interval
est_goodman, Point estimate, Goodman method confidence intervals for estimated transmission flows
lwr_ci_goodman, Lower bound of Goodman confidence interval
upr_ci_goodman, Upper bound of Goodman confidence interval
The following additional fields are returned if detailed_report is set to TRUE:
est_goodman_cc, Point estimate, Goodman method confidence intervals with continuity correction
lwr_ci_goodman_cc, Lower bound of Goodman confidence interval with continuity correction
upr_ci_goodman_cc, Upper bound of Goodman confidence interval with continuity correction
est_sisonglaz, Point estimate, Sison-Glaz method confidence intervals
lwr_ci_sisonglaz, Lower bound of Sison-Glaz confidence interval
upr_ci_sisonglaz, Upper bound of Sison-Glaz confidence interval
est_qhurst_acswr, Point estimate, Quesenberry-Hurst method confidence intervals via the ACSWR R package
lwr_ci_qhurst_acswr, Lower bound of Quesenberry-Hurst confidence interval
upr_ci_qhurst_acswr, Upper bound of Quesenberry-Hurst confidence interval
est_qhurst_coinmind, Point estimate, Quesenberry-Hurst method confidence intervals via the CoinMinD R package
lwr_ci_qhurst_coinmind, Lower bound of Quesenberry-Hurst confidence interval
upr_ci_qhurst_coinmind, Upper bound of Quesenberry-Hurst confidence interval
lwr_ci_qhurst_adj_coinmind, Lower bound of adjusted Quesenberry-Hurst confidence interval
upr_ci_qhurst_adj_coinmind, Upper bound of adjusted Quesenberry-Hurst confidence interval
default: Estimates confidence intervals for transmission flows
Magosi LE, et al., Deep-sequence phylogenetics to quantify patterns of HIV transmission in the context of a universal testing and treatment trial – BCPP/ Ya Tsie trial. To submit for publication, 2021.
Goodman, L.A., On simultaneous confidence intervals for multinomial proportions. Technometrics, 1965. 7: p. 247-254.
Cherry, S., A Comparison of Confidence Interval Methods for Habitat Use-Availability Studies. The Journal of Wildlife Management, 1996. 60(3): p. 653-658.
Sison, C.P. and Glaz, J., Simultaneous confidence intervals and sample size determination for multinomial proportions. Journal of the American Statistical Association, 1995. 90: p. 366-369.
Glaz, J. and Sison, C.P., Simultaneous confidence intervals for multinomial proportions. Journal of Statistical Planning and Inference, 1999. 82: p. 251-262.
May, W.L. and Johnson, W.D., Constructing two-sided simultaneous confidence intervals for multinomial proportions for small counts in a large number of cells. Journal of Statistical Software, 2000. 5(6). Paper and code available at https://www.jstatsoft.org/v05/i06.
Carnegie, N.B., et al., Linkage of viral sequences among HIV-infected village residents in Botswana: estimation of linkage rates in the presence of missing data. PLoS Computational Biology, 2014. 10(1): p. e1003430.
Ratmann, O., et al., Inferring HIV-1 transmission networks and sources of epidemic spread in Africa with deep-sequence phylogenetic analysis. Nature Communications, 2019. 10(1): p. 1411.
Wymant, C., et al., PHYLOSCANNER: Inferring Transmission from Within- and Between-Host Pathogen Genetic Diversity. Molecular Biology and Evolution, 2017. 35(3): p. 719-733.
See estimate_theta_hat() to prepare input data to estimate confidence intervals.
To learn more about the Goodman and Sison-Glaz confidence interval methods see DescTools::MultinomCI(). For Quesenberry-Hurst confidence intervals see ACSWR::QH_CI() and CoinMinD::QH().
    library(bumblebee)
    library(dplyr)

    # Compute confidence intervals for estimated transmission flows

    # We shall use the data of HIV transmissions within and between intervention and control
    # communities in the BCPP/Ya Tsie HIV prevention trial. To learn more about the data
    # ?counts_hiv_transmission_pairs and ?sampling_frequency

    # Load and view data
    #
    # The data comprises counts of observed directed HIV transmission pairs between individuals
    # sampled from intervention and control communities (i.e. num_linked_pairs_observed);
    # and the estimated HIV transmissions within and between intervention and control
    # communities in the BCPP/Ya Tsie trial population adjusted for sampling heterogeneity
    # (i.e. est_linkedpairs_in_population). See ?estimate_theta_hat() for details on
    # computing est_linkedpairs_in_population and theta_hat.
    results_estimate_theta_hat <- estimated_hiv_transmission_flows[, c(1:13)]

    results_estimate_theta_hat

    # Compute Goodman confidence intervals (Default)
    results_estimate_multinom_ci <- estimate_multinom_ci(
      df_theta_hat = results_estimate_theta_hat,
      detailed_report = FALSE)

    # View results
    results_estimate_multinom_ci

    # Compute Goodman, Sison-Glaz and Quesenberry-Hurst confidence intervals
    results_estimate_multinom_ci_detailed <- estimate_multinom_ci(
      df_theta_hat = results_estimate_theta_hat,
      detailed_report = TRUE)

    # View results
    results_estimate_multinom_ci_detailed
estimate_p_hat
Estimates probability of linkage between two individuals
This function computes the probability that pathogen sequences from two individuals randomly sampled from their respective population groups (e.g. communities) are linked.
    estimate_p_hat(df_counts, ...)

    ## Default S3 method:
    estimate_p_hat(df_counts, ...)
df_counts |
A data.frame returned by the function: |
... |
Further arguments. |
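For a population group pairing $(u, v)$, p_hat, $\hat{p}_{uv}$, is computed as the fraction of distinct possible pairs between samples from groups $u$ and $v$ that are linked. Note: the number of distinct possible pairs in the sample is the product of the numbers of individuals sampled in groups $u$ and $v$. If $u = v$, the number of distinct possible pairs is the number of individuals sampled in population group $u$ choose 2:

$$\hat{p}_{uv} = \frac{y_{uv}}{n_u n_v} \quad (u \neq v), \qquad \hat{p}_{uu} = \frac{y_{uu}}{\binom{n_u}{2}},$$

where $y_{uv}$ is the number of observed linked pairs and $n_u$ is the number of individuals sampled from group $u$. See the bumblebee website for more details https://magosil86.github.io/bumblebee/.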
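The rule above can be written directly in base R. In this sketch the column names mirror the fields listed under the return value, but the helper sketch_p_hat() and the toy values are hypothetical; the sketch does not call bumblebee and is not its implementation.

```r
# Hypothetical helper, not part of bumblebee: p_hat as the fraction of distinct
# possible pairs between samples from groups u and v that are linked.
sketch_p_hat <- function(df) {
  same_group <- df$H1_group == df$H2_group
  # Distinct possible pairs in the sample: n_u * n_v, or n_u choose 2 if u == v
  df$max_possible_pairs_in_sample <- ifelse(
    same_group,
    choose(df$number_hosts_sampled_group_1, 2),
    df$number_hosts_sampled_group_1 * df$number_hosts_sampled_group_2
  )
  df$p_hat <- df$num_linked_pairs_observed / df$max_possible_pairs_in_sample
  df
}

# Illustrative, made-up counts for two groups "a" and "b"
toy <- data.frame(
  H1_group = c("a", "a", "b", "b"),
  H2_group = c("a", "b", "a", "b"),
  number_hosts_sampled_group_1 = c(100, 100, 50, 50),
  number_hosts_sampled_group_2 = c(100, 50, 100, 50),
  num_linked_pairs_observed = c(33, 5, 2, 7)
)
sketch_p_hat(toy)
```

Here the within-group pairing for "a" uses 100 choose 2 = 4950 possible pairs, while the between-group pairings use 100 * 50 = 5000.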
Returns a data.frame containing:
H1_group, Name of population group 1
H2_group, Name of population group 2
number_hosts_sampled_group_1, Number of individuals sampled from population group 1
number_hosts_sampled_group_2, Number of individuals sampled from population group 2
number_hosts_population_group_1, Estimated number of individuals in population group 1
number_hosts_population_group_2, Estimated number of individuals in population group 2
max_possible_pairs_in_sample, Number of distinct possible transmission pairs between individuals sampled from population groups 1 and 2
max_possible_pairs_in_population, Number of distinct possible transmission pairs between individuals in population groups 1 and 2
num_linked_pairs_observed, Number of observed directed transmission pairs between samples from population groups 1 and 2
p_hat, Probability that pathogen sequences from two individuals randomly sampled from their respective population groups are linked
default: Estimates probability of linkage between two individuals
Magosi LE, et al., Deep-sequence phylogenetics to quantify patterns of HIV transmission in the context of a universal testing and treatment trial – BCPP/ Ya Tsie trial. To submit for publication, 2021.
Carnegie, N.B., et al., Linkage of viral sequences among HIV-infected village residents in Botswana: estimation of linkage rates in the presence of missing data. PLoS Computational Biology, 2014. 10(1): p. e1003430.
See prep_p_hat() to prepare input data to estimate p_hat.
    library(bumblebee)
    library(dplyr)

    # Estimate the probability of linkage between two individuals randomly sampled from
    # two population groups of interest.

    # We shall use the data of HIV transmissions within and between intervention and control
    # communities in the BCPP/Ya Tsie HIV prevention trial. To learn more about the data
    # ?counts_hiv_transmission_pairs and ?sampling_frequency

    # Prepare input to estimate p_hat

    # View counts of observed directed HIV transmissions within and between intervention
    # and control communities
    counts_hiv_transmission_pairs

    # View the estimated number of individuals with HIV in intervention and control
    # communities and the number of individuals sampled from each
    sampling_frequency

    results_prep_p_hat <- prep_p_hat(group_in = sampling_frequency$population_group,
                                     individuals_sampled_in = sampling_frequency$number_sampled,
                                     individuals_population_in = sampling_frequency$number_population,
                                     linkage_counts_in = counts_hiv_transmission_pairs,
                                     verbose_output = FALSE)

    # View results
    results_prep_p_hat

    # Estimate p_hat
    results_estimate_p_hat <- estimate_p_hat(df_counts = results_prep_p_hat)

    # View results
    results_estimate_p_hat
estimate_prob_group_pairing_and_linked
Estimates joint probability of linkage
This function computes the joint probability that a pair of pathogen sequences is from a specific population group pairing and linked.
    estimate_prob_group_pairing_and_linked(
      df_counts_and_p_hat,
      individuals_population_in,
      ...
    )

    ## Default S3 method:
    estimate_prob_group_pairing_and_linked(
      df_counts_and_p_hat,
      individuals_population_in,
      verbose_output = FALSE,
      ...
    )
df_counts_and_p_hat |
A data.frame returned by function: |
individuals_population_in |
A numeric vector of the estimated number of individuals per population group |
... |
Further arguments. |
verbose_output |
A boolean value to display intermediate output (Default is FALSE). |
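For a population group pairing $(u, v)$, the joint probability that a pair is from groups $u$ and $v$ and is linked is computed as

$$\Pr(\text{pair from } (u, v) \text{ and linked}) = \frac{N_{uv}\,\hat{p}_{uv}}{\binom{N}{2}}$$

where:
$N_{uv} = N_u N_v$: maximum distinct possible pairs in population
$\hat{p}_{uv}$ (p_hat_uv): probability of linkage between two individuals randomly sampled from groups $u$ and $v$
$\binom{N}{2} = N(N - 1)/2$: all distinct possible pairs in population, where $N$ is the total number of individuals across all population groups.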
See bumblebee website for more details https://magosil86.github.io/bumblebee/.
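A base-R sketch of this computation follows. It takes $N_{uv}$ as an input (max_possible_pairs_in_population) and assumes, as described above, that $N$ is the sum of individuals_population_in. The helper name sketch_prob_group_pairing_and_linked() is hypothetical; this is an illustration of the formula, not the package's code.

```r
# Hypothetical helper, not part of bumblebee: joint probability that a pair is
# from group pairing (u, v) and linked, N_uv * p_hat_uv / (N choose 2).
sketch_prob_group_pairing_and_linked <- function(max_possible_pairs_in_population,
                                                 p_hat,
                                                 individuals_population_in) {
  n_total <- sum(individuals_population_in)  # N: individuals across all groups
  all_pairs <- choose(n_total, 2)            # N choose 2 = N * (N - 1) / 2
  max_possible_pairs_in_population * p_hat / all_pairs
}

# Illustrative, made-up values for two groups of 1000 and 500 individuals,
# using N_uv = N_u * N_v for every pairing as stated above
sketch_prob_group_pairing_and_linked(
  max_possible_pairs_in_population = c(1000 * 1000, 1000 * 500, 500 * 1000, 500 * 500),
  p_hat = c(0.006, 0.001, 0.0004, 0.005),
  individuals_population_in = c(1000, 500)
)
```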
Returns a data.frame containing:
H1_group, Name of population group 1
H2_group, Name of population group 2
number_hosts_sampled_group_1, Number of individuals sampled from population group 1
number_hosts_sampled_group_2, Number of individuals sampled from population group 2
number_hosts_population_group_1, Estimated number of individuals in population group 1
number_hosts_population_group_2, Estimated number of individuals in population group 2
max_possible_pairs_in_sample, Number of distinct possible transmission pairs between individuals sampled from population groups 1 and 2
max_possible_pairs_in_population, Number of distinct possible transmission pairs between individuals in population groups 1 and 2
num_linked_pairs_observed, Number of observed directed transmission pairs between samples from population groups 1 and 2
p_hat, Probability that pathogen sequences from two individuals randomly sampled from their respective population groups are linked
prob_group_pairing_and_linked, Probability that a pair of pathogen sequences is from a specific population group pairing and is linked
default: Estimates joint probability of linkage
Magosi LE, et al., Deep-sequence phylogenetics to quantify patterns of HIV transmission in the context of a universal testing and treatment trial – BCPP/ Ya Tsie trial. To submit for publication, 2021.
Carnegie, N.B., et al., Linkage of viral sequences among HIV-infected village residents in Botswana: estimation of linkage rates in the presence of missing data. PLoS Computational Biology, 2014. 10(1): p. e1003430.
See estimate_p_hat() to prepare input data to estimate prob_group_pairing_and_linked.
    library(bumblebee)
    library(dplyr)

    # Estimate joint probability that a pair is from a specific group pairing and linked

    # We shall use the data of HIV transmissions within and between intervention and control
    # communities in the BCPP/Ya Tsie HIV prevention trial. To learn more about the data
    # ?counts_hiv_transmission_pairs and ?sampling_frequency

    # Load and view data
    #
    # The input data comprises counts of observed directed HIV transmission pairs
    # within and between intervention and control communities in the BCPP/Ya Tsie
    # trial, sampling information and the probability of linkage between individuals
    # sampled from intervention and control communities (i.e. p_hat)
    #
    # See ?estimate_p_hat() for details on estimating p_hat
    results_estimate_p_hat <- estimated_hiv_transmission_flows[, c(1:10)]

    results_estimate_p_hat

    # Estimate prob_group_pairing_and_linked
    results_prob_group_pairing_and_linked <- estimate_prob_group_pairing_and_linked(
      df_counts_and_p_hat = results_estimate_p_hat,
      individuals_population_in = sampling_frequency$number_population)

    # View results
    results_prob_group_pairing_and_linked
estimate_theta_hat
Estimates conditional probability of linkage (transmission flows)
This function estimates theta_hat, the relative probability of transmission within and between population groups accounting for variable sampling rates among population groups. This relative probability is also referred to as transmission flows.
    estimate_theta_hat(df_counts_and_p_hat, ...)

    ## Default S3 method:
    estimate_theta_hat(df_counts_and_p_hat, ...)
df_counts_and_p_hat |
A data.frame returned by the function: |
... |
Further arguments. |
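For a population group pairing $(u, v)$, the estimated transmission flows within and between population groups $u$ and $v$ are represented by the vector theta_hat, $\hat{\theta}$, and are computed as

$$\hat{\theta}_{uv} = \frac{N_{uv}\,\hat{p}_{uv}}{\sum_{(i,j)} N_{ij}\,\hat{p}_{ij}}$$

where $N_{uv}$ is the maximum number of distinct possible pairs between individuals in population groups $u$ and $v$, $\hat{p}_{uv}$ is the probability of linkage (p_hat), and the sum runs over all population group pairings $(i, j)$.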
See bumblebee website for more details https://magosil86.github.io/bumblebee/.
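Dividing the joint probability $N_{uv}\hat{p}_{uv} / \binom{N}{2}$ by its sum over all pairings cancels $\binom{N}{2}$, which is why theta_hat is simply the estimated number of linked pairs for each pairing normalized to sum to one. The base-R sketch below shows that normalization; the helper name sketch_theta_hat() is hypothetical and the inputs are made up.

```r
# Hypothetical helper, not part of bumblebee: transmission flows theta_hat as
# the conditional probability of a group pairing given that a pair is linked.
sketch_theta_hat <- function(max_possible_pairs_in_population, p_hat) {
  # Estimated linked pairs in the population for each pairing: N_uv * p_hat_uv
  est_linkedpairs_in_population <- max_possible_pairs_in_population * p_hat
  stopifnot(sum(est_linkedpairs_in_population) > 0)
  # Normalize so that flows across all pairings sum to 1
  theta_hat <- est_linkedpairs_in_population / sum(est_linkedpairs_in_population)
  data.frame(est_linkedpairs_in_population, theta_hat)
}

# Illustrative, made-up inputs for four group pairings
sketch_theta_hat(
  max_possible_pairs_in_population = c(4950, 5000, 5000, 1225),
  p_hat = c(0.006, 0.001, 0.0004, 0.005)
)
```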
Returns a data.frame containing:
H1_group, Name of population group 1
H2_group, Name of population group 2
number_hosts_sampled_group_1, Number of individuals sampled from population group 1
number_hosts_sampled_group_2, Number of individuals sampled from population group 2
number_hosts_population_group_1, Estimated number of individuals in population group 1
number_hosts_population_group_2, Estimated number of individuals in population group 2
max_possible_pairs_in_sample, Number of distinct possible transmission pairs between individuals sampled from population groups 1 and 2
max_possible_pairs_in_population, Number of distinct possible transmission pairs between individuals in population groups 1 and 2
num_linked_pairs_observed, Number of observed directed transmission pairs between samples from population groups 1 and 2
p_hat, Probability that pathogen sequences from two individuals randomly sampled from their respective population groups are linked
est_linkedpairs_in_population, Estimated transmission pairs between population groups 1 and 2
theta_hat, Estimated transmission flows or relative probability of transmission within and between population groups 1 and 2 adjusted for sampling heterogeneity. More precisely, the conditional probability that a pair of pathogen sequences is from a specific population group pairing given that the pair is linked.
default: Estimates conditional probability of linkage (transmission flows)
Magosi LE, et al., Deep-sequence phylogenetics to quantify patterns of HIV transmission in the context of a universal testing and treatment trial – BCPP/ Ya Tsie trial. To submit for publication, 2021.
Carnegie, N.B., et al., Linkage of viral sequences among HIV-infected village residents in Botswana: estimation of linkage rates in the presence of missing data. PLoS Computational Biology, 2014. 10(1): p. e1003430.
See estimate_p_hat() to prepare input data to estimate theta_hat.
    library(bumblebee)
    library(dplyr)

    # Estimate transmission flows within and between population groups accounting for variable
    # sampling among population groups

    # We shall use the data of HIV transmissions within and between intervention and control
    # communities in the BCPP/Ya Tsie HIV prevention trial. To learn more about the data
    # ?counts_hiv_transmission_pairs and ?sampling_frequency

    # Load and view data
    #
    # The input data comprises counts of observed directed HIV transmission pairs within
    # and between intervention and control communities in the BCPP/Ya Tsie trial,
    # sampling information and the probability of linkage between individuals sampled
    # from intervention and control communities (i.e. p_hat)
    #
    # See ?estimate_p_hat() for details on estimating p_hat
    results_estimate_p_hat <- estimated_hiv_transmission_flows[, c(1:10)]

    results_estimate_p_hat

    # Estimate theta_hat
    results_estimate_theta_hat <- estimate_theta_hat(df_counts_and_p_hat = results_estimate_p_hat)

    # View results
    results_estimate_theta_hat
estimate_transmission_flows_and_ci
Estimates transmission flows and corresponding confidence intervals
This function estimates transmission flows, or the relative probability of transmission within and between population groups, accounting for variable sampling among population groups.
Corresponding confidence intervals are provided with the following methods: Goodman, Goodman with a continuity correction, Sison-Glaz and Quesenberry-Hurst.
    estimate_transmission_flows_and_ci(
      group_in,
      individuals_sampled_in,
      individuals_population_in,
      linkage_counts_in,
      ...
    )

    ## Default S3 method:
    estimate_transmission_flows_and_ci(
      group_in,
      individuals_sampled_in,
      individuals_population_in,
      linkage_counts_in,
      detailed_report = FALSE,
      verbose_output = FALSE,
      ...
    )
group_in |
A character vector indicating population groups/strata (e.g. communities, age-groups, genders or trial arms) between which transmission flows will be evaluated, |
individuals_sampled_in |
A numeric vector indicating the number of individuals sampled per population group, |
individuals_population_in |
A numeric vector of the estimated number of individuals per population group, |
linkage_counts_in |
A data.frame of counts of linked pairs identified between samples of each population group pairing of interest. |
... |
Further arguments. |
detailed_report |
A boolean value to produce detailed output of the analysis |
verbose_output |
A boolean value to display intermediate output (Default is FALSE). |
Counts of observed directed transmission pairs can be obtained from deep-sequence phylogenetic data (via phyloscanner) or from known epidemiological contacts. Note: Deep-sequence data is also commonly referred to as high-throughput or next-generation sequence data. See references to learn more about phyloscanner.
The estimate_transmission_flows_and_ci() function is a wrapper function that calls the following functions:
The prep_p_hat() function, to determine all possible combinations of the population groups/strata provided by the user. Type ?prep_p_hat at the R prompt to learn more.
The estimate_p_hat() function, to compute the probability of linkage between pathogen sequences from two individuals randomly sampled from their respective population groups. Type ?estimate_p_hat at the R prompt to learn more.
The estimate_theta_hat() function, which uses the p_hat estimates to compute theta_hat, the conditional probability that a pair of pathogen sequences is from a specific population group pairing given that the pair is linked. theta_hat represents transmission flows, i.e. the relative probability of transmission within and between population groups, adjusted for variable sampling among population groups. Type ?estimate_theta_hat at the R prompt to learn more.
The estimate_multinom_ci() function, to estimate corresponding confidence intervals for the computed transmission flows.
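The chain from p_hat to theta_hat can be sketched in base R for two hypothetical groups A and B. This reading is an assumption based on the field descriptions under Value: p_hat is the observed linked pairs divided by the possible pairs in the sample, est_linkedpairs_in_population scales p_hat up to the possible pairs in the population, and theta_hat normalises those estimates over all pairings (Bayes' rule for P(pairing | linked)). All numbers are illustrative, and the pair counts assume n(n - 1)/2 distinct pairs within a group and n1 * n2 between groups.

```r
# Illustrative inputs for two hypothetical groups A and B (four directed pairings).
# Pair counts assume n * (n - 1) / 2 distinct pairs within a group and n1 * n2
# between groups (sampled: A = 100, B = 50; population: A = 1000, B = 800).
pairings <- data.frame(
  H1_group = c("A", "A", "B", "B"),
  H2_group = c("A", "B", "A", "B"),
  max_possible_pairs_in_sample     = c(4950, 5000, 5000, 1225),
  max_possible_pairs_in_population = c(499500, 800000, 800000, 319600),
  num_linked_pairs_observed        = c(20, 4, 6, 10)
)

# p_hat: probability that sequences from two randomly sampled individuals are linked
pairings$p_hat <- pairings$num_linked_pairs_observed /
  pairings$max_possible_pairs_in_sample

# Scale up to the estimated number of linked pairs in the population
pairings$est_linkedpairs_in_population <- pairings$p_hat *
  pairings$max_possible_pairs_in_population

# theta_hat: P(pairing | linked), obtained by normalising over all pairings
pairings$theta_hat <- pairings$est_linkedpairs_in_population /
  sum(pairings$est_linkedpairs_in_population)

pairings[, c("H1_group", "H2_group", "p_hat", "theta_hat")]
```

Normalising the population-scale estimates, rather than the raw sample counts, is what adjusts theta_hat for unequal sampling fractions between groups.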
In addition to transmission flows and corresponding confidence intervals, the estimate_transmission_flows_and_ci() function provides the following estimates (returned when detailed_report = TRUE):
prob_group_pairing_and_linked, the joint probability that a pair of pathogen sequences is from a specific population group pairing and is linked. Type ?estimate_prob_group_pairing_and_linked at the R prompt to learn more.
c_hat, the probability of clustering, i.e. the probability that a pathogen sequence from a population group of interest is linked to one or more pathogen sequences in another population group of interest. Type ?estimate_c_hat at the R prompt to learn more.
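A hedged sketch of how these two quantities relate to p_hat, using hypothetical values for four directed pairings A-A, A-B, B-A and B-B. Two assumptions are made here and may differ from bumblebee's exact formulas: P(pairing) is taken as the pairing's share of all possible population pairs, and c_hat is computed as 1 - (1 - p_hat)^N, which treats linkage to each of the N sequences in the other group as independent.

```r
# Hypothetical p_hat values for four directed pairings A-A, A-B, B-A, B-B
p_hat     <- c(0.004, 0.0008, 0.0012, 0.008)
pop_pairs <- c(499500, 800000, 800000, 319600)  # possible pairs in the population
n_group_2 <- c(1000, 800, 1000, 800)            # population size of group 2

# Joint probability P(pairing and linked) = P(linked | pairing) * P(pairing),
# taking P(pairing) as the pairing's share of all possible population pairs
prob_group_pairing_and_linked <- p_hat * pop_pairs / sum(pop_pairs)

# Dividing by P(linked) (the sum over pairings) recovers theta_hat
theta_hat <- prob_group_pairing_and_linked / sum(prob_group_pairing_and_linked)

# Probability of clustering: chance that one sequence in group 1 links to at
# least one of the n_group_2 sequences in group 2, assuming independent links
# (within a group, n - 1 would be more exact; ignored here for simplicity)
c_hat <- 1 - (1 - p_hat)^n_group_2

data.frame(p_hat, prob_group_pairing_and_linked, theta_hat, c_hat)
```

The sketch shows why prob_group_pairing_and_linked and theta_hat carry the same ranking of pairings: theta_hat is the joint probability rescaled to sum to one.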
Returns a data.frame containing:
H1_group, Name of population group 1
H2_group, Name of population group 2
number_hosts_sampled_group_1, Number of individuals sampled from population group 1
number_hosts_sampled_group_2, Number of individuals sampled from population group 2
number_hosts_population_group_1, Estimated number of individuals in population group 1
number_hosts_population_group_2, Estimated number of individuals in population group 2
max_possible_pairs_in_sample, Number of distinct possible transmission pairs between individuals sampled from population groups 1 and 2
max_possible_pairs_in_population, Number of distinct possible transmission pairs between individuals in population groups 1 and 2
num_linked_pairs_observed, Number of observed directed transmission pairs between samples from population groups 1 and 2
p_hat, Probability that pathogen sequences from two individuals randomly sampled from their respective population groups are linked
est_linkedpairs_in_population, Estimated transmission pairs between population groups 1 and 2
theta_hat, Estimated transmission flows or relative probability of transmission within and between population groups 1 and 2 adjusted for sampling heterogeneity. More precisely, the conditional probability that a pair of pathogen sequences is from a specific population group pairing given that the pair is linked.
obs_trm_pairs_est_goodman, Point estimate for the observed transmission pairs (Goodman method confidence intervals)
obs_trm_pairs_lwr_ci_goodman, Lower bound of the Goodman confidence interval for the observed transmission pairs
obs_trm_pairs_upr_ci_goodman, Upper bound of the Goodman confidence interval for the observed transmission pairs
est_goodman, Point estimate for the estimated transmission flows (Goodman method confidence intervals)
lwr_ci_goodman, Lower bound of the Goodman confidence interval for the estimated transmission flows
upr_ci_goodman, Upper bound of the Goodman confidence interval for the estimated transmission flows
The following additional fields are returned if detailed_report = TRUE:
prob_group_pairing_and_linked, Probability that a pair of pathogen sequences is from a specific population group pairing and is linked
c_hat, Probability that a randomly selected pathogen sequence in one population group links to at least one pathogen sequence in another population group, i.e. the probability of clustering
est_goodman_cc, Point estimate (Goodman method confidence intervals with continuity correction)
lwr_ci_goodman_cc, Lower bound of the continuity-corrected Goodman confidence interval
upr_ci_goodman_cc, Upper bound of the continuity-corrected Goodman confidence interval
est_sisonglaz, Point estimate (Sison-Glaz method confidence intervals)
lwr_ci_sisonglaz, Lower bound of the Sison-Glaz confidence interval
upr_ci_sisonglaz, Upper bound of the Sison-Glaz confidence interval
est_qhurst_acswr, Point estimate (Quesenberry-Hurst method confidence intervals via the ACSWR R package)
lwr_ci_qhurst_acswr, Lower bound of the Quesenberry-Hurst confidence interval (ACSWR)
upr_ci_qhurst_acswr, Upper bound of the Quesenberry-Hurst confidence interval (ACSWR)
est_qhurst_coinmind, Point estimate (Quesenberry-Hurst method confidence intervals via the CoinMinD R package)
lwr_ci_qhurst_coinmind, Lower bound of the Quesenberry-Hurst confidence interval (CoinMinD)
upr_ci_qhurst_coinmind, Upper bound of the Quesenberry-Hurst confidence interval (CoinMinD)
lwr_ci_qhurst_adj_coinmind, Lower bound of the adjusted Quesenberry-Hurst confidence interval (CoinMinD)
upr_ci_qhurst_adj_coinmind, Upper bound of the adjusted Quesenberry-Hurst confidence interval (CoinMinD)
default: Estimates transmission flows and accompanying confidence intervals
Magosi LE, et al., Deep-sequence phylogenetics to quantify patterns of HIV transmission in the context of a universal testing and treatment trial – BCPP/ Ya Tsie trial. To submit for publication, 2021.
Carnegie, N.B., et al., Linkage of viral sequences among HIV-infected village residents in Botswana: estimation of linkage rates in the presence of missing data. PLoS Computational Biology, 2014. 10(1): p. e1003430.
Cherry, S., A Comparison of Confidence Interval Methods for Habitat Use-Availability Studies. The Journal of Wildlife Management, 1996. 60(3): p. 653-658.
Ratmann, O., et al., Inferring HIV-1 transmission networks and sources of epidemic spread in Africa with deep-sequence phylogenetic analysis. Nature Communications, 2019. 10(1): p. 1411.
Wymant, C., et al., PHYLOSCANNER: Inferring Transmission from Within- and Between-Host Pathogen Genetic Diversity. Molecular Biology and Evolution, 2017. 35(3): p. 719-733.
Goodman, L.A., On simultaneous confidence intervals for multinomial proportions. Technometrics, 1965. 7: p. 247-254.
Sison, C.P. and Glaz, J., Simultaneous confidence intervals and sample size determination for multinomial proportions. Journal of the American Statistical Association, 1995. 90: p. 366-369.
Glaz, J. and Sison, C.P., Simultaneous confidence intervals for multinomial proportions. Journal of Statistical Planning and Inference, 1999. 82: p. 251-262.
May, W.L. and Johnson, W.D., Constructing two-sided simultaneous confidence intervals for multinomial proportions for small counts in a large number of cells. Journal of Statistical Software, 2000. 5(6). Paper and code available at https://www.jstatsoft.org/v05/i06.
See estimate_theta_hat and estimate_multinom_ci to learn more about the estimation of transmission flows and confidence intervals.
library(bumblebee)
library(dplyr)

# Estimate transmission flows and confidence intervals

# We shall use the data of HIV transmissions within and between intervention and control
# communities in the BCPP/Ya Tsie HIV prevention trial. To learn more about the data, type
# ?counts_hiv_transmission_pairs and ?sampling_frequency

# View counts of observed directed HIV transmissions within and between intervention
# and control communities
counts_hiv_transmission_pairs

# View the estimated number of individuals with HIV in intervention and control
# communities and the number of individuals sampled from each
sampling_frequency

# Estimate transmission flows within and between intervention and control communities
# accounting for variable sampling among population groups.

# Basic output
results_estimate_transmission_flows_and_ci <- estimate_transmission_flows_and_ci(
  group_in = sampling_frequency$population_group,
  individuals_sampled_in = sampling_frequency$number_sampled,
  individuals_population_in = sampling_frequency$number_population,
  linkage_counts_in = counts_hiv_transmission_pairs)

# View results
results_estimate_transmission_flows_and_ci

# Retrieve dataset of estimated transmission flows
dframe <- results_estimate_transmission_flows_and_ci$flows_dataset

# Detailed output
results_estimate_transmission_flows_and_ci_detailed <- estimate_transmission_flows_and_ci(
  group_in = sampling_frequency$population_group,
  individuals_sampled_in = sampling_frequency$number_sampled,
  individuals_population_in = sampling_frequency$number_population,
  linkage_counts_in = counts_hiv_transmission_pairs,
  detailed_report = TRUE)

# View results
results_estimate_transmission_flows_and_ci_detailed

# Retrieve dataset of estimated transmission flows
dframe <- results_estimate_transmission_flows_and_ci_detailed$flows_dataset

# Options:
# To show intermediate output set verbose_output = TRUE

# Basic output
results_estimate_transmission_flows_and_ci <- estimate_transmission_flows_and_ci(
  group_in = sampling_frequency$population_group,
  individuals_sampled_in = sampling_frequency$number_sampled,
  individuals_population_in = sampling_frequency$number_population,
  linkage_counts_in = counts_hiv_transmission_pairs,
  verbose_output = TRUE)

# View results
results_estimate_transmission_flows_and_ci

# Detailed output
results_estimate_transmission_flows_and_ci_detailed <- estimate_transmission_flows_and_ci(
  group_in = sampling_frequency$population_group,
  individuals_sampled_in = sampling_frequency$number_sampled,
  individuals_population_in = sampling_frequency$number_population,
  linkage_counts_in = counts_hiv_transmission_pairs,
  detailed_report = TRUE,
  verbose_output = TRUE)

# View results
results_estimate_transmission_flows_and_ci_detailed
Estimated HIV transmission flows within and between intervention and control communities in the BCPP/Ya Tsie trial population, adjusted for variable sampling.
estimated_hiv_transmission_flows
A data frame:
H1_group, Name of population group 1
H2_group, Name of population group 2
number_hosts_sampled_group_1, Number of individuals sampled from population group 1
number_hosts_sampled_group_2, Number of individuals sampled from population group 2
number_hosts_population_group_1, Estimated number of individuals in population group 1
number_hosts_population_group_2, Estimated number of individuals in population group 2
max_possible_pairs_in_sample, Number of distinct possible transmission pairs between individuals sampled from population groups 1 and 2
max_possible_pairs_in_population, Number of distinct possible transmission pairs between individuals in population groups 1 and 2
num_linked_pairs_observed, Number of observed directed transmission pairs between samples from population groups 1 and 2
p_hat, Probability that pathogen sequences from two individuals randomly sampled from their respective population groups are linked
est_linkedpairs_in_population, Estimated transmission pairs between population groups 1 and 2
theta_hat, Estimated transmission flows or relative probability of transmission within and between population groups 1 and 2 adjusted for sampling heterogeneity. More precisely, the conditional probability that a pair of pathogen sequences is from a specific population group pairing given that the pair is linked.
obs_trm_pairs_est_goodman, Point estimate for the observed transmission pairs (Goodman method confidence intervals)
obs_trm_pairs_lwr_ci_goodman, Lower bound of the Goodman confidence interval for the observed transmission pairs
obs_trm_pairs_upr_ci_goodman, Upper bound of the Goodman confidence interval for the observed transmission pairs
est_goodman, Point estimate for the estimated transmission flows (Goodman method confidence intervals)
lwr_ci_goodman, Lower bound of the Goodman confidence interval for the estimated transmission flows
upr_ci_goodman, Upper bound of the Goodman confidence interval for the estimated transmission flows
prob_group_pairing_and_linked, Probability that a pair of pathogen sequences is from a specific population group pairing and is linked
c_hat, Probability that a randomly selected pathogen sequence in one population group links to at least one pathogen sequence in another population group, i.e. the probability of clustering
est_goodman_cc, Point estimate (Goodman method confidence intervals with continuity correction)
lwr_ci_goodman_cc, Lower bound of the continuity-corrected Goodman confidence interval
upr_ci_goodman_cc, Upper bound of the continuity-corrected Goodman confidence interval
est_sisonglaz, Point estimate (Sison-Glaz method confidence intervals)
lwr_ci_sisonglaz, Lower bound of the Sison-Glaz confidence interval
upr_ci_sisonglaz, Upper bound of the Sison-Glaz confidence interval
est_qhurst_acswr, Point estimate (Quesenberry-Hurst method confidence intervals via the ACSWR R package)
lwr_ci_qhurst_acswr, Lower bound of the Quesenberry-Hurst confidence interval (ACSWR)
upr_ci_qhurst_acswr, Upper bound of the Quesenberry-Hurst confidence interval (ACSWR)
est_qhurst_coinmind, Point estimate (Quesenberry-Hurst method confidence intervals via the CoinMinD R package)
lwr_ci_qhurst_coinmind, Lower bound of the Quesenberry-Hurst confidence interval (CoinMinD)
upr_ci_qhurst_coinmind, Upper bound of the Quesenberry-Hurst confidence interval (CoinMinD)
lwr_ci_qhurst_adj_coinmind, Lower bound of the adjusted Quesenberry-Hurst confidence interval (CoinMinD)
upr_ci_qhurst_adj_coinmind, Upper bound of the adjusted Quesenberry-Hurst confidence interval (CoinMinD)
https://magosil86.github.io/bumblebee/
Magosi LE, et al., Deep-sequence phylogenetics to quantify patterns of HIV transmission in the context of a universal testing and treatment trial – BCPP/ Ya Tsie trial. To submit for publication, 2021.
prep_p_hat
Prepares input data to estimate p_hat
This function generates the variables required for estimating p_hat, the probability that pathogen sequences from two individuals randomly sampled from their respective population groups are linked. For a population group pairing $(i, j)$, prep_p_hat determines all possible group pairings, i.e. $(i, i)$, $(i, j)$, $(j, i)$ and $(j, j)$.
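Enumerating every directed pairing of a set of groups can be sketched with base R's `expand.grid()`. This only illustrates the combinatorics; whether prep_p_hat uses `expand.grid()` internally is not stated here, and the group labels are taken from the trial example purely for illustration.

```r
# All directed pairings of the groups: for groups i and j this yields
# (i, i), (j, i), (i, j) and (j, j); expand.grid varies the first column fastest.
groups <- c("Intervention", "Control")
pairings <- expand.grid(H1_group = groups, H2_group = groups,
                        stringsAsFactors = FALSE)
pairings

# With k groups there are k^2 directed pairings
k3 <- expand.grid(H1_group = c("A", "B", "C"), H2_group = c("A", "B", "C"),
                  stringsAsFactors = FALSE)
nrow(k3)
```

Because the pairings are directed, (Intervention, Control) and (Control, Intervention) appear as separate rows.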
prep_p_hat(
  group_in,
  individuals_sampled_in,
  individuals_population_in,
  linkage_counts_in,
  ...
)

## Default S3 method:
prep_p_hat(
  group_in,
  individuals_sampled_in,
  individuals_population_in,
  linkage_counts_in,
  verbose_output = FALSE,
  ...
)
Argument | Description |
---|---|
group_in | A character vector indicating the population groups/strata (e.g. communities, age groups, genders or trial arms) between which transmission flows will be evaluated. |
individuals_sampled_in | A numeric vector indicating the number of individuals sampled per population group. |
individuals_population_in | A numeric vector of the estimated number of individuals per population group. |
linkage_counts_in | A data.frame of counts of linked pairs identified between samples of each population group pairing of interest. |
... | Further arguments. |
verbose_output | A boolean value to display intermediate output (Default is FALSE). |
Counts of observed directed transmission pairs can be obtained from deep-sequence phylogenetic data (via phyloscanner) or from known epidemiological contacts. Note: Deep-sequence data is also commonly referred to as high-throughput or next-generation sequence data. See references to learn more about phyloscanner.
Returns a data.frame containing:
H1_group, Name of population group 1
H2_group, Name of population group 2
number_hosts_sampled_group_1, Number of individuals sampled from population group 1
number_hosts_sampled_group_2, Number of individuals sampled from population group 2
number_hosts_population_group_1, Estimated number of individuals in population group 1
number_hosts_population_group_2, Estimated number of individuals in population group 2
max_possible_pairs_in_sample, Number of distinct possible transmission pairs between individuals sampled from population groups 1 and 2
max_possible_pairs_in_population, Number of distinct possible transmission pairs between individuals in population groups 1 and 2
num_linked_pairs_observed, Number of observed directed transmission pairs between samples from population groups 1 and 2
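The two max_possible_pairs columns can be sketched from the group sizes as follows. The counting rule is an assumption, not confirmed by this page: distinct pairs within a group are counted as n(n - 1)/2 and pairs between two different groups as n1 * n2. The helper `count_pairs()` and the group sizes are hypothetical.

```r
# Hypothetical group sizes (not the BCPP/Ya Tsie data)
sampled    <- c(A = 100, B = 50)
population <- c(A = 1000, B = 800)

pairings <- expand.grid(H1_group = names(sampled), H2_group = names(sampled),
                        stringsAsFactors = FALSE)

# ASSUMPTION: distinct pairs within a group are counted as n * (n - 1) / 2 and
# pairs between two different groups as n_1 * n_2.
count_pairs <- function(n1, n2, same_group) {
  ifelse(same_group, n1 * (n1 - 1) / 2, n1 * n2)
}

same <- pairings$H1_group == pairings$H2_group
pairings$max_possible_pairs_in_sample <- count_pairs(
  sampled[pairings$H1_group], sampled[pairings$H2_group], same)
pairings$max_possible_pairs_in_population <- count_pairs(
  population[pairings$H1_group], population[pairings$H2_group], same)
pairings
```

Under this rule the between-group count is the same in both directions, so any asymmetry in the resulting flows comes from the observed linked-pair counts.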
default: Prepares input data to estimate p_hat
Magosi LE, et al., Deep-sequence phylogenetics to quantify patterns of HIV transmission in the context of a universal testing and treatment trial – BCPP/ Ya Tsie trial. To submit for publication, 2021.
Ratmann, O., et al., Inferring HIV-1 transmission networks and sources of epidemic spread in Africa with deep-sequence phylogenetic analysis. Nature Communications, 2019. 10(1): p. 1411.
Wymant, C., et al., PHYLOSCANNER: Inferring Transmission from Within- and Between-Host Pathogen Genetic Diversity. Molecular Biology and Evolution, 2017. 35(3): p. 719-733.
library(bumblebee)
library(dplyr)

# Prepare input to estimate p_hat

# We shall use the data of HIV transmissions within and between intervention and control
# communities in the BCPP/Ya Tsie HIV prevention trial. To learn more about the data, type
# ?counts_hiv_transmission_pairs and ?sampling_frequency

# View counts of observed directed HIV transmissions within and between intervention
# and control communities
counts_hiv_transmission_pairs

# View the estimated number of individuals with HIV in intervention and control
# communities and the number of individuals sampled from each
sampling_frequency

results_prep_p_hat <- prep_p_hat(
  group_in = sampling_frequency$population_group,
  individuals_sampled_in = sampling_frequency$number_sampled,
  individuals_population_in = sampling_frequency$number_population,
  linkage_counts_in = counts_hiv_transmission_pairs,
  verbose_output = TRUE)

# View results
results_prep_p_hat
Estimated number of individuals with HIV in intervention and control communities of the BCPP/Ya Tsie trial, and the number of individuals sampled from each for HIV viral phylogenetic analysis.
sampling_frequency
A data frame:
population_group, Population group
number_sampled, Number of individuals sampled per population group
number_population, Estimated number of individuals in each population group
https://magosil86.github.io/bumblebee/
Magosi LE, et al., Deep-sequence phylogenetics to quantify patterns of HIV transmission in the context of a universal testing and treatment trial – BCPP/ Ya Tsie trial. To submit for publication, 2021.