Select a minimum set of genomes that best represent the gene content of the pangenome — get_pangenome_representatives • pdtools

Select a minimum set of genomes that best represent the gene content of the pangenome

Usage

get_pangenome_representatives(
  pan_mat,
  desired_coverage = 0.95,
  SEED = 3,
  verbose = FALSE
)

Arguments

pan_mat: a presence/absence matrix of 1/0 values; rows are genomes, columns are genes
desired_coverage: proportion of the pangenome's gene content that the reduced set should contain (default 0.95)
SEED: random seed used when selecting the first genome of the collection (default 3)
verbose: logical (TRUE/FALSE); if TRUE, progress updates are printed (default FALSE)
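
To make the expected shape of pan_mat concrete, here is a minimal sketch of a toy presence/absence matrix built in base R. The genome and gene names and the 0/1 values are invented for illustration and are not part of pdtools.

```r
# Toy presence/absence matrix: 4 genomes (rows) x 6 genes (columns).
# All names and values here are hypothetical, for illustration only.
toy_pan_mat <- matrix(
  c(1, 1, 1, 0, 0, 0,
    0, 0, 1, 1, 1, 0,
    1, 0, 0, 0, 0, 1,
    0, 1, 0, 1, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("genome_", 1:4), paste0("gene_", 1:6))
)

# Sanity checks matching the description of pan_mat:
# only 0/1 entries, and row names that identify the genomes.
stopifnot(all(toy_pan_mat %in% c(0, 1)))
stopifnot(!is.null(rownames(toy_pan_mat)))
```

Row names matter here because the first element of the returned value is a vector of genome names.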

Value

A list of length 3: (1) the names of the selected genomes, (2) the score at each iteration, (3) the proportion of coverage at each iteration.
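
Because the list appears to be unnamed (the output in the Examples section prints [[1]], [[2]], [[3]]), its elements are accessed by position. The sketch below uses a stand-in list copied from that example output rather than calling the function, and the variable names are only suggestions.

```r
# Stand-in for a returned value, copied from the example output on this page.
pan_reps <- list(
  c("genome_60", "genome_55", "genome_19", "genome_72"),
  c(778, 877, 923, 951),
  c(0.778, 0.877, 0.923, 0.951)
)

# Access the three elements by position.
rep_genomes <- pan_reps[[1]]  # names of the selected genomes
scores      <- pan_reps[[2]]  # score at each iteration
coverage    <- pan_reps[[3]]  # proportion coverage at each iteration

# Collect everything in one table, one row per iteration.
summary_df <- data.frame(
  iteration = seq_along(rep_genomes),
  genome    = rep_genomes,
  score     = scores,
  coverage  = coverage
)
```

In this particular example each coverage value equals the corresponding score divided by the 1000 genes in the pangenome.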

Examples

# this example pangenome has 100 genomes with 1000 total genes
# ~5 genomes can provide > 95% gene coverage
pan_reps <- get_pangenome_representatives(example_pangenome_matrix)
pan_reps
#> [[1]]
#> [1] "genome_60" "genome_55" "genome_19" "genome_72"
#> 
#> [[2]]
#> [1] 778 877 923 951
#> 
#> [[3]]
#> [1] 0.778 0.877 0.923 0.951
#>
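
For intuition about how coverage can accumulate genome by genome, here is a generic greedy set-cover sketch in base R. It is not the pdtools implementation: the function name greedy_cover_sketch and the toy matrix are hypothetical, and unlike get_pangenome_representatives it does not pick the first genome at random.

```r
# Generic greedy heuristic (an assumption for illustration, not pdtools code):
# repeatedly add the genome that contributes the most not-yet-covered genes.
greedy_cover_sketch <- function(pan_mat, desired_coverage = 0.95) {
  total_genes <- sum(colSums(pan_mat) > 0)  # genes present in >= 1 genome
  stopifnot(total_genes > 0)
  covered <- rep(FALSE, ncol(pan_mat))
  chosen  <- character(0)
  scores  <- numeric(0)
  while (sum(covered) / total_genes < desired_coverage) {
    # Number of uncovered genes each genome would add
    gain <- rowSums(pan_mat[, !covered, drop = FALSE] > 0)
    best <- which.max(gain)
    if (gain[best] == 0) break  # nothing left to gain
    chosen  <- c(chosen, rownames(pan_mat)[best])
    covered <- covered | (pan_mat[best, ] > 0)
    scores  <- c(scores, sum(covered))  # cumulative genes covered
  }
  list(chosen, scores, scores / total_genes)
}

# Hypothetical toy input: 4 genomes x 6 genes
toy_pan_mat <- matrix(
  c(1, 1, 1, 0, 0, 0,
    0, 0, 1, 1, 1, 0,
    1, 0, 0, 0, 0, 1,
    0, 1, 0, 1, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("genome_", 1:4), paste0("gene_", 1:6))
)

toy_reps <- greedy_cover_sketch(toy_pan_mat, desired_coverage = 0.95)
toy_reps[[1]]  # "genome_1" "genome_2" "genome_3"
toy_reps[[2]]  # 3 5 6
```

The returned list mirrors the three-element layout described under Value, so the same positional access applies.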