Select a minimum set of genomes that best represent the gene content of the pangenome

Usage

get_pangenome_representatives(
  pan_mat,
  desired_coverage = 0.95,
  SEED = 3,
  verbose = FALSE
)

Arguments

pan_mat

a presence/absence matrix of 1/0 values; rows are genomes, columns are genes

desired_coverage

proportion of the pangenome's gene content that the reduced set should contain (default 0.95)

SEED

random seed to use when selecting the first genome of the collection.

verbose

logical (TRUE/FALSE); if TRUE, prints progress updates (default FALSE)
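
As a minimal sketch of an input with the shape described for pan_mat, the following builds a small random presence/absence matrix. The dimensions, the row and column names, and the 0.3 presence probability are illustrative assumptions, not values taken from the package.

```r
# Hypothetical toy input: 6 genomes (rows) x 10 genes (columns)
set.seed(1)
toy_mat <- matrix(
  rbinom(6 * 10, size = 1, prob = 0.3),
  nrow = 6,
  dimnames = list(paste0("genome_", 1:6), paste0("gene_", 1:10))
)

# Sanity checks matching the argument description:
# only 0/1 values, and genome names available as row names
stopifnot(all(toy_mat %in% c(0, 1)))
stopifnot(!is.null(rownames(toy_mat)))
```

A matrix laid out this way (genomes as rows, genes as columns) is the form the pan_mat argument expects.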

Value

returns an unnamed list of length 3: (1) names of the selected genomes, (2) score at each iteration, (3) proportion of gene content covered at each iteration
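
Because the returned list has no element names, its parts are accessed by position. The sketch below rebuilds the list from the values printed in the Examples section of this page, so it runs without the package; the labels genomes, scores, and coverage are hypothetical names added here for readability.

```r
# Values copied from the example output on this page
pan_reps <- list(
  c("genome_60", "genome_55", "genome_19", "genome_72"),
  c(778, 877, 923, 951),
  c(0.778, 0.877, 0.923, 0.951)
)

# Hypothetical labels; the function itself returns an unnamed list
names(pan_reps) <- c("genomes", "scores", "coverage")

# One row per iteration: which genome was added and the coverage reached
summary_df <- data.frame(
  iteration = seq_along(pan_reps$genomes),
  genome = pan_reps$genomes,
  score = pan_reps$scores,
  coverage = pan_reps$coverage
)
print(summary_df)
```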

Examples

# this example pangenome has 100 genomes with 1000 total genes
# a handful of genomes (4 in the output below) can provide > 95% gene coverage
pan_reps <- get_pangenome_representatives(example_pangenome_matrix)
pan_reps
#> [[1]]
#> [1] "genome_60" "genome_55" "genome_19" "genome_72"
#> 
#> [[2]]
#> [1] 778 877 923 951
#> 
#> [[3]]
#> [1] 0.778 0.877 0.923 0.951
#>
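
To illustrate how a selection like this could work, here is a minimal sketch of a greedy set-cover heuristic. It is not the package's implementation: the page only states that the first genome is chosen using a random seed, so the greedy step, the helper name greedy_representatives, and the choice to measure coverage against genes present in at least one genome are all assumptions.

```r
# Hypothetical helper, NOT the package's function: greedy set-cover sketch
greedy_representatives <- function(pan_mat, desired_coverage = 0.95, seed = 3) {
  set.seed(seed)
  present <- pan_mat > 0
  # Assumption: coverage is measured against genes seen in at least one genome
  total_genes <- sum(colSums(present) > 0)

  # Assumption: the first genome is drawn at random
  chosen <- sample(nrow(present), 1)
  covered <- present[chosen, ]
  scores <- sum(covered)

  while (scores[length(scores)] / total_genes < desired_coverage) {
    # Number of not-yet-covered genes each genome would add
    gain <- rowSums(present[, !covered, drop = FALSE])
    gain[chosen] <- 0
    if (max(gain) == 0) break  # nothing left to gain

    best <- which.max(gain)
    chosen <- c(chosen, best)
    covered <- covered | present[best, ]
    scores <- c(scores, sum(covered))
  }

  # Same shape as the documented return value: names, scores, coverage
  list(rownames(pan_mat)[chosen], scores, scores / total_genes)
}

# Toy data (illustrative only): 20 genomes x 50 genes
set.seed(42)
toy <- matrix(
  rbinom(20 * 50, size = 1, prob = 0.3),
  nrow = 20,
  dimnames = list(paste0("genome_", 1:20), paste0("gene_", 1:50))
)
res <- greedy_representatives(toy, desired_coverage = 0.95)
res[[1]]  # selected genomes, in order of selection
res[[3]]  # cumulative coverage after each selection
```

Under this sketch the scores are cumulative counts of covered genes, which is consistent with the example output above (951 of 1000 genes gives 0.951), though the package may compute them differently.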