Select a minimum set of genomes that best represent the gene content of the pangenome
Source: R/pangenome_tools.R
get_pangenome_representatives.Rd
Arguments
- pan_mat
A presence/absence matrix of 1/0 values; rows are genomes, columns are genes.
- desired_coverage
Proportion of the pangenome's gene content that the reduced set should contain (0.95).
- SEED
Random seed used when selecting the first genome of the collection.
- verbose
TRUE/FALSE; if TRUE, progress updates are printed.
Value
Returns a list of length 3: (1) the names of the selected genomes, (2) the score at each iteration, and (3) the proportion of gene content covered at each iteration.
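To make the arguments and return value more concrete, the sketch below shows one way such a selection could work: a greedy loop that starts from a randomly chosen genome and repeatedly adds the genome contributing the most genes not yet covered. This is an assumption for illustration only, not the package's actual implementation. The function name `greedy_representatives`, the `seed` argument, and the simulated `toy` matrix are all hypothetical, and the score is assumed to be the cumulative number of genes covered.

```r
# Hypothetical sketch of greedy representative selection.
# NOT the package's implementation; all names here are illustrative.
greedy_representatives <- function(pan_mat, desired_coverage = 0.95, seed = 1) {
  stopifnot(is.matrix(pan_mat), !is.null(rownames(pan_mat)))
  set.seed(seed)

  present <- pan_mat > 0
  # Only genes found in at least one genome count toward the pangenome.
  total_genes <- sum(colSums(present) > 0)
  stopifnot(total_genes > 0)

  # The first genome is picked at random (assumed role of the seed).
  chosen <- sample(nrow(present), 1)
  covered <- present[chosen, ]
  scores <- sum(covered)

  while (scores[length(scores)] / total_genes < desired_coverage) {
    # Number of not-yet-covered genes each genome would add.
    gain <- rowSums(present[, !covered, drop = FALSE])
    gain[chosen] <- -1              # never pick the same genome twice
    best <- which.max(gain)
    if (gain[best] <= 0) break      # no remaining genome adds anything
    chosen <- c(chosen, best)
    covered <- covered | present[best, ]
    scores <- c(scores, sum(covered))
  }

  # Same shape as the documented return value: names, scores, coverage.
  list(rownames(pan_mat)[chosen], scores, scores / total_genes)
}

# Simulated presence/absence matrix: 20 genomes x 50 genes.
set.seed(42)
toy <- matrix(rbinom(20 * 50, 1, 0.4), nrow = 20,
              dimnames = list(paste0("genome_", 1:20), paste0("gene_", 1:50)))
reps <- greedy_representatives(toy, desired_coverage = 0.95)
```

Under these assumptions, `reps[[1]]` holds the selected genome names in the order they were added, `reps[[2]]` the cumulative gene count after each addition, and `reps[[3]]` the same counts divided by the total number of genes.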
Examples
# this example pangenome has 100 genomes with 1000 total genes
# ~5 genomes can provide > 95% gene coverage
pan_reps <- get_pangenome_representatives(example_pangenome_matrix)
pan_reps
#> [[1]]
#> [1] "genome_60" "genome_55" "genome_19" "genome_72"
#>
#> [[2]]
#> [1] 778 877 923 951
#>
#> [[3]]
#> [1] 0.778 0.877 0.923 0.951
#>