All functions

build_ppanggolin_file_fastas()
Build a ppanggolin file from fastas
calculate_novelty()
Calculate genome 'novelty' from a list of selection sets returned by pick_derep_sets
cluster_genomes()
Cluster genomes at 4 levels from a pangenome gene presence/absence (PA) matrix. Needs parallelDist and igraph; genomes as rows and genes as columns?
country_vector
A named vector of country names
download_PDD_metadata()
Download Pathogen Detection metadata for a given organism
download_SNP_trees()
Download SNP trees from a dataframe containing SNP_tree_urls and dests
download_gbk_assembly_summary()
Convenience function to download the assembly_summary.txt file from GenBank
download_genomes()
Download a set of specified genome files
download_most_recent_complete()
Download the most recent complete metadata for a specified organism
download_reference_genomes()
Download reference genomes
example_pan_dist
A distance matrix describing the Jaccard distance between genomes in the provided example pangenome
example_pangenome_matrix
A gene presence/absence matrix for a synthetic pangenome
extract_collection_agency()
Extract the collecting agency from several metadata fields
extract_consensus_ag_species()
Extract a consensus ag host species from metadata. This is pretty crude and should not be relied on for 100% accuracy.
extract_country()
Extract a standardized country name from geo_loc_tag column
extract_earliest_year()
Return a Year column containing the earliest year from the available 'date' fields
extract_state()
Extract the state name from the geo_loc_name column. This only makes sense for USA isolates.
get_PDG_version()
Get the version of the PDD metadata
get_organism_table()
Load the organism summary table
get_pangenome_representatives()
Select a minimum set of genomes that best represent the gene content of the pangenome
get_pangenome_representatives_jaccard()
Get a subset of a pangenome that tries to represent the gene presence/absence diversity. Selects a random genome, then selects the most distant genome from the selected genome, then
klebsiella_example_dat
NCBI Pathogen Detection metadata for 200 Klebsiella isolates
list_PDGs()
List available PDG accessions for an organism
list_organisms()
List organisms available in the Pathogens database
make_SNP_tree_dest()
Make SNP tree download destination paths
make_SNPtree_urls()
Generate FTP site download URLs for all SNP trees containing the provided isolates
make_dest_paths()
Make download destination paths
make_download_urls()
Make specific FTP download paths for a dataframe with ftp_paths and assembly accessions
make_ftp_paths()
Generate FTP site paths for a selection of assembly accessions
mark_outliers()
Mark outliers from a distance matrix
pick_derep_sets()
Pick multiple de-replication sets from a pangenome. Uses furrr::future_map, so be sure to set your 'plan()'.
remove_strict_core()
Removes genes present in all genomes from pangenome presence/absence matrix
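
To illustrate what removing the strict core means, here is a minimal base R sketch. It is not the package's implementation and does not call remove_strict_core(), whose arguments are not shown on this page. The toy matrix `pa` and the names `is_strict_core` and `accessory` are hypothetical, and the layout (genomes as rows, genes as columns) is an assumption.

```r
# Toy presence/absence matrix (hypothetical data).
# Assumed layout: genomes as rows, genes as columns.
pa <- matrix(
  c(1, 1, 0, 1,
    1, 0, 1, 1,
    1, 1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("genome", 1:3), paste0("gene", 1:4))
)

# A gene is in the strict core when it is present in every genome.
is_strict_core <- colSums(pa > 0) == nrow(pa)

# Keep only the non-core genes; drop = FALSE preserves the matrix
# shape even if a single column remains.
accessory <- pa[, !is_strict_core, drop = FALSE]

print(colnames(accessory))
```

In this toy matrix, gene1 and gene4 are present in all three genomes, so only gene2 and gene3 remain. Strict-core genes are identical across genomes and therefore contribute nothing to presence/absence differences between them.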