Function reference
-
build_ppanggolin_file_fastas() - Build a PPanGGOLiN file from FASTA files
-
calculate_novelty() - Calculate genome 'novelty' from a list of selection sets returned by pick_derep_sets
-
cluster_genomes() - Cluster genomes at 4 levels from a pangenome gene presence/absence (PA) matrix. Needs parallelDist and igraph. Genomes as rows and genes as columns?
-
country_vector - A named vector of country names
-
download_PDD_metadata() - Download Pathogen Detection metadata for a given organism
-
download_SNP_trees() - Download SNP trees from a dataframe containing SNP_tree_urls and dests
-
download_gbk_assembly_summary() - Convenience function to download the assembly_summary.txt file from GenBank
-
download_genomes() - Download a set of specified genome files
-
download_most_recent_complete() - Download the most recent complete metadata for a specified organism
-
download_reference_genomes() - Download reference genomes
-
example_pan_dist - A distance matrix describing the Jaccard distance between genomes in the provided example pangenome
-
example_pangenome_matrix - A gene presence/absence matrix for a synthetic pangenome
-
extract_collection_agency() - Extract the collecting agency from several metadata fields
-
extract_consensus_ag_species() - Extract a consensus ag host species from metadata. This is pretty crude and should not be relied on for 100% accuracy.
-
extract_country() - Extract a standardized country name from geo_loc_tag column
-
extract_earliest_year() - Return a Year column containing the earliest year from the available 'date' fields
-
extract_state() - Extract the state name from the geo_loc_name column. This only makes sense for USA isolates.
-
get_PDG_version() - Get the version of the PDD metadata
-
get_organism_table() - Load the organism summary table
-
get_pangenome_representatives() - Select a minimum set of genomes that best represent the gene content of the pangenome
-
get_pangenome_representatives_jaccard() - Get a subset of a pangenome that tries to represent the gene presence/absence diversity. Selects a random genome, then selects the most distant genome from the selected genome, then
-
klebsiella_example_dat - NCBI Pathogen Detection metadata for 200 Klebsiella isolates
-
list_PDGs() - List available PDG accessions for an organism
-
list_organisms() - List organisms available in the Pathogens database
-
make_SNP_tree_dest() - Make SNP tree download destination paths
-
make_SNPtree_urls() - Generate FTP site download URLs for all SNP trees containing the provided isolates
-
make_dest_paths() - Make download destination paths
-
make_download_urls() - Make specific FTP download paths for a dataframe with ftp_paths and assembly accessions
-
make_ftp_paths() - Generate FTP site paths for a selection of assembly accessions
-
mark_outliers() - Mark outliers from a distance matrix
-
pick_derep_sets() - Pick multiple de-replication sets from a pangenome. Uses furrr::future_map. Be sure to set your plan()!
-
remove_strict_core() - Remove genes present in all genomes from a pangenome presence/absence matrix
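
Several entries above work on a gene presence/absence matrix and on Jaccard distances between genomes. As a rough illustration of those two ideas only, here is a minimal base-R sketch. It does not call or reproduce any function from this package: the toy matrix, the object names (`pa`, `is_core`, `accessory`, `jac`) and the genomes-as-rows, genes-as-columns layout are all assumptions made for the example.

```r
# Toy presence/absence matrix (hypothetical data).
# Assumed layout: genomes as rows, genes as columns.
pa <- matrix(
  c(1, 1, 0, 1,
    1, 0, 1, 1,
    1, 1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3"),
                  c("geneA", "geneB", "geneC", "geneD"))
)

# A strict core gene is present in every genome: its column count
# equals the number of genomes. Drop those columns.
is_core <- colSums(pa > 0) == nrow(pa)
accessory <- pa[, !is_core, drop = FALSE]

# Jaccard distance between genomes on the remaining genes.
# For 0/1 data, stats::dist(method = "binary") is the Jaccard distance.
jac <- as.matrix(dist(accessory, method = "binary"))
```

In this toy case `geneA` and `geneD` are dropped as strict core, and `jac` is a symmetric genome-by-genome matrix of the kind that a distance-based selection or outlier step could take as input.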