Function reference
-
build_ppanggolin_file_fastas()
- Build a PPanGGOLiN file from FASTA files
-
calculate_novelty()
- Calculate genome 'novelty' from a list of selection sets returned by pick_derep_sets()
-
cluster_genomes()
- Cluster genomes at 4 levels from a pangenome gene presence/absence (PA) matrix. Requires parallelDist and igraph. Expected orientation: genomes as rows and genes as columns?
-
country_vector
- A named vector of country names
-
download_PDD_metadata()
- Download Pathogen Detection metadata for a given organism
-
download_SNP_trees()
- Download SNP trees from a dataframe containing SNP_tree_urls and dests
-
download_gbk_assembly_summary()
- Convenience function to download the assembly_summary.txt file from GenBank
-
download_genomes()
- Download a set of specified genome files
-
download_most_recent_complete()
- Download the most recent complete metadata for a specified organism
-
download_reference_genomes()
- Download reference genomes
-
example_pan_dist
- A distance matrix describing the Jaccard distance between genomes in the provided example pangenome
-
example_pangenome_matrix
- A gene presence/absence matrix for a synthetic pangenome
-
extract_collection_agency()
- Extract the collection agency from several metadata fields
-
extract_consensus_ag_species()
- Extract a consensus ag host species from metadata. This is fairly crude and should not be relied on for 100% accuracy.
-
extract_country()
- Extract a standardized country name from geo_loc_tag column
-
extract_earliest_year()
- Return a Year column containing the earliest year from the available 'date' fields
-
extract_state()
- Extract the state name from the geo_loc_name column. This only makes sense for USA isolates.
-
get_PDG_version()
- Get the version of the PDD metadata
-
get_organism_table()
- Load the organism summary table
-
get_pangenome_representatives()
- Select a minimum set of genomes that best represent the gene content of the pangenome
-
get_pangenome_representatives_jaccard()
- Get a subset of a pangenome that tries to represent the gene presence/absence diversity. Selects a random genome, then selects the most distant genome from the selected genome, then
-
klebsiella_example_dat
- NCBI Pathogen Detection metadata for 200 Klebsiella isolates
-
list_PDGs()
- List available PDG accessions for an organism
-
list_organisms()
- List organisms available in the Pathogens database
-
make_SNP_tree_dest()
- Make SNP tree download destination paths
-
make_SNPtree_urls()
- Generate FTP site download URLs for all SNP trees containing the provided isolates
-
make_dest_paths()
- Make download destination paths
-
make_download_urls()
- Make specific FTP download paths for a dataframe with ftp_paths and assembly accessions
-
make_ftp_paths()
- Generate FTP site paths for a selection of assembly accessions
-
mark_outliers()
- Mark outliers from a distance matrix
-
pick_derep_sets()
- Pick multiple de-replication sets from a pangenome. Uses furrr::future_map, so be sure to set your 'plan()' first.
-
remove_strict_core()
- Remove genes present in all genomes from a pangenome presence/absence matrix
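
Several entries above work on a gene presence/absence matrix: removing the strict core, Jaccard distances between genomes, and picking the genome most distant from a selected one. The sketch below illustrates those three ideas on a toy matrix. It is written in Python for illustration only and is not the package's own code. The helper `most_distant_from`, the toy data, and the genomes-as-rows orientation are all assumptions.

```python
from typing import List

# Assumed orientation: rows = genomes, columns = genes, values 0/1.
Matrix = List[List[int]]


def remove_strict_core(pa: Matrix) -> Matrix:
    """Drop every gene column that is present (1) in all genomes."""
    if not pa:
        return []
    n_genes = len(pa[0])
    keep = [j for j in range(n_genes) if not all(row[j] for row in pa)]
    return [[row[j] for j in keep] for row in pa]


def jaccard_distance(a: List[int], b: List[int]) -> float:
    """1 - |shared genes| / |genes present in either genome|."""
    union = sum(1 for x, y in zip(a, b) if x or y)
    if union == 0:
        return 0.0  # two empty genomes are treated as identical
    shared = sum(1 for x, y in zip(a, b) if x and y)
    return 1.0 - shared / union


def distance_matrix(pa: Matrix) -> List[List[float]]:
    """Pairwise Jaccard distances between all genomes (rows)."""
    return [[jaccard_distance(a, b) for b in pa] for a in pa]


def most_distant_from(dist: List[List[float]], start: int) -> int:
    """Index of the genome farthest from `start` (hypothetical helper)."""
    return max(range(len(dist)), key=lambda i: dist[start][i])


# Toy data, invented for this example.
toy_pa = [
    [1, 1, 1, 0, 0],  # genome A
    [1, 1, 0, 1, 0],  # genome B
    [1, 0, 0, 0, 1],  # genome C
]

accessory = remove_strict_core(toy_pa)  # first gene is in every genome
dist = distance_matrix(accessory)
farthest = most_distant_from(dist, 0)   # genome farthest from genome A
```

Removing the strict core before computing distances matters here: genes shared by every genome add to both the intersection and the union, which shrinks every Jaccard distance without helping to tell genomes apart.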