Sleipnir
Organizes a collection of unique genes representing a background or maximum gene set for some situation.
#include <genome.h>
Public Member Functions
bool | Open (std::istream &istmFeatures) |
Construct a new genome by loading the SGD features file. | |
bool | Open (const std::vector< std::string > &vecstrGenes) |
Constructs a new genome containing the given gene IDs. | |
bool | Open (const char *szFile, std::vector< CGenes * > &vecpGenes) |
bool | Open (std::istream &istmGenes, std::vector< CGenes * > &vecpGenes) |
CGene & | AddGene (const std::string &strID) |
Adds a new gene with the given primary ID to the genome. | |
size_t | FindGene (const std::string &strGene) const |
Return the index of a gene within the genome, or -1 (converted to size_t, i.e. the largest representable value) if it does not exist. | |
std::vector< std::string > | GetGeneNames () const |
Return a vector of all primary gene IDs in the genome. | |
size_t | CountGenes (const IOntology *pOntology) const |
Returns the number of genes in the genome with annotations in the given ontology. | |
bool | AddSynonym (CGene &Gene, const std::string &strName) |
Explicitly add a gene synonym to the gene and to the genome's name map. | |
CGene & | GetGene (size_t iGene) const |
Return the gene at the requested index within the genome. | |
size_t | GetGene (const std::string &strGene) const |
Return the index of the gene with the given name, or -1 (converted to size_t, i.e. the largest representable value) if one cannot be found. | |
size_t | GetGenes () const |
Return the number of genes in the genome. |
Organizes a collection of unique genes representing a background or maximum gene set for some situation.
Ideally, a genome represents a collection of all known genes for some organism, each with a single unique identifier and some number of non-overlapping synonyms. In practice, this rarely holds: a genome often represents a background or comprehensive gene set for some situation (e.g. functional enrichment), or the total set of genes in some data file or analysis (e.g. a functional catalog). CGenome does its best to disambiguate overlapping gene names, but internally it relies on a simple one-to-one map from names to genes, so ambiguous synonyms will not be resolved accurately. For best results, guarantee that each gene has a unique primary identifier that does not overlap with any synonyms, and look up genes using only those identifiers.
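The limitation of a one-to-one name map can be illustrated with a minimal sketch. The `NameMap` type and `registerName` helper are hypothetical stand-ins, not Sleipnir's implementation, and the sketch assumes that the first gene to claim a name keeps it.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a genome's name map: each name (primary ID or
// synonym) resolves to exactly one gene index.
using NameMap = std::map<std::string, std::size_t>;

// Registers a name for a gene. Returns false if the name is already taken,
// in which case the existing mapping is kept (an assumption of this sketch).
bool registerName(NameMap& names, const std::string& name, std::size_t gene) {
    // emplace does not overwrite an existing key; .second reports insertion.
    return names.emplace(name, gene).second;
}
```

If two genes both claim the synonym `SYN1`, only the first registration succeeds, and every later lookup of `SYN1` resolves to that first gene. This is why unique primary identifiers are the safer lookup key.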
CGene & Sleipnir::CGenome::AddGene (const std::string &strID)
Adds a new gene with the given primary ID to the genome.
strID | Gene ID to be added to the genome. |
Given a gene name, AddGene first checks whether any gene in the genome already has that name as its ID or as a synonym; if so, it returns a reference to the existing gene. Otherwise, it creates an empty gene with the given primary ID and returns a reference to the new gene.
Definition at line 368 of file genome.cpp.
References GetGene().
Referenced by Sleipnir::CGenes::AddGene(), Sleipnir::COrthology::Open(), Open(), Sleipnir::CGenes::Open(), and Sleipnir::CGenes::OpenWeighted().
bool Sleipnir::CGenome::AddSynonym (CGene &Gene, const std::string &strName)
Explicitly add a gene synonym to the gene and to the genome's name map.
Gene | Gene to which synonym is to be added. |
strName | Synonym to be added to the given gene. |
Definition at line 474 of file genome.cpp.
References Sleipnir::CGene::AddSynonym(), and Sleipnir::CGene::GetName().
Referenced by Open().
size_t Sleipnir::CGenome::CountGenes (const IOntology *pOntology) const
Returns the number of genes in the genome with annotations in the given ontology.
pOntology | Ontology to be scanned for annotated genes. |
Definition at line 444 of file genome.cpp.
size_t Sleipnir::CGenome::FindGene (const std::string &strGene) const
Return the index of a gene within the genome, or -1 (converted to size_t, i.e. the largest representable value) if it does not exist.
strGene | Name of gene to be retrieved from the genome. |
Search the genome's gene list for a gene with the given name, primary or synonymous, and return its index if found.
Definition at line 401 of file genome.cpp.
References GetGene(), GetGenes(), Sleipnir::CGene::GetSynonym(), and Sleipnir::CGene::GetSynonyms().
Referenced by Sleipnir::CGenes::Open(), and Sleipnir::CGenes::OpenWeighted().
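A search over primary names and synonyms with a -1 sentinel might look like the sketch below. It is an assumption based on the description and reference list above, not a copy of genome.cpp; the `Gene` struct, `findGene` function, and `kNotFound` constant are hypothetical. Because the return type is unsigned, callers should compare against the sentinel rather than test for a negative value.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal gene record; not Sleipnir's CGene.
struct Gene {
    std::string id;
    std::vector<std::string> synonyms;
};

// Sentinel for "not found": -1 converted to the unsigned type size_t.
const std::size_t kNotFound = static_cast<std::size_t>(-1);

// Linear scan: returns the index of the first gene whose primary ID or any
// synonym equals `name`, or kNotFound if no gene matches.
std::size_t findGene(const std::vector<Gene>& genes, const std::string& name) {
    for (std::size_t i = 0; i < genes.size(); ++i) {
        if (genes[i].id == name)
            return i;
        for (const std::string& synonym : genes[i].synonyms)
            if (synonym == name)
                return i;
    }
    return kNotFound;
}
```

A check such as `if (index < 0)` can never succeed on a `size_t`; `if (index == kNotFound)` is the comparison that works.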
CGene & Sleipnir::CGenome::GetGene (size_t iGene) const [inline]
Return the gene at the requested index within the genome.
iGene | Index of gene to retrieve. |
Definition at line 327 of file genome.h.
Referenced by AddGene(), FindGene(), Sleipnir::CGenes::Open(), Sleipnir::CGenes::OpenWeighted(), Sleipnir::CDat::SaveDOT(), and Sleipnir::CDat::SaveMATISSE().
size_t Sleipnir::CGenome::GetGene (const std::string &strGene) const [inline]
Return the index of the gene with the given name, or -1 (converted to size_t, i.e. the largest representable value) if one cannot be found.
strGene | Name of gene whose index should be retrieved. |
std::vector< std::string > Sleipnir::CGenome::GetGeneNames () const
Return a vector of all primary gene IDs in the genome.
Definition at line 421 of file genome.cpp.
Referenced by Sleipnir::CDat::Open().
size_t Sleipnir::CGenome::GetGenes () const [inline]
Return the number of genes in the genome.
Definition at line 358 of file genome.h.
Referenced by FindGene().
bool Sleipnir::CGenome::Open (std::istream &istmFeatures)
Construct a new genome by loading the SGD features file.
istmFeatures | Stream containing the SGD features information. |
Loads a (presumably yeast) genome from a stream formatted like the SGD features file (SGD_features.tab). The loaded information includes gene IDs, synonyms, glosses, and the RNA and dubious tags.
Definition at line 252 of file genome.cpp.
References AddGene(), AddSynonym(), Sleipnir::CGene::SetDubious(), Sleipnir::CGene::SetGloss(), Sleipnir::CGene::SetRNA(), and Sleipnir::CMeta::Tokenize().
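As a rough illustration of what loading one line of a tab-delimited features file involves, the sketch below tokenizes a line and fills a plain record. Everything in it is an assumption made for illustration: the `FeatureRecord` type, the `split` and `parseFeature` helpers, the zero-based column positions, and the rules for the RNA and dubious flags are hypothetical and are not taken from genome.cpp.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits `line` on `delim`, keeping empty fields (including a trailing one).
std::vector<std::string> split(const std::string& line, char delim) {
    std::vector<std::string> fields;
    std::size_t start = 0;
    while (true) {
        std::size_t end = line.find(delim, start);
        if (end == std::string::npos) {
            fields.push_back(line.substr(start));
            return fields;
        }
        fields.push_back(line.substr(start, end - start));
        start = end + 1;
    }
}

// Hypothetical plain record for one feature line.
struct FeatureRecord {
    std::string id;                     // primary gene ID
    std::vector<std::string> synonyms;  // standard name plus aliases
    std::string gloss;                  // free-text description
    bool rna = false;
    bool dubious = false;
};

// Assumed zero-based column positions; adjust to the actual file layout.
const std::size_t kColType = 1, kColQualifier = 2, kColName = 3,
                  kColStandard = 4, kColAlias = 5, kColDescription = 15;

// Parses one line; returns false (leaving `out` untouched) if the line has
// too few columns or no primary ID.
bool parseFeature(const std::string& line, FeatureRecord& out) {
    std::vector<std::string> f = split(line, '\t');
    if (f.size() <= kColDescription || f[kColName].empty())
        return false;
    out = FeatureRecord();
    out.id = f[kColName];
    if (!f[kColStandard].empty())
        out.synonyms.push_back(f[kColStandard]);
    if (!f[kColAlias].empty())
        for (const std::string& alias : split(f[kColAlias], '|'))
            out.synonyms.push_back(alias);  // aliases assumed '|'-separated
    out.gloss = f[kColDescription];
    // Flag rules below are assumptions of this sketch.
    out.rna = f[kColType].find("RNA") != std::string::npos;
    out.dubious = f[kColQualifier].find("Dubious") != std::string::npos;
    return true;
}
```

Each parsed record would then correspond to one AddGene call followed by one AddSynonym call per synonym, with the gloss and flags set on the resulting gene.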
bool Sleipnir::CGenome::Open (const std::vector< std::string > &vecstrGenes)
Constructs a new genome containing the given gene IDs.
vecstrGenes | Vector of gene IDs to add to the new genome. |
Definition at line 309 of file genome.cpp.
References AddGene().