Sleipnir
Sleipnir::CGenome Class Reference

Organizes a collection of unique genes representing a background or maximum gene set for some situation.

#include <genome.h>

Inheritance diagram for Sleipnir::CGenome:
Sleipnir::CGenomeImpl

Public Member Functions

bool Open (std::istream &istmFeatures)
 Construct a new genome by loading the SGD features file.
bool Open (const std::vector< std::string > &vecstrGenes)
 Constructs a new genome containing the given gene IDs.
bool Open (const char *szFile, std::vector< CGenes * > &vecpGenes)
bool Open (std::istream &istmGenes, std::vector< CGenes * > &vecpGenes)
CGene & AddGene (const std::string &strID)
 Adds a new gene with the given primary ID to the genome.
size_t FindGene (const std::string &strGene) const
 Return the index of a gene within the genome, or -1 if it does not exist.
std::vector< std::string > GetGeneNames () const
 Return a vector of all primary gene IDs in the genome.
size_t CountGenes (const IOntology *pOntology) const
 Returns the number of genes in the genome with annotations in the given ontology.
bool AddSynonym (CGene &Gene, const std::string &strName)
 Explicitly add a gene synonym to the gene and to the genome's name map.
CGene & GetGene (size_t iGene) const
 Return the gene at the requested index within the genome.
size_t GetGene (const std::string &strGene) const
 Return the index of the gene with the given name, or -1 if one cannot be found.
size_t GetGenes () const
 Return the number of genes in the genome.

Detailed Description

Organizes a collection of unique genes representing a background or maximum gene set for some situation.

Ideally, a genome represents a collection of all known genes for some organism, each with a single unique identifier and some number of non-overlapping synonyms. In practice, this doesn't happen: a genome often represents a background or comprehensive gene set for some situation (e.g. functional enrichment), or the total set of genes in some data file or analysis (e.g. a functional catalog). CGenome will do its best to disambiguate overlapping gene names, but it boils down to a simple one-to-one map, which will not deal with ambiguous synonyms accurately. For best results, guarantee that each gene has a unique primary identifier that does not overlap with any synonyms, and look up genes using only those identifiers.

Remarks:
CGenome uses an internal map to look up genes given a name. Both primary identifiers and synonyms are placed in this map, which allows synonyms to be looked up rapidly, but at the price of potentially incorrect lookups whenever names overlap. It's a tough problem; Sleipnir's solution is, as usual, to favor efficiency at the expense of restricting input format (i.e. find some way to uniquely identify your genes).
See also:
CGene | CGenes

Definition at line 302 of file genome.h.
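
The name-map behavior described above can be illustrated with a minimal, self-contained sketch. This is not Sleipnir's implementation: NameMapSketch, addPrimary, addSynonym, and lookup are hypothetical names, and refusing a name that is already taken is an assumption made here for illustration. It only shows why a one-to-one map cannot represent a synonym shared by two genes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a genome's name map; not Sleipnir code.
struct NameMapSketch {
    std::vector<std::string> primaryIDs;           // gene index -> primary ID
    std::map<std::string, std::size_t> nameToIdx;  // primary IDs and synonyms

    // Register a primary ID and return the index assigned to it.
    std::size_t addPrimary(const std::string& id) {
        primaryIDs.push_back(id);
        nameToIdx[id] = primaryIDs.size() - 1;
        return primaryIDs.size() - 1;
    }

    // Map a synonym to a gene. The map holds one target per name, so a name
    // that is already taken is refused (an assumption of this sketch).
    bool addSynonym(std::size_t idx, const std::string& name) {
        assert(idx < primaryIDs.size());
        return nameToIdx.insert({name, idx}).second;
    }

    // Return the index registered for a name, or (size_t)-1 if absent.
    std::size_t lookup(const std::string& name) const {
        auto it = nameToIdx.find(name);
        return it == nameToIdx.end() ? static_cast<std::size_t>(-1) : it->second;
    }
};
```

In this sketch, if two genes both claim the synonym "ALIAS", only the first registration succeeds and every later lookup of "ALIAS" resolves to that first gene. Looking genes up by unique primary ID avoids the problem entirely.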


Member Function Documentation

CGene & Sleipnir::CGenome::AddGene ( const std::string &  strID)

Adds a new gene with the given primary ID to the genome.

Parameters:
strID  Gene ID to be added to the genome.
Returns:
A reference to the newly added gene, or to an existing gene with the given name.

Given a gene name, AddGene will first test to see if any gene in the genome has that ID or synonym; if so, a reference to the existing gene is returned. Otherwise, an empty gene with the given primary ID is created, and a reference to this new gene is returned.

Remarks:
A newly created gene will have no information beyond the provided primary ID.
See also:
FindGene

Definition at line 368 of file genome.cpp.

References GetGene().

Referenced by Sleipnir::CGenes::AddGene(), Sleipnir::COrthology::Open(), Open(), Sleipnir::CGenes::Open(), and Sleipnir::CGenes::OpenWeighted().
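
The find-or-create behavior of AddGene can be sketched as follows. GeneStub and GenomeStub are hypothetical types invented for this example, and the storage choices (a std::deque plus a std::map) are assumptions rather than Sleipnir's actual data structures. For brevity the sketch registers only primary IDs; in the real class, synonyms share the same name map.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <string>
#include <vector>

// Hypothetical gene record: a primary ID and nothing else yet.
struct GeneStub {
    std::string id;
    std::vector<std::string> synonyms;
};

// Hypothetical genome with find-or-create insertion; not Sleipnir code.
class GenomeStub {
public:
    // Return the existing gene registered under this name, or create an
    // empty gene with the name as its primary ID.
    GeneStub& addGene(const std::string& id) {
        auto it = names_.find(id);
        if (it != names_.end()) {
            assert(it->second < genes_.size());  // map must point at a real gene
            return genes_[it->second];
        }
        genes_.push_back(GeneStub{id, {}});
        names_[id] = genes_.size() - 1;
        return genes_.back();
    }

    std::size_t size() const { return genes_.size(); }

private:
    // std::deque keeps references to existing elements valid on push_back.
    std::deque<GeneStub> genes_;
    std::map<std::string, std::size_t> names_;
};
```

Calling addGene twice with the same ID returns a reference to the same object and leaves the gene count unchanged, which mirrors the documented return value.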

bool Sleipnir::CGenome::AddSynonym ( CGene &  Gene,
const std::string &  strName 
)

Explicitly add a gene synonym to the gene and to the genome's name map.

Parameters:
Gene  Gene to which synonym is to be added.
strName  Synonym to be added to the given gene.
Returns:
True if the synonym was added successfully.
Remarks:
Addition will fail if the synonym is the given gene's primary ID or an existing synonym.
See also:
CGene::AddSynonym

Definition at line 474 of file genome.cpp.

References Sleipnir::CGene::AddSynonym(), and Sleipnir::CGene::GetName().

Referenced by Open().

size_t Sleipnir::CGenome::CountGenes ( const IOntology *  pOntology) const

Returns the number of genes in the genome with annotations in the given ontology.

Parameters:
pOntology  Ontology to be scanned for annotated genes.
Returns:
Number of genes in the genome with annotations in the given ontology.
See also:
CGene::GetOntology

Definition at line 444 of file genome.cpp.

size_t Sleipnir::CGenome::FindGene ( const std::string &  strGene) const

Return the index of a gene within the genome, or -1 if it does not exist.

Parameters:
strGene  Name of gene to be retrieved from the genome.
Returns:
Index of the requested gene, or -1 if it does not exist.

Search the genome's gene list for a gene with the given name, primary or synonymous, and return its index if found.

Remarks:
Both the genome's internal name map and the synonyms of every gene are explicitly searched; the latter can be very slow, and the internal map will not always contain synonyms (depending on how the genome was constructed).
See also:
AddGene

Definition at line 401 of file genome.cpp.

References GetGene(), GetGenes(), Sleipnir::CGene::GetSynonym(), and Sleipnir::CGene::GetSynonyms().

Referenced by Sleipnir::CGenes::Open(), and Sleipnir::CGenes::OpenWeighted().
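
The two-stage search described in the remarks (name map first, then every gene's synonym list) might look roughly like this. GeneRecord, kNotFound, and findGeneSketch are hypothetical names for this illustration only; the sketch does not reproduce the code in genome.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical gene record for this sketch.
struct GeneRecord {
    std::string id;
    std::vector<std::string> synonyms;
};

// Sentinel equivalent to the documented -1 return value.
const std::size_t kNotFound = static_cast<std::size_t>(-1);

// Look a name up in the map first, then fall back to scanning synonyms.
std::size_t findGeneSketch(const std::vector<GeneRecord>& genes,
                           const std::map<std::string, std::size_t>& nameMap,
                           const std::string& name) {
    // Fast path: logarithmic lookup in the name map.
    auto it = nameMap.find(name);
    if (it != nameMap.end()) {
        assert(it->second < genes.size());
        return it->second;
    }
    // Slow path: linear scan over every synonym of every gene, needed when
    // the map was built without synonyms.
    for (std::size_t i = 0; i < genes.size(); ++i) {
        for (const std::string& syn : genes[i].synonyms) {
            if (syn == name) {
                return i;
            }
        }
    }
    return kNotFound;
}
```

The slow path costs time proportional to the total number of synonyms in the genome, which is why repeated lookups by synonym can be expensive compared with lookups by primary ID.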

CGene& Sleipnir::CGenome::GetGene ( size_t  iGene) const [inline]

Return the gene at the requested index within the genome.

Parameters:
iGene  Index of gene to retrieve.
Returns:
Gene at the requested index.
Remarks:
For efficiency, no bounds checking is performed. The given value must be smaller than the value returned by GetGenes.

Definition at line 327 of file genome.h.

Referenced by AddGene(), FindGene(), Sleipnir::CGenes::Open(), Sleipnir::CGenes::OpenWeighted(), Sleipnir::CDat::SaveDOT(), and Sleipnir::CDat::SaveMATISSE().

size_t Sleipnir::CGenome::GetGene ( const std::string &  strGene) const [inline]

Return the index of the gene with the given name, or -1 if one cannot be found.

Parameters:
strGene  Name of gene whose index should be retrieved.
Returns:
Index of gene with the given name; -1 if one cannot be found.
Remarks:
This is sensitive to the same name mapping issues discussed in AddSynonym. For maximum safety, refer to genes only by their unique primary identifiers.

Definition at line 345 of file genome.h.
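
Because the lookup returns a size_t, the documented -1 is the largest representable size_t value, and a test such as "index < 0" can never be true. The sketch below shows one way a caller might guard an index before passing it to an accessor that performs no bounds checking. kNoGene, isValidIndex, and geneNameAt are hypothetical helpers written for this example; they are not part of Sleipnir.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// -1 converted to size_t wraps around to the maximum value.
const std::size_t kNoGene = static_cast<std::size_t>(-1);
static_assert(static_cast<std::size_t>(-1) ==
                  std::numeric_limits<std::size_t>::max(),
              "-1 as size_t is the maximum value");

// True if idx is a usable position in a collection of `count` genes.
bool isValidIndex(std::size_t idx, std::size_t count) {
    return idx != kNoGene && idx < count;
}

// Unchecked access in release builds; the assert is a debug-only guard.
const std::string& geneNameAt(const std::vector<std::string>& names,
                              std::size_t idx) {
    assert(isValidIndex(idx, names.size()));
    return names[idx];
}
```

A caller would typically test the looked-up index with isValidIndex (or compare it against the sentinel directly) before using it to fetch a gene.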

vector< string > Sleipnir::CGenome::GetGeneNames ( ) const

Return a vector of all primary gene IDs in the genome.

Returns:
Vector of all primary gene IDs in the genome.

Definition at line 421 of file genome.cpp.

Referenced by Sleipnir::CDat::Open().

size_t Sleipnir::CGenome::GetGenes ( ) const [inline]

Return the number of genes in the genome.

Returns:
Number of genes in the genome.

Definition at line 358 of file genome.h.

Referenced by FindGene().

bool Sleipnir::CGenome::Open ( std::istream &  istmFeatures)

Construct a new genome by loading the SGD features file.

Parameters:
istmFeatures  Stream containing the SGD features information.
Returns:
True if the genome was loaded successfully.

Loads a (presumably yeast) genome from a file formatted as per the SGD features file (SGD_features.tab). This includes gene IDs, synonyms, glosses, and RNA and dubious tags.

Definition at line 252 of file genome.cpp.

References AddGene(), AddSynonym(), Sleipnir::CGene::SetDubious(), Sleipnir::CGene::SetGloss(), Sleipnir::CGene::SetRNA(), and Sleipnir::CMeta::Tokenize().

bool Sleipnir::CGenome::Open ( const std::vector< std::string > &  vecstrGenes)

Constructs a new genome containing the given gene IDs.

Parameters:
vecstrGenes  Vector of gene IDs to add to the new genome.
Returns:
True if the genome was created successfully.
Remarks:
Genes in the new genome will have no information beyond the provided primary IDs, which should (as usual) be unique.

Definition at line 309 of file genome.cpp.

References AddGene().


The documentation for this class was generated from the following files: