Sleipnir
Sleipnir::CGenes Class Reference

Represents a simple set of unique genes.

#include <genome.h>

Inheritance diagram for Sleipnir::CGenes:
Sleipnir::CGenesImpl

Public Member Functions

 CGenes (CGenome &Genome)
 Construct a new gene set containing genes drawn from the given underlying genome.
bool Open (std::istream &istm, bool fCreate=true)
 Construct a new gene set by loading genes from the given text stream, one per line.
bool Open (const std::vector< std::string > &vecstrGenes, bool fCreate=true)
 Construct a new gene set containing the given gene IDs.
bool OpenWeighted (std::istream &istm, bool fCreate=true)
 Construct a new weighted gene set by loading genes from the given text stream, one per line.
void Filter (const CGenes &GenesExclude)
 Remove the given genes from the gene set.
size_t CountAnnotations (const IOntology *pOntology, size_t iTerm, bool fRecursive=true, const CGenes *pBackground=NULL) const
 Return the number of genes in the set annotated at or, optionally, below the given ontology term.
std::vector< std::string > GetGeneNames () const
 Return the primary identifiers of all genes in the set.
bool Open (const char *szFile, bool fCreate=true)
 Construct a new gene set by loading genes from the given text file, one per line.
bool OpenWeighted (const char *szFile, bool fCreate=true)
 Construct a new weighted gene set by loading genes from the given text file, one per line.
size_t GetGenes () const
 Return the number of genes in the set.
bool IsGene (const std::string &strGene) const
 Return true if the given name is a primary identifier of a gene in the set.
bool IsWeighted () const
 Determine whether genes are weighted.
CGenome & GetGenome () const
 Return the gene set's underlying genome.
const CGene & GetGene (size_t iGene) const
 Return the gene at the requested index.
const float GetGeneWeight (size_t iGene) const
 Return weight of the gene at the requested index.
size_t GetGene (const std::string &strGene) const
 Return the index of the gene with the given primary identifier, or -1 if none exists.
bool AddGene (const std::string &strGene)
 Adds a new gene with the given ID to the gene set.

Static Public Member Functions

static bool Open (const char *szFile, CGenome &Genome, std::vector< std::string > &vecstrNames, std::vector< CGenes * > &vecpGenes)
 Simultaneously construct multiple new gene sets loaded from the given file, one per line, with tab-delimited genes.

Detailed Description

Represents a simple set of unique genes.

Remarks:
Genes are represented by index only and not explicitly checked for uniqueness, so most of the naming issues of CGenome are avoided. Gene comparisons generally assume a constant gene pool drawn from the base CGenome and are thus performed by pointer comparisons for efficiency; in other words, don't expect two different CGene objects with the same primary ID to behave correctly.

Definition at line 373 of file genome.h.


Constructor & Destructor Documentation

Sleipnir::CGenes::CGenes ( CGenome &  Genome)

Construct a new gene set containing genes drawn from the given underlying genome.

Parameters:
Genome  Genome containing all genes which might become members of this gene set.

Definition at line 548 of file genome.cpp.

Referenced by Open().


Member Function Documentation

bool Sleipnir::CGenes::AddGene ( const std::string &  strGene) [inline]

Adds a new gene with the given ID to the gene set.

Parameters:
strGene  Gene ID to be added.
Returns:
True if gene was added, false if it was already included.
See also:
Open | GetGene

Definition at line 565 of file genome.h.

References Sleipnir::CGenome::AddGene(), and GetGene().

size_t Sleipnir::CGenes::CountAnnotations ( const IOntology *  pOntology,
size_t  iTerm,
bool  fRecursive = true,
const CGenes *  pBackground = NULL 
) const

Return the number of genes in the set annotated at or, optionally, below the given ontology term.

Parameters:
pOntology    Ontology in which annotations are counted.
iTerm        Ontology term at or below which annotations are counted.
fRecursive   If true, count annotations at or below the given term; otherwise, count only direct annotations to the term.
pBackground  If non-null, count only annotations for genes also contained in the given background set.
Returns:
Number of genes in the gene set annotated at or below the given ontology term.
See also:
IOntology::IsAnnotated

Definition at line 711 of file genome.cpp.

References Sleipnir::IOntology::IsAnnotated(), and IsGene().
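
The counting rule described above (direct versus recursive annotation, with an optional background filter) can be illustrated with a small standalone sketch. It does not use Sleipnir: `Term`, `is_annotated`, and `count_annotations` are hypothetical names, the ontology is modeled as a plain vector of terms with child indices, and it is assumed to be acyclic.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical ontology term: child term indices plus directly annotated gene IDs.
struct Term {
    std::vector<std::size_t> children;
    std::set<std::string> genes;
};

// True if `gene` is annotated to `term` directly or, when `recursive` is set,
// to any descendant of `term`. Assumes the ontology contains no cycles.
bool is_annotated(const std::vector<Term>& ontology, std::size_t term,
                  const std::string& gene, bool recursive) {
    if (ontology[term].genes.count(gene) != 0) return true;
    if (!recursive) return false;
    for (std::size_t child : ontology[term].children)
        if (is_annotated(ontology, child, gene, true)) return true;
    return false;
}

// Count the genes in `genes` annotated at (or below) `term`, optionally
// restricted to genes that also appear in `background`.
std::size_t count_annotations(const std::vector<Term>& ontology, std::size_t term,
                              const std::vector<std::string>& genes,
                              bool recursive = true,
                              const std::set<std::string>* background = nullptr) {
    std::size_t count = 0;
    for (const std::string& gene : genes) {
        if (background != nullptr && background->count(gene) == 0) continue;
        if (is_annotated(ontology, term, gene, recursive)) ++count;
    }
    return count;
}
```

With a root term annotated to G1 and a child term annotated to G2, a gene list {G1, G2, G3} counts 2 recursively and 1 directly under this sketch.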

void Sleipnir::CGenes::Filter ( const CGenes &  GenesExclude)

Remove the given genes from the gene set.

Parameters:
GenesExclude  Genes to be removed from the current gene set.
Remarks:
Comparisons are performed using pointers to CGene objects, so both gene sets should use the same underlying CGenome for proper behavior.

Definition at line 769 of file genome.cpp.

References GetGene(), and GetGenes().
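
To see why both gene sets should share one underlying genome, consider a minimal standalone sketch of filtering by pointer identity. It is not Sleipnir's implementation: `Gene` and `filter_by_pointer` are hypothetical stand-ins used only to show that two distinct objects with the same ID are not treated as the same gene.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a gene record; not Sleipnir's CGene.
struct Gene {
    std::string name;
};

// Remove from `genes` every pointer that also appears in `exclude`.
// Membership is decided by address only; names are never compared.
void filter_by_pointer(std::vector<const Gene*>& genes,
                       const std::vector<const Gene*>& exclude) {
    genes.erase(
        std::remove_if(genes.begin(), genes.end(),
                       [&exclude](const Gene* gene) {
                           return std::find(exclude.begin(), exclude.end(), gene) !=
                                  exclude.end();
                       }),
        genes.end());
}
```

In this sketch, excluding a separately constructed `Gene` that merely has the same name removes nothing, while excluding the original object's address removes it.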

const CGene& Sleipnir::CGenes::GetGene ( size_t  iGene) const [inline]

Return the gene at the requested index.

Parameters:
iGene  Gene index to retrieve.
Returns:
Gene at the requested index.
Remarks:
For efficiency, no bounds checking is performed. The given index must be smaller than GetGenes.

Definition at line 512 of file genome.h.

Referenced by AddGene(), Sleipnir::CPCL::Distance(), Filter(), Sleipnir::CDat::FilterGenes(), Sleipnir::CDat::Open(), and Sleipnir::CDatasetCompact::Open().

size_t Sleipnir::CGenes::GetGene ( const std::string &  strGene) const [inline]

Return the index of the gene with the given primary identifier, or -1 if none exists.

Parameters:
strGene  Primary gene identifier for which the set is searched.
Returns:
Index of the gene with the given primary identifier; -1 if none exists.
See also:
IsGene

Definition at line 546 of file genome.h.
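
Because the return type is size_t, which is unsigned, a result of -1 reaches the caller as the largest representable size_t value, so it should be compared against an explicitly converted -1 rather than tested with `< 0`. The sketch below illustrates this with standalone code; `kNotFound` and `find_gene_index` are hypothetical names, not part of Sleipnir.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sentinel: the value that -1 takes when converted to size_t.
const std::size_t kNotFound = static_cast<std::size_t>(-1);

// Return the index of `id` in `names`, or kNotFound if it is absent.
std::size_t find_gene_index(const std::vector<std::string>& names,
                            const std::string& id) {
    for (std::size_t i = 0; i < names.size(); ++i)
        if (names[i] == id) return i;
    return kNotFound;
}
```

A caller would then write `if (find_gene_index(names, id) == kNotFound)` to detect a missing gene.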

vector< string > Sleipnir::CGenes::GetGeneNames ( ) const

Return the primary identifiers of all genes in the set.

Returns:
Vector of primary identifiers of all genes in the set.

Definition at line 789 of file genome.cpp.

Referenced by Sleipnir::CPCL::Distance().

size_t Sleipnir::CGenes::GetGenes ( ) const [inline]

Return the number of genes in the set.

Returns:
Number of genes in the set.

Definition at line 457 of file genome.h.

Referenced by Sleipnir::CPCL::Distance(), Filter(), Sleipnir::CDat::FilterGenes(), Sleipnir::CDataFilter::IsExample(), Sleipnir::CSVM::Learn(), Sleipnir::CDat::Open(), and Sleipnir::CDatasetCompact::Open().

const float Sleipnir::CGenes::GetGeneWeight ( size_t  iGene) const [inline]

Return weight of the gene at the requested index.

Parameters:
iGene  Gene index to retrieve.
Returns:
Gene weight at the requested index. NULL if gene requested doesn't exist.
Remarks:
For efficiency, no bounds checking is performed. The given index must be smaller than GetGenes.

Definition at line 529 of file genome.h.

CGenome& Sleipnir::CGenes::GetGenome ( ) const [inline]

Return the gene set's underlying genome.

Returns:
Gene set's underlying genome.

Definition at line 495 of file genome.h.

Referenced by Sleipnir::CSVM::Learn().

bool Sleipnir::CGenes::IsGene ( const std::string &  strGene) const [inline]

Return true if the given name is a primary identifier of a gene in the set.

Parameters:
strGene  Primary gene identifier for which the set is searched.
Returns:
True if the set contains a gene with the given primary identifier.
See also:
GetGene

Definition at line 474 of file genome.h.

Referenced by Sleipnir::CDataFilter::Attach(), CountAnnotations(), Sleipnir::CDat::Open(), and Sleipnir::CDatasetCompact::Open().

bool Sleipnir::CGenes::IsWeighted ( ) const [inline]

Determine whether genes are weighted.

Returns:
True if the genes in the set are weighted, false otherwise.

Definition at line 485 of file genome.h.

bool Sleipnir::CGenes::Open ( const char *  szFile,
CGenome &  Genome,
std::vector< std::string > &  vecstrNames,
std::vector< CGenes * > &  vecpGenes 
) [static]

Simultaneously construct multiple new gene sets loaded from the given file, one per line, with tab-delimited genes.

Parameters:
szFile       File from which gene sets are loaded.
Genome       Genome containing all genes which might become members of these gene sets.
vecstrNames  Human-readable identifiers for the loaded gene sets.
vecpGenes    Vector to which loaded gene sets are appended.
Returns:
True on success, false otherwise.

Opens multiple gene sets from the given tab-delimited text file. Each line should contain a single tab-delimited gene set, and the first token on each line should be a human-readable identifier for that line's gene set.

See also:
Open

Definition at line 507 of file genome.cpp.

References CGenes(), and Sleipnir::CMeta::Tokenize().

Referenced by Sleipnir::CPCL::Distance(), Sleipnir::CDat::FilterGenes(), Sleipnir::CDatasetCompact::FilterGenes(), and Open().
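
The file layout described above (one gene set per line, tab-delimited, with the set's name as the first token) can be sketched as a standalone parser. This is only an illustration of the format, not the library's code: `NamedGeneSet` and `parse_gene_sets` are hypothetical, and skipping blank lines and empty fields is an assumption.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container for one parsed line: a set name and its gene IDs.
struct NamedGeneSet {
    std::string name;
    std::vector<std::string> genes;
};

// Parse tab-delimited lines of the form NAME<TAB>GENE1<TAB>GENE2...
std::vector<NamedGeneSet> parse_gene_sets(std::istream& in) {
    std::vector<NamedGeneSet> sets;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;  // assumption: blank lines are ignored
        std::istringstream fields(line);
        NamedGeneSet set;
        std::getline(fields, set.name, '\t');  // first token names the set
        std::string gene;
        while (std::getline(fields, gene, '\t'))
            if (!gene.empty()) set.genes.push_back(gene);  // assumption: empty fields dropped
        sets.push_back(std::move(set));
    }
    return sets;
}
```

In the real static Open, the names and gene sets are returned through the vecstrNames and vecpGenes output parameters rather than as a single vector.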

bool Sleipnir::CGenes::Open ( std::istream &  istm,
bool  fCreate = true 
)

Construct a new gene set by loading genes from the given text stream, one per line.

Parameters:
istm     Stream containing gene IDs to load, one per line.
fCreate  If true, add unknown genes to the underlying genome; otherwise, unknown gene IDs are ignored.
Returns:
True if gene set was constructed successfully.

Loads a text file of the form:

 GENE1
 GENE2
 GENE3

containing one primary gene identifier per line. If these gene identifiers are found in the gene set's underlying genome, CGene objects are loaded from there. Otherwise, if fCreate is true, new genes are created from the loaded IDs. If fCreate is false, unrecognized genes are skipped with a warning.

See also:
CGenome::AddGene

Definition at line 578 of file genome.cpp.

References Sleipnir::CGenome::AddGene(), Sleipnir::CGenome::FindGene(), Sleipnir::CGenome::GetGene(), and Sleipnir::CGene::GetName().
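
The effect of fCreate can be illustrated with a standalone sketch in which the underlying genome is modeled as a plain set of known IDs. `load_gene_ids` and its parameters are hypothetical and only mirror the behavior described above; the warning that the real class emits for skipped genes is omitted.

```cpp
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Read one gene ID per line from `in`. `genome` is a hypothetical stand-in
// for the underlying genome: the set of IDs that are already known.
std::vector<std::string> load_gene_ids(std::istream& in,
                                       std::set<std::string>& genome,
                                       bool create) {
    std::vector<std::string> loaded;
    std::string id;
    while (std::getline(in, id)) {
        if (id.empty()) continue;  // assumption: blank lines are ignored
        if (genome.count(id) == 0) {
            if (!create) continue;  // unknown ID is skipped
            genome.insert(id);      // unknown ID is added to the genome
        }
        loaded.push_back(id);
    }
    return loaded;
}
```

With a genome that knows only GENE1, loading GENE1 and GENE2 yields one gene when `create` is false and two when it is true, and in the second case the genome grows to include GENE2.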

bool Sleipnir::CGenes::Open ( const std::vector< std::string > &  vecstrGenes,
bool  fCreate = true 
)

Construct a new gene set containing the given gene IDs.

Parameters:
vecstrGenes  Primary identifiers of genes in the new gene set.
fCreate      If true, add unknown genes to the underlying genome; otherwise, unknown gene IDs are ignored.
Returns:
True if gene set was constructed successfully.

If the given gene identifiers are found in the gene set's underlying genome, CGene objects are loaded from there. Otherwise, if fCreate is true, new genes are created from the loaded IDs. If fCreate is false, unrecognized genes are skipped with a warning.

See also:
CGenome::AddGene

Definition at line 742 of file genome.cpp.

References Sleipnir::CGenome::AddGene(), Sleipnir::CGenome::FindGene(), and Sleipnir::CGenome::GetGene().

bool Sleipnir::CGenes::Open ( const char *  szFile,
bool  fCreate = true 
) [inline]

Construct a new gene set by loading genes from the given text file, one per line.

Parameters:
szFile   File containing gene IDs to load, one per line.
fCreate  If true, add unknown genes to the underlying genome; otherwise, unknown gene IDs are ignored.
Returns:
True if gene set was constructed successfully.

Loads a text file of the form:

 GENE1
 GENE2
 GENE3

containing one primary gene identifier per line. If these gene identifiers are found in the gene set's underlying genome, CGene objects are loaded from there. Otherwise, if fCreate is true, new genes are created from the loaded IDs. If fCreate is false, unrecognized genes are skipped with a warning.

See also:
CGenome::AddGene

Definition at line 414 of file genome.h.

References Open().

bool Sleipnir::CGenes::OpenWeighted ( std::istream &  istm,
bool  fCreate = true 
)

Construct a new weighted gene set by loading genes from the given text stream, one per line.

Parameters:
istm     Stream containing gene IDs and corresponding weights to load, one per line.
fCreate  If true, add unknown genes to the underlying genome; otherwise, unknown gene IDs are ignored.
Returns:
True if gene set was constructed successfully.

Loads a text file of the form:

 GENE1 WEIGHT1
 GENE2 WEIGHT2
 GENE3 WEIGHT3

containing one primary gene identifier and its corresponding weight per line. If these gene identifiers are found in the gene set's underlying genome, CGene objects are loaded from there. Otherwise, if fCreate is true, new genes are created from the loaded IDs. If fCreate is false, unrecognized genes are skipped with a warning.

See also:
CGenome::AddGene

Definition at line 639 of file genome.cpp.

References Sleipnir::CGenome::AddGene(), Sleipnir::CGenome::FindGene(), Sleipnir::CGenome::GetGene(), Sleipnir::CGene::GetName(), and Sleipnir::CMeta::Tokenize().

Referenced by OpenWeighted().
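
The weighted format can be sketched as a standalone parser that reads an ID and a weight from each line. This is an illustration under assumptions rather than the library's code: `parse_weighted_genes` is a hypothetical name, fields are assumed to be separated by whitespace (the exact delimiter used by the library is not stated here), and lines without a readable weight are silently dropped.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Parse lines of the form "GENE WEIGHT" into (ID, weight) pairs.
std::vector<std::pair<std::string, float> > parse_weighted_genes(std::istream& in) {
    std::vector<std::pair<std::string, float> > genes;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string id;
        float weight = 0.0f;
        // Assumption: a line is kept only if both an ID and a weight parse.
        if (fields >> id >> weight)
            genes.push_back(std::make_pair(id, weight));
    }
    return genes;
}
```

Keeping each weight next to its ID mirrors how GetGeneWeight is indexed by the same position as GetGene.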

bool Sleipnir::CGenes::OpenWeighted ( const char *  szFile,
bool  fCreate = true 
) [inline]

Construct a new weighted gene set by loading genes from the given text file, one per line.

Parameters:
szFile   File containing gene IDs and corresponding weights to load, one per line.
fCreate  If true, add unknown genes to the underlying genome; otherwise, unknown gene IDs are ignored.
Returns:
True if gene set was constructed successfully.

Loads a text file of the form:

 GENE1 WEIGHT1
 GENE2 WEIGHT2
 GENE3 WEIGHT3

containing one primary gene identifier and its corresponding weight per line. If these gene identifiers are found in the gene set's underlying genome, CGene objects are loaded from there. Otherwise, if fCreate is true, new genes are created from the loaded IDs. If fCreate is false, unrecognized genes are skipped with a warning.

See also:
CGenome::AddGene

Definition at line 445 of file genome.h.

References OpenWeighted().


The documentation for this class was generated from the following files: