Sleipnir
Combines a set of DB files generated from different Sleipnir::CDatabase's into one DB file.
For space reasons, it is sometimes not feasible to generate a Sleipnir::CDatabase covering all datasets on one machine or one partition. In that case, users can generate separate Sleipnir::CDatabase's on different machines first and then join them into one CDatabase instance with DBCombiner. DBCombiner joins one DB file at a time, so the joining must be repeated for every DB file in the database.
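Because each DBCombiner run handles only one DB file, the repetition is easy to script. The Python sketch below builds one DBCombiner argument list per DB file ID, using the flags from the usage line below. The function name `build_commands` and the directory layout it assumes (one db_list file per ID, stored as `<list_dir>/<ID>.txt`) are hypothetical and are not part of Sleipnir.

```python
from pathlib import Path


def build_commands(ids, genes_file, list_dir, input_dir, output_dir, split=False):
    """Return one DBCombiner argument list per DB file ID.

    Hypothetical layout: list_dir/<ID>.txt is the db_list file naming the
    <ID>.db file from every source database.
    """
    commands = []
    for db_id in ids:
        argv = [
            "DBCombiner",
            "-i", str(genes_file),                       # gene mapping file
            "-x", str(Path(list_dir) / f"{db_id}.txt"),  # db_list for this ID
            "-d", str(input_dir),
            "-D", str(output_dir),
        ]
        if split:
            argv.append("-s")  # one gene per output DB file (needed for Seek)
        commands.append(argv)
    return commands
```

Each returned list can then be run with `subprocess.run(argv, check=True)`. Building argument lists instead of shell strings avoids quoting problems with unusual paths.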
DBCombiner -i <genes.txt> -x <db_list.txt> -d <input_dir> -D <output_dir> [-s]
Combines the DB files listed in db_list.txt into one DB.
DBCombiner accepts DB files generated from different Sleipnir::CDatabase instances, as long as the same gene map was used. For DBCombiner to work, only DB files covering the same genes may be combined. You can ensure this by using only DB files with the same ID in the file name (see the sample lines from db_list.txt below). The datasets in the final joined DB appear in the order given in db_list.txt.
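One way to collect DB files with matching IDs from several databases is to group them by file name. The Python sketch below does this while preserving the order of the database directories, because that order becomes the dataset order in the joined DB. It also writes one db_list file per ID. The function names `group_by_id` and `write_db_lists` and the `<ID>.txt` naming of the list files are hypothetical. The sketch assumes that every database directory holds `*.db` files named by ID.

```python
from pathlib import Path


def group_by_id(database_dirs):
    """Map each DB file name (e.g. '00000004.db') to one path per database.

    Paths follow the order of database_dirs, which becomes the dataset order
    of the combined DB. Raises ValueError if a file is missing from any
    database, since only DB files with the same ID (same genes) may be joined.
    """
    dirs = [Path(d) for d in database_dirs]
    names_per_db = [{p.name for p in d.glob("*.db")} for d in dirs]
    all_names = set().union(*names_per_db)
    groups = {}
    for name in sorted(all_names):
        missing = [str(d) for d, names in zip(dirs, names_per_db) if name not in names]
        if missing:
            raise ValueError(f"{name} is missing from: {', '.join(missing)}")
        groups[name] = [str(d / name) for d in dirs]
    return groups


def write_db_lists(groups, list_dir):
    """Write one db_list file per ID, one path per line (hypothetical <ID>.txt naming)."""
    out = Path(list_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, paths in groups.items():
        (out / f"{Path(name).stem}.txt").write_text("\n".join(paths) + "\n")
```

The sketch stops with an error when an ID is missing from some database, rather than silently combining a subset of datasets.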
The -s option further splits the combined Sleipnir::CDatabaselet into one gene per DB file. The -s option must be enabled for Seek coexpression integration (SeekMiner, SeekServer).
Sample lines from the genes.txt file:
1	1
2	10
3	100
4	1000
5	10000
6	100008589
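The gene map is a tab-delimited file with two columns: numerical, one-based gene IDs and unique gene names (see the -i row of the flag table below). The Python sketch below reads such a file and checks those properties before a long DBCombiner run. The function name `read_gene_map` is hypothetical. Sleipnir's own parser may be more or less strict, for example about blank lines.

```python
def read_gene_map(lines):
    """Parse '<one-based numeric ID><TAB><gene name>' lines into {id: name}.

    Blank lines are skipped; malformed lines, non-positive IDs, duplicate
    IDs and duplicate gene names raise ValueError.
    """
    id_to_name = {}
    seen_names = set()
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 tab-separated columns")
        gene_id, name = int(fields[0]), fields[1]
        if gene_id < 1:
            raise ValueError(f"line {lineno}: IDs are one-based, got {gene_id}")
        if gene_id in id_to_name or name in seen_names:
            raise ValueError(f"line {lineno}: duplicate ID or gene name")
        id_to_name[gene_id] = name
        seen_names.add(name)
    return id_to_name
```

Usage: `with open("genes.txt") as fh: genes = read_gene_map(fh)`.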
Sample lines from the db_list.txt file:
/x/y/database1/00000004.db
/x/y/database2/00000004.db
/x/y/database3/00000004.db
Note that database1, database2, and database3 are three Sleipnir::CDatabase's generated for different datasets. All three entries use the same ID, 00000004, which ensures that the DB files cover the same genes.
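A mismatched ID in db_list.txt means the listed DB files do not cover the same genes. The Python sketch below catches this by checking that every entry ends in the same file name. The function name `check_db_list` is hypothetical. The sketch assumes POSIX-style paths like the sample lines above.

```python
from pathlib import PurePosixPath


def check_db_list(lines):
    """Return the DB file name shared by every entry of a db_list.

    Raises ValueError if the list is empty or mixes IDs, because DB files
    with different IDs do not cover the same genes.
    """
    paths = [line.strip() for line in lines if line.strip()]
    if not paths:
        raise ValueError("db_list is empty")
    names = {PurePosixPath(p).name for p in paths}
    if len(names) != 1:
        raise ValueError(f"mixed DB IDs in db_list: {sorted(names)}")
    return names.pop()
```

For the three sample lines above, the function returns `00000004.db`.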
package "DBCombiner"
version "1.0"
purpose "Combines a list of DB files with the same gene content"
section "Mode"
option "combine" C "Combine a set of DB's, each coming from a different dataset subset"
flag off
option "reorganize" R "Reorganize a set of DB's, such as from 21000 DB files to 1000 DB files, ie expanding/shrinking the number of genes a DB contains"
flag off
section "Main"
option "input" i "Input gene mapping"
string typestr="filename" yes
section "Combine Mode"
option "db" x "Input a set of databaselet filenames (including path)"
string typestr="filename"
option "dir_out" D "Output database directory"
string typestr="directory" default="."
option "is_nibble" N "Whether the input DB is nibble type"
flag off
option "split" s "Split to one-gene per file"
flag off
section "Reorganize Mode"
option "dataset" A "Dataset-platform mapping file"
string typestr="filename"
option "db_dir" d "Source DB collection directory"
string typestr="directory"
option "src_db_num" n "Source DB number of files"
int
option "dest_db_num" b "Destination DB number of files"
int
option "dest_db_dir" B "Destination DB directory"
string typestr="directory"
Flag | Default | Type | Description |
---|---|---|---|
-i | None | Text file | Tab-delimited text file containing two columns, numerical gene IDs (one-based) and unique gene names (matching those in the input DAT/DAB files). |
-d | None | Directory | Input directory containing *.db files |
-D | . | Directory | Output directory in which database files will be stored. |
-x | None | Text file | Input file containing a list of Sleipnir::CDatabaselet's to combine |
-s | off | Flag | If enabled, split the combined Sleipnir::CDatabaselet to one gene per DB file |