Run information
bio can automatically query SRA for additional information about your data.
Not all GenBank records are properly cross-referenced, but for those that are, bio
can retrieve the SRA information right away. Here is how it works. Get a strain of Ebola sequenced in 2014:
bio fetch KM233118 --rename ebola
First check whether the record is properly cross-referenced:
bio runinfo ebola
ebola BioProject PRJNA257197
ebola BioSample SAMN02952049
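For scripting, an accession can be pulled out of this listing with standard tools. The following is a minimal sketch, assuming the output consists of whitespace-separated columns exactly as printed above; the variable names `xrefs` and `project` are illustrative only and are not part of bio.

```shell
# Output copied from the listing above; in practice, pipe
# `bio runinfo ebola` straight into awk instead of using a variable.
xrefs='ebola BioProject PRJNA257197
ebola BioSample SAMN02952049'

# Select the row whose second column is BioProject and print its accession.
project=$(printf '%s\n' "$xrefs" | awk '$2 == "BioProject" { print $3 }')
echo "$project"
```

If the record is not cross-referenced, the variable would simply end up empty under this approach, which a script can test for before querying further.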
We can see that the data has both a BioProject and a BioSample associated with it. This means we can obtain more detailed information on the sequencing data behind this record:
bio runinfo ebola --sample | head -38
[
{
"Run": "SRR1553609",
"ReleaseDate": "2014-08-19 11:41:53",
"LoadDate": "2014-08-19 11:18:49",
"spots": "464802",
"bases": "93890004",
"spots_with_mates": "464802",
"avgLength": "202",
"size_MB": "51",
"download_path": "https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos1/sra-pub-run-5/SRR1553609/SRR1553609.1",
"Experiment": "SRX674271",
"LibraryName": "NM042.3.FCH9",
"LibraryStrategy": "RNA-Seq",
"LibrarySelection": "cDNA",
"LibrarySource": "TRANSCRIPTOMIC",
"LibraryLayout": "PAIRED",
"InsertSize": "0",
"InsertDev": "0",
"Platform": "ILLUMINA",
"Model": "Illumina HiSeq 2500",
"SRAStudy": "SRP045416",
"BioProject": "PRJNA257197",
"Study_Pubmed_id": "2",
"ProjectID": "257197",
"Sample": "SRS677968",
"BioSample": "SAMN02952049",
"SampleType": "simple",
"TaxID": "186538",
"ScientificName": "Zaire ebolavirus",
"SampleName": "NM042.3",
"Tumor": "no",
"CenterName": "BI",
"Submission": "SRA178666",
"Consent": "public",
"RunHash": "9D6BFED60C2E1DAB6CC06BE718DDA1C0",
"ReadHash": "B783CD0B858C0BED5FF3BC7319CAFF19"
},
We can also obtain the full run information for the entire project (we are limiting the results to make the query speedier):
bio runinfo ebola --project --limit 10 | head -10
[
{
"Run": "SRR1972967",
"ReleaseDate": "2015-04-14 13:52:35",
"LoadDate": "2015-04-14 13:47:37",
"spots": "1013114",
"bases": "204649028",
"spots_with_mates": "1013114",
"avgLength": "202",
"size_MB": "114",
You can also produce the output in a tab-delimited format:
bio runinfo ebola --project --table --limit 10 | cut -f 1,5,8,12,15,19,29 | head
Run         bases       size_MB  LibraryStrategy  LibraryLayout  Model                SampleName
SRR1972969  123767824   66       RNA-Seq          PAIRED         Illumina HiSeq 2500  G6089.1
SRR1972970  103654482   56       RNA-Seq          PAIRED         Illumina HiSeq 2500  G6091.1
SRR1972967  204649028   114      RNA-Seq          PAIRED         Illumina HiSeq 2500  G6062.1
SRR1972972  92415808    52       RNA-Seq          PAIRED         Illumina HiSeq 2500  G6103.1
SRR1972971  102959602   59       RNA-Seq          PAIRED         Illumina HiSeq 2500  G6095.1
SRR1972968  168524560   94       RNA-Seq          PAIRED         Illumina HiSeq 2500  G6069.1
SRR1972973  82751118    45       RNA-Seq          PAIRED         Illumina HiSeq 2500  G6104.1
SRR1972976  1685747974  997      RNA-Seq          PAIRED         Illumina HiSeq 2500  W220.0
SRR1972975  546366570   292      RNA-Seq          PAIRED         Illumina HiSeq 2500  W219.0
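The tab-delimited form is convenient for filtering with awk. The sketch below is one possible way to list the runs below a size threshold before downloading anything. It rebuilds three rows of the table above in a temporary file so that it runs on its own; the 100 MB cutoff and the variable names `runs` and `small_runs` are arbitrary choices made for illustration.

```shell
# Three rows copied from the table above, stored as tab-separated text.
# In practice the file would hold the output of the bio runinfo | cut pipeline shown earlier.
runs=$(mktemp)
printf 'Run\tbases\tsize_MB\tLibraryStrategy\tLibraryLayout\tModel\tSampleName\n' > "$runs"
printf 'SRR1972969\t123767824\t66\tRNA-Seq\tPAIRED\tIllumina HiSeq 2500\tG6089.1\n' >> "$runs"
printf 'SRR1972973\t82751118\t45\tRNA-Seq\tPAIRED\tIllumina HiSeq 2500\tG6104.1\n' >> "$runs"
printf 'SRR1972976\t1685747974\t997\tRNA-Seq\tPAIRED\tIllumina HiSeq 2500\tW220.0\n' >> "$runs"

# Skip the header line, then keep the run accession (column 1)
# wherever size_MB (column 3) is below 100.
small_runs=$(awk -F '\t' 'NR > 1 && $3 < 100 { print $1 }' "$runs")
echo "$small_runs"

rm -f "$runs"
```

Splitting on tabs with `-F '\t'` matters here because the Model column contains spaces, so the default whitespace splitting of awk would shift the later columns.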