Run information

bio can automatically query the Sequence Read Archive (SRA) for additional information about your data.

Not all GenBank records are properly cross-referenced, but for those that are, bio can get you the SRA information right away. Here is how it works. Get a strain of Ebola sequenced in 2014:

bio fetch KM233118 --rename ebola 

First, check whether the record is properly cross-referenced:

bio runinfo ebola
ebola   BioProject  PRJNA257197
ebola   BioSample   SAMN02952049

We can see that the data has both a BioProject and a BioSample associated with it. This means we can obtain more detailed information on the sequencing data that produced the record:

bio runinfo ebola --sample | head -38
[
    {
        "Run": "SRR1553609",
        "ReleaseDate": "2014-08-19 11:41:53",
        "LoadDate": "2014-08-19 11:18:49",
        "spots": "464802",
        "bases": "93890004",
        "spots_with_mates": "464802",
        "avgLength": "202",
        "size_MB": "51",
        "download_path": "https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos1/sra-pub-run-5/SRR1553609/SRR1553609.1",
        "Experiment": "SRX674271",
        "LibraryName": "NM042.3.FCH9",
        "LibraryStrategy": "RNA-Seq",
        "LibrarySelection": "cDNA",
        "LibrarySource": "TRANSCRIPTOMIC",
        "LibraryLayout": "PAIRED",
        "InsertSize": "0",
        "InsertDev": "0",
        "Platform": "ILLUMINA",
        "Model": "Illumina HiSeq 2500",
        "SRAStudy": "SRP045416",
        "BioProject": "PRJNA257197",
        "Study_Pubmed_id": "2",
        "ProjectID": "257197",
        "Sample": "SRS677968",
        "BioSample": "SAMN02952049",
        "SampleType": "simple",
        "TaxID": "186538",
        "ScientificName": "Zaire ebolavirus",
        "SampleName": "NM042.3",
        "Tumor": "no",
        "CenterName": "BI",
        "Submission": "SRA178666",
        "Consent": "public",
        "RunHash": "9D6BFED60C2E1DAB6CC06BE718DDA1C0",
        "ReadHash": "B783CD0B858C0BED5FF3BC7319CAFF19"
    },
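Because the JSON is printed one field per line, the run accessions can be pulled out with standard Unix tools. The following is a minimal sketch that assumes the layout shown above. The file name sample.json is hypothetical and holds a shortened excerpt of the output; in practice you would save the output of bio runinfo ebola --sample to a file, or pipe it straight into grep.

```shell
# Hypothetical excerpt of the output above, saved to a file for illustration.
cat > sample.json <<'EOF'
[
    {
        "Run": "SRR1553609",
        "spots": "464802",
        "Experiment": "SRX674271"
    }
]
EOF

# Keep the lines holding the "Run" key, then take the 4th field
# when splitting on double quotes (the value of the key).
runs=$(grep '"Run":' sample.json | cut -d '"' -f 4)
echo "$runs"
```

For anything beyond a quick lookup, a real JSON parser is the more robust choice, since this approach relies on each key sitting on its own line.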

We can also obtain the full run information for the entire project (here we limit the results to speed up the query):

bio runinfo ebola --project --limit 10 | head -10
[
    {
        "Run": "SRR1972967",
        "ReleaseDate": "2015-04-14 13:52:35",
        "LoadDate": "2015-04-14 13:47:37",
        "spots": "1013114",
        "bases": "204649028",
        "spots_with_mates": "1013114",
        "avgLength": "202",
        "size_MB": "114",

You can also produce the output in a tab-delimited format:

bio runinfo ebola --project --table --limit 10 | cut -f 1,5,8,12,15,19,29 | head
Run bases   size_MB LibraryStrategy LibraryLayout   Model   SampleName
SRR1972969  123767824   66  RNA-Seq PAIRED  Illumina HiSeq 2500 G6089.1
SRR1972970  103654482   56  RNA-Seq PAIRED  Illumina HiSeq 2500 G6091.1
SRR1972967  204649028   114 RNA-Seq PAIRED  Illumina HiSeq 2500 G6062.1
SRR1972972  92415808    52  RNA-Seq PAIRED  Illumina HiSeq 2500 G6103.1
SRR1972971  102959602   59  RNA-Seq PAIRED  Illumina HiSeq 2500 G6095.1
SRR1972968  168524560   94  RNA-Seq PAIRED  Illumina HiSeq 2500 G6069.1
SRR1972973  82751118    45  RNA-Seq PAIRED  Illumina HiSeq 2500 G6104.1
SRR1972976  1685747974  997 RNA-Seq PAIRED  Illumina HiSeq 2500 W220.0
SRR1972975  546366570   292 RNA-Seq PAIRED  Illumina HiSeq 2500 W219.0
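
The tab-delimited output works well with awk. As a rough sketch, the following adds up the size_MB column to estimate the total download size. The file name runs.tsv and its three rows are a hypothetical excerpt copied from the table above; with real data you would redirect the output of the bio runinfo command into the file instead.

```shell
# Hypothetical excerpt of the table above; printf emits real tab characters.
printf 'Run\tbases\tsize_MB\n'          > runs.tsv
printf 'SRR1972969\t123767824\t66\n'   >> runs.tsv
printf 'SRR1972970\t103654482\t56\n'   >> runs.tsv
printf 'SRR1972967\t204649028\t114\n'  >> runs.tsv

# Find the size_MB column by name in the header row,
# then sum that column over the remaining rows.
total=$(awk -F '\t' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "size_MB") col = i; next }
    { total += $col }
    END { print total }
' runs.tsv)
echo "Total size: ${total} MB"
```

Looking the column up by name rather than by position keeps the command working even if a different set of columns is selected with cut.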