Center for Plant Molecular Biology

aGBSQL -- this module reads GenBank files from disk and extracts upstream regulatory sequences. These sequences are stored internally as FastaRecord objects (see fasta) and can be saved to file in FASTA format.

Class gbSQL(_files = [], _start = -1000, _end = 0, _root = 0, _mRNAannot = 0, _overlap = 1)

    gb_files
      this will hold the list of GenBank files to read
    extracted
      this will hold the list of extracted sequences as FastaRecord objects
    start
      start position (5') of the extraction window, relative to the root position (see root)
    end
      end position (3') of the extraction window, relative to the root position (see root)
    overlap
      whether the extracted region may overlap upstream genes (1 = yes, 0 = no)
    mRNAannot
      selects which annotation determines where the 5' end of the transcript is: 0 = gene annotation (present in many more files), 1 = mRNA annotation (not available for all files). The default is 0 (gene).
    root
      the reference point from which the upstream sequence is measured:
      0 = ATG only, 1 = TSS only, 2 = TSS if available, otherwise the ATG
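The start, end, overlap and root attributes together define a window relative to an anchor. The sketch below shows one way to turn them into sequence coordinates. The helper names (choose_anchor, upstream_slice, extract_upstream), the 0-based half-open coordinates, the reading of end = 0 as "stop just before the anchor", and the boundary argument standing in for overlap = 0 are all assumptions for illustration, not necessarily how gbSQL computes its windows.

```python
# Hypothetical helpers; coordinate conventions are assumptions, not gbSQL's own.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def choose_anchor(atg, tss, root):
    """Mirror the root attribute: 0 = ATG only, 1 = TSS only,
    2 = TSS if available, otherwise ATG. Returns None if unavailable."""
    if root == 0:
        return atg
    if root == 1:
        return tss
    if root == 2:
        return tss if tss is not None else atg
    raise ValueError(f"root must be 0, 1 or 2, got {root!r}")


def upstream_slice(anchor, strand, start=-1000, end=0, seq_len=None, boundary=None):
    """Return the window as a 0-based, half-open (lo, hi) forward-strand slice.

    anchor   -- forward-strand index of the first base of the ATG/TSS
                (on the minus strand this is the rightmost base of the gene)
    strand   -- 1 (plus) or -1 (minus)
    boundary -- assumed stand-in for overlap = 0: the edge of the nearest
                upstream gene (its exclusive end on the plus strand,
                its start on the minus strand); None allows overlap
    """
    if strand == 1:
        lo, hi = anchor + start, anchor + end
        if boundary is not None:
            lo = max(lo, boundary)
    else:
        # upstream of a minus-strand gene lies to the right on the forward strand
        lo, hi = anchor - end + 1, anchor - start + 1
        if boundary is not None:
            hi = min(hi, boundary)
    lo = max(lo, 0)  # clamp at the start of the sequence
    if seq_len is not None:
        hi = min(hi, seq_len)  # clamp at the end of the sequence
    return lo, hi


def extract_upstream(genome, anchor, strand, start=-1000, end=0, boundary=None):
    """Cut the window and return it 5'->3' relative to the gene."""
    lo, hi = upstream_slice(anchor, strand, start, end, len(genome), boundary)
    region = genome[lo:hi]
    return region if strand == 1 else revcomp(region)
```

For a minus-strand gene the region is reverse-complemented so that every extracted promoter reads 5' to 3' in the gene's own orientation, which is what downstream motif searches usually expect.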

    checkvalues(self)

      this is called before running to check that the parameter values are appropriate. If there is an error, an error message is printed and the script terminates.
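A validation step of this kind might look like the sketch below. The specific rules (start must be below end, and root, mRNAannot and overlap restricted to the values listed for the attributes) are assumptions drawn from the attribute descriptions; the function names are hypothetical.

```python
import sys


def check_values(start, end, root, mrna_annot, overlap):
    """Return a list of problems; an empty list means the settings look usable.
    These rules are assumed from the attribute docs, not copied from gbSQL."""
    problems = []
    if start >= end:
        problems.append(f"start ({start}) must be less than end ({end})")
    if root not in (0, 1, 2):
        problems.append(f"root must be 0, 1 or 2, got {root!r}")
    if mrna_annot not in (0, 1):
        problems.append(f"mRNAannot must be 0 or 1, got {mrna_annot!r}")
    if overlap not in (0, 1):
        problems.append(f"overlap must be 0 or 1, got {overlap!r}")
    return problems


def checkvalues_or_exit(**settings):
    """Print every problem and terminate, as checkvalues() is described to do."""
    problems = check_values(**settings)
    if problems:
        # sys.exit with a string prints it to stderr and exits with status 1
        sys.exit("\n".join(problems))
```

Collecting all problems before exiting lets the user fix every bad setting in one pass instead of one per run.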

    clear(self)

      clears data in self.extracted and self.dict

    getsubset(locustags = [])

      this will return the subset of genes whose FastaRecord.shortname matches one of the given locus tags
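Filtering the extracted records by name can be sketched as follows. The FastaRecord dataclass here is a hypothetical stand-in with only the two fields the example needs; the real class lives in the fasta module and may differ. Exact, case-sensitive matching is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class FastaRecord:
    """Hypothetical stand-in for fasta.FastaRecord."""
    shortname: str
    sequence: str


def get_subset(records: Iterable[FastaRecord], locustags: Iterable[str]) -> List[FastaRecord]:
    """Keep records whose shortname exactly matches a requested locus tag."""
    wanted = set(locustags)  # set membership keeps the scan linear in len(records)
    return [rec for rec in records if rec.shortname in wanted]
```

The result keeps the original record order, so a subset saved to file lines up with the full extraction.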

    savefastatofile(_filehandle)

      this will save the extracted FASTA records to a text file by calling Fasta.write_to_file()
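The FASTA output format itself is simple: a ">" header line per record followed by the sequence, conventionally wrapped at a fixed width. The sketch below writes that format to any text handle. write_fasta, the FastaRecord stand-in and the 60-column width are assumptions; the signature of Fasta.write_to_file() is not documented here.

```python
import io
from dataclasses import dataclass
from typing import Iterable, TextIO


@dataclass
class FastaRecord:
    """Hypothetical stand-in for fasta.FastaRecord."""
    shortname: str
    sequence: str


def write_fasta(records: Iterable[FastaRecord], handle: TextIO, width: int = 60) -> None:
    """Write each record as a '>name' header plus sequence lines of at most `width`."""
    for rec in records:
        handle.write(f">{rec.shortname}\n")
        for i in range(0, len(rec.sequence), width):
            handle.write(rec.sequence[i:i + width] + "\n")


# Usage: an in-memory buffer stands in for a file opened with open(path, "w")
buf = io.StringIO()
write_fasta([FastaRecord("At1g01010", "ACGT" * 20)], buf)
print(buf.getvalue(), end="")
```

Taking an open handle rather than a path matches the _filehandle argument of savefastatofile and lets the caller decide where the output goes.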

    savedicttofile(_filehandle)

      this will save the entries of self.dict to a text file.
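The layout savedicttofile uses is not described. As an illustration only, the sketch below assumes one tab-separated "key, value" line per entry, sorted by key so that repeated runs produce identical files; save_dict is a hypothetical name.

```python
import io
from typing import Dict, TextIO


def save_dict(entries: Dict[str, str], handle: TextIO) -> None:
    """Write one 'key<TAB>value' line per entry, sorted by key (assumed format)."""
    for key in sorted(entries):
        handle.write(f"{key}\t{entries[key]}\n")


buf = io.StringIO()
save_dict({"At1g01020": "GGCC", "At1g01010": "ACGT"}, buf)
```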

    getsubset(_locustags = [])

      this will return a subset of genes drawn from the dictionary (self.dict) only
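Restricting the dictionary to a list of locus tags reduces to a dictionary comprehension. In the sketch below, get_dict_subset is a hypothetical name, and silently skipping tags that are not present is an assumed behaviour.

```python
from typing import Dict, Iterable


def get_dict_subset(entries: Dict[str, str], locustags: Iterable[str]) -> Dict[str, str]:
    """Return only the requested tags that exist in `entries`; unknown tags are skipped."""
    return {tag: entries[tag] for tag in locustags if tag in entries}
```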

    getpromoters_gene(self)

      this is the main function for extracting promoters; use it when only the gene tag is present in the GenBank file. Global variables are used for the toggle parameters.

    getpromoters_locustag(self)

      this is the main function for extracting promoters; use it when the locus tag is present in the GenBank file. Global variables are used for the toggle parameters.
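The two entry points differ mainly in which qualifier names each promoter. The sketch below folds both into one hypothetical function: key="gene" mimics getpromoters_gene and key="locus_tag" mimics getpromoters_locustag. It assumes features arrive as plain dicts with 0-based start/end, a strand of 1 or -1, and list-valued qualifiers. It always anchors on the CDS start (root = 0) and allows overlap; none of this is taken from gbSQL's code.

```python
from typing import List, Tuple

_COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def get_promoters(genome: str, features: List[dict], key: str = "locus_tag",
                  start: int = -1000, end: int = 0,
                  feature_type: str = "CDS") -> List[Tuple[str, str]]:
    """Hypothetical driver returning (name, upstream sequence) pairs.

    key="gene" names results by the gene qualifier, key="locus_tag" by the
    locus tag. The CDS start is taken as the ATG (root = 0 in gbSQL terms).
    """
    results = []
    for feat in features:
        if feat.get("type") != feature_type:
            continue
        names = feat["qualifiers"].get(key)
        if not names:
            continue  # this feature lacks the identifier this variant relies on
        if feat["strand"] == 1:
            anchor = feat["start"]  # first base of the ATG
            lo, hi = anchor + start, anchor + end
        else:
            anchor = feat["end"] - 1  # minus strand: ATG sits at the right edge
            lo, hi = anchor - end + 1, anchor - start + 1
        region = genome[max(lo, 0):min(hi, len(genome))]
        if feat["strand"] == -1:
            region = region.translate(_COMP)[::-1]  # report 5'->3' for the gene
        results.append((names[0], region))
    return results
```

If the GenBank files are parsed with Biopython's SeqIO.parse(handle, "genbank"), each SeqFeature exposes .type, .location and a .qualifiers dict whose values are lists, which is why the sketch takes names[0].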