GBSQL -- this module reads GenBank files from disk and extracts upstream regulatory sequences. The sequences are stored internally as FastaRecord objects (see the fasta module, fasta.html) and can be saved to a file in FASTA format.
Class gbSQL(_files = [], _start = -1000, _end = 0, _root = 0, _mRNAannot = 0, _overlap = 1)
- gb_files: holds the list of GenBank files (from _files)
- holds the list of extracted FASTA sequences as FastaRecord objects
- _start: start position (5') of the extracted region, relative to the root; default -1000
- _end: end position (3') of the extracted region, relative to the root; default 0
- _overlap: whether the region may overlap an upstream gene (1 = yes, 0 = no); default 1
- _mRNAannot: which annotation defines the 5' end of the transcript: 0 = gene annotation (present in many more files), 1 = mRNA annotation (not available in all files); default 0 (gene)
- _root: the reference point from which upstream sequence is taken: 0 = ATG only, 1 = TSS only, 2 = TSS if available, otherwise the ATG; default 0
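The coordinate logic implied by _start, _end and _root can be sketched as follows. This is a minimal illustration, not the module's own code: it assumes 0-based genome indices, that the root is the index of the first base of the ATG or TSS on the gene's own strand, and that the window covers offsets _start up to (but not including) _end, so the defaults give the 1000 bases immediately upstream. The names choose_root and upstream_window are hypothetical.

```python
from typing import Optional

# Complement table for DNA (upper and lower case; N maps to itself)
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def choose_root(atg: int, tss: Optional[int], root_mode: int) -> int:
    """Pick the reference position using the _root codes (0, 1, 2)."""
    if root_mode == 0:  # ATG only
        return atg
    if root_mode == 1:  # TSS only
        if tss is None:
            raise ValueError("no TSS annotated for this gene")
        return tss
    if root_mode == 2:  # TSS if possible, otherwise ATG
        return tss if tss is not None else atg
    raise ValueError(f"invalid root mode: {root_mode}")


def upstream_window(genome: str, root: int, strand: int,
                    start: int = -1000, end: int = 0) -> str:
    """Return offsets start..end-1 relative to root, read 5'->3' on the gene's strand.

    Assumption: root is the 0-based forward-strand index of the gene's first
    base on its own strand (the highest index for minus-strand genes).
    """
    if strand == 1:
        lo, hi = root + start, root + end
    else:
        # Upstream of a minus-strand gene lies at higher forward coordinates
        lo, hi = root - end + 1, root - start + 1
    lo, hi = max(lo, 0), min(hi, len(genome))  # clip at the sequence ends
    region = genome[lo:hi] if lo < hi else ""
    return region if strand == 1 else reverse_complement(region)
```

Handling the minus strand by slicing the forward sequence and reverse-complementing keeps a single code path for both strands.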
checkvalues(self)
- checks that the parameter values are appropriate before running. If a value is invalid, an error message is reported and the script terminates.
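A sketch of the kind of checks checkvalues might perform is shown below. The specific rules are assumptions inferred from the parameter descriptions, and find_errors and check_values_or_exit are hypothetical names; errors are collected first so that all of them can be reported before the script exits.

```python
import sys
from typing import List, Sequence


def find_errors(files: Sequence[str], start: int, end: int,
                root: int, mrna_annot: int, overlap: int) -> List[str]:
    """Return a list of problems with the parameters (empty if all are valid)."""
    errors = []
    if not files:
        errors.append("no GenBank files were given")
    if start >= end:
        errors.append(f"start ({start}) must be less than end ({end})")
    if root not in (0, 1, 2):
        errors.append(f"root must be 0, 1 or 2, not {root}")
    if mrna_annot not in (0, 1):
        errors.append(f"mRNAannot must be 0 or 1, not {mrna_annot}")
    if overlap not in (0, 1):
        errors.append(f"overlap must be 0 or 1, not {overlap}")
    return errors


def check_values_or_exit(*args) -> None:
    """Print every error to stderr and terminate, as checkvalues is described."""
    errors = find_errors(*args)
    if errors:
        for message in errors:
            print("error:", message, file=sys.stderr)
        sys.exit(1)
```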
clear(self)
- clears data in self.extracted and self.dict
getsubset(self, locustags = [])
- returns the subset of genes whose FastaRecord.shortname matches one of the given locus tags
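The shortname filter can be sketched like this. Record is a hypothetical stand-in for FastaRecord that keeps only the fields needed here, and exact, case-sensitive matching is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Record:
    """Hypothetical stand-in for FastaRecord."""
    shortname: str
    sequence: str


def get_subset(records: Iterable[Record], locustags: Iterable[str]) -> List[Record]:
    """Return the records whose shortname is one of locustags, in original order."""
    wanted = set(locustags)  # set lookup keeps this linear in the number of records
    return [rec for rec in records if rec.shortname in wanted]
```

Converting the tags to a set avoids rescanning the tag list for every record, which matters when filtering a whole genome's worth of sequences.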
savefastatofile(self, _filehandle)
- saves the list of FASTA records to a text file by calling Fasta.write_to_file()
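For reference, FASTA output of the kind savefastatofile produces can be written as below. This sketch does not use Fasta.write_to_file(); it takes plain (name, sequence) pairs, and the 60-character default line width is an assumption rather than something the module documents.

```python
import io
from typing import Iterable, TextIO, Tuple


def write_fasta(records: Iterable[Tuple[str, str]], handle: TextIO,
                width: int = 60) -> None:
    """Write (name, sequence) pairs to an open text handle in FASTA format."""
    for name, seq in records:
        handle.write(f">{name}\n")  # header line
        for i in range(0, len(seq), width):  # wrap the sequence
            handle.write(seq[i:i + width] + "\n")


buf = io.StringIO()
write_fasta([("b0001", "ACGTACGTAC")], buf, width=4)
print(buf.getvalue(), end="")
# prints:
# >b0001
# ACGT
# ACGT
# AC
```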
savedicttofile(self, _filehandle)
- saves the entries of self.dict to a text file.
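What self.dict holds is not specified here. Assuming it maps a gene name to its sequence, one plain-text layout is one tab-separated entry per line, sorted by key so the output is reproducible. save_dict, load_dict and the layout itself are hypothetical.

```python
import io
from typing import Mapping, TextIO


def save_dict(entries: Mapping[str, str], handle: TextIO) -> None:
    """Write each key/value pair as 'key<TAB>value', sorted by key."""
    for key in sorted(entries):
        handle.write(f"{key}\t{entries[key]}\n")


def load_dict(handle: TextIO) -> dict:
    """Read the format written by save_dict back into a dict."""
    result = {}
    for line in handle:
        line = line.rstrip("\n")
        if line:
            key, value = line.split("\t", 1)
            result[key] = value
    return result


buf = io.StringIO()
save_dict({"b0002": "GGCC", "b0001": "ACGT"}, buf)
buf.seek(0)
print(load_dict(buf))  # {'b0001': 'ACGT', 'b0002': 'GGCC'}
```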
getsubset(self, _locustags = [])
- returns a subset of genes taken from the dictionary object (self.dict) only
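Restricted to the dictionary, the lookup can be sketched as a loop that also reports tags with no entry. Returning the missing tags is an addition for illustration (the method is only described as returning the subset), and get_dict_subset is a hypothetical name.

```python
from typing import Dict, Iterable, List, Tuple


def get_dict_subset(entries: Dict[str, str],
                    locustags: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split locustags into found entries and tags absent from entries."""
    found: Dict[str, str] = {}
    missing: List[str] = []
    for tag in locustags:
        if tag in entries:
            found[tag] = entries[tag]
        else:
            missing.append(tag)
    return found, missing
```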
getpromoters_gene(self)
- the main method for extracting promoters; use it when only the gene tag is present in the GenBank file. The toggle parameters are read from global variables.
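When _overlap is 0, the extracted region has to stop at the nearest neighbouring gene. One way to do that, sketched here under the assumption of 0-based half-open (lo, hi) intervals on the forward strand, is to keep only the part of the window between the gene and the closest overlapping neighbour. trim_window is a hypothetical name, and the caller is assumed to leave the gene's own interval out of the neighbours.

```python
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (lo, hi) on the forward strand


def trim_window(window: Interval, strand: int,
                neighbours: Iterable[Interval]) -> Interval:
    """Shrink an upstream window so it does not overlap any neighbouring gene.

    For a plus-strand gene the window ends at the gene (hi side), so the lo
    end is moved up; for a minus-strand gene the window starts at the gene
    (lo side), so the hi end is moved down. A fully covered window becomes empty.
    """
    lo, hi = window
    hits = [(s, e) for s, e in neighbours if s < hi and e > lo]  # overlapping genes
    if not hits:
        return lo, hi
    if strand == 1:
        lo = min(max(e for _, e in hits), hi)  # stop at the closest upstream end
    else:
        hi = max(min(s for s, _ in hits), lo)  # stop at the closest upstream start
    return lo, hi
```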
getpromoters_locustag(self)
- the main method for extracting promoters; use it when the locus tag is present in the GenBank file. The toggle parameters are read from global variables.
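The difference between the two entry points comes down to which qualifier names each extracted record. The sketch assumes qualifiers are stored as a dict of lists, which is how Biopython's SeqFeature.qualifiers represents them; record_name and the fallback from one qualifier to the other are illustrative assumptions.

```python
from typing import Dict, List, Optional


def record_name(qualifiers: Dict[str, List[str]],
                prefer: str = "locus_tag") -> Optional[str]:
    """Return the first value of the preferred qualifier, else of the other one.

    prefer="locus_tag" loosely corresponds to getpromoters_locustag and
    prefer="gene" to getpromoters_gene (an assumed mapping).
    """
    order = [prefer] + [k for k in ("locus_tag", "gene") if k != prefer]
    for key in order:
        values = qualifiers.get(key)
        if values:  # skip missing or empty qualifier lists
            return values[0]
    return None
```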