mapping -- this module contains a class, MapFasta, that creates all the instances needed to map CREs (cis-regulatory elements) onto sequences in FASTA format or onto MotifMapper FastaRecord objects (see fasta.html).
class MapFasta(_files = [], _outputMode = 0, _fileScope = 0, _smartmotif = 0)
MapFasta makes instances of all the other classes in the Motif Mapper package. If you only want to use Motif Mapper to map CREs across a list of FASTA files, this is the only class you need.
- Instance Variables
    self.files = _files            # holds file paths
    self.motifsRX = {}             # holds motifs to be mapped
    self.outputMode = _outputMode  # holds output type
    self.fileScope = _fileScope    # holds the mapping scope: 1 = per sequence only, 2 = whole file only, 0 = both
    self.hits_local = {}           # holds hits for each sequence, SEQ:->(len, MOTIFs:->HITS)
    self.hits_global = {}          # holds hits for the whole file, MOTIF:->HITS
    self.M = motifs.Motifs()       # Motifs class instance
    self.F_loader = ''             # reserved for a fasta.FASTA_dict() instance
    self.Fiter = ''                # reserved for a fasta.Iterator() instance
    self.Fld = folders.Folders()   # Folders class instance
    self.Fdict = {}                # holds a dictionary of FastaRecords; use with mapObjs only!
    self.localhits_dict = {}       # holds match tuples as a dictionary; use with mapObjs only!
    self.HandleMTBS = None         # holds the open handle for writing MTBS output
    self.HandleMMTB = None         # holds the open handle for writing MMTB output
    self.fileAppend = ''           # holds a string appended to the analysis output of a map() call
    self.pointMapData = {}         # holds whatever self.pointMaps() produces
    self.pointCurveData = {}       # holds whatever self.pointCurves() produces
outputMode
- 0 = in memory only (results kept in self.hits_local and self.hits_global); no file handles are opened
1 = written directly to disk (MTBS output only)
2 = processed entirely in memory, then dumped to disk when finished
3 = processed in memory (as in outputMode 0); file handles are opened but nothing is saved to disk
fileScope
- 0 = return both per-sequence and whole-file results
1 = per-sequence results only
2 = whole-file results only; should be called for each file
Fdict
- use only if you want to map sequences held in memory: load the sequences with addSEQtoFdict and call mapObjs(). You may save the results to a file or keep them in memory, in which case they are saved in localhits_tuple. If you want to use a list of FastaRecords, build the list first and convert it with fasta.convert_to_dict.
hits_local
- contains the TUPLE of SEQ:->MOTIF:tuple[HITS, matches] (MTBS)
hits_global
- contains the dictionary MOTIFS:->HITS (MMTB)
fileAppend
- holds a string that is appended to the output name of a map() call
loadFiles(man=0, dir='')
- calls the file-path reading function of the Folders class. man = 0 loads all files from a folder and requires a directory path (dir); man = 1 lets you enter the files by hand.
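As an illustration of what man = 0 involves, here is a minimal sketch that collects the file paths in one folder. `list_folder_files` is a hypothetical helper, not part of the Folders class, and since the documentation does not say whether Folders filters by extension, this sketch applies no filter and only skips subdirectories.

```python
from pathlib import Path


def list_folder_files(directory):
    """Hypothetical stand-in for loadFiles(man=0, dir=...).

    Returns every regular file in `directory`, sorted by name.
    Subdirectories are skipped; no extension filter is applied.
    """
    folder = Path(directory)
    if not folder.is_dir():
        raise NotADirectoryError(directory)
    return sorted(str(p) for p in folder.iterdir() if p.is_file())
```

Sorting makes the mapping order reproducible across runs, which helps when comparing per-file outputs.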
smartmotif
- toggles smart-motif mode for mapFiles(); when on, the antisense motif and all combinations of composite motifs are generated automatically
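The antisense half of smart-motif mode amounts to taking the reverse complement of each motif. The sketch below assumes motifs are written in IUPAC nucleotide codes; `antisense` is a hypothetical name, and how MapFasta builds the combinations for composite motifs is not described here, so that part is not shown.

```python
# IUPAC nucleotide complements: A<->T, C<->G, R<->Y, K<->M, B<->V, D<->H;
# S, W and N are their own complements.
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def antisense(motif):
    """Return the reverse complement of an IUPAC DNA motif (upper case)."""
    return motif.upper().translate(_COMPLEMENT)[::-1]
```

For example, `antisense("TGACGT")` gives `"ACGTCA"`, while a palindromic site such as `"CACGTG"` maps onto itself.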
globalallfiles
- if 0, hits_global accumulates per file; otherwise it accumulates over all files and all sequences
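To make the globalallfiles switch concrete, here is a sketch of the two accumulation modes over plain `{motif: hit_count}` dictionaries, one per file. The function name and input layout are hypothetical; the real class stores its totals in self.hits_global instead of returning them.

```python
from collections import Counter


def global_hits(per_file_hits, all_files):
    """Hypothetical sketch of the globalallfiles switch.

    per_file_hits: list of {motif: hit_count} dicts, one per FASTA file.
    all_files=False (globalallfiles == 0): one separate total per file.
    all_files=True: a single running total over every file.
    """
    if not all_files:
        return [Counter(hits) for hits in per_file_hits]
    total = Counter()
    for hits in per_file_hits:
        total.update(hits)  # Counter.update adds counts instead of replacing them
    return total
```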
checkMODES()
- checks that the user has not set these variables to invalid values.
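A check of this kind can be sketched against the values listed above for outputMode (0 to 3) and fileScope (0 to 2). `check_modes` is a hypothetical function, and raising ValueError is an assumption; the real checkMODES() may instead reset or report invalid values.

```python
VALID_OUTPUT_MODES = {0, 1, 2, 3}  # as documented under outputMode
VALID_FILE_SCOPES = {0, 1, 2}      # as documented under fileScope


def check_modes(output_mode, file_scope):
    """Raise ValueError if either mode is outside its documented range."""
    if output_mode not in VALID_OUTPUT_MODES:
        raise ValueError(
            f"outputMode must be one of {sorted(VALID_OUTPUT_MODES)}, got {output_mode!r}")
    if file_scope not in VALID_FILE_SCOPES:
        raise ValueError(
            f"fileScope must be one of {sorted(VALID_FILE_SCOPES)}, got {file_scope!r}")
```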
loadMotifs(motifsHandle='', man=0, ls=[])
- motifs must be a dictionary with each entered motif as key and its regular expression as value; use motifs.motifs_re to get the appropriate dictionary. If man = 1, make sure you pass in a list (ls). The motifsHandle is closed automatically when loading is done.
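The expected motif-to-regex dictionary can be sketched for IUPAC-coded motifs as follows. This is not motifs.motifs_re, whose exact output is not documented here; `motifs_to_regex` is hypothetical, and storing compiled, case-insensitive patterns as values is an assumption.

```python
import re

# Each IUPAC nucleotide code expanded to the bases it stands for.
_IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motifs_to_regex(motifs):
    """Map each motif (as entered) to a compiled regex; unknown codes raise KeyError."""
    return {
        m: re.compile("".join(_IUPAC[c] for c in m.upper()), re.IGNORECASE)
        for m in motifs
    }
```

For instance, the E-box `CANNTG` becomes the pattern `CA[ACGT][ACGT]TG`, which also matches lower-case sequence because of `re.IGNORECASE`.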
setOutput(_filename)
- this is called when setting the new output folders.
closeOutput()
- closes the two output file handles (HandleMTBS and HandleMMTB)
writeOutputMMTB()
- call this only when you want to write out the hits in self.hits_global. This is normally called internally.
writeOutputMTBS()
- returns the subset of genes that match the FastaRecord.shortname string. This is also normally called internally.
updateHitsGlobal()
- copies the keys from motifsRX and sets their values to 0. Called internally.
mapFiles()
- this is the principal method of the class; it calls or uses all the other methods. Use it to map the motifs onto the FASTA files: call loadMotifs() and loadFiles(), then call mapFiles().
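For each loaded file, mapFiles() has to read FASTA records before mapping motifs onto them. The package's own fasta module does this; the reader below is only a minimal, hypothetical stand-in that shows the record structure involved (a header line starting with `>` followed by sequence lines that are joined together).

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text.

    Blank lines are ignored; sequence lines before the first header are dropped.
    """
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

Usage: `list(read_fasta(">a desc\nACG\nT\n>b\nGG\n"))` gives `[("a desc", "ACGT"), ("b", "GG")]`.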
mapObjs(outputname='', savetofile=0)
- use for mapping sequences held in memory, i.e. those present in Fdict; Fdict should be filled by calling addSEQtoFdict. fileScope determines which output you get (MTBS or MMTB); outputMode is not used. To dump the results to a file, set savetofile to 1; otherwise outputname is ignored. fileAppend is still active and is appended to the output name when saving to a file.
mapSEQ(seq)
- maps the motifs in self.motifsRX onto the passed sequence and saves the results in self.hits_local and/or self.hits_global. Apart from the global counts, the hit match objects can be used for other analyses: {motifKEY:(totalhits, [match start position, end position])}
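The per-motif result shape described above can be sketched with Python's `re` module. `map_seq` is hypothetical. Note that `finditer` reports non-overlapping matches only, and the spans are 0-based, half-open `(start, end)` pairs; whether MapFasta counts overlapping hits or uses the same coordinate convention is not stated.

```python
import re


def map_seq(seq, motifs_rx):
    """Return {motif: (total_hits, [(start, end), ...])} for one sequence.

    motifs_rx maps each motif to a compiled regex (as in self.motifsRX).
    Matches are non-overlapping; spans are 0-based and half-open.
    """
    hits = {}
    for motif, rx in motifs_rx.items():
        spans = [m.span() for m in rx.finditer(seq)]
        hits[motif] = (len(spans), spans)
    return hits
```

For example, `map_seq("CACGTGAACACGTG", {"E": re.compile("CACGTG")})` yields `{"E": (2, [(0, 6), (8, 14)])}`.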
addSEQtoFdict(name, sequence, _alphabet=Alphabet.IUPAC.ambiguous_dna)
- adds a FastaRecord entry to the Fdict dictionary, for use by mapObjs. Alternatively, build a dictionary of sequences with fasta.convert_to_dict (see fasta.html). To map from files, use mapFiles.
pointMapsFromMemory(_length)
- builds PromoterPointMaps: for each hit, the value at the motif's 5' start position in a list is increased by 1. The user defines a length that all sequences should have; sequences of any other length are skipped. This processes any data in self.hits_local to make the maps. To save the data to disk, call savePointMaps().
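A point map can be sketched as a list of per-position counters. The input layout here, a list of `(sequence_length, spans)` pairs for one motif, is a hypothetical simplification of self.hits_local, and the sketch treats each span's start coordinate as the 5' position on the given strand.

```python
def point_map(per_sequence_hits, length):
    """Count motif start (5') positions across sequences of exactly `length` bp.

    per_sequence_hits: iterable of (sequence_length, [(start, end), ...]) pairs
    for a single motif (hypothetical layout). Other lengths are skipped.
    """
    counts = [0] * length
    for seq_len, spans in per_sequence_hits:
        if seq_len != length:
            continue  # only sequences of the requested length are considered
        for start, _end in spans:
            counts[start] += 1  # one increment per hit, at its start position
    return counts
```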
pointCurvesFromMemory(_length)
- builds PromoterPointCurves: for each hit, the value at every position covered by the motif, starting at its 5' start position, is increased by 1. The user defines a length that all sequences should have; sequences of any other length are skipped. This processes any data in self.hits_local to make the curves. To save the data to disk, call savePointCurves().
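The curve differs from the point map only in that every position a hit covers is incremented, not just its start. The same hypothetical `(sequence_length, spans)` layout is assumed, with half-open `(start, end)` spans.

```python
def point_curve(per_sequence_hits, length):
    """Count, per position, how many hits cover it in sequences of exactly `length` bp.

    per_sequence_hits: iterable of (sequence_length, [(start, end), ...]) pairs
    for a single motif (hypothetical layout); spans are half-open.
    """
    counts = [0] * length
    for seq_len, spans in per_sequence_hits:
        if seq_len != length:
            continue  # only sequences of the requested length are considered
        for start, end in spans:
            for pos in range(start, end):  # every position the motif covers
                counts[pos] += 1
    return counts
```

Compared with the point map, a motif of length k contributes k increments per hit, so the curve shows where motif bodies cluster rather than only where they begin.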
pointMaps(_length)
- same as pointMapsFromMemory but reads FASTA files.
pointCurves(_length)
- same as pointCurvesFromMemory but reads FASTA files.
statsOfPointMaps(_length, append='')
- calls pointMaps() and then saves the variance (var()) of pointMapData.
Requires that files (preferably just one) and motifs have been loaded.
Requires NumPy.
statsOfPointCurves(_length, append='')
- calls pointCurves() and then saves the variance (var()) of pointCurveData.
Requires that files (preferably just one) and motifs have been loaded.
Requires NumPy.
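The statistic itself is the variance of each motif's per-position counts. The sketch uses the standard library's `statistics.pvariance`, which computes the population variance, the same quantity `numpy.var` returns with its default `ddof=0`. The `{motif: counts}` input layout and the function name are hypothetical.

```python
from statistics import pvariance


def point_data_variance(point_data):
    """Population variance of each motif's per-position count list.

    point_data: {motif: [count_at_position_0, count_at_position_1, ...]}
    (hypothetical layout). Equivalent to numpy.var(counts) with ddof=0.
    """
    return {motif: pvariance(counts) for motif, counts in point_data.items()}
```

A low variance means hits are spread evenly along the sequences; a high one points to positional clustering.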
savePointMaps(style=0,append='')
- saves the PointMap data to a file; you may add a string to append to the file name.
style can be 0 (individual files) or 1 (one file) only.
savePointCurves(_style=0,append='')
- saves the PointCurve data to a file; you may add a string to append to the file name.
_style can be 0 (individual files) or 1 (one file) only.
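The difference between the two styles can be sketched as follows. The file names (`<motif><append>.txt` and `pointdata<append>.txt`) and the tab-separated layout are hypothetical; the actual output format of savePointMaps() and savePointCurves() is not documented here.

```python
from pathlib import Path


def save_point_data(point_data, out_dir, style=0, append=""):
    """Write {motif: counts} to disk. style 0: one file per motif; style 1: one file.

    File names and the tab-separated layout are hypothetical.
    Returns the list of paths written.
    """
    if style not in (0, 1):
        raise ValueError("style can be 0 (individual files) or 1 (one file) only")
    out = Path(out_dir)
    written = []
    if style == 0:
        for motif, counts in point_data.items():
            path = out / f"{motif}{append}.txt"
            path.write_text("\t".join(map(str, counts)) + "\n")
            written.append(path)
    else:
        path = out / f"pointdata{append}.txt"
        rows = [motif + "\t" + "\t".join(map(str, c)) for motif, c in point_data.items()]
        path.write_text("\n".join(rows) + "\n")
        written.append(path)
    return written
```

Style 0 suits plotting one motif at a time; style 1 keeps all motifs side by side for comparison.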