Center for Plant Molecular Biology

mapping -- this module contains a class called MapFasta that creates all the instances needed to map CREs onto sequences in FASTA format or onto MotifMapper FastaRecord objects (see fasta.html).

Class MapFasta(_files = [], _outputMode = 0, _fileScope = 0, _smartmotif = 0)
MapFasta creates instances of all the other classes within the Motif Mapper package. If you only want to use Motif Mapper to map CREs in a list of FASTA files, this is the only class you need.

      Global Variables
       self.files = _files             #holds file paths
       self.motifsRX = {}              #holds motifs to be mapped
       self.outputMode = _outputMode   #holds output type
       self.fileScope = _fileScope     #holds the mapping option: per sequence only (1), whole file only (2), or both (0)
       self.hits_local = {}            #holds hits for each sequence: SEQ->(len, MOTIFs->HITS)
       self.hits_global = {}           #holds hits for the whole file: MOTIF->HITS
       self.M = motifs.Motifs()        #make a Motifs class instance
       self.F_loader = ''              ##fasta.FASTA_dict() #reserved for a FASTA_dict class instance
       self.Fiter = ''                 ##fasta.Iterator()   #reserved for a FASTA.Iterator class instance
       self.Fld = folders.Folders()    #make a Folders class instance
       self.Fdict = {}                 #holds a dictionary of FastaRecords, use for mapObjs only!
       self.localhits_dict = {}        #holds match TUPLES as dictionary, use with mapObjs only!
       self.HandleMTBS = None          #holds the open handle for writing output
       self.HandleMMTB = None
       self.fileAppend = ''            #holds a string for appending to an analysis from a map() call
       self.pointMapData = {}          #holds whatever self.pointMaps() can make
       self.pointCurveData = {}        #holds whatever self.pointCurves() can make

      outputMode

        0 = memory (in self.hits_local and self.hits_global); no file handles opened
        1 = to disk directly (only for MTBS)
        2 = process all in memory; when finished, dump to disk
        3 = process in memory (as outputMode 0); opens file handles but does not save to disk

      fileScope

        0 = return hits both per sequence and for the whole file
        1 = per sequence only
        2 = whole file only; should be called for each file

      Fdict

        should be used only if you want to map sequences from memory: load the sequences using addSEQtoFdict and call mapObjs(). You may save the results to a file or keep them in memory, in which case they are saved in localhits_dict. If you want to use a list of FastaRecord objects, make the list first and use fasta.convert_to_dict.

      hits_local

        contains the TUPLE of SEQ:->MOTIF:tuple[HITS, matches] (MTBS)

      hits_global

        contains the dictionary MOTIFS:->HITS (MMTB)
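A minimal sketch of how the two hit containers described above might be shaped. The sequence names, motif, and numbers are invented for illustration, and `summarize_global` is a hypothetical helper, not part of Motif Mapper. It assumes hits_local maps each sequence name to (length, {motif: (hits, matches)}) and that hits_global is the per-motif sum of those counts.

```python
# Hypothetical data shaped after the descriptions of hits_local and
# hits_global; these are not real Motif Mapper results.
hits_local = {
    "seq1": (12, {"ACGT": (2, [(0, 4), (8, 12)])}),
    "seq2": (8, {"ACGT": (1, [(2, 6)])}),
}


def summarize_global(local_hits):
    """Sum per-sequence hit counts into a MOTIF -> HITS dictionary."""
    totals = {}
    for _length, per_motif in local_hits.values():
        for motif, (count, _matches) in per_motif.items():
            totals[motif] = totals.get(motif, 0) + count
    return totals


hits_global = summarize_global(hits_local)
```

In this sketch hits_global ends up holding one total per motif, which corresponds to the MMTB view, while hits_local keeps the per-sequence detail of the MTBS view.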

      fileAppend

        holds a string used to append to a map() call

        loadFiles(man=0, dir='')

          this calls the file-path reading function of the Folders class. man = 0: load all files from a folder (requires a directory path); man = 1: enter files by hand.

        smartmotif

          is the smart motif toggle for MapFiles; on means automatic antisense and all combinations for composite motifs
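As a rough illustration of what an automatic antisense motif could look like, the sketch below builds the reverse complement of an IUPAC motif. It assumes that "antisense" here means the reverse complement; `reverse_complement` and `COMPLEMENT` are hypothetical names, not Motif Mapper functions, and the handling of composite motifs is not covered.

```python
# IUPAC nucleotide codes and their complements.
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W",
    "K": "M", "M": "K", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N",
}


def reverse_complement(motif):
    """Return the reverse complement of an IUPAC DNA motif."""
    return "".join(COMPLEMENT[base] for base in reversed(motif.upper()))


antisense = reverse_complement("TGACG")
```

A palindromic motif such as CACGTG is its own reverse complement, so mapping its antisense form would add no new pattern.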

          globalallfiles

            0 means hits_global is global per file; otherwise it is global over all files and all sequences

            checkMODES()

              this is used to check that the user has not altered these variables into something useless.
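A minimal sketch of such a check, assuming the variables in question are outputMode and fileScope and that the valid values are the ones listed above. `check_modes` is a hypothetical stand-in; falling back to 0 for unknown values is an assumption made for this example, not documented behaviour of checkMODES().

```python
VALID_OUTPUT_MODES = (0, 1, 2, 3)  # values listed under outputMode
VALID_FILE_SCOPES = (0, 1, 2)      # values listed under fileScope


def check_modes(output_mode, file_scope):
    """Return (output_mode, file_scope), replacing unknown values with 0."""
    if output_mode not in VALID_OUTPUT_MODES:
        output_mode = 0  # assumed fallback: keep results in memory
    if file_scope not in VALID_FILE_SCOPES:
        file_scope = 0   # assumed fallback: per sequence and whole file
    return output_mode, file_scope


modes = check_modes(7, 1)
```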

            loadMotifs(motifsHandle='', man=0, ls=[])

              motifs must be a dictionary with the entered motif as key and the regular expression as value. Use motifs.motifs_re to get the appropriate dictionary. If man = 1, make sure you send in a list. (The motifsHandle will be automatically closed when done here.)
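To make the expected dictionary concrete, here is a sketch that turns IUPAC motifs into regular expression strings. `motifs_to_regex` is a hypothetical stand-in for illustration and is not the actual motifs.motifs_re; whether Motif Mapper stores pattern strings or compiled patterns as values is not stated here, so the use of strings is an assumption.

```python
import re

# IUPAC nucleotide codes expanded to regular expression character classes.
IUPAC_RX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motifs_to_regex(motifs):
    """Map each entered motif to a regular expression string."""
    return {m: "".join(IUPAC_RX[base] for base in m.upper()) for m in motifs}


motifs_rx = motifs_to_regex(["CANNTG", "TGACGY"])
first_hit = re.search(motifs_rx["TGACGY"], "AATGACGTAA")
```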

            setOutput(_filename)

              this is called when setting the new output folders.

            closeOutput()

              closes the two file handles

            writeOutputMMTB()

              call this only when you want to write out the hits in the self.hits_global. this is normally called internally.

            writeOutputMTBS()

              this will return a subset of genes that match the FastaRecord.shortname string. this is also normally called internally.

            updateHitsGlobal()

              copies the keys from motifsRX and sets the values to 0. called internally.

            mapFiles()

              this is the principal subroutine of this class; it calls or uses all the other functions present. This is the one you use to map the motifs onto the FASTA files. To use it, call loadMotifs and loadFiles, then call mapFiles().

            mapObjs(outputname='', savetofile=0)

              use this for mapping sequences that are held only in memory, in Fdict. Fdict should be filled by calling addSEQtoFdict. fileScope is used to determine which output you need (MTBS or MMTB). outputMode is not used. You can dump the results to a file by setting savetofile to 1; otherwise outputname is not relevant. fileAppend is still active and will be appended to the output name when saving to a file.

            mapSEQ(seq)

              this will map the motifs in self.motifsRX onto the sequence passed and save them in self.hits_local and/or self.hits_global. When ignoring the global counts, one can use the hit match objects for other analyses: {motifKEY:(totalhits, [match start position, end position])}
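The per-sequence result described above can be sketched with the standard `re` module. `map_seq` is a hypothetical function written for this example, not the mapSEQ method itself. It assumes the motif dictionary holds regular expression strings and that each match is recorded as a (start, end) tuple; note that `re.finditer` reports non-overlapping matches only, which may differ from how Motif Mapper counts hits.

```python
import re


def map_seq(seq, motifs_rx):
    """Return {motif: (total_hits, [(start, end), ...])} for one sequence."""
    hits = {}
    for motif, pattern in motifs_rx.items():
        # Match.span() gives the (start, end) positions of each hit.
        spans = [match.span() for match in re.finditer(pattern, seq)]
        hits[motif] = (len(spans), spans)
    return hits


result = map_seq("ACGTTTACGT", {"ACGT": "ACGT", "GGG": "GGG"})
```

Motifs without any match are kept with a count of 0 and an empty list, so every key of the motif dictionary appears in the result.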

            addSEQtoFdict(name, sequence, _alphabet=Alphabet.IUPAC.ambiguous_dna)

              this will add a FastaRecord entry to Fdict, to be used by mapObjs. The other option is to make a dictionary of sequences with fasta.convert_to_dict (see fasta.html). To map from files, use mapFiles.

            pointMapsFromMemory(_length)

              this should map PromoterPointMaps; that is, we take each hit at its beginning 5' position and record it in a list by increasing the value at that position by 1. The user defines a given length that all the sequences should have; those that do not have this length are not considered. To save the data to disk, call savePointMaps(). This processes any data in self.hits_local to make Maps.
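The counting rule for a point map can be sketched as follows. `point_map` and its input format, a list of (sequence length, hit start positions) pairs for one motif, are assumptions made for this example and do not reflect the internal layout of self.hits_local.

```python
def point_map(records, length):
    """Count hit start positions across sequences of exactly `length`."""
    counts = [0] * length
    for seq_len, starts in records:
        if seq_len != length:
            continue  # sequences of a different length are not considered
        for start in starts:
            counts[start] += 1  # one count at the 5' start of each hit
    return counts


# Hypothetical records: two sequences of length 10 and one of length 8.
records = [(10, [0, 6]), (10, [6]), (8, [1])]
pm = point_map(records, 10)
```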

            pointCurvesFromMemory(_length)

              this should map PromoterPointCurves; that is, we take each hit at its beginning 5' position and record it in a list by increasing the value at each position over the length of the motif. The user defines a given length that all the sequences should have; those that do not have this length are not considered. To save the data to disk, call savePointCurves(). This processes any data in self.hits_local to make Curves.
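A point curve differs from a point map in that every position covered by a hit is incremented, not just its start. The sketch below uses the same assumed input format of (sequence length, hit start positions) pairs; `point_curve` and the fixed `motif_len` argument are hypothetical simplifications, since real matches of one motif could vary in length.

```python
def point_curve(records, length, motif_len):
    """Count every position covered by a hit in sequences of `length`."""
    counts = [0] * length
    for seq_len, starts in records:
        if seq_len != length:
            continue  # sequences of a different length are not considered
        for start in starts:
            # Increment each position the motif occupies, clipped to the end.
            for pos in range(start, min(start + motif_len, length)):
                counts[pos] += 1
    return counts


# Hypothetical records: two sequences of length 10 and one of length 8.
records = [(10, [0, 6]), (10, [6]), (8, [1])]
pc = point_curve(records, 10, 4)
```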

            pointMaps(_length)

              same as pointMapsFromMemory but reads FASTA files.

            pointCurves(_length)

              same as pointCurvesFromMemory but reads FASTA files.

            statsOfPointMaps(_length, append='')

              a small helper that calls PointCurves and then saves the var() of the pointCurveData
              requires that there are files (best just one) and motifs to be loaded
              REQUIRES NumPy

            statsOfPointCurves(_length, append='')

              a small helper that calls PointCurves and then saves the var() of the pointCurveData
              requires that there are files (best just one) and motifs to be loaded
              REQUIRES NumPy

            savePointMaps(style=0,append='')

              saves the PointMap data to a file; you may give a string to append to the file name
              style can be 0 (individual files) or 1 (one file) only

            savePointCurves(_style=0,append='')

              saves the PointCurve data to a file; you may give a string to append to the file name
              style can be 0 (individual files) or 1 (one file) only