an efﬁcient multicore implementation of planted motif …zubair/papers/hpcs2010_final.pdf · an...

An Efficient Multicore Implementation of Planted Motif Problem

Naga Shailaja Dasari

Department of Computer Science,

Old Dominion University, Norfolk, Virginia

[email protected]

Ranjan Desh

Department of Computer Science,


[email protected]

Zubair M

Department of Computer Science,


[email protected]

ABSTRACT

In this paper we propose a parallel algorithm for the planted motif problem that arises in computational biology. A variety of algorithms have been proposed in the literature to solve this problem. The drawback of all these algorithms is that they have been designed to work on serial computers and are not suitable for parallelization on current multicore architectures. We have implemented the proposed algorithm on a machine with four quad-core Intel Xeon X5550 2.67 GHz processors, for a total of 16 cores. We compare our performance results with the best performance results reported in the literature and show that the performance of our algorithm scales linearly with the number of cores. We also solved the challenging (21, 8) instance on 16 cores in 6.9 hours.

KEYWORDS: Bioinformatics, multicore, planted motif problem, parallel computing, cache, DNA.

1. INTRODUCTION

The planted motif problem (PMP) is a fundamental search problem with applications in computational biology, especially in locating regulatory sites [1], [2]. The (l, d) planted motif problem can be defined as follows: “Given a set of n DNA sequences, each of length L, find M, the set of sequences (or motifs) of length l which have at least one d-neighbor in each of the n sequences.” A d-neighbor is a sequence of length l that differs from the motif in at most d positions.

We refer to a sequence of length l as an l-mer in the rest of the paper.
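As an illustration of the definition only (not of the algorithm proposed in this paper), the following C sketch checks by brute force whether one candidate l-mer is an (l, d) motif of a set of sequences. The function names `hamming`, `has_d_neighbor` and `is_planted_motif` are our own, and sequences are assumed to be NUL-terminated strings over {A, C, G, T}.

```c
#include <stdbool.h>
#include <string.h>

/* Number of positions at which the l-mers a and b differ (Hamming distance). */
static int hamming(const char *a, const char *b, int l) {
    int dist = 0;
    for (int k = 0; k < l; k++)
        if (a[k] != b[k])
            dist++;
    return dist;
}

/* True if seq contains a d-neighbor of motif, i.e. an l-mer that differs
   from motif in at most d positions. Start positions run from 0 to L - l. */
static bool has_d_neighbor(const char *seq, const char *motif, int l, int d) {
    int L = (int)strlen(seq);
    for (int j = 0; j + l <= L; j++)
        if (hamming(seq + j, motif, l) <= d)
            return true;
    return false;
}

/* True if motif (an l-mer) has at least one d-neighbor in each of the n sequences. */
static bool is_planted_motif(const char *motif, int l, int d,
                             const char *const seqs[], int n) {
    for (int i = 0; i < n; i++)
        if (!has_d_neighbor(seqs[i], motif, l, d))
            return false;
    return true;
}
```

For instance, with the sequences TTACGTT, GACCTAG and CCCAGGT, the 3-mer ACG is a (3, 1) motif: the three sequences contain ACG, ACC and AGG respectively.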

Sequential algorithms for this problem have been extensively studied in the literature [2]. A number of these algorithms find the approximate motif [1], [3], [4], and others find the exact motif [5], [6], [7], [8], [9], [10], [11], [12], [13], [14]. In this paper, we focus on the exact motif finding problem. The drawback of these algorithms is that they have been designed to work on serial computers and are not suitable for straightforward parallelization on current multicore architectures. One issue to be aware of on multicore architectures is that caches are shared by different cores, and a cache line that is updated by different cores generates a lot of memory traffic. It is therefore desirable to have a parallel algorithm in which different cores update different portions of the storage area.

To understand this better, consider the original exact algorithm of Waterman et al. [15], which we have modified for a multicore implementation. The original algorithm maintains a table of size $4^l$. For each subsequence x of length l in a sequence of length L, it generates the (l, d)-neighborhood, which is the set of all patterns (sequences) of length l that differ from x in at most d positions. For each pattern in the (l, d)-neighborhood, it increments the corresponding entry in the $4^l$-size table. Once all the subsequences of length l have been processed, the scores can be used to find the planted motif. One straightforward approach to parallelizing this algorithm is to assign a sequence of length L to each core, with all cores updating a common table of size $4^l$. This results in false sharing on multicore architectures [16]. False sharing occurs when threads on different cores write to different locations on the same cache line.

Assume core1 loads a cache line for the first time and it is marked as “Exclusive” (here we assume that the multicore architecture implements the MESI cache coherency protocol [16]). If core2 loads the same cache line, it is marked as “Shared.” If core2 writes into location X on the cache line, the line is marked as “Modified” in core2's cache and the copy in core1's cache is invalidated. Once core2 has a cache line in the modified state, it snoops all reads to this cache line. Now if core1 reads location Y (different from X, the location modified by core2) on the same cache line, the read is intercepted by core2. Typically, core2 then forces the read to retry, writes the line to main memory, and marks it as “Shared.” This increases memory traffic, which can have a significant impact on performance.

To address this problem we use a bit-based approach that, at the expense of some additional storage, avoids any increase in memory traffic due to false sharing. We allocate one array of $4^l$ bits for each sequence of length L. This enables every core to update its own bit array without any interference from the other cores. The basic bit-based approach finds the d-neighborhood of the l-mers in each input sequence and sets the corresponding bits in the bit array. It then finds the motifs by performing a logical AND on the bit arrays. In addition to this modification to the original algorithm [15], we suggest three major modifications to address memory issues and enhance the performance of the parallel algorithm: incremental support, limiting the motif search space, and filtering.

Incremental support: To address the memory issue and further enhance runtime performance, we propose a parallel incremental approach based on the incremental approach suggested for a sequential implementation in [9]. To solve an (l, d) instance, the incremental approach first solves the (l′, d) instance, where l′ ≤ l, and then extends the result to (l, d). The advantage of the incremental approach is that it is easily parallelizable, as opposed to tree-based and graph-based approaches, which are difficult to parallelize.

Iterative approach: Recall that the basic kernel of the proposed algorithm sets, for each l-mer in the d-neighborhood, the corresponding bit in the $4^l$-size bit array. If we limit the motif search space by considering only specific l-mers in the d-neighborhood, say all the l-mers with nucleotide G in the first position, we need to work with a bit array of only $4^{l-1}$ bits. However, this requires that we repeat the kernel computation four times, once for each of the four nucleotides in the first position of the l-mers in the d-neighborhood.

Filtering: In this approach we work with a smaller set of n′ input sequences, where n′ ≤ n, to find the candidate motifs, and then filter the candidate motifs by checking whether they have d-neighbors in the rest of the input sequences. The key observation here is that the number of candidate motifs falls exponentially as we increase the number of sequences in the smaller set. This enables us to work with a small number of candidate motifs during the filtering phase. As finding candidate motifs is the most expensive operation, this enhances the overall running time. The serial version of this approach is discussed in [9].

In this paper, we propose a parallel algorithm that incorporates all the above ideas to solve the challenge problem [1]. Many instances of PMP are challenging for various approaches; the reason is given in [1]. Some of the challenging (l, d) instances for n = 20 and L = 600 are (13, 4), (15, 5), (17, 6) and (19, 7). The proposed algorithm has a number of parameters, such as the value of l′ for the incremental approach and the number of sequences n′ to consider for selecting candidate motifs. The values assigned to these parameters have a significant impact on the performance of the algorithm. Their optimum values depend on the resources available on the target machine and on the growth rate of candidate motifs. We have developed a set of rules, based on our experimentation and theoretical analysis, to derive the values of the various parameters for optimum performance. For example, to solve the (20, 5) problem on a multicore architecture with 4 cores and 1 GB of memory, we first solve (15, 5) and then use the incremental approach to solve (20, 5). We work with 8 sequences to identify candidate motifs and then filter the candidate motifs to find the motifs for all 20 sequences. The theoretical analysis and experiments that guide our choice of parameter values are discussed in section 3.

2. THE BITBASED APPROACH

Let $S = \{S_i \mid i = 0, \ldots, n-1\}$ be the set of n input sequences. $S_i^l\{j\}$, $j = 0, \ldots, L-l$, denotes the l-mer in sequence $S_i$ starting at location j. Let $N_i^{l,d}\{j\}$ be the set of d-neighbors of $S_i^l\{j\}$ and let $N_i^{l,d} = \bigcup_{j=0}^{L-l} N_i^{l,d}\{j\}$.

It is easy to see that the set of planted motifs M is

$$M = \bigcap_{i=0}^{n-1} N_i^{l,d}$$

As in PMS1, the BitBased approach also first generates $N_i^{l,d}$ for the input sequences and then performs the intersection to generate M. PMS1 first sorts the l-mers in $N_i^{l,d}$ and then merges the sorted output to find M. Our BitBased approach represents $N_i^{l,d}$ using bit arrays, which allows us to perform the intersection directly by ANDing the bit arrays. We observe that one could use a single integer array and an increment operation instead of bit arrays, potentially saving space [15]. However, this presents difficulties when one tries to make use of parallelization. For simplicity of understanding we present our algorithm modularly. The basic approach is presented in subsection 2.1. We then present the iterative approach in subsection 2.2, which is applied when the required memory is not available. The iterative approach reduces the space complexity but increases the time complexity. The basic algorithm is improved by using incremental motif computation in subsection 2.3. The algorithm is further improved in subsection 2.4 using the idea that computing motifs for a subset of sequences and then filtering out spurious motifs is faster. Throughout, our methods are driven by the desire to make efficient use of parallelism.

2.1. The Basic Approach

The basic approach is divided into two phases. The first phase, presented in subsection 2.1.1, sets bits in the bit arrays; the second phase, presented in subsection 2.1.2, performs the intersection of the bit arrays to find the motifs.

2.1.1. Setting Bits

In the setting bits phase, an array of $4^l$ bits is allocated for each input sequence. Let $B_i$ be the bit array corresponding to the input sequence $S_i$, where $i = 0, \ldots, n-1$. Each bit in the bit array corresponds to an l-mer. For example, for l = 7, index 0 corresponds to the 7-mer AAAAAAA, index 1 corresponds to AAAAAAC, and similarly index $4^7 - 1$ corresponds to TTTTTTT, assuming A = 0, C = 1, G = 2 and T = 3. Each l-mer $S_i^l\{j\}$, $j = 0, \ldots, L-l$, is enumerated to generate its d-neighborhood set $N_i^{l,d}\{j\}$. The bits corresponding to the l-mers in $N_i^{l,d}\{j\}$ are set in $B_i$ for $i = 0, \ldots, n-1$. We represent each residue using 2 bits, i.e., A is represented by 00, C by 01, G by 10 and T by 11. The index of the bit corresponding to an l-mer is obtained by replacing each residue with the 2 bits that represent it. For example, the index of the sequence GACCTG is 100001011110 in binary (2142 in decimal). The procedure for setting bits is given in Algorithm 1. Figure 1 shows an example of how the bits are set for a given sequence.
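The 2-bit encoding can be sketched in C as follows. This is a minimal illustration assuming l ≤ 32, so that an index fits in a `uint64_t`; the helper names `residue_code`, `lmer_index` and `lmer_from_index` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 2-bit code of a residue: A = 00, C = 01, G = 10, T = 11. */
static unsigned residue_code(char r) {
    switch (r) {
    case 'A': return 0u;
    case 'C': return 1u;
    case 'G': return 2u;
    case 'T': return 3u;
    default:  assert(!"residue must be one of A, C, G, T"); return 0u;
    }
}

/* Bit-array index of an l-mer: concatenate the 2-bit codes, with the first
   residue in the most significant position. Requires l <= 32. */
static uint64_t lmer_index(const char *lmer, int l) {
    assert(l >= 0 && l <= 32);
    uint64_t idx = 0;
    for (int k = 0; k < l; k++)
        idx = (idx << 2) | residue_code(lmer[k]);
    return idx;
}

/* Inverse mapping: write the l-mer for idx into out, which needs l + 1 bytes. */
static void lmer_from_index(uint64_t idx, int l, char *out) {
    static const char alphabet[4] = {'A', 'C', 'G', 'T'};
    for (int k = l - 1; k >= 0; k--) {
        out[k] = alphabet[idx & 3u];
        idx >>= 2;
    }
    out[l] = '\0';
}
```

Here `lmer_index("GACCTG", 6)` evaluates to 2142, the decimal value of 100001011110, and `lmer_index("TTTTTTT", 7)` to $4^7 - 1 = 16383$.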

After SetBits is executed, bit array $B_i$ has the bit at index j set to 1 if and only if the l-mer corresponding to j is present in $N_i^{l,d}$. It is worth noting that SetBits is very amenable to parallelization.
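A minimal C sketch of the SetBits phase follows, assuming each $B_i$ is a zero-initialized array of $4^l$ bits packed into `uint64_t` words. The d-neighborhood is enumerated recursively: at each position the original residue is kept or, while the mismatch budget allows, replaced by one of the other three residues, so every neighbor is produced exactly once. The names `code`, `set_neighbors` and `set_bits` are our own; this is a sketch, not the authors' code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 2-bit residue code (A=0, C=1, G=2, T=3); input must be over {A,C,G,T}. */
static int code(char r) {
    switch (r) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  assert(!"unexpected residue"); return 0;
    }
}

/* Set the bit of every l-mer within Hamming distance `budget` of x[pos..l-1],
   given that positions 0..pos-1 are already encoded in idx. */
static void set_neighbors(uint64_t *bits, const int *x, int l, int pos,
                          int budget, uint64_t idx) {
    if (pos == l) {
        bits[idx >> 6] |= UINT64_C(1) << (idx & 63);
        return;
    }
    for (int r = 0; r < 4; r++) {
        uint64_t next = (idx << 2) | (uint64_t)r;
        if (r == x[pos])
            set_neighbors(bits, x, l, pos + 1, budget, next);      /* keep residue */
        else if (budget > 0)
            set_neighbors(bits, x, l, pos + 1, budget - 1, next);  /* substitute */
    }
}

/* Algorithm 1 (sketch): for every l-mer S_i^l{j}, j = 0..L-l, set in B[i] the
   bits of all l-mers in its d-neighborhood. Each B[i] holds 4^l zeroed bits. */
static void set_bits(const char *const S[], int n, int l, int d,
                     uint64_t *const B[]) {
    int x[32];
    assert(l >= 1 && l <= 31);
    for (int i = 0; i < n; i++) {
        int L = (int)strlen(S[i]);
        for (int j = 0; j + l <= L; j++) {
            for (int k = 0; k < l; k++)
                x[k] = code(S[i][j + k]);
            set_neighbors(B[i], x, l, 0, d, 0);
        }
    }
}
```

Each iteration of the outer loop writes only to `B[i]`, which is what makes the loop over sequences safe to distribute across cores. For the single sequence AC with l = 2 and d = 1, seven bits are set: AC itself plus CC, GC, TC, AA, AG and AT.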

Algorithm 1 SetBits
Input: n, l, d
Output: $B = \{B_0, B_1, \ldots, B_{n-1}\}$
1: for i = 0 to n − 1 do
2:   for j = 0 to L − l do
3:     Generate N, the set of all d-neighbors of $S_i^l\{j\}$
4:     for each l-mer p in N do
5:       Calculate index idx corresponding to p
6:       $B_i[idx] = 1$
7:     end for
8:   end for
9: end for

Figure 1. Example of Setting Bits for the (2,1) Instance

2.1.2. Finding Motifs

In the finding motifs phase, the motifs are found using the bit arrays generated in the setting bits phase. The required planted motifs are the ones that correspond to the indices at which all the bit arrays have their bits set. These indices can be found simply by performing a logical AND operation on the bit arrays. Figure 2 shows an example of finding motifs for l = 2. The procedure for finding motifs is given in Algorithm 2. The output of this algorithm is M, the set of all planted motifs.

Algorithm 2 FindMotifs
Input: n, l, $B = \{B_0, B_1, \ldots, B_{n-1}\}$
Output: M
1: M = ∅
2: $B_0 = \bigwedge_{i=0}^{n-1} B_i$
3: for j = 0 to $4^l - 1$ do
4:   if $B_0[j] = 1$ then
5:     let p be the l-mer that corresponds to index j
6:     M = M ∪ {p}
7:   end if
8: end for
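A corresponding C sketch of Algorithm 2 is shown below, under the same assumption that each bit array is stored as `uint64_t` words. Like line 2 of the algorithm, it overwrites $B_0$ with the AND of all arrays; it returns motif indices, which can be turned back into l-mers by reading the 2-bit codes from the most significant end. The function name `find_motifs` and its output convention are our own.

```c
#include <stddef.h>
#include <stdint.h>

/* Algorithm 2 (sketch): AND all n bit arrays (each `words` 64-bit words long)
   into B[0], then collect the indices of the bits that are still set. At most
   max_out indices are stored in out; the return value is the total found. */
static size_t find_motifs(uint64_t *const B[], int n, size_t words,
                          uint64_t *out, size_t max_out) {
    for (int i = 1; i < n; i++)
        for (size_t w = 0; w < words; w++)
            B[0][w] &= B[i][w];

    size_t count = 0;
    for (size_t w = 0; w < words; w++) {
        uint64_t v = B[0][w];
        for (unsigned b = 0; b < 64; b++) {
            if ((v >> b) & 1u) {
                if (count < max_out)
                    out[count] = (uint64_t)w * 64u + b;
                count++;
            }
        }
    }
    return count;
}
```

For l = 2 (a single word holding 16 bits), the arrays 0x2E, 0x0F and 0x8A AND to 0x0A, so the motifs are indices 1 and 3, i.e. AC and AT.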

Figure 2. Example of Finding Motifs of Length 2

The two phases can be easily parallelized. Let P be the number of processors. In the setting bits phase, the n input sequences can be divided among the P processors, so that each processor is assigned n/P sequences. Each processor only sets the bits in the arrays corresponding to the input sequences assigned to it, and thus there are no conflicts between processors. If the number of processors is greater than the number of input sequences, all the processors can still be utilized by assigning a single sequence to multiple processors. Conflicts are avoided in this case by dividing the bit arrays among the processors. For example, if a single input sequence is assigned to four processors, then the bit array corresponding to that input sequence is divided among the four processors. Each processor enumerates the l-mers in such a way that it only sets the bits in the part of the bit array assigned to it. The finding motifs phase can also be easily parallelized. The i-th processor is assigned indices $(4^l/P) \cdot i$ to $(4^l/P) \cdot (i+1) - 1$ of all the bit arrays. Each processor performs a logical AND operation on the parts of the bit arrays assigned to it to find the planted motifs.
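The partitioning of the finding motifs phase can be sketched with an OpenMP `parallel for`. In this sketch (our own, with a hypothetical `parallel_and` helper) the bit arrays are split at 64-bit word granularity rather than at the bit boundaries $(4^l/P) \cdot i$, so no two threads ever write the same word; compiled without OpenMP support, the pragma is ignored and the loop runs serially with the same result.

```c
#include <stddef.h>
#include <stdint.h>

/* Split the `words` 64-bit words of every bit array into P contiguous
   slices; iteration t ANDs slice t of B[1..n-1] into the same slice of B[0].
   Slices are disjoint, so the iterations can run on different cores. */
static void parallel_and(uint64_t *const B[], int n, size_t words, int P) {
    #pragma omp parallel for schedule(static)
    for (int t = 0; t < P; t++) {
        size_t lo = words * (size_t)t / (size_t)P;
        size_t hi = words * (size_t)(t + 1) / (size_t)P;
        for (size_t w = lo; w < hi; w++)
            for (int i = 1; i < n; i++)
                B[0][w] &= B[i][w];
    }
}
```

Two neighbouring slices may still share one cache line at their boundary; rounding `lo` and `hi` to a multiple of the line size (commonly 64 bytes, i.e. eight words) would remove even that.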

The main issue with this approach is the memory requirement, which grows exponentially with l. For example, for n = 20 and L = 600, the (15, 4) instance would require 2.5 GB of memory and the (17, 6) instance would require 40 GB. The memory requirement of this simple scheme becomes prohibitive for the (19, 7) instance on commonly available multicore or parallel machines. However, this approach is very flexible and can be parametrized to work with only the memory available to it; in other words, it can trade space for time. The iterative approach in section 2.2 explains how to solve the (l, d) instances whose memory requirement exceeds the available memory.
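The memory figures quoted above follow from allocating one $4^l$-bit array per sequence. As a worked check for n = 20, taking 1 GB = $2^{30}$ bytes:

```latex
\begin{aligned}
(15, 4):\quad 20 \cdot 4^{15}\ \text{bits} &= 20 \cdot 2^{30}\ \text{bits} = 2.5 \cdot 2^{30}\ \text{bytes} \approx 2.5\ \text{GB},\\
(17, 6):\quad 20 \cdot 4^{17}\ \text{bits} &= 20 \cdot 2^{34}\ \text{bits} = 40 \cdot 2^{30}\ \text{bytes} \approx 40\ \text{GB},\\
(19, 7):\quad 20 \cdot 4^{19}\ \text{bits} &= 20 \cdot 2^{38}\ \text{bits} = 640 \cdot 2^{30}\ \text{bytes} \approx 640\ \text{GB}.
\end{aligned}
```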

2.2. Iterative Approach

The basic BitBased approach fails if the memory required for the bit arrays cannot be allocated. The iterative approach, which works by reusing the available memory, can be applied to such instances. Notice that if the $4^l$-bit array (constructed as explained in section 2.1.1) is divided into four parts, each of size $4^{l-1}$, then all the l-mers corresponding to the bits in the first part start with residue A, those in the second part start with C, the third with G and the fourth with T. Similarly, if the bit array is divided into $4^k$ parts, each of size $4^{l-k}$, the first k residues are the same for all the l-mers corresponding to the bits in the same part. The iterative approach works by virtually breaking the $4^l$-bit space into $4^{l-l_{max}}$ parts, where $4^{l_{max}}$ is the maximum number of bits that can be allocated for each bit array. It performs the set bits and find motifs operations iteratively: in the i-th iteration, it sets the bits and finds motifs only in the i-th part, which is of size $4^{l_{max}}$. Note that while computing the index of the bit corresponding to a sequence, we only need the trailing $l_{max}$ residues, as the leading $l - l_{max}$ residues are the same within each part. Algorithm 3 shows the procedure for the iterative approach.
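One iteration of the iterative approach for a single sequence can be sketched in C as below (compare lines 5 to 14 of Algorithm 3). The prefix p fixes the first $l_{diff} = l - l_{max}$ residues of every l-mer in the current part, so an l-mer of the sequence whose prefix already has $d' > d$ mismatches contributes nothing; otherwise only its trailing $l_{max}$ residues are enumerated with the remaining budget $d - d'$. Function names are hypothetical and bit arrays are assumed to be `uint64_t` words.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 2-bit residue code (A=0, C=1, G=2, T=3). */
static int code(char r) {
    switch (r) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  assert(!"unexpected residue"); return 0;
    }
}

/* Set the bit of every suffix within distance `budget` of x[pos..lmax-1];
   indices use only the trailing lmax residues. */
static void set_suffix_neighbors(uint64_t *part, const int *x, int lmax,
                                 int pos, int budget, uint64_t idx) {
    if (pos == lmax) {
        part[idx >> 6] |= UINT64_C(1) << (idx & 63);
        return;
    }
    for (int r = 0; r < 4; r++) {
        uint64_t next = (idx << 2) | (uint64_t)r;
        if (r == x[pos])
            set_suffix_neighbors(part, x, lmax, pos + 1, budget, next);
        else if (budget > 0)
            set_suffix_neighbors(part, x, lmax, pos + 1, budget - 1, next);
    }
}

/* Set, in the 4^lmax-bit part whose l-mers start with prefix p (residue codes
   p[0..l-lmax-1]), the bits contributed by sequence S. `part` must be zeroed. */
static void set_bits_part(const char *S, int l, int lmax, int d,
                          const int *p, uint64_t *part) {
    int ldiff = l - lmax, x[32];
    int L = (int)strlen(S);
    assert(ldiff >= 0 && lmax >= 1 && lmax <= 31);
    for (int j = 0; j + l <= L; j++) {
        int dprefix = 0;                          /* d' in Algorithm 3 */
        for (int k = 0; k < ldiff; k++)
            if (code(S[j + k]) != p[k])
                dprefix++;
        if (dprefix > d)
            continue;                             /* no neighbor has prefix p */
        for (int k = 0; k < lmax; k++)
            x[k] = code(S[j + ldiff + k]);
        set_suffix_neighbors(part, x, lmax, 0, d - dprefix, 0);
    }
}
```

Summed over the four prefixes, the single 3-mer ACG with $l_{max} = 2$ and d = 1 sets 7 + 1 + 1 + 1 = 10 bits, which is exactly the size of its full d-neighborhood, 1 + 3 · 3 = 10.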

2.3. Increment Motifs

This modification significantly improves the performance of the approach while also improving its space complexity. Given the set of motifs of the (l − 1, d) instance, their d-neighbors in all the n input sequences, and the corresponding distances from the motifs, we can find the motifs for the (l, d) instance in O(n) time. This can be done using the following lemma.

Algorithm 3 IterativeApproach
Input: n, l, $l_{max}$
Output: M
1: Let $l_{diff} = l - l_{max}$
2: for idx = 0 to $4^{l_{diff}} - 1$ do
3:   get the sequence p of length $l_{diff}$ that corresponds to idx
4:   {setting the bits in the idx-th part}
5:   for i = 0 to n − 1 do
6:     for j = 0 to L − l do
7:       get distance d′ between p and $S_i^{l_{diff}}\{j\}$
8:       generate $N_i^{l_{max}, d-d'}\{j + l_{diff}\}$
9:       for each $l_{max}$-mer q in $N_i^{l_{max}, d-d'}\{j + l_{diff}\}$ do
10:        get index idx′ corresponding to q
11:        set $B_i[idx'] = 1$
12:      end for
13:    end for
14:  end for
15:  call FindMotifs with input n, $l_{max}$, $B = \{B_0, B_1, \ldots, B_{n-1}\}$ and get the output M′
16:  for each sequence r in M′ do
17:    append r to p and add the appended sequence to M
18:  end for
19:  clear all the bit arrays $B_0$ to $B_{n-1}$
20: end for

Lemma: If there is a motif $p = p_0 p_1 \ldots p_{l-1}$ for the (l, d) instance which has d-neighbors at positions $(j_0, j_1, \ldots, j_{n-1})$ with corresponding distances $(d_0, d_1, \ldots, d_{n-1})$ from p in the n input sequences, then $p' = p_0 p_1 \ldots p_{l-2}$ is a motif of the (l − 1, d) instance which has neighbors at positions $(j_0, j_1, \ldots, j_{n-1})$ with errors $(d'_0, d'_1, \ldots, d'_{n-1})$, where $d'_i = d_i$ if $S_i^1\{j_i + l - 1\} = p_{l-1}$, and $d'_i = d_i - 1$ otherwise.

To solve the (l, d) instance, we first find the planted motifs for the (l′, d) instance, where l′ ≤ l, and then apply the incremental algorithm iteratively to obtain the planted motifs for the (l, d) instance. The procedure is given in Algorithm 4. M′, the set of planted motifs for the (l′, d) instance, is the input to the increment motifs phase, and M, the set of motifs for the (l, d) instance, is its output.

The lower the value of l′, the less time setting bits and finding motifs take, and the less memory is required. But if l′ is too low, the number of (l′, d) motifs will be very high, and so will the time spent in the increment motifs phase. An optimal value of l′ therefore needs to be chosen; finding it is discussed in section 3.1.
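The core of the incremental step, lines 17 to 22 of Algorithm 4 for one sequence, can be sketched in C as follows. It applies the lemma in reverse: an occurrence (j, d′) of the current motif of length $\tilde{l}$ survives unchanged if the next residue of the sequence matches the appended residue R, survives with distance d′ + 1 if that is still at most d, and is dropped otherwise. The extended motif is kept only if every sequence still has a non-empty list. The type and function names are hypothetical, and the explicit end-of-sequence check is our own addition.

```c
#include <stddef.h>

/* An occurrence (j, d') of the current motif in one sequence: the window
   starting at j is at Hamming distance dist from the motif. */
typedef struct {
    int j;
    int dist;
} Occurrence;

/* Extend the occurrence list `in` of a motif of length lt by residue R.
   Writes the surviving occurrences to out (capacity >= n_in) and returns
   their number; an empty result means p + R has no d-neighbor in S. */
static size_t extend_occurrences(const char *S, int L, int lt, char R, int d,
                                 const Occurrence *in, size_t n_in,
                                 Occurrence *out) {
    size_t n_out = 0;
    for (size_t k = 0; k < n_in; k++) {
        int pos = in[k].j + lt;              /* residue S_i^1{j + lt} */
        if (pos >= L)
            continue;                        /* window would run off the end */
        if (S[pos] == R)
            out[n_out++] = in[k];
        else if (in[k].dist < d)
            out[n_out++] = (Occurrence){in[k].j, in[k].dist + 1};
    }
    return n_out;
}
```

For S = ACGTAC, the 2-mer AC occurs exactly at j = 0 and j = 4. Extending by G keeps (0, 0); extending by T gives (0, 1). In both cases the occurrence at j = 4 is dropped because the window would pass the end of S.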

2.4. Filtering Motifs

This modification also improves the performance of the approach and reduces the space requirement. To solve the planted motif problem for the (l, d) instance in n input sequences, we first find the planted motifs only for n′ input sequences, where n′ ≤ n. We call these motifs candidate motifs. The candidate motifs are then filtered to find the motifs for all n sequences by checking whether each candidate motif has a d-neighbor in each of the remaining (n − n′) input sequences. The lower the value of n′, the less time it takes to set the bits, which is the dominating part as discussed earlier. But if n′ is too low, the number of candidate motifs is high, and so is the time spent filtering motifs. Therefore, it is important to choose an optimum value of n′ that balances the time spent setting bits against the time spent filtering motifs. Finding an optimum value of n′ is discussed in detail in section 3.1.
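The filtering phase can be sketched in C as below, assuming candidate motifs are available as l-mer strings (for example, decoded from bit-array indices) and sequences are NUL-terminated strings. The names `has_d_neighbor` and `filter_candidates` are hypothetical. Each candidate is scanned against every window of every remaining sequence, so the cost grows as c · L · l · (n − n′).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if S contains an l-mer within Hamming distance d of m; the inner
   loop stops early once a window exceeds d mismatches. */
static bool has_d_neighbor(const char *S, const char *m, int l, int d) {
    int L = (int)strlen(S);
    for (int j = 0; j + l <= L; j++) {
        int dist = 0;
        for (int k = 0; k < l && dist <= d; k++)
            dist += (S[j + k] != m[k]);
        if (dist <= d)
            return true;
    }
    return false;
}

/* Filtering phase (sketch): keep the candidate motifs that have a d-neighbor
   in every one of the remaining sequences S[n_prime] .. S[n-1]. Survivors are
   moved to the front of cand; the return value is their number. */
static size_t filter_candidates(const char **cand, size_t c, int l, int d,
                                const char *const S[], int n_prime, int n) {
    size_t kept = 0;
    for (size_t k = 0; k < c; k++) {
        bool ok = true;
        for (int i = n_prime; i < n && ok; i++)
            ok = has_d_neighbor(S[i], cand[k], l, d);
        if (ok)
            cand[kept++] = cand[k];
    }
    return kept;
}
```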

2.5. The Combined Algorithm

The BitBased approach works by combining all the phases discussed so far. First the optimum value of l′ is calculated. Then the optimum value of n′ for the (l′, d) instance is calculated. If there is enough memory available for n′ bit arrays, each of size $4^{l'}$, the candidate motifs are found using the basic approach; otherwise they are found using the iterative approach. Once the candidate motifs are obtained, they are filtered to find the motifs for the (l′, d) instance. The resulting motifs then go through the incremental phase to obtain the motifs for the (l, d) instance.

Algorithm 4 IncrementMotifs
Input: n, l′, l, d, M′
Output: M
1: for each l′-mer p in M′ do
2:   for i = 0 to n − 1 do
3:     for j = 0 to L − l do
4:       calculate d′, the distance between p and $S_i^{l'}\{j\}$
5:       if d′ ≤ d then
6:         add (j, d′) to $N_i^{l'}$
7:       end if
8:     end for
9:   end for
10:  add (p, $N^{l'}$) to $K^{l'}$, where $N^{l'} = \{N_0^{l'}, N_1^{l'}, \ldots, N_{n-1}^{l'}\}$
11: end for
12: for $\tilde{l}$ = l′ to l − 1 do
13:   for each (p, $N^{\tilde{l}}$) in $K^{\tilde{l}}$ do
14:     for each residue R in {A, C, G, T} do
15:       set count = 0 and $N_i^{\tilde{l}+1} = \emptyset$ for i = 0 to n − 1
16:       for i = 0 to n − 1 do
17:         for each (j, d′) in $N_i^{\tilde{l}}$ do
18:           if ($S_i^1\{j + \tilde{l}\}$ == R) then
19:             add (j, d′) to $N_i^{\tilde{l}+1}$
20:           else if d′ < d then
21:             add (j, d′ + 1) to $N_i^{\tilde{l}+1}$
22:           end if
23:         end for
24:         if $N_i^{\tilde{l}+1}$ is not empty then
25:           increment count by 1
26:         end if
27:       end for
28:       if count == n then
29:         {append R to p}
30:         p′ = p + R
31:         add (p′, $N^{\tilde{l}+1}$) to $K^{\tilde{l}+1}$
32:       end if
33:     end for
34:   end for
35: end for
36: for each (p, $N^l$) in $K^l$ do
37:   add p to M
38: end for

Table 1. Comparison Chart (s = seconds, m = minutes, h = hours)

Algorithm     (13,4)  (15,5)  (17,6)  (19,7)  (21,8)
BitBased-16   2s      11s     2.4m    30.6m   6.9h
BitBased-8    2s      16s     3.5m    42.3m   -
BitBased-4    4s      29s     6.5m    1.3h    -
BitBased-1    9s      1.8m    20.6m   4.7h    -
PMSprune      53s     9m      69m     9.2h    -

We now discuss the time and space requirements of these methods. Recall that we have n sequences, each of length L, and that we are trying to find motifs of length l with at most d errors. The time complexities of SetBits, FindMotifs, FilterMotifs and IncrementMotifs are $O(nLl\,\mathcal{N}(l,d))$, $O(4^l n)$, $O(cLl'(n-n'))$ and $O(xnLl')$ respectively, where $\mathcal{N}(l,d)$ is the number of d-neighbors of an l-mer, given by

$$\mathcal{N}(l,d) = \sum_{i=0}^{d} \binom{l}{i} (|\Sigma| - 1)^i,$$

with $|\Sigma| = 4$ the size of the DNA alphabet, c is the number of candidate motifs and x is the number of motifs for the (l′, d) instance. Though the time complexity of FindMotifs is of order $4^l$, it performs only logical AND operations, so FindMotifs takes very little time and in many cases is negligible compared to SetBits. The time complexity of the iterative approach is $O(nlL\,\mathcal{N}(l_{max},d)\,4^{l-l_{max}} + 4^l n)$. When the values of n′ and l′ are chosen optimally, the time taken by the filtering and incremental phases can be ignored, making the time complexity $O(n'Ll'\,\mathcal{N}(l',d) + 4^{l'} n')$. The space complexities of the basic and iterative approaches are $O(n4^l)$ and $O(n4^{l_{max}})$ respectively.
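The number of d-neighbors of an l-mer, $\sum_{i=0}^{d}\binom{l}{i}(|\Sigma|-1)^i$, can be evaluated exactly with integer arithmetic. The following C sketch (the function name is our own) updates the binomial coefficient incrementally, using $\binom{l}{i+1} = \binom{l}{i}(l-i)/(i+1)$, which is always an exact integer division.

```c
#include <stdint.h>

/* Size of the d-neighborhood of an l-mer over an alphabet of size sigma:
   sum over i = 0..d of C(l, i) * (sigma - 1)^i. */
static uint64_t neighborhood_size(int l, int d, int sigma) {
    uint64_t total = 0;
    uint64_t binom = 1;   /* C(l, i), starting at C(l, 0) */
    uint64_t power = 1;   /* (sigma - 1)^i, starting at 1 */
    for (int i = 0; i <= d && i <= l; i++) {
        total += binom * power;
        binom = binom * (uint64_t)(l - i) / (uint64_t)(i + 1);  /* C(l, i+1) */
        power *= (uint64_t)(sigma - 1);
    }
    return total;
}
```

For (15, 5) each 15-mer has 853,570 d-neighbors, so with n = 20 and L = 600 SetBits performs on the order of $20 \cdot 586 \cdot 853{,}570 \approx 10^{10}$ bit-set operations.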

3. EXPERIMENTAL RESULTS

We obtained our results on a machine with four quad-core 2.67 GHz Intel Xeon X5550 processors. Our program is coded in C, using OpenMP directives to parallelize the code. We performed our experiments on random data with motifs planted at random positions. We took n = 20 and L = 600 for all our experiments. We compare our results with PMSprune [5], which, to the best of our knowledge, is the most recent exact approach developed for PMP. Table 1 compares our results with PMSprune. In the table, BitBased-P denotes the bit-based algorithm using P processors, and ‘-’ indicates that the value could not be computed or took more than 10 hours. We used a maximum of 1 GB of memory for bit arrays in all our experiments. For (17, 6), (19, 7) and (21, 8), we used the iterative approach described in section 2.2. BitBased scales well with the number of processors; Figure 3 shows the scalability results for the BitBased approach. Because the time taken by the incremental phase is minimal, BitBased takes approximately the same time for any (l, d) instance as for the (l′, d) instance, where l ≥ l′. For example, the (20, 4) instance takes the same time as the (13, 4) instance.

Figure 3. Scalability Plot (speedup vs. number of processors, 1 to 16, for the (15,5), (17,6) and (19,7) instances)

3.1. Calculating l′ and n′

The performance of our algorithm depends significantly on the values of l′ and n′. We first calculate l′ and then calculate n′ for the (l′, d) instance. To find the optimum value of l′ for (l, d), we calculate $E(\tilde{l}, d)$, the approximate expected number of $(\tilde{l}, d)$ motifs [3], for $\tilde{l} = 0$ to l. $E(l, d)$ is given by

$$E(l,d) = 4^{l}\left(1 - (1 - p_d)^{L-l+1}\right)^{n} \tag{1}$$

where $p_d$ is the probability that an l-mer has a d-neighbor at a given position in a random sequence of length L. $p_d$ is calculated using the equation

$$p_d = \sum_{i=0}^{d} \binom{l}{i} \left(\frac{3}{4}\right)^{i} \left(\frac{1}{4}\right)^{l-i} \tag{2}$$

The value of l′ is chosen to be the smallest value of $\tilde{l}$ for which $E(\tilde{l}, d)$ is less than some threshold value, say 100. The approximate estimates of the number of (l′, d) motifs for different values of l′ and d are given in Table 2. From the table it can be seen that the optimum values of l′ for d = 4, d = 5 and d = 6 are 13, 15 and 17 respectively. In other words, to solve an (l, 4) instance where l ≥ 13, one should first solve the (13, 4) instance and use incremental motif computation thereafter.
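Equations (1) and (2) and the threshold rule can be evaluated with a few lines of C. This sketch uses our own function names and makes one assumption the text leaves implicit: for $\tilde{l} \le d$ every $\tilde{l}$-mer is trivially a motif, so $E = 4^{\tilde{l}}$ there and can fall below the threshold for the wrong reason. We therefore return the first $\tilde{l}$ at which E falls below the threshold after having been at or above it. Powers are computed by repeated squaring so the sketch does not need the math library.

```c
/* b^e for an integer e >= 0 by repeated squaring (avoids linking libm). */
static double ipow(double b, int e) {
    double r = 1.0;
    while (e > 0) {
        if (e & 1)
            r *= b;
        b *= b;
        e >>= 1;
    }
    return r;
}

/* Equation (2): probability that a random l-mer is within Hamming distance d
   of a fixed l-mer, with residues i.i.d. uniform over {A, C, G, T}. */
static double p_d(int l, int d) {
    double p = 0.0, binom = 1.0;                 /* binom = C(l, i) */
    for (int i = 0; i <= d && i <= l; i++) {
        p += binom * ipow(0.75, i) * ipow(0.25, l - i);
        binom = binom * (l - i) / (i + 1);
    }
    return p;
}

/* Equation (1): expected number of (l, d) motifs in n random sequences of length L. */
static double expected_motifs(int l, int d, int n, int L) {
    double hit = 1.0 - ipow(1.0 - p_d(l, d), L - l + 1);
    return ipow(4.0, l) * ipow(hit, n);
}

/* Smallest l' at which E(l', d) drops below threshold after having been at
   or above it; returns -1 if that does not happen for l' <= l_max. */
static int choose_lprime(int d, int n, int L, double threshold, int l_max) {
    int was_above = 0;
    for (int l = 1; l <= l_max; l++) {
        if (expected_motifs(l, d, n, L) >= threshold)
            was_above = 1;
        else if (was_above)
            return l;
    }
    return -1;
}
```

With n = 20, L = 600 and a threshold of 100, this gives E(13, 4) ≈ 5.2 and selects l′ = 13, 15 and 17 for d = 4, 5 and 6, in line with Table 2.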

Table 2. Estimated Number of (l′, d) Motifs (x)

d = 4                 d = 5                 d = 6
l′   x                l′   x                l′   x
11   3.3 × 10^6       13   3.3 × 10^7       15   1.8 × 10^8
12   2.3 × 10^5       14   3.6 × 10^5       16   2.9 × 10^5
13   5.2              15   2.8              17   0.9
14   4.2 × 10^−7      16   2.3 × 10^−7      18   7.1 × 10^−8
15   2.2 × 10^−15     17   2 × 10^−15       19   9.1 × 10^−16

Finding the optimal value of n′ is relatively complex. First, an estimate of c, the number of candidate motifs, is calculated using equation (1) for different values of n′, 1 ≤ n′ ≤ n (that is, with n′ in place of n). To find the optimum value of n′, one needs to take into consideration c, the available memory and also the number of processors. Memory must be considered because the amount of memory required grows with n′. For example, for (15, 5), if n′ is chosen to be 8, then only $8 \cdot 4^{15}$ bits (1 GB) of memory are required; if n′ is chosen to be 16, then $16 \cdot 4^{15}$ bits (2 GB) are required. We calculate n′ by estimating the number of operations using the time complexity $O(\lfloor n'/P \rfloor L l'\,\mathcal{N}(l',d) + (4^{l'} + cLl'(n-n'))/P)$. We calculate the number of operations for different values of n′ and choose the one that results in the minimum number of operations while satisfying the memory requirement. For our experiments we chose n′ = 8, which in most cases is the optimal value.

4. CONCLUSION

We presented a simple, efficient, parallel and parametrized approach for solving the planted motif problem. We first introduced the basic approach and then added modifications that improve the performance of the algorithm and also decrease its memory requirement. BitBased was able to solve instances with up to 5 errors within a quarter of a minute using 1 GB of memory. The iterative approach was able to solve the (17, 6) and (19, 7) instances. It was also able to solve the (21, 8) instance using 16 processors and 1 GB of memory in 6.9 hours. The (21, 8) instance has not been reported as solved in the literature so far.

REFERENCES

[1] P. A. Pevzner and S.-H. Sze, “Combinatorial approaches to finding subtle signals in DNA sequences,” in ISMB, 2000, pp. 269–278.

[2] M. K. Das and H.-K. Dai, “A survey of DNA motif finding algorithms,” BMC Bioinformatics, vol. 8, no. S-7, 2007.

[3] J. Buhler and M. Tompa, “Finding motifs using random projections,” Journal of Computational Biology, vol. 9, no. 2, pp. 225–242, 2002.

[4] A. L. Price, S. Ramabhadran, and P. A. Pevzner, “Finding subtle motifs by branching from sample strings,” in ECCB, 2003, pp. 149–155.

[5] J. Davila, S. Balla, and S. Rajasekaran, “Fast and practical algorithms for planted (l, d) motif search,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 4, pp. 544–552, 2007.

[6] M.-F. Sagot, “Spelling approximate repeated or common motifs using a suffix tree,” in LATIN, 1998, pp. 374–390.

[7] M. Tompa, “An exact method for finding short motifs in sequences, with application to the ribosome binding site problem,” in ISMB, 1999, pp. 262–271.

[8] F. Y. L. Chin and H. C. M. Leung, “Voting algorithms for discovering long motifs,” in APBC, 2005, pp. 261–271.

[9] S. Rajasekaran, S. Balla, and C.-H. Huang, “Exact algorithms for planted motif problems,” Journal of Computational Biology, vol. 12, no. 8, pp. 1117–1128, 2005.

[10] L. Marsan and M.-F. Sagot, “Extracting structured motifs using a suffix tree - algorithms and application to promoter consensus identification,” in RECOMB, 2000, pp. 210–219.

[11] A. M. Carvalho, A. T. Freitas, A. L. Oliveira, and M.-F. Sagot, “A highly scalable algorithm for the extraction of cis-regulatory regions,” in APBC, 2005, pp. 273–282.

[12] N. Pisanti, A. M. Carvalho, L. Marsan, and M.-F. Sagot, “Risotto: Fast extraction of motifs with mismatches,” in LATIN, 2006, pp. 757–768.

[13] E. Eskin and P. A. Pevzner, “Finding composite regulatory patterns in DNA sequences,” in ISMB, 2002, pp. 354–363.

[14] J. Davila, S. Balla, and S. Rajasekaran, “Space and time efficient algorithms for planted motif search,” in International Conference on Computational Science (2), 2006, pp. 822–829.

[15] M. S. Waterman, R. Arratia, and D. J. Galas, “Pattern recognition in several sequences: consensus and alignment,” Bull. Math. Biol., vol. 46, pp. 515–527, 1984.

[16] “MESI protocol,” 2008, http://software.intel.com/en-us/articles/mesi-protocol.
