Abstract
General-Purpose Graphics Processing Units (GPGPUs) are an inexpensive resource for compute-intensive
applications that can exploit intrinsic fine-grained parallelism. This paper presents the design and implementation on GPGPUs of an
exact alignment tool for nucleotide sequences based on the Burrows-Wheeler Transform. We compare this implementation with state-of-the-art
CPU implementations of the same algorithm under the same I/O conditions. Excluding disk
transfers, the GPU implementation achieves a speedup greater than 12 over CPU execution. This
implementation exploits parallelism by concurrently searching different sequences against the same reference search tree, maximizing
memory locality and ensuring symmetric access to the data. The paper describes the behavior of the algorithm on the GPU, showing
good performance scalability, limited only by the size of the GPU's internal memory.
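
As an illustration of the search strategy summarized above, the following is a minimal sequential sketch in Python of exact matching by backward search over a Burrows-Wheeler Transform index. It is not the GPU implementation described in this paper. The function names (build_index, backward_search, find_all), the "$" sentinel, the naive suffix-array construction, and the uncompressed occurrence table are all assumptions made for clarity. The point it shows is that every query walks the same read-only index independently, which is the property that would allow one GPU thread per query.

```python
def build_index(reference):
    """Build a toy BWT index. Assumes '$' does not occur in the reference."""
    text = reference + "$"  # sentinel, sorts before A, C, G, T
    # Naive suffix array: fine for a sketch, too slow for real genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to '$' at i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # first[c] = number of characters in text strictly smaller than c.
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, first, occ


def backward_search(query, sa, first, occ):
    """Return the half-open suffix-array interval [lo, hi) matching query."""
    lo, hi = 0, len(sa)
    for ch in reversed(query):  # consume the query right to left
        if ch not in first:
            return 0, 0  # character absent from the reference
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0  # no exact match
    return lo, hi


def find_all(reference, queries):
    """Search every query against one shared index.

    Each loop iteration is independent of the others and only reads the
    index, so in a GPU setting it could be assigned to a separate thread.
    """
    sa, first, occ = build_index(reference)
    results = {}
    for q in queries:
        lo, hi = backward_search(q, sa, first, occ)
        results[q] = sorted(sa[lo:hi])  # 0-based match positions
    return results
```

For example, find_all("ACGTACGTTACG", ["ACG", "TTT"]) builds the index once and reports the positions of each query in the reference, with an empty list for a query that does not occur. A practical index would replace the dense occurrence table with a sampled or compressed one, since the table above grows with the reference length times the alphabet size.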