Abstract
Multicore processors and graphics processing units (GPUs) can be combined to implement signal-processing algorithms for communication systems efficiently, owing to their parallel processing capabilities. This paper proposes a fully parallel fixed-complexity soft-output detector that is suitable for GPU implementation and considerably reduces the computation time of the data detection stage in multiple-input multiple-output (MIMO) systems. A novel channel matrix preprocessing stage, based on column-norm ordering, is developed to match the multicore architecture efficiently. The throughput of the implementation is shown to outperform other recent implementations and to support some of the configurations in the Long-Term Evolution (LTE) standard.
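
As a rough illustration of what ordering a channel matrix by column norm can look like, the following minimal sketch computes the Euclidean norm of each column and permutes the columns accordingly. It is not the preprocessing stage proposed in the paper: the abstract does not state the sort direction or how the ordering feeds the detector, so the descending order, the function names `column_norms` and `order_columns_by_norm`, and the example matrix are all assumptions made for illustration.

```python
import math

def column_norms(H):
    """Return the Euclidean norm of each column of H (given as a list of rows)."""
    n_cols = len(H[0])
    return [math.sqrt(sum(abs(row[j]) ** 2 for row in H)) for j in range(n_cols)]

def order_columns_by_norm(H, descending=True):
    """Permute the columns of H by their norms.

    Returns the reordered matrix and the permutation that was applied.
    The descending default is an assumption, not taken from the paper.
    """
    norms = column_norms(H)
    perm = sorted(range(len(norms)), key=lambda j: norms[j], reverse=descending)
    H_ordered = [[row[j] for j in perm] for row in H]
    return H_ordered, perm

# Hypothetical 2x3 complex channel matrix, for illustration only.
H = [[1 + 1j, 0.5, 2j],
     [0.2, 1 - 1j, 1.0]]
H_ordered, perm = order_columns_by_norm(H)
```

Each column norm depends only on that column's entries, so the norms can be computed independently of one another, which is one plausible reason such an ordering maps well onto a parallel architecture.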