GPU computing supposed a major step forward, bringing high performance computing to commodity hardware. Feature-rich parallel languages like CUDA and OpenCL reduced the programming complexity. However, to fully take advantage of their computing power, specialized parallel algorithms are required. Moreover, the complex GPU memory hierarchy and highly threaded architecture makes programming a difficult task even for experienced programmers. Due to the novelty of GPU programming, common general purpose libraries are scarce and parallel versions of the algorithms are not always readily available.