Skip to content

RankVector

Concept

The concept fmindex_collection::RankVector models a vector that stores elements of type uint8_t and provides a rank function for each symbol.

Classes fulfilling this concept are guaranteed to provide following functionality:

  • class is std::default_initializable<T>
  • class is std::movable<T>
  • constexpr size_t Sigma
  • T{std::span<uint8_t> data)
  • auto size() const -> size_t
  • auto rank(size_t idx, uint8_t symb) const -> size_t
  • auto prefix_rank(size_t idx, uint8_t symb) const -> size_t
  • auto all_ranks(size_t idx) const -> std::array<size_t, T::Sigma>
  • auto all_ranks_and_prefix_ranks(size_t idx) const -> std::tuple<std::array<size_t, T::Sigma>, std::array<size_t, T::Sigma>>

Examples usages

Creating a vector with rank support of length 100.

auto data = std::vector<uint8_t>{3, 2, 3, 2, 1, 1, 0, 3, 4, 5};
auto vector = T{data};

// prints the number of bits
std::cout << vector.size() << '\n';

// prints value of 5th bit (prints '1')
std::cout << vector.symbol(4) << '\n';

// prints how number of '2' in the first 4 numbers (prints '2')
std::cout << bitvector.rank(4, 2) << '\n';

Implementation

All implementation are located inside the namespace fmindex_collection::rankvector.

rank vector (fmindex_collection::rankvecto::) Description
Naive Stores every thing in std::vector<size_t>, requires O(n·log n).
MultiBitvector Standard FM-Index implementation. Using a bitvector for every alphabet character.
InterleavedBitvector8 Interleaving blocks and bits. Using 8 bits for storing block values. New super-block every 256bits.
InterleavedBitvector16 Interleaving blocks and bits. Using 16 bits for storing block values. New super-block every 65536 bits.
InterleavedBitvector32 Interleaving blocks and bits. Using 32 bits for storing block values. New super-block every 2^32 bits.
InterleavedBitvector8Aligned Same as InterleavedBitvector but the bits and blocks sections are aligned to 64bit.
InterleavedBitvector16Aligned Same as InterleavedBitvector but the bits and blocks sections are aligned to 64bit.
InterleavedBitvector32Aligned Same as InterleavedBitvector but the bits and blocks sections are aligned to 64bit.
InterleavedEPR8 Similar to `InterleavedBitvector, but bits are encoded according to their BWT representation.
InterleavedEPR16 See InterleavedEPR8.
InterleavedEPR32 See InterleavedEPR8.
InterleavedEPR8Aligned See InterleavedEPR8.
InterleavedEPR16Aligned See InterleavedEPR8.
InterleavedEPR32Aligned See InterleavedEPR8.
InterleavedEPRV2_8 Similar to InterleavedBitvector8, but custom bit shuffling. Faster and better use of space compared to EPR.
InterleavedEPRV2_16 See InterleavedEPRV2_8.
InterleavedEPRV2_32 See InterleavedEPRV2_8.
InterleavedEPRV2_8Aligned See InterleavedEPRV2_8.
InterleavedEPRV2_16Aligned See InterleavedEPRV2_8.
InterleavedEPRV2_32Aligned See InterleavedEPRV2_8.
EPRV3_8 Similar to InterlavedEPRV2_8 but no interleaving.
EPRV3_16 See EPRV3_8
EPRV3_32 See EPRV3_8
EPRV4 Similar to EPRV3 but using two additional layers of blocks. (8bit, 16bit and 32bit).
EPRV5 Similar to EPRV3 but using one additional layer of blocks. (8bit and 16bit).
DenseEPRV6 Same as EPRV5 but using a dense vector for the super blocks instead of 64bit values. (Not worth it).
InterleavedEPRV7 Similar to EPRV5 but blocks and lowest layer are interleaved.
InterleavedWavelet Special version of Wavelet-trees (!TODO I think this was a combination fo EPR and Wavelet, not sure)
RLE (!TODO some parameters are missing)
rRLE (!TODO some parameters are missing)
Sdsl_wt_bldc Wrapper of the Wavelet-Tree implementation of the SDSL library.
Sdsl_wt_epr Wrapper of the EPR implementation of the SDSL library. (!TODO something is broken)
Wavelet Custom Wavelet-Tree implementation.
CompactBitvector Special case for Sigma 2, using bitvector::CompactBitvector as implementation.

Statistics

Memory

!TODO, the σ=256 ran only on 100MB of data, not 1GB of data.

rank vector σ=4 σ=5 σ=6 σ=21 σ=256
Naive 256 320 384 1344 16384
MultiBitvector 5.5 6.9 8.3 28.9 352.0
MultiBitvector Sparse 2 5.2 5.9 6.7 17.1 178.8
MultiBitvector Sparse 4 5.1 5.8 6.3 12.3 93.5
MultiBitvector Sparse 8 5.6 6.6 7.4 12.9 54.9
MultiBitvector Sparse 16 43.4
MultiBitvector Sparse 32 52.5
MultiBitvector Sparse 64 83.5
MultiBitvector Sparse 2/2 6.0 7.2 8.3 24.2 266.8
MultiBitvector Sparse 4/2 5.2 6.0 6.8 15.7 137.5
MultiBitvector Sparse 4/4 5.2 6.0 6.7 14.1 115.5
MultiBitvector Sparse 4/-2 5.7 6.2 6.7 12.5 93.5
MultiBitvector Sparse 8/-2 5.6 6.2 6.7 11.0 52.2
MultiBitvector Sparse 8/-4 5.7 6.4 6.9 11.1 52.2
MultiBitvector Sparse 16/-4 10.8 32.9
InterleavedBitvector8 5.5 6.9 8.3 28.9 352.0
InterleavedBitvector16 5.0 6.3 7.5 26.3 320.3
InterleavedBitvector32 6.0 7.5 9.0 31.5 384.0
InterleavedBitvector8Aligned
InterleavedBitvector16Aligned
InterleavedBitvector32Aligned
InterleavedEPR8 4.0 6.2 6.9 24.7 328.0
InterleavedEPR16 4.0 6.9 7.6 33.4 520.3
InterleavedEPR32 6.0 10.7 12.2 61.3 1032
InterleavedEPR8Aligned
InterleavedEPR16Aligned
InterleavedEPR32Aligned
InterleavedEPRV2_8 4.0 5.3 5.5 13.3 104.0
InterleavedEPRV2_16 3.0 5.0 5.0 11.0 72.3
InterleavedEPRV2_32 4.0 6.0 6.0 16.0 136.0
InterleavedEPRV2_8Aligned
InterleavedEPRV2_16Aligned
InterleavedEPRV2_32Aligned
EPRV3_8 3.5 4.9 5.3 12.9 104.0
EPRV3_16 3.0 4.3 4.5 10.3 72.3
EPRV3_32 4.0 5.5 6.0 15.5 136.0
EPRV4 2.8 3.9 4.1 8.9 56.1
EPRV5 2.8 3.9 4.1 9.0 56.3
DenseEPRV6 2.8 3.9 4.1 8.9 56.1
InterleavedEPRV7 2.8 3.9 4.1 8.9 56.3
InterleavedWavelet 5.0 5.5 6.0 15.5 137.0
RLE
rRLE
Sdsl_wt_bldc 2.5 3.0 3.3 5.6 10.0
Sdsl_wt_epr
Wavelet 4.0 4.0 4.0 6.7 12.0
CompactBitvector

Run-time

Input data was 100'000 characters in range of 1-4

Name Construct(ns) symbol() (ns) rank() (ns)
Naive<256> 252.89ns na na
MultiBitvector<256, bitvector::Bitvector> 218.50ns 39.99 42.66
MultiBitvector<256, bitvector::CompactBitvector> 217.75ns 44.17 27.69
MultiBitvector<256, bitvector::CompactBitvector4Blocks> 220.26ns 44.54 28.03
MultiBitvector<256, bitvector::SparseBLEBitvector<>> (defect) 453.69ns 132.20 62.22
InterleavedBitvector8<256> 7.11ns 29.57 26.70
InterleavedBitvector16<256> 5.15ns 28.40 24.53
InterleavedBitvector32<256> 6.62ns 28.52 25.12
InterleavedBitvector8Aligned<256> 6.91ns 28.17 26.48
InterleavedBitvector16Aligned<256> 5.07ns 28.03 24.44
InterleavedBitvector32Aligned<256> 6.46ns 28.39 25.06
InterleavedEPR8<256> 7.59ns 14.83 50.80
InterleavedEPR16<256> 18.38ns 14.86 42.69
InterleavedEPR32<256> 39.76ns 16158.44 19360.74
InterleavedEPR8Aligned<256> 12.18ns 16.11 51.17
InterleavedEPR16Aligned<256> 21.47ns 16.24 44.59
InterleavedEPR32Aligned<256> 41.81ns 15377.29 18771.09
InterleavedEPRV2_8<256> 3.42ns 39.15 40.15
InterleavedEPRV2_16<256> 2.55ns 38.89 43.58
InterleavedEPRV2_32<256> 3.73ns 47.48 55.08
InterleavedEPRV2_8Aligned<256> 3.43ns 38.04 44.63
InterleavedEPRV2_16Aligned<256> 2.43ns 37.40 43.29
InterleavedEPRV2_32Aligned<256> 3.91ns 37.21 52.23
EPRV3_8<256> 3.65ns 36.76 61.26
EPRV3_16<256> 2.81ns 36.62 61.12
EPRV3_32<256> 2.92ns 36.67 46.78
EPRV4<256> 4.14ns 36.75 48.82
EPRV5<256> 4.07ns 36.28 46.65
DenseEPRV6<256> 3.97ns 37.59 65.85
InterleavedEPRV7<256> 3.99ns 38.32 45.06
InterleavedWavelet<256> 35.91ns 296.69 146.26
Wavelet<256> 27.85ns 1408.59 615.17
RLE<256, 4, InterleavedBitvector16<256>, InterleavedBitvector16<256>> 6.28ns 54.24 82.94
Sdsl_wt_bldc<256> 10.83ns 76.36 108.03