In the implementation I am working with, a bloom filter allows us to determine if a particular item has been selected from a set. Both the set and subset can be large.
While speed is a benefit, the real benefit for us is the memory footprint. We can quickly determine if the item is (probably) selected, without maintaining a list of (large) keys.
I am researching various types of bloom filters that must fit within the following parameters:
- Low false positive
- Memory efficient
- Deletable
- No false negative
The last two pose the challenge. A standard implementation of a bloom filter does not have false negatives, but items cannot be deleted. Deletions without knowing if an item was previously inserted can cause false negatives in the following implementations:
- counting bloom filter
- cuckoo filter
- dlbf (possibly, not tested)
Research links:
- False negative research (2010): http://ieeexplore.ieee.org/abstract/document/5374398/
- DlBF: A deletable bloom filter with supposed 0 false positives (2010): https://arxiv.org/pdf/1005.0352.pdf
- Wikipedia Bloom filter alternatives: https://en.wikipedia.org/wiki/Bloom_filter#Alternatives
- An optimal bloom filter replacement (2005) : http://www.it-c.dk/people/pagh/papers/bloom.pdf