public final class BloomFilterBuilder extends Object
The underlying math is described in the Wikipedia article on Bloom filters.
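As a rough illustration of that math: for n distinct items and a target false positive probability p, the textbook formulas give m = ceil(-n ln p / (ln 2)^2) bits and k = ceil(-log2 p) hash functions, or k = ceil((m / n) ln 2) when the size is already fixed. The sketch below is a minimal Java rendering of those formulas. `BloomSizing` and its method names are hypothetical and not part of this class, and the library's own rounding and limits may differ.

```java
// Illustrative sketch only: textbook Bloom filter sizing formulas.
// BloomSizing and its methods are hypothetical names, not DataSketches API.
final class BloomSizing {
  private static final double LN2 = Math.log(2.0);

  // m = ceil(-n * ln(p) / (ln 2)^2)
  static long optimalNumBits(long maxDistinctItems, double targetFalsePositiveProb) {
    checkItems(maxDistinctItems);
    checkProb(targetFalsePositiveProb);
    return (long) Math.ceil(
        -maxDistinctItems * Math.log(targetFalsePositiveProb) / (LN2 * LN2));
  }

  // k = ceil(-log2(p))
  static int optimalNumHashes(double targetFalsePositiveProb) {
    checkProb(targetFalsePositiveProb);
    return (int) Math.ceil(-Math.log(targetFalsePositiveProb) / LN2);
  }

  // k = max(1, ceil((m / n) * ln 2)) for a filter whose size is already chosen
  static int optimalNumHashes(long maxDistinctItems, long numFilterBits) {
    checkItems(maxDistinctItems);
    if (numFilterBits < 1) {
      throw new IllegalArgumentException("numFilterBits must be positive");
    }
    return Math.max(1,
        (int) Math.ceil((double) numFilterBits / maxDistinctItems * LN2));
  }

  private static void checkItems(long n) {
    if (n < 1) {
      throw new IllegalArgumentException("maxDistinctItems must be positive");
    }
  }

  private static void checkProb(double p) {
    if (!(p > 0.0 && p < 1.0)) {
      throw new IllegalArgumentException("probability must be in (0, 1)");
    }
  }
}
```

For one million items at a 1% target, these formulas come out at roughly 9.6 bits per item and 7 hash functions.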
| Constructor and Description |
|---|
| BloomFilterBuilder() |
| Modifier and Type | Method and Description |
|---|---|
| static BloomFilter | createByAccuracy(long maxDistinctItems, double targetFalsePositiveProb)<br>Creates a new BloomFilter with an optimal number of bits and hash functions for the given inputs, using a random base seed for the hash function. |
| static BloomFilter | createByAccuracy(long maxDistinctItems, double targetFalsePositiveProb, long seed)<br>Creates a new BloomFilter with an optimal number of bits and hash functions for the given inputs, using the provided base seed for the hash function. |
| static BloomFilter | createBySize(long numBits, int numHashes)<br>Creates a BloomFilter with the given number of bits and number of hash functions, using a random base seed for the hash function. |
| static BloomFilter | createBySize(long numBits, int numHashes, long seed)<br>Creates a BloomFilter with the given number of bits and number of hash functions, using the provided base seed for the hash function. |
| static long | getSerializedFilterSize(long numBits)<br>Returns the minimum memory size, in bytes, needed for a serialized BloomFilter with the given number of bits. |
| static long | getSerializedFilterSizeByAccuracy(long maxDistinctItems, double targetFalsePositiveProb)<br>Returns the minimum memory size, in bytes, needed for a serialized BloomFilter with an optimal number of bits and hash functions for the given inputs. |
| static BloomFilter | initializeByAccuracy(long maxDistinctItems, double targetFalsePositiveProb, long seed, org.apache.datasketches.memory.WritableMemory dstMem)<br>Creates a new BloomFilter with an optimal number of bits and hash functions for the given inputs, using the provided base seed for the hash function and writing into the provided WritableMemory. |
| static BloomFilter | initializeByAccuracy(long maxDistinctItems, double targetFalsePositiveProb, org.apache.datasketches.memory.WritableMemory dstMem)<br>Creates a new BloomFilter with an optimal number of bits and hash functions for the given inputs, using a random base seed for the hash function and writing into the provided WritableMemory. |
| static BloomFilter | initializeBySize(long numBits, int numHashes, long seed, org.apache.datasketches.memory.WritableMemory dstMem)<br>Initializes a BloomFilter with the given number of bits and number of hash functions, using the provided base seed for the hash function and writing into the provided WritableMemory. |
| static BloomFilter | initializeBySize(long numBits, int numHashes, org.apache.datasketches.memory.WritableMemory dstMem)<br>Initializes a BloomFilter with the given number of bits and number of hash functions, using a random base seed for the hash function and writing into the provided WritableMemory. |
| static long | suggestNumFilterBits(long maxDistinctItems, double targetFalsePositiveProb)<br>Returns the optimal number of bits to use in a Bloom Filter given a target number of distinct items and a target false positive probability. |
| static short | suggestNumHashes(double targetFalsePositiveProb)<br>Returns the optimal number of hash functions to achieve a target false positive probability. |
| static short | suggestNumHashes(long maxDistinctItems, long numFilterBits)<br>Returns the optimal number of hash functions given the target number of distinct items and the BloomFilter size in bits. |
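To make the roles of `numBits`, `numHashes`, and `seed` concrete, here is a minimal, self-contained sketch of a seeded Bloom filter over `long` items. It is not the library's implementation: `TinyBloomFilter` is a hypothetical class, and the mixing function and double-hashing scheme are stand-ins chosen for brevity, so bit positions will not match those of a real BloomFilter.

```java
// Illustrative sketch only: a tiny seeded Bloom filter over long items.
// TinyBloomFilter is a hypothetical class, not part of DataSketches, and its
// hashing is a stand-in rather than the library's actual hash function.
final class TinyBloomFilter {
  private final long[] words;   // bit array packed into 64-bit words
  private final long numBits;
  private final int numHashes;
  private final long seed;

  TinyBloomFilter(long numBits, int numHashes, long seed) {
    if (numBits < 1 || numBits > 64L * Integer.MAX_VALUE) {
      throw new IllegalArgumentException("numBits out of range");
    }
    if (numHashes < 1) {
      throw new IllegalArgumentException("numHashes must be positive");
    }
    this.numBits = numBits;
    this.numHashes = numHashes;
    this.seed = seed;
    this.words = new long[(int) ((numBits + 63) >>> 6)];
  }

  // SplitMix64 finalizer, used here as a simple 64-bit mixing function.
  private static long mix(long z) {
    z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
    z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
    return z ^ (z >>> 31);
  }

  // Double hashing: the i-th bit index is (h1 + i * h2) mod numBits.
  private long bitIndex(long item, int i) {
    long h1 = mix(item ^ seed);
    long h2 = mix(h1 + 0x9E3779B97F4A7C15L) | 1L; // force h2 to be odd
    return Long.remainderUnsigned(h1 + i * h2, numBits);
  }

  // Sets numHashes bits for the item.
  void update(long item) {
    for (int i = 0; i < numHashes; i++) {
      long idx = bitIndex(item, i);
      words[(int) (idx >>> 6)] |= 1L << (int) (idx & 63);
    }
  }

  // Returns false if the item was definitely never added; true means "possibly added".
  boolean query(long item) {
    for (int i = 0; i < numHashes; i++) {
      long idx = bitIndex(item, i);
      if ((words[(int) (idx >>> 6)] & (1L << (int) (idx & 63))) == 0) {
        return false;
      }
    }
    return true;
  }
}
```

In this sketch, two filters built with the same `numBits`, `numHashes`, and `seed` set identical bits for identical items, while a different seed maps the same items to different positions. That is one reason a caller might prefer the seeded overloads over the random-seed ones when results need to be reproducible.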
public static short suggestNumHashes(long maxDistinctItems, long numFilterBits)
maxDistinctItems - The maximum expected number of distinct items to add to the filter
numFilterBits - The intended size of the Bloom Filter in bits

public static short suggestNumHashes(double targetFalsePositiveProb)
targetFalsePositiveProb - A desired false positive probability per item

public static long suggestNumFilterBits(long maxDistinctItems, double targetFalsePositiveProb)
maxDistinctItems - The maximum expected number of distinct items to add to the filter
targetFalsePositiveProb - A desired false positive probability per item

public static long getSerializedFilterSizeByAccuracy(long maxDistinctItems, double targetFalsePositiveProb)
maxDistinctItems - The maximum expected number of distinct items to add to the filter
targetFalsePositiveProb - A desired false positive probability per item

public static long getSerializedFilterSize(long numBits)
numBits - The number of bits in the target BloomFilter's bit array.

public static BloomFilter createByAccuracy(long maxDistinctItems, double targetFalsePositiveProb)
maxDistinctItems - The maximum expected number of distinct items to add to the filter
targetFalsePositiveProb - A desired false positive probability per item

public static BloomFilter createByAccuracy(long maxDistinctItems, double targetFalsePositiveProb, long seed)
maxDistinctItems - The maximum expected number of distinct items to add to the filter
targetFalsePositiveProb - A desired false positive probability per item
seed - A base hash seed

public static BloomFilter createBySize(long numBits, int numHashes)
numBits - The size of the BloomFilter, in bits
numHashes - The number of hash functions to apply to items

public static BloomFilter createBySize(long numBits, int numHashes, long seed)
numBits - The size of the BloomFilter, in bits
numHashes - The number of hash functions to apply to items
seed - A base hash seed

public static BloomFilter initializeByAccuracy(long maxDistinctItems, double targetFalsePositiveProb, org.apache.datasketches.memory.WritableMemory dstMem)
maxDistinctItems - The maximum expected number of distinct items to add to the filter
targetFalsePositiveProb - A desired false positive probability per item
dstMem - A WritableMemory to hold the initialized filter

public static BloomFilter initializeByAccuracy(long maxDistinctItems, double targetFalsePositiveProb, long seed, org.apache.datasketches.memory.WritableMemory dstMem)
maxDistinctItems - The maximum expected number of distinct items to add to the filter
targetFalsePositiveProb - A desired false positive probability per item
seed - A base hash seed
dstMem - A WritableMemory to hold the initialized filter

public static BloomFilter initializeBySize(long numBits, int numHashes, org.apache.datasketches.memory.WritableMemory dstMem)
numBits - The size of the BloomFilter, in bits
numHashes - The number of hash functions to apply to items
dstMem - A WritableMemory to hold the initialized filter

public static BloomFilter initializeBySize(long numBits, int numHashes, long seed, org.apache.datasketches.memory.WritableMemory dstMem)
numBits - The size of the BloomFilter, in bits
numHashes - The number of hash functions to apply to items
seed - A base hash seed
dstMem - A WritableMemory to hold the initialized filter

Copyright © 2015–2024 The Apache Software Foundation. All rights reserved.