all InfoSec news
Differentially Private One Permutation Hashing and Bin-wise Consistent Weighted Sampling. (arXiv:2306.07674v1 [stat.ML])
cs.CR updates on arXiv.org arxiv.org
Minwise hashing (MinHash) is a standard algorithm widely used in the
industry, for large-scale search and learning applications with the binary
(0/1) Jaccard similarity. One common use of MinHash is for processing massive
n-gram text representations so that practitioners do not have to materialize
the original data (which would be prohibitive). Another popular use of MinHash
is for building hash tables to enable sub-linear time approximate near neighbor
(ANN) search. MinHash has also been used as a tool for building …
algorithm applications binary data hashing industry large materialize private scale search similarity standard text