Rebuilding an HNSW index is among the most resource-intensive aspects of using HNSW in production workloads. Unlike traditional databases, where a deletion can be handled by simply removing a row from a table, a vector database that uses HNSW often requires a complete rebuild to maintain optimal performance and accuracy.
Why is Rebuilding Necessary?
Because of its layered graph structure, HNSW is not inherently designed for dynamic datasets that change frequently. Yet adding new data and deleting existing data is essential for keeping an index current, especially for use cases like RAG, which aims to improve search relevance.
Most databases distinguish between "hard" and "soft" deletes. Hard deletes permanently remove data, while soft deletes flag data as "to-be-deleted" and remove it later. The problem with soft deletes is that the to-be-deleted data still uses significant memory until it is permanently removed. This is particularly problematic in vector databases that use HNSW, where memory consumption is already a significant issue.
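To get a feel for how much memory soft-deleted vectors can hold, here is a rough back-of-the-envelope sketch. It assumes float32 vectors, 4-byte neighbor ids, and a layer-0 neighbor list of 2 * M entries per node. Those figures and the function name `tombstone_bytes` are illustrative assumptions, and real implementations add their own overhead on top.

```python
def tombstone_bytes(n_deleted: int, dim: int, m: int = 16,
                    bytes_per_float: int = 4, bytes_per_link: int = 4) -> int:
    """Rough estimate of memory still held by soft-deleted vectors.

    Assumptions (not taken from any specific database):
    - each vector stores `dim` floats of `bytes_per_float` bytes
    - each node keeps up to 2 * m neighbor ids on layer 0
    - upper-layer links and allocator overhead are ignored
    """
    vector_bytes = dim * bytes_per_float
    link_bytes = 2 * m * bytes_per_link
    return n_deleted * (vector_bytes + link_bytes)


if __name__ == "__main__":
    held = tombstone_bytes(n_deleted=1_000_000, dim=768)
    print(f"{held / 1e9:.1f} GB still held by soft-deleted vectors")
```

Under these assumptions, one million soft-deleted 768-dimensional vectors keep roughly 3.2 GB allocated until they are physically removed.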
HNSW creates a graph where nodes (vectors) are connected based on their proximity in the vector space, and the graph is traversed much like a skip list. To support that, the layers of the graph are designed so that the upper layers contain very few nodes. When vectors are deleted, especially those on sparse layers where they serve as critical connectors, the whole HNSW structure can become fragmented. This fragmentation may leave nodes (or layers) disconnected from the main graph, which requires rebuilding the entire graph or, at the very least, degrades search efficiency.
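The sparsity of the upper layers follows from how HNSW assigns each new node a top layer: it draws from an exponentially decaying distribution, floor(-ln(U) * mL), with mL commonly set to 1/ln(M). The sketch below simulates only that level assignment, not a full index; the function names and the default M = 16 are illustrative assumptions.

```python
import math
import random
from collections import Counter


def assign_level(m: int, rng: random.Random) -> int:
    """Draw a top layer for a new node: floor(-ln(U) * mL), mL = 1/ln(M)."""
    m_l = 1.0 / math.log(m)
    u = 1.0 - rng.random()  # shift to (0, 1] so log(0) cannot occur
    return int(-math.log(u) * m_l)


def layer_populations(n: int, m: int = 16, seed: int = 0) -> list[int]:
    """Count how many of n nodes appear on each layer.

    A node whose top layer is L is present on every layer from 0 to L.
    """
    rng = random.Random(seed)
    tops = Counter(assign_level(m, rng) for _ in range(n))
    max_level = max(tops)
    return [
        sum(count for level, count in tops.items() if level >= layer)
        for layer in range(max_level + 1)
    ]


if __name__ == "__main__":
    for layer, count in enumerate(layer_populations(100_000)):
        print(f"layer {layer}: {count} nodes")
```

With M = 16, each layer holds on average about one sixteenth of the nodes in the layer below it, so only a handful of nodes sit at the top. Losing one of those to a deletion removes a disproportionate share of the entry paths into the graph.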
HNSW implementations therefore use a soft-delete approach, which marks vectors for deletion but doesn't immediately remove them. This approach avoids the expense of frequent full rebuilds, although periodic reconstruction is still needed to maintain the graph's optimal state.
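The soft-delete-then-rebuild pattern can be sketched as follows. This is a minimal illustration, not how any particular vector database implements it: a brute-force scan stands in for the HNSW graph, and the class name `SoftDeleteIndex` and the `rebuild_ratio` threshold are hypothetical.

```python
import math


class SoftDeleteIndex:
    """Toy index: tombstones on delete, full rebuild past a threshold.

    A brute-force scan stands in for the HNSW graph (an assumption made
    to keep the sketch short); only the delete bookkeeping is the point.
    """

    def __init__(self, rebuild_ratio: float = 0.2):
        self.vectors: dict[int, tuple[float, ...]] = {}
        self.deleted: set[int] = set()  # tombstoned ids, still in memory
        self.rebuild_ratio = rebuild_ratio
        self.rebuilds = 0

    def add(self, vid: int, vec) -> None:
        self.vectors[vid] = tuple(vec)

    def delete(self, vid: int) -> None:
        """Soft delete: mark the id, keep the data until the next rebuild."""
        if vid in self.vectors:
            self.deleted.add(vid)
        if len(self.deleted) > self.rebuild_ratio * len(self.vectors):
            self.rebuild()

    def rebuild(self) -> None:
        """Hard-remove tombstoned vectors; a real index would also
        reconstruct its graph from the surviving vectors here."""
        for vid in self.deleted:
            del self.vectors[vid]
        self.deleted.clear()
        self.rebuilds += 1

    def search(self, query, k: int) -> list[int]:
        """Return the ids of the k nearest live vectors, skipping tombstones."""
        live = ((vid, v) for vid, v in self.vectors.items()
                if vid not in self.deleted)
        ranked = sorted(live, key=lambda item: math.dist(query, item[1]))
        return [vid for vid, _ in ranked[:k]]
```

The threshold trades memory for rebuild cost: a low `rebuild_ratio` reclaims tombstoned memory sooner but triggers the expensive rebuild more often.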