What is K-D Tree?
A k-d tree, or k-dimensional tree, is a data structure used in computer science for organizing some number of points in a space with k dimensions. It is a binary search tree with other constraints imposed on it. K-d trees are very useful for range and nearest neighbor searches. It gives you a way to structure multidimensional data that makes range search efficiently. See this quick video to get an idea:
If you have a point A that you want to find the closest points in k dimensional space, you may need to loop thru all the points in the space and calculate the distance from A and find the shortest distance one. It is O(n) in complexity. If you structure your data using K-D tree, it will turn this into log(n) in complexity. Not only you can use it to find the closest point, you can also use the data structure to do the range search efficiently.
How to implement one?
I like this video series that shows you how to use python to find closest points in steps.
- K-d Tree in python #1 – NNS Problem and Parsing SVG file
- K-d Tree in Python #2 — Build the Tree – A naive one that could be wrong sometimes and the video shows you an example of it.
- K-d Tree in Python #3 — Finale – A correct way to implement K-D tree that handles the case that missed by the naive implementation.
How it is used in Industry?
Lucene 6 brought a new Points datastructure for numeric and geo-point fields called Block K-D trees, which has revolutionised how numeric values are indexed and searched. According to their benchmark, Points are 36% faster at query time, 71% faster at index time and used 66% less disk and 85% less memory.