Task 111: Efficient kNN search in 2D

Your task is to implement KD-tree or R-tree data structure for efficient lookup in huge set of 2D points. Think hard about optimal tree building (set of points is static, you will be able to balance a tree easily) and about the kNN search algorithm itself. Each kNN search instance (query) is defined by: center point and required result size (K). Query results will be returned in form of point handles, points have to be sorted from the closest one to the farthest one.

Screenshot

Your algorithms should count various statistics: number of computed point-point distances, number of point-box distances, etc. (see below). These numbers will be used in a result table comparing all the handed-in solutions.

It is recommended to test the algorithms on very large input sets (107 points or more). Your methods should be universal and robust!

WinForms-based application is ready to test efficiency and correctness of your kd-tree implementation by generating arbitrary number of short lines (deterministic but uneven distribution based on the Lorentz attractor, Paul Bourke page). Your tree-building code is invoked and after this building phase a required set of queries is performed. Results are optionally presented in graphical form, a simple hashcode-based fingerprint is computed for checking correctness of the results. Elapsed time and counter-based statistics are presented as well.

Statistics

Statistics which will be used to assess your solution:

  • number of atomic geometric tests "point-point"
  • number of bounding-box tests "point-box"
  • number of "heap-operations"; only operations with the O(log N) complexity are relevant
  • time spent in tree initialization (building phase) in seconds
  • total time of all kNN queries (in seconds)
You are obliged to provide the first three statistics (see the GetStatistics() function)!

Base

You have to use the 111tree project from the grcis repository. A WinForms-based application is ready, user can modify size of input set, number of queries, random-seed, etc. Your implementation will be in the Tree class, which you can find in the Tree.cs source file. You are free to add any code/data stuctures to this specific file.

You have to implement this simple API:
  • BuildTree() - builds the new tree from the scratch. Set of 2D points is passed in the points argument
  • FindNearest() - one kNN search. Result set of size K (or smaller if there were not enough points in the set) is returned in order from nearest to farthest one
  • GetStatistics() - retrieves accumulated statistics (point-point distance, point-box distance, heap operations). Internal counters should be reset by this call

Point-box distance: use provided function PointBoxDistanceSquared().
Beware of duplicities: no duplicities in the result list will be tolerated!

Result examples

Data size Random seed Query size K HASH Time [s]
1000000 7 100 24 EC3415849C3E9C25 4.54
1000000 12 200 24 8744A760399D6022 9.05
5000000 144 200 24 B0A24B22FE01F321 45.14
20000000 90125 200 24 5191B4516981094 181.66

Deadline

Due to: 2. 12. 2017

Credits

Base: 25 points (functional code, 100% correct results!), bonus points for very good efficiency

Project

Visual Studio project: 111tree.

Source file

Modify and submit the file: Tree.cs

Results

Result table.


Copyright (C) 2005-2017 J.Pelikán, last change: 2019-05-09 17:52:59 +0200 (Thu, 09 May 2019)