“Finding Kth nearest and farthest neighbor using CBDT”
Abstract—The distance transform is a basic operation in computer vision, pattern recognition, and robotics. In this paper, we consider the city block (L1) distance metric. An algorithm for computing the Kth nearest and farthest neighbors using city block distance is proposed in this paper. The time complexity and scalability of the algorithm are analyzed. The results indicate that the algorithm is scalable and efficient.
I. Introduction
Calculating distance between two points helps to find which point is nearest or farthest from the given or reference point. There are varieties of algorithm exits to find the distance between points. One such method is the city blocking method we can also use the breadth-first method and the depth-first method. Similarity searching is an important task when trying to find patterns in applications involving mining different types of data such as images, video, time series, text documents, DNA sequences, etc. Similarity searching often reduces to finding the k nearest neighbors to a query object. This process is facilitated by building an index on the data
This work was supported in part by the National Science Foundation under grants EIA-99-00268, EIA-99-01636, IIS-00-86162, EIA-00-91474, and CCF-0515241, Microsoft Research, and the University of Maryland General Research Board. Which is usually based on a hierarchical clustering? The idea is that the data objects are partitioned into clusters (termed nonobjects) which are aggregated to form other clusters, with the total aggregation being represented as a tree known as a search hierarchy. Numerous search hierarchies have been used for both vector data and non-vector data such as data lying in a metric. The methods that we describe in this paper are independent of the nature of the data. The most common strategy for finding the k nearest neighbors is the depth-first method which explores the elements of the search hierarchy in a depth-first manner. The k nearest neighbors found so far are kept track of in a set L with the aid of a variable Dk that indicates the distance, using a suitably defined distance function d, of the current kth-nearest object from the query object q. The depth-first method visits every element of the search hierarchy. Branch and bound variant of the depth-first method gives better performance by not visiting every nonobject, and its objects can be determined that it is impossible for the nonobject to contain any of the k nearest neighbors of q. For example, this is true if we know that for every nonobject element e of the search hierarchy, d(q; e) d(q; e0) for every object e0 in e and that the relation d(q; e) > Dk is satisfied1. This can indeed be achieved if we define d(q; e) as the minimum possible distance from q to any object e0 in nonobject e (referred to as MINDIST in contrast to MAXDIST, the maximum possible distance, which unlike MINDIST cannot be used for pruning).
Letting A(e) denote the set of nonobject immediate descendants ei of nonobject element e of the search hierarchy, using the above definition of distance for nonobject elements (i.e., MINDIST) makes it possible to obtain even better performance as a result of speeding up the convergence of Dk to its final value by processing elements of A(e) in increasing order of d(q; ei) (i.e., a MINDIST ordering). In this way, once an element ei in A(e) is found such that d(q; ei) > Dk, then d(q; ej ) > Dk for all remaining elements ej of A(e). This means that none of these remaining nonobject descendants of e need to be processed further, and the algorithm backtracks 1This stopping condition ensures that all objects at the distance of the kth- nearest neighbor are examined. Note that if the size of L is limited to k and if there are two or more objects at distance Dk, then some of them may not be reported in the set of q’s k nearest neighbors to the parent of e, or terminates if e is the root of the search hierarchy.
The drawback of the best-first method is that the priority queue may be rather large. In particular, the necessary amount of storage may be as large as the total number of nonobjects (and hence on the order of the number of objects) if the distance of each of the nonobjects from the query object q is approximately the same. In low dimensions, such an event is relatively rare as its occurrence requires two seemingly independent events — that is, that all objects lie in an approximate hypersphere centered at some point p and that the query object q be coincident with p. However, in high dimensions, where most of the data lie on the surface (e.g., [5]) and the curse of dimensionality [3], [6] comes into play, and in metric spaces with concentrated distance histograms, this situation is less rare. In contrast, the amount of storage required by the depth-first method is bounded. In particular, it is proportional to the sum of k and the maximum depth of the search hierarchy, where, in the worst case, all of the sibling nonobject elements must be retained for each partially explored nonobject element in the search hierarchy while executing the depth-first search.
In the city block method, the object can trace or travel only one step at a time and only in one direction that is up or down or right or left. Consider a black and white n * n binary image: i.e. a 2-dimensional array where ai j D 0 or 1, for i; j D 0;1;2;
The index i stands for the row and the index j for the column. Pixel (0,0) is the upper left point in the image. The city block distance transform (CBDT) is to find for each point (i, j) its city block distance from the set of all black pixels B D f.i; j/: ai j D 1g. In other words, we compute the array:
dij=min{(i-x)+(j-y)} for 0<=i, j<=n-1.
dij=max{(i-x)+(j-y)} for 0<=i, j<=n-1.
We are keeping one point p(x,y,z) as constant or the point for which we have to find the Kth nearest and farthest neighbor. Then concerning this point, we are calculating the distance between all other points. After sorting all the distance or for the unsorted list we are finding near and far object or neighbor by calling min and max function. Then we are displaying the min and max value as a kth nearest and farthest neighbor. The main advantage of city block method is easy to find and traverse or trace the next pixels or neighbors and also
II. Algorithm development
In this method initially, we are accepting a 3D matrix with points P(x,y,z) and up to n number of rows. The values for the matrix are going to fill using random function.
x
|
y
|
z
|
|
P1
|
10
|
45
|
12
|
P2
|
7
|
55
|
5
|
P3
|
99
|
14
|
3
|
.
|
.
|
.
|
.
|
.
|
.
|
.
|
.
|
.
|
.
|
.
|
.
|
Pn
|
55
|
74
|
25
|
Table 3D matrix with randomly filled data.
For the above-shown matrix, we are finding all nearest and farthest neighbors using or applying city block distance formula concerning point p(x,y,z).
i.e..,
Dist[k]=|X-Xi|+| Y-Yi|+|Z-Zi|.
Then we are obtaining some values corresponding to all the points in a matrix so that values are stored in another matrix From that matrix we are going to calculate Kth nearest neighbor and farthest neighbor by comparing the each and every value concerning any set of points p(x,y,z) and display the min and max neighbor. The algorithm is as shown below.
Algorithm City block method
Input n, Fx(i)=1,2,3...n , Fy(i)=1,2,3...n ,Fz(i)=1,2,3...n, X,Y,Z;
Output Dist[i],k=1,min,max;
{
For iß 1 to 3
{
For jß1 to 3
{
For kß 1 to 3
{
Mat[x,y,z]ß rand(xi,yi,zi);
}
}
}
For lß1to n
{
Dist[l]=|X-Fx(i)|+|Y-Fy(i)|+|Z-Fz(i)|
}
minßDist[i],locminß1;
For lß2 to n
{
If Dist[l]<min then {minßdist[l],locminßl}
else{0}
}
For lß2 to n
{
If Dist[l]>max then {maxßdist[l],locmaxßl}
else{0}
}
}
III. Algorithm and analysis of the algorithm
Best worst
minßdist[1]-----------------------à2 1 1
locminß1-------------------------à1 1 1
for kß2 to n-----------------------à3 n n
{
If Dist[k]<min----------------------à2 (n-1) (n-1)
{
then
{
minßDist[k]-------------------------}
locminßk----------------------------} 3 0 (n-1)
}
e lse
{
exit{0}
}
}
}
IV. Experimentation and profiling
A priori Analysis:
A Priori Analysis is analyzing the algorithm and calculating the computation time for the algorithm is termed as A Priori Analysis. After having made an analytical review of the algorithm, the calculated time complexity for this method is O(n-2).
Computing the time complexity:
tbestα2*1+1*1+3*n+2*(n-1)
tbestα5n+1=Ω(n).
Figure: The time complexity for the best-case.
A Posteriori Analysis:
A Posteriori Analysis, which is done after the execution of the algorithm in the Computer. This Analysis also called Profiling.
tworstα 2*1+1*1+3*n+2(n-1)+3(n-1)
tworstα 8n-2
Figure: The time complexity for the worst case.
V. Summary / conclusion
The time complexity to find the Kth nearest and farthest neighbor is tΩ α n and tO α n. The complexity of the algorithm remains to be constant for all inputs.
References
[1] Belur V. Dasarathy, ed. (1991). Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. ISBN 0-8186-8930-7.
[2] Shakhnarovish, Darrell, and Indyk, ed. (2005). Nearest-Neighbor Methods in Learning and Vision. MIT Press. ISBN 0-262-19547-X.
[3] S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Y. Wu. An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. Journal of the ACM, 45(6):891–923, November 1998. Also, see Proceedings of the 5th Annual ACM-SIAM Symposium on Discrete.
[4] A. J. Broder. Strategies for efficient incremental nearest neighbor search. Pattern Recognition, 23(1–2):171–178, January 1990.