Posts

Showing posts from April, 2020

Chessboard distance or Minkowski distance

Consider two objects described by 10 features each, and write a C program to compute the Lr distance while varying the r value. The program reads 10 random feature values and increments r automatically up to 50; here 50 stands in for infinity and can be changed to any desired value. If r is 1, the measure is called the city-block distance; if r is 2, it is the Euclidean distance; and if r is infinity, it is called the chessboard distance. This supremum distance (also referred to as Lmax, the L∞ norm, and the Chebyshev distance) is a generalization of the Minkowski distance for r → ∞. The C program is as follows: #include<stdio.h> #include<stdlib.h> #include<math.h> #include<floa...
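Because the listing above is cut off, here is a minimal self-contained sketch of the same computation, assuming the post's setup (10 features, r running from 1 to 50 as a stand-in for infinity); the variable names, seed, and random-value scaling are illustrative, not the original code.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define FEATURES 10
#define R_MAX 50   /* stands in for infinity, as in the post */

/* Minkowski (Lr) distance between two feature vectors */
double lr_distance(const double *x, const double *y, int n, int r)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += pow(fabs(x[i] - y[i]), (double)r);
    return pow(sum, 1.0 / (double)r);
}

int main(void)
{
    double x[FEATURES], y[FEATURES];

    /* read 10 random feature values per object, scaled to [0, 10) */
    srand(42);
    for (int i = 0; i < FEATURES; i++) {
        x[i] = 10.0 * rand() / RAND_MAX;
        y[i] = 10.0 * rand() / RAND_MAX;
    }

    /* r = 1: city-block, r = 2: Euclidean, r -> infinity: chessboard */
    for (int r = 1; r <= R_MAX; r++)
        printf("L%-2d distance = %f\n", r, lr_distance(x, y, FEATURES, r));

    return 0;
}

As r grows, the largest single coordinate difference dominates the sum, which is why the printed values approach the chessboard (Chebyshev) distance.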

Feature evaluation: Unsupervised model and supervised model

Unsupervised model: here, variance can be used as the evaluation criterion. The feature with the maximum variance is ranked first, i.e., the variances are sorted in descending order (Fig: Unsupervised feature selection model). Rank the features based on variance and choose the top "d" features. This is called the unsupervised feature selection or elimination model. Supervised model (Fig: Supervised feature selection model): compute the mean of each individual class (Fig: Mean of individual classes), then compute the intra-class variance (ICV), also called the within-class variance, and the inter-class variance, also called the between-class variance (BCV). For a good feature, ICV < BCV (Fig: ICV < BCV); the feature with the minimum within-class variance is ranked first, i.e., the variances are sorted in ascending order, and the ratio of BCV to ICV is high (Fig: Ratio of BCV to ICV). Choose the feature with the highest rank first. This is called Ward's or Fisher's criterion. It applies only to supervised learning and is used in the Fisher's Linear Discriminant Analysis technique ...
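As a concrete illustration of the supervised criterion, here is a minimal C sketch that scores a single feature over two classes by the BCV-to-ICV ratio; the two-class setup and the sample values are assumptions for illustration, not data from the post.

#include <stdio.h>

/* mean of n values */
static double mean(const double *v, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) s += v[i];
    return s / n;
}

/* variance of n values around mean m */
static double variance(const double *v, int n, double m)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) s += (v[i] - m) * (v[i] - m);
    return s / n;
}

int main(void)
{
    /* one feature observed for two classes (illustrative data) */
    double c1[] = {2.0, 2.2, 1.9, 2.1};
    double c2[] = {5.0, 5.3, 4.8, 5.1};
    int n1 = 4, n2 = 4;

    double m1 = mean(c1, n1), m2 = mean(c2, n2);
    double gm = (n1 * m1 + n2 * m2) / (n1 + n2);   /* grand mean */

    /* ICV: pooled within-class variance */
    double icv = (n1 * variance(c1, n1, m1) + n2 * variance(c2, n2, m2)) / (n1 + n2);

    /* BCV: variance of the class means around the grand mean */
    double bcv = (n1 * (m1 - gm) * (m1 - gm) + n2 * (m2 - gm) * (m2 - gm)) / (n1 + n2);

    printf("ICV = %f, BCV = %f, ratio = %f\n", icv, bcv, bcv / icv);
    return 0;
}

A feature that separates the classes well, as here, yields a small ICV, a large BCV, and hence a high ratio, so it would be ranked first.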

Feature evaluation criteria

Feature sub-setting is done through selection or elimination. We look at an approximation algorithm: (1) apply a feature evaluation criterion, (2) rank the features, (3) select the top 'd' features. Steps 1 and 2 can be independent of the learning algorithm; such methods are called filtering algorithms (filters), suit big data, and are independent of the class label. This is then called unsupervised feature selection (as in clustering). Steps 1 and 2 can instead be dependent on the learning algorithm; such methods are called wrapper algorithms (wrappers), suit small data, and depend on the class label/class information. This is then called supervised feature selection (as in classification).
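A minimal sketch of the filter steps (rank, then select the top d), assuming the per-feature evaluation scores have already been computed, e.g., variances or BCV/ICV ratios; the score values here are made up for illustration.

#include <stdio.h>

#define M 6   /* total features */
#define D 3   /* features to keep */

int main(void)
{
    /* illustrative evaluation scores for the m features */
    double score[M] = {0.42, 1.90, 0.07, 2.50, 0.88, 1.10};
    int idx[M];
    for (int i = 0; i < M; i++) idx[i] = i;

    /* step 2: rank features by score, descending (simple selection sort) */
    for (int i = 0; i < M - 1; i++)
        for (int j = i + 1; j < M; j++)
            if (score[idx[j]] > score[idx[i]]) {
                int t = idx[i]; idx[i] = idx[j]; idx[j] = t;
            }

    /* step 3: select the top d features */
    printf("Selected features:");
    for (int i = 0; i < D; i++)
        printf(" f%d (score %.2f)", idx[i], score[idx[i]]);
    printf("\n");
    return 0;
}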

Techniques of dimensionality reduction

Dimensionality reduction: reducing the number of features while keeping the ability to discriminate each data object from the rest of the objects. Given "m" features (m relatively large) discriminating "n" objects, the task is to re-describe the "n" objects with d << m features (d relatively smaller than m), which may be some of those "m" features or may be new features computed from the "m" features, such that the desired task (classification, clustering) is accomplished without loss of data generality. Finding d << m is done either by choosing some d out of the m features (also called sub-setting or selection) or by computing d features from the m features (also called feature transformation), as sketched below. Fig: Dimensionality reduction. 1. Feature sub-setting: ...
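To make the two routes concrete, here is a small C sketch that re-describes n objects both ways: sub-setting (keep d of the m original columns) and transformation (compute d new columns from the m originals; the pairwise averages used here are a stand-in for a real transform such as PCA, and all data values are invented).

#include <stdio.h>

#define N 3   /* objects */
#define M 4   /* original features */
#define D 2   /* reduced features */

int main(void)
{
    double X[N][M] = {               /* illustrative data: n objects, m features */
        {1.0, 2.0, 3.0, 4.0},
        {2.0, 1.0, 4.0, 3.0},
        {3.0, 3.0, 1.0, 1.0}
    };

    /* (a) sub-setting: keep original features 0 and 2 */
    int keep[D] = {0, 2};
    printf("Sub-setting:\n");
    for (int i = 0; i < N; i++)
        printf("  %.1f %.1f\n", X[i][keep[0]], X[i][keep[1]]);

    /* (b) transformation: d new features, each the average of a feature pair */
    printf("Transformation:\n");
    for (int i = 0; i < N; i++)
        printf("  %.1f %.1f\n",
               (X[i][0] + X[i][1]) / 2.0,
               (X[i][2] + X[i][3]) / 2.0);
    return 0;
}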

Cluster validation

What is cluster validation? A process of estimating the goodness of the clusters formed; that is, ensuring how accurate the obtained results of clustering are. To understand cluster validation, consider the example confusion matrix in figure 1, as obtained by the process of classification (supervised learning) or clustering (unsupervised learning). Figure 1: Example of the confusion matrix. Here CE1, CE2, CE3, and CE4 denote the expected results of clusters 1 to 4, and CO1, CO2, CO3, and CO4 denote the obtained results of clusters 1 to 4. Row: indicates how many samples belong to a particular cluster. Column: indicates from which cluster the samples placed in a given cluster came. Example: arrangement of library books. Calculating recall (recall ability) concerning cluster C ...
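The snippet is cut off before the recall computation, so here is a minimal sketch of how recall (and precision) per cluster would be computed from such a 4x4 confusion matrix; the counts are invented for illustration. Following the row/column reading above, recall for cluster i is its diagonal count divided by its row total (expected members), and precision divides by its column total (obtained members).

#include <stdio.h>

#define K 4   /* number of clusters */

int main(void)
{
    /* illustrative confusion matrix: rows = expected cluster (CE),
       columns = obtained cluster (CO) */
    int cm[K][K] = {
        {20,  2,  1,  0},
        { 3, 15,  0,  2},
        { 0,  1, 18,  1},
        { 1,  0,  2, 17}
    };

    for (int i = 0; i < K; i++) {
        int row = 0, col = 0;
        for (int j = 0; j < K; j++) {
            row += cm[i][j];   /* samples expected in cluster i */
            col += cm[j][i];   /* samples placed in cluster i  */
        }
        printf("cluster %d: recall = %.2f, precision = %.2f\n",
               i + 1, (double)cm[i][i] / row, (double)cm[i][i] / col);
    }
    return 0;
}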

Difference between sequential allocation and linked allocation

Sequential allocation vs. linked allocation:
1. Sequential allocation is static in nature; linked allocation is dynamic in nature.
2. Sequential allocation requires estimating the capacity of the structure in advance; linked allocation does not.
3. In sequential allocation, all memory locations are consecutive in nature, so computed addressing is possible; in linked allocation they need not be, and it works based on pointer addressing.
4. In sequential allocation, two logically adjacent elements are also physically adjacent; in linked allocation, not necessarily, so each data element should hold the address of its next logically adjacent element, which can be a bottleneck problem.
5. Sequential allocation supports direct accessing or addressing of data; linked allocation supports only sequential addressing or accessing of data.
6. In sequential allocation, insertion and deletion operations are in a strict sense prohibited, though they can be realized by data movement with extra memory allocation, which can sometimes be a bottleneck problem; in linked allocation, insertion and deletion operations are possible and straightforward.
7. Splitting and merging operations are likewise prohibited in sequential allocation; in linked allocation they are possible and straight forw...
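As a companion to the comparison above, here is a minimal C sketch contrasting the two allocation styles: computed addressing into an array versus pointer chasing through a singly linked list. The node layout is a standard textbook one, not code from the post.

#include <stdio.h>
#include <stdlib.h>

/* linked allocation: each element stores the address of its successor */
struct node {
    int data;
    struct node *next;
};

int main(void)
{
    /* sequential allocation: capacity fixed up front, computed addressing */
    int arr[5] = {10, 20, 30, 40, 50};
    printf("arr[3] = %d (direct access)\n", arr[3]);

    /* linked allocation: grows dynamically, accessed sequentially only */
    struct node *head = NULL;
    for (int v = 50; v >= 10; v -= 10) {   /* insertion at the front is O(1) */
        struct node *n = malloc(sizeof *n);
        n->data = v;
        n->next = head;
        head = n;
    }

    int i = 0;
    for (struct node *p = head; p != NULL; p = p->next, i++)
        if (i == 3)
            printf("4th node = %d (reached by pointer chasing)\n", p->data);

    /* release the dynamically allocated nodes */
    while (head) {
        struct node *t = head;
        head = head->next;
        free(t);
    }
    return 0;
}

Note how arr[3] is reached in one computed step, while the fourth list node is reached only by following three pointers from the head.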