VectorModel

The Vector Model demonstrates how documents are converted into vectors and how the Cosine Measure and the Euclidean Distance are calculated.

In this model, each document is treated as a vector.
First, the frequency of each term is found and a preliminary document vector is formed.
Then the document vectors are normalized using the formula below.
First coordinate of the vector, where f_i is the frequency of term i and n is the number of terms:
$$ w_1 = \frac{f_1}{\sqrt{f_1^2 + f_2^2 + \cdots + f_n^2}} $$
Similarly, all the other coordinates of the document vector and of the query vector are calculated (note: the query is also treated as a document).
The Euclidean Distance formula is then used to calculate the distance between the document D and the query Q, where t_k and q_k are the k-th coordinates of their vectors.
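The normalization step above can be sketched in Python. This is only an illustration under stated assumptions: the function name normalized_vector, the whitespace tokenization, and the fixed vocabulary list are hypothetical and not taken from this project.

```python
import math
from collections import Counter

def normalized_vector(text, vocabulary):
    """Return the unit-length term-frequency vector of `text` over `vocabulary`.

    Assumption: terms are separated by whitespace and compared in lower case.
    """
    counts = Counter(text.lower().split())
    # Preliminary document vector: raw frequency of each vocabulary term.
    freqs = [counts[term] for term in vocabulary]
    # Divide every coordinate by sqrt(f_1^2 + f_2^2 + ... + f_n^2).
    norm = math.sqrt(sum(f * f for f in freqs))
    if norm == 0:
        # No vocabulary term occurs; return the zero vector instead of dividing by 0.
        return [0.0] * len(vocabulary)
    return [f / norm for f in freqs]

# Hypothetical vocabulary and document, for illustration only.
vocabulary = ["apple", "banana", "cherry"]
doc_vec = normalized_vector("apple apple banana", vocabulary)
```

Here the raw frequencies are (2, 1, 0), so each coordinate is divided by sqrt(5). The query would be passed through the same function, since it is treated as a document.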
$$ \mathrm{E.D}(D, Q) = \sqrt{\sum_{k} (t_k - q_k)^2} $$
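A minimal sketch of the Euclidean Distance formula, assuming both vectors are plain Python lists of equal length. The function name euclidean_distance and the sample vectors are hypothetical.

```python
import math

def euclidean_distance(doc_vec, query_vec):
    """Square root of the sum of squared coordinate differences (t_k - q_k)."""
    return math.sqrt(sum((t - q) ** 2 for t, q in zip(doc_vec, query_vec)))

# Two already-normalized sample vectors, for illustration only.
distance = euclidean_distance([0.6, 0.8, 0.0], [1.0, 0.0, 0.0])
```

For these sample vectors the squared differences are 0.16, 0.64 and 0, so the distance is sqrt(0.8).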
The Cosine Measure formula is used to calculate the cosine of the angle between the query and the document.
$$ \mathrm{C.M}(D, Q) = \frac{\sum_{k} t_k q_k}{\sqrt{\sum_{k} t_k^2} \cdot \sqrt{\sum_{k} q_k^2}} $$
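A minimal sketch of the Cosine Measure formula, again assuming plain Python lists of equal length. The function name cosine_measure is hypothetical, and returning 0.0 for an all-zero vector is an assumption made here to avoid a division by zero, not something the formula itself specifies.

```python
import math

def cosine_measure(doc_vec, query_vec):
    """Dot product of the two vectors divided by the product of their lengths."""
    dot = sum(t * q for t, q in zip(doc_vec, query_vec))
    doc_norm = math.sqrt(sum(t * t for t in doc_vec))
    query_norm = math.sqrt(sum(q * q for q in query_vec))
    if doc_norm == 0 or query_norm == 0:
        # Assumption: an all-zero vector is treated as having no similarity.
        return 0.0
    return dot / (doc_norm * query_norm)

# Sample vectors, for illustration only.
similarity = cosine_measure([0.6, 0.8, 0.0], [1.0, 0.0, 0.0])
```

When the vectors have already been normalized to unit length, both square roots in the denominator equal 1 and the measure reduces to the dot product.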
The results are then ranked. For the Euclidean Distance (E.D), ranking runs from the lowest distance to the highest, i.e. documents with the lowest E.D are placed first.
For the Cosine Measure (C.M), ranking runs from the highest value to the lowest, i.e. documents with the highest C.M are placed first.
The reason: if the angle between two vectors is small, the vectors are near each other, and a small angle means a high cosine value (for example, cos 0° = 1).
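The two ranking orders can be sketched as follows. The document names, the sample vectors, and the helper functions are hypothetical. The vectors are assumed to be already normalized to unit length.

```python
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_measure(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical unit-length query and document vectors.
query = [1.0, 0.0]
docs = {
    "doc_a": [1.0, 0.0],  # same direction as the query
    "doc_b": [0.6, 0.8],  # partly overlapping
    "doc_c": [0.0, 1.0],  # orthogonal to the query
}

# Euclidean Distance: ascending order, smallest distance first.
by_distance = sorted(docs, key=lambda name: euclidean_distance(docs[name], query))

# Cosine Measure: descending order, largest value first.
by_cosine = sorted(docs, key=lambda name: cosine_measure(docs[name], query), reverse=True)
```

In this sample both rankings place doc_a first and doc_c last. For unit-length vectors this agreement is expected, because the squared Euclidean distance equals 2 - 2 * C.M, so a larger cosine always corresponds to a smaller distance.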