The Vector Model demonstrates how documents are converted into vectors and how the Cosine Measure and Euclidean Distance are calculated
| In this model each document is treated as a vector |
| First the frequency of each term is found and a preliminary document vector is formed |
| Then the document vectors are normalized using the formula below |
| First co-ordinate of the vector = $\dfrac{\text{term 1 freq}}{\sqrt{(\text{term 1 freq})^2 + (\text{term 2 freq})^2 + \dots + (\text{term } n \text{ freq})^2}}$ |
| Similarly, all the other co-ordinates of the document vector and of the query vector are calculated ( Note: The query is also treated as a document ) |
| The Euclidean Distance Formula is then used to calculate the distance between the query and the document |
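| $\mathrm{E.D}(D,Q) = \sqrt{\sum_{k} (t_k - q_k)^2}$, where $t_k$ and $q_k$ are the k-th co-ordinates of the document vector D and the query vector Q |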
| The Cosine Measure Formula is used to calculate the cosine measure between the query and the document |
| $\mathrm{C.M}(D,Q) = \dfrac{\sum_{k} t_k q_k}{\sqrt{\sum_{k} t_k^2} \, \sqrt{\sum_{k} q_k^2}}$ |
| The results are then ranked. For the Euclidean Distance (E.D), ranking is done from lowest distance to highest distance, i.e. the documents with the lowest E.D are placed first |
| For the Cosine Measure (C.M), ranking is done from the highest value to the lowest value, i.e. the documents with the highest C.M are placed first |
| The reason for this: if the angle between two vectors is small, they are said to be near each other, and a small angle means a high cosine value (for example, cos 0° = 1) |
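
The steps described above (term frequencies, normalization, the two measures, and the two rankings) can be sketched in Python roughly as follows. This is only an illustrative sketch, not the model's actual implementation: the function names, the whitespace tokenizer, the shared sorted vocabulary, and the sample documents are all assumptions made for the example.

```python
import math
from collections import Counter


def term_vector(text, vocabulary):
    """Preliminary document vector: raw frequency of each vocabulary term.
    Assumption: terms are lower-cased, whitespace-separated words."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def normalize(vector):
    """Divide every co-ordinate by sqrt(f1^2 + f2^2 + ... + fn^2)."""
    length = math.sqrt(sum(f * f for f in vector))
    if length == 0:
        return list(vector)  # an all-zero vector is left unchanged
    return [f / length for f in vector]


def euclidean_distance(d, q):
    """E.D(D,Q) = sqrt(sum_k (t_k - q_k)^2)."""
    return math.sqrt(sum((t - qk) ** 2 for t, qk in zip(d, q)))


def cosine_measure(d, q):
    """C.M(D,Q) = sum_k t_k q_k / (sqrt(sum_k t_k^2) * sqrt(sum_k q_k^2))."""
    dot = sum(t * qk for t, qk in zip(d, q))
    norm_d = math.sqrt(sum(t * t for t in d))
    norm_q = math.sqrt(sum(qk * qk for qk in q))
    if norm_d == 0 or norm_q == 0:
        return 0.0  # assumption: an empty vector matches nothing
    return dot / (norm_d * norm_q)


def rank(documents, query):
    """Return (names ranked by E.D ascending, names ranked by C.M descending)."""
    texts = list(documents.values()) + [query]
    vocabulary = sorted({word for text in texts for word in text.lower().split()})
    # The query is treated as just another document.
    q = normalize(term_vector(query, vocabulary))
    vectors = {
        name: normalize(term_vector(text, vocabulary))
        for name, text in documents.items()
    }
    by_ed = sorted(vectors, key=lambda n: euclidean_distance(vectors[n], q))
    by_cm = sorted(vectors, key=lambda n: cosine_measure(vectors[n], q), reverse=True)
    return by_ed, by_cm


# Hypothetical sample data, for illustration only.
documents = {
    "d1": "apple banana apple",
    "d2": "banana cherry",
    "d3": "cherry cherry",
}
by_ed, by_cm = rank(documents, "apple banana")
print("Ranked by Euclidean Distance:", by_ed)
print("Ranked by Cosine Measure:", by_cm)
```

Because every vector is normalized to unit length before comparison, a smaller Euclidean Distance goes hand in hand with a larger Cosine Measure, so in this sketch the two rankings come out in the same order.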