Text Analysis
relevance = sum_i ( g_i * G_i ), with 0 <= relevance <= 1
sum_i ( g_i ) = 1
- G_1 = number of terms in text / total number of terms
- G_2 = 1 - sum ( sum f ( minTermDistances ) ) / C
"the closer the terms are, the better"
- G_3 = f ( singleTermDistribution )
"the more the terms are spread over the text, the better"
- G_4 = number of different words / total number of words in text
spam recognition
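A minimal sketch of how the weighted sum could be computed, in Python. It covers only the two factors that are fully defined above, G_1 and G_4. G_2 and G_3 depend on the function f and the constant C, which are not specified here, so they would have to be supplied as precomputed scores. The function names, the tokenization rule, and the lowercasing of terms are assumptions made for illustration, not part of the original method.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (tokenization rule is an assumption)."""
    return re.findall(r"\w+", text.lower())


def g1_term_coverage(terms, words):
    """G_1: number of terms that occur in the text / total number of terms."""
    distinct_terms = {t.lower() for t in terms}
    if not distinct_terms:
        return 0.0
    present = set(words)
    found = sum(1 for t in distinct_terms if t in present)
    return found / len(distinct_terms)


def g4_vocabulary_ratio(words):
    """G_4: number of different words / total number of words in the text."""
    if not words:
        return 0.0
    return len(set(words)) / len(words)


def relevance(scores, weights):
    """Weighted sum of the factors G_i with weights g_i; the weights must sum to 1."""
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per score")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(g * G for g, G in zip(weights, scores))


if __name__ == "__main__":
    words = tokenize("Cheap cheap cheap pills")
    scores = [g1_term_coverage(["cheap", "flights"], words),
              g4_vocabulary_ratio(words)]
    # Equal weights are chosen only for the example.
    print(relevance(scores, [0.5, 0.5]))
```

A text that repeats the same few words many times gets a low G_4, which is presumably why this factor is useful for spam recognition. As long as every G_i lies in 0 .. 1 and the weights sum to 1, the result also stays in 0 .. 1.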
Diploma thesis: Stefan Heineke, RRZN/RVS, Uni Hannover
(C) RRZN, W.Sander-Beuermann