Saturday, 5 May 2012

Keyword density

I was thinking today about ranking web search results (or text search results in general) by keyword density, i.e. the number of times the keyword appears divided by the total number of words in the document.
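
A minimal sketch of that definition in Python. The function name and the simple tokenising regex are assumptions made for illustration; a real search engine would likely tokenise and normalise words more carefully.

```python
import re


def keyword_density(text: str, keyword: str) -> float:
    """Occurrences of `keyword` divided by the total number of words in `text`."""
    # Assumption: a "word" is a run of letters, digits or apostrophes, case-insensitive.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0  # avoid dividing by zero for an empty document
    return words.count(keyword.lower()) / len(words)


# "the" appears 2 times among 6 words.
print(keyword_density("The cat sat on the mat", "the"))
```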

A single mention of the keyword in a long document suggests that the document is not particularly relevant. At the other extreme, a document consisting of nothing but the keyword would get the highest possible score, yet it is not particularly useful either.

What is the optimum keyword density in a relevant source?

My guess is something like 1/50 or 1/100, i.e. the keyword making up roughly 1-2% of the words.
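
One way to turn an optimum density into a ranking score is a function that peaks at the target and falls off on either side. The sketch below is only one possible shape (a bell curve on a log scale); the 1/50 target is the guess above, and the function name and `width` parameter are assumptions for illustration.

```python
import math


def density_score(density: float, target: float = 1 / 50, width: float = 1.0) -> float:
    """Score in [0, 1] that is highest when `density` equals `target`."""
    if density <= 0.0:
        return 0.0  # the keyword never appears
    # Distance from the target on a log scale, so that being 2x too dense
    # is penalised the same as being 2x too sparse.
    log_ratio = math.log(density / target)
    # `width` (hypothetical tuning knob) controls how quickly the score drops off.
    return math.exp(-(log_ratio ** 2) / (2 * width ** 2))


for d in (1 / 10000, 1 / 100, 1 / 50, 1.0):
    print(d, density_score(d))
```

With this shape, both the long document with one mention and the single-word document score lower than a document near the target density.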

Another thought that comes to mind is about user control over the ranking of search results, for advanced users who search in a professional or research capacity. There are many good ways of ranking results, and it seems to me that an advanced user would benefit from being able to control them, perhaps through a series of sliders (for PageRank, keyword density, inclusion in the page title, inclusion in the metadata, etc.) that could be tuned while the results below change dynamically.
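
A rough sketch of how such sliders could drive the ranking, assuming each signal has already been normalised to the range 0 to 1 and that the slider positions are simply used as weights in a weighted sum. The `Page` class, the signal names and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    signals: dict[str, float]  # assumed pre-normalised to [0, 1]


def rank(pages: list[Page], weights: dict[str, float]) -> list[Page]:
    """Order pages by the weighted sum of their signals, best first."""
    def combined(page: Page) -> float:
        # A signal missing from a page contributes nothing to its score.
        return sum(w * page.signals.get(name, 0.0) for name, w in weights.items())
    return sorted(pages, key=combined, reverse=True)


# Made-up pages and signal values, purely for illustration.
pages = [
    Page("a.example", {"pagerank": 0.9, "density": 0.1, "in_title": 0.0}),
    Page("b.example", {"pagerank": 0.2, "density": 0.8, "in_title": 1.0}),
]

# Slider positions: first favouring PageRank, then favouring the keyword signals.
print([p.url for p in rank(pages, {"pagerank": 1.0, "density": 0.0, "in_title": 0.0})])
print([p.url for p in rank(pages, {"pagerank": 0.1, "density": 1.0, "in_title": 0.5})])
```

Re-running `rank` each time a slider moves is what would make the result list change dynamically.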

