Of course, there will be numerous spelling mistakes (and in fact, data on common spelling mistakes is useful for spelling correction, for understanding how people type, and for understanding how people think), but it should be possible to filter these out using similarity rules, for example by setting aside any unknown word that is only a letter or so away from a known one.
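The post doesn't say which similarity rule to use, so here is a minimal sketch that assumes one: an unknown word is treated as a likely misspelling if its Levenshtein edit distance to some dictionary word is at most 1. Tokenization is naive whitespace splitting, and the names `edit_distance`, `likely_misspelling` and `undefined_words` are hypothetical, for illustration only.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the letters match)
            ))
        prev = curr
    return prev[-1]


def likely_misspelling(word: str, dictionary: set[str], max_distance: int = 1) -> bool:
    # Assumed rule: "close to a known word" means "probably a typo, not a new word".
    # This is a brute-force scan of the whole dictionary, fine for a sketch only.
    return any(edit_distance(word, known) <= max_distance for known in dictionary)


def undefined_words(tokens: list[str], dictionary: set[str], max_distance: int = 1) -> Counter:
    """Count tokens missing from the dictionary, skipping likely misspellings."""
    unknown = Counter(t.lower() for t in tokens if t.lower() not in dictionary)
    return Counter({
        word: n for word, n in unknown.items()
        if not likely_misspelling(word, dictionary, max_distance)
    })


dictionary = {"the", "cat", "sat", "on", "mat"}
tokens = "the catt sat on the mat blorf blorf zyx".split()
print(undefined_words(tokens, dictionary).most_common(1))
```

Here "catt" is discarded as a probable typo of "cat", leaving "blorf" as the most common undefined word. Note that plain Levenshtein distance counts a transposition such as "teh" for "the" as two edits; a Damerau-Levenshtein variant would catch those at distance 1.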
Why would you want to do this? To answer the question "what is the most common undefined word?", but also to start adding definitions for those words to the relevant dictionaries, extending the body of structured human knowledge.