20 January 2009

Semantic Keywords: Latent Semantic Indexing and SEO

Latent Semantic Indexing (LSI), in a nutshell, indexes documents (e.g., webpages) by semantically related concepts and keywords, not just by syntactically related keywords.

Let me illustrate with a couple of simple examples. We have all seen how Google recognizes that when you search for "car", you might also want webpages that contain "cars". This is a syntactic similarity: plural versus singular, different tenses of a verb, and so on. Search engines have long matched syntactic variations of search terms. That makes sense and seems pretty obvious. Latent Semantic Indexing, however, takes this to the next level. LSI essentially learns from the web which keywords are most commonly related to "car". Those keywords, which may or may not be synonyms for "car", are then used in ranking webpages. If, for example, the top keywords related to "car" are "used cars", "new cars", "car rentals", and "car reviews", then when I search for "car", webpages containing these additional keywords might rank higher than a page that mentions "car" but none of the commonly related keywords.
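
To make "learns from the web which keywords are most commonly related" a little more concrete, here is a deliberately simplified Python sketch that counts, over a tiny made-up corpus, how many documents each word shares with a target word. This is plain co-occurrence counting, not LSI itself (LSI factors a term-document matrix with singular value decomposition), and the corpus, the stopword list, and the `related_keywords` function are all hypothetical.

```python
from collections import Counter

# Hypothetical, tiny stopword list; a real system would use a much larger one
STOPWORDS = {"and", "at", "the", "with", "for", "my"}

def related_keywords(docs, target, top_n=3):
    """Rank words by how many documents they share with `target`."""
    counts = Counter()
    for doc in docs:
        words = set(doc.lower().split())
        if target in words:
            # Each co-occurring word is counted once per document
            counts.update(words - {target} - STOPWORDS)
    # Sort by count (descending), then alphabetically so ties are deterministic
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]

# Hypothetical "web" of five pages
docs = [
    "used car dealer with low prices",
    "new car and used car reviews",
    "car rental deals at the airport",
    "car insurance for new drivers",
    "my dog loves the park",
]
print(related_keywords(docs, "car", top_n=2))  # ['new', 'used']
```

Raw co-occurrence like this is noisy on real data, which is one reason LSI instead compresses the whole term-document matrix into a few latent "concept" dimensions.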

Why LSI, and what does it mean for Search Engine Optimization? Both questions have the same answer. Search engines seek to optimize the user experience by providing the most relevant, authoritative content for any given search. Search Engine Optimization (SEO) is our effort to organize our information so that search engines can easily understand it and rank it highly for the keywords we are targeting. Working together, the two bring people who are looking for what we have to offer to our websites. Figuring out how to do this is the art and the science of SEO (and yes, it takes a little of both).

So why LSI? Latent Semantic Indexing gives search engines more information about the content of a webpage and helps them rank pages better. I might create a webpage about cars and use the keyword "car" only a couple of times, but when I describe car parts, the economics of the car industry, or issues related to car rentals, I use language specific to those subjects. LSI learns this language statistically, by analyzing which words tend to appear together across a large collection of documents. Using LSI, a search engine can take the webpages that contain the keyword "car" and distinguish which ones are more relevant based on the other keywords they use.
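
Here is a minimal, dependency-free Python sketch of one common formulation of LSI: build a term-document count matrix A, keep only its top k singular directions, fold the query into that reduced "concept" space, and compare it with each document by cosine similarity. Assumptions: the five-document corpus and k = 2 are made up for illustration, and instead of calling a linear-algebra library's SVD, the sketch obtains the right singular vectors as eigenvectors of A^T A via power iteration, which is fine for a toy matrix but not how you would do it at web scale.

```python
import math

def tokenize(text):
    return text.lower().split()

def term_document_matrix(docs):
    """Rows are terms, columns are documents, entries are raw counts."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    index = {w: i for i, w in enumerate(vocab)}
    A = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w in tokenize(d):
            A[index[w]][j] += 1.0
    return vocab, A

def top_eigenpairs(M, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(M)
    M = [row[:] for row in M]
    pairs = []
    for _ in range(k):
        v = [1.0 / math.sqrt(n)] * n
        for _ in range(iters):
            w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))
        if lam <= 1e-12:
            break  # no meaningful directions left (k exceeds the matrix rank)
        pairs.append((lam, v))
        # Deflate so the next pass converges to the next-largest eigenpair
        M = [[M[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def lsi_similarities(docs, query, k=2):
    """Cosine similarity between the query and each document in a rank-k LSI space."""
    vocab, A = term_document_matrix(docs)
    n_terms, n_docs = len(vocab), len(docs)
    # A^T A (documents x documents): its eigenvectors are A's right singular
    # vectors v_i, and its eigenvalues are the squared singular values sigma_i^2
    AtA = [[sum(A[t][i] * A[t][j] for t in range(n_terms)) for j in range(n_docs)]
           for i in range(n_docs)]
    pairs = top_eigenpairs(AtA, k)
    # Fold the query in: q_hat_i = (u_i . q) / sigma_i with u_i = A v_i / sigma_i,
    # which simplifies to (A v_i . q) / sigma_i^2
    q_words = set(tokenize(query))
    q = [1.0 if w in q_words else 0.0 for w in vocab]
    q_hat = []
    for lam, v in pairs:
        Av = [sum(A[t][j] * v[j] for j in range(n_docs)) for t in range(n_terms)]
        q_hat.append(sum(a * b for a, b in zip(Av, q)) / lam)
    # Document j's coordinates in the same space are the j-th entries of each v_i
    return [cosine(q_hat, [v[j] for _, v in pairs]) for j in range(n_docs)]

docs = [
    "new used car",
    "used car rental",
    "rental accident",       # never contains the word "car"
    "school test",
    "school test homework",
]
for doc, sim in zip(docs, lsi_similarities(docs, "car")):
    print(f"{sim:+.2f}  {doc}")
```

On this toy corpus, "rental accident" should score close to the two documents that actually say "car", because "rental" ties it to them in the latent space, while the school documents should score near zero.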

Example:
I drove to school in my car, took my test, then came home early.
I can't decide if I want to buy a new or used car. Since the accident, I have been driving a rental.

Both of the above examples mention the word "car" exactly once, but the first is not about a car at all, whereas the second is entirely about a car. In fact, a typical information retrieval technique that normalizes term frequency by document length would rank the first example higher because it is shorter (the word "car" makes up a larger percentage of the total content). Using LSI techniques, a search engine might learn that "new", "used", "rental", and "accident" are related to "car" and therefore rank a webpage containing the second example higher than one containing the first when someone searches for "car".
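
The length effect is easy to check. Below is a toy Python sketch that scores both example passages for "car" in two ways: plain length-normalized term frequency, and the same score with occurrences of the four related words from the paragraph added in. The related-word list is hard-coded rather than learned, and simply adding related-word counts is a hypothetical stand-in for how an LSI-based engine might weigh them, not a description of any real search engine's ranking.

```python
def tokenize(text):
    # Lowercase and strip trailing punctuation such as "car," or "rental."
    return [w.strip(".,!?") for w in text.lower().split()]

def tf_score(text, keyword):
    """Length-normalized term frequency: occurrences of keyword / total words."""
    words = tokenize(text)
    return words.count(keyword) / len(words)

def related_score(text, keyword, related):
    """Same normalization, but related terms count as hits too (hypothetical boost)."""
    words = tokenize(text)
    hits = sum(words.count(w) for w in [keyword, *related])
    return hits / len(words)

s1 = "I drove to school in my car, took my test, then came home early."
s2 = ("I can't decide if I want to buy a new or used car. "
      "Since the accident, I have been driving a rental.")
related = ["new", "used", "rental", "accident"]  # assumed to have been learned

print(tf_score(s1, "car") > tf_score(s2, "car"))                              # True
print(related_score(s2, "car", related) > related_score(s1, "car", related))  # True
```

The first passage has 14 words and the second 22, so plain term frequency gives 1/14 versus 1/22 and favors the first; counting the four related words lifts the second to 5/22 while the first stays at 1/14.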

LSI enables search engines to rank webpages more accurately by their relevance to a search. There is plenty of evidence to suggest that Google and other top search engines are using LSI techniques more and more, and that affects how we should approach SEO.

At Semantic Discovery, we combine our web crawling and natural language processing expertise to build SEO tools that make a real difference. We have developed a keyword suggestion crawler that employs Latent Semantic Indexing techniques. Our keyword suggestion tool doesn't just provide syntactic variations of keywords; plenty of free keyword tools already do that. Instead, we focus on returning the concepts most closely related to the keywords you are targeting. If search engines use LSI techniques to rank pages, why not use LSI techniques to learn which related keywords are best to use?

Try the SD Keyword Suggestion Crawler for free!
