As we mentioned in Chapter 1, search engines are sorting machines. They have become extremely useful for exploring the ever-growing World Wide Web. They discover, read, and organize all of the content on the internet in order to offer the most relevant results to the questions searchers are asking.
The first step to showing up in the search results is to be visible. If your site can’t be found, there is no way you will ever show up in the search engine results pages (SERPs).
Not all search engines are equal
Google has by far the largest market share. If you include Google Images, Google Maps, and YouTube, more than 90% of web searches happen on Google. That is more than 20x that of Bing and Yahoo combined.
How search engines work
A search engine like Google Search has three primary functions:
- Crawl: The process of combing the internet, discovering URLs, and looking over the code and content on every web page.
- Index: Organizing and sorting the content found during the crawling process. Once a page is in the index, it can then move around in the rankings for keywords.
- Rank: The second part of the sorting process, where pieces of content are put in order of relevance to a specific keyword.
Search engine crawling
Crawling is the discovery process in which search engines send out a team of bots, also known as crawlers, to find new and updated content. The content analyzed can be anything from a web page to an image, a video, a PDF, and more.
Googlebot starts out by fetching a few web pages, and then follows the links on those web pages to new web pages, and so on. Google’s crawlers continue to find new content, each time adding it to Google’s index, called Caffeine.
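The fetch-then-follow-links loop described above can be sketched in a few lines of Python. This is only an illustration, not how Googlebot is actually built: the FAKE_WEB dictionary, the LinkExtractor class, and the crawl function are all made-up names, and the "web" here is a handful of in-memory pages rather than real HTTP requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web": URL -> HTML. A real crawler would fetch over HTTP.
FAKE_WEB = {
    "https://example.com/": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "https://example.com/about": '<a href="/">Home</a>',
    "https://example.com/blog": '<a href="/blog/post-1">Post 1</a> <a href="/about">About</a>',
    "https://example.com/blog/post-1": "<p>No links here.</p>",
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch):
    """Breadth-first crawl: visit the seeds, then follow links to unseen pages."""
    seen = set(seed_urls)
    queue = deque(seed_urls)
    discovered = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be fetched, so skip it
            continue
        discovered.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return discovered


pages = crawl(["https://example.com/"], FAKE_WEB.get)
print(pages)
```

Starting from a single seed URL, the sketch discovers all four pages by following links. The seen set is what keeps the crawler from looping forever when pages link back to each other.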
Search engine index and the Google Caffeine update
On June 8th, 2010, Google rolled out its new web indexing system. This new system allowed Google to crawl and store data far more efficiently. Google says that it was able not only to increase the size of its index but also to provide far fresher results (50% fresher than its previous index).
So how does it work? Basically, in the old indexing system, pages and content types were put into categories based on their perceived freshness requirements. Different crawlers were sent out, some looking for new URLs and others re-indexing updated pages, all based on the classification of content.
If a site was in the fresh category, it was crawled by different bots that would add the content to the index quickly; however, most sites and their content would be reindexed only every couple of weeks.
With the Caffeine update, Google gained the ability to crawl, collect data, and add it to its index in seconds. Further, it was built with an understanding of the growth ahead and of how changing devices and media types can impact the resources needed. Caffeine wasn’t an algorithm update; it was a complete rebuild of Google’s indexing system.
Search engine ranking
When someone performs a search, search engines run through their index for highly relevant content and then order that content by relevance. This ordering is known as ranking.
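To make the lookup-then-order idea concrete, here is a toy sketch in Python. It assumes a very simple scoring rule (count how often the query words appear on each page), which is nowhere near what a real search engine uses. The PAGES data and the build_index and search functions are hypothetical names made up for this example.

```python
from collections import Counter, defaultdict

# Hypothetical documents standing in for crawled pages.
PAGES = {
    "https://example.com/coffee": "coffee beans and coffee brewing tips",
    "https://example.com/tea": "tea brewing tips for green tea",
    "https://example.com/mugs": "ceramic mugs for coffee and tea",
}


def build_index(pages):
    """Inverted index: word -> {url: number of times the word appears}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word, count in Counter(text.lower().split()).items():
            index[word][url] = count
    return index


def search(index, query):
    """Score each page by summed word counts, then sort highest first."""
    scores = Counter()
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            scores[url] += count
    # Sort by score descending, then by URL so ties come out in a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


index = build_index(PAGES)
results = search(index, "coffee brewing")
print(results)
```

The coffee page comes first because it matches the query words three times, while the other two pages match once each. Pages that contain none of the query words never enter the results at all, which is why being in the index is a precondition for ranking.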
It is possible to block search engine crawlers from part or all of your site. There are reasons to do this; however, if you want your content found by searchers, you first have to make sure it’s accessible to crawlers and indexable.
One way to see if your webpages are indexed is to use the exact match advanced search operator in Google Search. If you don’t know what I am talking about, just type quotation marks around your domain name in the search bar (“www.yoursite.com”). This will return the results Google has in its index that mention your domain. To limit the results to pages on your own site, use the site: operator instead (site:yoursite.com). Neither is exact, but it will give you a rough idea.
For more accurate results, monitor and use the Index Coverage report in Google Search Console. You can sign up for free at Google Search Console. I highly recommend this tool! Not only can you monitor your rankings on keywords, you can also submit sitemaps directly to Google to be indexed.
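If you don’t have a sitemap yet, a basic one is just an XML file listing your URLs. The sketch below builds one with Python’s standard library, assuming the urlset, url, and loc elements from the sitemaps.org protocol. The build_sitemap function and the example URLs are made up for illustration, and most site platforms will generate this file for you.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap listing each URL in a <loc> element."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages on the example domain used above.
sitemap = build_sitemap([
    "https://www.yoursite.com/",
    "https://www.yoursite.com/about",
])
print(sitemap)
```

You would save the output as a file such as sitemap.xml on your site and then submit its URL in Search Console.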
If your site is not showing up in the search rankings, it could be for one of the following reasons:
- Site is brand new and just hasn’t been crawled yet.
- Your site isn’t linked to from any external websites.
- Your site’s navigation makes it hard for a bot to crawl.
- You have robots.txt rules blocking search engines.
- Your site has been penalized by Google.
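The robots.txt item on this list is one you can check yourself. Here is a rough sketch using Python’s built-in urllib.robotparser module. The ROBOTS_TXT contents are a made-up example of a file that blocks Googlebot from the whole site, and is_allowed is a hypothetical helper name. To test your real site you would load your live robots.txt file instead of a hard-coded string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; a real check would load the live file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(user_agent, url):
    """True if the parsed robots.txt rules let this crawler fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(is_allowed("Googlebot", "https://www.yoursite.com/"))            # False
print(is_allowed("Bingbot", "https://www.yoursite.com/"))              # True
print(is_allowed("Bingbot", "https://www.yoursite.com/private/page"))  # False
```

In this example Googlebot is shut out of every page, while other crawlers are only kept out of the /private/ folder. A single stray "Disallow: /" line like this one is enough to keep a site out of the search results.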