Several ideas for finding and solving indexing problems, starting from the IIS log

Another question is how URLs get indexed, in particular the URLs no spider has ever crawled. With the help of a script, list every URL on your website, extract the URLs the spider has crawled from the log, and compare the two lists to find the URLs that have never been crawled. Then analyze the reasons: is the URL buried too deep in the directory structure, is it poorly linked, or does it carry too many parameters? Once you have identified and fixed the likely cause, keep observing whether the pages get indexed.
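The comparison described above can be sketched in a few lines of Python. This is only a minimal illustration: the two URL lists are made-up samples, and the helper names `uncrawled` and `describe` are hypothetical. In practice the first list would come from a sitemap or site crawl and the second from the IIS log.

```python
from urllib.parse import urlsplit, parse_qsl

def uncrawled(all_urls, crawled_urls):
    """Return the URLs that exist on the site but never appear in the spider log."""
    return sorted(set(all_urls) - set(crawled_urls))

def describe(url):
    """Return (directory depth, number of query parameters) for a URL."""
    parts = urlsplit(url)
    depth = len([segment for segment in parts.path.split("/") if segment])
    params = len(parse_qsl(parts.query, keep_blank_values=True))
    return depth, params

# Hypothetical sample data, not taken from a real site.
all_urls = [
    "/",
    "/news/1.html",
    "/a/b/c/d/e/item.html",
    "/list.php?cat=3&sort=price&page=2&view=grid",
]
crawled_urls = ["/", "/news/1.html"]

missing = uncrawled(all_urls, crawled_urls)
for url in missing:
    depth, params = describe(url)
    print(url, "depth:", depth, "parameters:", params)
```

A high depth or parameter count is only a hint about why a URL was skipped, so the result still needs to be checked by hand.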

Access speed is affected by many factors. Look carefully from the server through the back end to the front end to see whether there is room for optimization. Without compromising how the page looks, reduce the overall size of the HTML code, and keep JS and CSS in their own files, separate from the HTML. If you think it through, static URLs are also a must, because long dynamic URLs affect transmission speed as well.

For the site's internal link structure, run a click test: count how many clicks it takes to get from one page to another. If some pages can only be reached after many clicks, the crawler needs more time to get from the home page to them, which wastes crawl time. Adjust the internal link structure so that more content is reachable through internal links and spiders can crawl it more easily.
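The click test can be approximated with a breadth-first search over the site's internal links. The sketch below assumes the links have already been collected into a dictionary. The `links` data and the function name `click_depths` are hypothetical examples.

```python
from collections import deque

def click_depths(links, home="/"):
    """Return the minimum number of clicks from the home page to each reachable page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical internal link map: page -> pages it links to.
links = {
    "/": ["/news/", "/products/"],
    "/news/": ["/news/1.html"],
    "/products/": ["/products/a/"],
    "/products/a/": ["/products/a/item.html"],
    "/products/a/item.html": ["/"],
}

depths = click_depths(links)
for page, depth in sorted(depths.items(), key=lambda item: item[1]):
    print(depth, page)
```

Pages with a large depth, or pages missing from the result entirely, are the ones that would benefit from additional internal links.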

Getting a website indexed has always been a big problem. To solve a site's indexing problems, we need to find the cause at the source, which is the IIS log. The IIS log records how search engines crawl the site: it clearly shows the total time the spider spent crawling, the time per page, the crawl depth, and whether pages were crawled repeatedly. With that information we can prescribe the right remedy and solve the site's problems at the root. The analysis looks at several aspects.
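As a rough illustration of reading these figures from the log, the sketch below sums the `time-taken` field (milliseconds in the W3C extended format that IIS writes) for requests made by one spider. The log lines are made-up samples, the function name `crawl_times` is hypothetical, and matching on the string `Baiduspider` is an assumption. Substitute the user agent of the spider you care about.

```python
from collections import defaultdict

def crawl_times(lines, spider="Baiduspider"):
    """Collect time-taken values (ms) per URL for requests made by the given spider."""
    fields = []
    per_url = defaultdict(list)
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line.split()[1:]  # column names for the rows that follow
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        if spider.lower() not in row.get("cs(User-Agent)", "").lower():
            continue
        taken = row.get("time-taken", "")
        if taken.isdigit():
            per_url[row.get("cs-uri-stem", "")].append(int(taken))
    return per_url

# Made-up sample lines in W3C extended log format.
sample_log = [
    "#Fields: date time cs-uri-stem cs(User-Agent) sc-status time-taken",
    "2013-05-01 02:10:11 /index.html Mozilla/5.0+(compatible;+Baiduspider/2.0) 200 120",
    "2013-05-01 02:10:15 /news/1.html Mozilla/5.0+(compatible;+Baiduspider/2.0) 200 480",
    "2013-05-01 02:11:02 /index.html Mozilla/5.0+(compatible;+Baiduspider/2.0) 200 80",
    "2013-05-01 02:12:40 /index.html Mozilla/5.0+(Windows+NT+6.1) 200 95",
]

times = crawl_times(sample_log)
total_ms = sum(sum(values) for values in times.values())
print("total ms:", total_ms)
for url, values in times.items():
    print(url, "requests:", len(values), "average ms:", sum(values) / len(values))
```

The columns present in a real log depend on which fields are enabled in the IIS logging settings, so `time-taken` may need to be switched on first.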

First, check whether there is an excessive crawling problem. This is simple: open the IIS log in DW, copy a URL, and search for all of its occurrences, or use a more advanced IIS log analysis tool that shows this directly. If many URLs are visited by the spider many times, it is probably because those pages are the home page or only a few clicks away from it. The usual adjustment is to reduce the number of links pointing to those URLs. Excessively crawled URLs waste the spider's crawl time.
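Instead of searching for URLs one at a time, the counts can be produced in a single pass over the log. This is a minimal sketch, assuming a W3C extended log with `cs-uri-stem`, `cs-uri-query`, and `cs(User-Agent)` columns. The log lines are invented samples, `spider_hits` is a hypothetical helper name, and `Baiduspider` stands in for whichever spider you want to track.

```python
from collections import Counter

def spider_hits(lines, spider="Baiduspider"):
    """Count how often the given spider requested each URL in a W3C-format IIS log."""
    fields = []
    hits = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line.split()[1:]  # column names for the rows that follow
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        if spider.lower() not in row.get("cs(User-Agent)", "").lower():
            continue
        url = row.get("cs-uri-stem", "")
        query = row.get("cs-uri-query", "-")
        if query != "-":  # IIS writes "-" for an empty field
            url += "?" + query
        hits[url] += 1
    return hits

# Made-up sample lines in W3C extended log format.
sample_log = [
    "#Software: Microsoft Internet Information Services",
    "#Fields: date time cs-method cs-uri-stem cs-uri-query c-ip cs(User-Agent) sc-status",
    "2013-05-01 02:10:11 GET /index.html - 192.0.2.10 Mozilla/5.0+(compatible;+Baiduspider/2.0) 200",
    "2013-05-01 02:10:15 GET /list.php sort=price 192.0.2.10 Mozilla/5.0+(compatible;+Baiduspider/2.0) 200",
    "2013-05-01 02:11:02 GET /index.html - 192.0.2.10 Mozilla/5.0+(compatible;+Baiduspider/2.0) 200",
    "2013-05-01 02:12:40 GET /index.html - 192.0.2.77 Mozilla/5.0+(Windows+NT+6.1) 200",
]

hits = spider_hits(sample_log)
for url, count in hits.most_common(10):
    print(count, url)
```

URLs at the top of the list with counts far above the rest are the candidates for excessive crawling.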

The first step can also reveal a second problem, duplicate content. If the spider crawls certain URLs many times, it may be that different URLs serve the same content: for example a static and a dynamic URL for the same page, or the sorting pages on a B2C site, where the content barely differs but the URLs do. Block these URLs with robots.txt.
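Before publishing a robots.txt rule it helps to check that it blocks the duplicate URL and nothing else. The sketch below uses Python's standard `urllib.robotparser` for this. The rule and the paths `/product.php` and `/product/123.html` are hypothetical examples of a dynamic and a static URL for the same page. Note that this parser matches rules by path prefix and does not understand wildcard patterns.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule: block the dynamic version of the product page.
robots_txt = """\
User-agent: *
Disallow: /product.php
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

def allowed(path, agent="Baiduspider"):
    """Return True if the given user agent may fetch the path under the rules above."""
    return parser.can_fetch(agent, path)

print(allowed("/product/123.html"))    # static URL, should stay crawlable
print(allowed("/product.php?id=123"))  # dynamic duplicate, should be blocked
```

The same check can be repeated for sort-order URLs or any other group of pages that turn out to share content.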

