Crawlers crawl weird long urls
-
I did a crawl start for the first time and i get many errors, but the weird fact is that the crawler tracks duplicate long, not existing urls.
For example (to be clear):
there is a page: www.website.com/dogs/dog.html
but then it is continuing crawling:
www.website.com/dogs/dog.html
www.website.com/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dogs/dogs/dog.htmlwhat can I do about this? Screaming Frog gave me the same issue, so I know it's something with my website
-
Answer from Screaming Frog!
The reason the SEO spider is crawling these URLs, is due to incorrect relative linking on the site from the login URL.
It's actually when the spider crawls the login page, http://www.website.com/login?returnurl=%2F which then leads to this URL http://www.website.com/Home/ctl/SendPassword?returnurl=http:/www.website.com/ and then this /home/ sub directory URL http://www.website.com/Home/ctl/page/dogs.aspx which links to http://www.website.com/Home/ctl/page/page/dogs.aspx and so on and so forth. This is the path to the incorrect relative linking (attached for you).To stop this, you can correct the incorrect relative linking, or easier, simply exclude the login page.
-
Wow, Big mistakes are made one Home
maybe because of the .aspx. extension? alle pages have seo-friendly urls
Thanks Wesley and Paddy Displays
-
I see a link to http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/HeutinkICT.aspx from http://www.odin-groep.nl/Home/ctl/OverOdin/ReindersICT.aspx.
It's the bottom left block which causes this link. This way you will get a big nesting effect.
-
OK found one problem
on this page
http://www.odin-groep.nl/Home/ctl/OverOdin/ReindersICT.aspx
you have a link to
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/LesscherIT.aspx
which i think should be
-
ok I did a quick screaming fog and I think I have an idea, you just have to follow the breadcrumbs
You said in you example "In Links 9", you need to find out what those pages are and follow it back to the point of origin As I think its just one bad link that cause this nested link effect.
eg
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/OverOdin/OverOdin/HeutinkICT.aspx
is being linked from
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/OverOdin/StationtoStation.aspx (as well as others)
You just have to follow that trail till you find the source of the problem
-
every link, except the hompage itself
-
I can't see any source:
The pages are like:
| URL | www.website.com/page/ |
| Status Code | 200 |
| Status | OK |
| Type | text/html; charset=utf-8 |
| Size | 55811 |
| Title | |
| Level | 10 |
| In Links | 9 |
| Out Links | 38 | -
Which URL(s) is/are causing problems?
-
please be free to check: http://tinyurl.com/lox7le9
-
You don't necessarily have to remove the link. As long as you can verify that it directs to the right page.
But curious to see what caused the problem
-
I think Screaming Frog will tell you the page it found the weird url, then you can check the source, and find out whats producing that link.
-
That is a good one! It's true that I have the same linking to the page itself. I will remove all that kind of links first and crawl again. I'll keep you in touch!
-
Are you somehow linking to www.website.com/dogs/dog.html from the page itself? There could be something wrong with that link.
I made a small mistake not so long ago with a redirection plugin. I told it to go to domain.com. This plugin was looking at the base + what i told it to. So it went to: domain.com/domain.com. Perhaps you made a similar mistake.Maybe you can send me the URL and i can take a look at it?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
WWW used in research URL, or not to WWW
Long time user, infrequent poster.... thanks for taking my question... When I go to gather a series of data elements on a company's URL, the data changes (sometime dramatically) depending on whether the 'www.' is added to the URL & it seems related more to Page data than Domain. My question is about which data I should be using to assess the real strength of the site / page? Is there a 'best practice' question here, a personal preference or is there an actual difference in the performance of the www vs the non-www version? aquGYdz
Moz Pro | | SWGroves0 -
5XX (Server Error) on all urls
Hi I created a couple of new campaigns a few days back and waited for the initial crawl to be completed. I have just checked and both are reporting 5XX (Server Error) on all the pages it tried to look at (one site I have 110 of these and the other it only crawled the homepage). This is very odd, I have checked both sites on my local pc, alternative pc and via my windows vps browser which is located in the US (I am in UK) and it all works fine. Any idea what could be the cause of this failure to crawl? I have pasted a few examples from the report | 500 : TimeoutError http://everythingforthegirl.co.uk/index.php/accessories.html 500 1 0 500 : Error http://everythingforthegirl.co.uk/index.php/accessories/bags.html 500 1 0 500 : Error http://everythingforthegirl.co.uk/index.php/accessories/gloves.html 500 1 0 500 : Error http://everythingforthegirl.co.uk/index.php/accessories/purses.html 500 1 0 500 : TimeoutError http://everythingforthegirl.co.uk/index.php/accessories/sunglasses.html | 500 | 1 | 0 | Am extra puzzled why the messages say time out. The server dedicated is 8 core with 32 gb of ram, the pages ping for me in about 1.2 seconds. What is the rogerbot crawler timeout? Many thanks Carl
Moz Pro | | GrumpyCarl0 -
How do i get the crawler going again?
The initial crawl only hit one page. Set up another campaign for another site and it crawled 260 pages. How can I get the crawler started up again or do I really have to wait a week ?
Moz Pro | | martJ0 -
How do I retrieve crawl and ranking data about a site from the past?
Hey. One of my main clients has asked to see the crawl data and rankings data for the past eight months. He wants to have tangible evidence of the effects of Penguin. I would like that info too. Is it possible to retrieve that information on a weekly crawl and ranking basis through SEO Moz and if so, how do you do it? I simply want to show a graph, timeline and brief explanation across several main keywords... Help me as you guys always do - You rock Best Ben
Moz Pro | | creativeguy0 -
Are the CSV downloads malformatted, when a comma appears in a URL?
Howdy folks, we've been a PRO member for about 24 hours now and I have to say we're loving it! One problem I am having with however is a CSV exported from our crawl diagnostics summary that I've downloaded. The CSV contains all the data fine, however I am having problems with it when a URL contains a comma. I am making a little tool to work with the CSVs we download and I can't parse it properly because there sometimes URLs contain commas and aren't quoted the same as other fields, such as meta_description_tag, are. Is there something simple I'm missing or is it something that can be fixed? Looking forward to learn more about the various tools. Thanks for the help.
Moz Pro | | Safelincs0 -
How long does it take for a campaign website crawl to be completed?
Our campaign website crawl has been 'crawling' now for 5 days. Is this a normal phenomenon or is something hanging up?
Moz Pro | | Discountvc0 -
Only 1 page has been crawled. Why?
I set a new profile up a fortnight ago. Last week seomoz crawled the entire site (10k pages), and this week has only crawled 1 page. Nothing's changed on the site that I'm aware of, so what's happened?
Moz Pro | | tompollard0