Crawlers crawl weird long urls
-
I did a crawl start for the first time and i get many errors, but the weird fact is that the crawler tracks duplicate long, not existing urls.
For example (to be clear):
there is a page: www.website.com/dogs/dog.html
but then it is continuing crawling:
www.website.com/dogs/dog.html
www.website.com/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dogs/dogs/dog.htmlwhat can I do about this? Screaming Frog gave me the same issue, so I know it's something with my website
-
Answer from Screaming Frog!
The reason the SEO spider is crawling these URLs, is due to incorrect relative linking on the site from the login URL.
It's actually when the spider crawls the login page, http://www.website.com/login?returnurl=%2F which then leads to this URL http://www.website.com/Home/ctl/SendPassword?returnurl=http:/www.website.com/ and then this /home/ sub directory URL http://www.website.com/Home/ctl/page/dogs.aspx which links to http://www.website.com/Home/ctl/page/page/dogs.aspx and so on and so forth. This is the path to the incorrect relative linking (attached for you).To stop this, you can correct the incorrect relative linking, or easier, simply exclude the login page.
-
Wow, Big mistakes are made one Home
maybe because of the .aspx. extension? alle pages have seo-friendly urls
Thanks Wesley and Paddy Displays
-
I see a link to http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/HeutinkICT.aspx from http://www.odin-groep.nl/Home/ctl/OverOdin/ReindersICT.aspx.
It's the bottom left block which causes this link. This way you will get a big nesting effect.
-
OK found one problem
on this page
http://www.odin-groep.nl/Home/ctl/OverOdin/ReindersICT.aspx
you have a link to
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/LesscherIT.aspx
which i think should be
-
ok I did a quick screaming fog and I think I have an idea, you just have to follow the breadcrumbs
You said in you example "In Links 9", you need to find out what those pages are and follow it back to the point of origin As I think its just one bad link that cause this nested link effect.
eg
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/OverOdin/OverOdin/HeutinkICT.aspx
is being linked from
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/OverOdin/StationtoStation.aspx (as well as others)
You just have to follow that trail till you find the source of the problem
-
every link, except the hompage itself
-
I can't see any source:
The pages are like:
| URL | www.website.com/page/ |
| Status Code | 200 |
| Status | OK |
| Type | text/html; charset=utf-8 |
| Size | 55811 |
| Title | |
| Level | 10 |
| In Links | 9 |
| Out Links | 38 | -
Which URL(s) is/are causing problems?
-
please be free to check: http://tinyurl.com/lox7le9
-
You don't necessarily have to remove the link. As long as you can verify that it directs to the right page.
But curious to see what caused the problem
-
I think Screaming Frog will tell you the page it found the weird url, then you can check the source, and find out whats producing that link.
-
That is a good one! It's true that I have the same linking to the page itself. I will remove all that kind of links first and crawl again. I'll keep you in touch!
-
Are you somehow linking to www.website.com/dogs/dog.html from the page itself? There could be something wrong with that link.
I made a small mistake not so long ago with a redirection plugin. I told it to go to domain.com. This plugin was looking at the base + what i told it to. So it went to: domain.com/domain.com. Perhaps you made a similar mistake.Maybe you can send me the URL and i can take a look at it?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Weird client errors . . .
SeoMoz is reporting a number of weird client errors. The 404 links all look like the following: http://www.bluelinkerp.com/http%3A/www.bluelinkerp.com/corporate/cases/Nella.asp What might be causing these weird links to be picked up? I couldn't find any way within the SEOmoz interface to track down the source of these links . . .
Moz Pro | | BlueLinkERP0 -
Was there another issue crawling rankings on 9th October?
Again this month I don't appear to have rankings logged for 9th October. Was there an issue earlier in the month?
Moz Pro | | NathanP0 -
Crawl Diagnostics - Canonical Question
On one of my sites I have 61 notices for Rel Canonical. Is it bad to have these or is this just something that's informative?
Moz Pro | | kadesmith0 -
Finding the source of duplicate content URL's
We have a website that displays a number of products. The product has variations (sizes) and unfortunately every size has its own URL (for now anyway). Needless to say, this causes duplicate content issues. (And of course, we are looking to change the URL's for our site as soon as possible) However, even though these duplicate URL's exist, you should not be able to land on them by navigating through the site. In theory, the site should always display the link to the smallest size. It seems that there is a flaw in our system somewhere, as these links are now found in our campaign here on SEOmoz. My question: is there any way to find the crawl path that lead to the URL's that shouldn't have been found, so we can locate the problem?
Moz Pro | | DocdataCommerce0 -
Confusion about how SEOMOZ crawler works...
So according to my SEOMOZ dashboard, I'm ranking between #3-4 for one of my keywords. My keyword is 'Boston Wedding Photographer'. My site is http://www.symbolphoto.com I show up in google places, true. But i was wanting to rank organically. Am i right in the assumption that Google Places and Google Organic are not the same thing? SEOMOZ claims 3,4th but not organically(Assuming they aren't the same thing) I get pretty good traffic right now being in Places, but i can't help but feel that organically ranking would bring more traffic. Any suggestions or advice is greatly appreciated. TIA! -Brendan
Moz Pro | | symbolphoto0 -
404 errors in SEOMoz crawl tool
I currently have several 404 errors in the latest crawls from SEOMoz. Here is an example of the error. http://dealerplatform.com/blog/2011/10/23/videos-for-auto-dealers/www.dealerplatform.com/ In all cases the error is a result of www.dealerplatform.com being added to the real url. Anyone seen this before? The site is a wordpress mutlisite. I don't see where this incorrect link is showing up anywhere on the website. Any advice would be helpful. Thanks
Moz Pro | | Chris_Gregory0 -
Crawl diagnostic Notices for rel Canonical increased
Hello, We just signed up for SEO Moz, and are reviewing the results of our second web crawl. Our Errors and Warnings summary have been reduced, but our Notices for Rel Canonical have skyrocketed from 300 to over 5,500. We are using a WP with the Headway theme and our pages already have the rel=canonical along wiht rel=author. Any ideas why this number would go up so much in one week? Thank you, Michael
Moz Pro | | MKaloud0 -
How can I clean up my crawl report from duplicate records?
I am viewing my Crawl Diagnostics Report. My report is filled with data which really shouldn't be there. For example I have a page: http://www.terapvp.com/forums/Ghost/ This is a main forum page. It contains a list of many threads. The list can be sorted on many values. The page is canonicalized, and has been since it was created. My crawl report shows this page listed 15 times. http://www.terapvp.com/forums/Ghost/?direction=asc http://www.terapvp.com/forums/Ghost/?direction=desc http://www.terapvp.com/forums/Ghost/?order=post_date and so forth. Each of those pages uses the same canonicalization reference shared above. I have three questions: Why is this data appearing in my crawl report? These pages are properly canonicalized. If these pages are supposed to appear in the report for some reason, how can I remove them? My desire is to focus on any pages which may have an issue which needs to be addressed. This site has about 50 forum pages and when you add an extra 15 pages per forum, it becomes a lot harder to locate actionable data. To make matters worse, these forum indexes often have many pages. So if I have a "Corvette" forum there that is 10 pages long, then there will be 150 extra pages just for that particular forum in my crawl report. Is there anything I am missing? To the best of my knowledge everything is set up according to the best SEO practices. If there is any other opinions, I would like to hear them.
Moz Pro | | RyanKent0