Crawlers crawl weird long urls
-
I started a crawl for the first time and got many errors, but the weird thing is that the crawler keeps following long, duplicate URLs that don't exist.
For example (to be clear):
there is a page: www.website.com/dogs/dog.html
but then it is continuing crawling:
www.website.com/dogs/dog.html
www.website.com/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dogs/dog.html
www.website.com/dogs/dogs/dogs/dogs/dogs/dog.html
What can I do about this? Screaming Frog gave me the same issue, so I know it's something with my website.
-
Answer from Screaming Frog!
The SEO spider is crawling these URLs because of incorrect relative linking on the site, starting from the login URL.
It happens when the spider crawls the login page, http://www.website.com/login?returnurl=%2F, which leads to http://www.website.com/Home/ctl/SendPassword?returnurl=http:/www.website.com/ and then to the /home/ subdirectory URL http://www.website.com/Home/ctl/page/dogs.aspx, which links to http://www.website.com/Home/ctl/page/page/dogs.aspx, and so on. This is the path to the incorrect relative linking (attached for you). To stop this, you can correct the incorrect relative linking or, more easily, simply exclude the login page from the crawl.
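To illustrate the nesting described above, here is a minimal sketch of how a path-relative link resolves one level deeper on every hop. It assumes the offending markup is something like href="dogs/dog.html" with no leading slash; that href and the helper name simulate_crawl are assumptions for illustration, not taken from the site.

```python
from urllib.parse import urljoin


def simulate_crawl(start_url, href, hops):
    """Resolve the same href against each newly found URL, as a crawler would."""
    urls = [start_url]
    for _ in range(hops):
        # urljoin applies the standard rules for resolving a link against its page URL.
        urls.append(urljoin(urls[-1], href))
    return urls


start = "http://www.website.com/dogs/dog.html"

# Assumed broken markup: path-relative href, resolved against the current directory.
broken = simulate_crawl(start, "dogs/dog.html", 3)

# Root-relative href: always resolves to the same page, so no nesting.
fixed = simulate_crawl(start, "/dogs/dog.html", 3)

for url in broken:
    print(url)
```

With the path-relative href every hop adds another /dogs/ segment, while the root-relative version keeps resolving to the same page. If the server answers 200 for the nested paths, the crawler has no reason to stop.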
-
Wow, big mistakes were made on Home.
Maybe it's because of the .aspx extension? All pages have SEO-friendly URLs.
Thanks Wesley and Paddy Displays
-
I see a link to http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/HeutinkICT.aspx from http://www.odin-groep.nl/Home/ctl/OverOdin/ReindersICT.aspx.
It's the bottom left block which causes this link. This way you will get a big nesting effect.
-
OK found one problem
on this page
http://www.odin-groep.nl/Home/ctl/OverOdin/ReindersICT.aspx
you have a link to
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/LesscherIT.aspx
which I think should be
-
OK, I did a quick Screaming Frog crawl and I think I have an idea: you just have to follow the breadcrumbs.
You said in your example "In Links 9". You need to find out what those pages are and follow them back to the point of origin, as I think it's just one bad link that causes this nested link effect.
eg
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/OverOdin/OverOdin/HeutinkICT.aspx
is being linked from
http://www.odin-groep.nl/Home/ctl/OverOdin/OverOdin/OverOdin/StationtoStation.aspx (as well as others)
You just have to follow that trail till you find the source of the problem
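Following that trail can also be done in a few lines once you have the inlinks data. This is only a rough sketch: the inlinks dictionary below is made-up sample data shaped like a crawler's inlinks export, and the helper names are hypothetical. It walks back from a nested URL until it reaches a page whose path has no repeated folder, which is the page to inspect.

```python
from urllib.parse import urlparse


def has_repeated_segment(url):
    """True if two neighbouring path segments are identical, e.g. /dogs/dogs/."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return any(a == b for a, b in zip(segments, segments[1:]))


def trace_to_origin(url, inlinks):
    """Walk back through linking pages until a non-nested page is reached."""
    trail = [url]
    while has_repeated_segment(trail[-1]):
        sources = [s for s in inlinks.get(trail[-1], []) if s not in trail]
        if not sources:
            break  # no further inlink data, stop here
        # Prefer the shortest linking URL, which is the least nested one.
        trail.append(min(sources, key=len))
    return trail


# Hypothetical sample data: page URL -> pages that link to it.
inlinks = {
    "http://www.website.com/dogs/dogs/dogs/dog.html": [
        "http://www.website.com/dogs/dogs/dog.html",
        "http://www.website.com/dogs/dogs/dogs/dog.html",
    ],
    "http://www.website.com/dogs/dogs/dog.html": [
        "http://www.website.com/dogs/dog.html",
    ],
}

trail = trace_to_origin("http://www.website.com/dogs/dogs/dogs/dog.html", inlinks)
print(" <- ".join(trail))
```

The last URL in the trail is the first real page in the chain, so that is where the bad link most likely lives.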
-
Every link, except the homepage itself.
-
I can't see any source:
The pages are like:
| URL | www.website.com/page/ |
| Status Code | 200 |
| Status | OK |
| Type | text/html; charset=utf-8 |
| Size | 55811 |
| Title | |
| Level | 10 |
| In Links | 9 |
| Out Links | 38 |
-
Which URL(s) is/are causing problems?
-
Please feel free to check: http://tinyurl.com/lox7le9
-
You don't necessarily have to remove the link, as long as you can verify that it points to the right page.
But I'm curious to see what caused the problem.
-
I think Screaming Frog will tell you the page where it found the weird URL. Then you can check the source and find out what's producing that link.
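Once you know which page to look at, checking the source can be partly automated. Below is a small sketch, using only Python's standard library, that lists the path-relative hrefs in a chunk of HTML. The sample markup and the names RelativeLinkFinder and find_relative_hrefs are made up for illustration; paste in the real page source instead.

```python
from html.parser import HTMLParser

# Prefixes that do NOT depend on the current folder of the page.
SAFE_PREFIXES = ("/", "#", "?", "http://", "https://", "mailto:", "tel:", "javascript:")


class RelativeLinkFinder(HTMLParser):
    """Collects <a href> values that resolve relative to the current folder."""

    def __init__(self):
        super().__init__()
        self.relative_hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and not href.startswith(SAFE_PREFIXES):
            self.relative_hrefs.append(href)


def find_relative_hrefs(html):
    finder = RelativeLinkFinder()
    finder.feed(html)
    return finder.relative_hrefs


# Hypothetical page source for demonstration.
sample_html = """
<a href="/dogs/dog.html">root-relative</a>
<a href="dogs/dog.html">path-relative</a>
<a href="http://www.website.com/">absolute</a>
"""

print(find_relative_hrefs(sample_html))
```

A path-relative link is not wrong by definition, so treat the result as a list of candidates to review rather than a list of errors.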
-
That is a good one! It's true that I have the same kind of link pointing to the page itself. I will remove all links of that kind first and crawl again. I'll keep you posted!
-
Are you somehow linking to www.website.com/dogs/dog.html from the page itself? There could be something wrong with that link.
I made a small mistake not so long ago with a redirection plugin. I told it to go to domain.com, but the plugin combined the base URL with what I entered, so it went to domain.com/domain.com. Perhaps you made a similar mistake. Maybe you can send me the URL and I can take a look at it?