Very wierd pages. 2900 403 errors in page crawl for a site that only has 140 pages.
-
Hi there,
I just made a crawl of the website of one of my clients with the crawl tool from moz.
I have 2900 403 errors and there is only 140 pages on the website.
I will give an exemple of what the crawl error gives me.
|
http://www.mysite.com/en/www.mysite.com/en/en/index.html#?lang=en
|
http://www.mysite.com/en/www.mysite.com/en/en/en/index.html#?lang=en
|
http://www.mysite.com/en/www.mysite.com/en/en/en/en/index.html#?lang=en
|
http://www.mysite.com/en/www.mysite.com/en/en/en/en/en/index.html#?lang=en
|
http://www.mysite.com/en/www.mysite.com/en/en/en/en/en/en/index.html#?lang=en
|
http://www.mysite.com/en/www.mysite.com/en/en/en/en/en/en/index.html#?lang=en
|
http://www.mysite.com/en/www.mysite.com/en/en/en/en/en/en/en/en/en/en/en/en/index.html#?lang=en
|
http://www.mysite.com/en/www.mysite.com/en/en/en/en/en/en/en/en/en/en/en/en/en/index.html#?lang=en
|
|
|
|
|
|
|
|
|
|
There are 2900 pages like this.
I have tried visiting the pages and they work, but they are only html pages without CSS.
Can you guys help me to see what the problems is. We have experienced huge drops in traffic since Septembre.
-
Thank you so much for your response!
Yes. Could you please email me at eliotostiguy@gmail.com? I will be able to give you the url via email
-
Almost right, but 'just about' wrong; the 403 error is only served once an URL 'is' accessed. The content may not be accessible (as it's forbidden) but the URL itself, still is. Whilst it's unlikely that these URLs would ever be indexed, there's still an infinite loop in the link architecture which could impact upon crawl allowance and site health metrics
I'd get it sorted out!
-
but 403 is a forbidden error so those pages wouldn't be getting accessed from google. Google can't access them which in this case is a good thing right.
-
This is almost assuredly a link-based architectural error. It will be something similar to this:
- You load a page on EN
- You click the EN flag or language icon
- Instead of just reloading the page you are already on (since you're already on EN) the link is coded wrong and adds another /EN/ layer to the URL
- Once the new URL loads, the problem can be repeated
- This creates infinity URLs on your site
- Bad for Google, and Moz's crawler
Bet you it's something like that. If you give me the exact URL I might even be able to find the flaw and detail it for you via email or something
-
Hi there,
Thanks so much for reaching out - Sam from Moz's Help Team here!
I'm just going to be reaching out to you directly from help@moz.com about this, after taking a look into your campaign and crawl. I'll be in touch soon!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Website crawl error
Hi all, When I try to crawl a website, I got next error message: "java.lang.IllegalArgumentException: Illegal cookie name" For the moment, I found next explanation: The errors indicate that one of the web servers within the same cookie domain as the server is setting a cookie for your domain with the name "path", as well as another cookie with the name "domain" Does anyone has experience with this problem, knows what it means and knows how to solve it? Thanks in advance! Jens
Technical SEO | | WeAreDigital_BE0 -
Site Crawl -> Duplicate Page Content -> Same pages showing up with duplicates that are not
These, for example: | https://im.tapclicks.com/signup.php/?utm_campaign=july15&utm_medium=organic&utm_source=blog | 1 | 2 | 29 | 2 | 200 |
Technical SEO | | writezach
| https://im.tapclicks.com/signup.php?_ga=1.145821812.1573134750.1440742418 | 1 | 1 | 25 | 2 | 200 |
| https://im.tapclicks.com/signup.php?utm_source=tapclicks&utm_medium=blog&utm_campaign=brightpod-article | 1 | 119 | 40 | 4 | 200 |
| https://im.tapclicks.com/signup.php?utm_source=tapclicks&utm_medium=marketplace&utm_campaign=homepage | 1 | 119 | 40 | 4 | 200 |
| https://im.tapclicks.com/signup.php?utm_source=blog&utm_campaign=first-3-must-watch-videos | 1 | 119 | 40 | 4 | 200 |
| https://im.tapclicks.com/signup.php?_ga=1.159789566.2132270851.1418408142 | 1 | 5 | 31 | 2 | 200 |
| https://im.tapclicks.com/signup.php/?utm_source=vocus&utm_medium=PR&utm_campaign=52release | Any suggestions/directions for fixing or should I just disregard this "High Priority" moz issue? Thank you!0 -
Google has deindexed 40% of my site because it's having problems crawling it
Hi Last week i got my fifth email saying 'Google can't access your site'. The first one i got in early November. Since then my site has gone from almost 80k pages indexed to less than 45k pages and the number is lowering even though we post daily about 100 new articles (it's a online newspaper). The site i'm talking about is http://www.gazetaexpress.com/ We have to deal with DDoS attacks most of the time, so our server guy has implemented a firewall to protect the site from these attacks. We suspect that it's the firewall that is blocking google bots to crawl and index our site. But then things get more interesting, some parts of the site are being crawled regularly and some others not at all. If the firewall was to stop google bots from crawling the site, why some parts of the site are being crawled with no problems and others aren't? In the screenshot attached to this post you will see how Google Webmasters is reporting these errors. In this link, it says that if 'Error' status happens again you should contact Google Webmaster support because something is preventing Google to fetch the site. I used the Feedback form in Google Webmasters to report this error about two months ago but haven't heard from them. Did i use the wrong form to contact them, if yes how can i reach them and tell about my problem? If you need more details feel free to ask. I will appreciate any help. Thank you in advance C43svbv.png?1
Technical SEO | | Bajram.Kurtishaj1 -
Test site got indexed in Google - What's the best way of getting the pages removed from the SERP's?
Hi Mozzers, I'd like your feedback on the following: the test/development domain where our sitebuilder works on got indexed, despite all warnings and advice. The content on these pages is in active use by our new site. Thus to prevent duplicate content penalties we have put a noindex in our robots.txt. However off course the pages are currently visible in the SERP's. What's the best way of dealing with this? I did not find related questions although I think this is a mistake that is often made. Perhaps the answer will also be relevant for others beside me. Thank you in advance, greetings, Folko
Technical SEO | | Yarden_Uitvaartorganisatie0 -
Home page indexed but not ranking...interior pages with thin content outrank home page??
I have a Joomla site with a home page that I can't get to rank for anything beyond the company name @ Google - the site works fine @ Bing and Yahoo. The interior pages will rank all day long but the home page never shows up in the results. I have checked the page code out in every tool that I know about and have had no luck....by all account it should be good to go...any thoughts/comments/help would be greatly appreciated. The site is http://www.selectivedesigns.com Thanks! Greg
Technical SEO | | DougHosmer0 -
Googlebot take 5 times longer to crawl each page
Hello All From about mid September my GWMT has show that the average time to crawl a page on my site has shot up from an average of 130ms to an average of 700ms and peaks at 4000ms. I have checked my server error logs and found nothing there, I have checked with the hosting comapny and there are no issues with the server or other sites on the same server. Two weeks after this my ranking fell by about 950 places for most of my keywords etc.I am really just trying to eliminate this as a possible cause, of these ranking drops. Or was it the Pand/ EMD algo that has done it. Many Thanks Si
Technical SEO | | spes1230 -
Page crawling is only seeing a portion of the pages. Any Advice?
last couple of page crawls have returned 14 out of 35 pages. Is there any suggestions I can take.
Technical SEO | | cubetech0 -
Recently revamped site structure - now not even ranking for brand name, but lots of content - what happened? (Yup, the site has been crawled a few times since) Any ideas? Did I make a classic mistake? Any advise appreciated :)
I've completely disappeared off Google - what happened? Even my brand name keyword does not bring up my website - I feel lost, confused and baffled on what my next steps should be. ANY advice would be welcome, since there's no going back to the way the site was set up.
Technical SEO | | JeanieWalker0