Strange Webmaster Tools Crawl Report
-
Up until recently I had robots.txt blocking the indexing of my PDF files, which are all manuals for products we sell. I changed this last week to allow indexing of those files, and now my Webmaster Tools crawl report is listing all my PDFs as Not Found errors.
What is really strange is that Webmaster Tools is listing an incorrect link structure: "domain.com/file.pdf" instead of "domain.com/manuals/file.pdf".
Why is Google indexing these particular pages incorrectly? My robots.txt has nothing else in it besides a disallow for an entirely different folder on my server, and my .htaccess is not redirecting anything related to my manuals folder either. Even where the crawl report lists outside links supposedly pointing to the 404 file, when I visit those third-party pages they have the correct link structure.
Hope someone can help, because right now my Not Founds are up in the 500s and that can't be good.
Thanks in advance!
-
Hello,
Did you check the "Linked from" tab? Click on each error and see which pages link to that URL.
-
Thanks for the help Wissam!
What I have done is change all relative paths to direct (absolute) ones. Then I ran Screaming Frog and it did not pick up any 404s at all; this was last Thursday. Unfortunately, Webmaster Tools is still reporting the same style of 404s as having been discovered since then. Is there a reason why Screaming Frog and Webmaster Tools would be seeing different crawl results?
-
Every link reported in GWT is based on a crawl, so there is either an external or an internal link pointing to these domain.com/file.pdf URLs.
So what I would do is fire up Screaming Frog or Xenu, do a full site crawl and check the reports. You might find some pages using relative URLs in their a href elements.
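To illustrate how a relative href can produce the wrong URL, here is a minimal Python sketch using the standard library's `urljoin`. The page URLs (`index.html`, `products.html`) and the `http://` scheme are assumptions made up for the example; only `domain.com`, `/manuals/` and `file.pdf` come from the question.

```python
from urllib.parse import urljoin

# Hypothetical relative link, as it might appear in an <a href="..."> element.
href = "file.pdf"

# Linked from a page inside /manuals/, the relative href resolves correctly.
from_manuals = urljoin("http://domain.com/manuals/index.html", href)

# Linked from a page at the site root, the same href loses the folder.
from_root = urljoin("http://domain.com/products.html", href)

print(from_manuals)  # http://domain.com/manuals/file.pdf
print(from_root)     # http://domain.com/file.pdf
```

A crawler resolves links the same way, so one relative link on a page outside the manuals folder would be enough to create a domain.com/file.pdf entry.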
If you end up with external links pointing to the wrong URLs, I would recommend either contacting those sites or just 301 redirecting /file.pdf to /manuals/file.pdf.
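The redirect itself would normally live in the server configuration, but the mapping it needs can be sketched in Python. This is only an illustration of the logic, assuming every PDF requested at the site root really belongs in /manuals/; the function name `redirect_target` is hypothetical.

```python
import posixpath

MANUALS_DIR = "/manuals"  # folder named in the question


def redirect_target(path):
    """Return the 301 target for a root-level PDF path, or None otherwise."""
    directory, filename = posixpath.split(path)
    # Only rewrite PDFs requested directly under the site root.
    if directory == "/" and filename.lower().endswith(".pdf"):
        return posixpath.join(MANUALS_DIR, filename)
    return None


print(redirect_target("/file.pdf"))          # /manuals/file.pdf
print(redirect_target("/manuals/file.pdf"))  # None
```

Paths already inside /manuals/ are left alone, so the rule cannot redirect to itself in a loop.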