Googlebot and other spiders are crawling odd links on our website. Trying to understand why, and what to do about it.
-
I recently began work on an existing WordPress website that was revamped about 3 months ago: https://thedoctorwithin.com. I'm a bit new to WordPress, so I thought I should reach out to some of the experts in the community.

Checking ‘Not found’ crawl errors in Google Search Console, I notice many irrelevant links that are not present in the website, nor in the database, as near as I can tell. When I check the source of these irrelevant links, they're all reported as linked from various pages in the site, as well as from pages that supposedly exist in the site but never have.
For instance:
- https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/feedback-and-testimonials/ allegedly linked from:
- https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/ (doesn’t exist)
In other cases, Search Console claims these goofy URLs are even linked from the sitemap. BTW, all the URLs actually in the sitemap are valid.
Currently, the site has a flat structure. Nearly all the content lives at URL/content/ without further breakdown (or subdirectories). Previous versions of the site had a more varied page organization, but what I'm seeing doesn't reflect the current page organization, nor the previous one.
I had a similar issue earlier, due to Divi's search feature. I ended up with some pretty deep non-existent links branching off of /search/, such as:
- https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/feedback-and-testimonials/online-continuing-education/consultations/ allegedly linked from:
- https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/feedback-and-testimonials/online-continuing-education/ (doesn't exist).
I blocked the /search/ branches via robots.txt. No real loss, since neither /search/ nor any of its subdirectories are valid.
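For reference, the block is just a plain robots.txt Disallow (this is the actual path from the examples above):

```
User-agent: *
Disallow: /search/
```

One caveat: robots.txt stops compliant spiders from crawling those paths, but URLs that are already indexed can take a while to drop out of the results.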
There are numerous pre-existing categories and tags on the site, though the categories and tags aren't used as pages. I suspect Google (and other engines) might be creating arbitrary paths from these. Looking through the site's 404 errors, I'm seeing the same behavior from Bing, Moz, and other spiders as well.
I suppose I could use Search Console to remove URL/category/ and URL/tag/, and do the same with other legitimate spiders / search engines. Perhaps it would be better to use mod_rewrite to lead spiders to pages that actually do exist.
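If I go the mod_rewrite route, I'm picturing something like this in .htaccess (an untested sketch; the pattern is based on the malformed URLs above and would need adjusting to the site's real permalink structure):

```
<IfModule mod_rewrite.c>
RewriteEngine On
# Collapse malformed nested category paths like
# /category/seminars/newsletters/page/7/newsletters/page/3/
# down to the first (real) category segment.
# Valid pagination URLs like /category/seminars/page/2/ are not matched.
RewriteRule ^category/([^/]+)/.+/page/ /category/$1/ [R=301,L]
</IfModule>
```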
- Looking forward to suggestions about the best way to deal with these errant crawls.
- Also curious to learn why they're occurring.
Thank you.
-
Thanks, Kevin.
Glad I'm not the only one.
Disabling tags and categories isn't an option, in my case. Guess I need to look at more of the potential upside. It seems tags and categories, if handled correctly, could provide a new way to engage visitors and search engines.
I've heard people refer to 'crawl budgets', or whatnot. Guess that's an entirely new topic of discussion... whether limiting the spurious spidering (by good spiders) means those spiders will spend more time on the conventional pathways of a site.
-
Thanks, Vijay.
I did a lot of work fixing links in the database.
The issue was occurring even before WP Super Cache was implemented, and before the link fixing.
Being new-ish to WP, it seems strange that it's so willing to provide access via directories that don't really exist: categories, tags, even search (if using a theme-provided site search).
I'm getting better at .htaccess, so I'm able to handle a lot of the old incoming links fairly well. I'll try to address these weird 'in the mind of the spiders' links as well.
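For the old incoming links, one-line 301s in .htaccess have been doing the job (the old path below is a made-up placeholder, not one of the site's real URLs):

```
# Map a retired page to its current home (example paths only)
Redirect 301 /old-newsletter-archive.html https://thedoctorwithin.com/newsletters/
```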
Thanks for your advice about the 404 and 301 plugins. Time to look around and see what other useful tools are out there.
-
I have the same issue. I've stopped using tags because of all the irrelevant links they cause. Looking forward to reading the comments on this thread.
KJr
-
Hi There,
Your website is built on WordPress, and it looks like there might be spurious entries in the DB, which might also not be getting deleted because of the WP Super Cache plugin. You could try emptying your cache and installing an 'all 404 redirect' plugin along with a 301 management plugin.
I hope this helps.
Regards,
Vijay