Mod rewrite question
-
Sorry in advance if this isn't the best place to ask this question.
Google Webmaster Tools has recently identified a ton of "Not Found" pages, which are actual pages with some digits appended at the end.
For example, suppose an actual page on my blog is:
(A) http://www.example.com/blog/2012/09/my-post-title/
This page works just fine.
However, GWT has identified the following page as a "not found" page:
(B) http://www.example.com/blog/2012/09/my-post-title/9157586677/1846732913010
This appears to be happening to hundreds of posts on my site. In each case, the "9157586677" portion of the URL is identical, but the remaining 13 digits change from page to page.
I haven't been able to determine exactly what is causing this to happen. It's probably a social plug-in for WordPress, or perhaps Disqus, but I'm not sure which one. I'll go through a process of elimination to narrow it down over the coming week.
As a quick fix, I'd like to create a mod_rewrite rule so that requests for (B) get 301-redirected to (A). Since there are hundreds of posts, I need to do this in a way that works regardless of what's in the "/2012/09/my-post-title/" part of the URL.
Unfortunately, mod_rewrite is outside my area of expertise. Can somebody please suggest how I can handle this? Thanks in advance.
PS - As for tracking down the cause, I've looked at the source of the pages in the "Linked From" area of GWT and the Not Found link is nowhere to be found. That is why I assume the bad link is being generated by some JavaScript that is part of one of my plug-ins.
Update: It seems like Disqus is the source of these phantom links. There's considerable discussion here. I'll continue searching for a long-term solution. Meanwhile, I'd still appreciate help with the mod_rewrite question above. Thanks again.
-
I've found a solution and am posting it here in case anybody else is having the same problem:
RewriteRule ^([0-9]{4})/([0-9]{2})/([^/]+)/[0-9]+ /blog/$1/$2/$3/ [L,R=301]
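For anyone who wants to sanity-check the rule before deploying it, here is a rough Python sketch that applies the same pattern to a few sample paths. It rests on two assumptions that the post above does not confirm: that the rule lives in an .htaccess file inside the /blog/ directory (so Apache matches it against the path relative to /blog/), and that Python's `re` module treats this simple pattern the same way Apache's regex engine does. The function name `redirect_target` is made up for illustration.

```python
import re

# Pattern copied from the RewriteRule above. Like Apache, we search without
# anchoring the end of the string, so anything after the first run of digits
# is ignored.
PATTERN = re.compile(r"^([0-9]{4})/([0-9]{2})/([^/]+)/[0-9]+")


def redirect_target(relative_path):
    """Return the 301 target for a path relative to /blog/ (assumed),
    or None if the rule would not fire."""
    match = PATTERN.search(relative_path)
    if match is None:
        return None
    year, month, slug = match.groups()
    return f"/blog/{year}/{month}/{slug}/"


if __name__ == "__main__":
    samples = [
        "2012/09/my-post-title/9157586677/1846732913010",  # phantom URL (B)
        "2012/09/my-post-title/",                          # real post (A)
        "2012/09/my-post-title/2",                         # any numeric suffix
    ]
    for path in samples:
        print(path, "->", redirect_target(path))
```

The phantom URL (B) maps back to (A), and the real post is left alone. Note that the pattern fires on any numeric segment after the post slug, so a path such as 2012/09/my-post-title/2 would be redirected as well. If your site uses numeric suffixes for legitimate pages, the pattern may need to be tightened.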
-
I hadn't seen the update about Disqus at the end of the post.
Please post any progress you make on this topic, Ahirai.
Best regards!
-
Hi ahirai,
I was going to say you should check the "Linked From" tab in GWT, but since you already did, I'm fairly sure that a plugin that generates content is creating these links from scratch.
Since I'm not an Apache expert either, I can't give you a method to do the dirty work, but I can tell you the problem is caused by some third-party plugin that generates content on the site.
Please post your progress in this thread!
Good luck!!