Question about Robot.txt
-
I just started my own e-commerce website and I hosted it to one of the popular e-commerce platform Pinnacle Cart. It has a lot of functions like, page sorting, mobile website, etc. After adjusting the URL parameters in Google webmaster last 3 weeks ago, I still get the same duplicate errors on meta titles and descriptions based from Google Crawl and SEOMOZ crawl. I am not sure if I made a mistake of choosing pinnacle cart because it is not that flexible in terms of editing the core website pages. There is now way to adjust the canonical, to insert robot.txt on every pages etc. however it has a function to submit just one page of robot.txt. and edit the .htcaccess. The website pages is in PHP format.
For example this URL:
www.mycompany.com has a duplicate title and description with www.mycompany.com/site-map.html (there is no way of editing the title and description of my sitemap)
Another error is
www.mycompany.com has a duplicate title and description with http://www.mycompany.com/brands?url=brands
Is it possible to exclude those website with "url=" and my "sitemap.html" in the robot.txt? or the URL parameters from Google is enough and it just takes a lot of time.
Can somebody help me on the format of Robot.txt. Please? thanks
-
Thank you for your reply. This surely helps. I will probably edit the htaccess.
-
That's the problem with most sitebuilder type prgrams, they are very limited.
Perhaps look at your site title, and page titles. Usually the site title will be the included on all of your webpages followed by the page title so you could simply name your site www.yourcompany.com then add an individual page title to each page.
A robots.txt file is not supposed to be added to every page and only tells the bots what to crawl, and what not to.
If you can edit the htaccess, you should be able to get to the individual pages and insert/change the code for titles, just be aware that doing it manually can work, but sometimes when you go back to make an edit in the builder it may undo all of your manual changes, if that's the case, get your site perfect, then do the individual code changes as the last change.
Hope this helps.
-
I have no way of adding those too. Ooops thanks for the warning. I guess I would have to wait for Google to filter out the parameters.
Thanks for your answer.
-
You certainly don't want to block your sitemap file in robots.txt. It takes some time for Google to filter out the parameters and that is the right approach. If there is no way to change the title, I wouldn't be so concerned over a few pages with duplicate titles. Do you have the ability to add a noindex,follow meta tag on these pages?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
No index tag robots.txt
Hi Mozzers, A client's website has a lot of internal directories defined as /node/*. I already added the rule 'Disallow: /node/*' to the robots.txt file to prevents bots from crawling these pages. However, the pages are already indexed and appear in the search results. In an article of Deepcrawl, they say you can simply add the rule 'Noindex: /node/*' to the robots.txt file, but other sources claim the only way is to add a noindex directive in the meta robots tag of every page. Can someone tell me which is the best way to prevent these pages from getting indexed? Small note: there are more than 100 pages. Thanks!
Technical SEO | | WeAreDigital_BE
Jens0 -
Little confused regarding robots.txt
Hi there Mozzers! As a newbie, I have a question that what could happen if I write my robots.txt file like this... User-agent: * Allow: / Disallow: /abc-1/ Disallow: /bcd/ Disallow: /agd1/ User-agent: * Disallow: / Hope to hear from you...
Technical SEO | | DenorL0 -
GWT returning 200 for robots.txt, but it's actually returning a 404?
Hi, Just wondering if anyone has had this problem before. I'm just checking a client's GWT and I'm looking at their robots.txt file. In GWT, it's saying that it's all fine and returns a 200 code, but when I manually visit (or click the link in GWT) the page, it gives me a 404 error. As far as I can tell, the client has made no changes to the robots.txt recently, and we definitely haven't either. Has anyone had this problem before? Thanks!
Technical SEO | | White.net0 -
Wordpress question
I was curious when i run an OSE report on certain websites and their name.wordpress.com shows up with a PA of whatever and a DA of 100. But when I created my wordpress site and post on it, it only has a PA and DA of 1. is this because SEOmoz has not indexed it yet? It is a month old. http://shiftinsurance.wordpress.com/ Can anyone help pls?
Technical SEO | | greasy0 -
Google Places Question......
Hi Guys. I am working with a photographer they do not have a studio they shoot on location. However I noticed many photographers within their industry have their home address listed in their google places, and they too shoot on location. My client doesn't want their home address listed so I wondered what options there would be? Do you think renting mail forwarding address would suffice?
Technical SEO | | RankStealer0 -
Robots.txt and canonical tag
In the SEOmoz post - http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts, it's being said - If you have a robots.txt disallow in place for a page, the canonical tag will never be seen. Does it so happen that if a page is disallowed by robots.txt, spiders DO NOT read the html code ?
Technical SEO | | seoug_20050 -
Meta tags question - imagetoolbar
We inherited some sites from another vendor & they have these tags in the head of all pages. Are they of any value at all? Thanks for the help! Wick Smith
Technical SEO | | wcksmith0 -
Sitemap question
My sitemap includes www.example.com and www.example.com/index.html, they are both the same page, will this have any negative effects, or can I remove the www.example.com/index.html?
Technical SEO | | Aftermath_SEO0