Question about robots.txt
-
I just started my own e-commerce website, hosted on Pinnacle Cart, one of the popular e-commerce platforms. It has a lot of features, like page sorting, a mobile website, etc. I adjusted the URL parameters in Google Webmaster Tools three weeks ago, but I still get the same duplicate meta title and description errors in both the Google crawl and the SEOmoz crawl. I am not sure if I made a mistake choosing Pinnacle Cart, because it is not very flexible when it comes to editing the core website pages. There is no way to adjust the canonical, to insert robots.txt on every page, etc. However, it does let me submit a single robots.txt file and edit the .htaccess. The website pages are in PHP.
For example this URL:
www.mycompany.com has a duplicate title and description with www.mycompany.com/site-map.html (there is no way of editing the title and description of my sitemap)
Another error is
www.mycompany.com has a duplicate title and description with http://www.mycompany.com/brands?url=brands
Is it possible to exclude the URLs containing "url=" and my "sitemap.html" in robots.txt? Or are the URL parameter settings in Google enough, and it just takes a lot of time?
Can somebody help me with the format of robots.txt, please? Thanks.
-
Thank you for your reply. This surely helps. I will probably edit the htaccess.
-
That's the problem with most site-builder type programs: they are very limited.
Perhaps look at your site title and page titles. Usually the site title is included on all of your web pages, followed by the page title, so you could simply name your site www.yourcompany.com and then add an individual page title to each page.
A robots.txt file is not supposed to be added to every page. It is a single file at the root of the site, and it only tells the bots what to crawl and what not to.
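To make the "one file, not one per page" point concrete, here is a minimal sketch in Python that parses a robots.txt and asks whether a given URL may be crawled. The `/checkout/` rule and the `is_allowed` helper are made up for illustration; they are not something Pinnacle Cart provides, and the standard-library parser only does simple prefix matching, so treat this as a rough check rather than how Googlebot itself evaluates rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one robots.txt for the whole site.
# The /checkout/ path is an assumption used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
"""


def is_allowed(url: str, robots_txt: str = ROBOTS_TXT, agent: str = "*") -> bool:
    """Return True if `agent` may crawl `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.mycompany.com/",
        "http://www.mycompany.com/site-map.html",
        "http://www.mycompany.com/checkout/cart",
    ):
        print(url, "allowed" if is_allowed(url) else "blocked")
```

Every URL is checked against the same set of rules, which is why the file only needs to exist once for the whole site.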
If you can edit the .htaccess, you should be able to get to the individual pages and insert or change the code for titles. Just be aware that doing it manually can work, but sometimes when you go back to make an edit in the builder it may undo all of your manual changes. If that's the case, get your site perfect first, then make the individual code changes last.
Hope this helps.
-
I have no way of adding those either. Oops, thanks for the warning. I guess I will have to wait for Google to filter out the parameters.
Thanks for your answer.
-
You certainly don't want to block your sitemap file in robots.txt. It takes some time for Google to filter out the parameters and that is the right approach. If there is no way to change the title, I wouldn't be so concerned over a few pages with duplicate titles. Do you have the ability to add a noindex,follow meta tag on these pages?