Can you see the 'indexing rules' that are in place for your own site?
-
By 'indexing rules' I mean the conditions that determine whether or not a given page will be indexed.
If you can see them - how?
-
Unfortunately, that would be specific to your own platform and server-side code. When you view the source of an SEOmoz page, you're either going to see a nofollow or you're not. The code that decides that lives on our servers and is unique to our build (PHP/Cake, I think).
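To illustrate the point about only seeing the output: from the outside, all you can do is read the directives a page ends up with, not the rule that produced them. Here is a minimal sketch in Python (standard library only) that pulls the meta robots directives and the nofollow links out of a page's HTML. The class and function names are my own, and the sample markup in the usage note is made up.

```python
from html.parser import HTMLParser


class RobotsDirectiveScanner(HTMLParser):
    """Collects meta robots directives and nofollow links from rendered HTML."""

    def __init__(self):
        super().__init__()
        self.meta_directives = []  # e.g. ["noindex", "follow"]
        self.nofollow_links = []   # href values of links marked rel="nofollow"

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            # content is a comma-separated list such as "NOINDEX, follow"
            content = attrs.get("content") or ""
            self.meta_directives += [
                part.strip().lower() for part in content.split(",") if part.strip()
            ]
        elif tag == "a" and "nofollow" in (attrs.get("rel") or "").lower().split():
            self.nofollow_links.append(attrs.get("href"))


def scan(html):
    """Return (meta robots directives, nofollow hrefs) found in the HTML."""
    scanner = RobotsDirectiveScanner()
    scanner.feed(html)
    return scanner.meta_directives, scanner.nofollow_links
```

Running `scan()` over a few profile pages (say, one with many submissions and one with none) shows which pages carry a noindex or nofollow, but the threshold itself still has to come from the server-side code.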
You'd have to dig into the source code that generates the robots.txt file. I don't think you can have a fully dynamic robots.txt (it has to have a .txt extension), so there must be a piece of code that generates a new robots.txt file, probably on a timer. It could be called something similar, like Robots.php, Robots.aspx, etc. Just a guess.
FYI, a dynamic robots.txt could be a little dicey. It might be better to do this with a META NOINDEX tag in the <head> of the user profile pages. That would also avoid the timer approach: the pages would dynamically NOINDEX themselves as they're created.
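As a rough sketch of the META NOINDEX approach (not the actual code on anyone's site), the profile template could ask a small helper for the tag each time the page is rendered. This is Python purely for illustration; the function name and the threshold constant are hypothetical, and the value 10 is only borrowed from the example given elsewhere in this thread.

```python
# Hypothetical threshold; the real value lives in the site's own server-side code.
MIN_FILES_FOR_INDEXING = 10


def robots_meta_tag(files_submitted, min_files=MIN_FILES_FOR_INDEXING):
    """Return the meta tag to place in the <head> of a user profile page.

    Profiles below the threshold get a noindex tag; profiles at or above it
    get no tag at all, so search engines are free to index them.
    """
    if files_submitted >= min_files:
        return ""
    return '<meta name="robots" content="noindex">'
```

Because the tag is computed on every request, a profile drops its noindex the moment the user crosses the threshold, with no regenerated file and no timer. Searching the codebase for the string "noindex" is also a reasonable way to find wherever rules like this are defined.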
-
To clarify what I'm talking about, here is an example: SEOmoz removes the "nofollow" attribute from the first link in your profile once you reach 200 MozPoints.
This is a set rule which, I believe, is applied automatically once a user reaches the minimum. On my site, a similar rule removes the meta noindex tag from a user page once that user has submitted 10 'files'.
Other rules like this were created, and I need to know what they are. How can I find them?
-
On my site, a rule was created that blocks user profile pages from robots unless the user has submitted a minimum number of 'files'. This was done to ensure that only quality user profile pages are indexed, not spam or untouched profiles.
Other rules like this have been created, but I don't know what they are and I'd like to find out.
-
Hi David,
Do you mean how robots.txt is configured, and whether the robots file is blocking a certain page from being indexed? If so, yes. If the file is complex and you're not sure whether it blocks a particular page, you can go into Google Webmaster Tools, which has a robots.txt utility: you enter a particular URL and it tells you whether the robots.txt file you are using (or proposing) blocks that URL.
If you mean whether the page is of high enough quality for a search engine to choose to index it, then no. That's part of the algorithm, and none of the major engines are that nice and open.
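The robots.txt check described above can also be approximated locally. This is a minimal sketch using Python's standard `urllib.robotparser`; the helper name and the sample rules in the usage note are made up, and Python's parser does not claim to match Google's rule handling exactly (wildcard patterns, for instance), so treat the result as a first pass rather than the final word.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text disallows `url` for `user_agent`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so a proposed (unpublished) file works too.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)
```

For example, with the rules `User-agent: *` followed by `Disallow: /users/`, a call like `is_blocked(rules, "https://example.com/users/david")` reports the profile URL as blocked, while a URL under `/blog/` is not.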