Steps you can take to ensure your content is indexed and registered to your site before a scraper gets to it?
-
Hi,
A clients site has significant amounts of original content that has blatantly been copied and pasted in various other competitor and article sites.
I'm working with the client to rejig lots of this content and to publish new content.
What steps would you recommend to undertake when the new, updated site is launched to ensure Google clearly attributes the content to the clients site first?
One thing I will be doing is submitting a new xml + html sitemap.
Thankyou
-
There are no "best practices" established for the tags' usage at this point. On the one hand, it could technically be used for every page, and on the other, should only be used when it's an article, blog post, or other individual person's writing.
-
Thanks Alan.
Guess there's no magic trick that will give you 100% attribution.
Regarding this tag, do you recommend I add this to EVERY page of the clients website including the homepage? So even the usual about us/contact etc pages?
Cheers
Hash
-
Google continually tries to find new ways to encourage solutions for helping them understand intent, relevance, ownership and authority. It's why Schema.org finally hit this year. None of their previous attempts have been good enough, and each has served a specific individual purpose.
So with Schema, the theory is there's a new, unified framework that can grow and evolve, without having to come up with individual solutions.
The "original source" concept was supposed to address the scraper issue, and there's been some value in that, though it's far from perfect. A good scraper script can find it, strip it out or replace the contents.
rel="author" is yet one more thing that can be used in the overall mix, though Schema.org takes authorship and publisher identity to a whole new, complex, and so far confused level :-).
Since Schema.org is most likely not going to be widely adopted til at least early next year, Google's encouraging use of the rel="author" tag as the primary method for assigning authorship at this point, and will continue to support it even as Schema rolls out.
So if you're looking at a best practices solution, yes, rel="author" is advisable. Until it's not.
-
Thanks Alan... I am surprised to learn about this "original source" information. There must not have been a lot of talk about it when it was released or I would have seen it.
Google recently started encouraging people to use the rel="author" attribute. I am going to use that on my site... now I am wondering if I should be using "original source" too.
Are you recommending rel="author"?
Also, reading that full post there is a section added at the end recommending rel="canonical"
-
Always have a sitemap.xml file with all the URLs you want indexed included in it. Right after publishing, submit the sitemap.xml file (or files if there are tens of thousands of pages) through Google Webmaster Tools and Bing Webmaster Tools. Include the Meta "original-source" tag in your page headers.
Include a Copyright line at the bottom of each page with the site or company name, and have that link to the home page.
This does not guarantee with 100% certainty that you'll get proper attribution, however these are the best steps you can take in that regard.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Our subdomain hosts content that can not be optimized (static videos) - should I de-index it?
We host static tours on a subdomain that points to the video tour host. I can not add meta or optimize any of these video pages and there are thousands. Should I de-index the entire subdomain? I am seeing errors for no meta, dup content etc in MOZ reporting. If yes, do I add the Disallow: /subdomain.root.nl/ to the primary domain's website/CMS or in DNS records ? Our web company is saying the no follow needs to be added in DNS but I feel like it should be added to the robots.txt file if SERP's are going to acknowledge the sub is no longer to be crawled and the primary is no longer to be penalized. Thank you so much in advance!
Intermediate & Advanced SEO | | masonmorse0 -
E-Commerce Site Collection Pages Not Being Indexed
Hello Everyone, So this is not really my strong suit but I’m going to do my best to explain the full scope of the issue and really hope someone has any insight. We have an e-commerce client (can't really share the domain) that uses Shopify; they have a large number of products categorized by Collections. The issue is when we do a site:search of our Collection Pages (site:Domain.com/Collections/) they don’t seem to be indexed. Also, not sure if it’s relevant but we also recently did an over-hall of our design. Because we haven’t been able to identify the issue here’s everything we know/have done so far: Moz Crawl Check and the Collection Pages came up. Checked Organic Landing Page Analytics (source/medium: Google) and the pages are getting traffic. Submitted the pages to Google Search Console. The URLs are listed on the sitemap.xml but when we tried to submit the Collections sitemap.xml to Google Search Console 99 were submitted but nothing came back as being indexed (like our other pages and products). We tested the URL in GSC’s robots.txt tester and it came up as being “allowed” but just in case below is the language used in our robots:
Intermediate & Advanced SEO | | Ben-R
User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkout
Disallow: /9545580/checkouts
Disallow: /carts
Disallow: /account
Disallow: /collections/+
Disallow: /collections/%2B
Disallow: /collections/%2b
Disallow: /blogs/+
Disallow: /blogs/%2B
Disallow: /blogs/%2b
Disallow: /design_theme_id
Disallow: /preview_theme_id
Disallow: /preview_script_id
Disallow: /apple-app-site-association
Sitemap: https://domain.com/sitemap.xml A Google Cache:Search currently shows a collections/all page we have up that lists all of our products. Please let us know if there’s any other details we could provide that might help. Any insight or suggestions would be very much appreciated. Looking forward to hearing all of your thoughts! Thank you in advance. Best,0 -
Knowledge Graph Quick Answer Box: Is there anything we can do to get our content to appear there?
Hi everyone, The quick answers box can be really helpful for searchers by pulling through content which answers their question or provides a clear description of an item or entity. Our client appeared in the quick answer box for a period of time with their description of a product, but have since been replaced by one of their competitors. Previously, the answer was provided by Wikipedia. Is there anything we can do to help get our client's content back in there? We've been looking at possible structured data we can use but haven't found anything. Also suggesting our client ensures they have a paragraph within their copy which is a clear, concise description of the product that Google can pull. Can anyone give any suggestions? Thanks Laura
Intermediate & Advanced SEO | | tomcraig860 -
Multiple 301 redirects and old site content appearing in Google results
I have found that for some Google searches the old version of the site on a completely different domain is appearing on page one of the results, while the newer site is only on page 3. The old site is redirecting to the new site with a 301 redirect, however there is also an additional redirect on the new site to force SSL. Despite this when you view the Google cache of the result that appears in Google the content of the page is still the old site. Is this normal or is Google not following the chain of 301 redirects? Edit: I just found out that downloading the page by right clicking a link and clicking download rather than viewing it in a browser leads to the old site appearing and the 301 redirect not being followed.
Intermediate & Advanced SEO | | freshleafmedia0 -
If I only Link to Page via Sitemap, can it still get indexed?
Hi there! I am creating a ton of content for specific geographies. Is it possible for these pages to get indexed if I only put them in my sitemap and don't link to them through my actual site (though the pages will be live). Thanks!
Intermediate & Advanced SEO | | Travis-W
Travis0 -
Malicious site pointed A-Record to my IP, Google Indexed
Hello All, I launched my site on May 1 and as it turns out, another domain was pointing it's A-Record to my IP. This site is coming up as malicious, but worst of all, it's ranking on keywords for my business objectives with my content and metadata, therefore I'm losing traffic. I've had the domain host remove the incorrect A-Record and I've submitted numerous malware reports to Google, and attempted to request removal of this site from the index. I've resubmitted my sitemap, but it seems as though this offending domain is still being indexed more thoroughly than my legitimate domain. Can anyone offer any advice? Anything would be greatly appreciated! Best regards, Doug
Intermediate & Advanced SEO | | FranGen0 -
Lots of city pages - How do I ensure we don't get penalized
We are planning on having a job posting page for each city that we are looking to hire new CFO partners in. But, the problem is, we have LOTS of locations. I was wondering what would be the best way to have similar content on each page (since the job description and requirements are the same for each job posting) without being hit by Google for having duplicate content? One of the main reasons we have decided to have location based pages is that we have noticed visitors to our site are searching for "cfo job in [location] but we notice that most of these visitors then leave. We believe it to be because the pages they land on make no mention of the location that they were looking for and is a little incongruent with what they were expecting. We are looking to use the following URLs and TItle/Description as an example: | http://careers.b2bcfo.com/cfo-jobs/Alabama/Birmingham | CFO Careers in Birmingham, AL | | Are you looking for a CFO Career in Birmingham, Alabama ? We're looking for partners there. Apply today! | | Any advice you have for this would be greatly appreciated. Thank you.
Intermediate & Advanced SEO | | B2B.CFO0 -
Will an RSS feed help new product get indexed? How to create one for product?
Hi I've read that creating an RSS feed for one of our ecommerce sites will help the products get indexed faster. Currently it takes google 4-5 days to index our new products, we want to speed that up. Will an RSS feed of the new products we have help? How do you create an RSS feed for this? Our blog gets indexed within minutes, but our main website, 4 days. Help!
Intermediate & Advanced SEO | | xoffie0