Google and PDF indexing
-
It was recently brought to my attention that one of the PDFs on our site wasn't showing up when looking for a particular phrase within the document. The user was trying to search only within our site. Once I removed the site restriction - I noticed that there was another site using the exact same PDF. It appears Google is indexing that PDF but not ours. The name, title, and content are the same. Is there any way to get around this?
I find it interesting as we use GSA and within GSA it shows up for the phrase. I have to imagine Google is saying that it already has the PDF and therefore is ignoring our PDF. Any tricks to get around this?
BTW - both sites rightfully should have the PDF. One is a client site and they are allowed to host the PDFs created for them. However, I'd like Mathematica to also be listed.
Query: no site restriction (notice: Teach for america comes up #1 and Mathematica is not listed). https://www.google.com/search?as_q=&as_epq=HSAC_final_rpt_9_2013.pdf&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=&cr=&as_qdr=all&as_sitesearch=&as_occt=any&safe=images&tbs=&as_filetype=pdf&as_rights=&gws_rd=ssl#q=HSAC_final_rpt_9_2013.pdf+"Teach+charlotte"+filetype:pdf&as_qdr=all&filter=0
Query: site restriction (notice that it doesn't find the phrase and redirects to any of the words) https://www.google.com/search?as_q=&as_epq=HSAC_final_rpt_9_2013.pdf&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=&cr=&as_qdr=all&as_sitesearch=&as_occt=any&safe=images&tbs=&as_filetype=pdf&as_rights=&gws_rd=ssl#as_qdr=all&q="Teach+charlotte"+site:www.mathematica-mpr.com+filetype:pdf
-
Hi Rose, Jeff provided a great response. Did it answer your question? If so, please mark it "good answer", thanks!
Christy
-
It sounds like it could be a few different issues, but my initial look at the site makes it seem to me that it's duplicate content, and that Google is only returning the results for their PDF and not for yours.
Google does hate duplicate content, and this is probably why they are only showing their PDF.
It may be that they posted theirs first, or they linked to it first.
As you probably know, not all PDFs are created equally. Some are more difficult for a search engine to read (i.e. might have text tied up in graphics). And PDFs are really not a great end user experience.
If you really want to rank for this, I might suggest creating this page as an HTML page instead of a PDF, as it might be able to be indexed more easily, and you might rank for it better.
I hope this helps?
Thanks,
-- Jeff
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Specific page does not index
Hi, First question: Working on the indexation of all pages for a specific client, there's one page that refuses to index. Google Search console says there's a robots.txt file, but I can't seem to find any tracks of that in the backend, nor in the code itself. Could someone reach out to me and tell me why this is happening? The page: https://www.brody.be/nl/assistentiewoningen/ Second question: Google is showing another meta description than the one our client gave in in Yoast Premium snippet. Could it be there's another plugin overwriting this description? Or do we have to wait for it to change after a specific period of time? Hope you guys can help
Intermediate & Advanced SEO | | conversal0 -
Google index
Hello, I removed my site from google index From GWT Temporarily remove URLs that you own from search results, Status Removed. site not ranking well in google from last 2 month, Now i have question that what will happen if i reinclude site url after 1 or 2 weeks. Is there any chance to rank well when google re index the site?
Intermediate & Advanced SEO | | Getmp3songspk0 -
Does Google still don't index Hashtag Links ? No chance to get a Search Result that leads directly to a section of a page? or to one of numeras Hashtag Pages in a single HTML page?
Does Google still don't index Hashtag Links ? No chance to get a Search Result that leads directly to a section of a page? or to one of numeras Hashtag Pages in a single HTML page? If I have 4 or 5 different hashtag link section pages , consolidated into one HTML Page, no chance to get one of the Hashtag Pages to appear as a search result? like, if under one Single Page Travel Guide I have two essential sections: #Attractions #Visa no chance to direct search queries for Visa directly to the Hashtag Link Section of #Visa? Thanks for any help
Intermediate & Advanced SEO | | Muhammad_Jabali0 -
Google showing high volume of URLs blocked by robots.txt in in index-should we be concerned?
if we search site:domain.com vs www.domain.com, We see: 130,000 vs 15,000 results. When reviewing the site:domain.com results, we're finding that the majority of the URLs showing are blocked by robots.txt. They are subdomains that we use as production environments (and contain similar content as the rest of our site). And, we also find the message "In order to show you the most relevant results, we have omitted some entries very similar to the 541 already displayed." SEER Interactive mentions that this is one way to gauge a Panda penalty: http://www.seerinteractive.com/blog/100-panda-recovery-what-we-learned-to-identify-issues-get-your-traffic-back We were hit by Panda some time back--is this an issue we should address? Should we unblock the subdomains and add noindex, follow?
Intermediate & Advanced SEO | | nicole.healthline0 -
Google Plus Authorship
Situation Description: I have a website called Website A. I wish to migrate alot of the content from Website A to Website B. Website B will be on a completely different domain name and environment. Authors of Website A will act as contributing authors for Website B. It is also possible that other contributing authors of other websites C and D commit to writing content on Website B. Questions (1) Does it make sense to create a google plus profile under UserA@websiteA.com and link from content on websiteB to their google plus profile under UserA@websiteA.com? (2) Does AuthorRank affect PageRank? If yes, if I take the above approach would websiteA be effected or websiteB since the content writers of websiteA are contributing to websiteB? (3) Is it ok for userA to have a corporate google plus profile assuming he might also have another google plus profile under a different address? I always think it make sense that there exists a google plus profile at an employee level and another google plus profile at a personal level. (4) If an employee leaves the company, do I leave his/hers Google Plus profile alive? The fact that no more content would be published under that particular profile, would that negatively effect author rank over time? (5) Another interesting observation is that UsaToday, CNN etc do not use authorship? No authors link to their twitter profile or google plus profile. Shouln't they be doing this in terms of author rank or is author rank not that important? Thank you
Intermediate & Advanced SEO | | seo12120 -
Does Google index more than three levels down if the XML sitemap is submitted via Google webmaster Tools?
We are building a very big ecommerce site. The site has 1000 products and has many categories/levels. The site is still in construccion so you cannot see it online. My objective is to get Google to rank the products (level 5) Here is an example level 1 - Homepage - http://vulcano.moldear.com.ar/ Level 2 - http://vulcano.moldear.com.ar/piscinas/ Level 3 - http://vulcano.moldear.com.ar/piscinas/electrobombas-para-piscinas/ Level 4 - http://vulcano.moldear.com.ar/piscinas/electrobombas-para-piscinas/autocebantes.html/ Level 5 - Product is on this level - http://vulcano.moldear.com.ar/piscinas/electrobombas-para-piscinas/autocebantes/autocebante-recomendada-para-filtros-vc-10.html Thanks
Intermediate & Advanced SEO | | Carla_Dawson0 -
What Sources to use to compile an as comprehensive list of pages indexed in Google?
As part of a Panda recovery initiative we are trying to get an as comprehensive list of currently URLs indexed by Google as possible. Using the site:domain.com operator Google displays that approximately 21k pages are indexed. Scraping the results however ends after the listing of 240 links. Are there any other sources we could be using to make the list more comprehensive? To be clear, we are not looking for external crawlers like the SEOmoz crawl tool but sources that would be confidently allow us to determine a list of URLs currently hold in the Google index. Thank you /Thomas
Intermediate & Advanced SEO | | sp800 -
Google Indexed the HTTPS version of an e-commerce site
Hi, I am working with a new e-commerce site. The way they are setup is that once you add an item to the cart, you'll be put onto secure HTTPS versions of the page as you continue to browse. Well, somehow this translated to Google indexing the whole site as HTTPS, even the home page. Couple questions: 1. I assume that is bad or could hurt rankings, or at a minimum is not the best practice for SEO, right? 2. Assuming it is something we don't want, how would we go about getting the http versions of pages indexed instead of https? Do we need rel-canonical on each page to be to the http version? Anything else that would help? Thanks!
Intermediate & Advanced SEO | | brianspatterson0