Can Google index the text content in a PDF?
-
I really thought the answer was always no. There are plenty of other things you can do to improve search visibility for a PDF, but I thought the nature of the file type made the content itself unparsable by search engine crawlers...
But now, my client's competitor is ranking for my client's brand name with a PDF that contains comparison content.
Thing is, my client's brand isn't in the title, the alt text, or the URL... it's only in the actual text of the PDF.
Did I miss a major update? Did I always have this wrong?
-
Yes, Google can crawl and index the contents of PDFs, and it does so extensively. It's nothing new, actually. As long as the PDF contains real text rather than only images, Google will be able to read that text.
Here is an interesting article with tips for making your PDFs SEO-friendly: https://www.searchenginejournal.com/pdf-seo-best-practices/59975/
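If you want a quick way to tell whether a given PDF carries real text or only images, here is a rough sketch in Python using just the standard library. It is a heuristic of my own, not how Google does it: the function name has_text_layer is made up for this example, and it simply looks for PDF text-drawing operators (Tj / TJ inside a BT ... ET block) in raw or Flate-compressed streams. It can miss text stored with other filters or inside object streams, and binary image data can occasionally trigger a false positive, so treat the result as a hint only.

```python
import re
import zlib

# Raw bytes between "stream" and "endstream" in the PDF file.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A text object (BT ... ET) that contains a text-showing operator (Tj or TJ).
TEXT_OP_RE = re.compile(rb"BT\b.*?(?:Tj|TJ)\b.*?ET\b", re.DOTALL)


def has_text_layer(pdf_bytes: bytes) -> bool:
    """Heuristic: True if any stream appears to draw text (hypothetical helper)."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        candidates = [raw]
        try:
            # Most content streams are Flate (zlib) compressed; decompressobj
            # tolerates the trailing end-of-line bytes before "endstream".
            candidates.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            pass  # not Flate data; only the raw bytes are inspected
        if any(TEXT_OP_RE.search(data) for data in candidates):
            return True
    return False


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        print("text found" if has_text_layer(handle.read()) else "no text found")
```

Run it with the path to the PDF as the only argument. If it reports no text for a document that visibly contains words, the pages are probably scanned images.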
Cheers,
Cesare