Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to tell if PDF content is being indexed?
-
I've searched extensively for this, but could not find a definitive answer.
We recently updated our website and it contains links to about 30 PDF data sheets. I want to determine if the text from these PDFs is being archived by search engines.
When I do this search http://bit.ly/rRYJPe (google - site:www.gamma-sci.com and filetype:pdf) I can see that the PDF urls are getting indexed, but does that mean that their content is getting indexed?
I have read in other posts/places that if you can copy text from a PDF and paste it that means Google can index the content. When I try this with PDFs from our site I cannot copy text, but I was told that these PDFs were all created from Word docs, so they should be indexable, correct?
Since WordPress has you upload PDFs like they are an image could this be causing the problem?
Would it make sense to take the time and extract all of the PDF content to html?
Thanks for any assistance, this has been driving me crazy.
-
Kyle,
Thanks for the quick response. The data is being displayed in the title and meta description field. I also did some searches for specific terms with my parameter search from our site and filetype:pdf, which shows that the content is being indexed. It also shows that the PDF titles and meta descriptions are not optimized, so I have some work there.
Thanks,
Anthony
-
Is the data being displayed in the title and meta description in the SERP content from the PDF?
If so, then yes, they are being indexed/crawled.
Regards,
Kyle
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Page Indexing without content
Hello. I have a problem of page indexing without content. I have website in 3 different languages and 2 of the pages are indexing just fine, but one language page (the most important one) is indexing without content. When searching using site: page comes up, but when searching unique keywords for which I should rank 100% nothing comes up. This page was indexing just fine and the problem arose couple of days ago after google update finished. Looking further, the problem is language related and every page in the given language that is newly indexed has this problem, while pages that were last crawled around one week ago are just fine. Has anyone ran into this type of problem?
Technical SEO | | AtuliSulava1 -
Collapsible sections - content
**Hi,****I am looking to improve the aesthetics of some pages on my website by adding written content into collapsible tabs. I was wondering whether the content that is ‘hidden’ by tabs is given less weight by Google from the perspective of SEO? **Some articles I have read suggest that tabbed content is weighted equally with the content which is already immediately visible to the user, but others suggest that this may not be the case. **Please, can I request opinions on the matter? Any advice would be greatly appreciated, many thanks.**Katarina
Technical SEO | | Katarina-Borovska0 -
What to do with old content after 301 redirect
I'm going through all our blog and FAQ pages to see which ones are performing well and which ones are competing with one another. Basically doing an SEO content clean up. Is there any SEO benefit to keeping the page published vs trashing it after you apply a 301 redirect to a better performing page?
Technical SEO | | LindsayE0 -
Content in Accordion doesn't rank as well as Content in Text box?
Does content rank better in a full view text layout, rather than in a clickable accordion? I read somewhere because users need to click into an accordion it may not rank as well, as it may be considered hidden on the page - is this true? accordion example: see features: https://www.workday.com/en-us/applications/student.html
Technical SEO | | DigitalCRO1 -
No Index PDFs
Our products have about 4 PDFs a piece, which really inflates our indexed pages. I was wondering if I could add REL=No Index to the PDF's URL? All of the files are on a file server, so they are embedded with links on our product pages. I know I could add a No Follow attribute, but I was wondering if any one knew if the No Index would work the same or if that is even possible. Thanks!
Technical SEO | | MonicaOConnor0 -
No index on subdomains
Hi, We have a subdomain that is appearing in the search results - I want to hide this as it looks really bad. If I were to add the no index tag to the sub domain would URL would this affect the whole domain or just that sub domain? The main domain is vitally important - it is just that sub domain I need to hide. Many thanks
Technical SEO | | Creditsafe0 -
Staging & Development areas should be not indexable (i.e. no followed/no index in meta robots etc)
Hi I take it if theres a staging or development area on a subdomain for a site, who's content is hence usually duplicate then this should not be indexable i.e. (no-indexed & nofollowed in metarobots) ? In order to prevent dupe content probs as well as non project related people seeing work in progress or finding accidentally in search engine listings ? Also if theres no such info in meta robots is there any other way it may have been made non-indexable, or at least dupe content prob removed by canonicalising the page to the equivalent page on the live site ? In the case in question i am finding it listed in serps when i search for the staging/dev area url, so i presume this needs urgent attention ? Cheers Dan
Technical SEO | | Dan-Lawrence0 -
Correct linking to the /index of a site and subfolders: what's the best practice? link to: domain.com/ or domain.com/index.html ?
Dear all, starting with my .htaccess file: RewriteEngine On
Technical SEO | | inlinear
RewriteCond %{HTTP_HOST} ^www.inlinear.com$ [NC]
RewriteRule ^(.*)$ http://inlinear.com/$1 [R=301,L] RewriteCond %{THE_REQUEST} ^./index.html
RewriteRule ^(.)index.html$ http://inlinear.com/ [R=301,L] 1. I redirect all URL-requests with www. to the non www-version...
2. all requests with "index.html" will be redirected to "domain.com/" My questions are: A) When linking from a page to my frontpage (home) the best practice is?: "http://domain.com/" the best and NOT: "http://domain.com/index.php" B) When linking to the index of a subfolder "http://domain.com/products/index.php" I should link also to: "http://domain.com/products/" and not put also the index.php..., right? C) When I define the canonical ULR, should I also define it just: "http://domain.com/products/" or in this case I should link to the definite file: "http://domain.com/products**/index.php**" Is A) B) the best practice? and C) ? Thanks for all replies! 🙂
Holger0