Ensuring Assets (PDFs, PowerPoint Files, Word Docs, etc.) are Indexable on Site
-
Hi there - I'm working on an educational site in which users will be able to search our repository of PDF articles, PowerPoint files, and so on through an on-site search engine. What is the best way to ensure each of these documents/assets are indexable by Google since they technically don't reside on an HTML page....they are just pulled up if the user searches for them? The site itself is just a few pages, but the files, articles, and videos in the repository are in the hundreds. Should I just name and tag them properly and make sure they're all included in an XML site map? Anything else suggested?
Thanks very much!
-
The more links a sitemap the it harder it is for people to follow but should be ok for search spiders.
-
Thanks for your response Chris! Good suggestion on the HTML sitemap. Any concerns if there are a couple of hundred links on this HTML site map page?
-
I would build 2 sitemaps for these files, 1 XML sitemap and 1 HTML sitemap, separate from the main sitemap and add these to Google WMT. The HTML Sitemap could also be used as a directory for visitors too.
Where possible link to the documents from the site too, this will increase the chances that the assets are indexed by Google.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google indexing .com and .co.uk site
Hi, I am working on a site that is experiencing indexation problems: To give you an idea, the website should be www.example.com however, Google seems to index www.example.co.uk as well. It doesn’t seem to honour the 301 redirect that is on the co.uk site. This is causing quite a few reporting and tracking issues. This happened the first time in November 2016 and there was an issue identified in the DDOS protection which meant we would have to point www.example.co.uk to the same DNS as www.example.com. This was implemented and made no difference. I cleaned up the htaccess file and this made no difference either. In June 2017, Google finally indexed the correct URL, but I can’t be sure what changed it. I have now migrated the site onto https and www.example.co.uk has been reindexed in Google alongside www.example.com I have been advised that the http needs to be removed from DDOS which is in motion I have also redirected http://www.example.co.uk straight to https://www.example.com to prevent chain redirects I can’t block the site via robot.txt unless I take the redirects off which could mean that I lose my rankings. I should also mention that I haven't actually lost any rankings, it's just replaced some URLs with co.uk and others have remained the same. Could you please advise what further steps I should take to ensure the correct URL’s are indexed in Google?
Technical SEO | | Niki_10 -
Search Console rejecting XML sitemap files as HTML files, despite them being XML
Hi Moz folks, We have launched an international site that uses subdirectories for regions and have had trouble getting pages outside of USA and Canada indexed. Google Search Console accounts have finally been verified, so we can submit the correct regional sitemap to the relevant search console account. However, when submitting non-USA and CA sitemap files (e.g. AU, NZ, UK), we are receiving a submission error that states, "Your Sitemap appears to be an HTML page," despite them being .xml files, e.g. http://www.t2tea.com/en/au/sitemap1_en_AU.xml. Queries on this suggest it's a W3 Cache plugin problem, but we aren't using Wordpress; the site is running on Demandware. Can anyone guide us on why Google Search Console is rejecting these sitemap files? Page indexation is a real issue. Many thanks in advance!
Technical SEO | | SearchDeploy0 -
Number of index pages in web master is different from site:mydomainname
Google says one to discover whether my pages is index in Google is site:domain name of my website: https://support.google.com/webmasters/answer/34444?hl=enas mention in web page above so basically according to that i can know totally pages indexed for my website right:it shows me when type (site:domain name ) 300 but it says in Google web master that i have 100000so which is the real number of index page 300 or 1000000 as web master says and why i get 300 when using site:domain name even Google mention that it is way to discover index paged
Technical SEO | | Jamalon0 -
Any ideas why this site is being penalized?
http://www.my-french-house.com/ has been online since around 2004 and has nearly always been in the top 10 serps for terms like 'property for sale in france'. However, over the last 12 months we've been hit really hard by Google and have fallen dramatically in rank. Can anyone give any insight into what may have happened? As an aside, we've had no message in the Google Webmaster Console and have not contacted Google about the apparent penalty / penalization. Any help or advice would be greatly appreciated. Cheers Jim
Technical SEO | | jimpannell0 -
Do index.php extensions count as duplicate content on Joomla sites?
When i run my error report, i see 2 duplicate pages, but both are the main domain and then the /index.php extension. how do i fix this? does it really count as duplicate content?
Technical SEO | | valetseo0 -
Remove Site from Google
How can I get my website out of google? I want all pages completely gone. Thanks!
Technical SEO | | tylerfraser0 -
Site maps, Is there any benefit?
I have a relatively simple and small site (60 pages). All of it is crawlable and there is nothing I want non follow. So, is there any real benefit to a sitemap since Google can get to all the site anyway? Do they give the site more credence or something because it's there? I guess as an aside, are there any favorite sites that will generate a site map? Thanks!
Technical SEO | | Banknotes0