How long does it take for customized Google Site Search to show results from pdf files?
-
The site in question is http://www.ejmh.eu
I am pretty unsatisfied with the results I am getting from the Site Search provided by Google.
We have over 160 pdf files in this subfolder: http://www.ejmh.eu/mellekletek
The files are the digital versions of articles. When I search for content in those pdf files, Google does not show results. It does show results from older pages, dating back 1-2 years but it is certainly not showing anything from pdf files that I have just put up 3 weeks ago.
My questions:
If I place a Google Search on a site, does it not automatically display results from ALL the content in the root domain?
Is there any correlation between how the Site Search is indexing the files and how Google is indexing the urls in general?
Should I just wait and see whether site search performance improves or should I switch to another Search software like Zoom Search?
It is vital to have a proper, high-quality search functioning on that site in the very near future.
What are your experiences? Any tips are greatly appreciated.
-
Hi, everyone: problem solved.
Here is what I did: I created a seperate sitemap-xml and linked to all the new pdfs.
I updated the general sitemap.xml and linked to the new sitemap as well.
I (re)submitted both sitempas via the Webmaster Tools.
Within a few hours, most of pdfs got indexed and the overall quality of search has improved dramatically. Thanks for all your help.
-
It may be a good idea to include all the pdf files on the sitemap, even if it is a troublesome process.
Otherwise it just takes too long for Google to index them.
What still surprises me is that even for a site search, you need to win the 'indexing battle'. I thought that Google indexes everythig within the map for the 'sake of the site search' and displays the results when a visitor is searching within the site. Less fancy softwares are actually doing the job. I thought a Google Site Search provides something even better.
-
Last crawl - thanks, great info.
yes, all new pdfs are linked from the html files.
This the summary page of one article: http://www.ejmh.eu/5archives_ppr_jaggle_061.html
In the middle of the page, you see 'download full text' - this is from where the individual papers (pdf) are linked.
-
Do you have the new PDFs Linked from pages like the old ones?
Try to create a page listing all the new PDFs, and basically Google might take time to recrawl your site and add these new PDFs ( by the way the last copy saved in Google Cache is from Feb 11)
-
You are great, thanks for your time. Yeah, I did check things out with this google command: there are pdf's listed but these are all old pdfs I have put up a long time ago. None of the pdfs I have put up recently are among those indexed.
Do you think that only those urls come up through a customized site search that are indexed by Google? Does Google not crawl the site and make a list of urls for the sake of the search purely? (Zoom search does it, for example) In theory, there could be two different type of 'crawls': one for the site search and one for the larger world, searching in the browser.
As for the settings...can you plase help me further: what exactly would you change?
-
if you check here all the pdf are indexed in google
so i will check the settings on CSE
reference here http://www.google.com/cse/docs/resultsxml.html#wsQueryTerms
-
Thanks for the tip, it's a good one. But they are all 100% texts.
-
If a search engine cannot read the text, due to it being a graphic and not text, then it won't be able to fully index the words on the document.
so make sure all your PDF are 100% text that was converted to a PDF and not a "Scan" (image) of the original document that was saved as a PDF
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Bing search results - Site links
My site links in Bing search results are pulling through the footer text instead of the meta description (see image). Is there any way of controlling this? 2L2VusT
Technical SEO | | RWesley0 -
Google Serps Not Showing HTTPS in Front of URL
Hi Everyone, We implemented the HTTPS change to our four websites about 6 months ago. I have found something that I feel is strange. The homepage of each website shows www.domain.com, but all the internal pages show https://www.domain.com/page. If you click through it shows it as secure, but I feel that because it is happening on all four websites, that something was done incorrectly. Here is one Google SERP: https://www.google.com/search?client=firefox-b-1&biw=1920&bih=947&ei=gq9GWpizBuuF_Qa_p5e4Bw&q=tanzanite+jewelry+designs&oq=tanzanite+jewelry+designs&gs_l=psy-ab.3..0l2.130446.136028.0.136152.29.17.4.7.9.0.207.2214.7j9j1.17.0....0...1c.1.64.psy-ab..1.28.2350...0i131k1j0i22i30k1.0.BA5-meGmuA0 As you can see, our site displays with no https, but all the internal pages do. It just worries me as I have seen our internal pages increasing in positioning, but not our homepage. Any ideas?
Technical SEO | | vetofunk0 -
Link's that are an internal site search?
Hi hope your're all well. I sell Red, Blue, Green Widgets within each color I have many sub types, the subtypes change all the time,and a sub type has many variations in itself. I'd like to set up links that direct customers to popular searches of sub types say: widgets.com/red/blue-spots....search string... Will Google crawl these search links and see that there is good content behind it? How does Google handle links that are also a site search? Can it be bad and should I "no follow" them? Hope someone can give me some direction on these, many thanks in advance!
Technical SEO | | Thea880 -
How to: Sub headings appears in google results
I notice that many sites have sub headings appearing below their google results. http://i.imgur.com/A5JxMKD.png
Technical SEO | | kevinbp
See image for example. Q: what are these called? and how do we get them? A5JxMKD.png1 -
Do multipe empty search result pages count as duplicate content?
I am writing an online application that among other things allows the users to search through our database for results. Pretty simply stuff. My question is this. When the site is starting out, there will probably be a lot of searches that will bring back empty pages since we will still be building it up. Each page will dynamically generate the title tags, description tags, H1, H2, H3 tags - so that part will be unique - but otherwise they will be almost identical empty results pages until then. Would Google Count all these empty result pages as duplicate content? Anybody have any experience with this? Thanks in advance.
Technical SEO | | rayvensoft0 -
Google having trouble accessing my site
Hi google is having problem accessing my site. each day it is bringing up access denied errors and when i have checked what this means i have the following Access denied errors In general, Google discovers content by following links from one page to another. To crawl a page, Googlebot must be able to access it. If you’re seeing unexpected Access Denied errors, it may be for the following reasons: Googlebot couldn’t access a URL on your site because your site requires users to log in to view all or some of your content. (Tip: You can get around this by removing this requirement for user-agent Googlebot.) Your robots.txt file is blocking Google from accessing your whole site or individual URLs or directories. Test that your robots.txt is working as expected. The Test robots.txt tool lets you see exactly how Googlebot will interpret the contents of your robots.txt file. The Google user-agent is Googlebot. (How to verify that a user-agent really is Googlebot.) The Fetch as Google tool helps you understand exactly how your site appears to Googlebot. This can be very useful when troubleshooting problems with your site's content or discoverability in search results. Your server requires users to authenticate using a proxy, or your hosting provider may be blocking Google from accessing your site. Now i have contacted my hosting company who said there is not a problem but said to read the following page http://www.tmdhosting.com/kb/technical-questions/other/robots-txt-file-to-improve-the-way-search-bots-crawl/ i have read it and as far as i can see i have my file set up right which is listed below. they said if i still have problems then i need to contact google. can anyone please give me advice on what to do. the errors are responce code 403 User-agent: *
Technical SEO | | ClaireH-184886
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /xmlrpc/0 -
Google Search memory
Hi we have had the following statement from a member of our Japan office with regards google displaying search results, would anyone be able to give us a definitive answer on this. Google remembers previous non-mobile related searches For example, we already know that we come up on the first page if you select “kaigai keitai” (mobile phone for use abroad) and “UK” where as we don’t for searches where you replace the UK with the US or other countries. This means that if a customer, for example, does a search just on the UK e.g. using words like UK travel, London, millennium dome, etc. and then does a separate search just using the words “kaigai keitai” that google could show us as a link on the first page. However, if an individual did a search on Paris, France, Eiffel Tower, and then did a search for “kaigai keitai”, our link might not appear on the page. I don’t know if we have tested this already, but Google seems to have a very long “memory” and I could see this kind of aspect of Google resulting in us missing significant business from people going to the US, France, Italy, etc. Any thoughts?
Technical SEO | | -Al-0 -
How do I get Google to display categories instead of the URL in results?
I've seen that for some domains Google will show a nice clickable site heirarchy in place of the actual URL of a search result. See attached for an example. How do I go about achieving this type of results? categorized.png
Technical SEO | | Carlito-2569610