PDF web traffic hitting our site
-
Hi there,
Over the last few months our traffic has spiked because irrelevant PDF documents are sending us crap traffic. Our bounce rate is sky high and our other metrics have suffered too. I don't want to just filter this traffic out in GA; I'd rather try to stop our site from being attacked.
Any advice on a way forward would be great.
Thanks
-
Based on this I don't think you have anything to worry about. It doesn't appear to be an attack, as you described in your original post. An actual attack on your website would have much higher volume. The worst this could possibly be is spam, which is mainly just annoying.
Easy solution: you don't want to filter this traffic out of GA entirely, because it may be useful at some point. So create a second view in GA and name it "unfiltered". This view will have no filters, so you can see all traffic in its raw glory. Name your main view something like "master" or "the one view to view them all" or whatever you want, and set filters on that view to remove the PDF traffic.
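If it helps, here is a rough sketch of the kind of pattern such a filter could use, assuming the unwanted hits all show up as request URIs ending in .pdf. The pattern and the sample URIs other than the one quoted in this thread are made up for illustration, so test against your own data before putting anything into a GA filter.

```python
import re

# Hypothetical pattern: any request path ending in ".pdf",
# case-insensitive, with an optional query string after it.
PDF_REQUEST = re.compile(r"\.pdf(\?.*)?$", re.IGNORECASE)


def is_pdf_request(request_uri: str) -> bool:
    """Return True if the request URI points at a PDF file."""
    return PDF_REQUEST.search(request_uri) is not None


# Sample URIs: the first is quoted in this thread, the others are invented.
sample_uris = [
    "/lulu-the-lioness-a-heroines-story.pdf",
    "/courses/",
    "/some-file.PDF?utm_source=x",
]

# These are the hits the "master" view would drop.
filtered_out = [uri for uri in sample_uris if is_pdf_request(uri)]
```

Checking the pattern against a handful of real URIs like this first makes it less likely that the filter quietly drops legitimate pages.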
Personally, it looks to me more like these are old PDFs that other websites are linking to, which is what your hosting provider has also said. Your best move here is actually to set up redirects to relevant pages, to recapture some of those links that are probably ending in 404s and pass some link equity to important pages.
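As a minimal sketch of the redirect idea, assuming you keep a simple lookup from dead PDF paths to live pages: the target path below is a placeholder I made up, and on a real site this logic would normally live in your CMS or web server redirect rules rather than in a standalone script.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical mapping from dead PDF paths to live pages.
# "/example-relevant-page" is a placeholder, not a real page on the site.
REDIRECTS = {
    "/lulu-the-lioness-a-heroines-story.pdf": "/example-relevant-page",
}


def redirect_target(url: str) -> Optional[str]:
    """Return the path to 301 to, or None if no redirect is defined."""
    path = urlsplit(url).path.lower()
    return REDIRECTS.get(path)
```

Anything that returns None would keep returning a 404 as it does now, so you only need to map the PDFs that actually have inbound links worth keeping.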
-
Hi Alick, it seems to be coming from an external source. I've included a screen grab for you too.
I've also discussed this with our hosting provider who gave the following response:
Thanks for the info from Webmaster Tools. That screenshot that shows the HTTP response is just showing that a request to http://www.icmp.co.uk/lulu-the-lioness-a-heroines-story.pdf throws a 301 redirect over to https://www.icmp.ac.uk/lulu-the-lioness-a-heroines-story.pdf — this runs because of the standard HTTPS/primary domain redirect code in settings.php and unfortunately doesn’t tell us much here.
I pulled down the database again and ran a search for a few of these filenames, and those came up empty. Looks like these don’t touch Drupal at all. When we saw them in the database before, in the sessions table, that was likely just because that filter module was storing browser history in user session data for some reason.
I did a little research here, and I think that leaves a few potential causes:
Another site is linking to these files (even though they don’t exist), and this is where Google is picking up/indexing the URLs from. This should be checkable in Google Analytics if you look at Referrals to those files.
These were listed on the sitemap at some point (but not any longer: https://www.icmp.ac.uk/sitemap.xml).
These files existed at some point in the past, but have since been deleted.
There was a DNS misconfiguration at some point, and that domain name was pointing to a different server where these files did exist.
While these are a little annoying to see in Analytics, from what I’ve read, 404s don’t negatively impact the site from an SEO standpoint, and there’s no evidence that the site itself is compromised at all, so unless we see evidence otherwise, I wouldn’t worry about these.
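For anyone who wants to check the first possibility above from the server side rather than in Google Analytics, here is a rough sketch that tallies referrers for PDF requests in an access log. It assumes the Apache/Nginx "combined" log format, and the sample log lines are invented for illustration.

```python
import re
from collections import Counter

# Assumes the "combined" access log format; adjust the pattern otherwise.
LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)"'
)


def pdf_referrers(log_lines):
    """Count referrers for requests whose path ends in .pdf."""
    counts = Counter()
    for line in log_lines:
        match = LINE.search(line)
        if match is None:
            continue
        # Ignore any query string when checking the file extension.
        path = match.group("path").lower().split("?")[0]
        if path.endswith(".pdf"):
            counts[match.group("referrer") or "-"] += 1
    return counts


# Invented sample lines; only the PDF filename comes from this thread.
sample = [
    '203.0.113.5 - - [01/Jan/2020:00:00:01 +0000] "GET /lulu-the-lioness-a-heroines-story.pdf HTTP/1.1" 404 512 "http://example.com/links.html" "Mozilla/5.0"',
    '203.0.113.6 - - [01/Jan/2020:00:00:02 +0000] "GET /lulu-the-lioness-a-heroines-story.pdf HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [01/Jan/2020:00:00:03 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]

counts = pdf_referrers(sample)
```

A referrer of "-" means the request carried no referrer at all, which is what you would expect from bots or direct hits rather than from links on other sites.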
-
Hi,
Is the PDF traffic coming from your own site or from other sites?
Related Questions
-
How do I exclude fake direct load traffic from networks in Google Analytics?
Starting on Friday 1/20, we noticed a huge, unnatural spike in Direct Load traffic. While researching where it was coming from, the big flags were huge spikes in countries that normally only have <5 sessions a month like Russia, Singapore, Brazil, etc., each sending 1400 a week, with >99% bounce rate and <0:00:05 average session duration. While looking into networks, we saw an influx in Networks that had never sent traffic before, each with >1300 sessions a week, 100% bounce rate, and 0:00:00 session duration. The list of these Networks is:
astute hosting usa incorporated
nephoscale inc.
network transit holdings llc
serverbeach
coreix ltd
2ezhost llp
nforce entertainment b.v.
mir telematiki ltd
servers australia pty ltd wholesale services provider for abuse
reliablehosting
dimenoc servicos de informatica ltda
c0715718213
I have seen a lot of guides on filtering out Referral traffic, but these are all coming in as Direct Load and are skewing our Direct Load results. Any idea how to filter or remove this traffic from Google Analytics?
Reporting & Analytics | ServiceMichael
-
Alternative tools for Keyword Traffic
Hi There, Wondering if anyone has any other tools they would recommend using for finding out keyword traffic on websites. Currently (and I'm sure like most), my website is connected to Google Analytics and Google Search Console. My biggest frustration becomes the "(not set)" variable that appears when I go to review the keywords section. It's always such a large number and I have no way of finding out what people might be typing in and coming across my website. Of course, I understand the privacy factor as to why Google must do this but it's certainly difficult to analyze what's working and what's not. Any tips, tricks or suggestions are greatly appreciated! Thanks, Lindsay
Reporting & Analytics | MainstreamMktg
-
Free Media Site / High Traffic / Low Engagement / Strategies and Questions
Hi, Imagine a site "mediapalooza dot com" where the only thing you do there is view free media. Yet Google Analytics is showing the average view of a media page is about a minute, where the average length of media is 20 - 90 minutes. And imagine that most of this media is "classic" and that it is generally not available elsewhere. Note also that the site ranks terribly in Google, despite having decent Domain Authority (in the high 30's), Page Authority in the mid 40's and a great site and otherwise quite active international user base with page views in the tens of thousands per month. Is it possible that GA is not tracking engagement (time on site) correctly? Even accounting for GA's imperfect method of measuring time on page, which uses "next key pressed" to terminate the page, our stats are truly abysmal, in the tenths of a percentage point of time measured when compared with the actual time we think the pages are being used. If so, will getting engagement tracking to more accurately measure time on specific pages and on the site signal to Google that this site is actually more important than its current ranking indicates? There's lots of discussion about "dwell time" as it relates to ranking, and I'm postulating that if we can show Google that we have extremely good engagement instead of the super low stats that we are reporting now, then we might get a boost in ranking. Am I crazy? Has anyone got any data that proves or disproves this theory? As I write this out, I detect many issues - let's have a discussion on what else might be happening here. We already know that low engagement = low ranking. Will fixing GA to show true engagement have any noticeable impact on ranking? Can't wait to see what the MOZZERS think of this!
Reporting & Analytics | seo_plus
-
What penalty might have hit here (screenshot attached)
I've had a spiky organic traffic profile for some time and wondered if anyone could suggest what penalty might have hit me. Something did in July, but then when it recovered (not sure how) I ended up with double the traffic. Then it dipped, recovered a bit and has slowly declined since. I've had no manual penalties, so it must be algorithmic, which means I can fix it. The problem is I don't know what the issue is! Any help appreciated! Ae00Lzl.png
Reporting & Analytics | SamCUK
-
What is the impact of a panda refresh on a Pandalized site?
When a panda refresh hits and you have a pandalized site, if the site were to be de-pandalized, would you see traffic back to pre-panda levels right away? Or any type of movement right away?
Reporting & Analytics | jessefriedman
-
Tips for migrating Google News traffic?
We are about to relaunch a site that gets a lot of Google News traffic. We are not changing domain, but the site structure is changing greatly, with the URLs of both news index pages and articles being shaken up. Obviously, we've 301-ed every page to its closest equivalent on the new site. We've also got a news sitemap. As we are not changing domain, is there anything further we need to do to help protect our Google News traffic? On a related note, does anyone have a reliable way of measuring traffic from Google News listings in universal search?
Reporting & Analytics | Dennis-52961
-
Google News traffic spike mystery; referring URLs all blank, Omniture tags didn't fire.
Our content is occasionally featured in Google News. We recently have had two episodes where this happened, but (a) nearly all the referring URLs were blank, and (b) our backend logs show 3-4x more requests for the article in question than Omniture does. In other words, hundreds of thousands of visitors requested a URL from our site (as proven by the traffic logs), but don't seem to have come from Google News (because HTTP_REFERER was blank), and didn't execute the onpage javascript tag to notify Omniture of the pageview. Perhaps this has nothing to do with Google News, but it is too strong a coincidence that the two times we were on there recently, the same thing happened: big backend traffic spike that is not seen by Omniture. It is as if Google News causes browsers to pre-fetch our article without executing the javascript on the page. And without sending a referring URL. Has anyone else seen anything like this before? Stats from the recent episode:
- 835,000 HTTP requests for the article URL (logged by our servers)
- these requests came from 280,000 distinct IP addresses (70% US)
- the #1 referring URL is blank. This accounts for 99.4% of requests. Which, in itself, is hard to believe. These people had to come from somewhere. I believe browsers don't pass HTTP_REFERER when you click from an SSL page to a non-SSL page, but I think Google News doesn't bounce users to SSL by default. That said, we do see other content pages with 70-90% blank referring URLs. Rarely 99+% though.
Reporting & Analytics | mcglynn
-
Google Analytics internal Site Search - Destination pages display Search results
Hi, I'm having a bit of an issue with Google Analytics internal site search. I am currently able to track the search terms through my website's internal search, but when I click on destination pages I just get the search results page. When clicking destination pages I would expect to get the pages on which the user ended up after the results page; instead I just get the results page, which is pretty much useless: ?submitsearchXXXXXX. Hope you can help, look forward to your response. Thanks,
Reporting & Analytics | Tug-Agency