Having issues crawling a website
-
We tried to use the Screaming Frog tool to crawl this website and get a list of all meta titles from the site; however, it returned only one result: the homepage.
We then tried to obtain a list of the site's URLs by creating a sitemap using https://www.xml-sitemaps.com/. Once again, however, we got just the one result: the homepage.
Something seems to be restricting these tools from crawling all pages. If anyone can shed some light on what this could be, we'd be most appreciative.
-
That robots.txt should be fine; it's not blocking anything.
The reason the crawl is stopping on the homepage is this code:
<meta name="robots" content="nofollow">
This tells bots not to follow any links on the page. Remove it and the crawl should continue past the homepage.
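If you want to check other pages for the same tag, here is a minimal sketch using only Python's standard library. The function and class names are made up for illustration, and it assumes you already have the page's HTML as a string. It also treats `none` as blocking, since that value is shorthand for `noindex, nofollow`.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value may be None
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives present in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def blocks_link_following(html: str) -> bool:
    """True if the page asks crawlers not to follow its links."""
    directives = robots_directives(html)
    return "nofollow" in directives or "none" in directives


if __name__ == "__main__":
    sample = '<html><head><meta name="robots" content="nofollow"></head></html>'
    print(blocks_link_following(sample))
```

Screaming Frog respects this tag by default, which would explain why the crawl stops at the first page.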
-
Hi,
I think it is your robots.txt file that is causing the issue. At the moment you have the following:
User-agent: *
Disallow:
I would recommend updating it to the following:
User-agent: *
Allow: /
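A quick way to compare the two versions before changing anything is to run both through a robots.txt parser. The sketch below uses Python's standard `urllib.robotparser`; the `example.com` URL and the helper name `allows` are placeholders, not anything from the site in question. With this parser an empty `Disallow:` and `Allow: /` should both report the page as crawlable, so if a crawl still stops at the homepage the cause is probably somewhere other than this file.

```python
from urllib.robotparser import RobotFileParser


def allows(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The two variants discussed above
CURRENT_RULES = "User-agent: *\nDisallow:\n"
PROPOSED_RULES = "User-agent: *\nAllow: /\n"

# Placeholder URL, used only for illustration
PAGE = "https://www.example.com/some-page/"

if __name__ == "__main__":
    print("current: ", allows(CURRENT_RULES, PAGE))
    print("proposed:", allows(PROPOSED_RULES, PAGE))
```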
Moz also has a good post on what else you can include in your robots.txt file, including best practices:
https://moz.com/learn/seo/robotstxt
Hope that helps
Thanks