Hi all, our tech team inherited a bit of an SEO pickle. I manage a freemium React JS app built for 80k unique markets worldwide (and associated dedicated URL schema). Ex/ https://www.airdna.co/vacation-rental-data/app/us/california/santa-monica/overview
Mistake - App, in its entirety, was indexed by Google in July 2018, which basically resulted in duplicate content penalties because the unique on-page content wasn't readable.
Partial Solution - We no indexed all app pages until we were able to implement a "pre-render" / HTML readable solution with associated dynamic meta data for the Overview page in each market. We are now selectively reindexing only the free "Overview" pages that have unique data (with a nofollow on all other page links), but want to persist a noindex on all other pages because the data is not uniquely "readable" before subscribing. We have the technical server-side rules in place and working to ensure this selective indexing.
Question - How can we force google to abandoned the >300k cached URLs from the summer's failed deploy? Ex/ https://screencast.com/t/xPLR78IbOEao, would lead you to a live URL such as this which has limited value to the user, https://www.airdna.co/vacation-rental-data/app/us/arizona/phoenix/revenue (Note Google's cached SERPs also have an old URL structure, which we have since 301ed, because we also updated the page structure in October). Those pages are currently and will remain noindexed for the foreseeable future. Our sitemap and robots.txt file is up-to-date, but the old search console only has a temporary removal on a one-by-one basis. Is there a way to do write a rule-based page removal? Or do we simply render these pages in HTML and remove the nofollow to those links from the Overview page so a bot can get to them, and then it would see that there's a noindex on them, and remove them from the SERPs?
Thanks for your help and advice!