Sitemap.xml Update Cycle

Hello,

We have added several Unit Newsrooms to our site, https://news.iu.edu.

We have updated sitemap.custom.xml with the new sites:

	<url>
		<loc>https://news.iu.edu/kokomo/</loc>
		<changefreq>daily</changefreq>
		<priority>0.9</priority>
	</url>
	<url>
		<loc>https://news.iu.edu/it/</loc>
		<changefreq>daily</changefreq>
		<priority>0.9</priority>
	</url>
	<url>
		<loc>https://news.iu.edu/columbus/</loc>
		<changefreq>daily</changefreq>
		<priority>0.9</priority>
	</url>
	<url>
		<loc>https://news.iu.edu/east/</loc>
		<changefreq>daily</changefreq>
		<priority>0.9</priority>
	</url>

However, while /kokomo/ is crawled and sitemap.news.xml is updated with Kokomo’s stories, /east/ is not crawled and sitemap.news.xml is not updated with its stories.

We are wondering whether there is a cron job that handles this and, if so, how often it runs.
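
One way to confirm which newsrooms actually appear in sitemap.news.xml is to count its <loc> entries by first path segment. This is a minimal Python sketch, assuming the file is a standard sitemaps.org <urlset>; the function name and the story URLs in SAMPLE are hypothetical placeholders, not real entries from the site:

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlparse

# Namespace used by standard sitemaps.org files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def count_by_section(sitemap_xml: str) -> Counter:
    """Count <loc> URLs by their first path segment (e.g. 'kokomo', 'east')."""
    root = ET.fromstring(sitemap_xml)
    counts = Counter()
    for element in root.iter():
        # Accept <loc> with or without the sitemap namespace.
        if element.tag in ("loc", SITEMAP_NS + "loc") and element.text:
            path = urlparse(element.text.strip()).path
            segments = [s for s in path.split("/") if s]
            counts[segments[0] if segments else "/"] += 1
    return counts


# Hypothetical sample; in practice, pass in the contents of sitemap.news.xml.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://news.iu.edu/kokomo/story-a.html</loc></url>
  <url><loc>https://news.iu.edu/kokomo/story-b.html</loc></url>
  <url><loc>https://news.iu.edu/it/story-c.html</loc></url>
</urlset>"""

if __name__ == "__main__":
    counts = count_by_section(SAMPLE)
    for section in ("kokomo", "it", "columbus", "east"):
        print(section, counts[section])
```

A section that reports 0 here is missing from the file entirely, which points at the sitemap update rather than at individual stories.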

Thanks,

Akbar

Hi Akbar,

The sitemap.news.xml file gets updated daily. If you have news stories that aren’t being included, it may be for some other reason.

Here’s a list of what gets included and excluded, taken from the LiveWhale Support document “Sitemaps and robots.txt”:

What is included

  • All content meeting these criteria:
    • marked Live
    • visible to “Everyone”
  • All pages meeting these criteria:
    • marked Live
    • visible to “Everyone”
    • page is in one of your navigations

What is excluded

  • All content and pages marked Hidden
  • All content and pages visible to “This group only,” “Logged-in users,” or “Anyone with the link”
  • All content that is archived
  • Pages that are not in any navigation
  • Old content (e.g., news stories that are several years old)
  • Anything in your /_ingredients/ folder (e.g., templates)
  • Anything in a folder that starts with /_ (e.g., /_sample/index.php)
  • Any page that contains “.test.” in the filename (e.g., index.test.php)
  • Any page that contains “/test/” in the filepath (e.g., /admissions/test/open-house/index.php)
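
The path-based exclusions in this list can be sketched as a small check. This is only an illustration of the rules as written above, not LiveWhale's actual implementation; the function name is hypothetical, and treating a leading-underscore folder at any depth as excluded is an assumption. The status-based rules (Hidden, archived, visibility, age, navigation) depend on CMS data and are not modeled here:

```python
def excluded_by_path(path: str) -> bool:
    """Return True if a path matches one of the path-based exclusion rules."""
    is_dir = path.endswith("/")
    segments = [s for s in path.split("/") if s]
    # If the path ends in "/", every segment is a folder; otherwise the
    # last segment is the filename.
    folders = segments if is_dir else segments[:-1]
    filename = "" if is_dir or not segments else segments[-1]

    # Folders starting with "_" (covers /_ingredients/ and e.g. /_sample/).
    # Assumption: applies at any depth, not only at the top level.
    if any(folder.startswith("_") for folder in folders):
        return True
    # Filenames containing ".test." (e.g. index.test.php).
    if ".test." in filename:
        return True
    # Paths containing a "/test/" folder.
    if "/test/" in path:
        return True
    return False


if __name__ == "__main__":
    for p in ("/_sample/index.php", "index.test.php",
              "/admissions/test/open-house/index.php", "/east/"):
        print(p, excluded_by_path(p))
```

Under these rules a path like /east/ is not excluded by its path alone, so a missing /east/ would have to come from one of the status-based rules or from the update job itself.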

For example, if stories in your “East” group are archived, or have dates older than a few years, that could explain why they are not included in the daily sitemap update.

Karl,
I checked on the server and it seems sitemap.livewhale.xml, sitemap.news.xml and sitemap.profiles.xml have not been updated since 11/17/2023 at 1:07 pm.
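
A check like this can be scripted so that a stalled daily update is noticed sooner. This is a minimal Python sketch, assuming the sitemap files sit together in one directory on the server and that "daily" means no more than about 24 hours between updates; the function name, the directory argument, and the threshold are all hypothetical:

```python
import time
from pathlib import Path

# Files named in this thread that the scheduled job is expected to refresh.
GENERATED_SITEMAPS = (
    "sitemap.livewhale.xml",
    "sitemap.news.xml",
    "sitemap.profiles.xml",
)


def stale_files(directory, names=GENERATED_SITEMAPS,
                max_age_hours=24.0, now=None):
    """Return the names that are missing or older than max_age_hours."""
    now = time.time() if now is None else now
    stale = []
    for name in names:
        path = Path(directory) / name
        if not path.exists():
            stale.append(name)  # a missing file is treated as stale
            continue
        age_hours = (now - path.stat().st_mtime) / 3600
        if age_hours > max_age_hours:
            stale.append(name)
    return stale


if __name__ == "__main__":
    # "." is a placeholder; point this at the directory holding the sitemaps.
    print(stale_files("."))
```

An empty list means every generated sitemap was modified within the threshold.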

sitemap.custom.xml is the only file that has been updated; that is where we added the new news sites.

Also, the robots.txt file has a Disallow rule only for college.

IU East has stories from 2024:

https://news.iu.edu/east/

Thanks,

Akbar

Thanks Akbar – it looks like manually refreshing (using /live/sitemap/update?refresh=1 from a logged-in user) has worked in the short term, and I’ll relay to our team to help investigate why that scheduled daily job wasn’t completing the process of updating those .xml sitemaps.