When I began architecting the site I’ve been working on, I knew I would need to address two issues (among many others, but for now we’ll cover just these two): 1) how to physically structure a multilingual website, and 2) how to structure the sitemap.xml for the site as a whole.
First I had to decide how the site should be physically structured. Would a subdomain per language be good, e.g., en.mysite.com for English, es.mysite.com for Spanish, and ru.mysite.com for Russian? Or would it be better to use directories for the distinction, e.g., www.mysite.com/en/ for English, etc.? If I chose the subdomain route, it would be easy to build a sitemap.xml file for each subdomain. But how would I structure the sitemap.xml if I used directories?
OK, so now that I have my structure in place, how do I build the sitemap.xml? I don’t want one huge monolithic file for the entire site. Even though at current count there are only around 100 HTML files per translation (not huge by any means, but also not insignificant), I would personally prefer to keep each translation in its own sitemap.xml file. Those of you familiar with sitemaps will have been shouting at your monitors by now, “Use a sitemap index, dork!”, and you’d be right. I just wasn’t sure that Google would support this. Google didn’t seem to mention it anywhere in their Webmaster Tools documentation (though I could have just missed it).
I’m happy to report that Google does in fact support sitemap indexes, and I’m fairly certain that MSN and Yahoo! do as well. So simply build yourself a sitemap index file, sitemap_index.xml (the filename is arbitrary), that looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.mysite.com/sitemap_en.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://www.mysite.com/sitemap_es.xml</loc>
  </sitemap>
  <sitemap>
    <loc>http://www.mysite.com/sitemap_ru.xml</loc>
  </sitemap>
</sitemapindex>
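If you’d rather not maintain the index by hand as languages are added, a small script can generate it. This is a minimal sketch using only Python’s standard library, assuming each language’s sitemap lives at the site root as sitemap_{lang}.xml (the naming used in the index above); the function name build_sitemap_index is hypothetical, not part of any existing tool.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(base_url: str, languages: list[str]) -> str:
    """Return sitemap index XML with one <sitemap> entry per language.

    Assumes (hypothetically) that each language's sitemap is published
    at {base_url}/sitemap_{lang}.xml.
    """
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for lang in languages:
        sitemap = ET.SubElement(root, "sitemap")
        loc = ET.SubElement(sitemap, "loc")
        loc.text = f"{base_url.rstrip('/')}/sitemap_{lang}.xml"
    # ElementTree escapes special characters such as & for us.
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Redirect stdout to produce the file, e.g.:
    #   python make_index.py > sitemap_index.xml
    print(build_sitemap_index("http://www.mysite.com", ["en", "es", "ru"]))
```

Adding a new translation then only means adding its language code to the list and regenerating the index.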
Then build your individual sitemap files as you normally would. You can find the full specification for sitemaps at sitemaps.org, and a nifty utility to help you automatically build sitemap files in the google-sitemap_gen project. Don’t forget to reference your new sitemap index file in your robots.txt file (with a Sitemap: line)! Enjoy.
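For the per-language files themselves, here is a rough sketch of what “as you normally would” might look like for a directory-per-language layout. It assumes a hypothetical structure where English pages live under www.mysite.com/en/, Spanish under /es/, and so on, that the page paths you pass in are already URL-safe, and that the output is saved at the site root as sitemap_{lang}.xml. The helper names and the page list are illustrative only; the Sitemap: directive format comes from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Optional

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_language_sitemap(base_url: str, lang: str, paths: list[str],
                           lastmod: Optional[date] = None) -> str:
    """Return a <urlset> sitemap covering one language directory.

    `paths` are relative to the (hypothetical) language directory, so
    "index.html" with lang "en" becomes {base_url}/en/index.html.
    """
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    prefix = f"{base_url.rstrip('/')}/{lang}/"
    for path in paths:
        url = ET.SubElement(root, "url")
        ET.SubElement(url, "loc").text = prefix + path.lstrip("/")
        if lastmod is not None:
            # <lastmod> is optional; YYYY-MM-DD is a valid W3C date.
            ET.SubElement(url, "lastmod").text = lastmod.isoformat()
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def robots_sitemap_line(index_url: str) -> str:
    """Return the robots.txt line pointing crawlers at the sitemap index."""
    return f"Sitemap: {index_url}"


if __name__ == "__main__":
    pages = ["index.html", "about.html", "contact.html"]  # hypothetical pages
    print(build_language_sitemap("http://www.mysite.com", "en", pages,
                                 date(2009, 12, 8)))
    print(robots_sitemap_line("http://www.mysite.com/sitemap_index.xml"))
```

Keeping every sitemap file at the site root is deliberate: under the sitemaps.org protocol, a sitemap’s location limits which URLs it may list, and a root-level file is allowed to cover everything beneath /en/, /es/, and /ru/.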
UPDATED: December 8, 2009 – Corrected my syntax on the XML. D’oh!