Introducing EPS.Extensions.SiteMapIndex

06/22/2020

We built a library for building large site maps and site map indexes.

We're happy to announce the release of the EPS.Extensions.SiteMapIndex package. You may be asking yourself, aren't there already plenty of SiteMap packages on NuGet? 🤔 The answer is yes - yes there are. But there didn't appear to be anything that supported site map indexes in addition to site maps, so here we are.

In our package, a SiteMap object takes a System.Collections.Stack of Location objects and transforms them into an XML document stored in a MemoryStream. You can then copy that stream into an HttpResponse, a ContentResult, or whatever you use to deliver the payload; it's really up to you at that point. The SiteMapIndex works essentially the same way, except that it holds a collection of SiteMaps, which it parses into an XML document stream.
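To make the shape of that output concrete, here is a minimal sketch of writing a sitemap index document into a MemoryStream with System.Xml.Linq. It is not the package's implementation: BuildSiteMapIndex is a hypothetical helper written for illustration, the example.com URLs are placeholders, and the code is laid out as a C# 9 top-level program. Only the element names and the namespace come from the sitemaps.org protocol.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Xml.Linq;

// XML namespace defined by the sitemaps.org protocol.
XNamespace ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

// Hypothetical helper (not part of EPS.Extensions.SiteMapIndex):
// writes a <sitemapindex> document listing the given sitemap URLs.
MemoryStream BuildSiteMapIndex(IEnumerable<string> siteMapUrls, DateTime lastModified)
{
    var entries = siteMapUrls.Select(url =>
        new XElement(ns + "sitemap",
            new XElement(ns + "loc", url),
            new XElement(ns + "lastmod",
                lastModified.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture))));

    var doc = new XDocument(
        new XDeclaration("1.0", "utf-8", null),
        new XElement(ns + "sitemapindex", entries));

    var stream = new MemoryStream();
    doc.Save(stream);      // serializes the declaration and the document
    stream.Position = 0;   // rewind so the caller can copy it into a response body
    return stream;
}

// Usage: build an index for two placeholder sitemap files and print it.
var index = BuildSiteMapIndex(
    new[] { "https://example.com/sitemap1.xml", "https://example.com/sitemap2.xml" },
    DateTime.UtcNow);
Console.WriteLine(new StreamReader(index).ReadToEnd());
```

Rewinding the stream before returning it matters in practice: a MemoryStream that has just been written to sits at its end, so copying it into a response without resetting Position would send an empty body.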

What gets tricky is trying to map all of these files to routes in ASP.NET Core. Fortunately, packages like Carter (the sequel to the much-vaunted Nancy project from the 'traditional' ASP.NET days) make this almost trivial.

We built a sample project in our repository to show how this works. We load three sitemaps with 50,000 URLs each and build a sitemap index to tell the search engines where they are.

    public class SiteMapModule : CarterModule
    {
        private readonly IAppCache cache;

        public SiteMapModule(IAppCache appCache)
        {
            cache = appCache;

            // Build each site map (and the index) once and keep the objects in the cache.
            cache.GetOrAdd("sitemap1.xml", () => getSiteMap("https://localhost:10623/sitemap1.xml"));
            cache.GetOrAdd("sitemap2.xml", () => getSiteMap("https://localhost:10623/sitemap2.xml"));
            cache.GetOrAdd("sitemap3.xml", () => getSiteMap("https://localhost:10623/sitemap3.xml"));
            cache.GetOrAdd("sitemap.xml", () => getSiteMapIndex("https://localhost:10623/sitemap.xml"));

            // Register one GET route per file so each appears explicitly in the router.
            var list = new List<string> { "sitemap1.xml", "sitemap2.xml", "sitemap3.xml", "sitemap.xml" };
            foreach (var item in list)
            {
                Get($"/{item}", async (req, resp) =>
                {
                    if (item.Equals("sitemap.xml"))
                    {
                        var ms = await cache.Get<Extensions.SiteMapIndex.SiteMapIndex>("sitemap.xml").Parse();
                        resp.ContentType = "application/xml";
                        resp.StatusCode = 200;
                        await ms.CopyToAsync(resp.Body);
                        return;
                    }

                    var smap = cache.Get<SiteMap>(item);
                    var ms2 = await smap.Parse();
                    resp.ContentType = "application/xml";
                    resp.StatusCode = 200;
                    await ms2.CopyToAsync(resp.Body);
                    return;
                });
            }
        }
...
}

Now this code isn't exactly elegant - we're loading and retrieving our site maps in the same module. You might want to do things quite differently when it comes to building your site maps. But the important part is how we're able to map each file to the router so that they're explicitly a part of our site's requests.

Let's add AspNetCore.RouteAnalyzer to our project and double-check. In Startup.cs:

endpoints.MapGet("/routes", request =>
{
    request.Response.ContentType = "application/json";

    // Only RouteEndpoint instances carry a route pattern, so skip anything else.
    var ep = endpoints.DataSources.First().Endpoints.OfType<RouteEndpoint>();
    return request.Response.WriteAsync(
        JsonSerializer.Serialize(
            ep.Select(e => new
            {
                Method = ((HttpMethodMetadata)e.Metadata.First(m => m is HttpMethodMetadata)).HttpMethods.First(),
                Route = e.RoutePattern.RawText
            })
        )
    );
});

When we run the app, we see this in our /routes:

[
    {
        "Method": "GET",
        "Route": "/sitemap1.xml"
    },
    {
        "Method": "HEAD",
        "Route": "/sitemap1.xml"
    },
    {
        "Method": "GET",
        "Route": "/sitemap2.xml"
    },
    {
        "Method": "HEAD",
        "Route": "/sitemap2.xml"
    },
    {
        "Method": "GET",
        "Route": "/sitemap3.xml"
    },
    {
        "Method": "HEAD",
        "Route": "/sitemap3.xml"
    },
    {
        "Method": "GET",
        "Route": "/sitemap.xml"
    },
    {
        "Method": "HEAD",
        "Route": "/sitemap.xml"
    },
    {
        "Method": "GET",
        "Route": "openapi"
    },
    {
        "Method": "GET",
        "Route": "/routes"
    }
]

That's exactly what we'd want to see right there.

It's also worth pointing out that we could map our sitemaps explicitly, the same way we set up the /routes path, with calls to endpoints.MapGet. Or we could create a custom IStartupFilter and load the cache with all of the locations during application startup. It really boils down to how you wish to handle it.

Another consideration is that search engines will accept compressed data. To that end, we implemented Response Compression in our sample to try to shrink our sitemap files, which despite having 50,000 locations in each file come out to about 11MB in size. We're also caching our sitemap objects in LazyCache, so when we deploy into a production environment we'll want to take those file sizes into consideration, since those objects are held in memory.
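To get a feel for how much compression can help on this kind of payload, here is a small sketch that gzips a synthetic, sitemap-like document in memory and prints both sizes. It is an illustration only: the sample itself relies on ASP.NET Core's Response Compression rather than anything like this, GZip is a hypothetical helper, the example.com URLs are placeholders, and the bare urlset markup is a stand-in rather than a complete sitemap. It is written as a C# 9 top-level program.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: gzip a byte payload entirely in memory.
byte[] GZip(byte[] payload)
{
    using var output = new MemoryStream();
    using (var gzip = new GZipStream(output, CompressionLevel.Optimal, leaveOpen: true))
    {
        gzip.Write(payload, 0, payload.Length);
    } // disposing the GZipStream flushes the final compressed block
    return output.ToArray();
}

// Build a synthetic payload with 50,000 placeholder locations.
var sb = new StringBuilder("<urlset>");
for (var i = 0; i < 50000; i++)
{
    sb.Append("<url><loc>https://example.com/items/").Append(i).Append("</loc></url>");
}
sb.Append("</urlset>");

var raw = Encoding.UTF8.GetBytes(sb.ToString());
var compressed = GZip(raw);
Console.WriteLine($"raw: {raw.Length:N0} bytes, gzip: {compressed.Length:N0} bytes");
```

Sitemap XML is highly repetitive, so it tends to compress well, but the actual ratio depends on your URLs. Measure with real data before sizing a cache around it.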

In any case, hopefully this project helps someone who has a lot of URLs that need indexing. Enjoy!