Welcome to the Question2Answer Q&A. There's also a demo if you just want to try it out.
+1 vote
90 views
in Q2A Core by

Here's the except:

To request indexing of many new or updated pages, your best choice is to submit a sitemap, with the updated pages marked by <lastmod>.

https://support.google.com/webmasters/answer/9012289#request_indexing

But Q2A sitemap has no <lastmod> tags!!!!!!

Some other issues are:

  • Sitemap items can be larger than >5000, while it is advised that we should have separated sitemaps in such case.
  • The order of items should be newer first, older second, because Googlebot can be "tired" halfway, and come back later, so newer items should be prioritized on the top of the list.
Q2A version: 1.8.4
by
Even though we can rewrite the plugin, what we get is minimal. What we really expect is a question get "update" when it has a new answer. But the Q2A system has no mechanism to make the questions updated. I'm thinking of some event module to get the "updated" time updated when a question is answered.
by
+1
Something like this should give you the date of the most recent change (question or answer):

SELECT q.postid,
  MAX(CASE
    WHEN COALESCE(q.updated, q.created) > COALESCE(a.updated, a.created) THEN COALESCE(q.updated, q.created)
    ELSE COALESCE(a.updated, a.created)
  END) AS lastmod
FROM qa_posts q JOIN qa_posts a ON q.postid = a.parentid
GROUP BY postid;
by
+2
That's interesting to hear that direct from Google, might be worth looking into updating the sitemap plugin.

1 Answer

+2 votes
by
edited by

TL;DR What you're asking would basically require a rewrite of the Sitemap plugin.


To add <lastmod> tags to a sitemap you would first need to update the function that creates the XML snippets to take a timestamp argument and add it to the XML output:

private function sitemap_output($request, $priority, $modified = NULL) {
  echo "\t<url>\n" .
    "\t\t<loc>" . qa_xml(qa_path($request, null, qa_opt('site_url'))) . "</loc>\n" .
    (is_null($modified) ? '' : "\t\t<lastmod>" . $modified . "</lastmod>\n") .
    "\t\t<priority>" . max(0, min(1.0, $priority)) . "</priority>\n" .
    "\t</url>\n";
}

You'd also have to actually pass that information to the function, meaning you must adjust the database queries to fetch it from the database. The information shown in a sitemap can come from four database tables:

  • questions
  • users
  • tags
  • categories

The tables for tags and categories don't contain timestamps, so adding a <lastmod> tag for those is not feasible.

For users and questions the simplest approach would look like this

SELECT userid, handle, created FROM ^users WHERE ...;

and this

SELECT postid, title, hotness, created FROM ^posts WHERE ...;

respectively. However, that would not account for edited posts or user profiles, so you'd rather use the COALESCE() function to get the timestamp of the last update and fall back to the creation timestamp if that doesn't exist:

SELECT userid, handle, COALESCE(written, created) FROM ^users WHERE ...;

and

SELECT postid, title, hotness, COALESCE(updated, created) AS lastmod FROM ^posts WHERE ...;

But you also want the timestamp in a proper format in the XML, so you'd use something like

DATE_FORMAT(CONVERT_TZ(COALESCE(...), @@session.time_zone, '+00:00'), '%Y-%m-%dT%T+00:00')

instead of just COALESCE(...).

Then adjust the calls to sitemap_output() for questions and users, e.g.

$this->sitemap_output(..., ..., $question['lastmod']);

If you want newer pages first you need to reverse the processing order. The Sitemap plugin fetches database records in chunks of 100 and ordered by ID. Since the ID is a simple auto-incremented integer lower ID means older entry. You could change the sort order from ascending to descending to get highest ID first instead of lowest ID first, but beware that you must also adjust the condition from >= to <= for that to work:

SELECT userid, handle, DATE_FORMAT(...) AS lastmod FROM ^users WHERE userid<=# ORDER BY userid DESC LIMIT 100;

And you must adjust the calculation of the next ID in the PHP code to decrement as well.

However, that simple approach won't suffice, since it doesn't account for edits (for which the timestamp is stored in a different column, see above). You can't order by the result of a COALESCE() call, though, so you need something like an auto-generated row index to use instead of the ID:

SET @row := 0;
SELECT * FROM (SELECT (@row := @row + 1) as row, handle, lastmod FROM (SELECT userid, handle FROM ^users) t1 ORDER BY lastmod DESC
) t2 WHERE row>=# LIMIT 100;

which is pretty ugly due to the nested sub-selects. I'm also not sure about how well it would perform for tables with many rows. Arguably a better approach might be to define views for the users and posts tables, fetch all data with a single query per table, and then sort the data in PHP.

As you can see, this is getting pretty complex pretty fast. And I'm not even touching splitting the sitemap here, which AFAICS would require creating different responses depending on the requested file.

by
Whoa, so much hassle for a small enhancement.
...