+1 vote
Is it possible to add trending topics on the right sidebar by using tags?
Q2A version: 1.5.1

2 Answers

+2 votes
Yes, it's possible with a plugin. I'd guess you want the most popular tags, but counted only over recent questions.

I'm not sure there is an efficient way to do that on the fly, though. Selecting 100+ questions and counting their tags on every page load would be excessive, but you could run the count every hour or two and store the result in the database.
+2 votes
The first thing to do is choose an update period. For Twitter, the update period could be as little as a minute, because in a minute they handle such a diverse cross-section of tweets that a popular topic is unlikely to go a full minute without being mentioned. For a website that doesn't get as much traffic, you may want to choose an hour, or 3 hours, or any amount of time in which you can be relatively certain of collecting a good general sample of posts.

Once you know what that time should be, set up a cron job. As an example, say your period is 15 minutes. Every 15 minutes, your cron job should run a script that selects all the posts from the past 15 minutes and builds what basically amounts to an associative array of hashtag => number of times used. So if you find 20 posts with the tag #tsunami and 15 with the tag #earthquake, once you have looped through all of the posts your array will be {"tsunami": 20, "earthquake": 15}.
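
To make the counting step concrete, here is a minimal Python sketch. It assumes each post is available as a plain string and that a hashtag is '#' followed by word characters; the count_hashtags name and the sample posts are made up for illustration, and a real Q2A plugin would read tags from the database instead.

```python
import re
from collections import Counter

# Assumption: a hashtag is '#' followed by letters, digits, or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")

def count_hashtags(posts):
    """Return a Counter mapping each hashtag to the number of posts using it."""
    counts = Counter()
    for text in posts:
        # A set ensures a tag repeated inside one post is counted only once.
        tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
        counts.update(tags)
    return counts

# Hypothetical posts collected during one 15-minute cycle.
recent_posts = [
    "Huge #tsunami warning after the #earthquake",
    "Stay safe everyone #Tsunami",
    "Felt the #earthquake from here",
]
print(dict(count_hashtags(recent_posts)))
```

Lower-casing the tags is a design choice so that #Tsunami and #tsunami count as the same topic; drop it if your tags are case-sensitive.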

Now let's say you have a database table that stores entries like that: simply the hashtag, how many times it was used within the last cycle, and how many times it was used in the cycle before that. First, select all the stored hashtags you didn't find in this cycle and delete those entries. Then write this cycle's findings to the database. Before overwriting anything, remember to copy the value currently stored as "number of uses in the last cycle" into the field that tracks the cycle before that.
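
The delete-then-shift update described above could look roughly like this. It's only a sketch: an in-memory dict stands in for the database table, and the roll_cycle name and the "last" / "previous" field names are placeholders I made up.

```python
def roll_cycle(store, current_counts):
    """Update `store` in place with the counts from the cycle that just ended.

    `store` maps hashtag -> {"last": int, "previous": int} (hypothetical layout).
    `current_counts` maps hashtag -> uses found in this cycle.
    """
    # Step 1: drop hashtags that did not appear in this cycle.
    for tag in list(store):
        if tag not in current_counts:
            del store[tag]
    # Step 2: shift the old "last" into "previous", then record the new count.
    for tag, uses in current_counts.items():
        old_last = store.get(tag, {}).get("last", 0)
        store[tag] = {"last": uses, "previous": old_last}
    return store

store = {
    "tsunami": {"last": 12, "previous": 5},
    "oldnews": {"last": 3, "previous": 9},
}
roll_cycle(store, {"tsunami": 20, "earthquake": 15})
print(store)
```

A hashtag seen for the first time gets a "previous" count of 0, so it shows up as a pure increase in the next step.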

Note that keeping the hashtags in a separate table makes this easier. Something like:

id | date                | text
1  | 2011-03-14 11:16 AM | This is a #test
2  | 2011-03-14 11:18 AM | Yet #another #test

post_id | date                | hashtag
1       | 2011-03-14 11:16 AM | test
2       | 2011-03-14 11:18 AM | another
2       | 2011-03-14 11:18 AM | test

The second table, post_hashtags, doesn't need a surrogate key; you can simply use a composite (multi-column) primary key on both post_id and hashtag.

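Then you'd just need to use the correct WHERE and GROUP BY clauses to get the data. Perhaps something like this (MySQL syntax, with the 15-minute window from the example):
SELECT hashtag, COUNT(post_id) AS uses
FROM post_hashtags
WHERE date >= NOW() - INTERVAL 15 MINUTE  -- only posts from the current cycle
GROUP BY hashtag
ORDER BY uses DESC;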
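
If you want to try the two-table layout without a MySQL server, here is a sketch using Python's built-in sqlite3 module. The table name posts, the tag_counts helper, and the ISO-style date strings are my assumptions; Q2A's real schema is different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        id   INTEGER PRIMARY KEY,
        date TEXT NOT NULL,
        text TEXT NOT NULL
    );
    CREATE TABLE post_hashtags (
        post_id INTEGER NOT NULL REFERENCES posts(id),
        date    TEXT NOT NULL,
        hashtag TEXT NOT NULL,
        PRIMARY KEY (post_id, hashtag)  -- composite key, no surrogate id
    );
""")
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)", [
    (1, "2011-03-14 11:16", "This is a #test"),
    (2, "2011-03-14 11:18", "Yet #another #test"),
])
conn.executemany("INSERT INTO post_hashtags VALUES (?, ?, ?)", [
    (1, "2011-03-14 11:16", "test"),
    (2, "2011-03-14 11:18", "another"),
    (2, "2011-03-14 11:18", "test"),
])

def tag_counts(conn, since):
    """Count uses per hashtag for rows dated at or after `since`."""
    rows = conn.execute(
        "SELECT hashtag, COUNT(post_id) AS uses "
        "FROM post_hashtags WHERE date >= ? "
        "GROUP BY hashtag ORDER BY uses DESC, hashtag",
        (since,),
    )
    return rows.fetchall()

print(tag_counts(conn, "2011-03-14 11:00"))  # [('test', 2), ('another', 1)]
```

Storing dates as "YYYY-MM-DD HH:MM" text means a plain string comparison orders them correctly, which is why the >= filter works here.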

Now, in your main webapp, you can query your database for trending topics. You'll have to fine-tune this algorithm yourself, but you'll want to select the most-used hashtags that are showing a certain minimum increase in use. You get to figure out what that minimum increase should be and how popular a hashtag should be before it shows up in your trending topics :)
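
As a starting point for that query, here is a small sketch of the selection rule. It assumes the per-hashtag counts have been loaded into a dict of the form hashtag -> {"last": ..., "previous": ...}; the function name, field names, and default thresholds are all placeholders to tune.

```python
def trending_topics(store, min_uses=10, min_increase=5, limit=5):
    """Return hashtags that are both popular and growing, most used first.

    min_uses and min_increase are made-up defaults, not recommended values.
    """
    candidates = [
        (tag, c["last"])
        for tag, c in store.items()
        if c["last"] >= min_uses and c["last"] - c["previous"] >= min_increase
    ]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [tag for tag, _ in candidates[:limit]]

store = {
    "tsunami": {"last": 20, "previous": 12},    # popular and rising
    "earthquake": {"last": 15, "previous": 0},  # new and rising
    "weather": {"last": 40, "previous": 41},    # popular but flat
    "cats": {"last": 6, "previous": 0},         # rising but too small
}
print(trending_topics(store))  # ['tsunami', 'earthquake']
```

Note that "weather" is the most-used tag here but is left out because its use is not increasing, which is the difference between "popular" and "trending".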

You'll probably want to make it more sophisticated than that down the line. Comparing only the last two cycles can mislead you: if #tsunami shows uses like this over time: [20, 30, 40, 39, 42, 41, 60, 55], then it's still a trending topic when it's at 55, even though 55 is lower than the 60 before it, because if you graphed the whole series you'd still see more and more uses over time.

You could smooth your data with an averaging filter (a simple average over n elements or, if you're a perfectionist, a Gaussian filter). Using the smoothed data, I'd define trending topics by a Boolean of the following structure:

trending = ( (f(s) > t1) && (d(s) > t2) ) || ( (d(s) > t3) && (d2(s) > t4) )

with
s      the series of values
f(s)   the current (last) value
d(s)   the derivative of s
d2(s)  the second derivative of s
t1-t4  thresholds you would have to define/adapt depending on what and how much you want to show

Note that the algorithm not only shows topics that are highly used and increasing (first term of the expression) but also accounts for topics that are just about to take off (high derivative and high second derivative, second term). Whether or not this makes sense for you depends on the application.
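
Here is one way the smoothing plus the Boolean test might be written. It's a sketch under a few assumptions of mine: a trailing moving average as the smoothing filter, and the derivatives approximated by first and second differences of the last smoothed values. The function names and the threshold values in the example calls are arbitrary.

```python
def moving_average(series, n=3):
    """Smooth with a trailing window of up to n values."""
    out = []
    for i in range(len(series)):
        window = series[max(0, i - n + 1): i + 1]
        out.append(sum(window) / len(window))
    return out

def is_trending(series, t1, t2, t3, t4, n=3):
    """Evaluate (f > t1 and d > t2) or (d > t3 and d2 > t4) on smoothed data."""
    s = moving_average(series, n)
    if len(s) < 3:
        return False  # not enough history for a second difference
    f = s[-1]                       # current (last) smoothed value
    d = s[-1] - s[-2]               # first difference, approximates d(s)
    d2 = s[-1] - 2 * s[-2] + s[-3]  # second difference, approximates d2(s)
    return (f > t1 and d > t2) or (d > t3 and d2 > t4)

thresholds = dict(t1=30, t2=2, t3=5, t4=2)  # arbitrary example values

# High and still rising after smoothing: caught by the first term.
print(is_trending([20, 30, 40, 39, 42, 41, 60, 55], **thresholds))
# Popular but flat: not trending.
print(is_trending([40] * 8, **thresholds))
# Small but accelerating: caught by the second term.
print(is_trending([1, 1, 1, 1, 2, 5, 12, 30], **thresholds))
```

With a window of 3, the dip from 60 to 55 in the first series is averaged out, so the smoothed curve is still rising at the end.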

But it will give you a simple start...