Welcome to the Question2Answer Q&A. There's also a demo if you just want to try it out.


Jan 18: 1.5 release

Why do you break all questions and answers into individual words and save them in a separate table?

+1 vote
As far as I can see, you have a table named 'qa_words' that stores every individual word from all questions and answers. For example, if I enter the question "What is the question", that table gets 4 records: "What", "is", "the" and "question". Is this a wise idea when the number of questions and answers is increasing rapidly?
asked Aug 27, 2010 in Q2A Core by anonymous
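To see how such a table grows, here is a minimal Python sketch (not Q2A's own PHP code) that splits a post into distinct words as the question describes. Lowercasing and the `\w+` tokenizer are assumptions for illustration, and `distinct_words` and `words_table` are hypothetical names.

```python
import re

def distinct_words(text: str) -> list[str]:
    """Lowercase the text, split it into word tokens and keep each distinct word once (first-seen order)."""
    seen: dict[str, None] = {}
    for word in re.findall(r"\w+", text.lower()):
        seen.setdefault(word, None)
    return list(seen)

# Toy stand-in for a words table: one entry per distinct word across all posts.
words_table: set[str] = set()
words_table.update(distinct_words("What is the question"))
print(sorted(words_table))  # ['is', 'question', 'the', 'what']

words_table.update(distinct_words("What is the answer?"))
print(len(words_table))  # 5, because only "answer" is new
```

Because each distinct word is stored only once, a second post adds rows only for words the table has not seen before.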

1 Answer

+1 vote
I believe the reason is better content indexing and faster searches.

Actually, the table you want to look at is qa_contentwords, which links each word to every post it appears in. It will be much bigger than qa_words.
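The difference between the two tables can be pictured as a toy inverted index. This Python sketch assumes one link row per (word, post) pair; the real qa_contentwords schema may store more columns, and `index_posts` is a hypothetical helper.

```python
import re
from collections import defaultdict

def index_posts(posts: dict[int, str]):
    """Build a distinct-word set plus a word-to-post link table; also count the link rows."""
    words: set[str] = set()
    # content_links[word] = IDs of the posts containing that word (one link row per pair)
    content_links: defaultdict[str, set[int]] = defaultdict(set)
    for post_id, text in posts.items():
        for word in re.findall(r"\w+", text.lower()):
            words.add(word)
            content_links[word].add(post_id)
    link_rows = sum(len(ids) for ids in content_links.values())
    return words, content_links, link_rows

posts = {
    1: "What is the question",
    2: "The question is what the index stores",
    3: "What is the answer",
}
words, links, link_rows = index_posts(posts)
print(len(words), link_rows)     # 7 14
print(sorted(links["question"]))  # [1, 2]
```

The word list is capped by the vocabulary, while the link rows grow with the number of distinct words in every post, which is why the link table ends up much larger.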

Here are the stats from my site, which has been running for several months now:
qa_contentwords = 22.3 MB
qa_posts = 5.3 MB (with > 10,000 posts)
qa_titlewords = 2.2 MB
qa_uservotes = 1.8 MB
qa_words = 2.0 MB

Your opinion may differ, but I don't think it's a huge problem, since qa_contentwords is only about 4x the size of qa_posts.
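The size comparison can be checked directly from the figures listed above. Treating qa_contentwords, qa_titlewords and qa_words as the word-index tables is an assumption based on their names.

```python
# Table sizes in MB, as reported in the answer.
sizes_mb = {
    "qa_contentwords": 22.3,
    "qa_posts": 5.3,
    "qa_titlewords": 2.2,
    "qa_uservotes": 1.8,
    "qa_words": 2.0,
}

# Ratio of the word-to-post link table to the posts table itself.
ratio = sizes_mb["qa_contentwords"] / sizes_mb["qa_posts"]

# Combined size of the three word-related tables (assumed to make up the search index).
index_total = sum(sizes_mb[t] for t in ("qa_contentwords", "qa_titlewords", "qa_words"))

print(f"qa_contentwords / qa_posts = {ratio:.1f}")                           # ... = 4.2
print(f"word tables / qa_posts = {index_total / sizes_mb['qa_posts']:.1f}")  # ... = 5.0
```

So even counting all three word tables together, the index is roughly 5x the size of the posts it covers on this site.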

There are only so many different words in the world! Some entries will end up as junk, but if you run the reindex script after deleting garbage posts, that junk will be removed. IMO this could be improved if the app simply ignored the most common words, since they are not useful in searches and take up the most room.
answered Aug 27, 2010 by DisgruntledGoat
edited Aug 27, 2010 by DisgruntledGoat
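The cleanup step mentioned in the answer can be pictured as rebuilding the index from only the posts that still exist, so words that appeared only in deleted posts never come back. This is a minimal Python sketch of that idea, not the actual Q2A reindex script; `reindex` and the sample posts are hypothetical.

```python
import re

def reindex(posts: dict[int, str]) -> dict[str, set[int]]:
    """Rebuild the word-to-post index from scratch using only the posts that still exist."""
    index: dict[str, set[int]] = {}
    for post_id, text in posts.items():
        for word in re.findall(r"\w+", text.lower()):
            index.setdefault(word, set()).add(post_id)
    return index

posts = {1: "How do I install Q2A", 2: "asdfgh qwerty buy now"}
print(len(reindex(posts)))  # 9 words, including the spam post's junk

del posts[2]  # delete the garbage post, then rebuild
print(sorted(reindex(posts)))  # ['do', 'how', 'i', 'install', 'q2a']
```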
Thanks for posting the explanation. As you say, it's just not a big deal on today's servers. The reason Q2A doesn't have a fixed list of common words is that I want it to be completely language-neutral. When searching, it still automatically ignores words that are used more than 10,000 times (set by the QA_IGNORED_WORDS_FREQ constant in qa-config.php).
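The frequency cut-off can be sketched as a filter on the search terms. This is a hypothetical Python helper, not Q2A's code: it assumes a per-word usage count is available, reads "used more than 10,000 times" as a strict `>` comparison, and the counts in the example are made up.

```python
QA_IGNORED_WORDS_FREQ = 10_000  # value quoted in the thread; name mirrors the qa-config.php constant

def search_terms(query_words: list[str], word_counts: dict[str, int],
                 max_freq: int = QA_IGNORED_WORDS_FREQ) -> list[str]:
    """Drop query words whose usage count exceeds max_freq; unknown words count as 0."""
    return [w for w in query_words if word_counts.get(w, 0) <= max_freq]

# Made-up usage counts for illustration only.
counts = {"the": 48_000, "is": 30_500, "install": 120, "plugin": 95}
print(search_terms(["the", "install", "plugin", "is"], counts))  # ['install', 'plugin']
```

Because the threshold is a count rather than a word list, it behaves the same way in any language, which is what keeps the approach language-neutral.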
On the language point, could you add a string to the language file with a list of common words? For example:
   'common_words' => 'the,a,i,to,is,it,of....'
Then each language file would list the common words of its own language.
Perhaps, but this would require translators to think a little more than is usually necessary...!
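The 'common_words' proposal could be wired up roughly like this. It is a Python sketch rather than Q2A's PHP: the `LANG` dictionary stands in for a language file, it uses only the words actually listed in the comment (the rest of that list is elided), and `common_words` and `indexable_words` are hypothetical helpers.

```python
import re

# Stand-in for one language file, using only the words given in the proposal.
LANG = {"common_words": "the,a,i,to,is,it,of"}

def common_words(lang: dict[str, str]) -> frozenset[str]:
    """Parse the comma-separated 'common_words' string; a missing key means no words are ignored."""
    raw = lang.get("common_words", "")
    return frozenset(w.strip().lower() for w in raw.split(",") if w.strip())

def indexable_words(text: str, stop: frozenset[str]) -> list[str]:
    """Tokenize the text and drop any word listed as common for this language."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in stop]

stop = common_words(LANG)
print(indexable_words("What is the question", stop))  # ['what', 'question']
```

If a translation leaves the string out, the set is empty and nothing is filtered, so untranslated languages keep the current language-neutral behavior.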