I am sorry if this has already been asked or documented. Please explain how the search box at the top right of the screen behaves.

1. Separator when entering multiple words

Space (" ") or comma (",")?

2. Search targets

Questions, answers, comments, tags, categories, static pages?

3. Default matching logic when entering multiple words

AND or OR?

4. Advanced search (feature request?)

Is it possible to combine AND and OR in a single search?

Is it possible to restrict a search to a specific condition?
  e.g.
    Question-only
    Answer-only
    Comment-only
    Specific User
    Specific Period
    Number of votes
    ...

Recently I have had a hard time finding things because the number of posts has grown. Because tagging depends on each user's skill and judgment, tags do not help produce accurate search results. Google Custom Search is good, but I think the built-in search also needs some of this functionality.

Thanks in advance.

1 Answer

The search box works as follows:

The search string is split into words on any word separator, including spaces, commas, quote marks, etc. (see QA_PREG_INDEX_WORD_SEPARATOR and $qa_utf8punctuation in qa-util-string.php).
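
To make the splitting step concrete, here is a rough sketch in Python. The separator pattern is an assumption chosen for illustration (whitespace plus a few ASCII punctuation marks); the real set is whatever QA_PREG_INDEX_WORD_SEPARATOR and $qa_utf8punctuation define. Lowercasing and the function name split_search_words are also assumptions, not taken from Q2A.

```python
import re

# Assumed separator set, for illustration only: whitespace plus some ASCII
# punctuation. Q2A's real list is defined in qa-util-string.php.
WORD_SEPARATOR = re.compile(r"[\s,.;:!?'\"()\[\]]+")


def split_search_words(query):
    """Split a search string into lowercase words, dropping empty pieces."""
    return [word for word in WORD_SEPARATOR.split(query.lower()) if word]


print(split_search_words('search box, "advanced search"'))
```

Under these assumptions a space, a comma and a quote mark all behave the same way: each one simply ends the current word.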

Each of these words is then matched against any of the following:

* Title of question
* Tag of question
* Content of question, answer or comment
* Handle of user asking question

Matches are weighted by 1/(frequency in database), so a match on a word that appears 100 times in a particular way is worth half as much as a match on a word that appears 50 times. If a word appears more than 10000 times in the database in a particular way, it is ignored (this threshold can be changed via QA_IGNORED_WORDS_FREQ in qa-config.php).
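
The frequency weighting can be sketched like this. It is a minimal illustration of the rule as described, not the actual MySQL expression; the Python constant is assumed to mirror QA_IGNORED_WORDS_FREQ, and the function name frequency_weight is hypothetical.

```python
# Assumed to mirror QA_IGNORED_WORDS_FREQ from qa-config.php.
IGNORED_WORDS_FREQ = 10000


def frequency_weight(count):
    """Weight of one matched word, given how often it appears in the database."""
    if count <= 0 or count > IGNORED_WORDS_FREQ:
        return 0.0  # unknown words and very common words contribute nothing
    return 1.0 / count


# A word seen 50 times is worth twice as much as one seen 100 times.
print(frequency_weight(50) / frequency_weight(100))
```

The effect is that rare words dominate the ranking, while very common words drop out entirely.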

There is also additional weighting as follows:

* Let's say a title match is worth 1 (if the word appears several times in the title it's still just worth 1)
* Tag match is worth 2 (since it's an explicit indication by the user)
* Question content match is worth 0.5 to 1 depending on frequency of word in the question content
* Answer content match is worth 0.25 to 0.5
* Comment content match is worth 0.125 to 0.25
* User handle match is worth 1

For each question, these values are added up, with any matches for an answer or comment counting towards the parent question. Questions are then shown from the search in order of their total score.
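
As a rough sketch of how the per-question totals and ordering could work, consider the Python below. Several things here are assumptions for illustration: that the match-type weight and the 1/frequency weight are multiplied together, that content matches use the upper end of their range, and that a single count per word stands in for the per-type counts described above. The names TYPE_WEIGHT and rank_questions are hypothetical; the real logic is one MySQL query.

```python
from collections import defaultdict

# Match-type weights from the list above. Content matches really vary within
# a range; this sketch simply uses the upper bound of each range.
TYPE_WEIGHT = {
    "title": 1.0,
    "tag": 2.0,
    "question_content": 1.0,
    "answer_content": 0.5,
    "comment_content": 0.25,
    "handle": 1.0,
}


def rank_questions(matches, word_counts):
    """Sum weighted matches per parent question and sort by total score.

    matches: iterable of (question_id, match_type, word) tuples, where answer
    and comment matches already carry the id of their parent question.
    word_counts: word -> number of times it appears in the database.
    """
    scores = defaultdict(float)
    for question_id, match_type, word in matches:
        # Assumption: type weight and 1/frequency are multiplied together.
        scores[question_id] += TYPE_WEIGHT[match_type] / word_counts[word]
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


matches = [
    (1, "title", "search"),
    (1, "answer_content", "search"),
    (2, "tag", "search"),
    (2, "comment_content", "box"),
]
print(rank_questions(matches, {"search": 10, "box": 5}))
```

In this example question 2 ranks first: its tag match alone outweighs the title and answer matches of question 1.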

This is all done through a single fast (but rather complex!) MySQL query, built in qa_db_search_posts_selectspec(...) in qa-db-selects.php.

As to your question, AND and OR aren't so relevant, since words in the search string that match nothing are simply ignored. This is unlike search engines like Google. The reason is that the search needs to work for people entering whole questions, as well as individual words, and the sentence for a whole question is very likely to have words that match nothing.

The search does not yet support the other advanced features you mention. I'm not sure they will all be in high demand, but improving the search function in general is certainly on the roadmap.
by
Thank you for your detailed explanation. Accurate search seems to be very difficult for Japanese and other languages that do not have an explicit word separator.
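
The difficulty can be seen with a purely separator-based split. The Python sketch below uses an assumed, simplified separator pattern and a hypothetical naive_split function; it only shows what naive splitting does and says nothing about how Q2A itself treats such text.

```python
import re

# Assumed, simplified separator set: whitespace plus a few punctuation marks.
SEPARATOR = re.compile(r"[\s,.;:!?'\"]+")


def naive_split(text):
    """Split text on separators only, with no language-specific handling."""
    return [word for word in SEPARATOR.split(text) if word]


# The English sentence yields three words; the Japanese one stays as a single
# token, because it contains no separator characters at all.
print(naive_split("Thank you gidgreen."))
print(naive_split("gidgreenさんありがとうございます"))
```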

"Thank you gidgreen." / "gidgreenさんありがとうございます"
...