Welcome to the Question2Answer Q&A. There's also a demo if you just want to try it out.
+7 votes
2.0k views
in Q2A Core by
I'm currently running Q&A site for Koreans. http://ask.xguru.net
I've translated all messages to Korean and working almost perfectly.

But, there are one problem that your search is based on words.
In Korean language, one or more suffixes are added to noun and verb.
In that case, search can't find word that has suffixes or postfixes.

How can I enable full-text search using like '%word%' ?

1 Answer

+5 votes
by
Great question, and I'll admit that it's a tough one, because the current architecture uses a query structure that can't easily be converted to queries with preceding or following % wildcards.

The index structure would allow queries with suffixes (word%) to be done quickly, but queries with prefixes (%word) or both (%word%) will run slowly.

In any event, you could change qa_db_search_posts_selectspec() in three places to create a query fragment that looks something like this, based on the elements in each parameter $titlewords, $contentwords and $tagwords (example below if it contains 'a', 'b', 'c'):

WHERE word LIKE 'a%' OR word LIKE 'b%' OR word LIKE 'c%'

That query fragment could then be used in place of this fragment in the three places in qa_db_search_posts_selectspec():

word IN ($)

You'd need to make sure to remove the corresponding $titlewords, $contentwords or $tagwords parameter from the array_push coming immediately after each substitution, because you'll have removed the $.

Another completely different approach is to use some Korean PHP stemming library (if it exists) to convert the words into their canonical form (without prefixes and suffixes) before indexing. In this case, you would apply the stemming in qa_post_index(), and then also correspondingly in qa-page-search.php.

Anyway, I hope that this helps a little. It won't be feasible to offer this kind of sophisticated internationalization integrated into Q2A at this stage of the project, but I know it has to get there in the fullness of time!
by
Thank you! I successfully added full text search on my site!
by
I try it,but i does not work,i just wanna ask what the sentence below mean and how to make it come true(i download the code and didn't change anything,what exactly the elements mean)

based on the elements in each parameter $titlewords, $contentwords and $tagwords

thanks a lot!
by
edited by
this is an older question, but I've come up with a query using a regex.  It's not completely optimized yet, but will post it once I've got it to a point that I like.
by
here's the code I ended up using in qa-db-selects.php, line 491.  You can probably spend a bit more time to optimize the regex for speed.

if (!empty($titlewords)) {
            // At the indexing stage, duplicate words in title are ignored, so this doesn't count multiple appearances.
           
            $selectspec['source'].=($selectparts++ ? " UNION ALL " : "").
                "(SELECT postid AS questionid, LOG(#/titlecount) AS score, _utf8 'Q' AS matchposttype, postid AS matchpostid FROM ^titlewords JOIN ^words ON ^titlewords.wordid=^words.wordid WHERE word IN (";
           
            //regex begin
            $regex = "SELECT word FROM qa_words WHERE word REGEXP '\w{0,1}[.^^\s]*";
            $rString = implode('|',$titlewords);
           
            // '\w?[.^\s]*CFL[.^\s]*\w?'";
               
            $selectspec['source'].=$regex.$rString."[.^^\s]*\w{0,1}') AND titlecount<#)";
           
            //array_push($selectspec['arguments'], QA_IGNORED_WORDS_FREQ, $titlewords, QA_IGNORED_WORDS_FREQ);
            array_push($selectspec['arguments'], QA_IGNORED_WORDS_FREQ, QA_IGNORED_WORDS_FREQ);
        }

i also modified qa-db.php, line 183:

        $query=strtr($query, array('^^'=>'^','^' => QA_MYSQL_TABLE_PREFIX));
...