All I have is thousands of HTML pages containing questions and answers. How can I move them into a Q2A system? For example, here is what I actually have (a sample only):

question2answer.org folder > qa folder > 52327 folder > best-users-per-month.html (a page containing the question and its answers), which corresponds to a URL like http://www.question2answer.org/qa/52327/best-users-per-month. This is the structure of my site: just folders and pages. How can I move them into a Q&A system?

This is how the files are laid out on my PC.
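Because every saved page sits at a predictable path, the question ID and URL slug can be recovered from the path alone, before any HTML parsing. A minimal Ruby sketch, assuming each page follows the qa/<id>/<slug>.html layout described above; question_info, saved_questions and the default base URL are illustrative names, not part of Q2A:

```ruby
# Hypothetical helper: derive question ID, slug and original URL from a saved
# page path laid out as <root>/qa/<id>/<slug>.html (Windows backslashes allowed).
def question_info(path, base_url = "http://www.question2answer.org")
  parts = path.tr("\\", "/").split("/")
  return nil if parts.size < 3

  slug = File.basename(parts[-1], ".html")
  id = parts[-2]
  # Only accept paths whose parent folder is a numeric ID inside a "qa" folder
  return nil unless parts[-3] == "qa" && id.match?(/\A\d+\z/)

  { id: id.to_i, slug: slug, url: "#{base_url}/qa/#{id}/#{slug}" }
end

# Hypothetical helper: list every saved question page under a root folder
def saved_questions(root)
  Dir.glob(File.join(root, "qa", "*", "*.html")).filter_map { |p| question_info(p) }
end
```

The numeric folder name can then serve as the post ID during import, which also keeps the old URLs meaningful.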

2 Answers

Best answer
You can write a small PHP parser for this. Constructing the post format from an HTML dump won't be difficult, provided the dump comes from a Q2A site. Even otherwise it is not hard: you have to match all the items, such as category, tags, date and user. I once did this to restore my site's content from cached HTML pages.
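If the parser writes an SQL file for import, each extracted post becomes one INSERT statement. The sketch below is hypothetical and in Ruby rather than PHP: the table name qa_posts and the column list are assumptions based on a stock Q2A install (a real install has more columns, such as userid and categoryid, and may use a different table prefix), so compare them with your own database before importing anything.

```ruby
# Assumed subset of Q2A's posts table columns; verify against your schema.
POST_COLUMNS = %w[postid type parentid title content format tags created].freeze

# Quote a value as a MySQL literal: NULL, a bare integer, or an escaped string.
def sql_quote(value)
  return "NULL" if value.nil?
  return value.to_s if value.is_a?(Integer)

  escaped = value.to_s.gsub("\\") { "\\\\" }.gsub("'") { "''" }
  "'#{escaped}'"
end

# Build one INSERT statement from a hash of extracted fields (symbol keys).
def insert_post_sql(post, table: "qa_posts")
  values = POST_COLUMNS.map { |col| sql_quote(post[col.to_sym]) }
  "INSERT INTO #{table} (#{POST_COLUMNS.join(', ')}) VALUES (#{values.join(', ')});"
end
```

In a stock schema a question uses type 'Q' with parentid NULL, and each answer uses type 'A' with parentid set to the question's postid.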
That won't work. Even if you strip the HTML, the content field in the database has a fixed size, so it won't make a difference. To reduce the size, you will need a different approach.

Duplicates: how are they even appearing? I assumed the HTML files were unique. To eliminate them, I guess you have to modify the code to check for duplicates, or remove the duplicate content from the SQL file; in the latter case you might end up with some missing IDs in the posts table.
Some HTML files have copies, and I'm on shared hosting. Will running those cause any problem?
I guess you could start a new Q2A site just for your questions :O
It doesn't cause any problem; it just uses up more of the CPU limit given by the hosting provider. So if usage reaches 100%, I think I have to wait for some time.

I have left a message on your profile; do you think you can answer it?
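Exact copies can be filtered out before import by hashing each file's contents and keeping only the first file per hash. A minimal sketch using Ruby's standard Digest library; duplicate_groups and unique_files are illustrative names. It only catches byte-identical files, so copies that differ in, say, a timestamp or sidebar would need a comparison on extracted fields (for example the question ID) instead.

```ruby
require 'digest'

# Group paths whose file contents are byte-identical; return only real duplicates.
def duplicate_groups(paths)
  paths.group_by { |p| Digest::SHA256.file(p).hexdigest }
       .values
       .select { |group| group.size > 1 }
end

# Keep the first path seen for each distinct file content, preserving order.
def unique_files(paths)
  seen = {}
  paths.each_with_object([]) do |p, keep|
    digest = Digest::SHA256.file(p).hexdigest
    next if seen[digest]

    seen[digest] = true
    keep << p
  end
end
```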
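One way to stay under a shared host's CPU limit, rather than waiting after it hits 100%, is to process the files in small batches with a pause in between. This is a hypothetical helper (each_in_batches is an illustrative name), and the default batch size and pause are placeholders to tune against your host's limits:

```ruby
# Run the given block on every item, in batches, sleeping between batches so
# the host's CPU usage can recover. Returns the block's results in order.
def each_in_batches(items, batch_size: 100, pause_seconds: 5)
  results = []
  items.each_slice(batch_size).with_index do |batch, i|
    # No pause before the first batch
    sleep(pause_seconds) if i.positive? && pause_seconds.positive?
    batch.each { |item| results << yield(item) }
  end
  results
end
```

For example, each_in_batches(paths, batch_size: 50) { |p| process(p) } handles 50 files at a time, where process stands in for whatever parsing or import step you run per file.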

I can get the description printed in the command prompt.

What I did:

I created an Example.rb file on my PC, entered the code below into it, and saved it.

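require 'nokogiri'

# The page is a local file, so it can be read directly (open-uri is only needed for http/https URLs)
path = "C:/Users/Manikandan/websites/index.html"

data = Nokogiri::HTML(File.read(path))

# Question description: at_css returns the first matching element
puts data.at_css("#summaryDescription").text.strip

# All answers: css returns a NodeSet, and .text on a NodeSet joins every match into one string
puts data.css(".postContent").text.strip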


When I run the file from the command prompt, I get the details.

But all the answers come out as a single paragraph. I'm not sure what I'm doing wrong; please help.