The Docvert service runs well as a dedicated web appliance, on a machine that is not used for anything else. It is performance-intensive, but generally used only rarely. As an experiment, I tried creating an Amazon Web Services "Elastic Compute Cloud" (EC2) machine instance to use as a web appliance. The following instructions got me from start to finish, with a working result. YMMV.
I won't go into detail here; instead, check out a few useful tutorials. The AWS control panel has got heaps better in recent years. It does require a bit of reading to get used to the terminology, but it's cheap to test (around 50c a day). You can do all the virtual server provisioning from their Web UI, and it's a good idea to start there.
The instructions below first summarize how you could do it through the UI, and are followed by a keylog that allows you to do it from a script.
The main step by step guide to AWS/EC2 has screenshots and things.
For the Ubuntu and commandline-minded, the Ubuntu docs EC2StartersGuide is a little old, but still valid. Hidden away a bit, here is the reference for all the EC2 commandline tools. The inline help for those commands is a bit too brief.
You are now logged in. Skip down to the steps for installing Docvert.
https://help.ubuntu.com/community/EC2StartersGuide
http://docs.amazonwebservices.com/AWSEC2/latest/CommandLineReference/index.html?ApiReference-cmd-DescribeInstances.html
The keypair ???.pem file you were given has a name as well as a filename. The commandline tools need the raw name, so it's probably best if you name the file the same as the given name. My example keypair is named dmanAWS and is stored in ~/.ec2/dmanAWS.pem
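If you do keep the file name and the keypair name identical, the name can be derived from the file rather than typed twice. This is only a small sketch; the helper name keypair_name_from_file is made up for illustration, and it assumes the file really is called <keypair name>.pem.

```shell
# Hypothetical helper: derive the keypair name from the .pem file path.
# Assumes the file is named exactly <keypair name>.pem, as suggested above.
keypair_name_from_file() {
    basename "$1" .pem
}

EC2_KEYPAIR_FILE="$HOME/.ec2/dmanAWS.pem"
EC2_KEYPAIR=$(keypair_name_from_file "$EC2_KEYPAIR_FILE")
echo "$EC2_KEYPAIR"

# ssh refuses private keys that other users can read, so you may also need:
# chmod 600 "$EC2_KEYPAIR_FILE"
```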
# Breaking the settings out so we can see them ...
export EC2_KEYPAIR=dmanAWS # THIS will be different for you
export EC2_KEYPAIR_FILE=~/.ec2/${EC2_KEYPAIR}.pem
export EC2_AMI=ami-60582132 # This is an ubuntu natty server base
export EC2_TYPE=t1.micro # I found this out when choosing the base ami
export EC2_REMOTE_USER=ubuntu # This is specific to the instances, sometimes it's 'root'
export EC2_GROUP=sg-72277a20 # This is your own security group ID. Set it up in the UI first
ec2-run-instances $EC2_AMI --key $EC2_KEYPAIR --instance-type $EC2_TYPE --group $EC2_GROUP > ~/ec2-instance_info.txt
#... And you get a running machine, though we probably should also set a name?
# We have recorded the output of that command into ~/ec2-instance_info.txt
# so that we can work on the info it returned - the new ID we have been assigned.
# Get some info from it so we know where we are now talking about
export EC2_INSTANCE=`awk '$1 == "INSTANCE" { print $2 }' ~/ec2-instance_info.txt`
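If the launch fails (bad AMI, wrong group, no credentials), the saved file has no INSTANCE line and EC2_INSTANCE silently ends up empty. A rough sketch of a guard is below. It uses the same awk test as above; the function name read_instance_id is my own invention, not part of the EC2 tools.

```shell
# Hypothetical helper: print the instance ID from saved ec2-run-instances
# output, or complain and return non-zero if there is no INSTANCE line.
read_instance_id() {
    id=$(awk '$1 == "INSTANCE" { print $2; exit }' "$1")
    if [ -z "$id" ]; then
        echo "no INSTANCE line found in $1" >&2
        return 1
    fi
    echo "$id"
}

# Possible usage, in place of the plain awk line:
# EC2_INSTANCE=$(read_instance_id ~/ec2-instance_info.txt) || exit 1
# export EC2_INSTANCE
```

The assignment and the export are kept on separate lines because "export VAR=$(...)" hides the exit status of the command inside the substitution.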
Now we have a running machine (you can probably see it if you refresh the AWS GUI) but we only know its instance ID, not the public web address. It will be something horrible like http://ec2-122-248-213-221.ap-southeast-1.compute.amazonaws.com/ for now. We need to retrieve that. Though you can get it from the GUI too, here is a commandline way.
# At the time the request was made, we had not actually been assigned an address.
# Wait a bit, and then ask for it. Need to parse the text response for the bit we need.
sleep 10 # force a wait, it's pretty quick
ec2-describe-instances --filter "instance-id=$EC2_INSTANCE" > ~/ec2-instance_info2.txt
export EC2_ADDRESS=`awk '$1 == "INSTANCE" { print $4 }' ~/ec2-instance_info2.txt`
export EC2_IP=`awk '$1 == "INSTANCE" { print $14 }' ~/ec2-instance_info2.txt`
# With this info, we can now connect to it like this :
ssh -i $EC2_KEYPAIR_FILE $EC2_REMOTE_USER@$EC2_ADDRESS
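The fixed ten second sleep is a guess, and if the address has not been assigned yet, field 4 will not hold a hostname. A polling loop is one alternative. The sketch below is untested against a live account: it assumes the public name always ends in .amazonaws.com (as in the example address above), and DESCRIBE_CMD is a hypothetical hook of mine so the loop can be tried without AWS.

```shell
# Hypothetical helper: poll until field 4 of the INSTANCE line looks like
# a public hostname, then print it. Returns non-zero if we give up.
# Arguments: instance ID, number of tries (default 12), delay in seconds (default 5).
wait_for_address() {
    instance=$1
    tries=${2:-12}
    delay=${3:-5}
    # DESCRIBE_CMD is an assumption for testing; normally the real tool is used
    describe=${DESCRIBE_CMD:-ec2-describe-instances}
    i=0
    while [ "$i" -lt "$tries" ]; do
        addr=$($describe --filter "instance-id=$instance" | awk '$1 == "INSTANCE" { print $4 }')
        case "$addr" in
            *.amazonaws.com) echo "$addr"; return 0 ;;
        esac
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Possible usage:
# EC2_ADDRESS=$(wait_for_address "$EC2_INSTANCE") || exit 1
# export EC2_ADDRESS
```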
ec2-terminate-instances $EC2_INSTANCE
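Terminating is what stops the billing, so it is worth getting right. If EC2_INSTANCE happens to be empty (new terminal, failed launch), the command above runs with no instance ID at all. A small guard, as a sketch only: terminate_instance and TERMINATE_CMD are names I made up, the latter just so it can be tried without touching a real machine.

```shell
# Hypothetical wrapper: refuse to call the terminate tool without an ID.
terminate_instance() {
    if [ -z "${EC2_INSTANCE:-}" ]; then
        echo "EC2_INSTANCE is not set; nothing to terminate" >&2
        return 1
    fi
    # TERMINATE_CMD is an assumption for dry runs; defaults to the real tool
    ${TERMINATE_CMD:-ec2-terminate-instances} "$EC2_INSTANCE"
}
```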
This is mostly the same as the main manual instructions, just run together a bit.
The above code was run from your terminal. The below code should be run on the target server, after ssh-ing into it.
sudo apt-get update
sudo apt-get dist-upgrade
# Note that in recent versions, you need to list libreoffice explicitly, first
# the dpkg of docvert doesn't include everything it really needs.
sudo apt-get install -y libreoffice libreoffice-java-common docvert
# This is an extra manual step required to enable the actual web service
sudo sed -i 's/#Alias/Alias/' /etc/apache2/conf.d/docvert
# Also some settings so the web service can use temp files
sudo apache2ctl stop
sudo usermod --home /tmp www-data
sudo apache2ctl start
# The above instructions WORK. 2011-07
# You should be able to access http://${DOCVERT_SERVER}/docvert and see stuff
# but an alternative option is also recommended by Matthew. Unfortunately, it is unfinished.

# Additionally, try the python version also.
# this should co-exist with version 4, so both methods are available
# Now we install version 5 of Docvert
# The docvert 5 version does not check dependencies! I've found we need (at least) the following
sudo apt-get install -y libreoffice python-lxml python-imaging
# maybe these too, if things don't work first time?
# sudo apt-get install -y python-bottle pdf2svg python-rsvg
# Get the package
wget http://holloway.co.nz/docvert/docvert-5.tar.gz
tar -xzf docvert-5.tar.gz
# Put it nearby, but not in the same place as the other, they are incompatible
sudo cp -r holloway-docvert-*/ /usr/share/docvert-5/
# Optional - take over port 80
#sudo sed -i 's/port=8080/port=80/' /usr/share/docvert-5/docvert-web.py
# The 'listen' only listens on known hostnames? Change that to wildcard
sudo sed -i "s/host='localhost'/host='0.0.0.0'/" /usr/share/docvert-5/docvert-web.py
# Docvert expects you to start a daemon to manage office application calls
# Starting this manually is optional if we are building a hybrid appliance, it will already have the correct startup script installed.
# sudo /usr/bin/soffice -headless -norestore -nologo -nofirststartwizard -accept="socket,port=2002;urp;" &
# Start it
cd /usr/share/docvert-5/
sudo python docvert-web.py &
About now you should be able to visit http://{DOCVERT_SERVER}/docvert and see some action. And if you are really lucky, also http://{DOCVERT_SERVER}:8080 for the new alternative.
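If you would rather check from the terminal than a browser, something like the following reports the HTTP status of both front ends. It is a sketch that assumes curl is installed on the machine you run it from; check_docvert is a name I made up. A status of 000 means curl could not connect at all, which usually points at the security group or a service that is not running.

```shell
# Hypothetical helper: print the HTTP status for both Docvert front ends.
# Argument: the public hostname or IP of the server.
check_docvert() {
    host=$1
    for url in "http://$host/docvert" "http://$host:8080/"; do
        # -s quiet, -o discard the body, -m give up after 10 seconds,
        # -w print only the numeric status code
        code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$url")
        echo "$url $code"
    done
}

# Possible usage:
# check_docvert "$EC2_ADDRESS"
```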