DOC2PDF

Antiword is a free MS Word reader for Linux that converts the binary files from Word 2, 6, 7, 97, 2000, 2002 and 2003 to plain text and to PostScript. It is available at http://www.winfield.demon.nl/.

My goal was to convert a doc file to a PDF and store it in the Tenders' Portal. The requirements were as follows:

  1. Users have a doc file (a tender) that they want to upload to the site through a PHP-based page.
  2. The file is then converted to a PDF on the server and stored at a particular location.
  3. The database is populated with the comments the user entered while uploading the file.

I had already tried converting HTML to PDF. The documentation for that setup is available at http://www.oocities.org/subhasisg/scripts/ghostsetup.html.

This time I wanted to extract the relevant parts of the doc file and store them as a PDF.

I downloaded Antiword Source from the above site.

I copied the source files to /home/www/doc2pdf and installed them using the following steps:
(a) Make a suitable directory such as '/home/www/doc2pdf/antiword' and copy the 'antiword.tar.gz' file to this directory.
(b) decompress: 'gunzip antiword.tar.gz'
(c) unpack: 'tar xvf antiword.tar'
(d) compile: 'make all'
(e) install: 'make install'. This will install Antiword in the $HOME/bin directory.
(f) copy the file 'fontnames' and one or more mapping files from the Resources directory to the $HOME/.antiword directory (note the dot before antiword!).
NOTE: you can skip point (f) if your system administrator already copied these files to /usr/share/antiword.
(Read the file FAQ in /home/www/doc2pdf/antiword-0.37/Docs).
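
The steps above can be sketched as a single shell function. This is a minimal sketch under assumptions, not the official install procedure: the names run, install_antiword and DRY_RUN are hypothetical, the archive is assumed to unpack into a directory such as antiword-0.37 (the name seen in the FAQ path), and 8859-1.txt is assumed to be the mapping file you want.

```shell
#!/bin/sh
# Sketch of steps (a)-(f). run, install_antiword and DRY_RUN are
# hypothetical names used only for this illustration.

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

install_antiword() {
    workdir=$1   # e.g. /home/www/doc2pdf/antiword
    tarball=$2   # path to the downloaded antiword.tar.gz
    srcdir=$3    # directory the archive unpacks into (assumed, e.g. antiword-0.37)

    # (a) create the working directory and copy the archive into it
    run mkdir -p "$workdir" &&
    run cp "$tarball" "$workdir/antiword.tar.gz" &&
    run cd "$workdir" &&
    # (b) decompress and (c) unpack
    run gunzip antiword.tar.gz &&
    run tar xvf antiword.tar &&
    # (d) compile and (e) install into $HOME/bin
    run cd "$srcdir" &&
    run make all &&
    run make install &&
    # (f) copy the resource files to $HOME/.antiword
    run mkdir -p "$HOME/.antiword" &&
    run cp Resources/fontnames Resources/8859-1.txt "$HOME/.antiword/"
}
```

Calling it as `DRY_RUN=1 install_antiword /home/www/doc2pdf/antiword ./antiword.tar.gz antiword-0.37` only prints the commands, which is a safe way to review them before running anything as root.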

Since I ran make as the root user, the files antiword and kantiword were created in the directory /root/bin, and a folder named .antiword was created in /root.

I wanted to run antiword as the web user (nobody). Although the program worked fine from the shell, it did not work when invoked from the web page.

I read the error pages and, based on them, created a home directory for the nobody user and copied the files from the .antiword folder into /home/nobody. However, this still did not work.

After analyzing the error further, I copied the contents of the .antiword folder into two folders under /usr/local/share, named antiword and .antiword (the same contents in both).
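
When Antiword works for root but not for the web user, it can help to check which resource directory the current user can actually read. The helper below is a hypothetical sketch, not part of Antiword: find_antiword_resources is an invented name, and the caller supplies the candidate directories because the exact search order depends on how Antiword was built.

```shell
#!/bin/sh
# find_antiword_resources is a hypothetical helper, not part of Antiword.
# Usage: find_antiword_resources MAPPINGFILE DIR...
# Prints the first directory that holds a readable 'fontnames' file and
# the given mapping file; returns 1 if no directory qualifies.
find_antiword_resources() {
    mapping=$1
    shift
    for dir in "$@"; do
        if [ -r "$dir/fontnames" ] && [ -r "$dir/$mapping" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}
```

For example, `find_antiword_resources 8859-1.txt "$HOME/.antiword" /usr/local/share/antiword /usr/share/antiword` could be run once from the root shell and once through the web page, to compare what each user sees.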

This time it started working. I call the following conv.php from the web page:

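<?php
// Arguments to a.sh: the uploaded file name and the header template number (1 or 2).
$COMMAND = '/usr/local/apache/cgi-bin/doc2pdf/a.sh ' . escapeshellarg('doc.doc') . ' ' . escapeshellarg('2');
echo shell_exec($COMMAND);
echo $COMMAND;
?>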

The shell script a.sh is as follows:
#!/bin/bash
GS_LIB=/usr/share/fonts/default/Type1; export GS_LIB
GS_FONTPATH=/usr/share/fonts/default/Type1; export GS_FONTPATH
PATH=$PATH:/usr/local/sbin:/usr/local/bin:/bin:/usr/bin:/root/bin:/usr/local/mysql/bin; export PATH

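# $1 = source file name, $2 = header template number (1 or 2)
SOURCEFILE="$1"
HEADER="$2"
# Target name = source name up to the first dot (doc.doc -> doc)
TARGETFILE=$(echo "$1" | cut -d. -f1)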
SOURCEDIR="/usr/local/apache/htdocs/doc2pdf/pdfs"
TARGETDIR="/usr/local/apache/htdocs/doc2pdf/pdfs"
ERRTAG=0
ERRDTL=""
if [ -z "$SOURCEFILE" ] || [ -z "$HEADER" ]
then
echo "Source File or Header Not specified"
ERRDTL="$ERRDTL Source File or Header not specified."
ERRTAG=2;
fi

#if [ ! -f logo6.jpg ]
#then
# echo "Logo File not found in source directory"
# ERRDTL="`echo $ERRDTL.\nLogo File not found in source directory`"
# ERRTAG=3;
#fi

if [ ! -f "$SOURCEDIR/$SOURCEFILE" ]
then
echo "Source File not found!!"
ERRDTL="$ERRDTL Source File Not Found."
ERRTAG=4;
fi

if [ -f "$TARGETDIR/$TARGETFILE.pdf" ]
then
echo "Target File Already exists."
ERRDTL="$ERRDTL Target File exists."
ERRTAG=5;
fi
if [ $ERRTAG -eq 0 ]
then
	# Start the HTML file with the selected header template.
	case "$HEADER" in
		1|2)
			/bin/cat "/usr/local/apache/htdocs/doc2pdf/header$HEADER.html" > "$TARGETDIR/$TARGETFILE.html"
			if [ $? -ne 0 ]
			then
				echo "Failed in creating $TARGETDIR/$TARGETFILE.html"
			fi
			;;
		*)
			echo "Unknown header '$HEADER': expected 1 or 2"
			;;
	esac

# Append the document text to the HTML file; antiword errors go to /tmp/errr.log
/usr/local/bin/antiword -t -m 8859-1.txt "$SOURCEDIR/$SOURCEFILE" >> "$TARGETDIR/$TARGETFILE.html" 2>> /tmp/errr.log
if [ $? -ne 0 ]
then
echo "Failed in converting doc to text."
fi
echo "</pre></body></html>" >> "$TARGETDIR/$TARGETFILE.html"
html2ps "$TARGETDIR/$TARGETFILE.html" > "$TARGETDIR/$TARGETFILE.ps"
if [ $? -ne 0 ]
then
echo "Failed in converting html to ps."
fi
ps2pdf "$TARGETDIR/$TARGETFILE.ps" "$TARGETDIR/$TARGETFILE.pdf"
if [ $? -ne 0 ]
then
echo "Failed in converting ps to pdf."
else
echo "Successfully converted to $TARGETDIR/$TARGETFILE.pdf"
fi
fi

# CLEANUP
rm -f "$TARGETDIR/$TARGETFILE.html" "$TARGETDIR/$TARGETFILE.ps"
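
One detail of a.sh worth illustrating is how the target name is derived: the script keeps only the part of the source name before the first dot, so a name with inner dots loses more than its extension. The two helpers below are hypothetical names used only to show the difference; the second is an alternative, not what the script does.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Same result as the `cut -d. -f1` call in a.sh:
# keep the text before the FIRST dot.
base_first_dot() {
    echo "${1%%.*}"
}

# Alternative: strip only the LAST extension, keeping inner dots.
base_last_dot() {
    echo "${1%.*}"
}

# base_first_dot tender.2004.doc   prints: tender
# base_last_dot  tender.2004.doc   prints: tender.2004
```

With the first-dot rule, two uploads named tender.2004.doc and tender.2005.doc would both map to tender.pdf, and the second would be rejected by the "Target File Already exists" check.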

 

I have assumed that html2ps and Ghostscript are already installed on the server.
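
That assumption can be checked before the first conversion. The helper below is a small sketch with a hypothetical name, missing_tools, and it only tests whether each command is found on the PATH of the user running it, which for the web page means the web user rather than root.

```shell
#!/bin/sh
# missing_tools is a hypothetical helper: it prints every named command
# that cannot be found on PATH and returns non-zero if any is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}
```

A call such as `missing_tools antiword html2ps ps2pdf` prints nothing when all three are available.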