SharePoint 2007 and Adobe PDF
[Entire Post reviewed and updated: 17/03/09 to include infrastructure update, 64-bit and farm installation notes]
If you are deploying Microsoft Office SharePoint Server (MOSS) 2007, even as a vanilla deployment (i.e. no bespoke development, using only out-of-the-box features), there is one piece of bespoke configuration you will likely still want: the ability to index and search Adobe PDF files.
There is a great post over on SharePoint blogs, written by S.S.Ahmed, detailing how to add PDF support to your MOSS box. However, Adobe have made some changes to their use of iFilters and there is now a shorter and easier way that doesn't require registry edits or resetting your web server.
The following process works on my demo build for Microsoft Office SharePoint Server 2007 and has been repeated for numerous clients with various different SharePoint deployments. Default names and file locations have been used here - e.g. SharedServices1. I named the PDF icon as 'pdficon.gif'. If you have used different names and locations, substitute as necessary.
- Download and install Adobe Acrobat Reader 7 or later on the server to be used for indexing. (Note: From version 7 onwards, the reader includes the iFilter by default; previously you had to install the iFilter separately.) If you have a 64-bit environment, you will need to download the 64-bit version of the filter. See the notes at the end of these instructions.
- Download the PDF icon (select 'small 17 x 17') from http://www.adobe.com/misc/linking.html
- Give the icon a name (I use pdficon.gif)
- Save the icon in c:\Program Files\Common Files\Microsoft Shared\web server extensions\12\TEMPLATE\IMAGES
- Edit the Docicon.xml file to include the PDF icon
- Navigate to c:\Program Files\Common Files\Microsoft Shared\web server extensions\12\TEMPLATE\XML
- Open the DOCICON.XML file in Notepad (or an XML editor). You should see that the file has two main tags - ByProgID and ByExtension
- Within the ByExtension tag, add an entry for the PDF icon: <Mapping Key="pdf" Value="pdficon.gif" />
- Save and close the file
- Stop and restart Internet Information Services (IIS). Open a command line (Start - Run - enter 'cmd') and type 'iisreset'. Note: This will temporarily take SharePoint offline.
- Repeat the icon and DOCICON.XML steps above on any other SharePoint web front-end servers (Note: You only need to install the iFilter on the indexing server, but for the icons to appear they need to be added to all web front-ends)
- Add the PDF file type to your search index (note that this has to be completed for each index, i.e. each Shared Service)
- Open your Search Settings: SharePoint 3.0 Central Administration - SharedServices1 - Search Settings (Note: If you have installed the Infrastructure Update, you will see the option in the side bar under both Search Settings and the new Search Administration dashboard)
- Select File Types
- Click Add File Type
- Enter pdf in the text box (labelled File extension) and click OK
- Check that the pdf file type is listed and has the PDF icon showing next to it. If the icon isn't showing, something has gone wrong, so review all of the above steps. The most common problems are spelling mistakes in the DOCICON.XML file and not repeating the process on all SharePoint web front-ends.
- Perform a full crawl of your content sources - PDFs will not show in search results until a full crawl has completed and indexed the PDFs using the iFilter you just installed.
That's all there is to it. Check that everything is working by doing a search for a file you know is a PDF document. The document should be listed in the results and should have the PDF icon displayed next to it.
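If you would rather not hand-edit DOCICON.XML on every web front-end, the mapping step can be scripted. The following is only a sketch: the function name add_pdf_mapping and the trimmed-down sample XML (including the DocIcons root element) are my own stand-ins for illustration, not something shipped with SharePoint. Python's ElementTree drops comments and the XML declaration when it writes the file back out, so try it against a backup copy first.

```python
import xml.etree.ElementTree as ET


def add_pdf_mapping(xml_text, extension="pdf", icon="pdficon.gif"):
    """Return DOCICON.XML text with a Mapping for `extension` inside ByExtension."""
    root = ET.fromstring(xml_text)
    by_extension = root.find(".//ByExtension")
    if by_extension is None:
        raise ValueError("no ByExtension element found")
    for mapping in by_extension.findall("Mapping"):
        if (mapping.get("Key") or "").lower() == extension.lower():
            # The extension is already mapped: just point it at the icon.
            mapping.set("Value", icon)
            break
    else:
        # No existing entry: append a new Mapping element.
        ET.SubElement(by_extension, "Mapping", {"Key": extension, "Value": icon})
    return ET.tostring(root, encoding="unicode")


# Hypothetical, cut-down stand-in for the real file (assumed layout).
sample = """<DocIcons>
  <ByProgID>
    <Mapping Key="Word.Document" Value="icdoc.gif" />
  </ByProgID>
  <ByExtension>
    <Mapping Key="doc" Value="icdoc.gif" />
  </ByExtension>
</DocIcons>"""

updated = add_pdf_mapping(sample)
print(updated)
```

Running it a second time on its own output leaves a single pdf entry in place, so it is safe to repeat on a server that has already been done. You still need the iisreset afterwards.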
If you have installed SharePoint on 64-bit servers, you will need to use the 64-bit filter. It can be downloaded from http://labs.adobe.com/wiki/index.php/PDF_iFilter_8_-_64-bit_Support. Please do not follow the instructions provided by Adobe with the filter. They include modifying the registry (not necessary - you should add file types using SharePoint's administration pages, not by modifying the registry). Also, the icon does not appear to get installed automatically and still needs to be added using the instructions outlined above.
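Since a missing icon file or a misspelt DOCICON.XML entry is the usual reason the icon fails to show, a quick check on each web front-end can save some head scratching. Below is a rough sketch, not an official tool: check_pdf_icon is a name I made up, and the demo builds a throwaway folder that only mimics the IMAGES and XML folders under 12\TEMPLATE. On a real server you would pass in the actual TEMPLATE path.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def check_pdf_icon(template_dir, extension="pdf"):
    """Return a list of problems found under a TEMPLATE folder (empty if none)."""
    template_dir = Path(template_dir)
    docicon = template_dir / "XML" / "DOCICON.XML"
    if not docicon.is_file():
        return [f"missing {docicon}"]
    try:
        root = ET.parse(docicon).getroot()
    except ET.ParseError as exc:
        return [f"DOCICON.XML is not well-formed: {exc}"]
    # Collect the icon names mapped to the extension inside ByExtension.
    icons = [m.get("Value") for m in root.findall(".//ByExtension/Mapping")
             if (m.get("Key") or "").lower() == extension]
    if not icons:
        return [f"no Mapping with Key={extension!r} inside ByExtension"]
    problems = []
    for icon in icons:
        if not icon or not (template_dir / "IMAGES" / icon).is_file():
            problems.append(f"icon file {icon!r} not found in IMAGES")
    return problems


# Demo against a temporary folder that imitates the TEMPLATE layout.
with tempfile.TemporaryDirectory() as tmp:
    template = Path(tmp)
    (template / "XML").mkdir()
    (template / "IMAGES").mkdir()
    (template / "XML" / "DOCICON.XML").write_text(
        '<DocIcons><ByExtension>'
        '<Mapping Key="pdf" Value="pdficon.gif" />'
        '</ByExtension></DocIcons>')
    before = check_pdf_icon(template)  # mapping present, icon not copied yet
    (template / "IMAGES" / "pdficon.gif").write_bytes(b"GIF89a")
    after = check_pdf_icon(template)   # icon file now in place

print(before)
print(after)
```

This only checks the files on disk. It says nothing about whether the iFilter is installed or whether a full crawl has run, so those still need checking through the administration pages.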
Again, thanks to S.S.Ahmed for writing the original 'how to' that this post is based on.
21 Comments:
An alternative is the Foxit PDF IFilters - unlike Adobe, they have a 64bit version too.
http://markharrison.co.uk/blog/2007/05/foxit-pdf-ifilter-x64-and-32-bit.htm
I don't know if this is a typo or an eccentricity with my MOSS server, but I couldn't get the pdf icon to work when I put the .gif file in the folder you listed. I had to put it in c:\Program Files\Common Files\Microsoft Shared\web server extensions\12\TEMPLATE\IMAGES (the difference is the TEMPLATE part).
You were right first time - it is a typo. Weirdly, I'd managed to get the full string in for the XML folder.
Many thanks for letting me know - the post has been updated.
I tried to do what you describe here, but the filtering is not working. The only difference from your description is that I used MOSS 2007 with SP1. Do you know if there are any bug fixes needed to use this feature?
Best regards
If you see the Internet Explorer icon [e] instead of the PDF icon...
It is because you need to perform a Full Update on the Search content indexes.
-Open a Command Prompt on the Indexing Server.
-net stop osearch.
-net start osearch.
-Go to Central Administration, then to the Shared Services Administration Web of the current SSP, go to Search Settings and start a full crawl of all locations containing PDF files.
That should refresh it and the pdficon.gif file will be visible as it should.
[Georgia.]
This blog will show you how to use PDF iFilters for x32 & x64 servers and also how to get your document libraries to display a PDF icon.
http://mattbeaver2002.spaces.live.com/blog/cns!801AB4157DF28FDA!14442.entry
What is step 5:
Perform a full crawl of your content sources
I have downloaded the icon for the PDF file, added the new file type and added a new line in DOCICON.XML. But the last step is 5, "Perform a full crawl of your content sources", and I don't know how to do it. Can someone tell me how?
Thanks
Hi Anon
You need to go into Shared Services | Search Settings. The first item on the list is your content sources. You can go in there and perform a full crawl. If you have a lot of content, it may take some time and impact your network, in which case consider performing the full crawl in off-peak hours.
Hope that helps, let me know if you need more info.
Thank you Joining Dots,
I’ve downloaded and installed Adobe 9.
Added the file type “pdf” in the Docicon.XML file.
Added the file type in the File extension box in Search Settings.
Added a new content source and performed a full crawl.
But the search for .pdf files is not working for me. I uploaded 2 documents to Shared Documents.
One is a Word document, John Smith Resume.doc; the other is an OCR Adobe file, John Smith Resume.pdf. They contain the same text. But when I search for the text “University of MD”, I only get the result: “Result 1-1 of 1. your search took 0.37 seconds.” and only the Word file “John Smith Resume.doc” is listed there.
Did I miss something? Can SharePoint 2007 really search the text in a PDF file?
Thanks.
Now that's interesting. The third instance of PDF indexing not working when installing Adobe 9. I suspect Adobe have changed something and broken the method. Will give it a whirl on my demo server to confirm.
Two easiest fixes - install a copy of Adobe 8 instead. If you can't get a copy of 8, it's back to the old method of extracting and installing the filters. Alternatively, use different PDF filters. Foxit are highly rated and rumoured to perform much better than Adobe too.
Thank you Joining Dots.
I got Adobe 9 working; the Adobe 9 install is missing something. Please read this blog:
http://blogs.msdn.com/ifilter/archive/2007/03/29/indexing-pdf-documents-with-adobe-reader-v-8-and-moss-2007.aspx
The post on Friday, December 19, 2008 1:05 PM by Mike Ruhl.
Thanks.
Thanks for the help. Just wanted to note that I had a little trouble and found my solution. I added the file type first in central admin, and then added the image and edited the xml. This caused the image to not be associated with file type. I went in and deleted the file type and added it back and it worked fine.
Just might want to change your steps up to make sure people don't get confused. Thanks for everything.
Ah, now that's interesting. I had some odd behaviour with my last build and now wonder if the last updates to SharePoint have made the install order important. In the past, icon first was never a problem...
Am in the process of setting up another prototype. Will follow your install order and, assuming it works, will write an updated version of the post.
Many thanks for your feedback.
One other question for you. I have my DB (SQL 2005) installed on another box separate from my sharepoint box. Do I need to also install a filter on the sql server? Thanks much.
You need to install the ifilters on your indexing/search servers
Chances are, the separate SQL DB box is for your content. Shouldn't need to install ifilters there. The indexing process pulls all content across to the indexing server for indexing - that's where the ifilters need to be.
If you've got an architecture diagram showing server layout, send it through on email (details on the contact page - http://www.joiningdots.net/about/contact.htm) and I'll have a look at it.
Hi Bleeding Crimson
You're not wrong - I should have been clearer with the order. When following it to the letter, I had the same problems you experienced :-)
And an added bonus: this time around I had to do an iisreset before the PDF icon would be recognised in Search Settings.
Will update the blog post shortly. Thanks again for the feedback.
I've followed these instructions carefully. After I run the full crawl, my PDF documents show up when I do a search, but if I search on a word inside the PDF, it fails. Any thoughts on this? Thanks.
Currently I'm doing some R&D on the iFilters available, including iFilters for PDF files.
Having 2-3 iFilter options on the market calls for evaluating them.
Read the performance evaluation here
http://codeforfuture.com/2009/05/22/finding-best-sharepoint-pdf-ifilter-64-bit/
and please let me know your views if any.
This also suggests some links to other iFilters you might want, as well as some good instructions and links.
http://zebracube.wordpress.com/2009/06/21/pdf-ifilter-sharepoint/
Great article. We create searchable PDFs, and get this question all the time.
Steve
PSIGEN SharePoint Imaging
Thanks for the nice feedback Steve, always appreciated.