Version 0.7.4-dev (Planned release: unknown)
Version 0.7.3 (10/12/2006)
Changes to the Code Base
updateUpgraded to Checkstyle 4.2(BJL)
updateUpgraded to IKVM
update[ 1546399 ] Use get/set functions for separators in PDFTextStripper(BJL)
addPDDocument.silentPrint() to print without prompting for a printer(BJL)
fix[ 1544118 ] Bug in PDFont.getCodeFromArray(BJL)
update[ 1529835 ] Add COSFloat.setValue()(BJL)
fix[ 1492555 ] PDChoiceField dead loop(BJL)
fix[ 1499521 ] NPE PDAppearance.convertToMultiLine(BJL)
fix[ 1522007 ] Error converting date(BJL)
updateUpgraded to Lucene 2.0.0(BJL)
fix[ 1451164 ] Problems filling combo and radio form fields(BJL)
updateUpgraded to lucene 1.9.1(BJL)
update[ 1023133 ] Support PDF Functions(BJL)
updateAdded command line org.pdfbox.PDFMerger(BJL)
update***API Change*** Promoted AppendDoc from example to util package, renamed to PDFMergerUtility.(BJL)
updateUpgraded to IKVM-
fix[ 1391952 ] Problem extracting embedded attachments(BJL)
fix[ 1249607 ] Fixed issue with broken PDFs that contain multiple endobj(BJL)
add[ 1153174 ] Added documentation for PDFHighlighter(BJL)
updateRemoved log4j dependency(BJL)
fix[ 974661 ] getKids() Null Pointer Exception when parsing pdf(BJL)
addAdded better support for CJK encoding(BJL)
updateChanged signature of PDFPageContentStream.drawImage to take float arguments instead of int(BJL)
fixFixed issue where form xobjects were not being drawn in the viewer(BJL)
updateChanged signature to PDDocumentCatalog.OpenAction to be a PDDestinationOrAction instead of just action.(BJL)
fixAdded tolerance to text extraction sorting where text on a line was not at the same exact y coordinate but very close(BJL)
add[ 1327133 ] Printing with form data(BJL)
fixFixed issue with DateConverter that was trying to parse an empty string(BJL)
fix[ 1324846 ] appending text to PDPageContentStream messes up fonts(BJL)
addAdded new example ReplaceURLs to show how to replace a clickable URL in a PDF(BJL)
addImplemented annotation drawing(BJL)
addImplemented EndPath and StrokeAndClosePath operators(BJL)
updateMove text extraction permission checking from PDFTextStripper to ExtractText(BJL)
addAdded support for more annotations, thanks to a contribution from Paul King(BJL)
updateCreated new FontBox project to hold all font library code(BJL)
fixFixed issue where only the first page was sent to the printer(BJL)
fixNow automatically sets the page orientation when printing(BJL)
Changes to Documentation
updateUpgraded to Apache Forrest 0.8-dev(BJL)
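
The 0.7.3 entry about adding tolerance to text extraction sorting can be illustrated with a small sketch. This is not the PDFBox implementation: the class ToleranceSortSketch, its Fragment type, and the tolerance value are hypothetical names introduced here, and the sketch assumes y grows downward. It groups fragments into lines when their y coordinates differ by no more than the tolerance, then orders each line by x. Grouping is used instead of a "fuzzy" comparator because a comparator that treats nearby values as equal is not transitive.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ToleranceSortSketch {

    /** Hypothetical stand-in for a positioned piece of text; not a PDFBox class. */
    public static final class Fragment {
        public final String text;
        public final float x;
        public final float y;

        public Fragment(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Returns the fragments ordered line by line (ascending y), and left to
     * right within a line. A fragment joins the current line when its y is
     * within the tolerance of the first fragment on that line.
     */
    public static List<Fragment> sort(List<Fragment> input, float tolerance) {
        List<Fragment> byY = new ArrayList<>(input);
        byY.sort(Comparator.comparingDouble((Fragment f) -> f.y));

        List<Fragment> result = new ArrayList<>();
        List<Fragment> line = new ArrayList<>();
        for (Fragment f : byY) {
            // Start a new line once the vertical gap exceeds the tolerance.
            if (!line.isEmpty() && Math.abs(f.y - line.get(0).y) > tolerance) {
                flushLine(line, result);
            }
            line.add(f);
        }
        flushLine(line, result);
        return result;
    }

    /** Orders the collected line by x, appends it to the result, and resets it. */
    private static void flushLine(List<Fragment> line, List<Fragment> result) {
        line.sort(Comparator.comparingDouble((Fragment f) -> f.x));
        result.addAll(line);
        line.clear();
    }
}
```

With a tolerance of 0.5, fragments at y = 100.0 and y = 100.4 end up on the same line and are ordered by x; with a tolerance of 0 they are treated as separate lines.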

Version 0.7.2 (09/11/2005)
Changes to the Code Base
updateUpgraded to IKVM-
addAdded support to get annotations from a page and to create a RubberStamp annotation(BJL)
addAdded PDDocument.print() to send the PDF to a printer.(BJL)
fix[ 1276623 ] NullPointerException in PageDrawer:241 when extracting images(BJL)
updateAllow creation of PDJpeg from a BufferedImage, thanks to contribution from Paul King(BJL)
addRemoved PDTiff in favor of PDCcitt(BJL)
addPDFBox no longer requires log4j!!(BJL)
addNew class to allow you to specify ‘named’ regions where text is to be extracted.(BJL)
fix[ 1261555 ] Unexpected end of ZLIB input stream when stream has a zero length(BJL)
fix[ 1226665 ] ImportXFDF giving NPE error(BJL)
updaterenamed COSDictionary.setItem( String, boolean ) to COSDictionary.setBoolean( String, boolean )(BJL)
updateAdded sorting parameter to PDFTextStripper(BJL)
fixFixed issues with PDF encryption(BJL)
updateBetter date support, added support for PDFs that use non standard dates, support for time zone offsets(BJL)
updateFlateFilter-class now supports PNG-Predictors for decoding the imagedata, thanks to a contribution from Marcel Kammer(BJL)
addAdded support for extracting tiff images, thanks to a contribution from Marcel Kammer(BJL)
addAdded PDDocument.removePage to remove PDF pages(BJL)
addFixed issue when creating a COSString with a UTF 16 string(BJL)
addCommitted patch for type 1 PFB font parser(special thanks to Michael Niedermair)(BJL)
addCommitted patch for PNG predictors (special thanks Erik Martino)(BJL)
fix[ 1227428 ] failure of getMediaBox(BJL)
fix[ 1227426 ] null pointer in PDFToImage(ColorModel is null)(BJL)
update[ 1207113 ] Enhancement: runtime accessible version(BJL)
fix[ 1213320 ] setFfFlag() of PDField not working correctly(BJL)
fix[ 1215945 ] Error in COSString.writePDF() – fixed escape sequences(BJL)
fix[ 1198912 ] COSName with escaped characters not parsed correctly(BJL)
fixFixed issue where resources were not being cleared in PDFStreamEngine(BJL)
fix[ 1165686 ] Expected int type parse error(BJL)
fix[ 1182825 ] Wrong handling of signed/unsigned byte/int in TTF parsing(BJL)
remove[ 1182892 ] PDFHighlight.setHighlightColor was removed because it is not implemented by adobe(BJL)

Version 0.7.1 (04/10/2005)
Changes to the Code Base
fix[ 1170068 ] text field is not found(BJL)
fixfixed NPE issue where an image did not have any applied filters(BJL)
fixFixed issue where extra spaces were being added during text extraction for type3 fonts(BJL)
update[ 1119420 ] Extract and Update the Meta-Information as XML(BJL)
update[ 1119410 ] Extract text in/between bookmarks(BJL)
update[ 1164476 ] XFDFImport should fail with non XFDF document(BJL)
add[ 1119408 ] Support named target for Bookmark extraction.(BJL)
addCreated Resources/ to create a mapping for non-embedded fonts(BJL)
update**API Change** Renamed PDField.getName() to PDField.getPartialName(), added method getFullyQualifiedName() (BJL)
update**API Change** Renamed PDWidget to PDAnnotationWidget for naming consistency(BJL)
updateText is now extracted from embedded form xobjects.(BJL)
updateDeployed site to new hosting vendor.(BJL)
updatecommitted code for PDFHighlighter to highlight words in a PDF document.(BJL)
updateAdded command line application org.pdfbox.PDFToImage(BJL)
updateImplemented runlength decoding(BJL)
updateAdded patch from Jorge Hernández Sellés to append content streams to existing page.(BJL)
update**API Change**renamed package from to
update**API Change**Removed PDRadioButton, should use PDCheckbox instead(BJL)
update**API Change**COSStream now extends COSDictionary instead of containing a dictionary(BJL)
update[ 1021241 ] Text extraction should follow PDF article divisions(BJL)
addAdded implementation for PDF page articles(BJL)
addCreated TextToPDF command line application(BJL)
addCreated ImageToPDF example(BJL)
fixfixed parsing of header where a trailing % exists(BJL)
fix[ 1110029 ] Character “>” not quoted in COSName::writePDF(BJL)
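
The run-length decoding mentioned in the 0.7.1 entries can be sketched as follows. This is an illustration of the RunLengthDecode scheme described in the PDF Reference, not PDFBox's own filter code; the class name RunLengthDecodeSketch and its decode method are hypothetical. A length byte from 0 to 127 means the next length + 1 bytes are copied literally, a length byte from 129 to 255 means the next byte is repeated 257 - length times, and 128 marks the end of data.

```java
import java.io.ByteArrayOutputStream;

final class RunLengthDecodeSketch {

    /** Decodes run-length encoded data; stops at the EOD marker or end of input. */
    public static byte[] decode(byte[] encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < encoded.length) {
            int length = encoded[i++] & 0xFF;
            if (length == 128) {
                break; // end-of-data marker
            }
            if (length < 128) {
                // Literal run: copy the next length + 1 bytes as they are.
                // A truncated run is clamped to the bytes that remain.
                int count = Math.min(length + 1, encoded.length - i);
                out.write(encoded, i, count);
                i += count;
            } else {
                // Repeated run: the next byte appears 257 - length times.
                if (i >= encoded.length) {
                    break; // truncated input, nothing left to repeat
                }
                byte value = encoded[i++];
                for (int n = 0; n < 257 - length; n++) {
                    out.write(value);
                }
            }
        }
        return out.toByteArray();
    }
}
```

For example, the bytes 2, 'a', 'b', 'c', 254, 'x', 128 decode to "abcxxx": a literal run of three bytes followed by 'x' repeated 257 - 254 = 3 times.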

Version 0.7.0 (1/22/2005)
Changes to the Code Base
updatecommitted [ 1097913 ] Enhance LucenePDFDocument streams(thanks to Olivier Parent)(BJL)
addAdded implementation for PDF Bookmarks(BJL)
addAdded implementation for PDF Destinations(BJL)
updateUpdated website for better format for documentation(BJL)
fixNow ExportFDF and ExportXFDF will default output files to pdfname.fdf and pdfname.xfdf(BJL)
fix[ 1046278 ] ClassCastException when doing FDF/XFDF(BJL)
fixExtractText now allows you to extract text if you decrypt with the owner password(BJL)
fixAdded PDF 1.5 Object Stream support(BJL)
fixAdded pdmodel.common.PDStream to represent COSStream(BJL)
fixchanged PDPage.getContents to use PDStream instead of COSStream(BJL)
fixUpdated LucenePDFDocument Javadoc to tell which Lucene fields it populates(BJL)
fixmoved HelloWorld example from persistence to pdmodel and updated to use new PD Model features(BJL)
fixRefactored PDFStreamEngine based on contributions from Christophe Huault(BJL)
fixPDFStreamEngine no longer uses a gigantic if/else statement for all of the operators; they are defined as properties when instantiating the class(BJL)
fixUpdated AFM resources to be ones released on Adobe’s site, include AFM license as well(BJL)
fixAdded ability to embed TTF fonts, only WinAnsiEncoding is supported at this time(BJL)
fixAdded ability to extract images, thanks to contributions by Brigitte Mathiak(BJL)
fixCOSWriter now generates the document id if it does not already exist(BJL)
fiximproved performance for text extraction(BJL)
fix[ 1058693 ] TextPosition does not take account of tz operator(BJL)
fixupgraded to log4j-1.2.9(BJL)
fixinclude package-list for javadocs(BJL)
fix[ 1037145 ] Infinite loop in PDFParser.parseObject(BJL)
fixfixed error where spaces before integers were causing parse errors(BJL)

Version 0.6.7 (10/09/2004)
Changes to the Code Base
fixRevamped the way character spacing and font information is obtained(BJL)
fixImproved location information about a character drawn on the screen.(BJL)
fixChanged the PDFStreamEngine.showString to showCharacter to support the newly improved location information. This will now only show one character at a time.(BJL)
fixFixed bug in PDDocument.isOwnerPassword and isUserPassword that was using the wrong length for the encryption key(BJL)
fixUpgraded to ant 1.6.2(BJL)
fixUpgraded to checkstyle-3.4(BJL)
fixUpgraded to JUnit-3.8.1(BJL)
fixUpgraded to lucene-1.4.2(BJL)
fixIntegrated patch(1016603) for issue 943319 to fix parsing of open office documents(BJL)
fixPatch:985347 No longer throw exception for “No ‘ToUnicode’ and no ‘Encoding’ for Font”(BJL)
fixPatch:996191 Fixed case statement with missing break(BJL)
fixPatch:996781 Fixed null pointer exception in acroform fields(BJL)
fixRenamed DecryptDocument to DocumentEncryption to support encryption and decryption(BJL)
fixAdded load/save/encrypt/decrypt convenience methods on the PDDocument class(BJL)
fixCOSWriter now attempts to keep object numbers from parsed documents and writes ‘free’ entries in the xref if necessary(BJL)
fixAdded the ability to set the word separator on the PDFTextStripper(BJL)
fixFixed issue where PDFBox would throw an IOException if a PDF was incorrectly missing an endobj tag(BJL)
fixFixed 918220 where PDFBox would freeze when parsing certain cmap files(BJL)
fixAdded initial colorspace support(BJL)
fixFixed issue where AppendDoc was throwing ClassCastException(BJL)
fixFixed 1013163 Can’t parse filters that use filter abbreviation(BJL)
fixFixed 1011244 Where encrypting then decrypting was causing a problem(BJL)
fixrenamed TextPosition.getWidth to TextPosition.getCombinedHorizontalDisplacement to better reflect its actual value(BJL)
fixFixed 919215 PDFBox now supports stream replacement(BJL)
fixFixed 955043 Added support for ‘ETenms-B5-H’ encoding(BJL)
fixFixed 996050 Class Cast exception when importing(BJL)
fixAdded support for Font descriptors(BJL)
fixFixed spacing issues when doing textfield FDF import(BJL)
fixFixed 1017175 Large number converted when re-written(BJL)
fixFixed 1029873 PDFBox now allows for multiple xref sections(BJL)
fixAdded support for document Viewer Preferences(BJL)
fixMade currentDocument and pdfDocument protected in util.Splitter to allow easier subclassing(BJL)
fixFixed 1034427 After Splitting page orientation is lost(BJL)
addAdded the following command line applications (BJL)

Version 0.6.6 (07/20/2004)
Changes to the Code Base
fixImproved support for setting of checkbox fields(FDF import)(BJL)
fixAdded the org.pdfbox.PDFSplit utility to split a single document into many documents(BJL)
fixPDFBox now ignores the Length field that is associated with a stream; it has been found to be wrong in some documents(BJL)
fixFixed bug when writing out PDF documents and the document contained a non-alphabetic character such as ( or )(BJL)
fixFixed bug in PDFont where dictionary encodings were not being processed correctly(BJL)
fixFixed bug in COSDocument.isEncrypted which was comparing COSNull to the wrong object(BJL)
fixIntegrated patch for supporting multiple lines in the appearance stream(BJL)
fixUpgraded to lucene-1.4-final(BJL)
fixorg.pdfbox.ExtractText now uses the system encoding as the default encoding instead of ISO-8859-1(BJL)

Version 0.6.5 (03/08/2004)
Changes to the Code Base
fixFixed bug in revision 3 encryption algorithm(BJL)
fixadded support for CIDFontType0 glyph widths, which fixed issue with spaces being during text extraction(BJL)
fixFixed infinite loop when parsing a corrupt content stream(BJL)
fixAdd characterspacing + wordspacing when determining the width of a space character(BJL)
fixAdded support for more font types(BJL)
fixrefactored the pdmodel.interactive package, form fields use object delegation instead of inheritance for the widget, see PDField.getWidget and PDField.getKids(BJL)
fixFixed bug where an inheritable cropbox would cause stackoverflow exception(BJL)
fixChanged usage of PDField/PDWidget to look like object delegation instead of inheritance by adding a PDField.getWidget instead of extending PDWidget(BJL)
fixrefactored interactive package, this will break any existing code that uses the PDField/PDAnnotation classes. You will need to adjust your package names!!(BJL)
fixNow uses StandardEncoding as the default encoding(BJL)
fixBug in AppendDoc example that did not take into account groups of pages(BJL)
fixPDFont now also tries the bootstrap classloader when loading AFM resources(BJL)
fixadded -startPage and -endPage command line options to org.pdfbox.ExtractText(BJL)
fixAdded support for corrupt PDFs with garbage before the header(BJL)
fixFixed bug where there was whitespace instead of garbage characters in front of the first object(BJL)
fixperformance improvements for the Matrix implementation(BJL)
fixupgraded to lucene 1.3(BJL)
fixfixed bug in cmap parser for cmap files that all ended in ‘def’(BJL)
fixRemoved createObject method from COSDocument, COSWriter will handle all object references for you(BJL)
fixUpdated AppendDoc to use PDDocument instead of COSDocument and a couple bug fixes(BJL)
fixPDFParser now closes the document if there were parse errors(BJL)
fixTextPosition now has the PDFont that is associated with the piece of text(BJL)
fixAdded initial version of org.pdfbox.PDFViewer, a GUI application to view the internal structure of a PDF document. This can be used for debugging purposes at this time but may end up being an Adobe Reader-like application if there is enough interest(BJL)
fixChanged COSNumber/COSInteger/COSFloat interface to have both intValue and longValue(BJL)
fixAdded methods isUserPassword & isOwnerPassword to PDDocument(BJL)
fixAdded cmap files for CJK languages, please give me some feedback(BJL)

Version 0.6.4 (11/02/2003)
Changes to the Code Base
fixFixed bug which caused infinite loop(BJL)
fixFixed bug in encoding where DictionaryEncoding kept a reference instead of making a copy leading to encoding problems(BJL)
fixAdded PDFTextStripper.(get|set)PageSeparator, which will allow the user to output a string after every page(BJL)
fixrefactored text stripping code to separate the logic processing of PDF operators and the logic of extracting text(BJL)
fixran findbugs on source code and fixed a couple minor issues(BJL)
fixRefactored font functionality to PDFont, some API methods are no longer available in COSObject(BJL)
fixchanged name of org.pdfbox.Main to org.pdfbox.ExtractText(BJL)
fixadded contribution of org.pdfbox.Overlay from Mario Ivankovits(BJL)
fixadded log.isDebugEnabled checks to log4j calls(BJL)
fixadded better escaping when writing COSNames(BJL)
fixfixed bug where encryption dictionary is sometimes set to COSNull instead of not being present(BJL)

Version 0.6.3 (09/13/2003)
Changes to the Code Base
fixNow contains the ability to import/set FDF data thanks to a contribution from Stefan Uldum Grinsted(BJL)
fixNo longer throw an error when stream is not followed by 0A or 0D0A to allow more PDFs to be parsed(BJL)
fixAdded -encoding argument to org.pdfbox.Main to control the encoding of the output(BJL)
fixRemove Prev entry from trailer if it exists because PDFBox automatically clears all old entries, only an issue when modifying/saving an existing PDF document(BJL)
fixFixed bug in master password encryption algorithm for Revision 3 encrypted documents(BJL)
fixCOSString no longer uses UTF-8 when encoding the byte array(BJL)
fixAdded PDDocument.getPageCount()(BJL)
fixFixed bug in PDFEncryption where(BJL)
fixNow enforces text extraction permissions(BJL)

Version 0.6.2 (4/18/2003)
Changes to the Code Base
fixModified build so that settings are no longer required(BJL)
addAdded required libraries to CVS(BJL)
addAdded log4j logging(BJL)
updateSignificant text extraction work(BJL)
fixAdded automatic handling of files encrypted with the empty password(BJL)
addAdded automated tests and test data for text extraction(BJL)
fixRemoved unimplemented decoders from filters test(BJL)
fixFixed several LZW decode bugs introduced after 0.5.6(BJL)
fixFixed bugs relating to processing out of spec PDF’s with bad # escaping in the name (“ Error: expected hex number” bug)(BJL)
fixFixed Lucene UID generation bug(BJL)
fixFixed GetFontWidths null pointer exception bug(BJL)

Version 0.6.1 (3/9/2003)
Changes to the Code Base
fixFixed bug in parsing stream objects which led to “Unexpected end of ZLIB input stream”(BJL)
fixChanged license from LGPL to BSD to allow pdfbox to be used easily in Apache projects(BJL)

Version 0.6.0 (3/5/2003)
Changes to the Code Base
fixMassive improvements to memory footprint(BJL)
fixMust call close() on the COSDocument(LucenePDFDocument does this for you)(BJL)
fixReally fixed the bug where small documents were not being indexed(BJL)
fixFixed bug where no whitespace existed between obj and start of object. Exception in thread “main” expected=’obj’ actual=’obj<</Pro(BJL)
fixFixed issue with spacing where textLineMatrix was not being copied properly(BJL)
fixFixed ‘bug’ where parsing would fail with some pdfs with double endobj definitions(BJL)
addAdded PDF document summary fields to the lucene document(BJL)

Version 0.5.6 (11/28/2002)
Changes to the Code Base
addFixed bug in LucenePDFDocument where stream was not being closed and small documents were not being indexed (BJL)
addFixed a spacing issue for some PDF documents (BJL)
addFixed error while parsing the version number (BJL)
addFixed NullPointer in persistence example (BJL)
addCreate example lucene IndexFiles class which models the demo from lucene (BJL)
addFixed bug where garbage at the end of file caused an infinite loop (BJL)
addFixed bug in parsing boolean values with stuff at the end like “true>>” (BJL)

Version 0.5.5 (10/03/2002)
Changes to the Code Base
addAdded example of printing document signature(BJL)
addAdded example to print out form fields values(BJL)
fixFixed bug when appending documents(BJL)
fixVarious other bug fixes(BJL)

Version 0.5.4 (09/17/2002)
Changes to the Code Base
fixFixed bug in text output where ‘?’ instead of the proper character(BJL)
fixFixed bug where sections of text were not being output at all(BJL)

Version 0.5.3 (09/13/2002)
Changes to the Code Base
fixFixed bug in 128 bit encryption(BJL)

Version 0.5.2 (09/06/2002)
Changes to the Code Base
fixFixed bug where FDF documents could not be appended to PDF Documents(BJL)
updateCatch all NumberFormatExceptions and wrap them with IOExceptions(BJL)
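
The 0.5.2 change that wraps NumberFormatExceptions in IOExceptions follows a common pattern, sketched below. The class NumberParseSketch, its parseInt helper, and the error message are hypothetical and are not taken from the PDFBox source; the point is only that callers of a parser then have a single checked exception type to handle.

```java
import java.io.IOException;

final class NumberParseSketch {

    /**
     * Parses an integer token, reporting malformed input as an IOException
     * so that callers only need to handle one exception type.
     */
    public static int parseInt(String token) throws IOException {
        try {
            return Integer.parseInt(token);
        } catch (NumberFormatException e) {
            // Keep the original exception as the cause for debugging.
            throw new IOException("Expected an integer but found '" + token + "'", e);
        }
    }
}
```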

Version 0.5.1 (09/04/2002)
Changes to the Code Base
addNow supports unicode for the document summary(BJL)
updateBetter support for Type0 fonts(BJL)
fixFixed bug with an empty LZW stream(BJL)
fixFixed parsing error for ID operator(BJL)

Version 0.5.0 (08/31/2002)
Changes to the Code Base
addNow supports unicode for the document summary(BJL)
updateBetter support for Type0 fonts(BJL)
fixFixed bug with an empty LZW stream(BJL)
fixFixed parsing error for ID operator(BJL)

Version 0.4.1 (07/25/2002)
Changes to the Code Base
fixFixed bug where .notdef was being output as document text(BJL)

Version 0.4.0 (07/23/2002)
Changes to the Code Base
addAdded extract text ant task(BJL)
addImplemented AFM(Adobe Font Metrics) resource loading(BJL)
fixFixed numerous bugs submitted by users(BJL)
updateChanged project from pdfparser to pdfbox to better reflect future needs(BJL)

Version 0.3.0 (07/09/2002)
Changes to the Code Base
addAdded indexer for the lucene project(BJL)
fixInitial implementation of PDF encryption(not working yet)(BJL)

Version 0.2.0 (06/03/2002)
Changes to the Code Base
addAdded support for the various encodings(BJL)
fixImproved the accuracy of the text output(BJL)

Version 0.1.0 (05/25/2002)
Changes to the Code Base
addInitial Version(BJL)