PoDoFo::PdfParser Class Reference

#include <PdfParser.h>

Inherits PoDoFo::PdfTokenizer.

Public Member Functions

 PdfParser (PdfVecObjects *pVecObjects)
 PdfParser (PdfVecObjects *pVecObjects, const char *pszFilename, bool bLoadOnDemand=true)
 PdfParser (PdfVecObjects *pVecObjects, const char *pBuffer, long lLen, bool bLoadOnDemand=true)
 PdfParser (PdfVecObjects *pVecObjects, const PdfRefCountedInputDevice &rDevice, bool bLoadOnDemand=true)
virtual ~PdfParser ()
void ParseFile (const char *pszFilename, bool bLoadOnDemand=true)
void ParseFile (const char *pBuffer, long lLen, bool bLoadOnDemand=true)
void ParseFile (const PdfRefCountedInputDevice &rDevice, bool bLoadOnDemand=true)
bool QuickEncryptedCheck (const char *pszFilename)
const PdfVecObjects * GetObjects () const
EPdfVersion GetPdfVersion () const
const char * GetPdfVersionString () const
const PdfObject * GetTrailer () const
bool GetLoadOnDemand () const
bool IsLinearized () const
size_t GetFileSize () const
bool GetEncrypted () const
const PdfEncrypt * GetEncrypt () const
PdfEncrypt * TakeEncrypt ()
void SetPassword (const std::string &sPassword)

Protected Member Functions

void FindToken (const char *pszToken, const long lRange)
void ReadDocumentStructure ()
void HasLinearizationDict ()
void MergeTrailer (const PdfObject *pTrailer)
void ReadTrailer ()
void ReadXRef (long *pXRefOffset)
void ReadXRefContents (long lOffset, bool bPositionAtEnd=false)
void ReadXRefSubsection (long &nFirstObject, long &nNumObjects)
void ReadXRefStreamContents (long lOffset, bool bReadOnlyTrailer)
void ReadObjects ()
void ReadObjectsInternal ()
void ReadObjectFromStream (int nObjNo, int nIndex)
bool IsPdfFile ()


Detailed Description

PdfParser reads a PDF file into memory. The file can be modified in memory and written back using the PdfWriter class. Most PDF features are supported.

Constructor & Destructor Documentation

PoDoFo::PdfParser::PdfParser ( PdfVecObjects *  pVecObjects  ) 

Create a new PdfParser object. You have to open a PDF file using ParseFile later.

Parameters:
pVecObjects vector to write the parsed PdfObjects to
See also:
ParseFile

PoDoFo::PdfParser::PdfParser ( PdfVecObjects *  pVecObjects,
const char *  pszFilename,
bool  bLoadOnDemand = true 
)

Create a new PdfParser object and open a PDF file and parse it into memory.

Parameters:
pVecObjects vector to write the parsed PdfObjects to
pszFilename filename of the file which is going to be parsed
bLoadOnDemand If true, each object is read from the file when it is first accessed, which is faster if you do not need the complete PDF file in memory. If false, all objects are read immediately.
This might throw a PdfError( ePdfError_InvalidPassword ) exception if a password is required to read this PDF. Call SetPassword with the correct password in this case.

See also:
SetPassword

PoDoFo::PdfParser::PdfParser ( PdfVecObjects *  pVecObjects,
const char *  pBuffer,
long  lLen,
bool  bLoadOnDemand = true 
)

Create a new PdfParser object and open a PDF file and parse it into memory.

Parameters:
pVecObjects vector to write the parsed PdfObjects to
pBuffer buffer containing a PDF file in memory
lLen length of the buffer containing the PDF file
bLoadOnDemand If true, each object is read from the file when it is first accessed, which is faster if you do not need the complete PDF file in memory. If false, all objects are read immediately.
This might throw a PdfError( ePdfError_InvalidPassword ) exception if a password is required to read this PDF. Call SetPassword with the correct password in this case.

See also:
SetPassword

PoDoFo::PdfParser::PdfParser ( PdfVecObjects *  pVecObjects,
const PdfRefCountedInputDevice &  rDevice,
bool  bLoadOnDemand = true 
)

Create a new PdfParser object and open a PDF file and parse it into memory.

Parameters:
pVecObjects vector to write the parsed PdfObjects to
rDevice read from this PdfRefCountedInputDevice
bLoadOnDemand If true, each object is read from the file when it is first accessed, which is faster if you do not need the complete PDF file in memory. If false, all objects are read immediately.
This might throw a PdfError( ePdfError_InvalidPassword ) exception if a password is required to read this PDF. Call SetPassword with the correct password in this case.

See also:
SetPassword

PoDoFo::PdfParser::~PdfParser (  )  [virtual]

Delete the PdfParser and all PdfObjects


Member Function Documentation

void PoDoFo::PdfParser::FindToken ( const char *  pszToken,
const long  lRange 
) [protected]

Searches backwards from the end of the file and tries to find a token. The file position is set right after the token.

Parameters:
pszToken a token to find
lRange range in bytes in which to search, beginning at the end of the file
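
The backward search can be illustrated without PoDoFo. The following is a minimal sketch, assuming the file contents are already held in a std::string. findTokenFromEnd is a hypothetical helper, not part of the PoDoFo API, and it returns the position right after the token instead of repositioning a file stream.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not PoDoFo API): look for the last occurrence of
// `token` within the final `range` bytes of `data`. Returns the position
// right after the token, or std::string::npos if it is not in that window.
std::size_t findTokenFromEnd(const std::string& data,
                             const std::string& token,
                             std::size_t range)
{
    if (token.empty() || token.size() > data.size())
        return std::string::npos;

    // First byte of the search window, clamped to the start of the data.
    const std::size_t windowStart =
        range >= data.size() ? 0 : data.size() - range;

    // rfind without a position argument returns the last match in the string.
    // If that match starts before the window, no match starts inside it.
    const std::size_t pos = data.rfind(token);
    if (pos == std::string::npos || pos < windowStart)
        return std::string::npos;

    return pos + token.size();
}
```

Calling findTokenFromEnd(contents, "startxref", 1024) would mirror a search for the startxref keyword in the last kilobyte of a file; the range value here is only an example.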

const PdfEncrypt* PoDoFo::PdfParser::GetEncrypt (  )  const [inline]

Returns:
the parser's encryption object or NULL if the read PDF file was not encrypted

bool PoDoFo::PdfParser::GetEncrypted (  )  const [inline]

Returns:
true if the parsed PDF file is encrypted

size_t PoDoFo::PdfParser::GetFileSize (  )  const [inline]

Returns:
the length of the file

bool PoDoFo::PdfParser::GetLoadOnDemand (  )  const [inline]

Returns:
true if this PdfParser loads objects on demand, at the time they are first accessed; false if all objects are loaded immediately.

const PdfVecObjects * PoDoFo::PdfParser::GetObjects (  )  const [inline]

Get a reference to the sorted internal objects vector.

Returns:
the internal objects vector.

EPdfVersion PoDoFo::PdfParser::GetPdfVersion (  )  const [inline]

Get the file format version of the PDF.

Returns:
the file format version as enum

const char * PoDoFo::PdfParser::GetPdfVersionString (  )  const

Get the file format version of the PDF.

Returns:
the file format version as string

const PdfObject * PoDoFo::PdfParser::GetTrailer (  )  const [inline]

Get the trailer dictionary which can be written unmodified to a pdf file.

void PoDoFo::PdfParser::HasLinearizationDict (  )  [protected]

Checks whether this PDF is linearized or not. Initializes the linearization dictionary on success.

bool PoDoFo::PdfParser::IsLinearized (  )  const [inline]

Returns:
whether the parsed document contains linearization tables

bool PoDoFo::PdfParser::IsPdfFile (  )  [protected]

Checks the magic number at the start of the pdf file and sets the m_ePdfVersion member to the correct version of the pdf file.

Returns:
true if this is a pdf file, otherwise false
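
A PDF file starts with a header of the form %PDF-M.m. The following is a minimal sketch of such a magic-number check, assuming the first bytes of the file are available in a std::string. readPdfMagic is a hypothetical helper, not part of the PoDoFo API; it stores the version as text, whereas IsPdfFile sets the m_ePdfVersion member.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not PoDoFo API): check for the "%PDF-M.m" magic
// number at the start of `data`. On success, store "M.m" in `version`.
bool readPdfMagic(const std::string& data, std::string& version)
{
    const std::string magic = "%PDF-";

    // The magic must be followed by three characters: digit, '.', digit.
    if (data.size() < magic.size() + 3)
        return false;
    if (data.compare(0, magic.size(), magic) != 0)
        return false;

    const std::string candidate = data.substr(magic.size(), 3);
    const bool wellFormed =
        std::isdigit(static_cast<unsigned char>(candidate[0])) &&
        candidate[1] == '.' &&
        std::isdigit(static_cast<unsigned char>(candidate[2]));
    if (!wellFormed)
        return false;

    version = candidate;
    return true;
}
```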

void PoDoFo::PdfParser::MergeTrailer ( const PdfObject *  pTrailer  )  [protected]

Merge the information of this trailer object into the parser's main trailer object.

Parameters:
pTrailer take the keys to merge from this dictionary.

void PoDoFo::PdfParser::ParseFile ( const PdfRefCountedInputDevice &  rDevice,
bool  bLoadOnDemand = true 
)

Open a PDF file and parse it.

Parameters:
rDevice the input device to read from
bLoadOnDemand If true, each object is read from the file when it is first accessed, which is faster if you do not need the complete PDF file in memory. If false, all objects are read immediately.
This might throw a PdfError( ePdfError_InvalidPassword ) exception if a password is required to read this PDF. Call SetPassword with the correct password in this case.

See also:
SetPassword

void PoDoFo::PdfParser::ParseFile ( const char *  pBuffer,
long  lLen,
bool  bLoadOnDemand = true 
)

Open a PDF file and parse it.

Parameters:
pBuffer buffer containing a PDF file in memory
lLen length of the buffer containing the PDF file
bLoadOnDemand If true, each object is read from the file when it is first accessed, which is faster if you do not need the complete PDF file in memory. If false, all objects are read immediately.
This might throw a PdfError( ePdfError_InvalidPassword ) exception if a password is required to read this PDF. Call SetPassword with the correct password in this case.

See also:
SetPassword

void PoDoFo::PdfParser::ParseFile ( const char *  pszFilename,
bool  bLoadOnDemand = true 
)

Open a PDF file and parse it.

Parameters:
pszFilename filename of the file which is going to be parsed
bLoadOnDemand If true, each object is read from the file when it is first accessed, which is faster if you do not need the complete PDF file in memory. If false, all objects are read immediately.
This might throw a PdfError( ePdfError_InvalidPassword ) exception if a password is required to read this PDF. Call SetPassword with the correct password in this case.

See also:
SetPassword

bool PoDoFo::PdfParser::QuickEncryptedCheck ( const char *  pszFilename  ) 

Quick method to detect secured PDF files, i.e. a PDF with an /Encrypt key in the trailer dictionary.

Returns:
true if document is secured, false otherwise
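
The idea of looking for an /Encrypt key can be illustrated with a purely textual check. This is a rough sketch under strong assumptions, not how PoDoFo implements the method: it assumes the file contents are in a std::string and that the file has a classic trailer keyword (files that use only cross-reference streams do not). trailerHasEncryptKey is a hypothetical helper.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not PoDoFo API): report whether the text following
// the last "trailer" keyword contains an /Encrypt key.
bool trailerHasEncryptKey(const std::string& data)
{
    const std::string key = "/Encrypt";

    std::size_t pos = data.rfind("trailer");
    if (pos == std::string::npos)
        return false;

    while ((pos = data.find(key, pos)) != std::string::npos) {
        pos += key.size();
        // A longer name such as /EncryptMetadata continues with a letter,
        // so only accept the key when it ends here.
        if (pos >= data.size() ||
            !std::isalnum(static_cast<unsigned char>(data[pos])))
            return true;
    }
    return false;
}
```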

void PoDoFo::PdfParser::ReadDocumentStructure (  )  [protected]

Reads the xref sections and the trailers of the file into memory in the correct order and takes care of linearized PDF files.

void PoDoFo::PdfParser::ReadObjectFromStream ( int  nObjNo,
int  nIndex 
) [protected]

Read the object with index nIndex from the object stream nObjNo and push it on the objects vector m_vecOffsets.

All objects are read from this stream and the stream object is freed from memory. Further calls that try to read from the same stream simply do nothing.

Parameters:
nObjNo object number of the stream object
nIndex index of the object which should be parsed

void PoDoFo::PdfParser::ReadObjects (  )  [protected]

Reads all objects from the pdf into memory from the offsets listed in m_vecOffsets.

If required, an encryption object is set up first.

The actual reading happens in ReadObjectsInternal() either if no encryption is required or a correct encryption object was initialized from SetPassword.

void PoDoFo::PdfParser::ReadObjectsInternal (  )  [protected]

Reads all objects from the pdf into memory from the offsets listed in m_vecOffsets.

Requires a correctly set up PdfEncrypt object with the correct password.

This method is called from ReadObjects or SetPassword.

See also:
ReadObjects

SetPassword

void PoDoFo::PdfParser::ReadTrailer (  )  [protected]

Read the trailer dictionary at the end of the file.

void PoDoFo::PdfParser::ReadXRef ( long *  pXRefOffset  )  [protected]

Looks for a startxref entry at the current file position and saves its byte offset to pXRefOffset.

Parameters:
pXRefOffset store the byte offset of the xref section into this variable.
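
A startxref entry consists of the keyword startxref followed by the decimal byte offset of the xref section. The following is a minimal sketch of reading it, assuming the file contents are in a std::string and the current position is given as an index. readStartXRef is a hypothetical helper, not part of the PoDoFo API, and it does not guard against integer overflow.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not PoDoFo API): expect the keyword "startxref" at
// `pos` (after optional whitespace), followed by a decimal byte offset.
// On success, store the offset in `xrefOffset` and return true.
bool readStartXRef(const std::string& data, std::size_t pos, long& xrefOffset)
{
    const std::string keyword = "startxref";
    auto isSpace = [](char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    };
    auto isDigit = [](char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    };

    if (pos > data.size())
        return false;

    // Skip whitespace before the keyword.
    while (pos < data.size() && isSpace(data[pos]))
        ++pos;
    if (data.compare(pos, keyword.size(), keyword) != 0)
        return false;
    pos += keyword.size();

    // Skip whitespace between the keyword and the number.
    while (pos < data.size() && isSpace(data[pos]))
        ++pos;
    if (pos >= data.size() || !isDigit(data[pos]))
        return false;

    long value = 0;
    while (pos < data.size() && isDigit(data[pos])) {
        value = value * 10 + (data[pos] - '0');
        ++pos;
    }
    xrefOffset = value;
    return true;
}
```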

void PoDoFo::PdfParser::ReadXRefContents ( long  lOffset,
bool  bPositionAtEnd = false 
) [protected]

Reads the xref table from a pdf file. If there is no xref table, ReadXRefStreamContents() is called.

Parameters:
lOffset read the table from this offset
bPositionAtEnd if true the xref table is not read, but the file stream is positioned directly after the table, which allows reading a following trailer dictionary.

void PoDoFo::PdfParser::ReadXRefStreamContents ( long  lOffset,
bool  bReadOnlyTrailer 
) [protected]

Reads an xref stream contents object.

Parameters:
lOffset read the stream from this offset
bReadOnlyTrailer if true, only the trailer is read; the contents of the xref stream are not parsed

void PoDoFo::PdfParser::ReadXRefSubsection ( long &  nFirstObject,
long &  nNumObjects 
) [protected]

Read an xref subsection.

Throws ePdfError_NoXref if the number of objects read was not the number specified by the subsection header (as passed in `nNumObjects').

Parameters:
nFirstObject object number of the first object
nNumObjects how many objects should be read from this section
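
An xref subsection starts with a header giving the first object number and the number of entries, followed by one entry per object (byte offset, generation number, and n or f). The following is a minimal sketch of parsing one, assuming the text is available through a std::istream. XRefEntry and readXRefSubsection are hypothetical names, not part of the PoDoFo API; the sketch throws std::runtime_error where PoDoFo would report ePdfError_NoXref, and it is whitespace-tolerant rather than enforcing the fixed 20-byte entry layout.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One cross-reference entry: byte offset, generation number, and
// 'n' (in use) or 'f' (free).
struct XRefEntry {
    long offset;
    long generation;
    char type;
};

// Hypothetical helper (not PoDoFo API): parse a subsection of the form
//   <first> <count>
//   <offset> <generation> <n|f>     (repeated <count> times)
// Throws std::runtime_error if the header is malformed or if fewer than
// <count> well-formed entries follow it.
std::vector<XRefEntry> readXRefSubsection(std::istream& in,
                                          long& firstObject,
                                          long& numObjects)
{
    if (!(in >> firstObject >> numObjects) || firstObject < 0 || numObjects < 0)
        throw std::runtime_error("invalid xref subsection header");

    std::vector<XRefEntry> entries;
    for (long i = 0; i < numObjects; ++i) {
        XRefEntry entry;
        if (!(in >> entry.offset >> entry.generation >> entry.type) ||
            (entry.type != 'n' && entry.type != 'f'))
            throw std::runtime_error("xref subsection has fewer entries than announced");
        entries.push_back(entry);
    }
    return entries;
}
```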

void PoDoFo::PdfParser::SetPassword ( const std::string &  sPassword  ) 

If you try to open an encrypted PDF file, which requires a password to open, PoDoFo will throw a PdfError( ePdfError_InvalidPassword ) exception.

If you got such an exception, you have to set a password which should be used for opening the PDF.

The usual way will be to ask the user for the password and set the password using this method.

PdfParser will immediately continue to read the PDF file.

Parameters:
sPassword a user or owner password which can be used to open an encrypted PDF file. If the password is invalid, a PdfError( ePdfError_InvalidPassword ) exception is thrown.
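
The flow described above (parse, catch the invalid-password error, then supply a password) can be sketched generically. The following is only an illustration under stated assumptions: parseWithPasswords is a hypothetical helper, not part of the PoDoFo API; Parser is any type with the ParseFile and SetPassword members documented here; and isInvalidPassword is a caller-supplied function that recognises the invalid-password error. It also assumes SetPassword may be called again after a failed attempt.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not PoDoFo API). Returns true once the file has been
// parsed, false if every candidate password was rejected. Errors that are
// not "invalid password" are rethrown to the caller.
template <typename Parser, typename ErrorT>
bool parseWithPasswords(Parser& parser,
                        const char* filename,
                        const std::vector<std::string>& passwords,
                        bool (*isInvalidPassword)(const ErrorT&))
{
    try {
        parser.ParseFile(filename);
        return true; // no password was required
    } catch (const ErrorT& error) {
        if (!isInvalidPassword(error))
            throw;
    }

    for (const std::string& password : passwords) {
        try {
            // On success the parser continues reading the file.
            parser.SetPassword(password);
            return true;
        } catch (const ErrorT& error) {
            if (!isInvalidPassword(error))
                throw;
        }
    }
    return false;
}
```

With PoDoFo, Parser would be PdfParser, ErrorT would be PdfError, and the predicate would test for ePdfError_InvalidPassword. In an interactive program the password list would typically be replaced by a prompt to the user.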

PdfEncrypt * PoDoFo::PdfParser::TakeEncrypt (  )  [inline]

Takes the encryption object from the parser. The internal handle will be set to NULL and the ownership of the object is given to the caller.

Only call this if you need access to the encryption object before deleting the parser.

Returns:
the parser's encryption object or NULL if the read PDF file was not encrypted


Generated on Sat May 2 02:50:32 2009 for PoDoFo by  doxygen 1.5.8