e4Graph and XML

The e4XML library enables XML input to be stored in an e4_Storage object. The XML input is parsed using James Clark's expat parser, and stored in nodes and vertices under a given e4_Node object in an e4_Storage object. This facility is provided by the e4_XMLParser class.

The e4_XMLParser class uses instances of two other classes to process the XML input: e4_XMLInputProcessor and e4_XMLNodeVertexCreator. These classes are intended for subclassing by applications that need to implement non-default behavior, e.g. ignoring part of the input or creating a different e4Graph graph structure than the default. The functionality of these classes is described further below. The behaviors of e4_XMLInputProcessor and e4_XMLNodeVertexCreator are described as part of the description of the e4_XMLParser class.

The e4XML library also provides facilities for producing a character string representing the XML encoding of a node and all its vertices. This functionality is provided by the e4_XMLGenerator class.

The e4_XMLGenerator class uses instances of two other classes to generate the XML output: e4_XMLOutputStream and e4_XMLOutputProcessor. These classes are intended for subclassing by applications that need to produce non-standard XML output representations or to produce the output to a stream-oriented destination such as a file or socket, instead of the default string output destination. The functionality of these classes is described further below. The behaviors of e4_XMLOutputStream and e4_XMLOutputProcessor are described as part of the description of the e4_XMLGenerator class.

The e4_XMLParser Class

The e4XML library provides a class, e4_XMLParser, which can be used to parse a given string of XML input into an e4Graph graph of objects. A new instance of e4_XMLParser must be created for each parse; each instance can be used for only one parse and must be deleted afterwards.

Each instance of e4_XMLParser is associated with an instance of e4_Node. The association must be formed before parsing can commence, either during the construction of the parser, or afterwards by assigning a node to be associated with an existing parser using the SetNode method. The associated node can later be retrieved using the GetNode method, and interim parsing state can be queried through the regular operations on e4_Node instances.

A parse either succeeds or fails. The current state can be queried using the HasError method which returns true if an error was encountered. The reason for the error is stored in a NULL terminated string which is returned by the ErrorString method. The error state can be cleared using the ClearError method. Clearing the error state does not always succeed and the parser may be unable to continue parsing. If that happens, the parser will immediately enter a new error state.

A parse is either in progress or has finished. The current state may be queried using the Finished method, which returns true when the parse is finished.

Thus, there are four distinct states:

Parsing is done through the Parse method, which takes a buffer of input (not necessarily NULL terminated). The Parse method will advance the parse using the provided input.
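
Because Parse advances the parse with whatever input it is given, a large input can be handed over in pieces. The following is a minimal sketch of that pattern, not code from e4XML. FeedInChunks and StubParser are hypothetical names introduced here. The helper only assumes a type offering Parse(char *, size_t) and HasError(), which matches the e4_XMLParser interface described in this document, and the stub exists solely so the sketch can be compiled without the library.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Feeds `len` bytes from `data` to `parser` in pieces of at most `chunk` bytes.
// Works with any type that offers Parse(char *, size_t) and HasError().
// Returns the number of bytes handed over before the first failure.
template <class Parser>
std::size_t FeedInChunks(Parser &parser, char *data, std::size_t len,
                         std::size_t chunk) {
    std::size_t fed = 0;
    while (fed < len && chunk > 0) {
        std::size_t n = std::min(chunk, len - fed);
        if (!parser.Parse(data + fed, n) || parser.HasError()) {
            break;  // stop at the first error; the caller can inspect the parser
        }
        fed += n;
    }
    return fed;
}

// Hypothetical stand-in used only to exercise FeedInChunks without e4XML:
// it records its input and reports an error when it sees a '!' byte.
struct StubParser {
    std::string seen;
    bool error = false;
    bool Parse(char *buf, std::size_t len) {
        for (std::size_t i = 0; i < len; ++i) {
            if (buf[i] == '!') { error = true; return false; }
            seen.push_back(buf[i]);
        }
        return true;
    }
    bool HasError() const { return error; }
};
```

With a real parser, the caller would typically check HasError and ErrorString when the returned count is smaller than the input length.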

The XML input is parsed as follows:

The following code snippet shows how the e4_XMLParser class might be used in a C++ program:

#include <stdlib.h>
#include <iostream>
#include "e4xml.h"
...
e4_XMLParser *parser;
e4_Node n;
char *buf;
size_t len;
...
// Each parser instance can be used for one parse only.
parser = new e4_XMLParser(n);
...
if (!parser->Parse(buf, len)) {
    std::cerr << "Parsing encountered an error: \""
              << parser->ErrorString() << "\"\n";
}
delete parser;

XML Streaming and Vertex Completion

The e4XML library causes an event to be fired for every vertex that is successfully parsed from the input. The parser provides several APIs, described in the table below, for managing callbacks that are called when the event is fired.

The vertex completion event lets client programs react to parsed XML input as it becomes available, before the entire input has been consumed. It therefore enables the use of e4XML for parsing XML streams such as those defined by the XMPP protocol (see http://www.ietf.org/internet-drafts/draft-ietf-xmpp-im-08.txt and http://www.ietf.org/internet-drafts/draft-ietf-xmpp-core-08.txt).
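
The declare, delete and fire cycle behind such an event can be sketched as follows. This is an illustration only, not the e4XML implementation. Vertex, CallbackFunction, CompletionCallbacks and CountVertex are hypothetical stand-ins for the real e4Graph types, whose exact definitions may differ. Refusing duplicate registrations is a design choice of this sketch, not documented behavior.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real vertex and callback types come from the
// e4Graph headers and may differ.
struct Vertex { int id; };
typedef void (*CallbackFunction)(Vertex *v, void *clientData, void *callsiteData);

class CompletionCallbacks {
public:
    // Registers (fn, clientData); refuses null functions and duplicates.
    bool Declare(CallbackFunction fn, void *clientData) {
        if (fn == nullptr || Find(fn, clientData) != entries_.end()) return false;
        entries_.push_back(std::make_pair(fn, clientData));
        return true;
    }
    // Removes a previously registered pair; false if it was never declared.
    bool Delete(CallbackFunction fn, void *clientData) {
        Entries::iterator it = Find(fn, clientData);
        if (it == entries_.end()) return false;
        entries_.erase(it);
        return true;
    }
    // Calls every registered function for a newly completed vertex.
    void Fire(Vertex *v, void *callsiteData) const {
        for (const auto &e : entries_) e.first(v, e.second, callsiteData);
    }
private:
    typedef std::vector<std::pair<CallbackFunction, void *> > Entries;
    Entries::iterator Find(CallbackFunction fn, void *clientData) {
        return std::find(entries_.begin(), entries_.end(),
                         std::make_pair(fn, clientData));
    }
    Entries entries_;
};

// Example callback: counts completed vertices through the client data pointer.
inline void CountVertex(Vertex *, void *clientData, void *) {
    ++*static_cast<int *>(clientData);
}
```

A streaming client would register a function like CountVertex once, then act on each vertex as it arrives instead of waiting for the parse to finish.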
e4_XMLParser Methods and Constructors
   
e4_XMLParser() Constructor. Creates an empty parser that does not have an associated node.
e4_XMLParser(e4_Node nn) Constructor. Creates a parser associated with the node nn, which must be a valid node.
~e4_XMLParser() Destructor. Destroys the parser and any associated transient state information.
   
void SetNode(e4_Node nn) Associates the node nn with this parser. New input will be stored in new vertices added to the end of the list of vertices in the node nn.
bool GetNode(e4_Node &nn) Retrieves the associated node in nn, if there is an associated node. If not, returns false.
bool Finished() Returns true when the parse has finished successfully, false otherwise.
bool HasError() Returns true if the parse has encountered an error.
const char *ErrorString() Returns a NULL terminated string describing the error, if any, encountered by the parser.
void ClearError() Attempts to clear the error situation and recover; may fail, in which case a subsequent attempt to parse more input will reenter an error state.
bool Parse(char *buf, size_t len) Attempts to parse the XML input in the buffer buf of length len. If the entire buffer was parsed successfully, the operation returns true. If an error was encountered, false is returned.
bool DeclareVertexCompletionCallback(e4_CallbackFunction fn, void *clientData) Declares fn as a function that will be called when e4XML fires the vertex completion event. The function is called with three arguments: the newly parsed vertex, the client data (a void pointer) supplied in this call, and call-site data (another void pointer) supplied at the point inside e4XML where the event is fired. The result is true if the callback was registered successfully, false otherwise.
bool DeleteVertexCompletionCallback(e4_CallbackFunction fn, void *clientData) Deletes a previously declared vertex completion callback function. Returns true if the callback was successfully deleted, false otherwise.
const unsigned char *Base64_Decode(const char *base64Str, int *nbytes) As a convenience, the parser provides a method to decode a character string encoded in BASE64 (RFC 1341) into a binary value. The output argument nbytes contains the length of the byte sequence returned. The memory occupied by the returned binary value is managed by the parser and must be copied by your application.
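
For readers unfamiliar with the encoding, the decoding step that Base64_Decode performs can be illustrated with a standalone sketch. This is not the e4XML implementation. DecodeBase64 is a hypothetical name, and the sketch returns its result in a std::vector rather than in parser-owned memory.

```cpp
#include <string>
#include <vector>

// Standalone sketch of Base64 decoding; not the e4XML implementation.
// Returns false on a character outside the Base64 alphabet.
inline bool DecodeBase64(const std::string &in, std::vector<unsigned char> &out) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    out.clear();
    unsigned int acc = 0;  // bit accumulator
    int bits = 0;          // number of valid bits currently held in acc
    for (char c : in) {
        if (c == '=') break;                   // padding marks the end of data
        if (c == '\n' || c == '\r') continue;  // line breaks are skipped
        std::string::size_type pos = alphabet.find(c);
        if (pos == std::string::npos) return false;
        acc = (acc << 6) | static_cast<unsigned int>(pos);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<unsigned char>((acc >> bits) & 0xFF));
        }
    }
    return true;
}
```

Each Base64 character carries six bits, so a byte is emitted whenever at least eight bits have accumulated.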
   

The e4_XMLInputProcessor Class

This is a semi-internal class that is of interest to programmers that want to modify the behavior of the parser. It provides methods, called by the parser at appropriate points during the parse, that a user program can override and specialize for specific desired behavior.

The e4_XMLNodeVertexCreator Class

This is a semi-internal class that is of interest to programmers that want to modify the behavior of the parser. It provides methods, called by the parser at appropriate points during the parse, that a user program can override and specialize for specific desired behavior.

The e4_XMLGenerator Class

The e4XML library provides a class, e4_XMLGenerator, which can be used for creating an XML string representing a given node and all its vertices, recursively. A new instance of e4_XMLGenerator must be used each time you want to generate XML output from a node.

Each instance of e4_XMLGenerator is associated with an instance of e4_Node, the node from which XML output is generated. The association must be formed before XML generation can occur, by either giving the associated node in the constructor of e4_XMLGenerator, or by using the SetNode method. The current associated node is returned by the GetNode method.

XML output generated from the associated node is wrapped by an XML tag determined either at construction time or set later by using the SetElementName method. The currently set wrapping XML tag is returned by the GetElementName method.

XML output is generated with the Generate method, which takes no arguments and returns the XML output as a NULL terminated string. You can get the XML output string from a previous invocation of Generate using the Get method. The memory occupied by the string returned by Generate and Get is owned by the XML generator; if your program wishes to keep the value around, it must be copied.
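
Since the generator owns the returned memory, a caller that needs the text after the generator is deleted has to copy it first. The following is a minimal sketch of that copy. GenerateCopy and StubGenerator are hypothetical names introduced here. The helper assumes only a Generate method that returns a C string or NULL, as described for e4_XMLGenerator in this document, and the stub exists solely so the sketch can be compiled without the library.

```cpp
#include <string>

// Copies the generator's output into `out`. Works with any type whose
// Generate() returns a C string, or a null pointer on failure.
template <class Generator>
bool GenerateCopy(Generator &gen, std::string &out) {
    const char *xml = gen.Generate();
    if (xml == nullptr) return false;  // not ready, or an error occurred
    out.assign(xml);                   // copy while the generator still exists
    return true;
}

// Hypothetical stand-in used only to exercise GenerateCopy without e4XML.
struct StubGenerator {
    std::string buffer;
    bool ready;
    char *Generate() { return ready ? &buffer[0] : nullptr; }
};
```

On failure the helper leaves `out` untouched, so a previously copied value stays valid.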

The generated XML output string reverses the process of parsing XML input using the e4_XMLParser class described above. Specifically:

Note that the generator is very naive and may cause a stack overflow while descending into the associated node's reachable graph structure. A future version of the generator may fix this problem by using iteration instead of recursive descent to visit all reachable nodes and vertices. However, the generator is guaranteed to finish generating output in bounded time; it will not recurse infinitely given circular data structures.
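
The iterative alternative mentioned above can be sketched on a toy graph. This is not the generator's code. GraphNode and VisitReachable are hypothetical names, and e4_Node has a different interface. The sketch replaces recursion with an explicit stack and uses a set of visited nodes so that circular structures terminate.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical graph node; e4_Node has a different interface.
struct GraphNode {
    int id;
    std::vector<GraphNode *> children;
};

// Visits every node reachable from `root` exactly once, in depth-first
// pre-order, using an explicit stack rather than recursion.
inline std::vector<int> VisitReachable(GraphNode *root) {
    std::vector<int> order;
    std::set<const GraphNode *> seen;
    std::vector<GraphNode *> stack;
    if (root != nullptr) stack.push_back(root);
    while (!stack.empty()) {
        GraphNode *n = stack.back();
        stack.pop_back();
        if (!seen.insert(n).second) continue;  // already visited: cycle or shared node
        order.push_back(n->id);
        // Push children in reverse so the first child is visited first.
        for (std::size_t i = n->children.size(); i > 0; --i) {
            stack.push_back(n->children[i - 1]);
        }
    }
    return order;
}
```

The stack lives on the heap, so the depth of the graph no longer limits the traversal the way the call stack does.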

The following code snippet shows how you might use the e4_XMLGenerator class in your C++ programs:

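#include <iostream>
#include "e4xml.h"
...
e4_XMLGenerator *gen;
e4_Node n;
char *xml;
...
gen = new e4_XMLGenerator(n, "hello");
...
xml = gen->Generate();
...
// Generate returns NULL if the generator is not ready or an error occurred.
if (xml != NULL) {
    std::cerr << "Generated XML: \"" << xml << "\"\n";
}
...
delete gen;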

 
e4_XMLGenerator Constructors and Methods
   
e4_XMLGenerator() Creates an empty instance. Subsequently you should associate an instance of e4_Node with this XML generator using the SetNode method, and a wrapping XML element tag name, using the SetElementName method.
e4_XMLGenerator(e4_Node n, char *elementName) Creates an instance with an associated instance of e4_Node and a wrapping XML element tag name.
e4_XMLGenerator(e4_Node n, char *elementName, bool exportXML) Creates an instance with an associated instance of e4_Node and a wrapping XML element tag name. Also determines whether pure XML will be generated (i.e. exact reproduction of the XML that could have been used to create the given node) or a slightly marked up version that contains __nodeid__ attributes that allow circular graphs to be represented.
~e4_XMLGenerator() Destructor. Frees memory associated with this XML generator.
   
void SetNode(e4_Node n) Sets the associated node for this XML generator.
void SetElementName(char *elementName) Sets the wrapping XML element tag name for this XML generator.
void SetElementNameAndNode(e4_Node n, char *elementName) Sets both the associated node and the wrapping XML element tag name for this XML generator.
void GetNode(e4_Node &n) const Retrieves the associated node.
char *GetElementName() const Retrieves the wrapping XML element tag name. This retrieves the XML element tag name string used by the generator itself, not a copy.
char *Generate() Generates and returns the XML output representing the associated node, wrapped in the wrapping XML tag name. The returned string's memory is owned by the generator, so your application must copy it if the value should be preserved. If an error occurs or the generator is not ready to produce XML (no associated node or wrapping XML tag name was previously given) then NULL is returned.
char *Get() Retrieves the XML output previously generated by a successful call to Generate. The returned string's memory is managed by the generator and must be copied by your application. If you delete the instance of e4_XMLGenerator to which this string belongs, you can no longer use the value returned by Get().
char *Base64_Encode(unsigned char *bytes, int nbytes) const As a convenience, the class provides a method to encode binary data of a given length as a BASE64 character string. The memory occupied by the returned string is owned by the generator and must be copied by your application.
void SetExportXML(bool exporting) If the argument is true, after this call, pure XML will be generated that does not contain __nodeid__ attributes. This XML is an exact reproduction of the XML that could have been used to produce the associated node. If the argument is false, after this call, XML will be generated that does contain __nodeid__ attributes.
bool IsExportXML() const Returns a boolean value indicating whether pure XML or slightly marked up XML will be generated. If this method returns true then pure XML will be generated. If it returns false, the default, then XML with additional __nodeid__ attributes will be generated.
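
The encoding that Base64_Encode applies to binary data can be illustrated with a standalone sketch. This is not the e4XML implementation. EncodeBase64 is a hypothetical name, the sketch returns a std::string rather than generator-owned memory, and it emits no line breaks, which the library's own output may or may not contain.

```cpp
#include <cstddef>
#include <string>

// Standalone sketch of Base64 encoding; not the e4XML implementation.
inline std::string EncodeBase64(const unsigned char *bytes, std::size_t nbytes) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    for (std::size_t i = 0; i < nbytes; i += 3) {
        std::size_t remaining = nbytes - i;
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        unsigned int group = static_cast<unsigned int>(bytes[i]) << 16;
        if (remaining > 1) group |= static_cast<unsigned int>(bytes[i + 1]) << 8;
        if (remaining > 2) group |= static_cast<unsigned int>(bytes[i + 2]);
        // Emit four characters per group, padding with '=' where input ran out.
        out.push_back(alphabet[(group >> 18) & 0x3F]);
        out.push_back(alphabet[(group >> 12) & 0x3F]);
        out.push_back(remaining > 1 ? alphabet[(group >> 6) & 0x3F] : '=');
        out.push_back(remaining > 2 ? alphabet[group & 0x3F] : '=');
    }
    return out;
}
```

Every three input bytes become four output characters, which is why encoded binary vertex values are about a third larger than the original data.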
   

The e4_XMLOutputStream Class

The e4_XMLOutputProcessor Class