The e4_Storage Class

The e4_Storage class provides persistent storage of nodes and vertices. The representation of the physical storage is hidden in a driver, selected at construction time. Drivers for various databases can be easily constructed. The e4Graph package comes with a driver for MetaKit-based storage.

A user application may have any number of storages open at a given time, limited only by the available machine resources. Each storage can use a different driver, storing its data in a variety of representations such as flat files or relational databases. The e4_Storage class provides methods to retrieve the name and driver identifier that were used to open the storage.

The storage representation is reference counted and is closed automatically when the last reference is discarded. The e4Graph package strongly encourages a programming style that uses stack allocated instances and so-called dot-based method invocation. Programming with pointers to instances allocated on the heap is possible but cumbersome and, unless care is taken, may lead to reference counting errors and memory leaks.

The e4_Storage class provides assignment and comparison operators, as shown in the following code snippet:

e4_Storage s("mystorage", E4_METAKIT);
...
if (s.IsValid()) {
    cout << "The storage \"" << s.GetName() << "\" is valid\n";
}
...
e4_Storage another = s;
...
if (s == another) {
    cout << "Yes, they are one and the same!\n";
}
If auto-commit is turned on, any changes are committed to the storage at the time it is closed. As explained above, a storage is closed when the application drops the last reference to the storage. Auto-commit is turned on by default. Changes can also be committed at any time under the control of the user program. A storage is considered unstable if any node or vertex in it has been modified and the change has not yet been committed. In the following example, the storage is automatically closed and committed when the instance s goes out of scope:
{
    e4_Storage s("mystorage", E4_METAKIT);
    ...
}
The global variable invalidStorage refers to a constant instance of e4_Storage that is guaranteed to be invalid. You can assign this instance to a local e4_Storage variable to discard the reference to another storage it contains, as shown in the following example:
e4_Storage s("mystorage", E4_METAKIT);
...
s = invalidStorage;
if (s.IsValid()) {
    cerr << "Something fishy here!\n";
}
In the above example, the assignment to s causes the number of references to the storage named mystorage to drop to zero, and it is automatically closed and committed if needed. When using some C++ compilers, you may need to assign invalidStorage to instance variables of e4_Storage embedded within heap allocated structures before these structures are freed, to ensure that the reference count of any storages referenced by the embedded instances is correct. When a storage is closed, any e4_Node and e4_Vertex instances held by the user program that refer to elements within that storage also become invalid.
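
The reference counting rules described above can be illustrated without e4Graph itself. The following is a minimal sketch, assuming that std::shared_ptr is an acceptable stand-in for the internal reference count; Backend, Handle and demoRefCounting are hypothetical names invented for this illustration and are not part of the e4Graph API.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for an open storage; NOT an e4Graph class.
struct Backend {
    std::string name;
    int *closedCounter;                  // incremented when the backend is "closed"
    Backend(const std::string &n, int *c) : name(n), closedCounter(c) {}
    ~Backend() { ++*closedCounter; }     // runs when the last handle lets go
};

// Hypothetical value-type handle, copied and assigned like e4_Storage.
class Handle {
public:
    Handle() {}                          // an invalid handle, refers to nothing
    Handle(const std::string &n, int *c)
        : impl(std::make_shared<Backend>(n, c)) {}
    bool IsValid() const { return static_cast<bool>(impl); }
    bool operator==(const Handle &o) const { return impl == o.impl; }
private:
    std::shared_ptr<Backend> impl;       // shared ownership is the reference count
};

// Returns how many times the backend was closed, or -1 if it closed too early.
int demoRefCounting() {
    int closed = 0;
    const Handle invalid;                // plays the role of invalidStorage
    {
        Handle s("mystorage", &closed);
        Handle another = s;              // second reference to the same backend
        s = invalid;                     // drops one reference; backend stays open
        if (closed != 0 || s.IsValid() || !another.IsValid()) return -1;
    }                                    // 'another' leaves scope: last reference gone
    return closed;
}
```

In this model the backend is closed exactly once, when the last handle referring to it is destroyed or overwritten, which mirrors the behavior the text describes for stack-allocated e4_Storage instances.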

Each storage has a root node. The root node can be retrieved using the GetRootNode method and assigned using the SetRootNode method.

While a storage is open, it may also contain detached nodes and vertices. A vertex is detached if it is not contained within a node, and a node is detached if it is the value of only detached vertices. All detached entities within a storage are recycled when the storage is closed.

Memory Management

Memory space in a storage is recycled by a built in garbage collection mechanism. The garbage collector uses reachability to determine which memory space is in use and which can be recycled. A node is reachable if it is the current root node, if it is referenced by the user program, or if it can be visited from a reachable node by traversing one or more vertices. Similarly, a vertex is reachable if it is contained within a reachable node. All nodes and vertices that are not reachable are recycled when the garbage collector runs.
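
The reachability rule can be sketched as a small mark-and-sweep pass over an in-memory graph. This is an illustration of the rule as stated above, not e4Graph's collector: the Graph type, the integer node numbering and the function names are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical in-memory graph: edges[i] lists the nodes that node i points to
// through its vertices. A teaching model, not e4Graph's storage layout.
typedef std::vector<std::vector<int> > Graph;

// Mark phase: collect every node reachable from the root or from a node that
// the user program still references.
std::set<int> reachable(const Graph &edges, int root,
                        const std::vector<int> &userRefs) {
    std::set<int> marked;
    std::vector<int> work(userRefs);
    work.push_back(root);
    while (!work.empty()) {
        int n = work.back();
        work.pop_back();
        if (!marked.insert(n).second) continue;   // already visited
        for (std::size_t i = 0; i < edges[n].size(); ++i)
            work.push_back(edges[n][i]);
    }
    return marked;
}

// Sweep phase: every node that was not marked is recyclable.
std::vector<int> recyclable(const Graph &edges, int root,
                            const std::vector<int> &userRefs) {
    std::set<int> marked = reachable(edges, root, userRefs);
    std::vector<int> dead;
    for (int n = 0; n < static_cast<int>(edges.size()); ++n)
        if (marked.count(n) == 0) dead.push_back(n);
    return dead;
}

// Example graph: 0 -> 1, 2 -> 3, node 4 isolated.
// The root is node 0 and the user program holds a reference to node 2.
std::vector<int> demoSweep() {
    Graph g(5);
    g[0].push_back(1);
    g[2].push_back(3);
    std::vector<int> refs(1, 2);
    return recyclable(g, 0, refs);
}
```

Node 2 is not reachable from the root but survives because the user program references it, and node 3 survives because it can be visited from node 2. Only node 4 is recyclable.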

The e4_Storage class provides methods to control the behavior of the garbage collector. Garbage collection normally runs occasionally when e4Graph determines that it can recover usable space. User programs can also instruct e4Graph to run the garbage collector immediately by calling the DoGC method. User programs can defer garbage collection by calling SetState with a mask that does not contain E4_AUTOGC. This will defer garbage collection until a subsequent call to SetState with a mask containing E4_AUTOGC, or until the user program calls DoGC(). Finally, user programs can query whether garbage collection is currently deferred by calling GetState and checking whether E4_AUTOGC is set; if not, then garbage collection is deferred.

Additionally, user programs can request that garbage collection occur at the time a storage is opened, by passing a state flag containing the constant E4_GCATOPEN. Similarly, by calling SetState() with a mask containing E4_GCBEFORECOMMIT, user programs can request that a garbage collection be performed before the storage state is committed during a call to Commit().
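
Because the modes are combined into a single integer mask, deferring and resuming garbage collection comes down to clearing and setting one bit. The sketch below shows only that bit arithmetic. The flag values kAutoGC, kGCBeforeCommit and kCommitAtClose are hypothetical placeholders chosen for this example; the real E4_AUTOGC, E4_GCBEFORECOMMIT and E4_COMMITATCLOSE constants are defined by the e4Graph headers and may have different values.

```cpp
#include <cassert>

// Hypothetical flag values for this sketch only; the real constants come from
// the e4Graph headers.
enum {
    kAutoGC         = 1 << 0,
    kGCBeforeCommit = 1 << 1,
    kCommitAtClose  = 1 << 2
};

// Returns the mask with automatic garbage collection deferred (bit cleared),
// leaving every other mode untouched.
int deferGC(int state) { return state & ~kAutoGC; }

// Returns the mask with automatic garbage collection enabled again.
int resumeGC(int state) { return state | kAutoGC; }

// True if garbage collection is deferred in the given mask.
bool isGCDeferred(int state) { return (state & kAutoGC) == 0; }
```

With a real storage, the same pattern would presumably start from the value returned by GetState(), clear or set the E4_AUTOGC bit, and pass the result to SetState(), so that the other modes keep their current settings.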

The cost of garbage collection is roughly proportional to the number of reachable entities within a storage, because the garbage collector must traverse the entire reachable graph to determine reachability. Therefore, user programs that need precise control over their responsiveness and cannot tolerate small pauses might want to defer garbage collection and rely on calling DoGC() themselves when needed.

Note that garbage collection at commit time will not recycle storage occupied by detached nodes and vertices that are referenced from the user application. These nodes and vertices become recyclable only when the storage is closed, or when the last reference to them is dropped by a user program. If E4_COMMITATCLOSE and E4_GCBEFORECOMMIT are both set for a storage, then any detached nodes and vertices will be recycled when that storage is closed.

Events, Timestamps, and Callbacks

Whenever an application makes significant changes to a storage, such as adding a node or vertex, or when a node or vertex becomes detached, e4Graph records an event, giving it a timestamp, a non-negative monotonically increasing integer value. Timestamps are per-storage and not persistent, so if the application closes an open storage and reopens it later, all its associated timestamps are reset to zero.

For some events, e4Graph also delivers the event to callback functions registered by the application. For events that are not delivered to callback functions, only the timestamp at which the event occurred is recorded, and the user application can later poll for whether the event has occurred. For example, when a node referenced by the user application becomes detached, e4Graph fires a node detach event that can be intercepted by a callback registered with that storage by the application. The callback receives the node that just became detached as an argument; what the callback does is determined by the application.

Events are recorded and potentially fired when the application explicitly requests the changes through the appropriate operation on a storage, node or vertex, or when the application implicitly causes the change, such as when a node becomes detached because it is referenced by the user program but not otherwise reachable. Events are delivered to registered callbacks only for nodes and vertices that are referenced by the application using e4Graph. Events are not delivered for those entities that are not currently referenced by the application program. Timestamps, however, are recorded for events whether or not the affected entity is referenced by the application using e4Graph.
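
The timestamp and polling rules can be modeled with a counter and a table of last-seen timestamps. The following sketch is an illustration under stated assumptions, not e4Graph's implementation: EventClock and its methods are hypothetical names, event codes are assumed to be distinct single bits so that they can be OR-ed into a mask, and "since" is interpreted as strictly after the given timestamp.

```cpp
#include <cassert>
#include <map>

// Hypothetical per-storage event log modeling the timestamp rules in the text.
class EventClock {
public:
    EventClock() : now(0) {}

    // Records an event and returns its timestamp (1 for the first event).
    int record(int eventCode) {
        last[eventCode] = ++now;
        return now;
    }

    // Timestamp of the most recent event whose code is in mask, 0 if none.
    int timeStampFor(int mask) const {
        int best = 0;
        for (std::map<int, int>::const_iterator it = last.begin();
             it != last.end(); ++it) {
            if ((it->first & mask) != 0 && it->second > best)
                best = it->second;
        }
        return best;
    }

    // True if an event in mask occurred after the given timestamp.
    // Negative or future timestamps yield false.
    bool hasOccurredSince(int timestamp, int mask) const {
        if (timestamp < 0 || timestamp > now) return false;
        return timeStampFor(mask) > timestamp;
    }

private:
    int now;                    // last timestamp handed out
    std::map<int, int> last;    // event code -> timestamp of its last occurrence
};
```

An application that polls would remember the timestamp at which it last looked, then ask whether any event of interest has occurred since that value. Passing zero asks whether the event has ever occurred.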

The following events are delivered to callbacks registered by an application using e4Graph:
 
Events Fired by e4Graph:

| Event Name | Event Code | When Delivered |
| --- | --- | --- |
| Storage Change | E4_ECCHANGESTG | After the storage changes from stable to unstable or from unstable to stable. To review, a storage is stable if all of its contents have been committed; it is unstable if it contains any changes that have not yet been committed. |
| Node Addition | E4_ECADDNODE | After a new node is added to a storage. |
| Node Detach | E4_ECDETNODE | Asynchronously, some time after a node is determined to be detached. The callback may be delayed until a garbage collection is caused. |
| Node Attach | E4_ECATTNODE | After a detached node becomes attached again. |
| Node Modification | E4_ECMODNODE | After a node is modified in one of the following ways: a vertex was added or removed, a vertex in this node is renamed, vertices in the node are reordered, the node loses or gains a parent. |
| Vertex Addition | E4_ECADDVERTEX | After a new vertex is added to a storage. |
| Vertex Detach | E4_ECDETVERTEX | Asynchronously, after a vertex is determined to be detached. The callback may be delayed until a garbage collection is caused. |
| Vertex Attach | E4_ECATTVERTEX | After a detached vertex becomes attached again. |
| Vertex Modification | E4_ECMODVERTEX | Right after a vertex value is modified. |

Additionally, the following events are only recorded and not delivered to registered callback functions:
 
Events Recorded by e4Graph:

| Event Name | Event Code | When Recorded |
| --- | --- | --- |
| Storage Open | E4_ECOPENSTG | When a storage is opened (the timestamp associated with this event is always 1). |
| Storage CopyTo | E4_ECCOPYTOSTG | When this storage is used as the target of a CopyTo() call. |
| Storage CopyFrom | E4_ECCOPYFRMSTG | When this storage is used as the source of a CopyTo() call. |
| Storage Reroot | E4_ECSETSTGROOT | When the root node of this storage is set to some other node. |

Callback functions have the following type:


typedef void (*e4_CallbackFunction)(void *clientData, const e4_RefCount &ref, void *callsiteData);

The application can register, through interfaces described below, any number of callback functions for each event for a given storage. At callback registration time the application supplies the address of the function to call as well as an arbitrary client data argument accessed through a void * pointer. The client data is passed as the first argument to the callback function when the event is fired, and the e4Graph storage element (storage, node or vertex, respectively) on which the event occurs is passed as the second argument. The third parameter is another application-defined datum, supplied by the code that causes the event.

Here's an example vertex detach callback function that simply prints the name of the vertex that became detached:


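// Callback invoked by e4Graph when a vertex referenced by the application
// becomes detached.
void VertexDetachCB(void *clientData, const e4_RefCount &r, void *csdata)
{
    // The event is a vertex detach, so the element is converted to a vertex.
    e4_Vertex v = (e4_Vertex) r;

    if (v.IsValid()) {
        cout << "Vertex \"" << v.Name() << "\" ("
             << v.GetTemporaryUID() << ") is detached\n";
    }
}

// Registers VertexDetachCB for vertex detach events on s; no client data
// is needed, so NULL is passed.
void PrepareCallback(e4_Storage s)
{
    s.DeclareCallback(E4_ECDETVERTEX, VertexDetachCB, NULL);
}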

The vertex modify and node modify event callback functions receive one of the values defined in the vertex modify reasons or the node modify reasons enums, respectively. This tells the callback functions the specific kind of modification made to the vertex or node.

An application can register the same callback function several times, with different client data pointers. In that case, the callback function will be invoked several times for each event, each time with the appropriate client data and the e4Graph storage element on which the event occurs.

When several callback functions are registered for an event, the order in which they are called is undetermined. Therefore applications should not rely on the order in which callback functions are invoked for a specific event. Similarly, when an event applies to several entities (e.g. detaching several vertices), the order in which callback functions are called is undetermined.

A single detach event, e.g. a vertex detach, may cause implicit detaching of a large number of other entities because they become unreachable from other entities in the storage but are still referenced by the user program. As explained above, if the storage has deferred garbage collection, detach events may be delayed until a garbage collection runs.

All detach callback functions are called after the current garbage collection has finished. Detach callback functions are free to do anything, including reattaching the node or vertex, or dropping the last reference to the entity. However, callback functions should still be implemented carefully, to avoid actions that may cause infinite loops. For example, a vertex modification callback function should not modify the value of the same or another vertex, otherwise an infinite chain of callback events is created.

To cancel a specific callback function for an event, your application can use the appropriate interface described below, supplying the same arguments as used when the callback function was registered. After the callback function is unregistered, that callback function will no longer be called with that client data when the event is fired; other callback functions registered for that event continue to be called when the event is fired. All registered callback functions are automatically removed when the storage on which they are registered is closed.
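
The registration and cancellation rules, where a callback is identified by the pair of function address and client data, can be sketched as follows. This is a simplified model for a single event, not the e4Graph implementation: the Callback typedef omits the e4_RefCount argument that the real e4_CallbackFunction receives, and Registry, declare, remove, fire and countCall are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical callback type for illustration; the real e4_CallbackFunction
// also receives the affected storage element.
typedef void (*Callback)(void *clientData, void *callsiteData);

class Registry {
public:
    // Registers fn with clientData; the same fn may appear with different data.
    void declare(Callback fn, void *clientData) {
        entries.push_back(std::make_pair(fn, clientData));
    }

    // Removes the entry matching both fn and clientData; false if absent.
    bool remove(Callback fn, void *clientData) {
        for (std::size_t i = 0; i < entries.size(); ++i) {
            if (entries[i].first == fn && entries[i].second == clientData) {
                entries.erase(entries.begin() + i);
                return true;
            }
        }
        return false;
    }

    // Calls every registered entry once with its own client data. A copy of
    // the list is used so that callbacks may register or remove entries.
    void fire(void *callsiteData) const {
        std::vector<std::pair<Callback, void *> > snapshot(entries);
        for (std::size_t i = 0; i < snapshot.size(); ++i)
            snapshot[i].first(snapshot[i].second, callsiteData);
    }

private:
    std::vector<std::pair<Callback, void *> > entries;
};

// Example callback: adds 1 to the int counter passed as client data.
void countCall(void *clientData, void *) {
    ++*static_cast<int *>(clientData);
}
```

Registering countCall twice with two different counters causes each counter to be incremented once per event. Removing one pair leaves the other registration in effect.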

Events predefined by e4Graph are encoded by one of the constants defined here. When an application registers a callback for a specific predefined event, it uses one of these predefined constants.

The event mechanism is extensible by applications using e4Graph. Applications can register and unregister their own event types, and can cause these events to be fired at application-chosen points during execution. When an application causes an event to be fired, all registered callbacks for that event will be called, just as for the predefined event types shown in the table above.
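
One way to picture application-defined event codes is as free bits in the same integer mask that the predefined codes occupy. The sketch below follows that assumption, which is suggested by the fact that event masks are formed by OR-ing codes together but is not confirmed by this document. EventCodes and its methods are hypothetical and only mimic the behavior described for DefineEventCode, UndefineEventCode and IsEventCodeDefined.

```cpp
#include <cassert>

// Hypothetical allocator for event codes, assuming each code is one bit.
class EventCodes {
public:
    explicit EventCodes(int predefinedMask)
        : predefined(predefinedMask), user(0) {}

    // Reserves the lowest free bit; returns false when no bit is left.
    bool define(int &eventCode) {
        for (int bit = 0; bit < 30; ++bit) {
            int code = 1 << bit;
            if (((predefined | user) & code) == 0) {
                user |= code;
                eventCode = code;
                return true;
            }
        }
        return false;
    }

    // Releases an application-defined code. Predefined or unknown codes are
    // refused and false is returned.
    bool undefine(int eventCode) {
        if (eventCode == 0 || (user & eventCode) != eventCode) return false;
        user &= ~eventCode;
        return true;
    }

    // True for predefined codes and for codes handed out by define().
    bool isDefined(int eventCode) const {
        return eventCode != 0 &&
               ((predefined | user) & eventCode) == eventCode;
    }

private:
    int predefined;   // bits taken by predefined events
    int user;         // bits handed out to the application
};
```

In this model, undefining a predefined code fails, and a released application code can be handed out again by a later call to define().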

Methods and Constructors of e4_Storage

The following methods are defined for the e4_Storage class:
 
e4_Storage Static Methods, Constructors and Instance Methods:
   
static const char *version() Returns a version string describing the version of e4Graph implemented by this library. The format is <major>.<minor><status><level> where major and minor are the major and minor version numbers respectively. The status part is either a, b, p, or . indicating respectively an alpha, beta, patch, or final release. The level part indicates the sequential alpha or beta release of this major.minor release, or its patchlevel. The memory occupied by the string is owned by e4Graph and may be reused or freed by e4Graph as soon as the next e4Graph API is called.
static int major_version() Returns an integer value representing the major version of this release of e4Graph.
static int minor_version() Returns an integer value representing the minor version of this release of e4Graph.
static e4_ReleaseStatus release_status() Returns a constant denoting the release status of this release of e4Graph.
static int release_iteration() Returns an integer representing the iteration number for the specific release status of this release of e4Graph. For example, if the release status is E4_ALPHARELEASE then this number would identify the Alpha release represented by the running code.
static const char *storage_version(const char *fname, const char *dname) Returns a version string describing the version of e4Graph that wrote the named storage. The format is the same as for e4_Storage::version(). The memory occupied by the returned string is owned by e4Graph and may be reused or freed by e4Graph as soon as the next e4Graph API is called.
static bool storage_version_info(const char *fname, const char *dname, int &majver, int &minver, e4_ReleaseStatus &rs, int &ri) Returns, in the supplied output variables, individual parts of the version information for the version of e4Graph that wrote the named storage.
static bool DefineEventCode(int &eventCode) This static method defines and reserves a new event code, returned in the output parameter eventCode. Note that this defines a new event that can be caused on all storages. e4Graph does not provide a way to define events for a specific storage.
static bool UndefineEventCode(int eventCode) This static method undefines an event code previously defined with DefineEventCode. Use this only on application defined event codes. Attempting to undefine an event code for a predefined event will fail and the method will return false.
static bool IsEventCodeDefined(int eventCode) Returns true if eventCode is the event code for a predefined event or was returned from a call to DefineEventCode.
   
e4_Storage() Default constructor. Returns a storage that is not connected to a persistent representation and is invalid.
e4_Storage(const e4_Storage &ref) Constructs a storage by copying the state of ref. The new storage and ref refer to the same underlying persistent representation.
e4_Storage(const char *name, const char *storageKind) Constructor. Returns a storage with the given name, using the storage driver identified by the storageKind argument. This constructor always performs a cleanup of the storage to remove any remaining unreferenced detached elements before access to the storage is given to the caller. Opened with permissions defined by E4_SPDEFAULTMASK.
e4_Storage(const char *name, const char *storageKind, int modes) Constructor. Returns a storage with the given name, using the storage driver identified by the storageKind argument. The modes argument allows the caller to preset various modes for the newly opened storage; see the definition of the various supported modes here. Opened with permissions defined by E4_SPDEFAULTMASK.
e4_Storage(const char *name, const char *storageKind, int modes, int permissions) Constructor. Returns a storage as above, and with the specified permissions.
~e4_Storage() Destructor. The underlying representation is reference counted and closed automatically when the last reference to it is discarded. If auto-commit is turned on, then changes to the storage are committed when the last reference is discarded.
bool operator==(const e4_Storage &comp) const Returns true if comp refers to the same storage instance as this or if both are invalid, false otherwise.
bool operator!=(const e4_Storage &comp) const Returns true if comp does not refer to the same storage instance as this, false if they are the same or if both are invalid.
e4_Storage & operator=(const e4_Storage &ref)  Copies the state of ref to this e4_Storage instance and returns this.
   
int SetState(int modes) const Sets various behavior modes for the storage. The various modes are described here.
int GetState() const Retrieves a bitmask with bits turned on for every mode that is currently turned on for this storage. The various modes are described here.
bool Commit() const Commits any changes to the storage at this time. Returns true if the commit succeeded, false otherwise.
bool Delete() Deletes the underlying storage. No events are fired because of the deletion of storage elements. If the operation succeeds, this returns true.
bool CopyTo(e4_Storage otherStorage, bool forceCommit) const Copies the e4Graph contents of this storage to otherStorage. The previous contents of otherStorage are deleted, all references held by the user program to entities in otherStorage become invalid, and no events are fired because of the deletion. If forceCommit is true, then otherStorage is committed after the copy is done. After this operation, the e4Graph contents of this storage and otherStorage are identical. Changes made to one storage after the copy are not reflected in the other storage. Callbacks registered for otherStorage stay in effect and will be called when events fire after the copy if changes are made to otherStorage.
bool IsStable() const Returns false when the storage has been modified and not yet committed. Returns true otherwise.
void MarkUnstable() const Marks the storage as unstable, i.e. it has been modified and not yet committed. For all explicit modifications, e4Graph takes care of marking the storage unstable. However, this method is provided to let the application define additional situations in which the storage is to be considered unstable.
bool GetRootNode(e4_Node &n) const Retrieves the current root node.
bool SetRootNode(e4_Node n) const Makes the node denoted by n be the root node. That node must be within this storage and be valid.
bool CreateDetachedNode(e4_Node &n) const Assigns to n a new detached node created within this storage.
bool CreateDetachedVertex(const char *nm, e4_Node n, e4_Vertex &v) const Assigns to v a new detached vertex created within this storage that has the node n as its value.
bool CreateDetachedVertex(const char *nm, int i, e4_Vertex &v) const Assigns to v a new detached vertex created within this storage that has the integer i as its value.
bool CreateDetachedVertex(const char *nm, double d, e4_Vertex &v) const Assigns to v a new detached vertex created within this storage that has the double d as its value.
bool CreateDetachedVertex(const char *nm, const char *s, e4_Vertex &v) const Assigns to v a new detached vertex created within this storage that has the NULL terminated string s as its value.
bool CreateDetachedVertex(const char *nm, const void *b, int nb, e4_Vertex &v) const Assigns to v a new detached vertex created within this storage that has the uninterpreted binary value constructed from b and nb as its value.
bool CreateDetachedVertex(const char *nm, const e4_Value &vv, e4_Vertex &v) const Assigns to v a new detached vertex created within this storage that has the value and type of vv as its value.
bool GetNodeFromID(e4_NodeUniqueID id, e4_Node &n) const Given a valid unique ID obtained from a node using GetUniqueID, retrieves the corresponding node.
bool GetVertexFromID(e4_VertexUniqueID id, e4_Vertex &v) const Given a valid unique ID obtained from a vertex using GetUniqueID, retrieves the corresponding vertex.
bool IsValid() const Returns true if the storage is valid, false otherwise. A storage is valid if it designates an open persistent storage.
void DoGC() const Causes a garbage collection to be executed in this storage. This call returns when the garbage collection has finished and after all detach events have fired.
bool NeedsGC() const Returns true if this storage contains unreclaimed unreachable entities.
const char *GetName() const Returns the name of the storage. Returns NULL if the storage is invalid. Note that the memory occupied by the returned string is owned by e4Graph and may be reused by the next e4Graph method invocation.
const char *GetDriver() const Returns the storage kind used to open this storage. Returns NULL if the storage is invalid. Note that the memory occupied by the returned string is owned by e4Graph and may be reused by the next e4Graph method invocation.
bool GetStatistic(e4_Space sp, e4_SpaceStat st, int &v) const Retrieves statistics about the use of various allocation spaces within a storage. If the operation succeeds, returns true and v contains the statistic measure selected by sp and st.
bool DeclareCallback(int eventCode, e4_CallbackFunction fn, void *clientData) Declares that the callback function fn will be called whenever the event defined by eventCode occurs in this storage. When fn is called, the first parameter is the value given in clientData.
bool DeleteCallback(int eventCode, e4_CallbackFunction fn, void *clientData) Deletes a callback function previously declared with DeclareCallback.
bool CauseEvent(int eventCode, const e4_RefCount &r, void *callsiteData) Causes the event defined by eventCode to be fired on this storage on the e4Graph entity r, which can be a storage, node or vertex. All callback functions previously registered for this event are called with the client data as the first parameter, and r as the second parameter. The callsiteData parameter is an application-defined value passed as the third parameter to each callback function. Use this only for causing application defined events. Attempting to cause a predefined event will fail and false will be returned.
bool CauseEvent(int eventCode, const e4_RefCount &r, void *callsiteData, int &timestamp) Same as above, but also returns the timestamp for the event in timestamp.
int GetTimeStamp() const Retrieves the timestamp for the last recorded event. Returns -1 if the storage is invalid.
int GetTimeStampFor(int eventmask) const Retrieves the timestamp for the last event, chronologically, whose event code is present in eventmask; the event mask is formed by OR-ing together event codes described here and user defined event codes. Returns -1 if the storage is invalid. Returns 0 if none of the events whose event codes are present in eventmask have ever occurred.
bool HasOccurredSince(int timestamp, int eventmask) const Returns true if any of the events whose event code is present in eventmask has occurred since the timestamp timestamp. Returns false if the storage is invalid, or if the given timestamp is negative or in the future. Use a timestamp value of zero to query if any of these events have ever occurred for this storage. eventmask is formed by OR-ing together the event codes defined here and user defined event codes.
e4_RefKind Kind() const Returns E4_RKSTORAGE, the e4_RefKind identifier for the e4_Storage type.
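
The version string format <major>.<minor><status><level> documented for version() and storage_version() can be taken apart with a few lines of parsing. The sketch below is a hedged illustration: VersionParts, readNumber and parseVersion are hypothetical helpers, not part of e4Graph, and the parser assumes that the level part is always present and that no other characters follow it.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical holder for the parts of a version string; not an e4Graph type.
struct VersionParts {
    int majorVer;
    int minorVer;
    char status;    // 'a' (alpha), 'b' (beta), 'p' (patch) or '.' (final)
    int level;
};

// Reads a run of decimal digits starting at pos; false if there are none.
static bool readNumber(const std::string &s, std::size_t &pos, int &out) {
    std::size_t start = pos;
    int value = 0;
    while (pos < s.size() &&
           std::isdigit(static_cast<unsigned char>(s[pos]))) {
        value = value * 10 + (s[pos] - '0');
        ++pos;
    }
    if (pos == start) return false;
    out = value;
    return true;
}

// Parses "<major>.<minor><status><level>"; returns false on malformed input.
bool parseVersion(const std::string &s, VersionParts &v) {
    std::size_t pos = 0;
    if (!readNumber(s, pos, v.majorVer)) return false;
    if (pos >= s.size() || s[pos] != '.') return false;
    ++pos;
    if (!readNumber(s, pos, v.minorVer)) return false;
    if (pos >= s.size()) return false;
    v.status = s[pos++];
    if (v.status != 'a' && v.status != 'b' &&
        v.status != 'p' && v.status != '.') return false;
    if (!readNumber(s, pos, v.level)) return false;
    return pos == s.size();   // reject trailing characters
}
```

Taking the argument as a std::string has a practical benefit: the pointer returned by version() refers to memory that e4Graph may reuse on the next API call, so copying it into a std::string right away avoids holding on to that pointer.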