Developer's Guide

1. Overview

This section provides a quick but broad introduction to the APIs and interfaces provided by eXist. We begin with an overview of how to configure eXist for XQuery so that you can create and execute XQuery scripts for web applications. For this, we look at how eXist uses either the XQueryServlet servlet or the XQueryGenerator (Cocoon) to generate output from XQuery scripts. In section 3, we look at the basic REST-style API and its available HTTP request operations. Following that, in sections 4 and 5, we address Java programmers and focus on the XML:DB API, a standard Java API used to access native XML database services, and its extensions. In sections 6 and 7, we discuss other ways to integrate eXist with Cocoon, including the XMLDBTransformer and XSP Logicsheets. Sections 8 and 9 examine the network API for XML-RPC and its methods, including the use of XUpdate. The SOAP interface is discussed as an alternative to XML-RPC in section 10. Finally, in section 11, we include an appendix of the libraries required to implement specific APIs and interfaces.

2. Writing Web Applications using XQuery

Not only is XQuery a powerful query language, it is also a functional programming language with strong typing features, and can therefore be used to implement the entire processing logic of a web application. Because of this functionality, much of the Java code of eXist web applications has gradually been replaced by XQuery scripts. As a result, eXist provides library modules for getting request parameters, getting/setting session attributes, encoding URLs and so on. Developers who have worked with other functional languages such as Lisp or Scheme will find XQuery very easy to learn.

2.1. XQueryServlet and XQueryGenerator (Cocoon)

eXist generates HTML web pages from XQuery files in two ways: the XQueryServlet and XQueryGenerator. With both the XQueryServlet and XQueryGenerator the compiled XQuery script is stored in a cache for future use. For this, eXist compiles XQuery into a tree of expression objects, which can be repeatedly executed. This code will only be recompiled if the source file has changed.
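
The caching behaviour described above can be illustrated with a small sketch. This is not eXist's implementation: it is a Python analogy in which Python's built-in compile() stands in for the XQuery compiler, and the names ScriptCache and get are hypothetical. The cache is keyed by file path, and the stored modification time decides whether the cached object can be reused.

```python
import os


class ScriptCache:
    """Illustrative cache: recompile a script only when its source file changes."""

    def __init__(self):
        # path -> (modification time in ns, compiled code object)
        self._entries = {}
        self.compilations = 0  # counts real compilations, for demonstration

    def get(self, path):
        mtime = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # source unchanged: reuse the compiled form
        with open(path, encoding="utf-8") as source:
            # compile() stands in for the XQuery compiler in this analogy
            code = compile(source.read(), path, "exec")
        self.compilations += 1
        self._entries[path] = (mtime, code)
        return code
```

Calling get() twice for an unchanged file compiles it once; touching or editing the file causes the next call to compile it again.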

XQueryServlet

This servlet responds to URL-patterns (e.g. *.xql and *.xqy) as defined in the web.xml configuration file of the application. The servlet will interpret this pattern as pointing to a valid XQuery file. The XQuery file is then loaded, compiled and executed, and the results are then sent to the client.

To use the servlet, you must define the URL-patterns for your web application by adding the following to the WEB-INF/web.xml configuration file:

Example: Configuration for the Servlet

<web-app>
    <display-name>eXist Server</display-name>
    <description>eXist Server Setup</description>
    
    <servlet>
        <servlet-name>org.exist.http.servlets.XQueryServlet</servlet-name>
        <servlet-class>org.exist.http.servlets.XQueryServlet</servlet-class>

        <init-param>
            <param-name>uri</param-name>
            <param-value>xmldb:exist:///db</param-value>
        </init-param>
    </servlet>
    
    <servlet-mapping>
        <servlet-name>org.exist.http.servlets.XQueryServlet</servlet-name>
        <url-pattern>*.xql</url-pattern>
    </servlet-mapping>
</web-app>

This will configure the servlet to respond to any URL ending with the .xql file extension, as specified in <servlet-mapping>. Note that the .xq extension is deliberately NOT used in the <url-pattern> definition so as not to interfere with the Cocoon examples, which exclusively use this file extension. Also note that the uri parameter in <init-param> specifies the XML:DB root collection used by the servlet. To configure this parameter so that the servlet accesses a remote database, follow the instructions provided in the deployment docs.

XQueryGenerator (Cocoon)

As with the servlet, the Cocoon generator reads and executes XQuery scripts. However, unlike the servlet, the generator passes the results to a Cocoon pipeline for further processing. Furthermore, the XQueryGenerator has to be configured in the Cocoon sitemap (sitemap.xmap). The sitemap registers the generator and configures a pipeline to map resources for different web applications. For more information on configuring and using sitemaps, consult the documentation provided by Cocoon. The following is a basic sitemap:

Example: Cocoon Sitemap

<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
    <map:components>
        <map:generators default="file">
            <map:generator name="xquery" 
                logger="sitemap.generator.xquery"
                src="org.exist.cocoon.XQueryGenerator"/>
        </map:generators>
        <map:readers default="resource"/>
        <map:serializers default="html"/>
        <map:selectors default="browser"/>
        <map:matchers default="wildcard"/>
        <map:transformers default="xslt">
        </map:transformers>
    </map:components>
    <map:pipelines>
        <map:pipeline>
            <map:match pattern="*.xq">
                <map:generate src="{1}.xq" type="xquery"/>
                <map:serialize encoding="UTF-8" type="html"/>
            </map:match>
        </map:pipeline>
    </map:pipelines>
</map:sitemap>

According to the above pipeline definition, any path ending with the .xq extension is matched and processed by the pipeline. The pipeline generates results using the XQueryGenerator defined as type xquery in <map:components> .
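
The relationship between the wildcard pattern *.xq and the {1} placeholder in the src attribute can be sketched roughly as follows. This is a Python approximation, not Cocoon's matcher: resolve_source is a hypothetical helper, and it assumes that each * matches any run of characters other than "/".

```python
import re


def resolve_source(pattern, src_template, request_path):
    """Return the source for request_path, or None if the pattern does not match.

    Assumption: each '*' matches any run of characters other than '/'.
    """
    # Turn the wildcard pattern into a regular expression, one group per '*'
    regex = "".join("([^/]*)" if ch == "*" else re.escape(ch) for ch in pattern)
    match = re.fullmatch(regex, request_path)
    if match is None:
        return None
    source = src_template
    # Replace {1}, {2}, ... with the text captured by the corresponding '*'
    for index, value in enumerate(match.groups(), start=1):
        source = source.replace("{%d}" % index, value)
    return source


print(resolve_source("*.xq", "{1}.xq", "guess.xq"))
```

A request for guess.xq is mapped to the source guess.xq, while a request for guess.xql does not match this pipeline at all.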

2.2. Example XQuery Script

A script example provided with eXist offers a simple number guessing game that illustrates many of the features of XQuery. The source code for this game is found in webapp/xquery/guess.xql, and can be viewed locally at http://localhost:8080/exist/examples.xml. As the file extension indicates, this particular script is processed by the XQueryServlet. The full script is as follows:

Example: Guess a Number

xquery version "1.0";

declare namespace request="http://exist-db.org/xquery/request";
declare namespace math="java:java.lang.Math";

declare function local:random($max as xs:integer) 
as empty()
{
    let $r := ceiling(math:random() * $max) cast as xs:integer
    return
        request:set-session-attribute("random", $r)
};

declare function local:guess($guess as xs:integer,
$rand as xs:integer) as element()
{
    if ($guess lt $rand) then
        <p>Your number is too small!</p>
    else if ($guess gt $rand) then
        <p>Your number is too large!</p>
    else (
        local:random(100),
        <p>Congratulations! You guessed the right number.
        Try again!</p> 
    )
};

declare function local:main() as node()?
{
    request:create-session(),
    let $rand := request:get-session-attribute("random"),
        $guess := request:request-parameter("guess", ())
    return
        if ($rand) then local:guess($guess, $rand)
        else local:random(100)
};

<html>
    <head><title>Number Guessing</title></head>
    <body>
        <form action="{request:encode-url(request:request-uri())}">
            <table border="0">
                <tr>
                    <th colspan="2">
                        Guess a number
                    </th>
                </tr>
                <tr>
                    <td>Number:</td>
                    <td><input type="text" name="guess"
                        size="3"/></td>
                </tr>
                <tr>
                    <td colspan="2" align="left">
                        <input type="submit"/>
                    </td>
                </tr>
            </table> 
        </form>
        { local:main() }
    </body>
</html>

In this example, a random number is generated using the local function local:random, which uses the Java binding to call the static method Math.random. (For information on the Java binding with eXist, check the XQuery Docs.) The generated number is stored as an HTTP session variable (attribute). The user is then asked to guess its value. If the user submits a guess, it is read from the HTTP request parameter and compared to the stored number. If the numbers match, a new random number is generated and the game starts again.

Notice that you do not have to import the request module for handling HTTP parameters, but can simply declare its namespace (i.e. http://exist-db.org/xquery/request) in the document. The module, if it is available, is loaded automatically by the XQuery engine. How do the functions in this module access the HTTP request and session objects? The XQueryServlet and the XQueryGenerator both export a number of global variables to the XQuery script: $request, $response and $session. These variables store the corresponding HTTP objects as passed by the servlet engine (or Cocoon), which are accessed by the functions in the library module.

Both XQueryServlet and the XQueryGenerator provide initialization parameters to set the username and password used for requests. However, the code will also check if the current HTTP session contains the session attributes user and password. If so, the session settings will overwrite any previous settings. For more information on changing user identities, see the Session Example.

3. REST-Style Web API

eXist provides a REST-style (or RESTful) API through HTTP, which provides the simplest and quickest way to access the database. To use this API, all one needs is an HTTP client, which is available in nearly all programming languages and environments. However, not all of the database features are available using this approach.

When running eXist as a stand-alone server - i.e. when the database has been started using the shell-script bin/server.sh (Unix) or batch file bin/server.bat (Windows/DOS) - HTTP access is supported through a simple, built-in web server. This web server, however, has limited capabilities, restricted to the basic operations defined by eXist's REST API (i.e. GET, POST, PUT and DELETE).

When running in a servlet-context, this same server functionality is provided by the EXistServlet. In the standard eXist distribution, this servlet is configured to have a listen address at:

http://localhost:8080/exist/rest/

Both the stand-alone server and the servlet rely on Java class org.exist.http.RESTServer to do the actual work.

The server treats all HTTP request paths as paths to a database collection, i.e. all resources are read from the database instead of the file system. Relative paths are therefore resolved relative to the database root collection. For example, if you enter the following URL into your web-browser:

http://localhost:8080/exist/rest/db/shakespeare/plays/hamlet.xml

the server will receive an HTTP GET request for the resource hamlet.xml in the collection /db/shakespeare/plays in the database. The server will look for this collection, and check if the resource is available, and if so, retrieve its contents and send them back to the client. If the document does not exist, an HTTP 404 (Not Found) status response will be returned.

To keep the interface simple, the basic database operations are directly mapped to HTTP request methods wherever possible. The following request methods are supported:

GET

Retrieves a representation of the resource or collection from the database. XQuery and XPath queries may also be specified using GET's optional parameters applied to the selected resource.

PUT

Uploads a resource onto the database. If required, collections are automatically created, and existing resources are overwritten.

DELETE

Removes a resource (document or collection) from the database.

POST

Submits data in the form of an XML fragment in the content of the request which specifies the action to take. The fragment can be either an XUpdate document or a query request. Query requests are used to pass complex XQuery expressions too large to be URL-encoded.

3.1. HTTP Authentication

The REST server and servlet support basic HTTP authentication, and only valid users can access the database. If no username and password are specified, the server assumes a "guest" user identity, which has limited capabilities. If the username submitted is not known, or an incorrect password is submitted, an error page (Status code 403 - Forbidden) is returned.
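
As a minimal sketch of what a client sends for basic HTTP authentication, the following Python helper builds the Authorization header value from a username and password. The helper name basic_auth_header and the credentials shown are placeholders chosen for illustration, not eXist defaults.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Authorization header for basic authentication."""
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# "editor" and "secret" are placeholder credentials, not eXist defaults
headers = {"Authorization": basic_auth_header("editor", "secret")}
```

The resulting headers dictionary can be passed to any HTTP client along with the request. If the header is omitted, the server falls back to the "guest" identity as described above.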

3.2. GET Requests

If the server receives an HTTP GET request, it first tries to locate known parameters. If no parameters are given or known, it will try to locate the collection or document specified in the URI database path, and return a representation of this resource to the client. Note that when the located resource is XML, the returned Content-Type header value will be text/xml, and for binary resources application/octet-stream.

If the path resolves to a database collection, the retrieved results are returned as an XML fragment. An example fragment is shown below:

Example: XML Results for GET Request for a Collection

<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist">
    <exist:collection name="/db/xinclude" owner="guest" group="guest" 
        permissions="rwur-ur-u">
        <exist:resource name="disclaimer.xml" owner="guest" group="guest" 
            permissions="rwur-ur--"/>
        <exist:resource name="sidebar.xml" owner="guest" group="guest" 
            permissions="rwur-ur--"/>
        <exist:resource name="xinclude.xml" owner="guest" group="guest" 
            permissions="rwur-ur--"/>
    </exist:collection>
</exist:result>
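
A client will usually want to extract the resource names from such a listing. The following is a minimal sketch using Python's standard XML parser; list_resources is a hypothetical helper, and the sample string is a shortened copy of the fragment above.

```python
import xml.etree.ElementTree as ET

EXIST_NS = {"exist": "http://exist.sourceforge.net/NS/exist"}


def list_resources(listing_xml):
    """Return (collection name, [resource names]) from a collection listing."""
    root = ET.fromstring(listing_xml)
    collection = root.find("exist:collection", EXIST_NS)
    names = [res.get("name")
             for res in collection.findall("exist:resource", EXIST_NS)]
    return collection.get("name"), names


sample = """<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist">
    <exist:collection name="/db/xinclude" owner="guest" group="guest"
        permissions="rwur-ur-u">
        <exist:resource name="disclaimer.xml" owner="guest" group="guest"
            permissions="rwur-ur--"/>
        <exist:resource name="sidebar.xml" owner="guest" group="guest"
            permissions="rwur-ur--"/>
    </exist:collection>
</exist:result>"""

print(list_resources(sample))
```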
                        

If an xml-stylesheet processing instruction is found in an XML document being requested, the database will try to apply the stylesheet before returning the document. Note that in this case, any relative path in a hypertext link will be resolved relative to the location of the source document. For example, if the document hamlet.xml, which is stored in collection /db/shakespeare/plays, contains the XSLT processing instruction:

<?xml-stylesheet type="text/xml" href="shakes.xsl"?>

then the database will try to load the stylesheet from /db/shakespeare/plays/shakes.xsl and apply it to the document.

Optionally, GET accepts the following request parameters, which must be URL-encoded:

_xsl=XSL Stylesheet

Applies an XSL stylesheet to the requested resource. If the _xsl parameter contains an external URI, the corresponding external resource is retrieved. Otherwise, the path is treated as relative to the database root collection and the stylesheet is loaded from the database. This option will override any XSL stylesheet processing instructions found in the source XML file.

Setting _xsl to no disables any stylesheet processing. This is useful for retrieving the unprocessed XML from documents that have a stylesheet declaration.

_query=XPath/XQuery Expression

Executes a query specified by the request. The collection or resource referenced in the request path is added to the set of statically known documents for the query.

_indent=yes | no

Returns indented pretty-print XML.

_encoding=Character Encoding Type

Sets the character encoding for the resultant XML.

_howmany=Number of Items

Specifies the number of items to return from the resultant sequence.

_start=Starting Position in Sequence

Specifies the index position of the first item in the result sequence to be returned.

_wrap=yes | no

Specifies whether the returned query results are to be wrapped into a surrounding <exist:result> element. The default value is yes.

EXAMPLE: The following URI will find all <SPEECH> elements in the collection /db/shakespeare with "JULIET" as the <SPEAKER>. As specified, it will return at most ten items from the result sequence, starting at position 1:

http://localhost:8080/exist/rest/db/shakespeare?_query=//SPEECH[SPEAKER=%22JULIET%22]&_start=1&_howmany=10
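
Because the parameter values must be URL-encoded, it is usually safer to let a library build the query string. The following sketch assumes the default servlet address shown earlier; query_url and REST_BASE are hypothetical names used only for illustration.

```python
from urllib.parse import quote, urlencode

REST_BASE = "http://localhost:8080/exist/rest"


def query_url(collection, query, start=1, howmany=10, **extra):
    """Build a GET URL that runs `query` against `collection`."""
    params = {"_query": query, "_start": start, "_howmany": howmany}
    params.update(extra)  # e.g. _indent="yes" or _wrap="no"
    # quote_via=quote encodes spaces as %20 rather than '+'
    return "%s%s?%s" % (REST_BASE, quote(collection),
                        urlencode(params, quote_via=quote))


print(query_url("/db/shakespeare", '//SPEECH[SPEAKER="JULIET"]'))
```

The generated URL percent-encodes more characters than the hand-written example above (for instance the slashes and brackets in the expression), but both decode to the same query.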

3.3. PUT Requests

Documents can be stored or updated using an HTTP PUT request. The request URI points to the location where the document will be stored. As defined by the HTTP specification, an existing document at the specified path is replaced: it is removed before the new resource is stored. In addition, any collections in the path that do not yet exist are created automatically.

For example, the following Python script stores a document in a database collection; both the collection (for example /db/test) and the file name are given on the command line. The collection will be created if it does not exist. Note that the HTTP header field Content-Type is specified as text/xml, since otherwise the document is stored as a binary resource.

Example: PUT Example using Python (See: samples/http/put.py)

import http.client
import os
import sys

# usage: put.py collection file
collection = sys.argv[1]
path = sys.argv[2]

print("reading file %s ..." % path)
with open(path, 'rb') as f:
    xml = f.read()

# use only the file name (without directories) as the document name
doc = os.path.basename(path)
print(doc)
print("storing document to collection %s ..." % collection)
con = http.client.HTTPConnection('localhost:8080')
# strip slashes so that "/db/test" does not produce a double slash in the URL
url = '/exist/rest/%s/%s' % (collection.strip('/'), doc)
# Content-Length is set automatically from the request body
con.request('PUT', url, body=xml, headers={'Content-Type': 'text/xml'})

response = con.getresponse()
# any 2xx status is treated as success
if not 200 <= response.status < 300:
    print('An error occurred: %s' % response.reason)
else:
    print("Ok.")
con.close()

3.4. DELETE Requests

DELETE removes a collection or resource from the database. For this, the server first checks if the request path points to an existing database collection or resource, and once found, removes it.
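
A DELETE request can be prepared with Python's standard library as in the following sketch. The resource path /db/test/old.xml is a made-up example, delete_request is a hypothetical helper, and the request is only constructed here, not sent.

```python
import urllib.request

REST_BASE = "http://localhost:8080/exist/rest"


def delete_request(db_path):
    """Prepare (but do not send) a DELETE request for a collection or resource."""
    return urllib.request.Request(REST_BASE + db_path, method="DELETE")


request = delete_request("/db/test/old.xml")  # hypothetical resource path
print(request.get_method(), request.full_url)
# To send it against a running server:
#     with urllib.request.urlopen(request) as response:
#         print(response.status)
```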

3.5. POST Requests

POST requests require an XML fragment in the content of the request, which specifies the action to take.

If the root node of the fragment uses the XUpdate namespace (http://www.xmldb.org/xupdate), the fragment is sent to the XUpdateProcessor to be processed. Otherwise, the root node will have the namespace for eXist requests (http://exist.sourceforge.net/NS/exist), in which case the fragment is interpreted as an extended query request. Extended query requests can be used to post complex XQuery scripts that are too large to be encoded in a GET request.
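
The namespace-based dispatch described above can be sketched as follows. This is an illustration of the rule only, not the server's code; classify_post_body is a hypothetical helper that inspects the namespace of the root element.

```python
import xml.etree.ElementTree as ET

XUPDATE_NS = "http://www.xmldb.org/xupdate"
EXIST_NS = "http://exist.sourceforge.net/NS/exist"


def classify_post_body(body):
    """Return 'xupdate', 'query' or 'unknown' based on the root element's namespace."""
    root = ET.fromstring(body)
    # ElementTree writes qualified names as '{namespace}local-name'
    namespace = root.tag[1:].split("}")[0] if root.tag.startswith("{") else ""
    if namespace == XUPDATE_NS:
        return "xupdate"
    if namespace == EXIST_NS:
        return "query"
    return "unknown"
```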

The structure of the POST XML request is as follows:

Example: Extended Query Request

<query xmlns="http://exist.sourceforge.net/NS/exist"
    start="[first item to be returned]" 
    max="[maximum number of items to be returned]">
    <text>[XQuery expression]</text>
    <properties>
        <property name="[name1]" value="[value1]"/>
    </properties>
</query>

The root element query identifies the fragment as an extended query request, and the XQuery expression for this request is enclosed in the text element. Optional output properties, such as pretty-print, may be passed in the properties element. An example of POST for Perl is provided below:

Example: POST Example using Perl (See: samples/http/search.pl)

require LWP::UserAgent;

$URL = 'http://localhost:8080/exist/rest/db/';
$QUERY = <<END;
<?xml version="1.0" encoding="UTF-8"?>
<query xmlns="http://exist.sourceforge.net/NS/exist"
    start="1" max="20">
    <text>
        for \$speech in //SPEECH[LINE &= 'corrupt*']
        order by \$speech/SPEAKER[1]
        return
            <hit>{\$speech}</hit>
    </text>
    <properties>
        <property name="indent" value="yes"/>
    </properties>
</query>
END

$ua = LWP::UserAgent->new();
$req = HTTP::Request->new(POST => $URL);
$req->content_type('text/xml');
$req->content($QUERY);

$res = $ua->request($req);
if($res->is_success) {
    print $res->content . "\n";
} else {
    print "Error:\n\n" . $res->status_line . "\n";
}

Note

Please note that you may have to enclose the XQuery expression in a CDATA section (i.e. <![CDATA[ ... ]]>) to avoid parsing errors (this is not shown above).
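
As an alternative to a CDATA section, the expression can be XML-escaped when the request body is built, which yields the same text content once the body is parsed. The following sketch assembles an extended query request this way; build_query_request is a hypothetical helper, not part of eXist.

```python
from xml.sax.saxutils import escape, quoteattr


def build_query_request(xquery, start=1, max_items=20, properties=None):
    """Serialize an extended query request, escaping '&', '<' and '>' in the query."""
    lines = [
        '<query xmlns="http://exist.sourceforge.net/NS/exist" start="%d" max="%d">'
        % (start, max_items),
        "    <text>%s</text>" % escape(xquery),
    ]
    if properties:
        lines.append("    <properties>")
        for name, value in properties.items():
            # quoteattr() escapes the value and adds the surrounding quotes
            lines.append("        <property name=%s value=%s/>"
                         % (quoteattr(name), quoteattr(value)))
        lines.append("    </properties>")
    lines.append("</query>")
    return "\n".join(lines)


print(build_query_request("//SPEECH[LINE &= 'corrupt*']", properties={"indent": "yes"}))
```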

The returned query results are enclosed in an <exist:result> element. A response of this kind looks as follows:

Example: Returned Results for POST Request


<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist" hits="2628" start="1" count="10">
<SPEECH xmlns:exist="http://exist.sourceforge.net/NS/exist">
<SPEAKER>BERNARDO</SPEAKER>
<LINE>Who's there?</LINE>
</SPEECH>
... more items follow ...
</exist:result>
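
The hits, start and count attributes make it straightforward to page through a large result. The following sketch reads them with Python's standard XML parser; result_summary is a hypothetical helper, the sample is a shortened response, and the interpretation of count as the number of items returned is an assumption based on the example above.

```python
import xml.etree.ElementTree as ET


def result_summary(response_xml):
    """Extract paging information from an <exist:result> response."""
    root = ET.fromstring(response_xml)
    hits = int(root.get("hits"))
    start = int(root.get("start"))
    count = int(root.get("count"))
    next_start = start + count
    return {
        "hits": hits,
        "start": start,
        "count": count,
        "items": len(root),  # number of child elements actually present
        # position to request next, or None when the result is exhausted
        "next_start": next_start if next_start <= hits else None,
    }


sample = """<exist:result xmlns:exist="http://exist.sourceforge.net/NS/exist"
    hits="2628" start="1" count="1">
<SPEECH><SPEAKER>BERNARDO</SPEAKER><LINE>Who's there?</LINE></SPEECH>
</exist:result>"""

print(result_summary(sample))
```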

4. Writing Java Applications with the XML:DB API

The preferred way to work with eXist when developing Java applications is to use the XML:DB API. This API provides a common interface to native or XML-enabled databases and supports the development of portable, reusable applications. eXist's implementation of the XML:DB standards follows the Xindice implementation, and conforms to the latest working drafts put forth by the XML:DB Initiative. For more information, refer to the Javadocs for this API.

The basic components employed by the XML:DB API are drivers, collections, resources and services.

Drivers are implementations of the database interface that encapsulate the database access logic for specific XML database products. They are provided by the product vendor and must be registered with the database manager.

A collection is a hierarchical container for resources and further sub-collections. Currently two different resources are defined by the API: XMLResource and BinaryResource. An XMLResource represents an XML document or a document fragment, selected by a previously executed XPath query.

Finally, services are requested for special tasks such as querying a collection with XPath, or managing a collection.

Note

There are several XML:DB examples provided in eXist's samples directory. To start an example, use the start.jar jar file and pass the name of the example class as the first parameter, for instance:

java -jar start.jar org.exist.examples.xmldb.Retrieve [- other options]

Programming with the XML:DB API is straightforward. You will find some code examples in the samples/org/exist/examples/xmldb directory. In the following simple example, a document is retrieved from the eXist server and printed to standard output.

Example: Retrieving a Document with XML:DB

import org.xmldb.api.base.*;
import org.xmldb.api.modules.*;
import org.xmldb.api.*;
import javax.xml.transform.OutputKeys;

public class RetrieveExample {
    protected static String URI = "xmldb:exist://localhost:8080/exist/xmlrpc";

    public static void main(String args[]) throws Exception {
        String driver = "org.exist.xmldb.DatabaseImpl";
        
        // initialize database driver
        Class cl = Class.forName(driver);
        Database database = (Database) cl.newInstance();
        DatabaseManager.registerDatabase(database);

        // get the collection
        Collection col = DatabaseManager.getCollection(URI + args[0]);
        col.setProperty(OutputKeys.INDENT, "no");
        XMLResource res = (XMLResource)col.getResource(args[1]);
        if(res == null)
            System.out.println("document not found!");
        else
            System.out.println(res.getContent());
    }
}
        

In this example, the database driver class for eXist (org.exist.xmldb.DatabaseImpl) is first registered with the DatabaseManager. Next we obtain a Collection object from the database manager by calling the static method DatabaseManager.getCollection(). The method expects a fully qualified URI as its parameter, which identifies the desired collection. The format of this URI should look like the following:

xmldb:[DATABASE-ID]://[HOST-ADDRESS]/db/collection

Because more than one database driver can be registered with the database manager, the first part of the URI (xmldb:exist) is required to determine which driver class to use. The database-id is used by the database manager to select the correct driver from its list of available drivers. To use eXist, this ID should always be "exist" (unless you have set up multiple database instances; additional instances may have other names).

The final part of the URI identifies the collection path, and optionally the host address of the database server on the network. Internally, eXist uses two different driver implementations: The first talks to a remote database engine using XML-RPC calls, the second has direct access to a local instance of eXist. The root collection is always identified by /db. For example, the URI

xmldb:exist://localhost:8080/exist/xmlrpc/db/shakespeare/plays

references the Shakespeare collection on a remote server running the XML-RPC interface as a servlet at localhost:8080/exist/xmlrpc. If we leave out the host address, the XML:DB driver will try to connect to a locally attached database instance, e.g.:

xmldb:exist:///db/shakespeare/plays

In this case, we have to tell the XML:DB driver that it should create a new database instance if none has been started. This is done by setting the create-database property of class Database to "true" (more information on embedded use of eXist can be found in the deployment guide).
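
To make the URI structure described above concrete, the following sketch splits an XML:DB URI into its database ID, host address and collection path. It is an illustration in Python only, not the driver's actual parsing logic; parse_xmldb_uri is a hypothetical helper, and it assumes that the collection path starts at the first /db segment.

```python
import re

# xmldb:[DATABASE-ID]://[HOST-ADDRESS]/db/collection
URI_PATTERN = re.compile(
    r"^xmldb:(?P<database_id>[^:]+)://(?P<host>.*?)(?P<collection>/db(?:/.*)?)$"
)


def parse_xmldb_uri(uri):
    """Split an XML:DB URI into database ID, host address and collection path."""
    match = URI_PATTERN.match(uri)
    if match is None:
        raise ValueError("not a valid XML:DB URI: %r" % uri)
    host = match.group("host")
    return {
        "database_id": match.group("database_id"),
        "host": host,
        "collection": match.group("collection"),
        # an empty host address refers to a locally attached (embedded) instance
        "embedded": host == "",
    }


print(parse_xmldb_uri("xmldb:exist://localhost:8080/exist/xmlrpc/db/shakespeare/plays"))
print(parse_xmldb_uri("xmldb:exist:///db/shakespeare/plays"))
```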

The setProperty calls are used to set database-specific parameters. In this case, pretty-printing (indentation) of XML output is turned off for the collection by setting OutputKeys.INDENT to "no". eXist uses the property keys defined in the standard Java package javax.xml.transform. Thus, in Java you can simply use class OutputKeys to get the correct keys.

Calling col.getResource() finally retrieves the document, which is returned as an XMLResource. All resources have a method getContent(), which returns the resource's content, depending on its type. In this case we retrieve the content as type String.

To query the repository, we may either use the standard XPathQueryService or eXist's XQueryService class. The XML:DB API defines different kinds of services, which may or may not be provided by the database. The getService method of class Collection returns a service if it is available. The method expects the service name as the first parameter, and the version (as a string) as the second, which is used to distinguish between different versions of the service defined by the XML:DB API.

The following is an example of using the XML:DB API to execute a database query:

Example: Querying the Database (XML:DB API)

import org.xmldb.api.base.*;
import org.xmldb.api.modules.*;
import org.xmldb.api.*;

public class QueryExample {
    public static void main(String args[]) throws Exception {
        String driver = "org.exist.xmldb.DatabaseImpl";
        Class cl = Class.forName(driver);			
        Database database = (Database)cl.newInstance();
        DatabaseManager.registerDatabase(database);
        
        Collection col = 
            DatabaseManager.getCollection(
                "xmldb:exist://localhost:8080/exist/xmlrpc/db"
            );
        XPathQueryService service =
            (XPathQueryService) col.getService("XPathQueryService", "1.0");
        service.setProperty("indent", "yes");
                
        ResourceSet result = service.query(args[0]);
        ResourceIterator i = result.getIterator();
        while(i.hasMoreResources()) {
            Resource r = i.nextResource();
            System.out.println((String)r.getContent());
        }
    }
}
        

To execute the query, method service.query(xpath) is called. This method returns a ResourceSet, containing the Resources found by the query. ResourceSet.getIterator() gives us an iterator over these resources. Every Resource contains a single document fragment or value selected by the XPath expression.

Internally, eXist does not distinguish between XPath and XQuery expressions. XQueryService thus maps to the same implementation class as XPathQueryService. However, it provides a few additional methods. Most importantly, when talking to an embedded database, XQueryService allows the XQuery expression to be compiled into an internal representation, which can then be reused. With compilation, the previous example code would look as follows:

Example: Compiling a Query (XML:DB API)

import org.xmldb.api.base.*;
import org.xmldb.api.modules.*;
import org.xmldb.api.*;
import org.exist.xmldb.XQueryService;

public class QueryExample {
    public static void main(String args[]) throws Exception {
        String driver = "org.exist.xmldb.DatabaseImpl";
        Class cl = Class.forName(driver);			
        Database database = (Database)cl.newInstance();
        database.setProperty("create-database", "true");
        DatabaseManager.registerDatabase(database);
        
        Collection col = 
            DatabaseManager.getCollection("xmldb:exist:///db");
        XQueryService service =
            (XQueryService) col.getService("XQueryService", "1.0");
        service.setProperty("indent", "yes");
        
        CompiledExpression compiled = service.compile(args[0]);
        ResourceSet result = service.execute(compiled);
        ResourceIterator i = result.getIterator();
        while(i.hasMoreResources()) {
            Resource r = i.nextResource();
            System.out.println((String)r.getContent());
        }
    }
}
        

The XML-RPC server automatically caches compiled expressions, and so calling compile through the remote driver produces no effect if the expression is already cached.

Next, we would like to store a new document in the repository. This is done by creating a new XMLResource, assigning it the content of the new document, and calling the storeResource method of class Collection. First, a new Resource is created by the method Collection.createResource(), which expects two parameters: the id and the type of the resource being created. If the id parameter is null, a unique resource id will be generated automatically.

In some cases, the collection may not yet exist, and so we must create it. To create a new collection, call the createCollection method of the CollectionManagementService service. In the following example, we simply start at the root-collection object to get the CollectionManagementService service.

Example: Adding a File (XML:DB API)

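import org.xmldb.api.base.*;
import org.xmldb.api.modules.*;
import org.xmldb.api.*;
import java.io.File;

public class StoreExample {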
    public final static String URI = "xmldb:exist://localhost:8080/exist/xmlrpc";

    public static void main(String args[]) throws Exception {
        if(args.length < 2) {
            System.out.println("usage: StoreExample collection-path document");
            System.exit(1);
        }

        String collection = args[0], file = args[1];

        // initialize driver
        String driver = "org.exist.xmldb.DatabaseImpl";
        Class cl = Class.forName(driver);
        Database database = (Database)cl.newInstance();
        DatabaseManager.registerDatabase(database);

        // try to get collection
        Collection col =
            DatabaseManager.getCollection(URI + collection);
        if(col == null) {
            // collection does not exist: get root collection and create
            // for simplicity, we assume that the new collection is a
            // direct child of the root collection, e.g. /db/test.
            // the example will fail otherwise.
            Collection root = DatabaseManager.getCollection(URI + "/db");
            CollectionManagementService mgtService = (CollectionManagementService)
                root.getService("CollectionManagementService", "1.0");
            col = mgtService.createCollection(collection.substring("/db".length()));
        }
        // create new XMLResource; an id will be assigned to the new resource
        XMLResource document = (XMLResource)col.createResource(null, "XMLResource");
        File f = new File(file);
        if(!f.canRead()) {
            System.out.println("cannot read file " + file);
            return;
        }
        document.setContent(f);
        System.out.print("storing document " + document.getId() + "...");
        col.storeResource(document);
        System.out.println("ok.");
    }
}

Please note that the XMLResource.setContent() method takes a Java object as its parameter. The eXist driver checks if the object is a File. Otherwise, the object is transformed into a String by calling the object's toString() method. Passing a File has one big advantage: If the database is running in the embedded mode, the file will be directly passed to the indexer. Thus, the file's content does not have to be loaded into the main memory. This is handy if your files are very large.

5. Extensions to XML:DB

5.1. Additional Services

eXist provides several services in addition to those defined by the XML:DB specification:

The UserManagementService service contains methods to manage users and handle permissions. These methods resemble common Unix commands such as chown or chmod. As with other services, UserManagementService can be retrieved from a collection object, as in:

UserManagementService service =
(UserManagementService)collection.getService("UserManagementService", "1.0");

Another service, DatabaseInstanceManager, provides a single method to shut down the database instance accessed by the driver. You have to be a member of the dba user group to use this method; otherwise an exception will be thrown. See the Deployment Guide for an example.

Finally, the IndexQueryService interface supports access to the terms and elements contained in eXist's internal index. The getIndexedElements() method returns a list of element occurrences for the current collection. For each occurring element, the element's name and a frequency count are returned.

The scanIndexTerms() method retrieves the list of words occurring in the current collection. This might be useful, for example, to provide users with a list of searchable terms together with their frequency.

5.2. Multiple Database Instances

As explained above, passing a local XML:DB URI to the DatabaseManager means that the driver will try to start or access an embedded database instance. You can configure more than one database instance by setting the location of the central configuration file. The configuration file is set through the configuration property of the DatabaseImpl driver class. If you would like to use different drivers for different database instances, specify a name for the created instance through the database-id property. You may later use this name in the URI to refer to a database instance. The following fragment sets up two instances:

Example: Multiple Database Instances

// initialize driver
String driver = "org.exist.xmldb.DatabaseImpl";
Class cl = Class.forName(driver);
Database database1 = (Database)cl.newInstance();
database1.setProperty("create-database", "true");
database1.setProperty("configuration", "/home/exist/test/conf.xml");
database1.setProperty("database-id", "test");
DatabaseManager.registerDatabase(database1);

Database database2 = (Database)cl.newInstance();
database2.setProperty("create-database", "true");
database2.setProperty("configuration", "/home/exist/production/conf.xml");
database2.setProperty("database-id", "exist");
DatabaseManager.registerDatabase(database2);

With the above example, the URI

xmldb:test:///db

selects the test database instance. Both instances should have their own data and log directory as specified in the configuration files.
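To make the role of the database-id in the URI concrete, here is a minimal sketch that extracts the instance name from an XML:DB URI of the form xmldb:name://host/path. The class XmlDbUriSketch and its method instanceName are hypothetical helpers written for this illustration; the eXist driver performs its own URI parsing.

```java
// Hypothetical helper for illustration; not part of the eXist driver.
final class XmlDbUriSketch {

    // Returns the instance name from a URI of the form xmldb:<name>://<host>/<path>.
    static String instanceName(String uri) {
        String prefix = "xmldb:";
        if (!uri.startsWith(prefix)) {
            throw new IllegalArgumentException("not an XML:DB URI: " + uri);
        }
        int end = uri.indexOf("://", prefix.length());
        if (end < 0) {
            throw new IllegalArgumentException("missing '://' in " + uri);
        }
        return uri.substring(prefix.length(), end);
    }

    public static void main(String[] args) {
        // With the two instances configured above, these names would be
        // "test" and "exist" respectively.
        System.out.println(instanceName("xmldb:test:///db"));
        System.out.println(instanceName("xmldb:exist:///db"));
    }
}
```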

6. XMLDBTransformer for Cocoon

eXist offers several ways to access the database from Cocoon-based applications. This includes access via the XMLDB pseudo-protocol, through XSP pages, and through the XMLDBTransformer. The XMLDBTransformer provides a simple way to query the database, and works in a similar way to other transformers supplied with Cocoon. Consult the Cocoon documentation for more on using Transformers and about their basic concepts.

As with other transformers, the XMLDBTransformer listens for a limited set of tags that belong to the namespace http://exist-db.org/transformer/1.0. These are <collection>, <for-each>, <select-node>, and <current-node>. To examine how they are used, let's consider the following example (the complete version of this example can be found at webapp/examples/simple2.xml):

Example: XMLDBTransformer Example


<xdb:collection xmlns:xdb="http://exist-db.org/transformer/1.0"
uri="xdb:exist:///db">
<!-- iterate through all rdf:Description elements containing the
term "computer" -->
<xdb:for-each query="//rdf:Description[dc:title &amp;= 'computer']"
from="0" to="9" sort-by="/dc:title">
<!-- output a book element for each entry -->
<book>
<!-- extract the title. There's only one title, so we use
select-node -->
<title><xdb:select-node query="dc:title/text()"/></title>
<!-- extract the creators. There's probably more than one,
so we use a nested for-each -->
<xdb:for-each query="dc:creator/text()">
<creator><xdb:current-node/></creator>
</xdb:for-each>
</book>
</xdb:for-each>
</xdb:collection>

As we can see above, before you can start to query the database, you must specify a collection in the <collection> element, which accepts a standard XMLDB URI in its uri attribute. To process a query, you may use either the <for-each> or the <select-node> tag. As the comments in the example indicate, <select-node> is used where a single result is expected (the title), while <for-each> iterates over a set of results (the creators).

The <current-node> element is used to return the current node being processed in a for-each iteration to the output document. You can restrict the number of for-each iterations by specifying the bounds set by the from and to attributes. The sort-by attribute is still experimental: the query results will be sorted by an XPath expression. For each of the results, the XPath expression is evaluated and the resulting string value is used to sort the query results in ascending order.
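The combined effect of sort-by, from and to can be sketched in plain Java. This is only an illustration under stated assumptions: that sorting is applied before the bounds, and that from and to are zero-based and inclusive (as from="0" to="9" in the example suggests). The class ForEachWindowSketch and its method window are hypothetical and do not reflect the transformer's actual implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of sort-by combined with from/to bounds.
final class ForEachWindowSketch {

    // sortKeys holds the string value of the sort-by expression for each result.
    // Assumption: results are sorted ascending first, then the inclusive,
    // zero-based range [from, to] is selected.
    static List<String> window(List<String> sortKeys, int from, int to) {
        List<String> sorted = new ArrayList<>(sortKeys);
        Collections.sort(sorted);
        int start = Math.max(0, from);
        int end = Math.min(sorted.size(), to + 1);
        if (start >= end) {
            return new ArrayList<>();
        }
        return new ArrayList<>(sorted.subList(start, end));
    }
}
```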

As shown above, it is possible to nest multiple for-each or select-node tags. The nested tag will be evaluated relative to the current result node. In the example above, the main for-each statement selects all <rdf:Description> fragments whose title contains the term "computer". During each iteration, we further process the current result fragment by using nested <for-each> and <select-node> tags to select the title and creators.

Notice that the same result could be achieved by an XSLT stylesheet. However, if the selected fragments are rather large, post-processing with XSLT can be much slower, since each fragment has to be serialized and then parsed by the XSLT processor.

The results of the XMLDBTransformer query are enclosed in the element <result-set> . Attributes for this tag include the number of hits for the query, the XPath query processed, the query time (in milliseconds), and the start and end position of the retrieved records in the result set. The output of the XMLDBTransformer for the above fragment is shown below:

Example: XMLDBTransformer Output


<xdb:result-set count="72" xpath="//rdf:Description[dc:title &amp;= 'computer']"
query-time="370" from="0" to="9">
<book xdb:document-id="zit.rdf" xdb:collection="/db/library">
<title> A Centennial History of the American Society of Mechanical Engineers 1880-1980 </title>
<creator xdb:document-id="zit.rdf" xdb:collection="/db/library"> Sinclair, Bruce </creator>
</book>
<!-- more books here ... -->
</xdb:result-set>
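If the transformer output is consumed by Java code rather than by a stylesheet, the attributes of the result-set element can be read with the standard JDK DOM parser. The following is a minimal sketch; the class ResultSetSketch and the shortened sample document are made up for this illustration, and only the count attribute shown in the output above is read.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper that reads the hit count from a result-set document.
final class ResultSetSketch {

    static int hitCount(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            return Integer.parseInt(root.getAttribute("count"));
        } catch (Exception e) {
            throw new IllegalStateException("cannot read result set", e);
        }
    }

    public static void main(String[] args) {
        // Shortened sample; a real document would contain the book elements.
        String sample =
            "<xdb:result-set xmlns:xdb=\"http://exist-db.org/transformer/1.0\""
            + " count=\"72\" query-time=\"370\" from=\"0\" to=\"9\">"
            + "<book><title>Example</title></book>"
            + "</xdb:result-set>";
        System.out.println("hits: " + hitCount(sample));
    }
}
```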

7. XML:DB Logicsheet for Cocoon

Cocoon offers a powerful mechanism called XSP (eXtensible Server Pages) to write dynamic XML-based web pages. Similar to JSP, XSP embeds Java code in the XML pages. However, embedding large sections of Java code in an XML document is usually considered poor programming form. To support the separation of content and programming logic, XSP allows us to put reusable code into "logicsheets", which correspond to the tag libraries found in JSP. A logicsheet helps to minimize the amount of Java code used inside an XSP page.

Version 0.8 of eXist includes a logicsheet based on the XML:DB API, which defines tags for all important tasks. While it is possible to write all of the XML:DB related code by hand, these predefined tags make the XML file more readable and help users without Java experience to understand the process involved.

An overview of the available XSP tags is available with the stylesheet documentation (generated using xsldoc). In the following simple XSP example, a document is retrieved and displayed:

Example: Simple XSP Page (example1.xsp)

<xsp:page xmlns:xsp="http://apache.org/xsp"
          xmlns:xdb="http://exist-db.org/xmldb/1.0"
>
<document>
    <body>
        <section title="View document">
            
        <p>Retrieving document <xsp:expr>request.getParameter("doc")</xsp:expr></p>
        
        <xdb:collection uri="xdb:exist:///db/shakespeare/plays">
            <xml-source>
                <xdb:get-document encoding="ISO-8859-1" as="xml">
                     <xdb:name>request.getParameter("doc")</xdb:name>
                </xdb:get-document>
            </xml-source>
        </xdb:collection>
        </section>
    </body>
</document>
</xsp:page>

The Cocoon version included with eXist is already configured to recognize the xmldb namespace and associate it with the XML:DB logicsheet. The logicsheet is defined in src/org/exist/xmldb.xsl. To use the logicsheet from our page we just declare the xmldb namespace (i.e. xmlns:xdb="http://exist-db.org/xmldb/1.0").

The above sample code retrieves a document from the collection /db/shakespeare/plays. The name of the document is passed in the HTTP request parameter doc.

To post-process the retrieved XML data, we set the attribute as to "xml". This indicates that the resource should be fed into the current Cocoon processing stream. To include the data as a string value, you may specify as="string". As a result, all XML markup characters will be escaped.

Please note that the parameters of the logicsheet tags may be specified either as an attribute of an element or as a child element. If you specify a parameter as a child element, its content will be interpreted as a Java expression. Literal values should be set via an attribute. For example, when the xpath parameter is specified as a Java expression, it is embedded in an <xdb:xpath> element.

Finally, in order to tell Cocoon how to process this page, we have to add a new <map:match> pattern to the sitemap - for example:

Example: Cocoon Sitemap Snippet (XSP)

<map:match pattern="test.xsp">
    <map:generate type="serverpages" src="test.xsp"/>
    <map:transform src="stylesheets/doc2html-2.xsl"/>
    <map:serialize type="xhtml"/>
</map:match>

The next example shows how to query the database:

Example: Querying the Database (example2.xsp)

<xsp:page xmlns:xsp="http://apache.org/xsp"
          xmlns:xdb="http://exist-db.org/xmldb/1.0"
>
    <html>
        <body>
            <h1>Find books by title</h1>
            <xdb:collection uri="xdb:exist:///db">
                <xdb:execute>
                    <xdb:xpath>
                        "document()//rdf:Description[dc:title" +
                        "&amp;='" + request.getParameter("title") + "']"
                    </xdb:xpath>
                    <p>Found <xdb:get-hit-count/> hits.</p>
                    
                    <xdb:results>
                        <pre>
                            <xdb:get-xml as="string"/>
                        </pre>
                    </xdb:results>
                </xdb:execute>
            </xdb:collection>
        </body>
    </html>
</xsp:page>

This XSP example page takes the HTTP request parameter title as its input and creates an XPath expression that finds all <rdf:Description> elements having a <dc:title> element containing the keywords entered by the user. As required by the XML:DB API, any action has to be enclosed in an <xdb:collection> element. The query is specified in the <xdb:xpath> element using a Java expression, which inserts the value of the request parameter title into the XPath query string.

The <xdb:results> element will iterate through the generated result set, inserting each resource into the page by calling <xdb:get-xml> . In this case, <xdb:get-xml> inserts the resource contents as a string, which means that all XML markup is escaped.
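The Java expression inside <xdb:xpath> is ordinary string concatenation. The sketch below shows the resulting query string outside of XSP, where the operator is written as &= rather than the XML-escaped &amp;=. The class TitleQuerySketch and its method buildQuery are hypothetical. The check for single quotes is an added assumption of this sketch, not something the example page does: a quote in the parameter value would terminate the string literal in the query.

```java
// Hypothetical helper mirroring the concatenation used in example2.xsp.
final class TitleQuerySketch {

    static String buildQuery(String title) {
        // Assumption of this sketch: reject values that would break the
        // quoted string literal in the query.
        if (title == null || title.indexOf('\'') >= 0) {
            throw new IllegalArgumentException(
                "title must not be null or contain a single quote");
        }
        return "document()//rdf:Description[dc:title" + "&='" + title + "']";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("computer"));
    }
}
```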

8. Using the XML-RPC API

XML-RPC (XML Remote Procedure Call) provides a simple way to call remote procedures from a wide variety of programming languages. eXist's XML-RPC API makes it easy to access eXist from other applications, CGI scripts, PHP, JSP and more. For more information on XML-RPC see www.xmlrpc.org. For the Java server, eXist uses the XML-RPC library created by Hannes Wallnoefer, which has recently moved to Apache (see: http://xml.apache.org/xmlrpc). Perl examples use the RPC::XML package, which should be available at every CPAN mirror (see CPAN).

The following is a small example, which shows how to talk to eXist from Java using the Apache XML-RPC library. This example can be found in samples/org/exist/examples/xmlrpc/Retrieve.java.

Example: Retrieving a document from eXist

 
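import java.util.Hashtable;
import java.util.Vector;

import org.apache.xmlrpc.XmlRpc;
import org.apache.xmlrpc.XmlRpcClient;

public class Retrieve {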

protected final static String uri = 
    "http://localhost:8080/exist/xmlrpc";

protected static void usage() {
    System.out.println( "usage: org.exist.examples.xmlrpc.Retrieve " +
        "path-to-document" );
    System.exit( 0 );
}

public static void main( String args[] ) throws Exception {
    if ( args.length < 1 ) {
        usage();
    }
    XmlRpc.setEncoding("UTF-8");
    XmlRpcClient xmlrpc = new XmlRpcClient( uri );
    Hashtable options = new Hashtable();
    options.put("indent", "yes");
    options.put("encoding", "UTF-8");
    options.put("expand-xincludes", "yes");
    options.put("highlight-matches", "elements");
    
    Vector params = new Vector();
    params.addElement( args[0] ); 
    params.addElement( options );
    String xml = (String)
        xmlrpc.execute( "getDocumentAsString", params );
    System.out.println( xml );
}
}

As shown above, the execute method of XmlRpcClient expects two parameters: the name of the method to call on the server (passed as a string) and a Vector of parameters to pass to that method. In this example, the method name is getDocumentAsString and the parameters are collected in the Vector params. Various output properties can also be set through the hashtable argument (see the method description below). Since all parameters are passed in a Vector, they are necessarily Java objects.

XML-RPC messages (requests and responses sent between the server and client) are themselves XML documents. In some cases, these documents may use a character encoding which is in conflict with the encoding of the document we would like to receive. It is thus important to set the transport encoding to UTF-8 as shown in the above example. However, conflicts may persist depending on which client library is used. To avoid such conflicts, eXist provides alternative declarations for selected methods, which expect string parameters as byte arrays. The XML-RPC library will send them as binary data (using Base64 encoding for transport). With this approach, document encodings are preserved regardless of the character encoding used by the XML-RPC transport layer.
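The idea behind the byte-array variants can be sketched with the JDK alone: once a document is turned into UTF-8 bytes and Base64-encoded, the transport only sees ASCII characters, so the transport encoding can no longer alter the content. The class BinaryTransportSketch is hypothetical, and java.util.Base64 (Java 8 and later) is used here only for illustration; an XML-RPC library performs this step itself when given a byte array.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration of why binary transport preserves encodings.
final class BinaryTransportSketch {

    // Sender side: document text to UTF-8 bytes to ASCII-only Base64.
    static String encode(String document) {
        byte[] utf8 = document.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(utf8);
    }

    // Receiver side: Base64 back to bytes, decoded with the known encoding.
    static String decode(String base64) {
        byte[] utf8 = Base64.getDecoder().decode(base64);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String doc = "<name>Walln\u00f6fer</name>";
        String wire = encode(doc);
        System.out.println(wire);
        System.out.println(decode(wire).equals(doc));
    }
}
```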

Note

Please note that the XML-RPC API uses int to encode booleans. This is because some clients do not correctly pass boolean parameters.

Querying is just as easy using XML-RPC. The following Perl example sends a query to the server and prints the result:

Example: Sending a Query to eXist (XML-RPC)

#!/usr/bin/perl
use RPC::XML;
use RPC::XML::Client;

$query = <<END;
for \$speech in //SPEECH[LINE &= 'tear*']
order by \$speech/SPEAKER[1]
return
    \$speech
END

$URL = "http://guest:guest\@localhost:8080/exist/xmlrpc";
print "connecting to $URL...\n";
$client = new RPC::XML::Client $URL;

# Output options
$options = RPC::XML::struct->new(
    'indent' => 'yes', 
    'encoding' => 'UTF-8',
    'highlight-matches' => 'none');

$req = RPC::XML::request->new("query", $query, 20, 1, $options);
$response = $client->send_request($req);
if($response->is_fault) {
    die "An error occurred: " . $response->string . "\n";
}
print $response->value;

You will find the source code of this example in samples/xmlrpc/search2.pl. It uses the simple query method, which executes the query and returns a document containing the specified number of results. However, the result set is not cached on the server.

The following example calls the executeQuery method, which returns a unique session id. In this case, the actual results are cached on the server and can be retrieved using the retrieve method.

Example: Another Query Example (XML-RPC)

#!/usr/bin/perl

use RPC::XML;
use RPC::XML::Client;

# Execute an XQuery through XML-RPC. The query is passed
# to the "executeQuery" method, which returns a handle to
# the created result set. The handle can then be used to
# retrieve results.

$query = <<END;
for \$speech in //SPEECH[LINE &= 'corrupt*']
order by \$speech/SPEAKER[1]
return
    \$speech
END

$URL = "http://guest:guest\@localhost:8080/exist/xmlrpc";
print "connecting to $URL...\n";
$client = new RPC::XML::Client $URL;

# Execute the query. The method call returns a handle
# to the created result set.
$req = RPC::XML::request->new("executeQuery", 
    RPC::XML::base64->new($query), 
	"UTF-8");
$resp = process($req);
$result_id = $resp->value;

# Get the number of hits in the result set
$req = RPC::XML::request->new("getHits", $result_id);
$resp = process($req);
$hits = $resp->value;
print "Found $hits hits.\n";

# Output options
$options = RPC::XML::struct->new(
    'indent' => 'no', 
    'encoding' => 'UTF-8');
# Retrieve up to nine results (positions 1 to 9)
for($i = 1; $i < 10 && $i < $hits; $i++) {
    $req = RPC::XML::request->new("retrieve", $result_id, $i, $options);
    $resp = process($req);
    print $resp->value . "\n";
}

# Send the request and check for errors
sub process {
    my($request) = @_;
    $response = $client->send_request($request);
    if($response->is_fault) {
        die "An error occurred: " . $response->string . "\n";
    }
    return $response;
}

9. XML-RPC: Available Methods

This section gives you an overview of the methods implemented by the eXist XML-RPC server. Only the most common methods are presented here. For a complete list see the Java interface RpcAPI.java. Note that the method signatures are presented below using Java data types. Also note that some methods like getDocument() and retrieve() accept a struct to specify optional output properties.

In general, the following optional fields for methods are supported:

indent

Returns indented pretty-print XML. [yes | no]

encoding

Specifies the character encoding used for the output. If the method returns a string, only the XML declaration will be modified accordingly.

omit-xml-declaration

Omits the XML declaration at the head of the document when set to yes; with no, the declaration is added. [yes | no]

expand-xincludes

Expand XInclude elements. [yes | no]

process-xsl-pi

If set to "yes", XSL processing instructions in the document will be processed and the corresponding stylesheet applied to the output. [yes | no]

highlight-matches

The database adds special tags to highlight the strings in the text that have triggered a fulltext match. Set to "elements" to highlight matches in element values, "attributes" for attribute values or "both" for both elements and attributes.

stylesheet

Use this parameter to specify an XSL stylesheet which should be applied to the output. If the parameter contains a relative path, the stylesheet will be loaded from the database.

stylesheet-param.key1 ... stylesheet-param.key2

If a stylesheet has been specified with stylesheet, you can also pass it parameters. Stylesheet parameters are recognized if they start with the prefix stylesheet-param., followed by the name of the parameter. The leading "stylesheet-param." string will be removed before the parameter is passed to the stylesheet.
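The prefix handling described for stylesheet parameters can be sketched as follows. The class StylesheetParamSketch is hypothetical and only mirrors the documented behaviour (recognize the prefix, strip it, pass the rest on); it is not the server's code. The option values in the usage example are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the "stylesheet-param." prefix convention.
final class StylesheetParamSketch {

    static final String PREFIX = "stylesheet-param.";

    // Returns only the stylesheet parameters, with the prefix removed.
    static Map<String, String> extract(Map<String, String> options) {
        Map<String, String> params = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : options.entrySet()) {
            String key = entry.getKey();
            if (key.startsWith(PREFIX) && key.length() > PREFIX.length()) {
                params.put(key.substring(PREFIX.length()), entry.getValue());
            }
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("indent", "yes");
        options.put("stylesheet", "style.xsl");          // made-up value
        options.put("stylesheet-param.title", "Hamlet"); // made-up value
        System.out.println(extract(options));
    }
}
```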

9.1. Retrieving documents

9.2. Storing Documents

9.3. Creating a Collection

9.4. Removing Documents or Collections

9.5. Querying

9.6. Retrieving Information on Collections and Documents

9.7. XUpdate

9.8. Managing Users and Permissions

9.9. Access to the Index Contents

The following methods provide access to eXist's internal index structure.

9.10. Other Methods

10. SOAP

Beginning with version 0.8, eXist provides a SOAP interface as an alternative to XML-RPC. Programming with SOAP is slightly more convenient than XML-RPC. While you have to write XML-RPC method calls by hand, most SOAP tools will automatically create the low-level code from a given WSDL service description. Also, fewer methods are needed to exploit the same functionality. On the other hand, SOAP toolkits tend to be complex.

eXist uses the Axis SOAP toolkit from Apache, which runs as a servlet. The Tomcat webserver shipped with eXist has been configured to start Axis automatically, and will listen on port 8080: http://localhost:8080/exist/services. Note however that SOAP is not available in the stand-alone server.

The interface has been tested using various clients, including Perl (SOAP::Lite) and the Microsoft .NET framework. The client stubs needed to access the SOAP interface from Java have been automatically generated by Axis and are included in the distribution.

eXist provides two web services: one that contains methods to query the server and retrieve documents, and a second for storing and removing documents and collections. The first will by default listen on:

http://localhost:8080/exist/services/Query

while the second is available on:

http://localhost:8080/exist/services/Admin

Both services are described in the Javadoc for their interfaces. See org.exist.soap.Query and org.exist.soap.Admin for more information.

The following SOAP example (available at: samples/org/exist/examples/soap/GetDocument.java) demonstrates how to retrieve a document from the database:

Example: Retrieving a document (SOAP)

package org.exist.examples.soap;

import org.exist.soap.Query;
import org.exist.soap.QueryService;
import org.exist.soap.QueryServiceLocator;

public class GetDocument {

    public static void main( String[] args ) throws Exception {
        QueryService service = new QueryServiceLocator();
        Query query = service.getQuery();
        String session = query.connect("guest", "guest");

        byte[] data = query.getResourceData(session,
            "/db/shakespeare/plays/hamlet.xml",
            true, false, false);
        System.out.println(new String(data, "UTF-8"));
        query.disconnect(session);
    }
}
        

In this example, the Query client stub class has been automatically generated from the WSDL service description, and has methods for each of the operations defined in WSDL. You will find the web service description file query.wsdl in directory src/org/exist/soap. You may also get the WSDL directly from the server by pointing your web browser to http://localhost:8080/exist/services/Query?WSDL.

To use the services provided, the client first has to establish a connection with the database. This is done by calling connect() with a valid user id and password. connect() returns a session id, which can then be passed to any subsequent method calls.

To retrieve a resource, we simply call Query.getResourceData(), as in the example above. To release the current session, we call Query.disconnect(). Otherwise the session will remain valid for at least 60 minutes.

11. Appendix: Required Libraries

eXist consists of three jar-files:

exist.jar

The core classes of eXist.

start.jar

The bootstrap loader used to startup the database and client applications. This library loads all other required jars.

exist-optional.jar

Optional components for eXist, including Cocoon support classes, SOAP interfaces, Ant tasks, and the HTTP request module for XQuery. This jar is only required if eXist is running in a Cocoon framework or when using Ant.

The lib directory contains three subdirectories:

  1. core

  2. optional

  3. endorsed

The jar files required by the database core reside in the lib/core directory. However, you will not need all of them if you intend to use eXist as an embedded database in your own application. At an absolute minimum, you must include the following jars:

xmldb.jar

Defines the common interfaces for the XML:DB API.

antlr.jar

The ANTLR parser generator used by the XQuery engine.

log4j.jar

Provides the logging facility.

commons-pool-x.x.jar

Provides various object pool implementations.

xml-commons-resolver-x.x.jar

A library for resolving XML external entities from catalogue files.

xmlrpc-x.x-patched.jar

XML-RPC protocol support. This library has been patched to handle the full Unicode character range. NOTE: You should include this library even if you don't intend to connect to a remote database instance. The XML:DB driver references this library and most Java virtual machines will show a runtime error if it is missing.

The other jars in lib/core are support libraries for the command-line client (i.e. excalibur-cli-x.x.jar, libreadline-java.jar, and jEdit-syntax.jar). These are not required to run your own application.

The jar files in lib/optional are only required for Cocoon (most of them are distributed with Cocoon) and Axis-SOAP.

The lib/endorsed directory furthermore plays a special role: the 1.4.x Java releases come with their own XML support libraries, including Xalan for XSLT processing, an XML parser, and the standard Java interfaces for SAX and DOM. Unfortunately, we have found that some features of eXist in combination with Cocoon will not work properly with the wrong version of Xalan (in particular, XSP pages occasionally fail to compile). To ensure that the correct versions are available, we have included these versions of Xerces and Xalan, plus the standard interfaces used by both of them.

You can use Java's endorsed library loading mechanism to ensure that the correct XML support libraries are loaded. Specifying the -Djava.endorsed.dirs=lib/endorsed system property on the Java command line will force the JVM to prefer any library it finds in the endorsed directory over its own system libraries. Copying the jars into $JAVA_HOME/jre/lib/endorsed will do the same thing. Note that the batch and shell scripts included with eXist all set the java.endorsed.dirs system property to point to lib/endorsed.
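To check from within an application whether the property has actually been set, the value can simply be read back. The class EndorsedDirsSketch below is a hypothetical helper for this illustration. Note that the endorsed-standards mechanism belongs to the older JVMs this guide targets; it was removed in Java 9, so on current JVMs the property is normally absent.

```java
// Hypothetical helper that reports the java.endorsed.dirs setting.
final class EndorsedDirsSketch {

    static String describe() {
        String dirs = System.getProperty("java.endorsed.dirs");
        return dirs == null
            ? "java.endorsed.dirs is not set"
            : "java.endorsed.dirs=" + dirs;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```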