Module | Status | Priority | Test Coverage | Progress | Who |
---|---|---|---|---|---|
1. Document Storage | |||||
1.1. File size/complexity limits | Beta | Highest | Tested | 100% | |
The numbering scheme at the core of eXist 1.0 limited the maximum size of a document that could be stored in the database. The limit did not depend on document size alone, but also on the overall number of nodes and on how deeply elements were nested within the document. The actual limit thus differed from document to document and was difficult to compute in advance. eXist 1.1 fixes this. |
|||||
1.2. Collection size/number of collections limit | Stable, but subject to redesign | High | No tests | 50% | |
Concerning collection management, there are two known problems:
The first problem could be solved by using placeholder (or proxy) objects for the real document during query processing, since most of the administrative metadata (creation time, etc.) is not required at that stage. As of May 2006, the storage location of the document metadata has already been separated from the collection store. However, eXist will still load all the document descriptors when the collection is accessed for the first time. The switch to the DLN numbering scheme also results in smaller document objects. TODO: re-check memory consumption with DLN scheme. For the second issue, a better, more dynamic caching mechanism for collections needs to be implemented. |
|||||
1.3. Allow metadata to be associated with a document | Open | Avg | N/A | 0% | |
Metadata could include system properties like last-modification date or user-defined metadata. Preferably, metadata records should be ordinary XML documents. The format should not be restricted. |
|||||
2. Indexing | |||||
Currently, eXist supports these index types:
However, there will be other index types added in the future, for example:
In the current architecture, integration of new index types faces two major problems:
Preferably, it should be possible for the query engine to determine at compile time, i.e. during the query analysis phase, whether an index can be used. To achieve this, index configuration probably needs to be simplified, so the query engine can make a decision on index usage before the query is actually evaluated. It might not be possible to keep the current, fine-grained configuration scheme, which allows indexes to depend on the presence of ancestor nodes. Some commercial (and highly efficient) XML DBMSs allow users to define indexes only on a given QName, not on a path expression. This topic needs further discussion in the community. The interfaces to the indexing system need to be redesigned to support the query engine in index selection. This includes, for example, statistical information about the frequency of index items. The redesign of the indexing system thus presents a necessary foundation for the query optimizer. |
|||||
2.1. Full text indexing | Stable, but subject to redesign | Avg | Tested | 75% | |
(Align with the XQuery Full-text specification) The interfaces to the indexing system need to be redesigned to support the query engine in index selection. This includes, for example, statistical information about the frequency of index items. The redesign of the indexing system thus presents a necessary foundation for the query optimizer. The current architecture is also too limited with respect to text analysis. The general-purpose tokenizer is not suitable for language-dependent analysis. The plan is to replace these classes with Lucene's analyzers. Lucene offers a pluggable architecture in which multiple analyzers can be combined. |
|||||
2.2. Range indexing | Stable | x | Partially tested | 100% | |
No remarks available. |
|||||
2.3. Combined range and qname index | Alpha | Avg | x | 50% | |
Works, but special functions need to be used. |
|||||
2.4. Indexes on xml:id | Stable, but subject to redesign | Avg | Tested | 90% | |
Currently stored in the structural index. Should be moved to the range index. |
|||||
2.5. N-gram | Open | High | x | 0% | |
When dealing with texts in many non-European languages, the token-based full-text index produces insufficient results. Tokenization is currently based on Unicode code points. Most Chinese characters, for example, are thus stored as single tokens. Users have to abuse the near() or phrase() functions to search for sequences of more than one character, which is quite slow. It also means that real proximity searches are not available. An N-gram based index would be much more suitable for these languages. It would also allow additional functionality to be implemented, e.g. to deal with varying spellings. The main open question is how the N-gram index would integrate conceptually with the existing full-text functions. |
|||||
2.6. Integration of other index types (e.g. Spatial indexes, external indexes) | Open | Avg | N/A | 0% | |
For some types of data, e.g. spatial information, specialized indexes might be provided by other systems. |
|||||
2.7. Index-support for order-by, distinct-values | Open | Avg | N/A | 0% | |
Order-by expressions and other functions that need to access atomized nodes are not supported by indexes. |
|||||
3. Transactions and Recovery | |||||
The journal log and the recovery manager should be stable and are covered by extensive tests. However, recovery failures cannot be excluded entirely, because the tests can't reproduce every possible real-world scenario. Some steps also remain for eXist to become a fully transactional database system. Transaction support is currently limited to the functionality needed for crash recovery. Though we maintain transactions internally, they are currently not exposed to applications. Also, read operations are not transactional right now. In order to allow user-defined ACID transactions with support for rollback, all index files would need to be protected by the journaling log. The required functionality is basically available, but the feature is currently not regarded as high-priority. |
|||||
3.1. Journal log | Stable | x | Tested | 100% | |
No remarks available. |
|||||
3.2. Recovery | Stable | x | Tested | 100% | |
No remarks available. |
|||||
3.3. Internal transaction management | Stable | x | Tested | 100% | |
Transactions are maintained internally, but they are not exposed to applications. eXist does not yet support full ACID transactions. Read-only operations bypass the transaction system. |
|||||
3.4. User-definable transactions | Open | Low | N/A | 0% | |
Journal logs are limited to critical data required for recovery. No transaction rollbacks. |
|||||
4. Backup / Restore | Stable | x | No tests | 100% | |
No remarks available. |
|||||
5. Node-level updates | |||||
5.1. XUpdate | Stable | x | Tested | 100% | |
No remarks available. |
|||||
5.2. XQuery Update Extensions | Stable, but subject to redesign | x | Tested | 50% | |
The W3C is working on an update extension, but no draft has been released so far. |
|||||
6. Access-Control | |||||
The currently implemented Unix-like access control scheme is sufficient to protect resources and collections in a multi-user environment. However, it might be too coarse-grained for some types of applications. A more dynamic ACL implementation could help here. Right now, security management forms part of the database core. This is unnecessary. A more modular architecture would allow different security managers to be plugged in. It would be the responsibility of the security manager implementation to handle ACLs. More importantly, some critical areas are currently not protected. These include access to stored XQueries, system-critical XQuery functions and the Java binding. Basically all Java classes can be used from within an XQuery, which leaves the door wide open on all systems that allow users to execute XQuery on the server. Required steps:
|
|||||
6.1. User management | Stable | x | No tests | 100% | |
No remarks available. |
|||||
6.2. Access control on resources and collections | Stable, but subject to redesign | Avg | No tests | 100% | |
Need more dynamic ACL structures that can adapt to varying requirements. |
|||||
6.3. Access control on stored XQueries, XQuery functions and modules | Beta | High | N/A | 90% | |
No remarks available. |
|||||
6.4. Java binding | Beta | High | N/A | 100% | |
No remarks available. |
|||||
7. Schema Validation | |||||
7.1. Validate document against schema when indexing | Stable | x | No tests | 100% | |
No remarks available. |
|||||
7.2. Validate document after node-level updates | Open | Avg | N/A | 0% | |
No remarks available. |
|||||
7.3. Locate schemas and DTDs stored in database | Beta | High | x | 75% | |
No remarks available. |
|||||
7.4. Support for catalog files in database | Beta | High | x | 75% | |
No remarks available. |
|||||
7.5. Manual validation against schema | Beta | High | Tested | 75% | |
No remarks available. |
|||||
7.6. XQuery validation features | Open | Avg | N/A | 0% | |
No remarks available. |
|||||
7.7. Store PSVI with the node tree in the database | Open | Low | N/A | 0% | |
No remarks available. |
|||||
7.8. Static typing based on PSVI | Open | Low | N/A | 0% | |
No remarks available. |
|||||
8. XQuery | |||||
The XQuery engine as well as the standard function libraries should be updated to align with the latest candidate recommendation. Basically, almost all core language features are implemented, excluding schema related features, which are currently beyond eXist's scope. XQuery support in eXist is not sufficiently covered by the test suite. In particular, we lack tests for the function library. Implementing the official XQTS XQuery test suite should thus be a top priority in order to guarantee standard conformance and avoid future regressions. |
|||||
8.1. Core XPath and XQuery | Stable | High | Partially tested | 100% | |
Updated to latest Candidate Recommendation. Stable, excluding schema-related features. |
|||||
8.2. XPath and XQuery atomic value types | Stable | Avg | Partially tested | 60% | |
Add Gregorian dates and NOTATION. |
|||||
8.3. XPath and XQuery function libraries | Stable | High | Partially tested | 100% | |
Updated to latest Candidate Recommendation. Stable, excluding schema-related features. |
|||||
8.4. XPath and XQuery function libraries | Stable | High | No tests | 100% | |
Updated to latest Candidate Recommendation. |
|||||
8.5. XQuery serialization | Stable, but subject to redesign | Avg | No tests | 75% | |
Though we implement most of the serialization options specified in the XQuery and XSLT serialization spec, some options need to be reworked and should be covered by tests. |
|||||
8.6. XQTS XQuery test suite | Beta | High | N/A | 93% | |
Should be implemented to avoid future regressions and ensure conformance |
|||||
8.7. XQuery Optimizer | Stable, but subject to redesign | High | No tests | 75% | |
Most query optimizations are currently hard-coded into the query engine and are applied at execution time, not at compile time. This puts a limit on extensibility and possible optimizations. As described above, adding new index types is difficult. For example, the operator implementation for “=” and “eq” currently includes about 4 or 5 different optimization paths. Location steps like child::foo can choose among 3 execution alternatives. This makes it hard to see whether the correct optimization is applied. Adding further index types will make the code unmanageable. As already explained above, the decision about which index to use should thus be moved into the query analysis phase, and the index implementation has to provide sufficient information to make that decision. A good part of the performance problems eXist currently has could be solved by a post-compilation/pre-evaluation optimizer and intelligent query rewriting. In particular, query rewriting could be used to reduce the size of the node sets that need to be processed by an XPath expression. This can result in a tremendous performance boost for queries on huge document sets, provided those queries involve a predicate expression that limits the number of potential matches. |
|||||
8.8. Error reporting | Stable, but subject to redesign | Avg | N/A | 75% | |
Error reports by the XQuery parser and compiler need to be improved. |
|||||
8.9. Make function calls tail-recursive | Stable | High | 70 | 100% | |
Recursive functions may trigger a StackOverflowError. We need to handle tail-recursion. |
|||||
9. XInclude | Stable | Low | No tests | 100% | |
XInclude expansion happens at serialization time, so queries across the included document fragments are not possible. Stable, but limited. |
|||||
10. Interfaces | |||||
10.1. XML:DB API embedded | Stable | x | Tested | 100% | |
No remarks available. |
|||||
10.2. XML:DB API remote access | Stable, but subject to redesign | Low | Partially tested | 75% | |
New implementation should be based on REST to avoid current problems (e.g. with character encodings). |
|||||
10.3. XML-RPC | Stable | x | Partially tested | 100% | |
Exposes the entire database functionality. |
|||||
10.4. REST | Stable, but subject to redesign | Low | Partially tested | 75% | |
Does not cover user management, permissions, or other administrative functions. Stable, but further functionality could be exposed. |
|||||
10.5. SOAP | Stable | Low | No tests | 90% | |
10.6. Cocoon Integration | Stable | x | No tests | 100% | |
General functionality tests required |
|||||
10.7. XQJ XQuery API for Java | Open | Low | N/A | 10% | |
Could be a simpler alternative to the now somewhat bloated XML:DB API |
|||||
11. Other Tasks | |||||
11.1. I18n | Open | Low | N/A | 20% | |
Provide translations for error messages, console output, etc. At a minimum, resource bundles should be used, so that others can translate them if they want. |
|||||
11.2. Clean up/upgrade libraries | Open | Low | N/A | 0% | |
All libraries included with eXist need to be checked. |
|||||
12. Documentation | |||||
12.1. XQuery on the Web | Open | High | N/A | 0% | |
Should explain in more depth how one can write webapps in XQuery, using the XQueryGenerator with Cocoon or stored XQueries. |
|||||
12.2. XQuery stored modules | Partial | High | N/A | 60% | |
Calling XQuery scripts stored in the DB; importing stored modules into a query passed to the DB. |
|||||
12.3. WebDAV | Beta | Avg | N/A | 90% | |
No remarks available. |
|||||
12.4. Deployment | Stable, but subject to redesign | High | N/A | 90% | |
Integration with a servlet engine, Cocoon, stand-alone server, embedded use. |
|||||
12.5. Indexing and Index Configuration | Stable | High | N/A | 100% | |
No remarks available. |
|||||
12.6. Trigger | Open | Low | N/A | 0% | |
No remarks available. |
|||||
12.7. Searchable Documentation | Open | Avg | N/A | 0% | |
The documentation is currently not searchable, though most of it is provided in XML. |
|||||
12.8. XQDoc integration | Open | High | N/A | 0% | |
Migrate the function documentation to XQDoc. Use XQDoc to better document all XQuery examples. |
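
As an illustration of the N-gram idea raised under item 2.5, the following is a minimal sketch of a character-bigram index that answers multi-character substring queries by checking consecutive positions. It is a toy, in-memory structure and not eXist's design: the names `ngrams`, `NGramIndex`, `add` and `search` are hypothetical, and the sample strings are made up for the example.

```python
from collections import defaultdict


def ngrams(text, n=2):
    """Return all overlapping character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NGramIndex:
    """Toy in-memory index (hypothetical): n-gram -> {doc_id: [positions]}."""

    def __init__(self, n=2):
        self.n = n
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id, text):
        # Record the position of every n-gram of the document.
        for pos, gram in enumerate(ngrams(text, self.n)):
            self.postings[gram][doc_id].append(pos)

    def search(self, query):
        """Return ids of documents containing query as a contiguous string."""
        grams = ngrams(query, self.n)
        if not grams:
            return set()  # queries shorter than n are not handled here
        hits = set()
        for doc_id, starts in self.postings.get(grams[0], {}).items():
            for start in starts:
                # The k-th n-gram of the query must occur at position start + k.
                if all(start + k in self.postings.get(g, {}).get(doc_id, ())
                       for k, g in enumerate(grams)):
                    hits.add(doc_id)
                    break
        return hits


# Usage: two short sample documents without any word separators.
index = NGramIndex(n=2)
index.add("doc1", "数据库系统")
index.add("doc2", "系统数据")
matches = index.search("数据库")
```

Because each n-gram carries its position, a sequence of several characters is matched by one lookup per n-gram rather than by a proximity search over single-character tokens.
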

Percentage | Description |
---|---|
0 | Work not started |
20 | 1-20% complete |
40 | 21-40% complete |
60 | 41-60% complete |
80 | 61-80% complete |
99 | 81-99% complete |
Done | 100% complete |
Priority | Description |
---|---|
1. Highest |
|
2. High |
|
3. Avg |
|
4. Low |
|
5. x |
|
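
Regarding item 8.9, one common, generic way to run tail-recursive functions without growing the call stack is a trampoline: the function returns a description of its next call, and a loop executes it. The sketch below only illustrates that technique and says nothing about how eXist's query engine would implement it. `TailCall`, `trampoline` and `sum_to` are hypothetical names.

```python
class TailCall:
    """Marker object (hypothetical): 'call func(*args) next'."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args


def trampoline(func, *args):
    """Run func, executing chained tail calls in a loop instead of nesting them."""
    result = func(*args)
    while isinstance(result, TailCall):
        result = result.func(*result.args)
    return result


def sum_to(n, acc=0):
    """Tail-recursive sum of 1..n that returns a TailCall instead of recursing."""
    if n == 0:
        return acc
    return TailCall(sum_to, n - 1, acc + n)


# A recursion depth of 100000 would overflow the stack with plain recursion.
total = trampoline(sum_to, 100000)
```

Each step returns to the loop before the next call starts, so the stack depth stays constant regardless of `n`.
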