One major advantage of modern native XML databases is their ability to index the XML documents they contain. Proper use of indices can significantly reduce the time required to execute a particular Xquery expression. The previous examples likely executed in a perceptible amount of time, because BDB XML was evaluating each and every document in the container against the query. Without indices, BDB XML has no choice but to review each document in turn. With indices, BDB XML can find a subset of matching documents with a single, or significantly reduced, set of lookups. By carefully applying BDB XML indexing strategies we can improve retrieval performance considerably.
To examine the usefulness of our indices, we will use the time command with each of our queries. This will report how long it takes for each operation to complete.
Recall the first structural query:
time query ' collection("parts.dbxml")/part[parent-part]' 10000 objects returned for eager expression ' collection("parts.dbxml")/part[parent-part]' Time in seconds for command 'query': 0.437096
Notice the query execution time. This query takes almost a half a second to execute because the query is examining each document in turn as it searches for the presence of a parent-part element. To improve our performance, we want to specify an index that allows BDB XML to identify the subset of documents containing the parent-part element without actually examining each document.
Indices are specified in four parts: path type, node type, key type, and uniqueness. This query requires an index of the node elements to determine if something is present or not. Because the pattern is not expected to be unique, we do not want to turn on uniqueness. Therefore, the BDB XML index type that we should use is node-element-presence-none.
dbxml> addIndex "" parent-part node-element-presence-none Adding index type: node-element-presence-none to node: {}:parent-part dbxml> time query ' collection("parts.dbxml")/part[parent-part]' 10000 objects returned for eager expression ' collection("parts.dbxml")/part[parent-part]'
Our query time improved from .4 seconds to .2 seconds. As containers grow in size or complexity, indices increase performance even more dramatically.
The previous index will also improve the performance of the value query designed to search for the value of the parent-part element. But for better results, we should index the node as a double value. (You use double here instead of decimal because the XQuery specification indicates that implicit numerical casts should be cast to double).
To do this, use a node-element-equality-double index.
dbxml> time query ' collection("parts.dbxml")/part[parent-part = 1]' 3333 objects returned for eager expression ' collection("parts.dbxml")/part[parent-part = 1]' Time in seconds for command 'query': 0.511752 dbxml> addIndex "" parent-part node-element-equality-double Adding index type: node-element-equality-decimal to node: {}:parent-part dbxml> time query ' collection("parts.dbxml")/part[parent-part = 1]' 3333 objects returned for eager expression ' collection("parts.dbxml")/part[parent-part = 1]' Time in seconds for command 'query': 0.070674
Additional indices will improve performance for the other value queries.
dbxml> time query ' collection("parts.dbxml")/part[@number > 100 and @number < 105]' 4 objects returned for eager expression 'collection("parts.dbxml")/part[@number > 100 and @number < 105]' Time in seconds for command 'query': 5.06106
At over 5 seconds there is plenty of room for improvement. To improve our range query, we can provide an index for the number attribute:
dbxml> addIndex "" number node-attribute-equality-double Adding index type: node-attribute-equality-double to node: {}:number dbxml> time query ' collection("parts.dbxml")/part[@number > 100 and @number < 105]' 4 objects returned for eager expression ' collection("parts.dbxml")/part[@number > 100 and @number < 105]' Time in seconds for command 'query': 3.33212
As you can see, proper use of indices can dramatically effect query performance.