DBA: XML
Getting into SQL/XML by
Tim Quinlan
Get an overview of SQL and XML interoperability and learn how to
begin working with XML documents stored in an Oracle database.
Published December 2006
XML data is commonly used in today’s production systems and is a
major part of database implementations. In the past, this fact
caused design problems for database developers who consequently had
to store, query, and update XML data as unstructured LOBs, or
alternatively, “shred” data into relational tables and then put it
back together. This approach resulted in programming complications
and inefficiencies because the access mechanisms were immature.
The Oracle XML DB feature that was first delivered with
Oracle9i Database Release 2 provided groundbreaking
features for storing, retrieving, and manipulating XML data in the
database. Oracle
10g Release 2 XML DB, however, greatly extends this
initial approach in which XML data is little more than a BLOB in the
database.
If you are a DBA or a developer new to the implementation of XML
in Oracle, this article is for you: you will get an overview of SQL
and XML interoperability and learn how to begin working with XML
documents stored in an Oracle 10g Release 2 database, and
about the many features of Oracle Database 10g Release 2
that DBAs and developers should become comfortable with before
beginning to work with XML in the database. Thanks to these SQL/XML
features, the skills you've already acquired while working with
relational data can easily be enhanced to help you work with XML.
What is Oracle’s implementation of XML?
Oracle has implemented XML through the XML DB component of the
database, which is a standard feature in Enterprise Edition as well
as Standard Edition. Oracle XML DB is easily installed using either
the Database Configuration Assistant (DBCA) or by running a catalog
script, catqm.sql. The XML DB product provides all the features
discussed in this article.
You can verify that XML DB has been installed by simply running
the SQL below: select comp_name from dba_registry where comp_name like '%XML%';
COMP_NAME
Oracle XML Database
The XML DB schema and its objects can also be checked to see that
Oracle XML DB has been installed.
SQL and XML Working Together
The SQL/XML standard has been under development to provide a
mechanism that allows us to generate XML from a relational query
and, conversely, provide the ability to deliver SQL data from XML
documents. Oracle XML DB in Oracle Database 10g Release 2
implements SQL 2003 and features from the upcoming SQL/XML standard
release. The implementation of these standards in Oracle allows us
to look inside an XML document in ways that were not possible in
older SQL standards.
New datatype. A new datatype, XMLType, was
introduced in Oracle9i Database Release 1 to allow an XML
document in the database to be accessible in SQL and at the same
time allow XML developers the ability to use XML standards on a
document. This datatype tells the database that the content is in
XML format and allows us to perform queries on an XML document.
Using XMLType rather than a relational or CLOB implementation
provides a layer of separation between the application and storage
model. This separation can allow data to move to a different storage
model without being tied to a CLOB or relational model. XMLType can
be used to create a table, column, or view. It can also be used as a
datatype for parameters and variables.
Built-in XML methods operate on the content of a document by
allowing us to create, extract, and index XML data. Indexing can be
performed using b-tree, text indexing, and function-based indexes.
In effect, XMLType data combined with XPath access can be used to
look inside a document. This functionality is provided through
PL/SQL and Java APIs. XMLType can be used in PL/SQL, in Java using
JDBC, and in the Oracle Data Provider for .Net. This powerful and
relatively new datatype will be used extensively throughout this
article.
Storage structures. Oracle’s XML implementation
gives us the flexibility to store data in two different ways: as
structured and unstructured storage. XMLType data is unstructured
when stored as a single LOB column and structured when implemented
as a set of objects. One specific example of this is an XMLType
table, which can be implemented using either of the two storage
models. An XMLType table is unstructured when implemented as a LOB
by using the “XMLType store as CLOB” syntax shown below: create table table_name of XMLtype
XMLType store as CLOB;
Let’s focus on the structured and unstructured storage for XML
data and look at these in more detail.
Structured storage. Structured XML storage is
implemented as a set of objects. These objects can be implemented in
a relational format through tables with referential constraints
optionally implemented between the tables. They can also be
implemented by using an XMLSchema to decompose an XMLType document
into a set of objects.
In the case where relational tables are used, the tables may be
designed in advance for this purpose or, alternatively, existing
tables can be used. The document is in effect stored as a “virtual”
document through the relational tables, with its logical structure
maintained. This approach maintains Document Object Model (DOM)
fidelity, although it is not a byte-for-byte physical representation
of the document. This can be done by creating an XMLType view over
existing relational data.
Structured storage has some performance advantages over
unstructured storage and can be chosen to provide more query and
update optimization through the table and index design. XML
operations on structured storage can help to reduce memory and
storage because XML tags are not stored and because there is
potentially a more-granular level of data retrieval and usage. Index
usage with b-tree and function-based indexes is enhanced, and
in-place updates can be performed on pieces of the document using
XPath rewrite. All of this helps performance and is discussed in
more detail later in this article.
There are also some disadvantages to using structured storage.
Inserting and retrieving an entire document can take more overhead.
There is also limited flexibility, because only documents that match
the XMLSchema can be stored—although there are cases where this is
actually an advantage. The document is not stored as a byte-for-byte
representation of the original, and the order of data in a document
is not maintained. However, no data will be lost.
Think of highly structured data as being mostly XML, where each
element of the data can be clearly defined.
Unstructured storage. XML can be stored in
Oracle in an unstructured manner using CLOB storage such that SQL
queries will not know the structure of the data. This data can be
stored using Oracle’s XMLType datatype.
Unstructured data may be chosen in cases where you want the
stored data to match the exact physical representation of the
document. Also, if the data is not updated often or if there is a
large percentage of inserts or reads of the entire document (rather
than pieces of the document), it may be more efficient to keep the
data together in one spot. There are also situations where you need
to keep documents flexible, and in these cases an XMLType table or
column can be useful.
A disadvantage of unstructured storage is that updates to pieces
of a document are generally not as efficient as with structured
data. Also, XPath operations that construct DOM from CLOBs using
functions can be an expensive use of system resources. Other issues
with unstructured data are that SQL constraints cannot be
implemented and that memory management is not as efficient.
Think of unstructured data as mostly not XML data and where most
or all of the data is simply seen as a single CLOB.
Now that you’ve been introduced to the storage models, let’s take
a look at some naming features.
What’s in a Namespace?
This is a large topic and one of the more complex XML DB concepts
to understand. In this article, we will cover some namespace
concepts to give you a gist of what these are and how they are used.
A namespace is used to describe a set or collection of related
attributes or elements in an XML document. Namespaces can be used to
ensure that document constructs have a universally unique name.
XMLSchema in particular takes advantage of this feature, as a target
namespace is usually the same as the URI of the XMLSchema. Examples
of namespaces are shown below. Note that the naming looks similar to
an internet URL. This naming standard is recommended by the World
Wide Web Consortium (W3C) but is not required. The URI is used to
uniquely identify the name of a registered XMLSchema in the database
and does not need to be the physical URI where a document is
located.
XMLType methods and XML functions make use of namespace prefixes.
When an XML document has no target namespace, then the namespace
prefix is in the noNameSpace namespace. Attribute
xsi:noNamespaceSchemaLocation can be used for the schema URI.
If an element does not have another namespace prefix, a namespace
can be applied by default when it applies to an element where it is
declared.
There are two special namespaces that you will see regularly.
These are http://www.w3.org/2001/XMLSchema, which is the
overall XMLSchema namespace, and http://xmlns.oracle.com/xdb, which is the
Oracle-supplied XML DB namespace. The latter namespace has functions
that get written to underlying SQL functions. Some of the XPath
functions are also rewritten. Examples are ora:contains, ceiling,
floor, not, string-length, substring, and translate. Attributes used
by XML DB also belong to this namespace.
When defining an element, we can specify a namespace for that
element. In XML, we can define this as <elementName xmlns:ab="http.name.com" />
As you see, we gave an elementName, which is the element we are
working on (for example, “orders”). Xmlns is a reserved word and
tells us that this is a namespace definition. The “ab” part is
called a prefix. This prefix is a short name that is bound to the
URI. In this case, ab is bound to ‘http.name.com’, and ab can be
prefixed to an element name.
Below we will create a very small schema and declare a bind
prefix xs to http://www.w3.org/2001/XMLSchema and prefix xdb to
http://xmlns.oracle.com/xdb . In order to declare
the namespace, we use xmlns to bind them. Notice that the xs prefix
is used in this case on the schema element names. <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xdb= "http://xmlns.oracle.com/xdb" version="1.0">
<xs:element name="INVOICESCHEMA" xdb:defaultTable="INVOICESCHEMA">
<xs:complexType>
<xs:sequence>
<xs:element name="MailAddressTo">
<xs:complexType>
<xs:sequence>
<xs:element name="Person" type="xs:string"/>
<xs:element name="Street" type="xs:string"/>
</xs:sequence>
<xs:attribute name="id" type="xs:string" use="required"/>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
We’ve only touched the surface of namespaces here. Refer to the
Oracle XML DB Developer’s Guide 10g Release 2, on Oracle
Technology Network (OTN), for more information on this topic.
Working with XML
Getting data into an XMLType table and column.
As you may expect, there are many options available to us to get XML
data into the database. Data can be inserted with SQL, PL/SQL, Java,
and C programs. We can also load data into the database using the
SQL*Loader and Import/Export functions of XMLType tables. XML data
can also be moved with Oracle Streams and can also use DBMS_XDB
functions. Let’s explore one useful and simple way to load XML data.
In this example, an XML document will be loaded from a file into
a table using a SQL insert to load the XML document from a bfile.
This file needs to be accessed through a database directory. To
begin this, we’ll create a directory pointing to a file where the
XML document file is stored. Note that to perform the examples in
this article, you must ensure that role xdbadmin has been granted to
the user you are working with. create or replace directory xmldir as '/u01/app/oracle/admin/db01/sql';
Now we can create tables that can contain XMLType data in two
ways. The first will be a table with a column of XMLType. The second
will be a table defined as type XMLType.
1. Create a table with an XML column. create table invoiceXML_col (
inv_id number primary key,
inv_doc XMLType);
2. Create an XML table. create table invoiceXML_tbl of XMLtype;
Each of the above statements implicitly creates two indexes in
each table—one for the primary key and one for the LOB.
Now we will insert an invoice document into both tables. Note
that we are using character set AL32UTF8 and will specify this to
pass the character set encoding to be used for the file being read.
Data will be inserted into both tables to show the similarity
between the SQL statements. Insert into invoicexml_col values (1,
XMLType(bfilename('XMLDIR', 'invoicexml.txt'),
nls_charset_id('AL32UTF8') ));
Insert into invoicexml_tbl values
(XMLType(bfilename('XMLDIR', 'invoicexml.txt'),
nls_charset_id('AL32UTF8')));
We now have two XML documents loaded into the database using an
XMLType table and column. It’s as simple as that.
Registering and Implementing Schema-Typed Data
As if dealing with namespaces and storage techniques were not
enough, XMLType data can also be implemented as schema-typed or
untyped data. An XMLSchema associates a document with an XMLSchema
that provides information about the document structure and contents.
This provides better documentation, validation, and controls around
a document. The information in the schema can be used to deliver
more-efficient query and update processing. An XMLSchema also allows
an XML document to be stored as structured storage data, because the
document can be decomposed into a set of objects through the
XMLSchema. The object model used to store the document is determined
by the schema definition. XMLType methods schemaValidate() and
isSchemaValid() allow editing of an XMLType document using the
schema definition.
How do we create a schema and use this with an XML document? To
do this, we need to create an XMLSchema definition
- Register the schema
- Create an XMLSchema-based table
- Insert data into it
The example below shows an implementation of these in four steps.
1. First, create a schema definition in file invoiceformtest.xsd
in directory XML_DIR. The listing is below: <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xdb=http://xmlns.oracle.com/xdb version="1.0">
<xs:element name="INVOICEFORMTEST" xdb:defaultTable="INVOICEFORMTEST">
<xs:complexType>
<xs:sequence>
<xs:element name="MailAddressTo">
<xs:complexType>
<xs:sequence>
<xs:element name="Person" type="xs:string"/>
<xs:element name="Street" type="xs:string"/>
<xs:element name="City" type="xs:string"/>
<xs:element name="State" type="xs:string"/>
<xs:element name="Zipcode" type="xs:string"/>
</xs:sequence>
<xs:attribute name="id" type="xs:string" use="required"/>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
2. Register the above schema in the database using procedure
DBMS_XMLSCHEMA.registerSchema. The username you are using to connect
to the database will require that the alter session privilege be
granted to it so that the schema can be successfully registered.
BEGIN
DBMS_XMLSCHEMA.registerSchema(
SCHEMAURL => 'http://xmlns.oracle.com/xdb/invoiceformtest.xsd',
SCHEMADOC => bfilename('XMLDIR','invoiceformtest.xsd'),
CSID => nls_charset_id('AL32UTF8'));
END;
/
To delete the schema, simply run the DBMS_XMLSCHEMA.deleteSchema
statement, as shown below: BEGIN
DBMS_XMLSCHEMA.deleteSchema(
SCHEMAURL => 'http://xmlns.oracle.com/xdb/invoiceformtest.xsd',
DELETE_OPTION => dbms_xmlschema.DELETE_CASCADE_FORCE);
END;
/
3. The above statement also creates table invoiceformtest because
of the xdb:defaultTable="INVOICEFORMTEST" statement in the schema
definition above. Without this, a generated name (one that you would
not want to work with) would have been created.
4. Now we are ready to enter documents into the XMLSchema-based
table from an XML document that is stored in XML_DIR and is in the
format specified by the implemented schema. Insert into invoiceformtest values
(XMLType(bfilename('XMLDIR', 'invoiceformtest.txt'),
nls_charset_id(' AL32UTF8')));
Let’s look at what we’ve just created using the object_value
pseudocolumn on the XML document. SQL> select object_value from invoiceformtest;
OBJECT_VALUE
<INVOICEFORMTEST>
<MailAddressTo id="1">
<Person>Joe Smith</Person>
<Street>10 Apple Tree Lane</Street>
<City>New York</City>
<State>NY</State>
<Zipcode>12345</Zipcode>
</MailAddressTo>
</INVOICEFORMTEST>
We’re done!
Understanding XQuery Functions and Expressions
XQuery is a standard developed by the W3C for a query language to
extract information from an XML file It allows us to access physical
XML documents or relational data that is virtualized as XML
documents through XML views. The expressions used can be simple
queries or part of a larger query, and standard functions can be
included such as date/time, arithmetic, or string functions as well
as user-defined functions. XQuery is new in Oracle Database
10g Release 2 and is supported in XMLQuery() and XMLTable()
functions. Examples using these two functions are shown later, in
the “XMLQuery() and XMLTable()” section.
XQuery makes use of XPath expressions (described below) to locate
the detailed items in XML documents. This can be thought of as the
foundation for the XMLQuery() and XMLTable() functions introduced in
Oracle Database 10g Release 2 and covered later in this
article.
Some of the more important XQuery expressions include XPath,
XMLSequence, and FLWOR and are discussed below.
XPath, XMLSequence, and FLWOR Expressions
XPath. XPath provides a way to locate items in
an XML document using addressing techniques, and it processes these
items using a logical path through a document structure. It allows a
programmer to deal with a document at a higher level of abstraction
by specifying a route through a document rather than pointing at
specific elements. XPath uses the concept of a node that defines
where the path begins and then a “logical tree” that includes
relationships such as attribute, self, parent, child, and ancestor.
In other words, XPath models an XML document as a tree of nodes.
There are different types of nodes, such as element nodes, attribute
nodes, and text nodes. XPath can determine a way to calculate a
string value for each node.
XPath expressions can be used to query and update XML documents
in a standards-based way. We’ll use the extract, extractValue,
existsNode, and XMLSequence functions to demonstrate XPath
functionality using the invoicexml_tbl document we’ve already
created and into which we’ve already inserted data.
Query the document to be used in XPath examples.
Let’s take a look at searching XML (XMLType) data with SQL using
XPath functions. To do this, we’ll look at the document we’ll be
working with using the object_value pseudocolumn to retrieve the XML
document from the XMLType table. SQL> select object_value from invoicexml_tbl;
OBJECT_VALUE
<Invoice>
<MailAddressTo id="PA">
<Person>Joe Smith</Person>
<Street>10 Apple Tree Lane</Street>
<City>New York</City>
<State>NY</State>
<Zipcode>12345</Zipcode>
</MailAddressTo>
<MailAddressFrom id="PA">
<Person>Ed Jones</Person>
<Street>11 Cherry Lane</Street>
<City>Newark</City>
<State>NJ</State>
<Zipcode>67890</Zipcode>
</MailAddressFrom>
<Details id="2006Sept1to30PA">
<FromTo>Sept 1, 2006 to Sept 30, 2006</FromTo>
<Hours>70</Hours>
<Rate>30</Rate>
<Taxes>210</Taxes>
<TotalDue>2310</TotalDue>
<InvDate>Oct 1, 2006</InvDate>
<Contractor>Ed Jones</Contractor>
</Details>
</Invoice>
Using extract. Using extract, we can select an
individual node and its leaf nodes from the document by combining
extract with object_value. In other words, we can look inside an XML
document that’s been stored as XMLType. This is true whether we’ve
used structured or unstructured data and also whether or not the
data is schema-based. Let’s extract the MailAddressTo node and its
leaf nodes. select extract(object_value, '/Invoice/MailAddressTo') from invoicexml_tbl;
EXTRACT(OBJECT_VALUE,'/INVOICE/MAILADDRESSTO')
<MailAddressTo id="PA"><Person>Joe Smith</Person><Street>10
Apple Tree Lane</Street><City>New York</City><State>NY</Stat
e><Zipcode>12345</Zipcode></MailAddressTo>
As you can see, the output includes the MailAddressTo portion of
the document and is not pretty-printed (formatted). Also, the syntax
used to accomplish this is very simple. What’s important is that we
were able to look inside the document without dumping the entire
thing.
Using extractValue. The data value that exists
in a leaf node can be extracted using extractValue. A higher - level
node such as MailAddressTo cannot be extracted using this function.
Note that the output of this is not in XML-syntax format and
contains simply the data value. select extractValue(object_value, '/Invoice/MailAddressTo/Person')
Person from invoicexml_tbl;
PERSON
Joe Smith
Using existsNode. ExistsNode is used in a
similar fashion, to search for specific values at the node level
(and only the node level) of a document. It returns a True or False
flag to specify whether a search was successful. The = 1 predicate
is not a count but rather represents a True condition, and = 0 is
False. Select count(*) from invoicexml_tbl
where existsNode(
object_value, '/Invoice/MailAddressTo[Person="Joe Smith"]') = 1;
COUNT(*)
1
Using XMLSequence. Unlike extractValue, which
can only extract a value from a single node, XMLSequence can be used
to look at multiple nodes or a fragment of a document. It does this
by creating a virtual table that consists of XMLType objects. Let’s
compare extractValue with XMLSequence using the MailAddressTo branch
node. select extractValue(object_value, '/Invoice/MailAddressTo') from invoicexml_tbl;
from invoicexml_tbl
*
ERROR at line 2:
ORA-19025: EXTRACTVALUE returns value of only one node
This ORA-19025 message is self-explanatory. We can fortunately
overcome this by restructuring the query and using XMLSequence, as
shown below: select value(addr)
from invoicexml_tbl i,
table(XMLSequence(
extract(i.object_value, '/Invoice/MailAddressTo'))) addr
where existsNode(i.object_value, '/Invoice/Details[@id="2006Sept1to30PA"]') = 1;
VALUE(ADDR)
<MailAddressTo id="PA"><Person>Joe Smith</Person><Street>10
Apple Tree Lane</Street><City>New York</City><State>NY</Stat
e><Zipcode>12345</Zipcode></MailAddressTo>
Because XMLSequence creates a virtual table, we can also use this
function on a leaf node. select value(person)
from invoicexml_tbl i,
table(XMLSequence(
extract(i.object_value, '/Invoice/MailAddressTo/Person'))) person
where existsnode(i.object_value, '/Invoice/Details[@id="2006Sept1to30PA"]') = 1;
VALUE(PERSON)
<Person>Joe Smith</Person>
FLWOR. FOR, LET, WHERE, ORDER BY, and RETURN
(FLWOR; pronounced “flower”) is one of the most important and
powerful expressions in XQuery syntax. Either FOR or LET must exist,
WHERE and ORDER BY are optional, and RETURN is
mandatory.FLWOR on its own is a large topic to
cover. This section gives you an introduction to the power of this
statement.
FOR binds one or more variables in an iterative manner, in the
order that the variables are listed. Values listed previously can
then be used in a subsequent set of values. This also works
similarly to a SQL From clause. Values listed previously can then be
used in a subsequent set of values, as shown below: For $var in (1,2,3) , $varPlus5 in (5+$var, 5+$var, 5+$var)
The three iterations set $var and $varPlus5 to 1,6; 2,7; and 3,8,
respectively.
LET, like FOR, binds variables iteratively, and values can be
computed using previously calculated values. As with FOR, LET can be
thought of as a SQL FROM clause. LET can also be used to perform
joins.
WHERE filters data in the same manner as a SQL WHERE clause.
ORDER BY optionally sorts the data.
RETURN returns the final result set from the filtered and ordered
FLWOR expression.
FLWOR being used with XMLQuery(). Let’s look at
an example where we query and join two documents: partys.xml to an
orders.xml document on the Party key. This XML data is in the Oracle
XML DB Repository. To do this, we will use XMLQuery(); FLWOR; and
XQuery functions doc, count, avg, and integer. These are in the
namespace for built-in XQuery functions,
http://www.w3.org/2003/11/xpath_functions.
The query below reads as follows: Using function fn:doc, FOR all
partyno attributes in partys.xml, join (LET) all order elements in
orders.xml that match on partyno (variable $p was bound in the FOR
statement). This produces a stream of items ($p and $o), with $p
representing a party number and $o a set of orders for that party.
Get the items WHERE there is more than 1 order. ORDER BY the average
amt descending using XQuery function avg in namespace fn. Amt is
attached to order element $o. Return the party number (bound to $p)
and child element ordercount. SELECT XMLQuery()('for $p in fn:doc("/public/partys.xml")/partys/party/@partyno
let $o := fn:doc("/public/orders.xml")/orders/order[@partyno = $p]
where fn:count($o) > 1
order by fn:avg($o/@amt) descending
return <big-party>{$p,
<ordercount>{fn:count($o)}</ordercount>,
<avgamt>{xs:integer(fn:avg($o/@amt))}</avgamt>}
</big-party>'
RETURNING CONTENT) ORDERS FROM DUAL;
ORDERS
<big-party>1111<ordercount>2</ordercount><avgamt>3500</avgamt></big-party>
Yes, this query is doing a lot and shows some of what’s possible
when using FLWOR expressions on XML documents.
XQuery ora: Functions
Oracle XML DB provides five more XQuery
functions that developers can use. These are
implemented in the http://xmlns.oracle.com/xdb namespace, which uses
the prefix ora:. Along with these are two other
XPath functions. These are not yet part of the SQL/XML standard but
are expected to be in the future. The ora:view function is a
particularly valuable one that can be used to transform relational
data into XML format. All of these are described below.
ora:view XQuery function. This is a valuable
function to create a view over relational tables. This can make them
appear as if they are XML documents. Syntax: ora:view ([db-schema STRING, ] db_table STRING)
RETURNS document-node (element())
Examples that use ora:view are shown later in the article.
ora:contains XQuery function. The namespace
prefix “ora:” and name “contains” are used together to create an
XPath function that can be used as an XPath argument to the SQL/XML
existsNode, extract, and extractValue functions.This allows you to
perform a structural search with a text predicate.
This function returns a positive number if text_query matches the
input_text. If they do not match, then 0 is returned. When they are
used with existsNode, extract, or extractValue, you need to include
the namespace mapping parameter
xmlns:ora="http://xmlns.oracle.com/xdb". The syntax for ora:contains
is shown below. The owner is the current user in cases where a
policy_owner is not defined. The policy_name is the name of the
matching rules that will be applied. The default matching rules are
defined by ctxsys.default_policy_oracontains. Syntax: ora:contains (input_text, text_query [, policy_name] [, policy_owner])
The example below shows how we can use ora:contains as an
argument to existsNode. SELECT count(*) FROM invoicexml_tbl
WHERE existsNode(object_value, '/Invoice/MailAddressTo/Person
[ora:contains(text(), "Joe Smith") > 0]',
'xmlns:ora="http://xmlns.oracle.com/xdb"') = 1;
COUNT(*)
1
ora:matches XQuery function. This function uses
regular expressions to match text. This is similar to the SQL
REGEXP_LIKE condition except that it uses XQuery arguments instead
of SQL datatypes. True is returned if the target_string matches the
regular expression match_pattern ; if not, False is
returned. As you can see in the syntax below, match_parameter can be
added to specify additional criteria to use in a search. An example
of this is to provide case sensitivity. Syntax: ora:matches (target_string, match_pattern [, match_parameter])
ora:replace XQuery function. This extends the
ora:matches function by replacing the target_string with the
replace_string if the match_pattern is met ( True ). As with
ora:matches, this uses regular expressions to match the text. Syntax: ora:replace (target_string, match_pattern,
replace_string [, match_parameter])
ora:sqrt XQuery function. As you may expect from
the name, this returns the square root value of the provided number.
Syntax: ora:sqrt (number)
XPath extension functions: ora:instanceof and
ora:instanceof-only. Oracle XML DB can support schema-based
data when the datatypes of attributes and elements are known.
Because XPath 1.0 is not aware of datatype information, there are
XML DB extension functions in namespace http://xmlns.oracle.com/xdb
that allow you to restrict an XML document node to a specific
datatype. Function ora:instanceof can be used to restrict a node to
a datatype or subtype, and ora:instanceof-only restricts a node to
the datatype only. A subtype is a feature than can be used to extend
or restrict a type. Syntax: ora:instanceof(nodeset-expr, typename [, schema-url])
Syntax: ora:instanceof-only(nodeset-expr, typename [, schema-url])
In the above syntax, nodeset-expr is usually a relative XPath
expression. This function returns True if the datatype of any node
matches typename and False if it does not. Typename can be qualified
with a namespace prefix.
T hese functions will return False for non-schema-based data
because it does not have a datatype associated with elements and
attributes .
Querying XML Data in the XML DB Repository Using fn:doc and
fn:collection Functions
There are two important XQuery functions that can be used to
query all of the resources in the XML DB Repository. Fn:doc is an
XQuery function that can obtain a repository file containing XML
data. This file resource is pointed at by its URI argument. XQuery
variables can be bound to data using the FLWOR expressions FOR and
LET. XQuery function fn:doc can be used to read a single XML
document stored in the XML DB repository.
The second XQuery function that queries resources in the
repository is fn:collection. This can return a number of similar
documents stored in the same folder in the repository.
Let’s create repository resources to illustrate how fn:doc and
fn:collection work with some simple examples. We will start by
creating a resource using the DBMS_XDB PL/SQL package. This can be
used to manage resources in XML DB. The createResource procedure is
used in this case to create new file resources that contain the
Orders and Partys XML documents.
Create the orders and partys resource names used in the examples
below. There is one additional resource created named
ordersnamespace.xml, which shows an example of a namespace being
used. DECLARE
res BOOLEAN;
ordersxmlstring VARCHAR2(500):=
'<?xml version="1.0"?>
<orders>
<order orderno="15" partyno="1111" itemname="Widget" amt="5000"/>
<order orderno="25" partyno="1111" itemname="Do dad" amt="2000"/>
<order orderno="35" partyno="2222" itemname="All purpose item" amt="7000"/>
<order orderno="45" partyno="3333" itemname="The best thing" amt="15000"/>
</orders>';
partysxmlstring VARCHAR2(500):=
'<?xml version="1.0"?>
<partys>
<party partyno="1111" partyname="ABC Corp" partycity="Toronto"/>
<party partyno="2222" partyname="Freds Inc" partycity="Chicago"/>
<party partyno="3333" partyname="Gofaster Corp" partycity="Montreal"/>
</partys>';
ordersxmlnsstring VARCHAR2(500):=
'<?xml version="1.0"?>
<orders xmlns="http://order.com">
<order orderno="15" partyno="1111" itemname="Widget" amt="5000"/>
<order orderno="25" partyno="1111" itemname="Do dad" amt="2000"/>
<order orderno="35" partyno="2222" itemname="All purpose item" amt="7000"/>
<order orderno="45" partyno="3333" itemname="The best thing" amt="15000"/>
</orders>';
BEGIN
res := DBMS_XDB.createResource('/public/orders.xml', ordersxmlstring);
res := DBMS_XDB.createResource('/public/partys.xml', partysxmlstring);
res := DBMS_XDB.createResource('/public/ordersnamespace.xml', ordersxmlnsstring);
END;
We can see the resources we just created by querying the
resource_view, as shown below: SQL> select any_path, res from resource_view where any_path like '%partys%';
ANY_PATH
RES
/public/partys.xml
<Resource xmlns="http://xmlns.oracle.com/xdb/XDBResource.xsd">
<CreationDate>2006-06-19T17:12:00.414000</CreationDate>
<ModificationDate>2006-06-19T17:12:00.414000</ModificationDate>
<DisplayName>partys.xml</DisplayName>
<Language>en-US</Language>
<CharacterSet>WINDOWS-1252</CharacterSet>
<ContentType>text/xml</ContentType>
<RefCount>1</RefCount>
</Resource>
Next, display a single partys.xml document using fn:doc. SELECT XMLQuery('for $p in fn:doc("/public/partys.xml")
return $p'
RETURNING CONTENT) partys FROM DUAL;
PARTYS
---------------------------------------------------------------------
<partys><party partyno="1111" partyname="ABC Corp"
partycity="Toronto"/><party partyno="2222" partyname="Freds Inc"
partycity="Chicago"/> <party partyno="3333" partyname="Gofaster Corp"
partycity="Montreal"/></partys>
Fn:collection can be used a little differently than fn:doc by
returning all of the documents stored in the same folder in the
repository. The output produced from this statement was not
pretty-printed, and the formatting below was done manually to make
the document more readable. SELECT XMLQuery('for $p in fn:collection("/public")
return $p'
RETURNING CONTENT) collection_public FROM DUAL;
COLLECTION_PUBLIC
----------------------------------------------------------------------
<orders>
<order orderno="15" partyno="1111" itemname="Widget" amt="5000"/>
<order orderno="25" partyno="1111" itemname="Do dad" amt="2000"/>
<order orderno="35" partyno="2222" itemname="All purpose item" amt="7000"/>
<order orderno="45" partyno="3333" itemname="The best thing" amt="15000"/>
</orders>
<orders xmlns="http://order.com">
<order orderno="15" partyno="1111" itemname="Widget" amt="5000"/>
<order orderno="25" partyno="1111" itemname="Do dad" amt="2000"/>
<order orderno="35" partyno="2222" itemname="All purpose item" amt="7000"/>
<order orderno="45" partyno="3333" itemname="The best thing" amt="15000"/>
</orders>
<partys>
<party partyno="1111" partyname="ABC Corp" partycity="Toronto"/>
<party partyno="2222" partyname="Freds Inc" partycity="Chicago"/>
<party partyno="3333" partyname="Gofaster Corp" partycity="Montreal"/>
</partys>
There are many other DBMS_XDB procedures and functions that can
be used to manage all XML DB resources, such as providing the
ability to delete resources and create folders. These can be found
in the Oracle XML DB Developer’s Guide 10g Release
2, on OTN.
XMLQuery() and XMLTable()
The XMLQuery() and XMLTable() functions were introduced in Oracle
Database 10g Release 2. They provide a powerful interface
between SQL and XML, allowing us to query, construct, and transform
relational data as if it’s XML and XML data as if it’s relational.
In general, we use XMLQuery() to generate an XML document from
relational data and XMLTable() to create relational views from XML
data. These are not yet part of ANSISQL but are
expected to be included in a future standard.
XMLQuery() function. This is used to construct
XML data and query it using XQuery. It enables the execution of
XQuery expressions in a SQL context. We can also query relational
data by creating a dynamic XML view over relational data using
ora:view and are able to operate on parts of an XML document rather
than on the entire document.
XMLQuery() takes an XQuery expression as a string literal and an
optional XQuery “context item” as a SQL expression. This expression
sets the XPath context in which the XQuery expression will be
evaluated. The function returns the result of the XQuery expression
as an XMLType instance. The XMLQuery() function can also take in SQL
expressions as arguments, where the values are bound to XQuery
variables when the expression is evaluated. The results are returned
as an XMLType instance.
Let’s look at two examples where we use XMLQuery() on relational
and XML data.
• Example combining XMLQuery() with ora:view and
FLWOR expressions. In the first example, we will use
XMLQuery() on physical relational tables. XMLQuery() must act on XML
data, and this is accomplished using the ora:view feature on
relational tables Employee and Department in the HR schema. Once
ora:view is used on both relational tables, they appear as XML and
we are able to then use XQuery expressions, including a nested FLWOR
expression.
The query below is doing the following: For each department, get
the department id, and for all employees that match the department
id with a commission greater than 30 percent, return the employee
first and last name. Note that the FOR expression is used twice in
this query. SELECT XMLQuery(
'FOR $dep in ora:view("DEPARTMENTS")/ROW
RETURN <Department id="{$dep/DEPARTMENT_ID}">
<Employee>
{FOR $emp in ora:view("EMPLOYEES")/ROW
WHERE $emp/DEPARTMENT_ID eq $dep/DEPARTMENT_ID
and $emp/COMMISSION_PCT > .3
RETURN ($emp/FIRST_NAME, $emp/LAST_NAME)}
</Employee>
</Department>'
RETURNING CONTENT) HIGH_COMMISSION_EMP_NAMES FROM DUAL;
HIGH_COMMISSION_EMP_NAMES
---------------------------------------------------------------
<Department id="10"><Employee></Employee></Department><Department
id="20"><Employee></Employee></Department><Department
id="30"><Employee></Employee></Department><Department
id="40"><Employee></Employee></Department><Department
id="50"><Employee></Employee></Department><Department
id="60"><Employee></Employee></Department>
The result, as you can see, is not pretty-printed. The query
syntax is slightly complex but shows that we have the ability to
transform relational data into XML data with ora:view and that we
can perform a join operation on XML documents and apply predicates
on the data inside the documents.
• Example using XMLQuery() with an XMLType column and
FLWOR expressions. Table invoicexml_col, which we
created above and into which we inserted a document, contains an
XMLType column inv_doc. The Invoice data that is stored in this
column is in XML format. In this query, we will pass XMLType column
inv_doc to XQuery using the XMLQuery() function with the PASSING
clause. Notice how we can return some specific fields from the
Invoice document based on WHERE predicates. In effect, we are able
to get inside the XML document and retrieve specific fields based on
predicates rather than only being able to see this document as a
CLOB.
The SELECT statement below applies to all of the rows of
invoicexml_col. We then iterate across all the invoice rows with the
FOR statement. The WHERE predicate picks out the elements from
Zipcode 12345, and we then RETURN the City, State, and Zipcode and
whether the City and State are entered correctly for this Zipcode.
An IF…THEN…ELSE construct shows some additional power to this
syntax. Select XMLQuery(
'FOR $i in /Invoice
WHERE $i/MailAddressTo/Zipcode = 12345
RETURN <Details>
<Zipcode num="{$i/MailAddressTo/Zipcode}"/>
<CityName char="{$i/MailAddressTo/City}"/>
<City>{IF ($i/MailAddressTo/City = "New York")
THEN "Correct City"
ELSE "Incorrect City"}
</City>
<State>{if ($i/MailAddressTo/State = "NY")
then "Correct State" else "Incorrect State"}
</State>
</Details>'
PASSING inv_doc RETURNING CONTENT) ny_invoice
FROM invoicexml_col;
NY_INVOICE
<Details><Zipcode num="12345"></Zipcode>
<CityName char="New York"></CityName>
<City>Correct City</City>
<State>Correct State</State>
</Details>
Both queries illustrate the power we have to look inside and
process XML documents in a detailed and piece-wise manner. The
functionality provided is similar to what we’ve been able to do with
relational data—impressive!
XMLTable() function. This function enables an
XML value to be interpreted as a table or a set. It’s used to return
a table and columns from the evaluation of XQuery expressions,
rather than returning a sequence as XQuery would normally do.
XMLType data can be queried and the XML results split or shredded
into relational format—think of this as creating a virtual table.
The virtual table can then be used to insert data into other tables
or can be queried. Relational views can also be constructed over XML
data. The XMLTable() function can also be used in a SQL From clause.
• Example using XMLTable() with the COLUMNS
clause. We will use our invoicexml_col table,
introduced earlier, to illustrate how XMLTable() can be used to
transform XML data into relational format. In the example below,
XMLTable() accesses the Invoice document that is stored in column
inv_doc. The paths of the data elements we want are mapped to new
column names and formats using the COLUMNS
clause.The XMLTable() function returns the data as
a virtual table, and the result of this query is the same as if we
had queried a relational table. Note the use of the WHERE clause at
the bottom of the query that allows us to filter XML data in the
exact same way we would with any SQL query written for relational
data.
The query and output are shown below: SELECT inv_id, a.PersonName, a.StreetName, a.CityName, a.State, a.Zipcode
FROM invoicexml_col,
XMLTABLE('/Invoice'
PASSING invoicexml_col.inv_doc
COLUMNS
PersonName varchar2(10) PATH '/Invoice/MailAddressTo/Person',
StreetName varchar2(20) PATH '/Invoice/MailAddressTo/Street',
CityName varchar2(10) PATH '/Invoice/MailAddressTo/City',
State varchar2(5) PATH '/Invoice/MailAddressTo/State',
Zipcode varchar2(7) PATH '/Invoice/MailAddressTo/Zipcode'
) a
WHERE a.CityName like 'New%';
INV_ID PERSONNAME STREETNAME CITYNAME STATE ZIPCODE
1 Joe Smith 10 Apple Tree Lane New York NY 12345
Getting XML to Perform
We’ve already seen that we can perform queries that look inside
XML documents. The next step is to think about ways to make these
queries perform quickly. XML queries can be explained and indexes
created to help improve the performance of XML data access. In
particular, we can improve the performance of XPath functions in a
manner similar to the way we’ve tuned SQL in the past. Also, as with
SQL tuning, there are some situations where restructuring XML
queries can also help to change and improve access paths.
Function-based indexes can be used on structured and unstructured
XMLType tables whether they are schema-based or not. We may also
want to take advantage of binary indexes for functions such as
existsNode that only return flags of 0 or 1 depending on whether the
predicate evaluates to True or False.
In this section, we’ll see examples of how structured and
unstructured indexing can potentially impact performance through
explain plans. We will first look at the optimization of an XQuery
expression that uses ora:view with relational tables.
Optimizing relational data in XQuery
expressions. Let’s look at a tuning example on a similar
query to one we’ve already examined . This query uses ora:view to
access r elational d ata and takes advantage of FLWOR expressions.
The relational tables Employees and Departments from the HR schema
will be joined and department info returned for all employees who
had a commission > .3. Note that the attribute names (such as JOB
in $emp/ROW/COMMISSION_PCT) are case-sensitive.
In the query below, FOR allows us to iterate over row elements
Employees in Departments. Employees rows are bound to variable $emp,
and Departments rows are bound to $dep. WHERE performs a join
between the two tables and selects all employees who had a
commission_pct > .3. RETURN returns the department information.
The query and explain plans are shown below: explain plan for
SELECT XMLQuery('for $emp in ora:view("EMPLOYEES"), $dep in ora:view("DEPARTMENTS")
where $emp/ROW/DEPARTMENT_ID = $dep/ROW/DEPARTMENT_ID
and $emp/ROW/COMMISSION_PCT > .3
return $dep'
RETURNING CONTENT) AS high_commission_employees
FROM DUAL;
QUERY_PLAN OBJECT_NAME COST BYTES LEVEL
SELECT STATEMENT 2 1
SORT 82 2
HASH JOIN 5 328 3
TABLE ACCESS EMPLOYEES 2 104 4
TABLE ACCESS DEPARTMENTS 2 1512 4
FAST DUAL 2 2
As you can see, the access path that the optimizer chose does not
use indexes. Because ora:view allows us to return results from a
relational query as XML elements, we can simply create b-tree
indexes on the relational tables and see if this can improve
performance. Create index emp_idx1 on employees (department_id, commission_pct);
Create index dept_idx1 on departments (department_id);
QUERY_PLAN OBJECT_NAME COST BYTES LEVEL
SELECT STATEMENT 2 1
SORT 82 2
TABLE ACCESS DEPARTMENTS 1 56 3
NESTED LOOPS 2 328 4
INDEX EMP_IDX1 1 104 5
INDEX DEPT_IDX1 0 5
FAST DUAL 2 2
The indexes are now used, and cost decreased very slightly. Our
tuning of this XQuery expression using ora:view is very similar to
our tuning of standard, non-XML SQL. As with standard SQL, rewriting
the query and also using features such as bind variables rather than
hard-coded string values can help improve performance.
Optimizing structured XMLType data. XPath
rewrite can optimize structured (object-relational) storage
techniques. The optimizer can make one other internal performance
improvement: It can change XPath-based functions into relational
statements when the following are true:
- The XMLType data uses a registered XMLSchema. This is another
advantage of using schema-based data.
- Attributes map to underlying SQL (relational) tables.
- Structured storage is used.
This provides the possibility of query rewrite for performance
reasons. It also allows relational performance tuning techniques to
be used, as shown above.
Now, let’s optimize a query on structured, nonrelational XMLType
data. We will start with a very simple example using the
invoicexml_tbl table created earlier. explain plan for
select extract(object_value, '/Invoice/MailAddressTo')
from invoicexml_tbl
where extractValue(object_value, '/Invoice/MailAddressTo/Person')= 'Joe Smith';
The access path is shown below.
QUERY_PLAN OBJECT_NAME COST BYTES LEVEL
SELECT STATEMENT 3 87 1
TABLE ACCESS INVOICEXML_TBL 3 87 2
A function-based index on the predicate used in this query can be
created. The index must use the exact same syntax as the query that
will use it. This index will be created and the SQL explained one
more time to see if the new index is used. Create index invoicexml_tbl_idx1 on invoicexml_tbl
(extractValue(object_value, '/Invoice/MailAddressTo/Person'));
QUERY_PLAN OBJECT_NAME COST BYTES LEVEL
SELECT STATEMENT 1 87 1
TABLE ACCESS INVOICEXML_TBL 1 87 2
INDEX INVOICEXML_TBL_IDX1 1 3
The new function-based index is used, and the cost of the query
has decreased as a result.
Optimizing unstructured XMLType data. We’ll
now perform the exact same test with unstructured XMLType data.
Table invtest_unstruct will be created in exactly the same manner as
we created invoicexml_tbl, except that we will add the “XMLType
store as CLOB” syntax to ensure that this is defined as unstructured
data. The data definition language (DDL), insert, and explain plan
are below : create table invtest_unstruct of XMLtype
XMLType store as CLOB;
Insert into invtest_unstruct values (
XMLType(bfilename('XMLDIR', 'invoicexml.txt'),
nls_charset_id('WE8MSWIN1252') ));
explain plan for
select extract(object_value, '/Invoice/MailAddressTo')
from invtest_unstruct
where extractValue(object_value, '/Invoice/MailAddressTo/Person')='Joe Smith'
/
QUERY_PLAN OBJECT_NAME COST BYTES LEVEL
SELECT STATEMENT 2 2002 1
TABLE ACCESS INVTEST_UNSTRUCT 2 2002 2
We will now create a function-based index to see if it will be
used and if the expected cost has decreased. Note that the “create
index” syntax is the same as the syntax used in the query predicate.
Create index invtest_unstruct_idx1 on invtest_unstruct
(extractValue(object_value, '/Invoice/MailAddressTo/Person'));
QUERY_PLAN OBJECT_NAME COST BYTES LEVEL
SELECT STATEMENT 1 2002 1
TABLE ACCESS INVTEST_UNSTRUCT 1 2002 2
INDEX INVTEST_UNSTRUCT_IDX1 1 3
As you can see, the same index and query can be used effectively
with structured and unstructured data. In general, however, note
that structured data access is generally more efficient than
unstructured access.
Conclusion
Oracle Database has evolved to incorporate XML standards and data
at a rapid pace. Oracle9i Database introduced the XML DB
repository and a new datatype, XMLType, which delivered both LOB and
structured storage options. Oracle XML DB features provide the
ability to perform native XML processing in the database through a
hierarchical model that maps URIs to database objects. XPath
expressions are used to operate on individual elements of a document
through a “logical tree” using the concepts of a node and path.
XMLSchema support was also included with Oracle9i Database.
Oracle Database 10g enhanced XML support with the W3C
XML XQuery language, which includes the XMLQuery() and XMLTable()
functions. The combination of these features facilitates the
interchangeable use of relational data and XML data in Oracle
Database 10g Release 2. As this article demonstrates, SQL
queries can operate on XML data, and XML queries are now able to
access relational data. A key to this is Oracle’s SQL/XML
implementation of the XMLQuery() and XMLTable() functions.
Oracle’s standards-based approach also supports popular protocols
such as FTP, HTTP, and WebDAV to allow common client tools and
applications to access, edit, and publish XML data stored in the
database.
In this article you've learned that the skills you’ve already
acquired while working with relational data can easily be enhanced
to allow you to work with XML using these SQL/XML features. An
important example of this is our ability to enhance performance of
XML data access through Oracle’s indexing, explain, and storage
features. It’s critical that designers, developers, and DBAs keep
up-to-date with this quickly changing technology.
Tim Quinlan [tquinlan@tlqconsulting.com]
is an Oracle certified Database Administrator with experience in
Oracle 7 thru Oracle10g Release 2. He has worked with
Databases since 1981, performing the roles of DBA, architect,
designer and implementer of enterprise-wide Data Warehouse and
transactional Databases. Tim has spoken at many conferences, taught
database courses, and written feature articles for leading database
publications. His main (professional) interest is designing and
implementing very large, high-performance, high-availability
database systems.
|