DBA: XML


	DOWNLOAD

Oracle Database 10g (SE or EE)


	TAGS

xml, dba, sql, All

Getting into SQL/XML

by Tim Quinlan

Get an overview of SQL and XML interoperability and learn how to begin working with XML documents stored in an Oracle database.

Published December 2006

XML data is commonly used in today’s production systems and is a major part of database implementations. In the past, this fact caused design problems for database developers who consequently had to store, query, and update XML data as unstructured LOBs, or alternatively, “shred” data into relational tables and then put it back together. This approach resulted in programming complications and inefficiencies because the access mechanisms were immature.

The Oracle XML DB feature that was first delivered with Oracle9i Database Release 2 provided groundbreaking features for storing, retrieving, and manipulating XML data in the database. Oracle 10g Release 2 XML DB, however, greatly extends this initial approach in which XML data is little more than a BLOB in the database.

If you are a DBA or a developer new to the implementation of XML in Oracle, this article is for you: you will get an overview of SQL and XML interoperability and learn how to begin working with XML documents stored in an Oracle 10g Release 2 database, and about the many features of Oracle Database 10g Release 2 that DBAs and developers should become comfortable with before beginning to work with XML in the database. Thanks to these SQL/XML features, the skills you've already acquired while working with relational data can easily be enhanced to help you work with XML.

What is Oracle’s implementation of XML?

Oracle has implemented XML through the XML DB component of the database, which is a standard feature in Enterprise Edition as well as Standard Edition. Oracle XML DB is easily installed using either the Database Configuration Assistant (DBCA) or by running a catalog script, catqm.sql. The XML DB product provides all the features discussed in this article.

You can verify that XML DB has been installed by simply running the SQL below:

select comp_name from dba_registry where comp_name like '%XML%';	

COMP_NAME		
Oracle XML Database

The XML DB schema and its objects can also be checked to see that Oracle XML DB has been installed.

SQL and XML Working Together

The SQL/XML standard has been under development to provide a mechanism that allows us to generate XML from a relational query and, conversely, provide the ability to deliver SQL data from XML documents. Oracle XML DB in Oracle Database 10g Release 2 implements SQL 2003 and features from the upcoming SQL/XML standard release. The implementation of these standards in Oracle allows us to look inside an XML document in ways that were not possible in older SQL standards.

New datatype. A new datatype, XMLType, was introduced in Oracle9i Database Release 1 to allow an XML document in the database to be accessible in SQL and at the same time allow XML developers the ability to use XML standards on a document. This datatype tells the database that the content is in XML format and allows us to perform queries on an XML document. Using XMLType rather than a relational or CLOB implementation provides a layer of separation between the application and storage model. This separation can allow data to move to a different storage model without being tied to a CLOB or relational model. XMLType can be used to create a table, column, or view. It can also be used as a datatype for parameters and variables.

Built-in XML methods operate on the content of a document by allowing us to create, extract, and index XML data. Indexing can be performed using b-tree, text indexing, and function-based indexes. In effect, XMLType data combined with XPath access can be used to look inside a document. This functionality is provided through PL/SQL and Java APIs. XMLType can be used in PL/SQL, in Java using JDBC, and in the Oracle Data Provider for .Net. This powerful and relatively new datatype will be used extensively throughout this article.

Storage structures. Oracle’s XML implementation gives us the flexibility to store data in two different ways: as structured and unstructured storage. XMLType data is unstructured when stored as a single LOB column and structured when implemented as a set of objects.

One specific example of this is an XMLType table, which can be implemented using either of the two storage models. An XMLType table is unstructured when implemented as a LOB by using the “XMLType store as CLOB” syntax shown below:

create table table_name of XMLtype
XMLType store as CLOB;

Let’s focus on the structured and unstructured storage for XML data and look at these in more detail.

Structured storage. Structured XML storage is implemented as a set of objects. These objects can be implemented in a relational format through tables with referential constraints optionally implemented between the tables. They can also be implemented by using an XMLSchema to decompose an XMLType document into a set of objects.

In the case where relational tables are used, the tables may be designed in advance for this purpose or, alternatively, existing tables can be used. The document is in effect stored as a “virtual” document through the relational tables, with its logical structure maintained. This approach maintains Document Object Model (DOM) fidelity, although it is not a byte-for-byte physical representation of the document. This can be done by creating an XMLType view over existing relational data.

Structured storage has some performance advantages over unstructured storage and can be chosen to provide more query and update optimization through the table and index design. XML operations on structured storage can help to reduce memory and storage because XML tags are not stored and because there is potentially a more-granular level of data retrieval and usage. Index usage with b-tree and function-based indexes is enhanced, and in-place updates can be performed on pieces of the document using XPath rewrite. All of this helps performance and is discussed in more detail later in this article.

There are also some disadvantages to using structured storage. Inserting and retrieving an entire document can take more overhead. There is also limited flexibility, because only documents that match the XMLSchema can be stored—although there are cases where this is actually an advantage. The document is not stored as a byte-for-byte representation of the original, and the order of data in a document is not maintained. However, no data will be lost.

Think of highly structured data as being mostly XML, where each element of the data can be clearly defined.

Unstructured storage. XML can be stored in Oracle in an unstructured manner using CLOB storage such that SQL queries will not know the structure of the data. This data can be stored using Oracle’s XMLType datatype.

Unstructured data may be chosen in cases where you want the stored data to match the exact physical representation of the document. Also, if the data is not updated often or if there is a large percentage of inserts or reads of the entire document (rather than pieces of the document), it may be more efficient to keep the data together in one spot. There are also situations where you need to keep documents flexible, and in these cases an XMLType table or column can be useful.

A disadvantage of unstructured storage is that updates to pieces of a document are generally not as efficient as with structured data. Also, XPath operations that construct DOM from CLOBs using functions can be an expensive use of system resources. Other issues with unstructured data are that SQL constraints cannot be implemented and that memory management is not as efficient.

Think of unstructured data as mostly not XML data and where most or all of the data is simply seen as a single CLOB.

Now that you’ve been introduced to the storage models, let’s take a look at some naming features.

What’s in a Namespace?

This is a large topic and one of the more complex XML DB concepts to understand. In this article, we will cover some namespace concepts to give you a gist of what these are and how they are used.

A namespace is used to describe a set or collection of related attributes or elements in an XML document. Namespaces can be used to ensure that document constructs have a universally unique name. XMLSchema in particular takes advantage of this feature, as a target namespace is usually the same as the URI of the XMLSchema. Examples of namespaces are shown below. Note that the naming looks similar to an internet URL. This naming standard is recommended by the World Wide Web Consortium (W3C) but is not required. The URI is used to uniquely identify the name of a registered XMLSchema in the database and does not need to be the physical URI where a document is located.

XMLType methods and XML functions make use of namespace prefixes. When an XML document has no target namespace, then the namespace prefix is in the noNameSpace namespace. Attribute xsi:noNamespaceSchemaLocation can be used for the schema URI.

If an element does not have another namespace prefix, a namespace can be applied by default when it applies to an element where it is declared.

There are two special namespaces that you will see regularly. These are http://www.w3.org/2001/XMLSchema, which is the overall XMLSchema namespace, and http://xmlns.oracle.com/xdb, which is the Oracle-supplied XML DB namespace. The latter namespace has functions that get written to underlying SQL functions. Some of the XPath functions are also rewritten. Examples are ora:contains, ceiling, floor, not, string-length, substring, and translate. Attributes used by XML DB also belong to this namespace.

When defining an element, we can specify a namespace for that element. In XML, we can define this as

<elementName xmlns:ab="http.name.com" />

As you see, we gave an elementName, which is the element we are working on (for example, “orders”). Xmlns is a reserved word and tells us that this is a namespace definition. The “ab” part is called a prefix. This prefix is a short name that is bound to the URI. In this case, ab is bound to ‘http.name.com’, and ab can be prefixed to an element name.

Below we will create a very small schema and declare a bind prefix xs to http://www.w3.org/2001/XMLSchema and prefix xdb to http://xmlns.oracle.com/xdb . In order to declare the namespace, we use xmlns to bind them. Notice that the xs prefix is used in this case on the schema element names.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
           xmlns:xdb= "http://xmlns.oracle.com/xdb" version="1.0">
<xs:element name="INVOICESCHEMA" xdb:defaultTable="INVOICESCHEMA">
   <xs:complexType>
        <xs:sequence>
             <xs:element name="MailAddressTo">
                   <xs:complexType>
                         <xs:sequence> 
                              <xs:element name="Person" type="xs:string"/>
                              <xs:element name="Street" type="xs:string"/>
                         </xs:sequence>  
                       <xs:attribute name="id" type="xs:string" use="required"/>  
                   </xs:complexType>  
             </xs:element>
        </xs:sequence>
  </xs:complexType>
</xs:element>
</xs:schema>

We’ve only touched the surface of namespaces here. Refer to the Oracle XML DB Developer’s Guide 10g Release 2, on Oracle Technology Network (OTN), for more information on this topic.

Working with XML

Getting data into an XMLType table and column. As you may expect, there are many options available to us to get XML data into the database. Data can be inserted with SQL, PL/SQL, Java, and C programs. We can also load data into the database using the SQL*Loader and Import/Export functions of XMLType tables. XML data can also be moved with Oracle Streams and can also use DBMS_XDB functions. Let’s explore one useful and simple way to load XML data.

In this example, an XML document will be loaded from a file into a table using a SQL insert to load the XML document from a bfile. This file needs to be accessed through a database directory. To begin this, we’ll create a directory pointing to a file where the XML document file is stored. Note that to perform the examples in this article, you must ensure that role xdbadmin has been granted to the user you are working with.

create or replace directory xmldir as '/u01/app/oracle/admin/db01/sql';

Now we can create tables that can contain XMLType data in two ways. The first will be a table with a column of XMLType. The second will be a table defined as type XMLType.

1. Create a table with an XML column.

create table invoiceXML_col (
inv_id number primary key,
inv_doc XMLType);

2. Create an XML table.

create table invoiceXML_tbl of XMLtype;

Each of the above statements implicitly creates two indexes in each table—one for the primary key and one for the LOB.

Now we will insert an invoice document into both tables. Note that we are using character set AL32UTF8 and will specify this to pass the character set encoding to be used for the file being read. Data will be inserted into both tables to show the similarity between the SQL statements.

Insert into invoicexml_col values (1, 
XMLType(bfilename('XMLDIR', 'invoicexml.txt'), 
nls_charset_id('AL32UTF8') )); 

Insert into invoicexml_tbl values 
(XMLType(bfilename('XMLDIR', 'invoicexml.txt'),
 nls_charset_id('AL32UTF8')));

We now have two XML documents loaded into the database using an XMLType table and column. It’s as simple as that.

Registering and Implementing Schema-Typed Data

As if dealing with namespaces and storage techniques were not enough, XMLType data can also be implemented as schema-typed or untyped data. An XMLSchema associates a document with an XMLSchema that provides information about the document structure and contents. This provides better documentation, validation, and controls around a document. The information in the schema can be used to deliver more-efficient query and update processing. An XMLSchema also allows an XML document to be stored as structured storage data, because the document can be decomposed into a set of objects through the XMLSchema. The object model used to store the document is determined by the schema definition. XMLType methods schemaValidate() and isSchemaValid() allow editing of an XMLType document using the schema definition.

How do we create a schema and use this with an XML document? To do this, we need to create an XMLSchema definition

Register the schema
Create an XMLSchema-based table
Insert data into it

The example below shows an implementation of these in four steps.

1. First, create a schema definition in file invoiceformtest.xsd in directory XML_DIR. The listing is below:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
           xmlns:xdb=http://xmlns.oracle.com/xdb version="1.0">
<xs:element name="INVOICEFORMTEST" xdb:defaultTable="INVOICEFORMTEST">
   <xs:complexType>
      <xs:sequence>
         <xs:element name="MailAddressTo">
            <xs:complexType>
               <xs:sequence> 
                  <xs:element name="Person" type="xs:string"/>
                  <xs:element name="Street" type="xs:string"/>
                  <xs:element name="City" type="xs:string"/>
                  <xs:element name="State" type="xs:string"/>
                  <xs:element name="Zipcode" type="xs:string"/>
               </xs:sequence>    
               <xs:attribute name="id" type="xs:string" use="required"/>  
         </xs:element>
      </xs:sequence>
   </xs:complexType>
</xs:element>
</xs:schema>

2. Register the above schema in the database using procedure DBMS_XMLSCHEMA.registerSchema. The username you are using to connect to the database will require that the alter session privilege be granted to it so that the schema can be successfully registered.

BEGIN
  DBMS_XMLSCHEMA.registerSchema(
    SCHEMAURL => 'http://xmlns.oracle.com/xdb/invoiceformtest.xsd',
    SCHEMADOC => bfilename('XMLDIR','invoiceformtest.xsd'),
    CSID => nls_charset_id('AL32UTF8'));
END;
/

To delete the schema, simply run the DBMS_XMLSCHEMA.deleteSchema statement, as shown below:

BEGIN
  DBMS_XMLSCHEMA.deleteSchema(
  SCHEMAURL => 'http://xmlns.oracle.com/xdb/invoiceformtest.xsd',
  DELETE_OPTION => dbms_xmlschema.DELETE_CASCADE_FORCE);
END;
/

3. The above statement also creates table invoiceformtest because of the xdb:defaultTable="INVOICEFORMTEST" statement in the schema definition above. Without this, a generated name (one that you would not want to work with) would have been created.

4. Now we are ready to enter documents into the XMLSchema-based table from an XML document that is stored in XML_DIR and is in the format specified by the implemented schema.

Insert into invoiceformtest values 
(XMLType(bfilename('XMLDIR', 'invoiceformtest.txt'),
 nls_charset_id(' AL32UTF8')));

Let’s look at what we’ve just created using the object_value pseudocolumn on the XML document.

SQL> select object_value from invoiceformtest;

OBJECT_VALUE
<INVOICEFORMTEST>
     <MailAddressTo id="1">
            <Person>Joe Smith</Person>
            <Street>10 Apple Tree Lane</Street>
            <City>New York</City>
            <State>NY</State>
            <Zipcode>12345</Zipcode>
     </MailAddressTo>
</INVOICEFORMTEST>

We’re done!

Understanding XQuery Functions and Expressions

XQuery is a standard developed by the W3C for a query language to extract information from an XML file It allows us to access physical XML documents or relational data that is virtualized as XML documents through XML views. The expressions used can be simple queries or part of a larger query, and standard functions can be included such as date/time, arithmetic, or string functions as well as user-defined functions. XQuery is new in Oracle Database 10g Release 2 and is supported in XMLQuery() and XMLTable() functions. Examples using these two functions are shown later, in the “XMLQuery() and XMLTable()” section.

XQuery makes use of XPath expressions (described below) to locate the detailed items in XML documents. This can be thought of as the foundation for the XMLQuery() and XMLTable() functions introduced in Oracle Database 10g Release 2 and covered later in this article.

Some of the more important XQuery expressions include XPath, XMLSequence, and FLWOR and are discussed below.

XPath, XMLSequence, and FLWOR Expressions

XPath. XPath provides a way to locate items in an XML document using addressing techniques, and it processes these items using a logical path through a document structure. It allows a programmer to deal with a document at a higher level of abstraction by specifying a route through a document rather than pointing at specific elements. XPath uses the concept of a node that defines where the path begins and then a “logical tree” that includes relationships such as attribute, self, parent, child, and ancestor. In other words, XPath models an XML document as a tree of nodes. There are different types of nodes, such as element nodes, attribute nodes, and text nodes. XPath can determine a way to calculate a string value for each node.

XPath expressions can be used to query and update XML documents in a standards-based way. We’ll use the extract, extractValue, existsNode, and XMLSequence functions to demonstrate XPath functionality using the invoicexml_tbl document we’ve already created and into which we’ve already inserted data.

Query the document to be used in XPath examples. Let’s take a look at searching XML (XMLType) data with SQL using XPath functions. To do this, we’ll look at the document we’ll be working with using the object_value pseudocolumn to retrieve the XML document from the XMLType table.

SQL> select object_value from invoicexml_tbl;

OBJECT_VALUE
<Invoice>
    <MailAddressTo id="PA">
        <Person>Joe Smith</Person>
        <Street>10 Apple Tree Lane</Street>
        <City>New York</City>
        <State>NY</State>
        <Zipcode>12345</Zipcode>
    </MailAddressTo>
    <MailAddressFrom id="PA">
        <Person>Ed Jones</Person>
        <Street>11 Cherry Lane</Street>
        <City>Newark</City>
        <State>NJ</State>
        <Zipcode>67890</Zipcode>
    </MailAddressFrom>
    <Details id="2006Sept1to30PA">
        <FromTo>Sept 1, 2006 to Sept 30, 2006</FromTo>
	<Hours>70</Hours>
        <Rate>30</Rate>
  	<Taxes>210</Taxes>
	<TotalDue>2310</TotalDue>
        <InvDate>Oct 1, 2006</InvDate>
        <Contractor>Ed Jones</Contractor>
    </Details>
</Invoice>

Using extract. Using extract, we can select an individual node and its leaf nodes from the document by combining extract with object_value. In other words, we can look inside an XML document that’s been stored as XMLType. This is true whether we’ve used structured or unstructured data and also whether or not the data is schema-based. Let’s extract the MailAddressTo node and its leaf nodes.

select extract(object_value, '/Invoice/MailAddressTo') from invoicexml_tbl;

EXTRACT(OBJECT_VALUE,'/INVOICE/MAILADDRESSTO')
<MailAddressTo id="PA"><Person>Joe Smith</Person><Street>10
Apple Tree Lane</Street><City>New York</City><State>NY</Stat
e><Zipcode>12345</Zipcode></MailAddressTo>

As you can see, the output includes the MailAddressTo portion of the document and is not pretty-printed (formatted). Also, the syntax used to accomplish this is very simple. What’s important is that we were able to look inside the document without dumping the entire thing.

Using extractValue. The data value that exists in a leaf node can be extracted using extractValue. A higher - level node such as MailAddressTo cannot be extracted using this function. Note that the output of this is not in XML-syntax format and contains simply the data value.

select extractValue(object_value, '/Invoice/MailAddressTo/Person')
 Person from invoicexml_tbl;

PERSON
Joe Smith

Using existsNode. ExistsNode is used in a similar fashion, to search for specific values at the node level (and only the node level) of a document. It returns a True or False flag to specify whether a search was successful. The = 1 predicate is not a count but rather represents a True condition, and = 0 is False.

Select count(*) from invoicexml_tbl
where existsNode(
object_value, '/Invoice/MailAddressTo[Person="Joe Smith"]') = 1;

COUNT(*)
      1

Using XMLSequence. Unlike extractValue, which can only extract a value from a single node, XMLSequence can be used to look at multiple nodes or a fragment of a document. It does this by creating a virtual table that consists of XMLType objects. Let’s compare extractValue with XMLSequence using the MailAddressTo branch node.

select extractValue(object_value, '/Invoice/MailAddressTo') from invoicexml_tbl;

from invoicexml_tbl
     *
ERROR at line 2:
ORA-19025: EXTRACTVALUE returns value of only one node

This ORA-19025 message is self-explanatory. We can fortunately overcome this by restructuring the query and using XMLSequence, as shown below:

select value(addr) 
from invoicexml_tbl i,
    table(XMLSequence(
    extract(i.object_value, '/Invoice/MailAddressTo'))) addr
where existsNode(i.object_value, '/Invoice/Details[@id="2006Sept1to30PA"]') = 1;

VALUE(ADDR)
<MailAddressTo id="PA"><Person>Joe Smith</Person><Street>10
Apple Tree Lane</Street><City>New York</City><State>NY</Stat
e><Zipcode>12345</Zipcode></MailAddressTo>

Because XMLSequence creates a virtual table, we can also use this function on a leaf node.

select value(person)
from invoicexml_tbl i,
    table(XMLSequence(
    extract(i.object_value, '/Invoice/MailAddressTo/Person'))) person
where existsnode(i.object_value, '/Invoice/Details[@id="2006Sept1to30PA"]') = 1;

VALUE(PERSON)
<Person>Joe Smith</Person>

FLWOR. FOR, LET, WHERE, ORDER BY, and RETURN (FLWOR; pronounced “flower”) is one of the most important and powerful expressions in XQuery syntax. Either FOR or LET must exist, WHERE and ORDER BY are optional, and RETURN is mandatory.FLWOR on its own is a large topic to cover. This section gives you an introduction to the power of this statement.

FOR binds one or more variables in an iterative manner, in the order that the variables are listed. Values listed previously can then be used in a subsequent set of values. This also works similarly to a SQL From clause. Values listed previously can then be used in a subsequent set of values, as shown below:

For $var in (1,2,3) , $varPlus5 in (5+$var, 5+$var, 5+$var)

The three iterations set $var and $varPlus5 to 1,6; 2,7; and 3,8, respectively.

LET, like FOR, binds variables iteratively, and values can be computed using previously calculated values. As with FOR, LET can be thought of as a SQL FROM clause. LET can also be used to perform joins.

WHERE filters data in the same manner as a SQL WHERE clause.

ORDER BY optionally sorts the data.

RETURN returns the final result set from the filtered and ordered FLWOR expression.

FLWOR being used with XMLQuery(). Let’s look at an example where we query and join two documents: partys.xml to an orders.xml document on the Party key. This XML data is in the Oracle XML DB Repository. To do this, we will use XMLQuery(); FLWOR; and XQuery functions doc, count, avg, and integer. These are in the namespace for built-in XQuery functions, http://www.w3.org/2003/11/xpath_functions.

The query below reads as follows: Using function fn:doc, FOR all partyno attributes in partys.xml, join (LET) all order elements in orders.xml that match on partyno (variable $p was bound in the FOR statement). This produces a stream of items ($p and $o), with $p representing a party number and $o a set of orders for that party. Get the items WHERE there is more than 1 order. ORDER BY the average amt descending using XQuery function avg in namespace fn. Amt is attached to order element $o. Return the party number (bound to $p) and child element ordercount.

SELECT XMLQuery()('for $p in fn:doc("/public/partys.xml")/partys/party/@partyno
                 let $o := fn:doc("/public/orders.xml")/orders/order[@partyno = $p]
                 where fn:count($o) > 1
                 order by fn:avg($o/@amt) descending
                 return <big-party>{$p,
                              <ordercount>{fn:count($o)}</ordercount>,
                              <avgamt>{xs:integer(fn:avg($o/@amt))}</avgamt>}
                   </big-party>'
                RETURNING CONTENT) ORDERS FROM DUAL;


ORDERS
<big-party>1111<ordercount>2</ordercount><avgamt>3500</avgamt></big-party>

Yes, this query is doing a lot and shows some of what’s possible when using FLWOR expressions on XML documents.

XQuery ora: Functions

Oracle XML DB provides five more XQuery functions that developers can use. These are implemented in the http://xmlns.oracle.com/xdb namespace, which uses the prefix ora:. Along with these are two other XPath functions. These are not yet part of the SQL/XML standard but are expected to be in the future. The ora:view function is a particularly valuable one that can be used to transform relational data into XML format. All of these are described below.

ora:view XQuery function. This is a valuable function to create a view over relational tables. This can make them appear as if they are XML documents.

Syntax: 	ora:view ([db-schema STRING, ] db_table STRING)
		RETURNS document-node (element())

Examples that use ora:view are shown later in the article.

ora:contains XQuery function. The namespace prefix “ora:” and name “contains” are used together to create an XPath function that can be used as an XPath argument to the SQL/XML existsNode, extract, and extractValue functions.This allows you to perform a structural search with a text predicate.

This function returns a positive number if text_query matches the input_text. If they do not match, then 0 is returned. When they are used with existsNode, extract, or extractValue, you need to include the namespace mapping parameter xmlns:ora="http://xmlns.oracle.com/xdb". The syntax for ora:contains is shown below. The owner is the current user in cases where a policy_owner is not defined. The policy_name is the name of the matching rules that will be applied. The default matching rules are defined by ctxsys.default_policy_oracontains.

Syntax: ora:contains (input_text, text_query [, policy_name] [, policy_owner])

The example below shows how we can use ora:contains as an argument to existsNode.

SELECT count(*) FROM invoicexml_tbl
  WHERE existsNode(object_value, '/Invoice/MailAddressTo/Person
                     [ora:contains(text(), "Joe Smith") > 0]', 
                   'xmlns:ora="http://xmlns.oracle.com/xdb"') = 1;

COUNT(*)
      1

ora:matches XQuery function. This function uses regular expressions to match text. This is similar to the SQL REGEXP_LIKE condition except that it uses XQuery arguments instead of SQL datatypes. True is returned if the target_string matches the regular expression match_pattern ; if not, False is returned. As you can see in the syntax below, match_parameter can be added to specify additional criteria to use in a search. An example of this is to provide case sensitivity.

Syntax: ora:matches (target_string, match_pattern [, match_parameter])

ora:replace XQuery function. This extends the ora:matches function by replacing the target_string with the replace_string if the match_pattern is met ( True ). As with ora:matches, this uses regular expressions to match the text.

Syntax: ora:replace (target_string, match_pattern,
 replace_string [, match_parameter])

ora:sqrt XQuery function. As you may expect from the name, this returns the square root value of the provided number.

Syntax: ora:sqrt (number)

XPath extension functions: ora:instanceof and ora:instanceof-only. Oracle XML DB can support schema-based data when the datatypes of attributes and elements are known. Because XPath 1.0 is not aware of datatype information, there are XML DB extension functions in namespace http://xmlns.oracle.com/xdb that allow you to restrict an XML document node to a specific datatype. Function ora:instanceof can be used to restrict a node to a datatype or subtype, and ora:instanceof-only restricts a node to the datatype only. A subtype is a feature than can be used to extend or restrict a type.

Syntax: ora:instanceof(nodeset-expr, typename [, schema-url])

Syntax: ora:instanceof-only(nodeset-expr, typename [, schema-url])

In the above syntax, nodeset-expr is usually a relative XPath expression. This function returns True if the datatype of any node matches typename and False if it does not. Typename can be qualified with a namespace prefix.

T hese functions will return False for non-schema-based data because it does not have a datatype associated with elements and attributes .

Querying XML Data in the XML DB Repository Using fn:doc and fn:collection Functions

There are two important XQuery functions that can be used to query all of the resources in the XML DB Repository. Fn:doc is an XQuery function that can obtain a repository file containing XML data. This file resource is pointed at by its URI argument. XQuery variables can be bound to data using the FLWOR expressions FOR and LET. XQuery function fn:doc can be used to read a single XML document stored in the XML DB repository.

The second XQuery function that queries resources in the repository is fn:collection. This can return a number of similar documents stored in the same folder in the repository.

Let’s create repository resources to illustrate how fn:doc and fn:collection work with some simple examples. We will start by creating a resource using the DBMS_XDB PL/SQL package. This can be used to manage resources in XML DB. The createResource procedure is used in this case to create new file resources that contain the Orders and Partys XML documents.

Create the orders and partys resource names used in the examples below. There is one additional resource created named ordersnamespace.xml, which shows an example of a namespace being used.

DECLARE
  res BOOLEAN;
  ordersxmlstring VARCHAR2(500):= 
    '<?xml version="1.0"?>
     <orders>
       <order orderno="15" partyno="1111" itemname="Widget" amt="5000"/>
       <order orderno="25" partyno="1111" itemname="Do dad" amt="2000"/>
       <order orderno="35" partyno="2222" itemname="All purpose item" amt="7000"/>
       <order orderno="45" partyno="3333" itemname="The best thing" amt="15000"/>
     </orders>';
  partysxmlstring VARCHAR2(500):=
    '<?xml version="1.0"?>
     <partys>
       <party partyno="1111" partyname="ABC Corp" partycity="Toronto"/>
       <party partyno="2222" partyname="Freds Inc" partycity="Chicago"/>
       <party partyno="3333" partyname="Gofaster Corp" partycity="Montreal"/>
     </partys>';
  ordersxmlnsstring VARCHAR2(500):=
    '<?xml version="1.0"?>
     <orders xmlns="http://order.com">
       <order orderno="15" partyno="1111" itemname="Widget" amt="5000"/>
       <order orderno="25" partyno="1111" itemname="Do dad" amt="2000"/>
       <order orderno="35" partyno="2222" itemname="All purpose item" amt="7000"/>
       <order orderno="45" partyno="3333" itemname="The best thing" amt="15000"/>
     </orders>';
BEGIN
  res := DBMS_XDB.createResource('/public/orders.xml',   ordersxmlstring);
  res := DBMS_XDB.createResource('/public/partys.xml',  partysxmlstring);
  res := DBMS_XDB.createResource('/public/ordersnamespace.xml', ordersxmlnsstring);
END;

We can see the resources we just created by querying the resource_view, as shown below:

SQL>  select any_path, res from resource_view where any_path like '%partys%';
ANY_PATH
RES
/public/partys.xml
<Resource xmlns="http://xmlns.oracle.com/xdb/XDBResource.xsd">
  <CreationDate>2006-06-19T17:12:00.414000</CreationDate>
  <ModificationDate>2006-06-19T17:12:00.414000</ModificationDate>
  <DisplayName>partys.xml</DisplayName>
  <Language>en-US</Language>
  <CharacterSet>WINDOWS-1252</CharacterSet>
  <ContentType>text/xml</ContentType>
  <RefCount>1</RefCount>
</Resource>

Next, display a single partys.xml document using fn:doc.

SELECT XMLQuery('for $p in fn:doc("/public/partys.xml") 
                 return $p'
                RETURNING CONTENT) partys FROM DUAL;

PARTYS
---------------------------------------------------------------------
<partys><party partyno="1111" partyname="ABC Corp" 
partycity="Toronto"/><party partyno="2222" partyname="Freds Inc" 
partycity="Chicago"/> <party partyno="3333" partyname="Gofaster Corp"
partycity="Montreal"/></partys>

Fn:collection can be used a little differently than fn:doc by returning all of the documents stored in the same folder in the repository. The output produced from this statement was not pretty-printed, and the formatting below was done manually to make the document more readable.

SELECT XMLQuery('for $p in fn:collection("/public") 
                 return $p'
                RETURNING CONTENT) collection_public FROM DUAL;

COLLECTION_PUBLIC
----------------------------------------------------------------------
<orders>
   <order orderno="15" partyno="1111" itemname="Widget" amt="5000"/>
   <order orderno="25" partyno="1111" itemname="Do dad" amt="2000"/>
   <order orderno="35" partyno="2222" itemname="All purpose item" amt="7000"/>
   <order orderno="45" partyno="3333" itemname="The best thing" amt="15000"/>
</orders>

<orders xmlns="http://order.com">
   <order orderno="15" partyno="1111" itemname="Widget" amt="5000"/>
   <order orderno="25" partyno="1111" itemname="Do dad" amt="2000"/>
   <order orderno="35" partyno="2222" itemname="All purpose item" amt="7000"/>
   <order orderno="45" partyno="3333" itemname="The best thing" amt="15000"/>
</orders>

<partys>
   <party partyno="1111" partyname="ABC Corp" partycity="Toronto"/>
   <party partyno="2222" partyname="Freds Inc" partycity="Chicago"/>
   <party partyno="3333" partyname="Gofaster Corp" partycity="Montreal"/>
</partys>

There are many other DBMS_XDB procedures and functions that can be used to manage all XML DB resources, such as providing the ability to delete resources and create folders. These can be found in the Oracle XML DB Developer’s Guide 10g Release 2, on OTN.

XMLQuery() and XMLTable()

The XMLQuery() and XMLTable() functions were introduced in Oracle Database 10g Release 2. They provide a powerful interface between SQL and XML, allowing us to query, construct, and transform relational data as if it’s XML and XML data as if it’s relational. In general, we use XMLQuery() to generate an XML document from relational data and XMLTable() to create relational views from XML data. These are not yet part of ANSISQL but are expected to be included in a future standard.

XMLQuery() function. This is used to construct XML data and query it using XQuery. It enables the execution of XQuery expressions in a SQL context. We can also query relational data by creating a dynamic XML view over relational data using ora:view and are able to operate on parts of an XML document rather than on the entire document.

XMLQuery() takes an XQuery expression as a string literal and an optional XQuery “context item” as a SQL expression. This expression sets the XPath context in which the XQuery expression will be evaluated. The function returns the result of the XQuery expression as an XMLType instance. The XMLQuery() function can also take in SQL expressions as arguments, where the values are bound to XQuery variables when the expression is evaluated. The results are returned as an XMLType instance.

Let’s look at two examples where we use XMLQuery() on relational and XML data.

• Example combining XMLQuery() with ora:view and FLWOR expressions. In the first example, we will use XMLQuery() on physical relational tables. XMLQuery() must act on XML data, and this is accomplished using the ora:view feature on relational tables Employee and Department in the HR schema. Once ora:view is used on both relational tables, they appear as XML and we are able to then use XQuery expressions, including a nested FLWOR expression.

The query below is doing the following: For each department, get the department id, and for all employees that match the department id with a commission greater than 30 percent, return the employee first and last name. Note that the FOR expression is used twice in this query.

SELECT XMLQuery(
         'FOR $dep in ora:view("DEPARTMENTS")/ROW
          RETURN <Department id="{$dep/DEPARTMENT_ID}">
                 <Employee>
                   {FOR $emp in ora:view("EMPLOYEES")/ROW 
                    WHERE $emp/DEPARTMENT_ID eq $dep/DEPARTMENT_ID
                      and $emp/COMMISSION_PCT > .3 
                    RETURN ($emp/FIRST_NAME, $emp/LAST_NAME)}
                </Employee>
                </Department>'
          RETURNING CONTENT) HIGH_COMMISSION_EMP_NAMES FROM DUAL;

HIGH_COMMISSION_EMP_NAMES
---------------------------------------------------------------
<Department id="10"><Employee></Employee></Department><Department 
id="20"><Employee></Employee></Department><Department 
id="30"><Employee></Employee></Department><Department 
id="40"><Employee></Employee></Department><Department 
id="50"><Employee></Employee></Department><Department 
id="60"><Employee></Employee></Department>

The result, as you can see, is not pretty-printed. The query syntax is slightly complex but shows that we have the ability to transform relational data into XML data with ora:view and that we can perform a join operation on XML documents and apply predicates on the data inside the documents.

• Example using XMLQuery() with an XMLType column and FLWOR expressions. Table invoicexml_col, which we created above and into which we inserted a document, contains an XMLType column inv_doc. The Invoice data that is stored in this column is in XML format. In this query, we will pass XMLType column inv_doc to XQuery using the XMLQuery() function with the PASSING clause. Notice how we can return some specific fields from the Invoice document based on WHERE predicates. In effect, we are able to get inside the XML document and retrieve specific fields based on predicates rather than only being able to see this document as a CLOB.

The SELECT statement below applies to all of the rows of invoicexml_col. We then iterate across all the invoice rows with the FOR statement. The WHERE predicate picks out the elements from Zipcode 12345, and we then RETURN the City, State, and Zipcode and whether the City and State are entered correctly for this Zipcode. An IF…THEN…ELSE construct shows some additional power to this syntax.

Select XMLQuery(
         'FOR $i in /Invoice 
          WHERE  $i/MailAddressTo/Zipcode = 12345 
          RETURN <Details>
                   <Zipcode num="{$i/MailAddressTo/Zipcode}"/>
                   <CityName char="{$i/MailAddressTo/City}"/>
                   <City>{IF ($i/MailAddressTo/City = "New York") 
                          THEN "Correct City" 
                          ELSE "Incorrect City"}
                   </City>
                   <State>{if ($i/MailAddressTo/State = "NY") 
				   then "Correct State" else "Incorrect State"}
                   </State>
                 </Details>'
         PASSING inv_doc RETURNING CONTENT) ny_invoice
FROM invoicexml_col;

NY_INVOICE
<Details><Zipcode num="12345"></Zipcode>
<CityName char="New York"></CityName>
<City>Correct City</City>
<State>Correct State</State>
</Details>

Both queries illustrate the power we have to look inside and process XML documents in a detailed and piece-wise manner. The functionality provided is similar to what we’ve been able to do with relational data—impressive!

XMLTable() function. This function enables an XML value to be interpreted as a table or a set. It’s used to return a table and columns from the evaluation of XQuery expressions, rather than returning a sequence as XQuery would normally do. XMLType data can be queried and the XML results split or shredded into relational format—think of this as creating a virtual table. The virtual table can then be used to insert data into other tables or can be queried. Relational views can also be constructed over XML data. The XMLTable() function can also be used in a SQL From clause.

• Example using XMLTable() with the COLUMNS clause. We will use our invoicexml_col table, introduced earlier, to illustrate how XMLTable() can be used to transform XML data into relational format. In the example below, XMLTable() accesses the Invoice document that is stored in column inv_doc. The paths of the data elements we want are mapped to new column names and formats using the COLUMNS clause.The XMLTable() function returns the data as a virtual table, and the result of this query is the same as if we had queried a relational table. Note the use of the WHERE clause at the bottom of the query that allows us to filter XML data in the exact same way we would with any SQL query written for relational data.

The query and output are shown below:

SELECT inv_id, a.PersonName, a.StreetName, a.CityName, a.State, a.Zipcode
FROM invoicexml_col,  
     XMLTABLE('/Invoice'
     PASSING invoicexml_col.inv_doc
     COLUMNS 
        PersonName varchar2(10) PATH '/Invoice/MailAddressTo/Person',
        StreetName varchar2(20) PATH '/Invoice/MailAddressTo/Street',
        CityName varchar2(10) PATH '/Invoice/MailAddressTo/City',
        State varchar2(5) PATH '/Invoice/MailAddressTo/State',
        Zipcode varchar2(7) PATH '/Invoice/MailAddressTo/Zipcode'
      ) a
WHERE a.CityName like 'New%';

INV_ID	PERSONNAME	STREETNAME	     CITYNAME	STATE 	ZIPCODE
    1	Joe Smith  	10 Apple Tree Lane      New York	 NY	 12345

Getting XML to Perform

We’ve already seen that we can perform queries that look inside XML documents. The next step is to think about ways to make these queries perform quickly. XML queries can be explained and indexes created to help improve the performance of XML data access. In particular, we can improve the performance of XPath functions in a manner similar to the way we’ve tuned SQL in the past. Also, as with SQL tuning, there are some situations where restructuring XML queries can also help to change and improve access paths.

Function-based indexes can be used on structured and unstructured XMLType tables whether they are schema-based or not. We may also want to take advantage of binary indexes for functions such as existsNode that only return flags of 0 or 1 depending on whether the predicate evaluates to True or False.

In this section, we’ll see examples of how structured and unstructured indexing can potentially impact performance through explain plans. We will first look at the optimization of an XQuery expression that uses ora:view with relational tables.

Optimizing relational data in XQuery expressions. Let’s look at a tuning example on a similar query to one we’ve already examined . This query uses ora:view to access r elational d ata and takes advantage of FLWOR expressions. The relational tables Employees and Departments from the HR schema will be joined and department info returned for all employees who had a commission > .3. Note that the attribute names (such as JOB in $emp/ROW/COMMISSION_PCT) are case-sensitive.

In the query below, FOR allows us to iterate over row elements Employees in Departments. Employees rows are bound to variable $emp, and Departments rows are bound to $dep. WHERE performs a join between the two tables and selects all employees who had a commission_pct > .3. RETURN returns the department information.

The query and explain plans are shown below:

explain plan for 
SELECT XMLQuery('for $emp in ora:view("EMPLOYEES"), $dep in ora:view("DEPARTMENTS")
                 where $emp/ROW/DEPARTMENT_ID = $dep/ROW/DEPARTMENT_ID
                   and $emp/ROW/COMMISSION_PCT > .3
                 return $dep'   
                RETURNING CONTENT) AS high_commission_employees
FROM DUAL;

QUERY_PLAN              OBJECT_NAME	         COST     BYTES	LEVEL
SELECT STATEMENT                               2                  1
  SORT                                                 82         2
    HASH JOIN                                  5       328        3
      TABLE ACCESS      EMPLOYEES              2       104        4
      TABLE ACCESS      DEPARTMENTS            2       1512       4
  FAST DUAL                                    2                  2

As you can see, the access path that the optimizer chose does not use indexes. Because ora:view allows us to return results from a relational query as XML elements, we can simply create b-tree indexes on the relational tables and see if this can improve performance.

Create index emp_idx1 on employees (department_id, commission_pct);
Create index dept_idx1 on departments (department_id);

QUERY_PLAN          OBJECT_NAME		COST     BYTES	LEVEL
SELECT STATEMENT                               2                  1
  SORT                                                 82         2
    TABLE ACCESS    DEPARTMENTS                1       56         3
    NESTED LOOPS                               2       328        4
    INDEX           EMP_IDX1                   1       104        5
    INDEX           DEPT_IDX1                  0                  5
  FAST DUAL                                    2                  2

The indexes are now used, and cost decreased very slightly. Our tuning of this XQuery expression using ora:view is very similar to our tuning of standard, non-XML SQL. As with standard SQL, rewriting the query and also using features such as bind variables rather than hard-coded string values can help improve performance.

Optimizing structured XMLType data. XPath rewrite can optimize structured (object-relational) storage techniques. The optimizer can make one other internal performance improvement: It can change XPath-based functions into relational statements when the following are true:

The XMLType data uses a registered XMLSchema. This is another advantage of using schema-based data.
Attributes map to underlying SQL (relational) tables.
Structured storage is used.

This provides the possibility of query rewrite for performance reasons. It also allows relational performance tuning techniques to be used, as shown above.

Now, let’s optimize a query on structured, nonrelational XMLType data. We will start with a very simple example using the invoicexml_tbl table created earlier.

explain plan for 
select extract(object_value, '/Invoice/MailAddressTo')
from invoicexml_tbl
where extractValue(object_value, '/Invoice/MailAddressTo/Person')= 'Joe Smith';

The access path is shown below.


QUERY_PLAN        OBJECT_NAME          COST      BYTES	LEVEL
SELECT STATEMENT                        3        87       1
  TABLE ACCESS    INVOICEXML_TBL        3        87       2

A function-based index on the predicate used in this query can be created. The index must use the exact same syntax as the query that will use it. This index will be created and the SQL explained one more time to see if the new index is used.

Create index invoicexml_tbl_idx1 on invoicexml_tbl
(extractValue(object_value, '/Invoice/MailAddressTo/Person'));

QUERY_PLAN        OBJECT_NAME           COST    BYTES	LEVEL
SELECT STATEMENT                         1      87        1
  TABLE ACCESS    INVOICEXML_TBL         1      87        2
  INDEX           INVOICEXML_TBL_IDX1    1                3

The new function-based index is used, and the cost of the query has decreased as a result.

Optimizing unstructured XMLType data.
We’ll now perform the exact same test with unstructured XMLType data. Table invtest_unstruct will be created in exactly the same manner as we created invoicexml_tbl, except that we will add the “XMLType store as CLOB” syntax to ensure that this is defined as unstructured data. The data definition language (DDL), insert, and explain plan are below :

create table invtest_unstruct of XMLtype 
XMLType store as CLOB;

Insert into invtest_unstruct values ( 
XMLType(bfilename('XMLDIR', 'invoicexml.txt'), 
nls_charset_id('WE8MSWIN1252') )); 

explain plan for 
select extract(object_value, '/Invoice/MailAddressTo')
from invtest_unstruct
where extractValue(object_value, '/Invoice/MailAddressTo/Person')='Joe Smith'
/

QUERY_PLAN            OBJECT_NAME      	COST     BYTES	LEVEL
SELECT STATEMENT                               2       2002       1
  TABLE ACCESS        INVTEST_UNSTRUCT         2       2002       2

We will now create a function-based index to see if it will be used and if the expected cost has decreased. Note that the “create index” syntax is the same as the syntax used in the query predicate.

Create index invtest_unstruct_idx1 on invtest_unstruct
(extractValue(object_value, '/Invoice/MailAddressTo/Person'));

QUERY_PLAN           OBJECT_NAME              COST     BYTES 	LEVEL
SELECT STATEMENT                               1       2002	  1
  TABLE ACCESS       INVTEST_UNSTRUCT          1       2002   2
  INDEX              INVTEST_UNSTRUCT_IDX1     1           	  3

As you can see, the same index and query can be used effectively with structured and unstructured data. In general, however, note that structured data access is generally more efficient than unstructured access.

Conclusion

Oracle Database has evolved to incorporate XML standards and data at a rapid pace. Oracle9i Database introduced the XML DB repository and a new datatype, XMLType, which delivered both LOB and structured storage options. Oracle XML DB features provide the ability to perform native XML processing in the database through a hierarchical model that maps URIs to database objects. XPath expressions are used to operate on individual elements of a document through a “logical tree” using the concepts of a node and path. XMLSchema support was also included with Oracle9i Database.

Oracle Database 10g enhanced XML support with the W3C XML XQuery language, which includes the XMLQuery() and XMLTable() functions. The combination of these features facilitates the interchangeable use of relational data and XML data in Oracle Database 10g Release 2. As this article demonstrates, SQL queries can operate on XML data, and XML queries are now able to access relational data. A key to this is Oracle’s SQL/XML implementation of the XMLQuery() and XMLTable() functions.

Oracle’s standards-based approach also supports popular protocols such as FTP, HTTP, and WebDAV to allow common client tools and applications to access, edit, and publish XML data stored in the database.

In this article you've learned that the skills you’ve already acquired while working with relational data can easily be enhanced to allow you to work with XML using these SQL/XML features. An important example of this is our ability to enhance performance of XML data access through Oracle’s indexing, explain, and storage features. It’s critical that designers, developers, and DBAs keep up-to-date with this quickly changing technology.

Tim Quinlan [tquinlan@tlqconsulting.com] is an Oracle certified Database Administrator with experience in Oracle 7 thru Oracle10g Release 2. He has worked with Databases since 1981, performing the roles of DBA, architect, designer and implementer of enterprise-wide Data Warehouse and transactional Databases. Tim has spoken at many conferences, taught database courses, and written feature articles for leading database publications. His main (professional) interest is designing and implementing very large, high-performance, high-availability database systems.