Laboratory Training 3

Working with XML Documents

1 Training Assignment

1.1 Individual Task

Design and implement classes for the representation of entities given in an individual assignment of Laboratory training #5 of the course "Fundamentals of Programming (Second Part)". You need to create two derived classes from the class that represents the main entity. The first class should support reading data from a pre-generated XML document, storing data in structures that are automatically created using data binding technology, and writing data to another XML document after sorting. To avoid duplication of data in the program, you must also redefine the class representing the second entity. The second derived class should implement XML serialization of previously created objects, deserialization, as well as sorting and serialization into another file. Derived classes must implement a common interface created by performing the individual task of Laboratory training # 1.

An additional console output should be provided.

1.2 Using SAX and DOM Technologies

Prepare an XML document with data about the students of the academic group. Using SAX technology read data from an XML document and display data on the console. Using DOM technology, read data from the same XML document, modify the data and write it to a new document.

1.3 Implementation of XML Serialization and Deserialization

Create classes Student and AcademicGroup (with an array of students as a field). Create objects, implement their XML serialization and deserialization.

2 Instructions

2.1 Common Concepts of Using XML Language

XML (eXtensible Markup Language) is a platform-independent way of structuring information. Because XML separates the content of a document from its structure, it is successfully used for the exchange of information. For example, XML may be used to transfer data between the application and the database or between databases having different formats.

XML documents are always text files. The syntax of XML is similar to the syntax of HTML, which is used for marking up texts published over the Internet. XML can also be used directly to mark up text.

Most often, an XML document begins with a so-called XML declaration (prolog). In general, the declaration looks as follows:

<?xml version="1.0" [other-attributes] ?>

Among the possible attributes, the encoding = "character-set" attribute is most useful. It specifies the encoding for the text. If you want to use non-UNICODE Cyrillic characters, you can define it, for example, in the following way:

<?xml version="1.0" encoding="Windows-1251"?>

The next line may contain information about the document type. The rest of the document contains a set of XML elements. Elements are delimited by tags. Start tags begin with < followed by the element name. End tags begin with </ followed by the element name. Both start tags and end tags terminate with >. Everything between the two tags is the content of the element. Every start tag must be matched by an end tag. All attribute values must be quoted. Each document must contain exactly one root element, inside which all other elements are nested.

Unlike HTML, XML allows you to use an unlimited set of tag pairs, each of which represents not what the data it contains should look like, but what it means. XML allows you to create your own set of tags for each class of documents. Thus, it is more accurate to call it not a language, but a meta-language.

Once the document structure has been formally described, the correctness of a document can be checked. The markup tags allow both a person and a program to analyze the document. XML documents are intended primarily for software analysis of their contents.

The following XML-document stores prime numbers.

<?xml version="1.0" encoding="UTF-8"?>
<Numbers>
    <Number>2</Number>
    <Number>3</Number>
    <Number>5</Number>
    <Number>7</Number>
    <Number>11</Number>
</Numbers>

The Numbers and Number tags were invented by the author. Indentation is used for better readability.

Tags may contain attributes: additional information about the element, placed inside the angle brackets of the start tag. Attribute values must be enclosed in quotation marks. The following example shows a Message tag with the attributes to and from:

<Message to="you" from="me">
    <Text>
        How to use XML?
    </Text>
</Message>

End tags are mandatory in XML. Furthermore, you must close inner elements before outer ones. The following snippet produces an error:

<A> <B> text </A> </B>

The following fragment is correct:

<A> <B> text </B> </A>

Elements can be empty. An empty-element tag ends with a slash before the closing angle bracket. For example, you can write <Nothing/> instead of <Nothing></Nothing>. In contrast to HTML tags, XML tags are case sensitive, so <cat> and <CAT> are different tags. XML documents can contain comments:

<!-- Here are comments -->

Programs that process XML, the so-called XML parsers, analyze a document only until the first error is found, in contrast to the HTML parsers embedded in browsers. Browsers try to display a document even if its code contains errors.

An XML document that conforms to all XML syntax rules is considered to be a well-formed document.
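
The nesting rule can be checked programmatically. The sketch below is only an illustration: the class name WellFormedCheck and the method isWellFormed() are hypothetical, and it assumes the standard JAXP parser bundled with Java SE. A document is treated as well-formed if the parser reads it without a fatal error.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    // Returns true if the parser reads the text without a fatal error
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to the console
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        }
        catch (SAXException e) {
            // The parser stopped at the first violation of the syntax rules
            return false;
        }
        catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<A> <B> text </B> </A>")); // correct nesting
        System.out.println(isWellFormed("<A> <B> text </A> </B>")); // wrong nesting
    }
}
```

For the two fragments shown earlier, the first call should report a well-formed document and the second should not.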

2.2 Standard Approaches to Working with XML Documents

2.2.1 Overview. JAXP Tools

There are two standard approaches to working with XML documents in your program:

the event-based document model (Simple API for XML, SAX) processes events associated with particular XML tags while the XML data is being read;
the Document Object Model (DOM) allows creating and processing a collection of objects organized in a hierarchy.

Event-based approach does not allow the developer to change the data in the source document. If the part of the data needs to be corrected, the document must be completely updated. In contrast, the DOM provides API, which allows developers to add or remove nodes in any part of the tree in the application.

Both approaches use the concept of a parser. A parser is a program that analyzes a document and splits it into tokens. The parser can either raise events (as in SAX) or build a data tree (as in DOM).

Java SE implements the standard approaches to working with XML through the Java API for XML Processing (JAXP). JAXP provides tools for validating and parsing XML documents. It includes the DOM programming interface for the object model and the SAX programming interface for the event-based model. In addition, JAXP provides the Streaming API for XML (StAX) and XSLT (eXtensible Stylesheet Language Transformations) tools.

2.2.2 Using Simple API for XML and StAX

Simple API for XML (SAX, a simple application programming interface for working with XML) provides a consistent mechanism for analyzing an XML document. The analyzer, which implements the SAX interface (SAX Parser), processes information from an XML document as a single data stream. This data stream is only available in one direction, that is, previously processed data cannot be re-read without re-analysis. Most programmers agree that the processing of XML documents using SAX is generally faster than using DOM. This is because SAX stream requires much less memory compared with the construction of a complete DOM tree.

SAX analyzers implement an event-driven approach when the programmer needs to create event handlers that are called by the parsers when processing an XML document.

The Java SE tools for working with SAX are located in the packages javax.xml.parsers and org.xml.sax and in their subpackages. To create an object of the javax.xml.parsers.SAXParser class, use the javax.xml.parsers.SAXParserFactory class, which provides the corresponding factory methods. The SAX parser does not build a representation of the XML document in memory. Instead, it informs clients about the structure of the XML document through callbacks. You can write a class yourself that implements the necessary interfaces, in particular org.xml.sax.ContentHandler. However, the simplest and recommended way is to derive a class from org.xml.sax.helpers.DefaultHandler and override the methods that should be called when various events occur during document analysis. The most commonly overridden methods are:

startDocument() and endDocument(): methods that are called at the beginning and end of the analysis of an XML document
startElement() and endElement(): methods that are called at the beginning and end of the document element analysis
characters(): method called when retrieving the text content of an XML document element.

The following example illustrates the use of SAX to read a document. Suppose the Hello.xml file in the project directory has the following content:

<?xml version="1.0" encoding="UTF-8" ?>
<Greetings>
    <Hello Text="Hi, this is an attribute!">
        Hi, this is the text!
    </Hello>
</Greetings>

Note. When saving the file, you must specify the UTF-8 encoding.

The code of the program that reads data from XML will be:

package ua.in.iwanoff.java.third;

import java.io.IOException;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class HelloSAX extends DefaultHandler {

    @Override
    public void startDocument() {
        System.out.println("Opening document");
    }

    @Override
    public void endDocument() {
        System.out.println("Done");
    }

    @Override
    public void startElement(String uri, String localName, String qName,
          Attributes attributes) throws SAXException {
        System.out.println("Opening tag: " + qName);
        if (attributes.getLength() > 0) {
            System.out.println("Attributes: ");
            for (int i = 0; i < attributes.getLength(); i++) {
                System.out.println("  " + attributes.getQName(i) + ": "
                                    + attributes.getValue(i));
            }
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
                                      throws SAXException {
        System.out.println("Closing tag: " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Copy only the reported range of the buffer, not the whole array:
        String s = new String(ch, start, length).trim();
        if (s.length() > 0) {
            System.out.println(s);
        }
    }

    public static void main(String[] args) {
        SAXParser parser = null;
        try {
            parser = SAXParserFactory.newInstance().newSAXParser();
        }
        catch (ParserConfigurationException | SAXException e) {
            e.printStackTrace();
        }
        if (parser != null) {
            InputSource input = new InputSource("Hello.xml");
            try {
                parser.parse(input, new HelloSAX());
            }
            catch (SAXException | IOException e) {
                e.printStackTrace();
            }
        }
    }
}

The characters() method is called for every run of character data, including the whitespace between tags, so it is advisable to print the content only if the trimmed string is not empty.
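
A parser is also allowed to deliver the text of one element in several characters() calls. A common way to handle this is to accumulate the text in a buffer and use it in endElement(). The following sketch illustrates the idea; the class NumbersSAX and its read() method are hypothetical names, and the input is a compact variant of the Numbers document shown earlier.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NumbersSAX extends DefaultHandler {
    private final List<Integer> numbers = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        text.setLength(0); // start collecting the text of the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("Number")) {
            numbers.add(Integer.parseInt(text.toString().trim()));
        }
        text.setLength(0);
    }

    // Parses the XML text and returns the values of all Number elements
    public static List<Integer> read(String xml) throws Exception {
        NumbersSAX handler = new NumbersSAX();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.numbers;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Numbers><Number>2</Number><Number>3</Number>"
                   + "<Number>5</Number></Numbers>";
        System.out.println(read(xml));
    }
}
```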

StAX was designed as a cross between DOM and SAX interfaces. This programming interface uses a cursor metaphor that represents the entry point within the document. The application moves the cursor forward by reading the information, receiving information from the parser as needed.
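
The cursor approach can be sketched with the javax.xml.stream.XMLStreamReader interface from Java SE. In this illustration the class HelloStAX and the method describe() are hypothetical names; the application itself calls next() to move the cursor, instead of waiting for callbacks as in SAX.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class HelloStAX {

    // Returns one line per start tag: the element name followed by its attributes
    public static List<String> describe(String xml) throws Exception {
        List<String> lines = new ArrayList<>();
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        while (reader.hasNext()) {
            // The application pulls the next event from the parser
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                StringBuilder line = new StringBuilder(reader.getLocalName());
                for (int i = 0; i < reader.getAttributeCount(); i++) {
                    line.append(' ').append(reader.getAttributeLocalName(i))
                        .append('=').append(reader.getAttributeValue(i));
                }
                lines.add(line.toString());
            }
        }
        reader.close();
        return lines;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Greetings><Hello Text=\"Hi, this is an attribute!\">"
                   + "Hi, this is the text!</Hello></Greetings>";
        for (String line : describe(xml)) {
            System.out.println(line);
        }
    }
}
```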

2.2.3 Using the Document Object Model

The DOM is a series of Recommendations produced by the World Wide Web Consortium (W3C). DOM began as a way of identifying and manipulating items on an HTML page (DOM Level 0).

The current DOM Recommendation (DOM Level 3) is an API that defines the objects represented in the XML document, as well as the methods and properties that are used to access and manipulate them.

Beginning with Level 1 DOM, the DOM API contains interfaces that represent all kinds of information that can be found in an XML document. It also includes the methods needed to work with these objects. Some of the most common methods of standard DOM interfaces are listed below.

The Node interface is the primary data type of the DOM. It defines a number of useful methods for obtaining data about nodes and navigating through them:

getFirstChild() and getLastChild() return the first or last child of this node;
getNextSibling() and getPreviousSibling() return the next or previous sibling of this node;
getChildNodes() returns the children of this node as a NodeList; using the NodeList interface methods, you can get the i-th node (the item(i) method) and the total number of nodes (the getLength() method);
getParentNode() returns the parent node;
getAttributes() returns the attributes of this node as an associative array of type NamedNodeMap;
hasChildNodes() returns true if the node has children.

There are a number of methods for modifying an XML document: insertBefore(), replaceChild(), removeChild(), appendChild(), etc.
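
The following sketch shows how some of these methods might be combined. The class ModifyDOM and its methods are hypothetical names, and the document is parsed from a string without whitespace between tags, so that every child node of the root is an element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ModifyDOM {

    // Creates a new <Number> element with the given text
    private static Element number(Document doc, String value) {
        Element element = doc.createElement("Number");
        element.setTextContent(value);
        return element;
    }

    // Modifies the tree and returns the text of all children of the root
    public static List<String> modify(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        root.removeChild(root.getFirstChild());                   // drop the first child
        root.insertBefore(number(doc, "2"), root.getFirstChild()); // new first child
        root.appendChild(number(doc, "7"));                       // new last child
        List<String> texts = new ArrayList<>();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            texts.add(children.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Numbers><Number>1</Number><Number>3</Number>"
                   + "<Number>5</Number></Numbers>";
        System.out.println(modify(xml));
    }
}
```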

In addition to the Node, DOM also defines several subinterfaces of the Node interface:

Element represents the XML element in the source document; the element includes a pair of tags (opening and closing) and all the text between them;
Attr represents the attribute of the element;
Text represents the element content;
Document represents the entire XML document; Only one Document object exists for each XML document; having the Document object, you can find the root of the DOM tree using the getDocumentElement() method; from the root you can manipulate the entire tree.

Additional types of nodes are:

Comment represents a comment in an XML file
ProcessingInstruction represents the processing instruction
CDATASection represents the CDATA section.

Parsers are often created by instantiating a particular class directly. The disadvantage is that switching to another parser requires changing the source code. JAXP avoids this with factory classes. The static newInstance() method creates a factory object, which in turn creates an object of a class derived from the abstract DocumentBuilder class. This object is the parser itself: it provides the methods needed to parse the XML file and build the DOM tree. Creating the parser may throw exceptions that need to be handled. You can then obtain an object of type Document by loading and parsing data from a file, for example one named fileName:

try {
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();
    Document doc = db.parse(fileName);
    . . .

After traversing and modifying the tree, you can save it in another file.

The use of DOM will be considered using the previous file (Hello.xml) as an example. The following program prints the attribute value and the element text to the console, modifies them and stores the result in a new XML document:

package ua.in.iwanoff.java.third;

import java.io.*;
import org.w3c.dom.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

public class HelloDOM {

    public static void main(String[] args) throws Exception {
        Document doc;
        // Create a document builder using the factory method:
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        doc = db.parse(new File("Hello.xml"));
        // Find the root tag:
        Node rootNode = doc.getDocumentElement();
        // View all child tags:
        for (int i = 0; i < rootNode.getChildNodes().getLength(); i++) {
            Node currentNode = rootNode.getChildNodes().item(i);
            if (currentNode.getNodeName().equals("Hello")) {
                // View all attributes:
                for (int j = 0; j < currentNode.getAttributes().getLength(); j++) {
                    if (currentNode.getAttributes().item(j).getNodeName().equals("Text")) {
                        // Found the required attribute. Display the text of the attribute (greeting):
                        System.out.println(currentNode.getAttributes().item(j).getNodeValue());
                        // Changing the contents of the attribute:
                        currentNode.getAttributes().item(j).setNodeValue("Hi, there was DOM here!");
                        // Further search is inappropriate:
                        break;
                    }
                }
                // Change the text:
                System.out.println(currentNode.getTextContent());
                currentNode.setTextContent("\n    Hi, here was also DOM!\n");
                break;
            }     
        }
        // Create a converter object (in this case, to write to a file).
        // We use the factory method:
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Write to file:
        transformer.transform(new DOMSource(doc), 
            new StreamResult(new FileOutputStream(new File("HelloDOM.xml"))));
    }
  
}

After running the program in the project folder, you will be able to find the following file (HelloDOM.xml):

<?xml version="1.0" encoding="UTF-8" standalone="no"?><Greetings>
    <Hello Text="Hi, there was DOM here!">
        Hi, here was also DOM!
    </Hello>
</Greetings>

In the above example, the javax.xml.transform.Transformer class is used to save the modified document to a file. In general, this class is used to implement so-called XSLT transformations. XSLT (eXtensible Stylesheet Language Transformations) is a language for transforming XML documents into other XML documents or into other formats such as HTML or plain text. The XSLT processor accepts one or more XML source documents, as well as one or more stylesheet modules, and processes them to obtain the output document. A stylesheet contains a set of template rules: instructions and other directives that guide the XSLT processor when generating the output document.
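
A Transformer can also be created from a stylesheet. The sketch below is an illustration under stated assumptions: the class HelloXSLT, the method transform() and the tiny stylesheet are hypothetical, and the stylesheet contains a single template rule that extracts the Text attribute of the Hello element as plain text.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class HelloXSLT {
    // A stylesheet with one template rule that produces plain text
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:value-of select='Greetings/Hello/@Text'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML text and returns the output document
    public static String transform(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Greetings><Hello Text=\"Hi, this is an attribute!\">"
                   + "Hi, this is the text!</Hello></Greetings>";
        System.out.println(transform(xml));
    }
}
```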

2.3 Use of XML Schema

Structured data stored in an XML document needs additional information describing which elements may appear and in what order. Two ways of describing the structure are used most often: Document Type Definition (DTD) and XML Schema (XSD).

DTD (Document Type Definition) is a simple set of rules that describes the structure of XML documents of a particular type. A DTD is not itself an XML document. DTD is very simple, but it does not describe the data types of elements. DTD declarations can be placed either in the header of the XML document itself (internal DTD) or in a separate file (external DTD). A DTD is optional. However, an XML document that conforms to its DTD is called a valid document.

For example, we have the following XML file:

<?xml version="1.0" encoding="UTF-8"?>
<Pairs>
    <Pair>
        <x>1</x>
        <y>4</y>
    </Pair>
    <Pair>
        <x>2</x>
        <y>2</y>
    </Pair>
        . . .
</Pairs>

The DTD file that describes the structure of this document will look like this:

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT Pair (x, y)>
<!ELEMENT x (#PCDATA)>
<!ELEMENT y (#PCDATA)>
<!ELEMENT Pairs (Pair+)>

The plus sign in the last line indicates that the Pairs tag can contain one or many Pair elements inside. In addition, you can also use * (0 or many), a question mark (0 or 1). The absence of a sign means that only one element can be present.
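
Validation against a DTD can be switched on in the JAXP parser. The sketch below is an illustration only: the class PairsDTD and the method isValid() are hypothetical names, and the declarations from the DTD file above are placed in an internal DTD so that the example does not depend on external files. Validity violations are reported to the error() method of the error handler, so the handler rethrows them.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class PairsDTD {
    // Internal DTD with the same declarations as in the DTD file
    static final String DTD =
        "<!DOCTYPE Pairs [\n"
      + "<!ELEMENT Pairs (Pair+)>\n"
      + "<!ELEMENT Pair (x, y)>\n"
      + "<!ELEMENT x (#PCDATA)>\n"
      + "<!ELEMENT y (#PCDATA)>\n"
      + "]>\n";

    // Returns true if the document body conforms to the DTD
    public static boolean isValid(String body) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // turn on DTD validation
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e; // treat validity errors as failures
            }
        });
        try {
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return true;
        }
        catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<Pairs><Pair><x>1</x><y>4</y></Pair></Pairs>"));
        System.out.println(isValid("<Pairs><Pair><y>4</y><x>1</x></Pair></Pairs>"));
    }
}
```

The second document is well-formed but should be rejected, because the content model (x, y) fixes the order of the child elements.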

XML Schema is an alternative to DTD for describing the structure of a document. A schema is more convenient than a DTD because the structure of the document is described in XML itself. In addition, XML Schema is significantly more powerful than DTD. For example, in a schema you can specify the types of elements and attributes, define restrictions, and more.

An XML document that is well-formed, refers to a set of grammatical rules (a DTD or a schema) and fully conforms to them is called a valid document.

In order to prevent conflicts of tag names, XML allows you to create so-called namespaces. The namespace defines the prefix associated with a particular schema of the document and is attached to the tags. Custom namespace is determined using the following construct:

<root xmlns:pref="http://www.someaddress.org/">

In this example, root is the root XML document tag, pref is the prefix that defines the namespace, "http://www.someaddress.org/" is some address, such as the domain name of the author of the schema. Applications that handle XML documents never check this address. It is only necessary to ensure the uniqueness of the namespace.
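
A namespace-aware parser reports the namespace of each element separately from its local name. The sketch below assumes the standard DOM parser from Java SE; the class NamespaceDemo, the method describeChildren() and the element name item are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespaceDemo {

    // Describes every child element of the root: local name and namespace
    public static List<String> describeChildren(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // namespaces are ignored by default
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> result = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                String namespace = node.getNamespaceURI();
                result.add(node.getLocalName() + " in "
                        + (namespace == null ? "no namespace" : namespace));
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root xmlns:pref='http://www.someaddress.org/'>"
                   + "<pref:item/><item/></root>";
        for (String line : describeChildren(xml)) {
            System.out.println(line);
        }
    }
}
```

The two item elements have the same local name, but only the one with the pref prefix belongs to the namespace, so the two tags do not conflict.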

In the schema itself, the prefix xs is conventionally bound to the XML Schema namespace http://www.w3.org/2001/XMLSchema.

The use of a document schema can be demonstrated in the following example. Suppose we have such an XML file:

<?xml version="1.0" encoding="Windows-1251" ?>
<Student Name="John" Surname="Smith">
    <Marks>
        <Mark Subject="Mathematics" Value="4"/>
        <Mark Subject="Physics" Value="5"/>
        <Mark Subject="Programming" Value="3"/>
    </Marks>
    <Comments>
        Strange student
    </Comments>
</Student>

Creating a schema file should start with a standard construct:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

</xs:schema>

The information about document structure must be placed between <xs:schema> and </xs:schema> tags. In order to describe the tags of a document, you can add standard tags inside it. For complex tags that are embedded in others or have parameters:

<xs:element name="tag name">
    <xs:complexType>
      . . .
    </xs:complexType>
</xs:element>

Inside the tag you can place a list of items:

<xs:sequence>
   . . .
</xs:sequence>

The reference to another tag:

<xs:element ref="tag_name"/>

The following element contains data:

<xs:element name="tag_name" type="type_name"/>

The following table contains some standard data types used in schema:

Name	Description
`xs:string`	string value that contains a sequence of Unicode (or ISO/IEC) characters, including spaces, tabs, LF and CR symbols.
`xs:integer`	integer value
`xs:boolean`	binary logical value: true or false, 1 or 0
`xs:float`	32-bit floating point value
`xs:double`	64-bit floating point value
`xs:anyURI`	Uniform Resource Identifier

The following tag

<xs:attribute name="attribute_name" type="type_name"/>

provides a way for an attribute description.

Tags also accept a large number of additional attributes. maxOccurs specifies the maximum number of occurrences of an element, and minOccurs specifies the minimum number; the value unbounded for maxOccurs allows an unlimited number of occurrences. For attributes, use="required" makes the attribute mandatory. mixed="true" marks a complex type whose content mixes text and child elements, and so on.

We can offer the following schema file for our student (Student.xsd):

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="Student">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="Marks">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element ref="Mark" maxOccurs="unbounded"/>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
                <xs:element name="Comments" type="xs:string"/>
            </xs:sequence>
            <xs:attribute name="Name" type="xs:string" />
            <xs:attribute name="Surname" type="xs:string" />
        </xs:complexType>
    </xs:element>
    <xs:element name="Mark">
        <xs:complexType>
            <xs:attribute name="Subject" type="xs:string" />
            <xs:attribute name="Value" type="xs:string" />
        </xs:complexType>
    </xs:element>
</xs:schema>
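
A document can be checked against a schema with the javax.xml.validation package. The sketch below is an illustration under stated assumptions: the class StudentValidation and the method isValid() are hypothetical, and the schema is a condensed variant of a student schema with the Mark element declared in place and with Marks preceding Comments. Because xs:sequence fixes the order of child elements, a document with the opposite order should be rejected.

```java
import java.io.StringReader;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

import org.xml.sax.SAXException;

public class StudentValidation {
    // Condensed schema: xs:sequence requires Marks first, then Comments
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='Student'><xs:complexType>"
      + "<xs:sequence>"
      + "<xs:element name='Marks'><xs:complexType><xs:sequence>"
      + "<xs:element name='Mark' maxOccurs='unbounded'><xs:complexType>"
      + "<xs:attribute name='Subject' type='xs:string'/>"
      + "<xs:attribute name='Value' type='xs:string'/>"
      + "</xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "<xs:element name='Comments' type='xs:string'/>"
      + "</xs:sequence>"
      + "<xs:attribute name='Name' type='xs:string'/>"
      + "<xs:attribute name='Surname' type='xs:string'/>"
      + "</xs:complexType></xs:element>"
      + "</xs:schema>";

    // Returns true if the XML text conforms to the schema
    public static boolean isValid(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        try {
            // The validator throws SAXException on the first violation
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        }
        catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String marks = "<Marks><Mark Subject='Physics' Value='5'/></Marks>";
        String comments = "<Comments>Strange student</Comments>";
        String open = "<Student Name='John' Surname='Smith'>";
        System.out.println(isValid(open + marks + comments + "</Student>"));
        System.out.println(isValid(open + comments + marks + "</Student>"));
    }
}
```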

2.4 Data Binding Technology

Java provides a convenient way to work with XML files: the mechanism of data binding. This mechanism involves generating a set of classes that describe the elements of a file and creating the corresponding structure of objects in memory.

The XML data binding tool contains a schema compiler that translates the schema into a set of schema-specific classes with associated access methods (getters and setters). It also contains a marshalling mechanism (recording of structured data in an XML document), supports the unmarshalling of XML documents in the corresponding structure of interconnected instances. The automatically created data structure can be used without manually placing data in lists or arrays.

Historically, one of the first data binding technologies was Castor. Later, the JAXB API (Java Architecture for XML Binding) was standardized. Version 2 of the JAXB specification covers both the generation of classes from a schema and the generation of a schema from an existing class structure.

To support the JAXB API technology standard in the Eclipse Oxygen environment, Dali Java Persistence Tools (JAXB Support) should be installed. If Eclipse does not have the necessary software installed, you can add them using the main menu Help | Install New Software, then in the Work with: line, select Oxygen - http://download.eclipse.org/releases/oxygen. Next we find the necessary software in the list and click Next, on the Install Details page press Next again, then you must accept the terms of the license and click Finish. After downloading the new software, you must restart Eclipse.

Note: the corresponding plug-ins can also be downloaded in previous versions of Eclipse, which will be listed on the Work with: line.

Data binding technology is most often used to generate classes according to an existing schema. First, a schema file (*.xsd) should be created in the project directory.

To use the class generator, you need a JDK rather than a JRE. If the JDK is not installed on your computer, you can download it from the http://www.oracle.com/technetwork/java/javase/downloads/index-jsp-138363.html#javasejdk page. After installing (copying) the JDK, open the Eclipse options (Window | Preferences), choose Java | Installed JREs, then Add... | Next | Directory... Select the JDK that was previously installed and click Finish. After that, in the list of installed JREs, select the JDK to be used by default and click OK.

If the project was created earlier, it should set JDK as the default JRE in the project options. In the main menu, choose the Project | Properties | Java Build Path on the Libraries tab, select the JRE System Library, press the Edit... button, switch System Library to Alternate JRE, select the previously installed JDK in the appropriate line.

To work properly with XML documents that contain Cyrillic characters, you must set UTF-8 code page (Project | Properties | Resource | Text file encoding | Other and choose UTF-8) in the project options.

Next, select the xsd file in the Package Explorer tree. In the context menu, select Generate | JAXB Classes. Then, in the class generation wizard window, specify the project, the package and other additional information if necessary. If the generation completes successfully, the generated classes appear in the specified package.

Consider the following example. In the project folder, we create the XML document Hello.xml (New | File in the context menu of the project). It is desirable to open this file in a text editor (Open With | Text Editor) and add the following text:

<?xml version="1.0" encoding="UTF-8" ?>
<Greetings>
    <Hello Text="Hello, XML!" />
</Greetings>

This file corresponds to the Hello.xsd schema file, which we also create in the project directory. It is also desirable to open this file using a text editor and add the following text:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="Greetings">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="Hello">
                    <xs:complexType>
                        <xs:attribute name="Text" type="xs:string" use="required" />
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>

By means of Dali Java Persistence Tools we generate new classes. The Greetings.java and ObjectFactory.java files appear in the project tree (in the corresponding package).

The Greetings class represents the root of the XML document and contains the nested Hello class inside. In general, all nested tags correspond to nested classes located inside the class that is responsible for the root tag. Accordingly, the Greetings class contains a field of Greetings.Hello type and provides methods getHello() and setHello(). Regarding the Greetings.Hello class, this class, in accordance with the XML document schema, contains a text field of String type to represent the corresponding attribute, as well as the getText() and setText() methods. The annotations in the code control the representation of data in an XML document.

The ObjectFactory class provides the so-called factory methods for creating generated class objects: createGreetings() and createGreetingsHello(). Since class generation tool always creates a class named ObjectFactory, classes that correspond to different schemata should be located in different packages.

Now in the main() function, we can load the document, read and change the attribute value and write to the new file:

package ua.in.iwanoff.java.third;

import java.io.*;
import javax.xml.bind.*;

public class HelloJAXB {

    public static void main(String[] args) {
        try {
            // The JAXBContext class object provides access to the JAXB API:
            JAXBContext jaxbContext = JAXBContext.newInstance("ua.in.iwanoff.java.third");
                                                         // package with necessary classes
            // Read data from the file and put it into the object of the generated class:
            Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
            Greetings greetings = (Greetings)unmarshaller.unmarshal(new FileInputStream("Hello.xml"));
            // Output the old value of the attribute:
            System.out.println(greetings.getHello().getText());
            // Change the attribute value:
            greetings.getHello().setText("Hello, JAXB!");
            // Create a Marshaller object for output to a file:
            Marshaller marshaller = jaxbContext.createMarshaller();
            // Turn on formatting:
            marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
            // Save the object in a new file:
            // (writing to a File lets JAXB apply the declared UTF-8 encoding itself)
            marshaller.marshal(greetings, new File("HelloJAXB.xml"));
        }
        catch (JAXBException | IOException e) {
            e.printStackTrace();
        }
    }

}

The new HelloJAXB.xml file will be as follows:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Greetings>
    <Hello Text="Hello, JAXB!"/>
</Greetings>

As the example shows, with the JAXB_FORMATTED_OUTPUT property turned on, the marshaller produces a properly indented XML document.

Note: In order to support the standard JAXB APIs in the IntelliJ IDEA Community Edition environment, you need to make some adjustments. One way to use JAXB is to connect the xjc.exe utility, which is included in the JDK up to version 10 (Java 11 removed JAXB and its tools from the JDK, so later versions need a separately installed JAXB distribution). This utility can be launched from the command prompt, but it is advisable to add it to the context menu. In the Settings window, select Tools | External Tools and press the "+" button. In the Edit Tool dialog box, enter the name (Name:) of the new command, Generate JAXB Classes, the path to the xjc.exe utility (Program:), which should be located on the particular computer using the file selection dialog (button "..."), and the parameters (Parameters:), which in our case are as follows:

-p $FileFQPackage$ -d "$SourcepathEntry$" "$FilePath$"

In order for the created command to work correctly, the schema file should be placed in a new package, in which generated files will appear.

2.5 Serialization into XML Files

The main disadvantage of the binary serialization described before (Laboratory training # 1) is the need to work with binary (non-textual) files. Usually such files are used not for long-term storage of data, but for one-time storage and recovery of objects. Certainly, serialization in a text file, in particular in an XML document, is more convenient and manageable. There are several approaches to serialization and deserialization, built on XML. The easiest approach is to use java.beans.XMLEncoder and java.beans.XMLDecoder classes. The most natural use of these classes is the storage and restoring of the graphical interface visual controls. But you can also store objects of all classes that meet the Java Beans specification.

A Java Bean is a Java class that conforms to the following requirements:

class is public
class does not contain public data (only methods can be public)
the default constructor (without arguments) must be present in the class
class must implement the java.io.Serializable interface
a pair of access methods setNnn() and getNnn() forms a so-called property with the name nnn and the corresponding type; properties of type boolean use getters with the prefix "is" instead of "get" (isNnn())
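
The requirements listed above can be illustrated with a minimal sketch. The Task class, its properties, and the roundTrip() helper are hypothetical names chosen only for this illustration. The bean is written to memory with XMLEncoder and read back with XMLDecoder, so no files are involved:

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Serializable;

public class BeanRoundTrip {

    // Hypothetical bean: public class, private fields, public no-argument constructor
    public static class Task implements Serializable {
        private String title = "";
        private boolean done;

        public Task() {
        }

        // Property "title" of type String
        public String getTitle() {
            return title;
        }

        public void setTitle(String title) {
            this.title = title;
        }

        // Property "done" of type boolean: the getter uses the "is" prefix
        public boolean isDone() {
            return done;
        }

        public void setDone(boolean done) {
            this.done = done;
        }
    }

    // Encodes the bean into an in-memory XML document and decodes a copy from it
    public static Task roundTrip(Task task) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(buffer)) {
            encoder.writeObject(task);
        }
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(buffer.toByteArray()))) {
            return (Task) decoder.readObject();
        }
    }

    public static void main(String[] args) {
        Task task = new Task();
        task.setTitle("Prepare XML document");
        task.setDone(true);
        Task copy = roundTrip(task);
        System.out.println(copy.getTitle() + " " + copy.isDone());
    }
}
```

The encoder discovers the properties through the getter and setter pairs, which is why the naming rules matter: a field without matching access methods would simply not appear in the XML output.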

The Line and Point classes were implemented previously. XML serialization with XMLEncoder does not require implementing the Serializable interface. However, the classes must be public and must provide public access methods (getters and setters) for their private fields. The Point class:

package ua.in.iwanoff.java.third;

public class Point {
    private double x, y;

    public void setX(double x) {
        this.x = x;
    }

    public void setY(double y) {
        this.y = y;
    }

    public double getX() {
        return x;
    }

    public double getY() {
        return y;
    }

}

Class Line:

package ua.in.iwanoff.java.third;

public class Line {
    private Point first = new Point(), second = new Point();

    public void setFirst(Point first) {
        this.first = first;
    }

    public Point getFirst() {
        return first;
    }

    public Point getSecond() {
        return second;
    }

    public void setSecond(Point second) {
        this.second = second;
    }

}

The following code provides XML serialization:

package ua.in.iwanoff.java.third;

import java.beans.XMLEncoder;
import java.io.*;

public class XMLSerialization {

    public static void main(String[] args) {
        Line line = new Line();
        line.getFirst().setX(1);
        line.getFirst().setY(2);
        line.getSecond().setX(3);
        line.getSecond().setY(4);
        try (XMLEncoder xmlEncoder = new XMLEncoder(new FileOutputStream("Line.xml"))) {
            xmlEncoder.writeObject(line);
            xmlEncoder.flush();
        }
        catch (IOException e) {
            e.printStackTrace();
        }
    }

}

After executing the program, we will receive the following XML file:

<?xml version="1.0" encoding="UTF-8"?>
<java version="1.8.0_66" class="java.beans.XMLDecoder">
 <object class="ua.in.iwanoff.java.third.Line">
  <void property="first">
   <void property="x">
    <double>1.0</double>
   </void>
   <void property="y">
    <double>2.0</double>
   </void>
  </void>
  <void property="second">
   <void property="x">
    <double>3.0</double>
   </void>
   <void property="y">
    <double>4.0</double>
   </void>
  </void>
 </object>
</java>

Now deserialization may be accomplished using such code:

package ua.in.iwanoff.java.third;

import java.beans.XMLDecoder;
import java.io.*;

public class XMLDeserialization {

    public static void main(String[] args) {
        try (XMLDecoder xmlDecoder = new XMLDecoder(new FileInputStream("Line.xml"))) {
            Line line = (Line)xmlDecoder.readObject();
            System.out.println(line.getFirst().getX() + " " + line.getFirst().getY() + " " +
                               line.getSecond().getX() + " " + line.getSecond().getY());
        }
        catch (IOException e) {
            e.printStackTrace();
        }
    }

}

There are also other (non-standard) implementations of XML serialization.

3 Sample Programs

3.1 Using DOM Technology

Suppose an XML document with data about a continent (Continent.xml) has been prepared:

<?xml version="1.0" encoding="UTF-8"?>
<ContinentData Name="Europe">
    <CountriesData>
        <CountryData Name="Ukraine" Area="603700" Population="46314736" >
            <CapitalData Name="Kiev" />
        </CountryData>
        <CountryData Name="France" Area="547030" Population="61875822" >
            <CapitalData Name="Moscow" />
        </CountryData>
        <CountryData Name="Germany" Area="357022" Population="82310000" >
            <CapitalData Name="Berlin" />
        </CountryData>
    </CountriesData>
</ContinentData>

Note: the error in the capital of France is intentional.

We need to use DOM tools to read the data, fix the error, and save the result in a new file. The program will look like this:

package ua.in.iwanoff.java.third;

import java.io.*;
import org.w3c.dom.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

public class ContinentWithDOM {

    public static void main(String[] args) {
        try {
            Document doc;
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = db.parse(new File("Continent.xml"));
            Node rootNode = doc.getDocumentElement();
          mainLoop: 
            for (int i = 0; i < rootNode.getChildNodes().getLength(); i++) {
                Node countriesNode = rootNode.getChildNodes().item(i);
                if (countriesNode.getNodeName().equals("CountriesData")) {
                    for (int j = 0; j < countriesNode.getChildNodes().getLength(); j++) {
                        Node countryNode = countriesNode.getChildNodes().item(j);
                        if (countryNode.getNodeName().equals("CountryData")) {
                            // Find the attribute by name:
                            if (countryNode.getAttributes().getNamedItem("Name").getNodeValue().equals("France")) {
                                for (int k = 0; k < countryNode.getChildNodes().getLength(); k++) {
                                    Node capitalNode = countryNode.getChildNodes().item(k);
                                    if (capitalNode.getNodeName().equals("CapitalData")) {
                                        capitalNode.getAttributes().getNamedItem("Name").setNodeValue("Paris");
                                        break mainLoop;
                                    }
                                }
                            }
                        }
                    }
                }
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Write the corrected document; StreamResult(File) manages the output stream itself:
            transformer.transform(new DOMSource(doc),
                new StreamResult(new File("CorrectedContinent.xml")));
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }

}
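
The nested loops over child nodes can be shortened with the standard DOM methods getElementsByTagName() and Element.setAttribute(). The following is only a sketch under some assumptions: the class name CapitalFixSketch and the fixCapital() helper are hypothetical, and the document is passed and returned as a string instead of being read from and written to files:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CapitalFixSketch {

    // Hypothetical helper: sets the capital of the given country and returns the new XML text
    public static String fixCapital(String xml, String country, String capital) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // All CountryData elements, regardless of nesting depth:
        NodeList countries = doc.getElementsByTagName("CountryData");
        for (int i = 0; i < countries.getLength(); i++) {
            Element countryElement = (Element) countries.item(i);
            if (country.equals(countryElement.getAttribute("Name"))) {
                NodeList capitals = countryElement.getElementsByTagName("CapitalData");
                if (capitals.getLength() > 0) {
                    ((Element) capitals.item(0)).setAttribute("Name", capital);
                }
                break;
            }
        }
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ContinentData Name=\"Europe\"><CountriesData>"
                + "<CountryData Name=\"France\" Area=\"547030\" Population=\"61875822\">"
                + "<CapitalData Name=\"Moscow\"/></CountryData>"
                + "</CountriesData></ContinentData>";
        System.out.println(fixCapital(xml, "France", "Paris"));
    }
}
```

Searching by tag name ignores whitespace text nodes, so the checks of node names at every level become unnecessary. The explicit traversal remains useful when the position of an element in the tree matters.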

3.2 Using JAXB

The previous task can be solved using JAXB technology. For JAXB classes, we create a separate nested continent package inside the current package. We put the document schema (Continent.xsd) into a new package:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="ContinentData">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="CountriesData">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element maxOccurs="unbounded" name="CountryData">
                                <xs:complexType>
                                    <xs:sequence>
                                        <xs:element name="CapitalData">
                                            <xs:complexType>
                                                <xs:attribute name="Name" type="xs:string" use="required" />
                                            </xs:complexType>
                                        </xs:element>
                                    </xs:sequence>
                                    <xs:attribute name="Name" type="xs:string" use="required" />
                                    <xs:attribute name="Area" type="xs:unsignedInt" use="required" />
                                    <xs:attribute name="Population" type="xs:unsignedInt" use="required" />
                                </xs:complexType>
                            </xs:element>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
            <xs:attribute name="Name" type="xs:string" use="required" />
        </xs:complexType>
    </xs:element>
</xs:schema>

Now we can generate classes. The program that uses these classes will look like this:

package ua.in.iwanoff.java.third;

import java.io.*;
import javax.xml.bind.*;
import ua.in.iwanoff.java.third.continent.*;

public class ContinentWithJAXB {

    public static void main(String[] args) {
        try {
            JAXBContext jaxbContext = JAXBContext.newInstance("ua.in.iwanoff.java.third.continent");
            Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
            ContinentData data = (ContinentData)unmarshaller.unmarshal(new FileInputStream("Continent.xml"));
            // We work with the list of CountryData elements:
            for (ContinentData.CountriesData.CountryData c : data.getCountriesData().getCountryData()) {
                if (c.getName().equals("France")) {
                    c.getCapitalData().setName("Paris");
                    break;
                }
            }
            Marshaller marshaller = jaxbContext.createMarshaller();
            marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
            marshaller.marshal(data, new FileWriter("CorrectedContinent.xml"));
        }
        catch (JAXBException | IOException e) {
            e.printStackTrace();
        }
    }

}

3.3 Classes "Country" and "Census"

Suppose it is necessary to extend the program for working with countries and population censuses, created in Laboratory training # 1, with reading and writing data in XML documents. We can implement two approaches (and therefore two classes derived from the class that represents the main entity): using JAXB tools and using XML serialization.

We can add the package ua.in.iwanoff.java.third to the previously created project. We also can use the previously created FileIO interface (see example for laboratory training # 1).

To implement a version of the application that works with XML documents using JAXB, we first need to develop a document and its schema. For example, we can use the following XML document (Ukraine.xml), which should be placed in the root directory of the project:

<?xml version="1.0" encoding="UTF-8"?>
<CountryData Name="Ukraine" Area="603628" >
    <CensusData Year="1959" Population="41869000" Comments="The first postwar census" />
    <CensusData Year="1970" Population="47126500" Comments="Population increases" />
    <CensusData Year="1979" Population="49754600" Comments="No comments" />
    <CensusData Year="1989" Population="51706700" Comments="The last soviet census" />
    <CensusData Year="2001" Population="48475100" Comments="The first census in the independent Ukraine" />
</CountryData>

To generate classes using JAXB technology, we create a new xml subpackage inside the package ua.in.iwanoff.java.third. In this package, we place the file of the document schema (Country.xsd):

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="CountryData">
        <xs:complexType>
            <xs:sequence>
                <xs:element maxOccurs="unbounded" name="CensusData">
                    <xs:complexType>
                        <xs:attribute name="Year" type="xs:int" use="required" />
                        <xs:attribute name="Population" type="xs:int" use="required" />
                        <xs:attribute name="Comments" type="xs:string" use="required" />
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
            <xs:attribute name="Name" type="xs:string" use="required" />
            <xs:attribute name="Area" type="xs:double" use="required" />
        </xs:complexType>
    </xs:element>
</xs:schema>
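
Before generating classes, it can be useful to check that a document actually conforms to the schema. The following sketch uses the standard javax.xml.validation package. The class name SchemaCheckSketch and the isValid() helper are hypothetical, and the schema is embedded as a string (a compact copy of Country.xsd) only to keep the example self-contained:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class SchemaCheckSketch {

    // Compact copy of Country.xsd (assumption: in a real project the file itself would be used)
    static final String XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
            + "<xs:element name=\"CountryData\"><xs:complexType><xs:sequence>"
            + "<xs:element maxOccurs=\"unbounded\" name=\"CensusData\"><xs:complexType>"
            + "<xs:attribute name=\"Year\" type=\"xs:int\" use=\"required\"/>"
            + "<xs:attribute name=\"Population\" type=\"xs:int\" use=\"required\"/>"
            + "<xs:attribute name=\"Comments\" type=\"xs:string\" use=\"required\"/>"
            + "</xs:complexType></xs:element></xs:sequence>"
            + "<xs:attribute name=\"Name\" type=\"xs:string\" use=\"required\"/>"
            + "<xs:attribute name=\"Area\" type=\"xs:double\" use=\"required\"/>"
            + "</xs:complexType></xs:element></xs:schema>";

    // Returns true if the XML text is valid against the schema above.
    // For brevity, an error in the schema itself is also reported as false.
    public static boolean isValid(String xml) throws IOException {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        String valid = "<CountryData Name=\"Ukraine\" Area=\"603628\">"
                + "<CensusData Year=\"1959\" Population=\"41869000\" Comments=\"The first postwar census\"/>"
                + "</CountryData>";
        // The required Comments attribute is missing here:
        String invalid = "<CountryData Name=\"Ukraine\" Area=\"603628\">"
                + "<CensusData Year=\"1959\" Population=\"41869000\"/>"
                + "</CountryData>";
        System.out.println(isValid(valid));   // true
        System.out.println(isValid(invalid)); // false
    }
}
```

The second document is well-formed but not valid, because the schema declares the Comments attribute as required.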

Note. The project should use the UTF-8 encoding, which is set in the project properties. Unfortunately, changing the encoding of a previously created project can lead to the loss of Cyrillic text in already written code. In this case, it is better to create a new project with the UTF-8 encoding set and transfer the required code through the clipboard.

Next, we generate the required classes using JAXB tools. The ObjectFactory and CountryData classes will be created. The latter describes the country data according to the schema. Inside this class we can find the nested static class CensusData. A reference to an object of this class can be used in the new XMLCensus class, which represents a separate census when the data is read from an XML file. This class actually adapts CensusData to the requirements of the previously created class hierarchy. The XMLCensus class code will be:

package ua.in.iwanoff.java.third;

import ua.in.iwanoff.java.third.xml.CountryData;
import ua.in.iwanoff.oop.first.AbstractCensus;

public class XMLCensus extends AbstractCensus {
    CountryData.CensusData censusData;

    public XMLCensus(CountryData.CensusData censusData) {
        this.censusData = censusData;
    }

    @Override
    public int getYear() {
        return censusData.getYear();
    }

    @Override
    public void setYear(int year) {
        censusData.setYear(year);
    }

    @Override
    public int getPopulation() {
        return censusData.getPopulation();
    }

    @Override
    public void setPopulation(int population) {
        censusData.setPopulation(population);
    }

    @Override
    public String getComments() {
        return censusData.getComments();
    }

    @Override
    public void setComments(String comments) {
        censusData.setComments(comments);
    }
}

We now start creating the XMLCountry class. The most interesting of the automatically generated classes is CountryData. It is advisable to declare a field of the XMLCountry class for working with the data of an XML file, namely a reference to the root element:

private CountryData countryData = new CountryData();

The data is stored in the structure of objects of automatically generated classes. This structure appears in memory after the data is read from the XML document. Individual data items are accessed through methods of the automatically generated classes. Sorting is applied directly to the list of CensusData objects. The entire source file XMLCountry.java will look like this:

package ua.in.iwanoff.java.third;

import ua.in.iwanoff.java.first.FileIO;
import ua.in.iwanoff.java.third.xml.CountryData;
import ua.in.iwanoff.oop.first.AbstractCensus;
import ua.in.iwanoff.oop.first.AbstractCountry;
import ua.in.iwanoff.oop.first.CompareByComments;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Collections;
import java.util.Comparator;

public class XMLCountry extends AbstractCountry implements FileIO {
    private CountryData countryData = new CountryData();

    @Override
    public String getName() {
        return countryData.getName();
    }

    @Override
    public void setName(String name) {
        countryData.setName(name);
    }

    @Override
    public double getArea() {
        return countryData.getArea();
    }

    @Override
    public void setArea(double area) {
        countryData.setArea(area);
    }

    @Override
    public AbstractCensus getCensus(int i) {
        return new XMLCensus(countryData.getCensusData().get(i));
    }

    @Override
    public void setCensus(int i, AbstractCensus census) {
        countryData.getCensusData().get(i).setYear(census.getYear());
        countryData.getCensusData().get(i).setPopulation(census.getPopulation());
        countryData.getCensusData().get(i).setComments(census.getComments());
    }

    @Override
    public boolean addCensus(AbstractCensus census) {
        CountryData.CensusData censusData = new CountryData.CensusData();
        boolean result = countryData.getCensusData().add(censusData);
        setCensus(censusesCount() - 1, census);
        return result;
    }

    @Override
    public boolean addCensus(int year, int population, String comments) {
        CountryData.CensusData censusData = new CountryData.CensusData();
        censusData.setYear(year);
        censusData.setPopulation(population);
        censusData.setComments(comments);
        return countryData.getCensusData().add(censusData);
    }

    @Override public int censusesCount() {
        return countryData.getCensusData().size();
    }

    @Override
    public void clearCensuses() {
        countryData.getCensusData().clear();
    }

    @Override
    public void sortByPopulation() {
        Collections.sort(countryData.getCensusData(),
                Comparator.comparing(CountryData.CensusData::getPopulation));
    }

    @Override
    public void sortByComments() {
        Collections.sort(countryData.getCensusData(),
                Comparator.comparing(CountryData.CensusData::getComments));
    }

    @Override
    public AbstractCensus[] getCensuses() {
        AbstractCensus[] censuses = new AbstractCensus[censusesCount()];
        for (int i = 0; i < censusesCount(); i++) {
            censuses[i] = new XMLCensus(countryData.getCensusData().get(i));
        }
        return censuses;
    }

    @Override
    public void setCensuses(AbstractCensus[] censuses) {
        clearCensuses();
        for (AbstractCensus census : censuses) {
            addCensus(census);
        }
    }

    @Override
    public void readFromFile(String fileName) throws JAXBException, FileNotFoundException {
        JAXBContext jaxbContext = JAXBContext.newInstance("ua.in.iwanoff.java.third.xml");
        Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
        countryData = (CountryData) unmarshaller.unmarshal(new FileInputStream(fileName));
    }

    @Override
    public void writeToFile(String fileName) throws JAXBException, IOException {
        JAXBContext jaxbContext = JAXBContext.newInstance("ua.in.iwanoff.java.third.xml");
        Marshaller marshaller = jaxbContext.createMarshaller();
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
        // Close the writer even if marshalling fails:
        try (FileWriter writer = new FileWriter(fileName)) {
            marshaller.marshal(countryData, writer);
        }
    }

    public CountryData.CensusData getCensusData(int i) {
        return countryData.getCensusData().get(i);
    }

    public static void main(String[] args) {
        XMLCountry country = new XMLCountry();
        try {
            country.readFromFile("Ukraine.xml");
            country.testCountry();
            country.writeToFile("ByComments.xml");
        }
        catch (FileNotFoundException e) {
            System.out.println("Read failed");
            e.printStackTrace();
        }
        catch (IOException e) {
            System.out.println("Write failed");
            e.printStackTrace();
        }
        catch (JAXBException e) {
            e.printStackTrace();
            System.out.println("Wrong format");
        }
    }

}

As the code above shows, to sort the censuses we sort the list of CensusData objects in the object structure that was created when the document was read. To define the sorting criterion, we can use the standard static method Comparator.comparing() and a reference to the method that returns the sort key. The use of method references is considered in Laboratory training # 3 of the course "Object-Oriented Programming".
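
A minimal sketch of sorting with Comparator.comparing() and method references is given below. The Census class is a hypothetical stand-in for the generated CensusData class, and the years() helper exists only to make the resulting order easy to see. The data repeats the content of Ukraine.xml:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ComparingSketch {

    // Hypothetical stand-in for the generated CensusData class
    public static class Census {
        private final int year;
        private final int population;
        private final String comments;

        public Census(int year, int population, String comments) {
            this.year = year;
            this.population = population;
            this.comments = comments;
        }

        public int getYear() {
            return year;
        }

        public int getPopulation() {
            return population;
        }

        public String getComments() {
            return comments;
        }
    }

    public static List<Census> sample() {
        List<Census> list = new ArrayList<>();
        list.add(new Census(1959, 41869000, "The first postwar census"));
        list.add(new Census(1970, 47126500, "Population increases"));
        list.add(new Census(1979, 49754600, "No comments"));
        list.add(new Census(1989, 51706700, "The last soviet census"));
        list.add(new Census(2001, 48475100, "The first census in the independent Ukraine"));
        return list;
    }

    // Collects the years in the current order of the list
    public static List<Integer> years(List<Census> list) {
        List<Integer> result = new ArrayList<>();
        for (Census census : list) {
            result.add(census.getYear());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Census> censuses = sample();
        // The key is extracted by the method reference and compared in natural order:
        censuses.sort(Comparator.comparing(Census::getPopulation));
        System.out.println(years(censuses)); // [1959, 1970, 2001, 1979, 1989]
        censuses.sort(Comparator.comparing(Census::getComments));
        System.out.println(years(censuses)); // [1979, 1970, 2001, 1959, 1989]
    }
}
```

For keys of type int, Comparator.comparingInt() can be used instead to avoid boxing; the result is the same.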

After the program is executed, the ByComments.xml file is created in the project's root folder. The census data in it is stored in the order left by the last sort performed in testCountry(); as the file name suggests, this is the alphabetical order of comments.

Now we can implement the version with XML serialization and deserialization. We can use the implementation of the "Country" class with an array inside (the CountryWithArray class from the example for Laboratory training # 1 of the course "Object-Oriented Programming"). We create the derived class XMLSerializedCountry, which implements the FileIO interface. As in the case of binary serialization, an object cannot deserialize itself, so the readFromFile() function of the FileIO interface cannot be implemented. Deserialization is performed in a separate static function deserialize(). The XMLSerializedCountry class code will be as follows:

package ua.in.iwanoff.java.third;

import ua.in.iwanoff.java.first.FileIO;
import ua.in.iwanoff.oop.first.CountryWithArray;

import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

public class XMLSerializedCountry extends CountryWithArray implements FileIO {
    @Override public void readFromFile(String fileName) throws Exception {
        throw new UnsupportedOperationException();
    }

    public static XMLSerializedCountry deserialize(String fileName) throws FileNotFoundException {
        // try-with-resources closes the decoder and the underlying stream:
        try (XMLDecoder xmlDecoder = new XMLDecoder(new FileInputStream(fileName))) {
            return (XMLSerializedCountry) xmlDecoder.readObject();
        }
    }

    @Override public void writeToFile(String fileName) throws IOException {
        // close() flushes the encoder and writes the closing </java> tag:
        try (XMLEncoder xmlEncoder = new XMLEncoder(new FileOutputStream(fileName))) {
            xmlEncoder.writeObject(this);
        }
    }

}
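
When working with XMLEncoder, it is worth remembering the difference between flush() and close(): flush() writes the objects accumulated so far, while the closing </java> tag is written only by close(). The sketch below shows this on an in-memory stream. The class name EncoderCloseSketch and the encode() helper are hypothetical:

```java
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class EncoderCloseSketch {

    // Returns two snapshots of the output: after flush() and after close()
    public static String[] encode(Object bean) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        XMLEncoder encoder = new XMLEncoder(buffer);
        encoder.writeObject(bean);
        encoder.flush();
        String afterFlush = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        encoder.close();
        String afterClose = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        return new String[] { afterFlush, afterClose };
    }

    public static void main(String[] args) {
        String[] result = encode("Ukraine");
        // The document is still incomplete after flush():
        System.out.println(result[0].contains("</java>"));        // false
        // close() appends the closing tag of the root element:
        System.out.println(result[1].trim().endsWith("</java>")); // true
    }
}
```

For this reason, an encoder that writes to a file should always be closed, for example by declaring it in a try-with-resources statement.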

Separately we create the XMLCountrySerializer class. In its main() function we create the country and serialize it. The code will be as follows:

package ua.in.iwanoff.java.third;

import java.io.IOException;

public class XMLCountrySerializer {
    public static void main(String[] args) {
        XMLSerializedCountry country = new XMLSerializedCountry();
        country.createCountry();
        try {
            country.writeToFile("UkraineSerialized.xml");
        }
        catch (IOException e) {
            System.err.println("Write failed");
            e.printStackTrace();
        }
    }
}

After executing the program, the file UkraineSerialized.xml is created in the root directory of the project.

For deserialization and testing, we create a separate class XMLCountryDeserializer:

package ua.in.iwanoff.java.third;

import java.io.FileNotFoundException;
import java.io.IOException;

public class XMLCountryDeserializer {
    public static void main(String[] args) {
        XMLSerializedCountry country = null;
        try {
            country = XMLSerializedCountry.deserialize("UkraineSerialized.xml");
            country.testCountry();
            country.writeToFile("ByCommentsSerialized.xml");
        }
        catch (FileNotFoundException e) {
            System.err.println("Read failed");
            e.printStackTrace();
        }
        catch (IOException e) {
            System.err.println("Write failed");
            e.printStackTrace();
        }
    }
}

After executing this program, you can find the ByCommentsSerialized.xml file in the project's root directory.

4 Exercises

Create a schema and XML document that describes information about the user. Generate classes using JAXB technology.
Create a schema and XML document that describes information about the book. Generate classes using JAXB technology.
Create a schema and XML document that describes information about the city. Generate classes using JAXB technology.
Create a schema and XML document that describes information about the movie. Generate classes using JAXB technology.
Create classes Faculty and Institute (with an array of faculties as a field). Create objects, implement their serialization and deserialization in XML.

5 Quiz

What are the purposes of XML documents?
What restrictions are imposed on the structure of the XML document, the syntax and location of the tags?
What is the difference between SAX and DOM technologies?
How do I read and write XML documents?
What is XSLT?
What is the difference between a valid and a well-formed XML document?
What are the differences between a document type definition (DTD) and a document schema?
Is a document type definition (DTD) also an XML document?
Is a document schema also an XML document?
Why do XML documents require namespaces?
What are marshalling and unmarshalling?
What are advantages of data binding technology?
What are the standard and non-standard data binding technologies?
What project settings should I set for using Dali Java Persistence Tools?
Which classes correspond to the Java Beans specification?
What are the disadvantages and advantages of XML serialization?

Bases of Java

Developed by Leo V. Ivanov

Laboratory Assignments: