25 October 2014, 2:18 PM
XMLWF command in Linux to check/validate/parse an XML file

xmlwf - Determines if an XML document is well-formed. It is a non-validating parser.

FORMAT
       xmlwf  [ -s]  [ -n]  [ -p]  [ -x]  [ -e encoding]  [ -w]  [ -d output-dir]  [ -c]  [ -m]  [ -r]  [ -t]  [ -v] [ file ...]


DESCRIPTION
       xmlwf uses the Expat library to determine if an XML document is well-formed.  It is non-validating. If you do not specify any files on the command-line, and you have a recent version of xmlwf, the  input  file  will be read from standard input.
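
As a minimal sketch of the standard-input behaviour described above, the snippet below pipes a small document into xmlwf rather than naming a file. The inline document and the variable name are made up for illustration, and the sketch assumes an xmlwf build recent enough to read from standard input.

```shell
# Feed a made-up document to xmlwf on standard input (no file argument).
# "|| true" keeps the script going whatever exit status xmlwf returns.
stdin_report=$(printf '%s\n' '<note><to>reader</to></note>' | xmlwf 2>&1) || true

# xmlwf stays silent for well-formed input, so empty output means success.
if [ -z "$stdin_report" ]; then
    echo "standard input: well-formed"
else
    echo "standard input: $stdin_report"
fi
```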


Well-formed documents:
       A well-formed document must adhere to the following rules. If any of them is broken (other than the declaration rule, see the note below), xmlwf reports an error.

  • The file begins with an XML declaration. For instance,

<?xml version="1.0" standalone="yes"?>

or

<?xml version="1.0" encoding="ISO-8859-1"?>

NOTE:    xmlwf does not currently check for a valid XML declaration.

  •   Every start tag is either empty (<tag/>) or has a corresponding end tag.

<name>some data</name>

or

<name/>

  •   There is exactly one root element.  This element must contain all other elements  in  the  document.   Only comments, white space, and processing instructions may come after the close of the root element.
  •  All elements nest properly.

<root>
    <branch1>
        <subbranch1>
            somedata
        </subbranch1>
    </branch1>
    <branch2>
        otherdata
    </branch2>
</root>

  •  All attribute values are enclosed in quotes (either single or double).

<name attribute1="one" attribute2="two">shankar</name>
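
To see these rules in action, the following sketch writes one deliberately broken document per rule and runs xmlwf on each. The file names and contents are invented for illustration. The exact wording of each report depends on the Expat version, so none is shown here.

```shell
# Scratch directory for the made-up sample files.
workdir=$(mktemp -d)

# One deliberately broken document per rule listed above.
printf '%s\n' '<name>some data'               > "$workdir/unclosed.xml"
printf '%s\n' '<first/><second/>'             > "$workdir/two_roots.xml"
printf '%s\n' '<a><b>text</a></b>'            > "$workdir/bad_nesting.xml"
printf '%s\n' '<name attribute1=one>x</name>' > "$workdir/unquoted.xml"

# Run xmlwf on each file and print whatever it reports.
for f in "$workdir"/*.xml; do
    report=$(xmlwf "$f" 2>&1) || true
    echo "${report:-$f: no errors reported}"
done
```

Each of the four files should draw a report from xmlwf, prefixed with the file name and the line and column where parsing stopped.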

 

Note: If the document has a DTD and strictly complies with that DTD, then the document is also considered valid. xmlwf is a non-validating parser, i.e. it does not check the document against the DTD. However, it does support external entities (see the -x option).


Examples:

We will try to understand this using a simple example. Let's create a demo XML file with a few elements:

 

shanky@localhost:/home/shanky/test:> cat testxml.xml
<?xml version="1.0" standalone="yes"?>
<student                             
<name>
shankar
</name>
<roll>
11234
</roll>
<marks>
99
</marks>
</student>

 

First we check the file type with the file command. It is reported as XML; note that if we leave out the declaration (the first line), the type is shown as ASCII text.

shanky@localhost:/home/shanky/test:> file testxml.xml
testxml.xml: XML document text

 

Now we will check the well-formedness of the file using the xmlwf command.

shanky@localhost:/home/shanky/test:> xmlwf testxml.xml
testxml.xml:3:0: not well-formed (invalid token)

 

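The XML file is not well-formed because the start tag of student is missing its closing > (line 2 of the file). xmlwf reports the problem at line 3, column 0, which is where it meets the < of <name> while still inside the unfinished student tag. We correct the start tag as below: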

<student>

---some data and tags--

</student>

We test the file again as below and find that the document is now well-formed: xmlwf prints nothing.

shanky@localhost:/home/shanky/test:> xmlwf testxml.xml
shanky@localhost:/home/shanky/test:>
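
Because xmlwf prints nothing for a well-formed file, a script can treat empty output as success. The sketch below wraps that idea in a helper; is_well_formed is a hypothetical name, not part of xmlwf. It inspects the output rather than the exit status, since exit statuses are not shown in this article and may differ between xmlwf versions.

```shell
# is_well_formed is a hypothetical helper, not part of xmlwf.
# It succeeds only when xmlwf prints nothing for the given file.
is_well_formed() {
    report=$(xmlwf "$1" 2>&1) || true
    [ -z "$report" ]
}

if is_well_formed testxml.xml; then
    echo "testxml.xml is well-formed"
else
    echo "testxml.xml has problems: $report"
fi
```

A missing or unreadable file should also produce a message, so the helper treats it as not well-formed.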

 

 

Likewise, xmlwf reports an error for any other malformed XML file. Another error you may see is:

user1@host1:/home/user1/test:> xmlwf testxml.xml
testxml.xml:12:2: mismatched tag

 


OPTIONS
       When an option includes an argument, you may specify the argument either separately ("-d output") or concatenated with the option ("-doutput").  xmlwf supports both.

       -c     If the input file is well-formed and xmlwf doesn't encounter any errors,  the  input  file  is  simply copied  to  the output directory unchanged.  This implies no namespaces (turns off -n) and requires -d to specify an output file.

       -d output-dir
              Specifies a directory to contain transformed representations of the input files.  By default, -d  outputs  a  canonical representation (described below).  You can select different output formats using -c and -m.

              The output filenames will be exactly the same as the input filenames or "STDIN" if the input is coming from  standard  input.   Therefore, you must be careful that the output file does not go into the same directory as the input file.  Otherwise, xmlwf will delete the input file before it generates the output file (just like running cat < file > file in most shells).

              Two structurally equivalent XML documents have a byte-for-byte identical canonical XML representation.
              Note that ignorable white space is considered significant and is treated equivalently to  data.   More on canonical XML can be found at http://www.jclark.com/xml/canonxml.html .

       -e encoding
              Specifies  the  character  encoding  for  the  document, overriding any document encoding declaration.
              xmlwf supports four built-in encodings: US-ASCII, UTF-8, UTF-16, and  ISO-8859-1.   Also  see  the  -w option.

       -m     Outputs some strange sort of XML file that completely describes the input file, including character positions.  Requires -d to specify an output file.

       -n     Turns on namespace processing.  -c disables namespaces.

       -p     Tells xmlwf to process external DTDs and parameter entities.

              Normally xmlwf never parses parameter entities.  -p tells it to always parse them.  -p implies -x.

       -r     Normally xmlwf memory-maps the XML file before parsing; this can result  in  faster  parsing  on  many platforms.  -r turns off memory-mapping and uses normal file IO calls instead.  Of course, memory-mapping is automatically turned off when reading from standard input.

              Use of memory-mapping can cause some platforms to report substantially higher memory usage for  xmlwf,
              but  this  appears  to be a matter of the operating system reporting memory in a strange way; there is not a leak in xmlwf.

       -s     Prints an error if the document is not standalone.  A document is standalone if  it  has  no  external subset and no references to parameter entities.

       -t     Turns  on  timings.   This tells Expat to parse the entire file, but not perform any processing.  This gives a fairly accurate idea of the raw speed of Expat itself without client overhead.  -t  turns  off most of the output options (-d, -m, -c, ...).

       -v     Prints  the  version  of  the Expat library being used, including some information on the compile-time configuration of the library, and then exits.

       -w     Enables support for Windows code pages.  Normally, xmlwf will throw an error  if  it  runs  across  an encoding  that  it  is  not  equipped to handle itself.  With -w, xmlwf will try to use a Windows code page.  See also -e.

       -x     Turns on parsing external entities.
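
Some of the options above are easier to judge on real files. The sketch below tries -d, -e and -s on small made-up documents in a scratch directory. Every file name and variable here is illustrative, and the exact error wording depends on the Expat version, so no messages are shown.

```shell
# Scratch area. Inputs and outputs are kept in separate directories because
# xmlwf names each output file after its input file.
work=$(mktemp -d)
mkdir "$work/in1" "$work/in2" "$work/out1" "$work/out2"

# -d: two made-up documents that differ only in attribute order and quoting.
printf '%s\n' "<item b='2' a='1'>x</item>" > "$work/in1/doc.xml"
printf '%s\n' '<item a="1" b="2">x</item>' > "$work/in2/doc.xml"
xmlwf -d "$work/out1" "$work/in1/doc.xml" || true
xmlwf -d "$work/out2" "$work/in2/doc.xml" || true
if cmp -s "$work/out1/doc.xml" "$work/out2/doc.xml"; then
    echo "canonical forms (-d) are identical"
else
    echo "canonical forms (-d) differ, or were not written"
fi

# -e: a Latin-1 byte (octal 351, e-acute) in a file with no encoding declaration.
printf '<name>caf\351</name>\n' > "$work/latin1.xml"
default_report=$(xmlwf "$work/latin1.xml" 2>&1) || true
latin1_report=$(xmlwf -e ISO-8859-1 "$work/latin1.xml" 2>&1) || true
echo "default encoding:   ${default_report:-no errors reported}"
echo "with -e ISO-8859-1: ${latin1_report:-no errors reported}"

# -s: a document that names an external DTD subset (the DTD is never read here).
printf '%s\n' '<?xml version="1.0"?>' \
              '<!DOCTYPE note SYSTEM "note.dtd">' \
              '<note>hello</note>' > "$work/external.xml"
plain_report=$(xmlwf "$work/external.xml" 2>&1) || true
strict_report=$(xmlwf -s "$work/external.xml" 2>&1) || true
echo "without -s: ${plain_report:-no errors reported}"
echo "with -s:    ${strict_report:-no errors reported}"
```

If the two -d outputs compare equal, that illustrates the byte-for-byte claim for canonical XML. The Latin-1 file would be expected to fail by default, since a document with no encoding declaration is read as UTF-8, and to pass with -e ISO-8859-1. The last document declares an external subset, so -s would be expected to flag it while a plain run stays silent.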

 

 
 

Category: Open System-Linux | Added by: shanky
