xmlutils.py


xmlutils.py is a set of Python scripts for processing XML files serially, namely converting them to other formats (SQL, CSV, JSON). The scripts use ElementTree.iterparse() to iterate through the nodes of an XML file, so the whole DOM never has to be loaded into memory. This lets them churn through large XML files (albeit slowly :P) without memory hiccups.
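To illustrate the streaming approach described above, here is a minimal sketch using the standard library's ElementTree.iterparse(). The iter_records helper and the inline sample document are assumptions made for illustration; this is not the scripts' actual code.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, tag):
    """Yield one dict per <tag> element (hypothetical helper, flat records only)."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            # The "end" event fires once the element and its children are fully parsed.
            yield {child.tag: (child.text or "").strip() for child in elem}
            # Discard the record's children so they do not pile up in memory.
            elem.clear()


# Illustrative in-memory document; a filename can be passed instead.
sample = io.BytesIO(
    b"<fruits>"
    b"<item><name>apple</name><color>red</color></item>"
    b"<item><name>banana</name><color>yellow</color></item>"
    b"</fruits>"
)
records = list(iter_records(sample, "item"))
```

Because each record is cleared after it is yielded, memory use depends mostly on the size of a single record rather than on the size of the file.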

Blind conversion of XML to CSV and SQL is not recommended. It only works if the structure of the XML document is simple (flat).

xml2json supports complex XML documents with multiple nested hierarchies.

Note: The XML files are NOT validated by the scripts.

xml2csv.py

Convert an XML document to a CSV file.
python xml2csv.py --input "samples/fruits.xml" --output "samples/fruits.csv" --tag "item"

Options

--input Input XML document's filename*
--output Output CSV file's filename*
--tag The tag of the node that represents a single record (e.g. item, record)*
--delimiter Delimiter for separating items in a row. Default is ", " (a comma followed by a space)
--ignore A space-separated list of element tags in the XML document to ignore.
--header Whether to print the CSV header (list of fields) in the first line; 1=yes, 0=no. Default is 1.
--encoding Character encoding of the document. Default is utf-8
--limit Limit the number of records to be processed from the document to a particular number. Default is no limit (-1)
--buffer The number of records to be kept in memory before it is written to the output CSV file. Helps reduce the number of disk writes. Default is 1000.
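As a rough picture of how --delimiter, --header and --buffer could interact, the sketch below buffers rows in memory and writes them out in batches. The write_rows function and its parameter names are hypothetical, and the sketch does no quoting or escaping of field values; it is not taken from xml2csv.py.

```python
import io


def write_rows(rows, out, fields, delimiter=", ", header=True, buffer_size=1000):
    """Write dict rows to `out`, flushing whenever the buffer fills (hypothetical)."""
    pending = []
    count = 0
    if header:
        # The header line lists the field names, joined by the delimiter.
        pending.append(delimiter.join(fields))
    for row in rows:
        pending.append(delimiter.join(row.get(f, "") for f in fields))
        count += 1
        if len(pending) >= buffer_size:
            # One write per batch instead of one write per record.
            out.write("\n".join(pending) + "\n")
            pending = []
    if pending:
        # Flush whatever is left at the end.
        out.write("\n".join(pending) + "\n")
    return count


out = io.StringIO()
written = write_rows(
    [{"name": "apple", "color": "red"}, {"name": "banana", "color": "yellow"}],
    out,
    ["name", "color"],
    buffer_size=1,
)
```

A larger buffer means fewer writes at the cost of holding more rows in memory.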

xml2sql.py

Convert an XML document to an SQL file.
python xml2sql.py --input "samples/fruits.xml" --output "samples/fruits.sql" --tag "item" --table "myfruits"

Options

--input Input XML document's filename*
--output Output SQL file's filename*
--tag The tag of the node that represents a single record (e.g. item, record)*
--ignore A space-separated list of element tags in the XML document to ignore.
--encoding Character encoding of the document. Default is utf-8
--limit Limit the number of records to be processed from the document to a particular number. Default is no limit (-1)
--packet Maximum size of a single INSERT query in MB. Default is 8. Set this based on MySQL's max_allowed_packet configuration.
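The idea behind --packet can be sketched as follows: rows are grouped into multi-row INSERT statements, and a new statement is started whenever the current one would exceed the size limit. The batch_inserts function is a hypothetical helper with deliberately naive quoting; it is an assumption about the general technique, not the logic in xml2sql.py.

```python
def batch_inserts(table, columns, rows, max_bytes=8 * 1024 * 1024):
    """Yield multi-row INSERT statements of roughly at most `max_bytes` each."""
    prefix = "INSERT INTO {} ({}) VALUES ".format(table, ", ".join(columns))
    values, size = [], len(prefix)
    for row in rows:
        # Naive quoting for illustration only: single quotes are doubled.
        quoted = ("'" + str(v).replace("'", "''") + "'" for v in row)
        tup = "(" + ", ".join(quoted) + ")"
        tup_size = len(tup.encode("utf-8")) + 2  # +2 for the ", " separator
        if values and size + tup_size > max_bytes:
            # Adding this row would exceed the limit, so emit what we have.
            yield prefix + ", ".join(values) + ";"
            values, size = [], len(prefix)
        values.append(tup)
        size += tup_size
    if values:
        yield prefix + ", ".join(values) + ";"


fruit_rows = [("apple", "red"), ("banana", "yellow"), ("kiwi", "green")]
one_statement = list(batch_inserts("myfruits", ["name", "color"], fruit_rows))
tiny_packets = list(batch_inserts("myfruits", ["name", "color"], fruit_rows, max_bytes=1))
```

With the default limit the three sample rows fit in one statement; with an artificially small limit each row gets its own statement.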

xml2json.py

Convert XML to JSON.

Unlike xml2sql and xml2csv, xml2json is not a standalone utility but a library. Moreover, it supports hierarchies nested to any number of levels.

Usage

from xml2json import *

# given an ElementTree Element, return its json
json = xml2json(elem)


# __________ Working with files
# xml2json_file(input_filename, output_filename[optional], prettyprint[True or False], file_encoding[default: utf-8])

# read an xml file and return json
json = xml2json_file("samples/fruits.xml")

# read an xml file and write json to a file
xml2json_file("samples/fruits.xml", "samples/fruits.json")
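For readers curious how nested hierarchies can map to JSON, here is a minimal recursive sketch using only the standard library. The elem_to_dict helper is hypothetical and ignores attributes and mixed content; the output format of xml2json itself may well differ.

```python
import json
import xml.etree.ElementTree as ET


def elem_to_dict(elem):
    """Recursively convert an Element to plain Python data (hypothetical helper)."""
    children = list(elem)
    if not children:
        # Leaf node: keep just its text.
        return (elem.text or "").strip()
    result = {}
    for child in children:
        value = elem_to_dict(child)
        if child.tag in result:
            # Repeated sibling tags are collected into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


root = ET.fromstring(
    "<fruits>"
    "<item><name>apple</name><origin><country>IN</country></origin></item>"
    "<item><name>kiwi</name></item>"
    "</fruits>"
)
data = {root.tag: elem_to_dict(root)}
text = json.dumps(data, indent=2)
```

Recursion handles any depth of nesting, which is what a flat CSV or SQL row cannot represent.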

Kailash Nadh