xmljson Documentation Release 0.2.0 S Anand Nov 21, 2018


Contents

1 About

2 Convert data to XML

3 Convert XML to data

4 Conventions

5 Options

6 Installation

7 Simple CLI utility

8 Roadmap

9 More information
9.1 Contributing
9.2 Credits
9.3 History
9.4 Indices and tables


xmljson converts XML into Python dictionary structures (trees, like in JSON) and vice-versa.


CHAPTER 1

About

XML can be converted to a data structure (such as JSON) and back. For example:

<employees>
  <person>
    <name value="Alice"/>
  </person>
  <person>
    <name value="Bob"/>
  </person>
</employees>

can be converted into this data structure (which is also a valid JSON object):

{
  "employees": [{
    "person": {
      "name": {
        "@value": "Alice"
      }
    }
  }, {
    "person": {
      "name": {
        "@value": "Bob"
      }
    }
  }]
}

This uses the BadgerFish convention that prefixes attributes with @. The conventions supported by this library are:

• Abdera: Use "attributes" for attributes, "children" for nodes

• BadgerFish: Use "$" for text content, @ to prefix attributes


• Cobra: Use "attributes" for sorted attributes (even when empty), "children" for nodes, values are strings

• GData: Use "$t" for text content, attributes added as-is

• Parker: Use tail nodes for text content, ignore attributes

• Yahoo: Use "content" for text content, attributes added as-is
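To make the differences between three of these conventions concrete, here is a minimal sketch that uses only the standard library. The helper leaf_to_dict is hypothetical (it is not part of xmljson) and only handles a single childless element; it illustrates the key names each convention uses, not the library's implementation.

```python
from xml.etree.ElementTree import fromstring


def leaf_to_dict(elem, text_key, attr_prefix):
    """Hypothetical helper: map one childless element to a dict.

    text_key is the key used for text content, attr_prefix is
    prepended to every attribute name.
    """
    value = {attr_prefix + name: text for name, text in elem.attrib.items()}
    if elem.text and elem.text.strip():
        value[text_key] = elem.text.strip()
    return {elem.tag: value}


elem = fromstring('<p id="main">Hello</p>')
bf_style = leaf_to_dict(elem, text_key='$', attr_prefix='@')          # BadgerFish-like
gdata_style = leaf_to_dict(elem, text_key='$t', attr_prefix='')       # GData-like
yahoo_style = leaf_to_dict(elem, text_key='content', attr_prefix='')  # Yahoo-like

for style in (bf_style, gdata_style, yahoo_style):
    print(style)
```

The same element yields the same structure in all three cases; only the key for text content and the attribute prefix change.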


CHAPTER 2

Convert data to XML

To convert from a data structure to XML using the BadgerFish convention:

>>> from xmljson import badgerfish as bf
>>> bf.etree({'p': {'@id': 'main', '$': 'Hello', 'b': 'bold'}})

This returns a list of etree.Element structures. In this case, the result is identical to:

>>> from xml.etree.ElementTree import fromstring
>>> [fromstring('<p id="main">Hello<b>bold</b></p>')]
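For comparison, the same element can be built by hand with the standard library. This is only a sketch of what the converted structure corresponds to, not how xmljson builds it internally.

```python
from xml.etree.ElementTree import Element, SubElement, tostring

# '@id' in the BadgerFish data corresponds to an attribute
p = Element('p', {'id': 'main'})
# '$' corresponds to the element's text content
p.text = 'Hello'
# any other key corresponds to a child element
b = SubElement(p, 'b')
b.text = 'bold'

xml_text = tostring(p, encoding='unicode')
print(xml_text)
```

Passing encoding='unicode' makes tostring return a str rather than bytes on Python 3.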

The result can be inserted into any existing root etree.Element:

>>> from xml.etree.ElementTree import Element, tostring
>>> result = bf.etree({'p': {'@id': 'main'}}, root=Element('root'))
>>> tostring(result)
'<root><p id="main"/></root>'

This includes lxml.html as well:

>>> from lxml.html import Element, tostring
>>> result = bf.etree({'p': {'@id': 'main'}}, root=Element('html'))
>>> tostring(result, doctype='<!DOCTYPE html>')
'<!DOCTYPE html>\n<html><p id="main"></p></html>'

For ease of use, strings are treated as node text. For example, both the following are the same:

>>> bf.etree({'p': {'$': 'paragraph text'}})
>>> bf.etree({'p': 'paragraph text'})

By default, non-string values are converted to strings using Python’s str, except for booleans, which are converted into true and false (lower case). Override this behaviour using xml_tostring:

>>> tostring(bf.etree({'x': 1.23, 'y': True}, root=Element('root')))
'<root><y>true</y><x>1.23</x></root>'
>>> from xmljson import BadgerFish              # import the class
>>> bf_str = BadgerFish(xml_tostring=str)       # convert using str()
>>> tostring(bf_str.etree({'x': 1.23, 'y': True}, root=Element('root')))
'<root><y>True</y><x>1.23</x></root>'
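The default rule described above can be sketched as a small standalone function. The name to_xml_text is hypothetical and the function is an illustration of the stated behaviour, not the library's own code.

```python
def to_xml_text(value):
    """Hypothetical sketch: booleans become lower-case, the rest uses str()."""
    if value is True:
        return 'true'
    if value is False:
        return 'false'
    return str(value)


print(to_xml_text(True))   # lower-case, unlike str(True)
print(to_xml_text(1.23))
print(str(True))           # what xml_tostring=str would produce instead
```

The identity checks (is True, is False) keep integers such as 0 and 1 from being mistaken for booleans.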

If the data contains invalid XML keys, these can be dropped via invalid_tags='drop' in the constructor:

>>> bf_drop = BadgerFish(invalid_tags='drop')
>>> data = bf_drop.etree({'$': '1', 'x': '1'}, root=Element('root'))    # Drops invalid <$> tag
>>> tostring(data)
'<root>1<x>1</x></root>'


CHAPTER 3

Convert XML to data

To convert from XML to a data structure using the BadgerFish convention:

>>> bf.data(fromstring('<p id="main">Hello<b>bold</b></p>'))
{"p": {"$": "Hello", "@id": "main", "b": {"$": "bold"}}}

To convert this to JSON, use:

>>> from json import dumps
>>> dumps(bf.data(fromstring('<p id="main">Hello<b>bold</b></p>')))
'{"p": {"b": {"$": "bold"}, "@id": "main", "$": "Hello"}}'

To preserve the order of attributes and children, specify the dict_type as OrderedDict (or any other dictionary-like type) in the constructor:

>>> from collections import OrderedDict
>>> from xmljson import BadgerFish              # import the class
>>> bf = BadgerFish(dict_type=OrderedDict)      # pick dict class
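The reason an ordered dictionary type matters is that json.dumps writes keys in the order the mapping yields them. A short standard-library sketch, with hand-built data standing in for converter output:

```python
from collections import OrderedDict
from json import dumps

# Hand-built stand-in for converted data; insertion order is attribute, then text
inner = OrderedDict([('@id', 'main'), ('$', 'Hello')])
data = OrderedDict([('p', inner)])

text = dumps(data)
print(text)
```

On Python 3.7+ the built-in dict also preserves insertion order, so OrderedDict mainly matters for older interpreters.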

By default, values are parsed into boolean, int or float where possible (except in the Yahoo method). Override this behaviour using xml_fromstring:

>>> dumps(bf.data(fromstring('<x>1</x>')))
'{"x": {"$": 1}}'
>>> bf_str = BadgerFish(xml_fromstring=False)   # Keep XML values as strings
>>> dumps(bf_str.data(fromstring('<x>1</x>')))
'{"x": {"$": "1"}}'
>>> bf_str = BadgerFish(xml_fromstring=repr)    # Custom string parser
>>> dumps(bf_str.data(fromstring('<x>1</x>')))
'{"x": {"$": "\'1\'"}}'

xml_fromstring can be any custom function that takes a string and returns a value. In the example below, only strings made up entirely of digits (such as 1) are converted to integers. Everything else is retained as a string:

>>> def convert_only_int(val):
...     return int(val) if val.isdigit() else val
>>> bf_int = BadgerFish(xml_fromstring=convert_only_int)
>>> dumps(bf_int.data(fromstring('<p><x>1</x><y>2.5</y><z>NaN</z></p>')))
'{"p": {"x": {"$": 1}, "y": {"$": "2.5"}, "z": {"$": "NaN"}}}'
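A parser in the spirit of the default behaviour (boolean, then int, then float, otherwise the original string) could look like the sketch below. The name parse_value is hypothetical and the exact rules of the library's default parser may differ; a function like this could be passed as xml_fromstring.

```python
def parse_value(text):
    """Hypothetical parser: bool, then int, then float, else the string itself."""
    lowered = text.strip().lower()
    if lowered == 'true':
        return True
    if lowered == 'false':
        return False
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass  # not parseable with this type, try the next one
    return text


for sample in ('1', '2.5', 'true', 'hello'):
    print(repr(sample), '->', repr(parse_value(sample)))
```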


CHAPTER 4

Conventions

To use a different conversion method, replace BadgerFish with one of the other classes. Currently, these are supported:

>>> from xmljson import abdera          # == xmljson.Abdera()
>>> from xmljson import badgerfish      # == xmljson.BadgerFish()
>>> from xmljson import cobra           # == xmljson.Cobra()
>>> from xmljson import gdata           # == xmljson.GData()
>>> from xmljson import parker          # == xmljson.Parker()
>>> from xmljson import yahoo           # == xmljson.Yahoo()


CHAPTER 5

Options

Conventions may support additional options.

The Parker convention absorbs the root element by default. parker.data(preserve_root=True) preserves the root element:

>>> from xmljson import parker, Parker
>>> from xml.etree.ElementTree import fromstring
>>> from json import dumps
>>> dumps(parker.data(fromstring('<x><a>1</a><b>2</b></x>')))
'{"a": 1, "b": 2}'
>>> dumps(parker.data(fromstring('<x><a>1</a><b>2</b></x>'), preserve_root=True))
'{"x": {"a": 1, "b": 2}}'
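The effect of absorbing or preserving the root can be sketched with the standard library. The function parker_like is hypothetical and deliberately simplified: it ignores attributes, keeps values as strings instead of parsing numbers, and assumes child tags are not repeated.

```python
from xml.etree.ElementTree import fromstring


def parker_like(elem, preserve_root=False):
    """Hypothetical Parker-style sketch: text only, attributes ignored."""
    def convert(node):
        children = list(node)
        if not children:
            return (node.text or '').strip()
        # Assumes each child tag appears once; repeats would overwrite
        return {child.tag: convert(child) for child in children}

    value = convert(elem)
    # The root tag is dropped unless the caller asks to keep it
    return {elem.tag: value} if preserve_root else value


root = fromstring('<x><a>1</a><b>2</b></x>')
absorbed = parker_like(root)
preserved = parker_like(root, preserve_root=True)
print(absorbed)
print(preserved)
```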


CHAPTER 6

Installation

This is a pure-Python package built for Python 2.7+ and Python 3.0+. To set up:

pip install xmljson


CHAPTER 7

Simple CLI utility

After installation, you can also use this package as a simple CLI utility. Currently, only XML to JSON conversion is supported. Example:

$ python -m xmljson -h
usage: xmljson [-h] [-o OUT_FILE]
               [-d {abdera,badgerfish,cobra,gdata,parker,xmldata,yahoo}]
               [in_file]

positional arguments:
  in_file               defaults to stdin

optional arguments:
  -h, --help            show this help message and exit
  -o OUT_FILE, --out_file OUT_FILE
                        defaults to stdout
  -d {abdera,badgerfish,...}, --dialect {...}
                        defaults to parker

$ python -m xmljson -d parker tests/mydata.xml
{
  "foo": "spam",
  "bar": 42
}

This is a typical UNIX filter program: it reads a file (or stdin), processes it in some way (converting XML to JSON in this case), then prints the result to stdout (or a file). Example with a pipe:

$ some-xml-producer | python -m xmljson | some-json-processor
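The filter pattern itself is easy to sketch in Python with the standard library. The function run_filter below is hypothetical and is not the xmljson command-line code; it reads XML from one stream and writes a flat JSON object to another, which is enough to show why stdin and stdout defaults make a tool composable in a pipe.

```python
import io
import json
import sys
from xml.etree.ElementTree import parse


def run_filter(in_file=sys.stdin, out_file=sys.stdout):
    """Hypothetical filter: read XML from in_file, write JSON to out_file."""
    root = parse(in_file).getroot()
    # Flat mapping of child tag to text; values are kept as strings
    data = {child.tag: child.text for child in root}
    json.dump(data, out_file, indent=2)
    out_file.write('\n')


# In-memory streams stand in for a pipe
source = io.StringIO('<x><foo>spam</foo><bar>42</bar></x>')
sink = io.StringIO()
run_filter(source, sink)
print(sink.getvalue())
```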

The package also installs a console_script entry point via pip, so you can call this utility as xml2json:

$ xml2json -d abdera mydata.xml


CHAPTER 8

Roadmap

• Test cases for Unicode

• Support for namespaces and namespace prefixes

• Support XML comments


CHAPTER 9

More information

9.1 Contributing

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.

You can contribute in many ways:

9.1.1 Types of Contributions

Report Bugs

Report bugs at https://github.com/sanand0/xmljson/issues.

If you are reporting a bug, please include:

• Your operating system name and version.

• Any details about your local setup that might be helpful in troubleshooting.

• Detailed steps to reproduce the bug.

Fix Bugs

Look through the GitHub issues for bugs. Anything tagged with “bug” is open to whoever wants to fix it.

Implement Features

Look through the GitHub issues for features. Anything tagged with “feature” is open to whoever wants to implement it.


Write Documentation

xmljson could always use more documentation, whether as part of the official xmljson docs, in docstrings, or even on the web in blog posts, articles, and such.

Submit Feedback

The best way to send feedback is to file an issue at https://github.com/sanand0/xmljson/issues.

If you are proposing a feature:

• Explain in detail how it would work.

• Keep the scope as narrow as possible, to make it easier to implement.

• Remember that this is a volunteer-driven project, and that contributions are welcome :)

9.1.2 Get Started!

xmljson runs on Python 2.7+ and Python 3+ on any OS. To set up the development environment:

1. Fork the xmljson repo

2. Clone your fork locally:

git clone git@github.com:your_user_id/xmljson.git

3. Install your local copy into a virtualenv. If you have virtualenvwrapper installed, this is how you set up your fork for local development:

$ mkvirtualenv xmljson
$ cd xmljson/
$ python setup.py develop

4. Create a branch for local development:

git checkout -b <branch-name>

Now you can make your changes locally.

5. When you’re done making changes, check that your changes pass flake8 and the tests, and provide reasonable test coverage:

make release-test

Note: This uses the python.exe in your PATH. To change the Python used, run:

export PYTHON=/path/to/python # e.g. path to Python 3.4+

6. Commit your changes and push your branch to GitHub. Then send a pull request:

$ git add .
$ git commit -m "Your detailed description of your changes."
$ git push --set-upstream origin <branch-name>

7. To delete your branch:

git branch -d <branch-name>
git push origin --delete <branch-name>

9.1.3 Pull Request Guidelines

Before you submit a pull request, check that it meets these guidelines:

1. The pull request should include tests.

2. If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring, and add the feature to the list in README.rst.

3. The pull request should work for Python 2.7 and 3.4.

9.1.4 Release

1. Test the release by running:

make release-test

2. Update __version__ = x.x.x in xmljson

3. Update HISTORY.rst with changes

4. Commit, create an annotated tag and push the code:

git commit . -m"Release vx.x.x"
git tag -a vx.x.x
git push --follow-tags

5. To release to PyPI, run:

make clean
python setup.py sdist bdist_wheel --universal
twine upload dist/*

9.2 Credits

9.2.1 Development Lead

• S Anand <[email protected]>

9.2.2 Contributors

• Dag Wieers <[email protected]>


9.3 History

9.3.1 0.2.0 (21 Nov 2018)

• xmljson command line script converts from XML to JSON (@tribals)

• invalid_tags='drop' in the constructor drops invalid XML tags in .etree() (@Zurga)

• Bugfix: Parker converts {'x': null} to <x></x> instead of <x>None</x> (@jorndoe #29)

9.3.2 0.1.9 (1 Aug 2017)

• Bugfix and test cases for multiple nested children in Abdera convention

Thanks to @mukultaneja

9.3.3 0.1.8 (9 May 2017)

• Add Abdera and Cobra conventions

• Add Parker.data(preserve_root=True) option to preserve root element in Parker convention.

Thanks to @dagwieers

9.3.4 0.1.6 (18 Feb 2016)

• Add xml_fromstring= and xml_tostring= parameters to constructor to customise string conversion from and to XML.

9.3.5 0.1.5 (23 Sep 2015)

• Add the Yahoo XML to JSON conversion method.

9.3.6 0.1.4 (20 Sep 2015)

• Fix GData.etree() conversion of attributes. (They were ignored. They should be added as-is.)

9.3.7 0.1.3 (20 Sep 2015)

• Simplify {'p': {'$': 'text'}} to {'p': 'text'} in BadgerFish and GData conventions.

• Add test cases for .etree() – mainly from the MDN JXON article.

• dict_type/list_type do not need to inherit from dict/list


9.3.8 0.1.2 (18 Sep 2015)

• Always use the dict_type class to create dictionaries (which defaults to OrderedDict to preserve order of keys)

• Update documentation, test cases

• Remove support for Python 2.6 (since we need collections.Counter)

• Make the Travis CI build pass

9.3.9 0.1.1 (18 Sep 2015)

• Convert true, false and numeric values from strings to Python types

• xmljson.parker.data() is compliant with Parker convention (bugs resolved)

9.3.10 0.1.0 (15 Sep 2015)

• Two-way conversions via BadgerFish, GData and Parker conventions.

• First release on PyPI.

9.4 Indices and tables

• genindex

• modindex

• search
