Common Python libraries for parsing XML

Reading, writing, and modifying XML files is routine in Python scripts and other applications.
Commonly used XML parsing libraries include:
[ElementTree](https://docs.python.org/2.7/library/xml.etree.elementtree.html?highlight=elementtree#module-xml.etree.ElementTree)
[Beautiful Soup](http://beautifulsoup.readthedocs.io/zh_CN/v4.4.0/#id57)

Pros and cons:

Basic ElementTree usage

import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()
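
Once you have the root element, the usual next steps are reading child elements and changing them. The sketch below illustrates this under a few assumptions: the sample XML, the `read_strings` and `rename_greeting` helper names, and the `greeting` entry are all made up for illustration, and the XML is parsed from a string with `ET.fromstring` instead of a file so that the example is self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for a file on disk.
XML_TEXT = """<resources>
    <string name="app_name">Demo</string>
    <string name="greeting">Hello</string>
</resources>"""


def read_strings(xml_text):
    """Return a dict mapping each <string> element's name attribute to its text."""
    root = ET.fromstring(xml_text)
    return {node.get("name"): node.text for node in root.iter("string")}


def rename_greeting(xml_text, new_text):
    """Change the text of the 'greeting' entry and return the serialized XML."""
    root = ET.fromstring(xml_text)
    for node in root.findall("string"):
        if node.get("name") == "greeting":
            node.text = new_text
    return ET.tostring(root, encoding="unicode")


strings = read_strings(XML_TEXT)
updated = rename_greeting(XML_TEXT, "Hi")
```

When working with a real file, `ET.parse(path)` followed by `tree.write(path, encoding="utf-8", xml_declaration=True)` plays the same role as `fromstring` and `tostring` here.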

Common problems

XML namespaces

When ElementTree parses XML that uses namespaces,

such as the following:
<resources xmlns:tools="http://schemas.android.com/tools" tools:ignore="MissingTranslation" xmlns:xliff="urn:oasis:names:tc:xliff:document:1.2">
the parsed attribute dictionary is:
{'{http://schemas.android.com/tools}ignore': 'MissingTranslation'}
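
In other words, the attribute key is stored as `{namespace-uri}localname`, and the `tools:` prefix from the source is not part of it. A minimal sketch of reading such an attribute follows; the `TOOLS_NS` constant and the variable names are only illustrative, and the element is the `<resources>` tag from above written as a self-closing element.

```python
import xml.etree.ElementTree as ET

TOOLS_NS = "http://schemas.android.com/tools"

XML_TEXT = (
    '<resources xmlns:tools="http://schemas.android.com/tools" '
    'tools:ignore="MissingTranslation" '
    'xmlns:xliff="urn:oasis:names:tc:xliff:document:1.2"/>'
)

root = ET.fromstring(XML_TEXT)

# Attribute keys use the "{namespace-uri}localname" form, not the "tools:" prefix.
ignore_value = root.get("{%s}ignore" % TOOLS_NS)

# Looking the attribute up by its prefixed name finds nothing.
prefixed_value = root.get("tools:ignore")
```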

The complete source file:
<?xml version="1.0" encoding="utf-8"?>
<resources xmlns:tools="http://schemas.android.com/tools" tools:ignore="MissingTranslation" xmlns:xliff="urn:oasis:names:tc:xliff:document:1.2">
<!-- App blacklist and whitelist -->
<string name="back_app_fork_stop">The application has been blacklisted and unable to start,It is recommended to</string>
<string name="back_app_uninstall">uninstall</string>
<string name="timer_format" translatable="false"></string>
<string name="audio_db_title_format"><xliff:g id="format">yyyy-MM-dd HH:mm:ss</xliff:g></string>
</resources>

After parsing and writing it back out:

<?xml version='1.0' encoding='utf-8'?>
<resources xmlns:ns0="http://schemas.android.com/tools" xmlns:ns1="urn:oasis:names:tc:xliff:document:1.2" ns0:ignore="MissingTranslation">
<string name="back_app_fork_stop">dsadfqwef</string>
<string name="back_app_uninstall">dsadfqwef</string>
<string name="timer_format" translatable="false">dsadfqwef</string>
<string name="audio_db_title_format">dsadfqwef<ns1:g id="format">yyyy-MM-dd HH:mm:ss</ns1:g></string>
</resources>

The output is still valid XML, but the original namespace prefixes have been replaced with generated ones (ns0, ns1).
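
The prefix change happens because the parsed tree keeps only namespace URIs, so the serializer has to invent prefixes when writing. The sketch below shows this on a trimmed copy of the file above. The variable names are illustrative, and it assumes no prefixes have been registered for these URIs earlier in the same process.

```python
import xml.etree.ElementTree as ET

XML_TEXT = (
    '<resources xmlns:tools="http://schemas.android.com/tools" '
    'tools:ignore="MissingTranslation" '
    'xmlns:xliff="urn:oasis:names:tc:xliff:document:1.2">'
    '<string name="audio_db_title_format">'
    '<xliff:g id="format">yyyy-MM-dd HH:mm:ss</xliff:g>'
    '</string>'
    '</resources>'
)

root = ET.fromstring(XML_TEXT)

# The parsed tree stores only the namespace URI in the tag; the "xliff" prefix is gone.
inner_tag = root.find("string")[0].tag

# Serializing without registered prefixes makes ElementTree generate its own.
serialized = ET.tostring(root, encoding="unicode")

# Parsing the output again yields the same attributes, so no information was lost.
reparsed = ET.fromstring(serialized)
```

Because `reparsed` carries the same namespace URIs as `root`, any namespace-aware consumer treats the two documents as equivalent. The difference only matters for people or tools that read the prefixes literally.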


[ElementTree: Working with Namespaces and Qualified Names](http://effbot.org/zone/element-namespaces.htm)
This English-language article explains the topic in detail; its only drawback is its length.

So the key step is to register the namespaces before writing the tree:

ET.register_namespace("xliff","urn:oasis:names:tc:xliff:document:1.2")
ET.register_namespace("tools","http://schemas.android.com/tools")
With these registered, the output keeps the original prefixes.
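
As a rough end-to-end check, the sketch below registers both prefixes and then serializes a trimmed copy of the file from earlier. The variable names are illustrative. Note that `ET.register_namespace` changes a module-wide mapping, so it affects every tree serialized afterwards in the same process.

```python
import xml.etree.ElementTree as ET

XML_TEXT = (
    '<resources xmlns:tools="http://schemas.android.com/tools" '
    'tools:ignore="MissingTranslation" '
    'xmlns:xliff="urn:oasis:names:tc:xliff:document:1.2">'
    '<string name="audio_db_title_format">'
    '<xliff:g id="format">yyyy-MM-dd HH:mm:ss</xliff:g>'
    '</string>'
    '</resources>'
)

# Register the preferred prefixes before serializing.
ET.register_namespace("xliff", "urn:oasis:names:tc:xliff:document:1.2")
ET.register_namespace("tools", "http://schemas.android.com/tools")

root = ET.fromstring(XML_TEXT)

# The serializer now uses the registered prefixes instead of ns0 and ns1.
serialized = ET.tostring(root, encoding="unicode")
```

For a file on disk, the same registration calls go before `tree.write(...)`.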
