README
Examples of using ValuesFromTagPath().
A number of interesting examples have shown up in the gonuts discussion group
that could be handled - after a fashion - using the ValuesFromTagPath() function.
gonuts1.go -
Here we see that the message stream has a problem with multiple tag spellings,
though the message structure remains constant. In this example we 'anonymize'
the tag with the variant spellings.
mm := x2j.ValuesFromTagPath(doc, "data.*")
where '*' is any possible spelling - "netid" or "idnet"
and the result is a list with 1 member of map[string]interface{} type.
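The idea of anonymizing a tag can be sketched with the standard library alone. This is not how x2j implements it; it is a minimal illustration that assumes the message has already been decoded into nested map[string]interface{} values. The helper name childMaps and the sample field values are hypothetical.

```go
package main

import "fmt"

// childMaps returns every map-valued child of parent, whatever its key.
// For a parent with one variably named child, this behaves like a
// trailing "*" in the path.
func childMaps(parent map[string]interface{}) []map[string]interface{} {
	var out []map[string]interface{}
	for _, v := range parent {
		if m, ok := v.(map[string]interface{}); ok {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	// Hypothetical decoded "data" elements: same structure, different spelling.
	msgs := []map[string]interface{}{
		{"netid": map[string]interface{}{"disable": "no", "text1": "a", "word1": "b"}},
		{"idnet": map[string]interface{}{"disable": "yes", "text1": "c", "word1": "d"}},
	}
	for _, data := range msgs {
		for _, m := range childMaps(data) {
			// The inner keys are constant, so they can be read directly.
			fmt.Println(m["disable"], m["text1"], m["word1"])
		}
	}
}
```

Both messages yield a single map, so the code that follows the lookup does not need to know which spelling was used.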
Once we've retrieved the map, we can parse it using the known keys - "disable",
"text1" and "word1".
gonuts2.go -
This is an interesting case where there was a need to handle messages with lists
of "ClaimStatusCodeRecord" entries as well as messages with NONE. (Here we see
some of the vagaries of dealing with mixed messages that are verging on becoming
anonymous.)
strXml - the message with two ClaimStatusCodeRecord entries
strXmlA - the message with one ClaimStatusCodeRecord entry
strXml2 - the message with NO ClaimStatusCodeRecord entries
ValuesFromTagPath() options:
    path == "Envelope.Body.GetClaimStatusCodesResponse.GetClaimStatusCodesResult.ClaimStatusCodeRecord"
        for doc == strXml:
            returns: a list - []interface{} - with two values of map[string]interface{} type
        for doc == strXmlA:
            returns: a list - []interface{} - with one value of map[string]interface{} type
        for doc == strXml2:
            returns 'nil' - no values
    path == "*.*.*.*.*"
        for doc == strXml:
            returns: a list - []interface{} - with two values of map[string]interface{} type
    path == "*.*.*.*.*.Description"
        for doc == strXml:
            returns: a list - []interface{} - with two values of string type, the individual
            values from parsing the two map[string]interface{} values where key=="Description"
    path == "*.*.*.*.*.*"
        for doc == strXml:
            returns: a list - []interface{} - with six values of string type, the individual
            values from parsing all keys in the two map[string]interface{} values
IN GENERAL
Think of the wildcard character "*" as anonymizing the tag at the position in the path where
it occurs. Suppose we have a message that looks like this:
var doc = `
<books>
    <book seq="1">
        <author>William H. Gaddis</author>
        <title>The Recognitions</title>
        <review>One of the great seminal American novels of the 20th century.</review>
    </book>
    <book seq="2">
        <author>Austin Tappan Wright</author>
        <title>Islandia</title>
        <review>An example of earlier 20th century American utopian fiction.</review>
    </book>
    <book seq="3">
        <author>John Hawkes</author>
        <title>The Beetle Leg</title>
        <review>A lyrical novel about the construction of Ft. Peck Dam in Montana.</review>
    </book>
    <book seq="4">
        <author>T.E. Porter</author>
        <title>King's Day</title>
        <review>A magical novella.</review>
    </book>
</books>
`
path == "books"
    returns a list with one member, a map[string]interface{} with a single key 'book'
    whose value is a list.
path == "books.book"
    returns a list of four members of map[string]interface{} type - each map[string]interface{}
    is a 'book' entry
path == "books.*"
    returns a list of four members of map[string]interface{} type - each map[string]interface{}
    is a 'book' entry [the same as "books.book"]
path == "books.*.title"
    returns a list of four members of interface{} type - each interface{} is the title of a
    book in the list of 'books.book' [the same as "books.book.title"]
path == "books.*.*"
    returns a list of twelve members of interface{} type - each interface{} is the title,
    author, or review of a book in the list of 'books.book' [the same as "books.book.*"]
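The path rules above can be mimicked with a small walker over an already decoded document. This is a sketch under stated assumptions, not the x2j implementation: valuesAtPath and walk are hypothetical names, the document is built by hand as nested map[string]interface{} and []interface{} values, and attributes are left out of the maps.

```go
package main

import (
	"fmt"
	"strings"
)

// valuesAtPath walks v along a dot-separated path. A "*" segment matches
// every key at that level.
func valuesAtPath(v interface{}, path string) []interface{} {
	return walk(v, strings.Split(path, "."))
}

func walk(v interface{}, keys []string) []interface{} {
	// Lists are flattened so that each element is visited on its own.
	if list, ok := v.([]interface{}); ok {
		var out []interface{}
		for _, item := range list {
			out = append(out, walk(item, keys)...)
		}
		return out
	}
	if len(keys) == 0 {
		return []interface{}{v} // end of path: this is a result value
	}
	m, ok := v.(map[string]interface{})
	if !ok {
		return nil // the path continues but there is nothing to descend into
	}
	var out []interface{}
	if keys[0] == "*" {
		for _, child := range m { // map order is random, so result order is too
			out = append(out, walk(child, keys[1:])...)
		}
		return out
	}
	if child, ok := m[keys[0]]; ok {
		out = walk(child, keys[1:])
	}
	return out
}

func main() {
	book := func(author, title, review string) interface{} {
		return map[string]interface{}{"author": author, "title": title, "review": review}
	}
	doc := map[string]interface{}{"books": map[string]interface{}{"book": []interface{}{
		book("William H. Gaddis", "The Recognitions", "One of the great seminal American novels of the 20th century."),
		book("Austin Tappan Wright", "Islandia", "An example of earlier 20th century American utopian fiction."),
		book("John Hawkes", "The Beetle Leg", "A lyrical novel about the construction of Ft. Peck Dam in Montana."),
		book("T.E. Porter", "King's Day", "A magical novella."),
	}}}
	for _, p := range []string{"books", "books.book", "books.*", "books.*.title", "books.*.*"} {
		fmt.Printf("%-14s %d value(s)\n", p, len(valuesAtPath(doc, p)))
	}
}
```

For this hand-built document the five paths yield 1, 4, 4, 4 and 12 values, matching the counts listed above. Because a wildcard iterates over a Go map, the order of values from a "*" segment is not stable in this sketch.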
NOTES.
Attributes.
By default, interface{} values do not include attributes, though attributes do show up in maps.
So if you're looking for particular values based on attribute values, retrieve the entries
as map[string]interface{} values and parse them based on attribute key values. (Attribute
keys have '-' prepended to their names.)
The gonuts3.go example illustrates extracting all attribute values in a doc associated with a
specific tag value.
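Selecting entries by attribute value might look something like the sketch below. It assumes the entries have already been retrieved as map[string]interface{} values, that attribute values are strings, and that attribute keys carry the '-' prefix described above. The helper name selectByAttr is hypothetical and is not part of x2j.

```go
package main

import "fmt"

// selectByAttr returns the entries whose attribute `name` equals `want`.
// Attribute keys are assumed to be stored with a "-" prefix.
func selectByAttr(entries []interface{}, name, want string) []map[string]interface{} {
	var out []map[string]interface{}
	for _, e := range entries {
		m, ok := e.(map[string]interface{})
		if !ok {
			continue // not a map, so it cannot carry attributes
		}
		if got, ok := m["-"+name].(string); ok && got == want {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	// Hand-built stand-ins for two decoded 'book' entries with attributes.
	entries := []interface{}{
		map[string]interface{}{"-seq": "1", "title": "The Recognitions"},
		map[string]interface{}{"-seq": "2", "title": "Islandia"},
	}
	for _, m := range selectByAttr(entries, "seq", "2") {
		fmt.Println(m["title"])
	}
}
```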
Lists of entries vs. single entries.
The gonuts2.go example highlights some of the ambiguity that must be managed when dealing with
XML messages anonymously using the x2j package.
- for strXml "ClaimStatusCodeRecord" has a []interface{} value
- for strXmlA "ClaimStatusCodeRecord" has a map[string]interface{} value
ValuesFromTagPath() tries to normalize this by returning the []interface{} value as separate
list members rather than the absolute value in the map representation. So the same logic
could be used to parse the entries no matter how many values are returned - other than 'nil'.
"...ClaimStatusCodeRecord.*" will return all [non-attribute] values without their tags, so it's
a little hard to know what you've got. It is recommended that the wildcard - "*" - not be
used at the end of a path for that very reason.
EAT YOUR OWN DOG FOOD ...
I needed to convert a large (14.9 MB) XML data set from an Eclipse metrics report on an
application that had 355,100 lines of code in 211 packages into CSV data sets. The report
included application-, package-, class- and method-level metrics reported in an element,
"Value", with varying attributes.
<Value value=""/>
<Value name="" package="" value=""/>
<Value name="" source="" package="" value=""/>
<Value name="" source="" package="" value="" inrange=""/>
In addition, the metrics were reported with two different "Metric" compound elements:
<Metrics>
    <Metric id="" description="">
        <Values>
            <Value.../>
            ...
        </Values>
    </Metric>
    ...
    <Metric id="" description="">
        <Value.../>
    </Metric>
    ...
</Metrics>
Using the x2j package seemed a more straightforward approach than using Go vernacular
and the standard xml package. I wrote the program getmetrics.go to do this.
Note that the call to ValuesFromKeyPath()
metricVals := x2j.ValuesFromKeyPath(m, "Metrics.Metric", true)
requests that attributes be returned by including 'true' as the third argument. After that
everything is pretty straightforward. The timings for processing over 120,000 "Value"
elements (Intel Core i7, 8 GB, OS X 10.8, solid-state drive) are as follows.
2013-07-24 06:59:54.97486625 -0500 CDT ... File Opened: static_analysis.xml
2013-07-24 06:59:54.982837648 -0500 CDT ... File Read - size: 14863568
2013-07-24 06:59:58.656195538 -0500 CDT ... XML Unmarshaled - len: 1
2013-07-24 06:59:58.65622715 -0500 CDT ... ValuesFromKeyPath - len: 23
2013-07-24 06:59:58.657157356 -0500 CDT id: VG desc: McCabe Cyclomatic Complexity len(Values): 24562
2013-07-24 06:59:58.996143039 -0500 CDT id: PAR desc: Number of Parameters len(Values): 24562
2013-07-24 06:59:59.333407792 -0500 CDT id: NBD desc: Nested Block Depth len(Values): 24562
2013-07-24 06:59:59.672552235 -0500 CDT id: CA desc: Afferent Coupling len(Values): 211
2013-07-24 06:59:59.675736869 -0500 CDT id: CE desc: Efferent Coupling len(Values): 211
2013-07-24 06:59:59.678760375 -0500 CDT id: RMI desc: Instability len(Values): 211
2013-07-24 06:59:59.682004251 -0500 CDT id: RMA desc: Abstractness len(Values): 211
2013-07-24 06:59:59.684985741 -0500 CDT id: RMD desc: Normalized Distance len(Values): 211
2013-07-24 06:59:59.688191982 -0500 CDT id: DIT desc: Depth of Inheritance Tree len(Values): 2074
2013-07-24 06:59:59.717945046 -0500 CDT id: WMC desc: Weighted methods per Class len(Values): 2074
2013-07-24 06:59:59.747678025 -0500 CDT id: NSC desc: Number of Children len(Values): 2074
2013-07-24 06:59:59.778378946 -0500 CDT id: NORM desc: Number of Overridden Methods len(Values): 2074
2013-07-24 06:59:59.80913273 -0500 CDT id: LCOM desc: Lack of Cohesion of Methods len(Values): 2074
2013-07-24 06:59:59.838893489 -0500 CDT id: NOF desc: Number of Attributes len(Values): 2074
2013-07-24 06:59:59.868573348 -0500 CDT id: NSF desc: Number of Static Attributes len(Values): 2074
2013-07-24 06:59:59.89918479 -0500 CDT id: NOM desc: Number of Methods len(Values): 2074
2013-07-24 06:59:59.92889002 -0500 CDT id: NSM desc: Number of Static Methods len(Values): 2074
2013-07-24 06:59:59.959311932 -0500 CDT id: SIX desc: Specialization Index len(Values): 2074
2013-07-24 06:59:59.988859435 -0500 CDT id: NOC desc: Number of Classes len(Values): 211
2013-07-24 06:59:59.992076575 -0500 CDT id: NOI desc: Number of Interfaces len(Values): 211
2013-07-24 06:59:59.995418021 -0500 CDT id: NOP desc: Number of Packages len(Value): 1
2013-07-24 06:59:59.995971956 -0500 CDT id: TLOC desc: Total Lines of Code len(Value): 1
2013-07-24 06:59:59.996635113 -0500 CDT id: MLOC desc: Method Lines of Code len(Values): 24562
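The conversion step itself can be sketched with the standard library. This is not getmetrics.go; it is a minimal illustration that assumes each "Metric" has already been retrieved as a map[string]interface{} with '-'-prefixed attribute keys, and that "Value" entries may sit either directly under the metric or inside a "Values" wrapper. All helper names and sample values are hypothetical.

```go
package main

import (
	"encoding/csv"
	"log"
	"os"
)

// asList wraps a single value in a list; lists pass through, nil stays nil.
func asList(v interface{}) []interface{} {
	switch t := v.(type) {
	case nil:
		return nil
	case []interface{}:
		return t
	default:
		return []interface{}{t}
	}
}

// valueEntries returns the "Value" entries of one metric for both layouts.
func valueEntries(metric map[string]interface{}) []interface{} {
	if wrapper, ok := metric["Values"].(map[string]interface{}); ok {
		return asList(wrapper["Value"])
	}
	return asList(metric["Value"])
}

// attr returns the named attribute as a string, or "" if it is absent.
func attr(m map[string]interface{}, name string) string {
	s, _ := m["-"+name].(string)
	return s
}

// rows builds one CSV record per "Value" entry of the metric.
func rows(metric map[string]interface{}) [][]string {
	var out [][]string
	for _, e := range valueEntries(metric) {
		v, ok := e.(map[string]interface{})
		if !ok {
			continue
		}
		out = append(out, []string{attr(metric, "id"), attr(v, "name"),
			attr(v, "source"), attr(v, "package"), attr(v, "value")})
	}
	return out
}

func main() {
	// Made-up sample metrics covering both layouts.
	metrics := []map[string]interface{}{
		{"-id": "M1", "Value": map[string]interface{}{"-value": "100"}},
		{"-id": "M2", "Values": map[string]interface{}{"Value": []interface{}{
			map[string]interface{}{"-name": "run", "-source": "A.java", "-package": "app", "-value": "3"},
			map[string]interface{}{"-name": "stop", "-source": "A.java", "-package": "app", "-value": "1"},
		}}},
	}
	w := csv.NewWriter(os.Stdout)
	if err := w.Write([]string{"id", "name", "source", "package", "value"}); err != nil {
		log.Fatal(err)
	}
	for _, m := range metrics {
		if err := w.WriteAll(rows(m)); err != nil { // WriteAll also flushes
			log.Fatal(err)
		}
	}
	w.Flush()
	if err := w.Error(); err != nil {
		log.Fatal(err)
	}
}
```

Missing attributes come out as empty CSV fields, which keeps the column layout the same across the four "Value" variants.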