Friday, July 29, 2022

Sanitizing GPX files for public sharing – Ole Begemann


GPX is a popular XML format for running or cycling tracks with geocoordinates. This is a how-to for cleaning up a GPX file by removing unwanted or privacy-sensitive data.

Many apps that record workout routes and can export them as GPX files include more data than the plain GPS coordinates. For instance, a GPX file from my favorite recording app, Guru Maps, looks like this:

<?xml version="1.0" encoding="utf-8"?>
<gpx version="1.1" creator="Guru Maps/4.5.2"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="http://www.topografix.com/GPX/1/1"
  xmlns:gom="https://gurumaps.app/gpx/v2"
  xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd https://gurumaps.app/gpx/v2 https://gurumaps.app/gpx/v2/schema.xsd">
  <trk>
    <name>Barnimer Dörferweg</name>
    <type>TrackStyle_FF7F00C8</type>
    <trkseg>
      <trkpt lat="52.6254614634" lon="13.4092010169">
        <ele>54.238586451</ele>
        <time>2020-05-10T05:30:38.997Z</time>
        <hdop>4.6875</hdop>
        <vdop>3.375</vdop>
        <extensions>
          <gom:velocity>5.5661926282</gom:velocity>
          <gom:course>329.1938658731</gom:course>
        </extensions>
      </trkpt><!-- thousands of track points -->

This track includes the following properties for each track point:

  • Geocoordinates (latitude and longitude)
  • Elevation
  • Timestamp
  • Horizontal and vertical dilution of precision (hdop/vdop)
  • Current speed
  • Current course/heading
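To make that list concrete, here’s a minimal Python sketch (standard library only) that reads each track point into a small record. The names TrackPoint and parse_track_points are hypothetical, and the sketch assumes a GPX 1.1 file using the default namespace shown above. Because extension fields differ from app to app, it simply collects whatever sits inside <extensions>, keyed by local element name.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# GPX 1.1 namespace, as declared in the file header above.
GPX_NS = "http://www.topografix.com/GPX/1/1"
NS = {"gpx": GPX_NS}


@dataclass
class TrackPoint:
    lat: float
    lon: float
    ele: Optional[float] = None
    time: Optional[str] = None
    hdop: Optional[float] = None
    vdop: Optional[float] = None
    # App-specific values from <extensions>, keyed by local name (e.g. "course").
    extensions: Dict[str, str] = field(default_factory=dict)


def _optional_float(point: ET.Element, path: str) -> Optional[float]:
    """Parse a child element's text as a float, or return None if it's missing."""
    text = point.findtext(path, namespaces=NS)
    return float(text) if text else None


def parse_track_points(xml_text: str) -> List[TrackPoint]:
    """Collect every <trkpt> in a GPX 1.1 document into TrackPoint records."""
    root = ET.fromstring(xml_text)
    points = []
    for pt in root.iterfind(".//gpx:trkpt", NS):
        extras = {}
        ext = pt.find("gpx:extensions", NS)
        if ext is not None:
            for child in ext:
                # ElementTree tags look like "{namespace-uri}localname".
                extras[child.tag.rpartition("}")[2]] = (child.text or "").strip()
        points.append(TrackPoint(
            lat=float(pt.get("lat")),
            lon=float(pt.get("lon")),
            ele=_optional_float(pt, "gpx:ele"),
            time=pt.findtext("gpx:time", namespaces=NS),
            hdop=_optional_float(pt, "gpx:hdop"),
            vdop=_optional_float(pt, "gpx:vdop"),
            extensions=extras,
        ))
    return points
```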

In addition, Guru Maps uses the track’s <type> element to encode the color of the track as displayed in the app, in a non-standardized format (TrackStyle_FF7F00C8).

Some apps also include heart rate or other fitness measurements.

All this data is useful for archiving tracks or importing them into another app. But before sharing this track publicly, I’d like to clean up the data first:

  • The only really important pieces of data are the coordinates and possibly the elevation.
  • Timestamps are private data. I don’t want to share them.
  • The other measurements are mostly irrelevant.

GPX files can become quite large (thousands of track points are common), so reducing the amount of data is also good for file size and parsing performance.

Requirements

  1. XMLStarlet

    I use XMLStarlet to do most of the XML processing. On macOS, you can install XMLStarlet via Homebrew:

    brew install xmlstarlet

  2. xmllint

    One optional processing step uses xmllint, which comes preinstalled on macOS.

  3. XSLT file for removing unused namespaces

    Finally, download the XSLT file remove-unused-namespaces.xslt, either from this Gist or from my server. We’re going to use it in one processing step to strip unused namespaces from the GPX file.

    Original source: Dimitre Novatchev on Stack Overflow.
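The XSLT isn’t reproduced here. As an illustration of the idea only (not a port of that stylesheet), this Python sketch reports namespace declarations whose URI never appears in an element or attribute name. The function names are hypothetical, and the sketch deliberately ignores prefixes that occur only inside attribute values, such as xsi:type values, which a complete tool would have to take into account.

```python
import io
import xml.etree.ElementTree as ET
from typing import Set, Tuple


def declared_namespaces(xml_text: str) -> Set[Tuple[str, str]]:
    """Return every (prefix, uri) pair declared in the document ("" = default)."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    return {ns for _event, ns in ET.iterparse(source, events=("start-ns",))}


def used_namespaces(xml_text: str) -> Set[str]:
    """Return the namespace URIs that occur in element or attribute names."""
    root = ET.fromstring(xml_text)
    uris = set()
    for element in root.iter():
        for name in (element.tag, *element.attrib):
            # Qualified names look like "{uri}local"; unqualified ones don't.
            if name.startswith("{"):
                uris.add(name[1:].partition("}")[0])
    return uris


def unused_namespaces(xml_text: str) -> Set[Tuple[str, str]]:
    """Declarations that a namespace-stripping step could safely drop."""
    used = used_namespaces(xml_text)
    return {(p, uri) for p, uri in declared_namespaces(xml_text) if uri not in used}
```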

Running the command

Assuming your source file is called input.gpx and the XSLT file you downloaded above is in the current directory, this is the full command to process the GPX file and save the result to output.gpx:

xmlstarlet ed \
  -d "//_:extensions" \
  -d "/_:gpx/_:metadata/_:time" \
  -d "/_:gpx/_:trk/_:type" \
  -d "//_:trkpt/_:time" \
  -d "//_:trkpt/_:hdop" \
  -d "//_:trkpt/_:vdop" \
  -d "//_:trkpt/_:pdop" \
  -u "/_:gpx/@creator" -v "Shell script" \
  input.gpx \
  | xmlstarlet tr remove-unused-namespaces.xslt - \
  | xmlstarlet ed -u "/_:gpx/@xsi:schemaLocation" -v "http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd" \
  | xmllint --c14n11 --pretty 2 - \
  > output.gpx

This sequence performs the following steps:

  • Delete all <extensions> elements.
  • Delete the timestamp from the file’s <metadata> element, if present.
  • Delete the <trk><type> element.
  • Delete the <time>, <hdop>, <vdop>, and <pdop> elements from all track points.
  • Set the file’s creator attribute.
  • Now that the extension fields are gone, remove all unused XML namespaces from the file header.
  • Delete all xsi:schemaLocation entries except the one for the GPX schema.
  • Run the file through xmllint for formatting. The --c14n11 option performs XML Canonicalization (C14N). Among many other things, canonicalization replaces numeric character references in the XML with the corresponding Unicode characters, which is important for my use case.

    For example, the text “D&#246;rferweg” in the source would become “Dörferweg”. I found that some of the tools I use insert non-ASCII characters as numeric codes, and other tools don’t display these correctly.
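For readers who prefer a script to a shell pipeline, here’s a rough Python equivalent of these steps using only xml.etree.ElementTree (ET.indent needs Python 3.9+). It’s a sketch, not the method used in this post: sanitize_gpx and DELETIONS are hypothetical names, it assumes GPX 1.1 input, and it doesn’t perform C14N. Two steps come for free with ElementTree, though: the parser already turns numeric character references into regular characters, and the serializer only declares namespaces that are still used by some element or attribute name.

```python
import xml.etree.ElementTree as ET

GPX_NS = "http://www.topografix.com/GPX/1/1"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
NS = {"gpx": GPX_NS}

# Serialize GPX elements with a default namespace (no "ns0:" prefixes) and keep
# the conventional "xsi" prefix. Note: this registration is process-global.
ET.register_namespace("", GPX_NS)
ET.register_namespace("xsi", XSI_NS)

# (parent path, child path) pairs mirroring the -d options of the command.
DELETIONS = [
    ("gpx:metadata", "gpx:time"),
    ("gpx:trk", "gpx:type"),
    (".//gpx:trkpt", "gpx:time"),
    (".//gpx:trkpt", "gpx:hdop"),
    (".//gpx:trkpt", "gpx:vdop"),
    (".//gpx:trkpt", "gpx:pdop"),
]


def sanitize_gpx(xml_text: str, creator: str = "Shell script") -> str:
    """Return a copy of a GPX 1.1 document with unwanted data removed."""
    root = ET.fromstring(xml_text)

    # -d "//_:extensions": snapshot the tree first, because modifying it
    # while iterating over root.iter() is unsafe.
    for parent in list(root.iter()):
        for ext in parent.findall("gpx:extensions", NS):
            parent.remove(ext)

    for parent_path, child_path in DELETIONS:
        for parent in root.findall(parent_path, NS):
            for child in parent.findall(child_path, NS):
                parent.remove(child)

    # -u "/_:gpx/@creator" -v "Shell script"
    root.set("creator", creator)

    # Keep only the GPX pair in xsi:schemaLocation.
    schema_location = f"{{{XSI_NS}}}schemaLocation"
    if schema_location in root.attrib:
        root.set(schema_location, f"{GPX_NS} {GPX_NS}/gpx.xsd")

    # Re-indent (removed elements leave stray whitespace behind) and serialize.
    ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")
```

Usage would look like sanitize_gpx(open("input.gpx", encoding="utf-8").read()); writing the result to output.gpx is left to the caller.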

The processed GPX file looks like this:

<gpx xmlns="http://www.topografix.com/GPX/1/1"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  creator="Shell script" version="1.1"
  xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd">
  <trk>
    <name>Barnimer Dörferweg</name>
    <trkseg>
      <trkpt lat="52.6254614634" lon="13.4092010169">
        <ele>54.238586451</ele>
      </trkpt>
      <trkpt lat="52.6255090307" lon="13.4091548326">
        <ele>53.9600219977</ele>
      </trkpt>

The processing steps above are the ones that work for me, given the apps I use. Your mileage may vary if your tools add other data to your GPX files; feel free to edit the command accordingly. XMLStarlet uses XPath syntax to select the elements to operate on. The xmlstarlet sel command is useful for inspecting a source file and trying out the required XPath incantations.
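If you’d rather explore a file from Python, this sketch counts which elements appear inside the track points, including app-specific ones under <extensions>, which tells you which deletions your files need. trkpt_children is a hypothetical helper. Keep in mind that ElementTree supports only a limited subset of XPath, so this is no substitute for testing the actual expressions with xmlstarlet sel.

```python
import xml.etree.ElementTree as ET
from collections import Counter

GPX_NS = "http://www.topografix.com/GPX/1/1"
NS = {"gpx": GPX_NS}


def local_name(tag: str) -> str:
    """Strip the "{namespace-uri}" prefix ElementTree puts in front of tag names."""
    return tag.rpartition("}")[2]


def trkpt_children(xml_text: str) -> Counter:
    """Count every element nested inside <trkpt>, including extension fields."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for pt in root.iterfind(".//gpx:trkpt", NS):
        # pt.iter() yields pt itself first, then all of its descendants.
        for child in pt.iter():
            if child is not pt:
                counts[local_name(child.tag)] += 1
    return counts
```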

Validation

Finally, it’s a good idea to validate the processed GPX file against the official GPX schema:

xmlstarlet val --quiet --err --xsd \
  http://www.topografix.com/GPX/1/1/gpx.xsd \
  output.gpx
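Python’s standard library has no XML Schema validator, so it can’t replace xmlstarlet val. As an additional sanity check, the hypothetical helper below (a sketch, assuming GPX 1.1) reports any of the elements the command was supposed to remove that are still present, together with how often each one matches.

```python
import xml.etree.ElementTree as ET
from typing import Dict

GPX_NS = "http://www.topografix.com/GPX/1/1"
NS = {"gpx": GPX_NS}

# ElementTree-style paths for the data the command deletes.
LEFTOVER_PATHS = [
    ".//gpx:extensions",
    "gpx:metadata/gpx:time",
    "gpx:trk/gpx:type",
    ".//gpx:trkpt/gpx:time",
    ".//gpx:trkpt/gpx:hdop",
    ".//gpx:trkpt/gpx:vdop",
    ".//gpx:trkpt/gpx:pdop",
]


def leftover_private_data(xml_text: str) -> Dict[str, int]:
    """Map each path that still matches something to its number of matches."""
    root = ET.fromstring(xml_text)
    found = {}
    for path in LEFTOVER_PATHS:
        count = len(root.findall(path, NS))
        if count:
            found[path] = count
    return found
```

An empty dictionary means none of the targeted elements survived.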

Happy processing!


PS: If you’re ever in Berlin, this is a nice long bike route (55 km) with minimal car traffic. It starts and ends at Hauptbahnhof. Download the (sanitized) GPX file.
