11  Data formats

Vector data is one of the two primary types of geospatial data (the other one being raster data). In this section we will review what vector data is and go over some common data formats for it.

11.1 Points, lines, and polygons

Vector data represents specific features on the Earth’s surface. There are three types of vector data:

  • Points: each point has a single x,y location. Examples of data that can be represented as point vector data are sampling locations or animal sightings.

  • Lines: a line is composed of at least two points that are connected. Roads and streams are commonly depicted as line vector data.

  • Polygons: polygons are sets of three or more vertices that are connected and form a closed region. Political boundaries (outlines of countries, states, cities, etc) are examples of polygon vector data.

Each item in the vector data is usually referred to as a feature. So each point would be a feature, each polygon is a feature, etc.

Image Source: National Ecological Observatory Network (NEON)

In addition to the geospatial information stored, vector data can include attributes that describe each feature. For example, a vector dataset where each feature is a polygon representing the boundary of a state could have as attributes the population and are of the state.

Image Source: National Ecological Observatory Network (NEON)

11.2 Shapefiles

One of the most popular formats to store vector data is the shapefile data format. The shapefile format is developed and maintained by the Environmental Systems Research Institute (Esri).

So far we’ve been working with data that comes stored in a single file, like a csv or txt file for tabular data. A shapefile is actually a collection of files that interact together to create a single data file. All the files that make up a shapefile need to have the same name (different extensions) and be in the same directory. For our shapefiles to work we need at least these three files:

  • .shp: shape format, this file has the geometries for all features.
  • .shx: shape index format, this file indexes the features
  • .dbf: attribute format, this file stores the attributes for features as a table

Sometimes shapefiles will have additional files, including:

  • .prj: a file containing information about the projection and coordinate reference system
  • .sbn and .sbx: files that contain a spatial index of the features
  • .shp.xml: geospatial metadata in XML format.

Check the Wikipedia page about shapefiles to see a more extensive list of files associated to shapefiles.

File management

Remember: when working with a shapefile all the associated files must have the same name (different extensions) and be located in the same directory.

Different vector types = separate shapefiles

Each shapefile can only hold one type of vector data. So only points, only lines, or only polygons can be inside a single shapefile.

11.3 GeoJSON

GeoJSON, which stands for Geographic JavaScript Object Notation, is an open format for encoding vector data (points, lines, polygons, and multipolygons) and their attributes. It is a popular format for web mapping applications. The GeoJSON format uses a single file, with extension .json or .geojson.

While shapefiles can be in any coordinate reference system, the GeoJSON specification requires GeoJSON files to use the World Geodetic System 1984 (WGS84) and all points being expressed as longitude and latitude units of decimal degrees.

Data in a GeoJSON is stored as attribute-value pairs (think dictionaries!) and lists. The following are examples of how points, lines, and polygons are represented as GeoJSON features:

Source: Wikipedia

The followng is an example of a full GeoJSON file. Notice multiple types of geometries can be mixed within the same file:

# Source: Wikipedia
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [102.0, 0.5]
      },
      "properties": {
        "prop0": "value0"
      }
    },
    {
      "type": "Feature",
      "geometry": {
        "type": "LineString",
        "coordinates": [
          [102.0, 0.0],
          [103.0, 1.0],
          [104.0, 0.0],
          [105.0, 1.0]
        ]
      },
      "properties": {
        "prop0": "value0",
        "prop1": 0.0
      }
    },
    {
      "type": "Feature",
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [100.0, 0.0],
            [101.0, 0.0],
            [101.0, 1.0],
            [100.0, 1.0],
            [100.0, 0.0]
          ]
        ]
      },
      "properties": {
        "prop0": "value0",
        "prop1": { "this": "that" }
      }
    }
  ]
}

The website https://geojson.io/ is a nice tool to easily create GeoJSON files.

11.4 References

Data Carpentires - Introduction to Geospatial Concepts