Basic Usage

This document describes how to get started with eeweather.

Matching to weather stations

EEweather is designed to support the process of finding sources of data that correspond to particular sites. As there are many approaches to this process of matching, the EEweather package is designed to be flexible.

EEweather provides sensible default mappings from geographical markers to weather stations so that it can be used out of the box.

EEweather uses lat/long coordinates as targets for weather matching. This method is described below.

Latitude/Longitude Coordinates

The recommended way to find the weather station(s) that correspond to a particular site is to use the lat-long coordinates of that site.

Example usage:

>>> import eeweather
>>> ranked_stations = eeweather.rank_stations(35, -95)
>>> station, warnings = eeweather.select_station(ranked_stations)
>>> station
ISDStation('720627')
>>> ranked_stations.loc[station.usaf_id]
rank                                     1
distance_meters                    32692.7
latitude                            35.283
longitude                            -95.1
iecc_climate_zone                        3
iecc_moisture_regime                     A
ba_climate_zone                Mixed-Humid
ca_climate_zone                       None
rough_quality                          low
elevation                            183.2
state                                   OK
tmy3_class                            None
is_tmy3                              False
is_cz2010                            False
difference_elevation_meters           None
Name: 720627, dtype: object
>>> warnings
[]

That particular result has no associated warnings, but other mappings may have associated warnings, such as the mapping from this point which is in the middle of the Gulf of Mexico, 700km away from the nearest weather station and outside of the climate zone boundary:

>>> ranked_stations = eeweather.rank_stations(20, -95)
>>> station, warnings = eeweather.select_station(ranked_stations)
>>> warnings
['Distance from target to weather station is greater than 50km.', 'Distance from target to weather station is greater than 200km.']

ZIP Code Tabulation Areas (ZCTAs)

ZIP codes are often abused as rough geographic markers. They are not particularly well set up be used as the basis of a GIS system - some ZIP codes correspond to single buildings or post-offices, some cover thousands of square miles of land. The US Census Bureau transforms census blocks into what they call ZIP Code Tabulation Areas, and use these instead. There are roughly 10k ZIP codes that are not used as ZCTAs, and ZCTAs do not correspond directly to ZIP codes, but for matching to weather stations, which are much sparser than ZIP codes, this rough mapping is usually sufficient. Often tens or hundreds of ZCTAs will be matched to the same weather station. We provide a function eeweather.zcta_to_lat_long which allows for a ZCTA to be converted into a latitude and longitude (the centroid of the ZCTA) which can be used to match to a weather station using the latitude/longitude method mentioned above.

_images/station-mapping.png

Note

The default mapping concentrates on weather stations in US states (including AK, HI) and territories, including PR, GU, VI etc).

Example usage:

>>> lat, long = eeweather.zcta_to_lat_long('91104')
>>> lat, long
(34.1678418058534, -118.123485581459)

Obtaining temperature data

These matching results carry a reference to a weather station object. The weather station object has some associated metadata and - most importantly - has methods for obtaining weather data.

Let’s look at the station object from above:

>>> station = result.isd_station
>>> station
ISDStation('722178')

This ISDStation object carries information about that station and methods for fetching corresponding weather data.

The .json() method gives a quick summary of associated metadata in a format that can easily be serialized:

>>> import json
>>> print(json.dumps(station.json(), indent=2)
{
  "elevation": 137.5,
  "latitude": 35.021,
  "longitude": -94.621,
  "icao_code": "KRKR",
  "name": "ROBERT S KERR AIRPORT",
  "quality": "high",
  "wban_ids": [
    "53953",
    "99999"
  ],
  "recent_wban_id": "53953",
  "climate_zones": {
    "iecc_climate_zone": "3",
    "iecc_moisture_regime": "A",
    "ba_climate_zone": "Mixed-Humid",
    "ca_climate_zone": null
  }
}

Most of these are also stored as attributes on the object:

>>> station.usaf_id
'722178'
>>> station.latitude, station.longitude
(35.021, -94.621)
>>> station.coords
(35.021, -94.621)
>>> station.name
'ROBERT S KERR AIRPORT'
>>> station.iecc_climate_zone
'3'
>>> station.iecc_moisture_regime
'A'

In addition to these simple attributes there are a host of methods that can be used to fetch temperature data. The simplest are these, which return pandas.Series objects. The start and end date timezones must be explicilty set to UTC.

Note that this temperature data is given in degrees Celsius, not Fahrenheit. (\(T_F = T_C \cdot 1.8 + 32\)), and that the pd.Timestamp index is given in UTC.

ISD temperature data as an hourly time series:

>>> import datetime
>>> import pytz
>>> start_date = datetime.datetime(2016, 6, 1, tzinfo=pytz.UTC)
>>> end_date = datetime.datetime(2017, 9, 15, tzinfo=pytz.UTC)
>>> tempC = station.load_isd_hourly_temp_data(start_date, end_date)
>>> tempC.head()
2016-06-01 00:00:00+00:00    21.3692
2016-06-01 01:00:00+00:00    20.6325
2016-06-01 02:00:00+00:00    19.4858
2016-06-01 03:00:00+00:00    19.0883
2016-06-01 04:00:00+00:00    18.8858
Freq: H, dtype: float64
>>> tempF = tempC * 1.8 + 32
>>> tempF.head()
2016-06-01 00:00:00+00:00    70.46456
2016-06-01 01:00:00+00:00    69.13850
2016-06-01 02:00:00+00:00    67.07444
2016-06-01 03:00:00+00:00    66.35894
2016-06-01 04:00:00+00:00    65.99444

ISD temperature data as a daily time series:

>>> tempC = station.load_isd_daily_temp_data(start_date, end_date)
>>> tempC.head()
2016-06-01 00:00:00+00:00    21.329063
2016-06-02 00:00:00+00:00    21.674583
2016-06-03 00:00:00+00:00    22.434306
2016-06-04 00:00:00+00:00    22.842674
2016-06-05 00:00:00+00:00    21.850521
Freq: D, dtype: float64
>>> tempF = tempC * 1.8 + 32
>>> tempF.head()
2016-06-01 00:00:00+00:00    70.392313
2016-06-02 00:00:00+00:00    71.014250
2016-06-03 00:00:00+00:00    72.381750
2016-06-04 00:00:00+00:00    73.116813
2016-06-05 00:00:00+00:00    71.330937
Freq: D, dtype: float64

GSOD temperature data as a daily time series:

>>> tempC = station.load_gsod_daily_temp_data(start_date, end_date)
>>> tempC.head()
2016-06-01 00:00:00+00:00    21.111111
2016-06-02 00:00:00+00:00    21.833333
2016-06-03 00:00:00+00:00    22.277778
2016-06-04 00:00:00+00:00    22.777778
2016-06-05 00:00:00+00:00    21.833333
Freq: D, dtype: float64
>>> tempF = temps * 1.8 + 32
>>> tempF.head()
2016-06-01 00:00:00+00:00    70.0
2016-06-02 00:00:00+00:00    71.3
2016-06-03 00:00:00+00:00    72.1
2016-06-04 00:00:00+00:00    73.0
2016-06-05 00:00:00+00:00    71.3
Freq: D, dtype: float64

This station does not contain TMY3 data. To require that TMY3 data is available at the matched weather station, restrict the ranked weather stations to only those which have TMY3 data:

>>> ranked_stations = eeweather.rank_stations(35, -95, is_tmy3=True)
>>> station, warnings = eeweather.select_station(ranked_stations)
>>> station
ISDStation('723440')

TMY3 temperature data as an hourly time series:

>>> tempC = station.load_tmy3_hourly_temp_data(start_date, end_date)
>>> tempC.head()

2016-06-01 00:00:00+00:00    26.7
2016-06-01 01:00:00+00:00    26.3
2016-06-01 02:00:00+00:00    26.0
2016-06-01 03:00:00+00:00    25.6
2016-06-01 04:00:00+00:00    25.3
Freq: D, dtype: float64
>>> tempF = temps * 1.8 + 32
>>> tempF.head()
2016-06-01 00:00:00+00:00    80.06
2016-06-01 01:00:00+00:00    79.34
2016-06-01 02:00:00+00:00    78.80
2016-06-01 03:00:00+00:00    78.08
2016-06-01 04:00:00+00:00    77.54
Freq: D, dtype: float64

A similar restriction can be made for CZ2010 stations, which are specific to California:

>>> ranked_stations = eeweather.rank_stations(35, -95, is_cz2010=True)
>>> station, warnings = eeweather.select_station(ranked_stations)
>>> station
ISDStation('723805')

CZ2010 temperature data as an hourly time series:

>>> tempC = station.load_cz2010_hourly_temp_data(start_date, end_date)
>>> tempC.head()
2016-06-01 00:00:00+00:00    26.7
2016-06-01 01:00:00+00:00    26.3
2016-06-01 02:00:00+00:00    26.0
2016-06-01 03:00:00+00:00    25.6
2016-06-01 04:00:00+00:00    25.3
Freq: D, dtype: float64
>>> tempF = temps * 1.8 + 32
>>> tempF.head()
2016-06-01 00:00:00+00:00    80.06
2016-06-01 01:00:00+00:00    79.34
2016-06-01 02:00:00+00:00    78.80
2016-06-01 03:00:00+00:00    78.08
2016-06-01 04:00:00+00:00    77.54
Freq: H, dtype: float64

The station ranking function eeweather.rank_stations has many more options, including distance restriction and climate zone restriction, which may come in handy.

If desired, eeweather.ISDStation objects can also be created directly:

>>> eeweather.ISDStation('722880')
ISDStation('722880')

If the station is not recognized, an error will be thrown:

>>> eeweather.ISDStation('BAD_STATION')
...
eeweather.exceptions.UnrecognizedUSAFIDError: BAD_STATION