Functions used to discover and explore the data exposed by the ISTAT web service.
This module implements functions to discover the data exposed by ISTAT. To do so, istatapi makes metadata requests to the API endpoints. The Discovery module provides methods to parse and analyze the API metadata responses. It uses the pandas library and returns data as DataFrames, which is convenient for interactive and exploratory analysis in Jupyter Notebooks.
The main class implemented in the Discovery module is DataSet.
Parse the response containing all the available datasets and return a list of dataflows.
The simplest way to get a full list of the dataflows provided by ISTAT is to call all_available(), which returns all the explorable dataflows together with their IDs and descriptions.
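Because the module returns pandas DataFrames, the list of dataflows can be searched with ordinary pandas operations. The sketch below does not call the ISTAT API: it assumes the returned frame has df_id, df_description and df_structure_id columns (names inferred from the identifiers keys used on this page) and uses a hand-built, single-row stand-in for the real result. search_dataflows is a hypothetical helper, not part of istatapi.

```python
import pandas as pd

# Stand-in for the DataFrame returned by all_available(); the column names are
# an assumption and the single row is taken from the example on this page.
available_datasets = pd.DataFrame(
    {
        "df_id": ["22_289"],
        "df_description": ["Resident population on 1st January"],
        "df_structure_id": ["DCIS_POPRES1"],
    }
)


def search_dataflows(dataflows: pd.DataFrame, keyword: str) -> pd.DataFrame:
    """Return the rows whose description contains `keyword` (case-insensitive)."""
    mask = dataflows["df_description"].str.contains(keyword, case=False, regex=False)
    return dataflows[mask]


matches = search_dataflows(available_datasets, "population")
print(matches[["df_id", "df_description"]])
```

The df_id of a matching row is the value that would then be passed as dataflow_identifier when creating a DataSet.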
ds2 = DataSet(dataflow_identifier="22_289")
test_eq(ds2.identifiers['df_id'], '22_289')
test_eq(ds2.identifiers['df_description'], 'Resident population on 1st January')
test_eq(ds2.identifiers['df_structure_id'], 'DCIS_POPRES1')
# test Dataset 729_1050 (https://github.com/Attol8/istatapi/issues/24)
assert len(available_datasets.query('df_id == "729_1050"')) == 1
# test that it raises ValueError if no dataset is found
test_fail(lambda: DataSet(dataflow_identifier="729_1050"), contains="No available data found for the requested query")
ds2.dimensions_info()
   dimension    dimension_ID    description
0  FREQ         CL_FREQ         Frequency
1  ETA          CL_ETA1         Age class
2  ITTER107     CL_ITTER107     Territory
3  SESSO        CL_SEXISTAT1    Gender
4  STACIVX      CL_STATCIV2     Marital status
5  TIPO_INDDEM  CL_TIPO_DATO15  Data type 15
We can look at the dimensions of a dataflow by simply accessing its dimensions attribute. However, this attribute does not include the dimensions’ descriptions.
Return the dimensions of a specific dataflow and their descriptions.
To see the dimensions together with their descriptions, we can use the dimensions_info method. It returns an easy-to-read pandas DataFrame.
The values that each dimension can take can also be explored. The available_values attribute contains a dictionary with the dimensions of the dataset as keys. The values of this dictionary are themselves dictionaries, which can be accessed through the values_ids and values_description keys. The former holds the IDs of the dimension’s values, the latter their descriptions.
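To make the nested structure concrete, the sketch below mimics the shape of available_values with a hand-written dictionary and pairs each value ID with its description. The dimension names come from the dimensions table on this page; the value IDs and descriptions are illustrative placeholders rather than data fetched from ISTAT, the two lists are assumed to be aligned by position, and describe_values is a hypothetical helper that is not part of istatapi.

```python
# Hand-written stand-in for DataSet.available_values (placeholder values,
# not fetched from ISTAT).
available_values = {
    "FREQ": {
        "values_ids": ["A", "M"],
        "values_description": ["annual", "monthly"],
    },
    "SESSO": {
        "values_ids": ["1", "2"],
        "values_description": ["males", "females"],
    },
}


def describe_values(values: dict, dimension: str) -> dict:
    """Map each value ID of `dimension` to its description.

    Assumes the two lists are aligned by position.
    """
    entry = values[dimension]
    return dict(zip(entry["values_ids"], entry["values_description"]))


print(describe_values(available_values, "FREQ"))
```

With a real DataSet, the same lookup would start from the object's available_values attribute instead of the stand-in dictionary.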
Set filters for the dimensions of the dataset by passing dimension_name=value.
# test dataset from https://github.com/Attol8/istatapi/issues/25
ds = DataSet(dataflow_identifier="155_358")
assert 'WAGE_E_2021' not in ds.available_values['TIP_AGGR1']['values_ids']
With DataSet.set_filters() we can filter the dimensions of the dataset by passing the values that we want to keep. The dataset will then return only data matching those filters. A dictionary with the selected filters is stored in the attribute DataSet.filters.
Note that the arguments of DataSet.set_filters are written in lower case, but in DataSet.filters they are converted to upper case to be consistent with the dimension names in the ISTAT API.
dz = DataSet(dataflow_identifier="139_176")
dz.set_filters(freq="M", tipo_dato=["ISAV", "ESAV"], paese_partner="WORLD")
test_eq(dz.filters['FREQ'], 'M')
test_eq(dz.filters['TIPO_DATO'], ["ISAV", "ESAV"])
test_fail(lambda: dz.filters['freq'])  # the filter is not saved in lower case
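The case conversion can be pictured with a small stand-alone sketch. normalize_filters below is a hypothetical function, not istatapi's implementation: it only illustrates how keyword arguments given in lower case could be stored under upper-case keys, which is the behaviour the tests above check.

```python
def normalize_filters(**kwargs) -> dict:
    """Store each keyword argument under its upper-case name."""
    return {name.upper(): value for name, value in kwargs.items()}


filters = normalize_filters(freq="M", tipo_dato=["ISAV", "ESAV"], paese_partner="WORLD")
print(filters)
```

Accepting keyword arguments keeps the call site short, while upper-casing the keys lets the stored filters line up with dimension names such as FREQ.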