Utils
gingado
Support for model documentation
get_datetime
get_datetime ()
Returns the time now
d = get_datetime()
assert isinstance(d, str)
assert len(d) > 0
read_attr
read_attr (obj)
Read object type and values of attributes from fitted object
| | Details |
|---|---|
| obj | Object from which attributes will be read |
The function read_attr helps gingado Documenters read the object behind the scenes. It collects the type of estimator and any attributes that result from fitting an object (i.e., those whose names end in "_" but not in a double underscore).
For example, the attributes of an untrained and a trained random forest are, in sequence:
from sklearn.ensemble import RandomForestRegressor

rf_unfit = RandomForestRegressor(n_estimators=3)
rf_fit = RandomForestRegressor(n_estimators=3)\
    .fit([[1, 0], [0, 1]], [[0.5], [0.5]]) # random numbers

list(read_attr(rf_unfit)), list(read_attr(rf_fit))
([{'_estimator_type': 'regressor'}],
[{'_estimator_type': 'regressor'},
{'base_estimator_': DecisionTreeRegressor()},
{'estimators_': [DecisionTreeRegressor(max_features=1.0, random_state=1632148864),
DecisionTreeRegressor(max_features=1.0, random_state=1616501356),
DecisionTreeRegressor(max_features=1.0, random_state=2109419996)]},
{'feature_importances_': array([0., 0.])},
{'n_features_': 2},
{'n_features_in_': 2},
{'n_outputs_': 1}])
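The filtering rule described above can be sketched in a few lines. This is a minimal illustration rather than gingado's implementation: `read_fitted_attrs` and `ToyEstimator` are hypothetical names introduced here, and the sketch covers only the trailing-underscore rule, not the estimator type that read_attr also reports.

```python
def read_fitted_attrs(obj):
    """Yield {name: value} for attributes whose names end in a single underscore."""
    for name in dir(obj):
        # Keep names such as `mean_`; skip dunder names such as `__class__`
        if name.endswith("_") and not name.endswith("__"):
            try:
                yield {name: getattr(obj, name)}
            except AttributeError:
                # Some attributes may be declared but unavailable; skip them
                continue


class ToyEstimator:
    """Hypothetical estimator that follows the scikit-learn naming convention."""

    def __init__(self, scale=1.0):
        self.scale = scale  # hyperparameter: no trailing underscore

    def fit(self, X):
        # Attributes learned from data get a trailing underscore
        self.n_features_in_ = len(X[0])
        self.mean_ = [sum(col) / len(col) for col in zip(*X)]
        return self


unfit_attrs = list(read_fitted_attrs(ToyEstimator()))
fit_attrs = list(read_fitted_attrs(ToyEstimator().fit([[1, 0], [0, 1]])))

print(unfit_attrs)  # []
print(fit_attrs)    # [{'mean_': [0.5, 0.5]}, {'n_features_in_': 2}]
```

As in the random forest example, the unfitted object exposes no trailing-underscore attributes, while the fitted one does.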
Support for time series
Objects of the class Lag are similar to scikit-learn's transformers.
Lag
Lag (lags=1, jump=0, keep_contemporaneous_X=False)
A transformer that lags variables
Lag.fit
Lag.fit (X:numpy.ndarray, y=None)
Fit the Lag transformer
| | Type | Default | Details |
|---|---|---|---|
| X | ndarray | | Array-like data of shape (n_samples, n_features) |
| y | NoneType | None | Array-like data of shape (n_samples,) or (n_samples, n_targets) or None |
Lag.transform
Lag.transform (X:numpy.ndarray)
Lag the dataset X
| | Type | Details |
|---|---|---|
| X | ndarray | Array-like data of shape (n_samples, n_features) |
TransformerMixin.fit_transform
TransformerMixin.fit_transform (X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
| | Type | Default | Details |
|---|---|---|---|
| X | array-like of shape (n_samples, n_features) | | Input samples. |
| y | NoneType | None | Target values (None for unsupervised transformations). |
| fit_params | | | |
| Returns | ndarray array of shape (n_samples, n_features_new) | | Transformed array. |
The code below demonstrates how Lag works in practice. Note in particular that, because Lag is a transformer, it can be used as part of a scikit-learn Pipeline.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

randomX = np.random.rand(15, 2)
randomY = np.random.rand(15)

lags = 3
jump = 2

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('lagger', Lag(lags=lags, jump=jump, keep_contemporaneous_X=False))
]).fit_transform(randomX, randomY)
Below we confirm that the lagger removes the correct number of rows corresponding to the lagged observations:
assert randomX.shape[0] - lags - jump == pipe.shape[0]
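To make the row count in the assertion above concrete, here is a minimal sketch of lagging written with plain Python lists. It is not gingado's implementation: `lag_rows` is a hypothetical helper, and the column layout (most recent lag first, contemporaneous values dropped) is an assumption. Only the number of retained rows, n_samples - lags - jump, is taken from the text above.

```python
def lag_rows(X, lags=1, jump=0):
    """Return one row per usable observation, holding its lagged values.

    Assumed layout (hypothetical): the row for time t concatenates
    X[t - jump - 1], X[t - jump - 2], ..., X[t - jump - lags].
    """
    first_usable = lags + jump  # earlier rows lack a full set of lags
    lagged = []
    for t in range(first_usable, len(X)):
        row = []
        for k in range(1, lags + 1):
            row.extend(X[t - jump - k])
        lagged.append(row)
    return lagged


# Toy data: 15 observations of 2 features, so each value is easy to trace
X = [[i, 10 * i] for i in range(15)]
lagged = lag_rows(X, lags=3, jump=2)

# Same check as the assertion on the pipeline output
assert len(X) - 3 - 2 == len(lagged)
print(len(lagged))  # 10
print(lagged[0])    # [2, 20, 1, 10, 0, 0]
```

The first retained row corresponds to t = 5: with a jump of 2, its three lags are the observations at t = 2, 1 and 0.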
And because Lag is a transformer, its parameters (lags and jump) can be calibrated using hyperparameter tuning to achieve the best performance for a model.
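As a hedged illustration of what such a search space could look like, the sketch below writes a parameter grid using scikit-learn's `<step>__<parameter>` naming convention, with the step name 'lagger' taken from the pipeline above. The candidate values are arbitrary assumptions, and the enumeration uses only the standard library. A dictionary of this shape is what scikit-learn's GridSearchCV accepts as its param_grid argument.

```python
from itertools import product

# Candidate values are illustrative assumptions, not recommendations
param_grid = {
    "lagger__lags": [1, 2, 3],
    "lagger__jump": [0, 1, 2],
}

# Enumerate every combination a grid search would evaluate
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

print(len(candidates))  # 9
print(candidates[0])    # {'lagger__lags': 1, 'lagger__jump': 0}
```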
Support for data augmentation with SDMX
Please note that working with SDMX may take a few minutes, depending on the amount of information you are downloading.
list_SDMX_sources
list_SDMX_sources ()
Fetch the list of SDMX sources
sources = list_SDMX_sources()
print(sources)

assert len(sources) > 0
# all elements are of type 'str'
assert sum([isinstance(src, str) for src in sources]) == len(sources)
['ABS', 'ABS_XML', 'BBK', 'BIS', 'CD2030', 'ECB', 'ESTAT', 'ILO', 'IMF', 'INEGI', 'INSEE', 'ISTAT', 'LSD', 'NB', 'NBB', 'OECD', 'SGR', 'SPC', 'STAT_EE', 'UNICEF', 'UNSD', 'WB', 'WB_WDI']
list_all_dataflows
list_all_dataflows (codes_only:bool=False, return_pandas:bool=True)
List all SDMX dataflows. Note: when passing the result as a parameter to an AugmentSDMX object or to the load_SDMX_data function, set codes_only=True.
| | Type | Default | Details |
|---|---|---|---|
| codes_only | bool | False | Whether to return only the dataflow codes |
| return_pandas | bool | True | Whether to return the result as a pandas Series |
dflows = list_all_dataflows(return_pandas=False)

assert isinstance(dflows, dict)
all_sources = list_SDMX_sources()
assert len([s for s in dflows.keys() if s in all_sources]) == len(dflows.keys())
By default, list_all_dataflows returns a pandas Series, which facilitates data discovery, like so:
import pandas as pd

dflows = list_all_dataflows(return_pandas=True)
assert isinstance(dflows, pd.Series)
dflows
ABS_XML ABORIGINAL_POP_PROJ Projected population, Aboriginal and Torres St...
ABORIGINAL_POP_PROJ_REMOTE Projected population, Aboriginal and Torres St...
ABS_ABORIGINAL_POPPROJ_INDREGION Projected population, Aboriginal and Torres St...
ABS_ACLD_LFSTATUS Australian Census Longitudinal Dataset (ACLD):...
ABS_ACLD_TENURE Australian Census Longitudinal Dataset (ACLD):...
...
UNSD DF_UNData_UNFCC SDMX_GHG_UNDATA
WB DF_WITS_Tariff_TRAINS WITS - UNCTAD TRAINS Tariff Data
DF_WITS_TradeStats_Development WITS TradeStats Devlopment
DF_WITS_TradeStats_Tariff WITS TradeStats Tariff
DF_WITS_TradeStats_Trade WITS TradeStats Trade
Name: dataflow, Length: 3290, dtype: object
This format makes it easier to search dflows by source:
list_all_dataflows(codes_only=True, return_pandas=True)
ABS_XML 0 ABORIGINAL_POP_PROJ
1 ABORIGINAL_POP_PROJ_REMOTE
2 ABS_ABORIGINAL_POPPROJ_INDREGION
3 ABS_ACLD_LFSTATUS
4 ABS_ACLD_TENURE
...
UNSD 5 DF_UNData_UNFCC
WB 0 DF_WITS_Tariff_TRAINS
1 DF_WITS_TradeStats_Development
2 DF_WITS_TradeStats_Tariff
3 DF_WITS_TradeStats_Trade
Name: dataflow, Length: 3290, dtype: object
dflows['BIS']
WS_CBPOL_D Policy rates daily
WS_CBPOL_M Policy rates monthly
WS_CBS_PUB BIS consolidated banking
WS_CPMI_CASHLESS CPMI cashless payments (T5-6)
WS_CPMI_CT1 CPMI comparative tables type 1
WS_CPMI_CT2 CPMI comparative tables type 2
WS_CPMI_DEVICES CPMI payment devices
WS_CPMI_INSTITUTIONS CPMI institutions
WS_CPMI_MACRO CPMI Macro
WS_CPMI_PARTICIPANTS CPMI participants
WS_CPMI_SYSTEMS CPMI systems (T8-9-11-13-14-16-17-18-19)
WS_CREDIT_GAP BIS credit-to-GDP gaps
WS_DEBT_SEC2_PUB BIS debt securities
WS_DER_OTC_TOV OTC derivatives turnover
WS_DSR BIS debt service ratio
WS_EER_D BIS effective exchange rates daily
WS_EER_M BIS effective exchange rates monthly
WS_GLI Global liquidity indicators
WS_LBS_D_PUB BIS locational banking
WS_LONG_CPI BIS long consumer prices
WS_OTC_DERIV2 OTC derivatives outstanding
WS_SPP BIS property prices: selected series
WS_TC BIS long series on total credit
WS_XRU US dollar exchange rates, m,q,a
WS_XRU_D US dollar exchange rates, daily
WS_XTD_DERIV Exchange traded derivatives
Name: dataflow, dtype: object
Or the user can search dataflows by their human-readable name instead of their code. For example, this is one way to see if any dataflow has information on interest rates:
dflows[dflows.str.contains('Interest rates', case=False)]
BBK BBSDI Discount interest rates pursuant to section 25...
ECB RIR Retail Interest Rates
IMF 6SR M&B: Interest Rates and Share Prices (6SR) for...
INR Interest rates
INR_NSTD Interest rates_Non-Standard
Name: dataflow, dtype: object
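A similar search can be sketched without pandas for the dictionary returned when return_pandas=False. The nested {source: {code: name}} layout below is an assumption made for illustration (the text above only confirms that the keys are source names), the sample entries are copied from the outputs shown earlier, and `search_dataflows` is a hypothetical helper.

```python
def search_dataflows(dflows, term):
    """Return (source, code, name) tuples whose name contains `term`, ignoring case."""
    term = term.lower()
    return [
        (source, code, name)
        for source, flows in dflows.items()
        for code, name in flows.items()
        if term in name.lower()
    ]


# Hypothetical nested layout; entries copied from the listings above
toy_dflows = {
    "BIS": {"WS_CBPOL_D": "Policy rates daily", "WS_DSR": "BIS debt service ratio"},
    "ECB": {"RIR": "Retail Interest Rates"},
    "IMF": {"INR": "Interest rates", "INR_NSTD": "Interest rates_Non-Standard"},
}

matches = search_dataflows(toy_dflows, "interest rates")
print(matches)
```

With the toy data, the search returns the ECB and IMF entries and skips the BIS policy-rate dataflows, whose names do not contain the phrase.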
The function load_SDMX_data is a convenience function that downloads data from SDMX sources (and any specific dataflows passed as arguments) if they match the key and parameters set by the user.
load_SDMX_data
load_SDMX_data (sources:dict, keys:dict, params:dict, verbose:bool=True)
Loads datasets from SDMX.
| | Type | Default | Details |
|---|---|---|---|
| sources | dict | | A dictionary with the sources and dataflows per source |
| keys | dict | | The keys to be used in the SDMX query |
| params | dict | | The parameters to be used in the SDMX query |
| verbose | bool | True | Whether to communicate download steps to the user |
df = load_SDMX_data(sources={'ECB': 'CISS', 'BIS': 'WS_CBPOL_D'}, keys={'FREQ': 'D'}, params={'startPeriod': 2003})

assert type(df) == pd.DataFrame
assert df.shape[0] > 0
assert df.shape[1] > 0
Querying data from ECB's dataflow 'CISS' - Composite Indicator of Systemic Stress...
Querying data from BIS's dataflow 'WS_CBPOL_D' - Policy rates daily...