model_doc = ModelCard()
Model documentation
Each user has specific documentation needs, ranging from simply logging the model training to a more complex description of the model pipeline with a discussion of the model outcomes. gingado
addresses this variety of needs by offering a class of objects, “Documenters”, that facilitate model documentation. A base class facilitates the creation of generic ways to document models, and gingado
includes two specific model documentation templates off the shelf, as described below.
The model documentation is performed by Documenters, objects that subclass from the base class ggdModelDocumentation
. This base class offers code that can be used by any Documenter to read the model in question, format the information according to a template, and save the resulting documentation in JSON format. With the JSON documentation file at hand, the user can then deploy existing third-party libraries to transform the information stored in JSON into a variety of formats (eg, HTML, PDF) as needed.
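For example, a minimal sketch of such a conversion to HTML using only the Python standard library (the documentation content below is illustrative, not actual gingado output):

```python
import json

# documentation as it might be stored by save_json (illustrative content)
doc_json = '''{
  "model_details": {"developer": "Jane Doe", "version": "1.0"},
  "intended_use": {"primary_uses": "Forecasting GDP"}
}'''

doc = json.loads(doc_json)

# render each section as an HTML heading followed by a definition list
parts = []
for section, fields in doc.items():
    parts.append(f"<h2>{section}</h2>")
    parts.append("<dl>")
    for key, value in fields.items():
        parts.append(f"<dt>{key}</dt><dd>{value}</dd>")
    parts.append("</dl>")
html = "\n".join(parts)
print(html)
```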
One current area of development is the automatic filling of some fields related to the model. The objective is to automate the documentation of information that can be fetched directly from the model, leaving the analyst time to concentrate on other tasks, such as considering the ethical implications of the machine learning model being trained.
Base class
gingado
has a ggdModelDocumentation
base class that contains the basic functionalities for Documenters. It is not meant to be used by itself, but only as a parent class for Documenter objects. gingado
ships with two such objects that subclass ggdModelDocumentation
: ModelCard
and ForecastCard
. They are both described below in their respective sections.
Users are encouraged to submit a PR with their own Documenter models subclassing ggdModelDocumentation
; see Section 5 for more information.
ggdModelDocumentation
ggdModelDocumentation ()
Base class for gingado Documenters
setup_template
setup_template ()
Set up the template from the JSON documentation
show_template
show_template (indent:bool=True)
Show documentation template in JSON format
| | Type | Default | Details |
|---|---|---|---|
| indent | bool | True | Whether to print JSON documentation template with indentation for easier human reading |
documentation_path
documentation_path ()
Show path to documentation
save_json
save_json (file_path:str)
Save the documentation in JSON format in the specified file
| | Type | Details |
|---|---|---|
| file_path | str | Path to save JSON file |
read_json
read_json (file_path:str)
Load documentation JSON from path
| | Type | Details |
|---|---|---|
| file_path | str | Path to JSON file, or path defined in file_path if None |
show_json
show_json ()
Show documentation in JSON format
read_model
read_model (model)
Automatically read information from the model and add it to the documentation
| | Details |
|---|---|
| model | The model to be documented |
open_questions
open_questions ()
List open fields in the documentation
fill_model_info
fill_model_info (model_info:Union[str,dict], model_info_keyname:str='model_details')
Called automatically, or by the user, to add model information to the documentation according to its template
| | Type | Default | Details |
|---|---|---|---|
| model_info | str \| dict | | Information about the model to be added in the documentation |
| model_info_keyname | str | model_details | Dictionary key in the Documenter template to which this information should be linked |
fill_info
fill_info (new_info:dict)
Include information in the model documentation
| | Type | Details |
|---|---|---|
| new_info | dict | Dictionary with information to be added to the model documentation |
Documenters
ModelCard
ModelCard
, the model documentation template inspired by the work of Mitchell et al. (2018), already comes with gingado
. Its template can be used by users as is, or tweaked according to each need. The ModelCard
template can also serve as inspiration for any custom documentation needs. Users with documentation needs beyond the out-of-the-box solutions provided by gingado
can create their own class of Documenters (more information on that below), and compatibility of these custom documentation routines with the rest of the code is ensured. Users are encouraged to submit a pull request with their own documentation models subclassing ggdModelDocumentation
if these custom templates can also benefit other users.
Like all gingado
Documenters, a ModelCard
can easily be created on a standalone basis, as shown below, or as part of a gingado.ggdBenchmark
object.
By default, it autofills the template with the current date and time. Users can add other information to be automatically added by a customised Documenter object.
model_doc_with_autofill = ModelCard(autofill=True)
model_doc_no_autofill = ModelCard(autofill=False)
Below is a comparison of the model_details
section of the model document, with and without the autofill.
model_doc_with_autofill.show_json()['model_details']
{'developer': 'Person or organisation developing the model',
'datetime': '2023-06-22 09:05:40 ',
'version': 'Model version',
'type': 'Model type',
'info': 'Information about training algorithms, parameters, fairness constraints or other applied approaches, and features',
'paper': 'Paper or other resource for more information',
'citation': 'Citation details',
'license': 'License',
'contact': 'Where to send questions or comments about the model'}
model_doc_no_autofill.show_json()['model_details']
{'developer': 'Person or organisation developing the model',
'datetime': 'Model date',
'version': 'Model version',
'type': 'Model type',
'info': 'Information about training algorithms, parameters, fairness constraints or other applied approaches, and features',
'paper': 'Paper or other resource for more information',
'citation': 'Citation details',
'license': 'License',
'contact': 'Where to send questions or comments about the model'}
ModelCard
ModelCard (file_path:str='', autofill:bool=True, indent_level:int|None=2)
A gingado Documenter based on Mitchell et al. (2018)
| | Type | Default | Details |
|---|---|---|---|
| file_path | str | | Path for the JSON file with the documentation |
| autofill | bool | True | Whether the Documenter object should autofill when created |
| indent_level | int \| None | 2 | Level of indentation during serialisation to JSON |
autofill_template
autofill_template ()
Create an empty model card template, then fill it with information that is automatically obtained from the system
ForecastCard
ForecastCard
is a model documentation template inspired by Mitchell et al. (2018), but with fields that are more specifically targeted towards forecasting or nowcasting use cases.
Because a ForecastCard
Documenter object is targeted to forecasting and nowcasting models, it contains some specialised fields, as illustrated below.
model_doc = ForecastCard()
model_doc.show_template()
{
"model_details": {
"field_description": "Basic information about the model",
"variable": "Variable(s) being forecasted or nowcasted",
"jurisdiction": "Jurisdiction(s) of the variable being forecasted or nowcasted",
"developer": "Person or organisation developing the model",
"datetime": "Model date",
"version": "Model version",
"type": "Model type",
"pipeline": "Description of the pipeline steps being used",
"info": "Information about training algorithms, parameters, fairness constraints or other applied approaches, and features",
"econometric_model": "Information about the econometric model or technique",
"paper": "Paper or other resource for more information",
"citation": "Citation details",
"license": "License",
"contact": "Where to send questions or comments about the model"
},
"intended_use": {
"field_description": "Use cases that were envisioned during development",
"primary_uses": "Primary intended uses",
"primary_users": "Primary intended users",
"out_of_scope": "Out-of-scope use cases"
},
"factors": {
"field_description": "Factors could include demographic or phenotypic groups, environmental conditions, technical attributes, or others",
"relevant": "Relevant factors",
"evaluation": "Evaluation factors"
},
"metrics": {
"field_description": "Metrics should be chosen to reflect potential real world impacts of the model",
"performance_measures": "Model performance measures",
"estimation_approaches": "How are the evaluation metrics calculated? Include information on the cross-validation approach, if used"
},
"data": {
"field_description": "Details on the dataset(s) used for the training and evaluation of the model",
"datasets": "Datasets",
"preprocessing": "Preprocessing",
"cutoff_date": "Cut-off date that separates training from evaluation data"
},
"ethical_considerations": {
"field_description": "Ethical considerations that went into model development, surfacing ethical challenges and solutions to stakeholders. Ethical analysis does not always lead to precise solutions, but the process of ethical contemplation is worthwhile to inform on responsible practices and next steps in future work.",
"sensitive_data": "Does the model use any sensitive data (e.g., protected classes)?",
"risks_and_harms": "What risks may be present in model usage? Try to identify the potential recipients, likelihood, and magnitude of harms. If these cannot be determined, note that they were considered but remain unknown",
"use_cases": "Are there any known model use cases that are especially fraught?",
"additional_information": "If possible, this section should also include any additional ethical considerations that went into model development, for example, review by an external board, or testing with a specific community."
},
"caveats_recommendations": {
"field_description": "Additional concerns that were not covered in the previous sections",
"caveats": "For example, did the results suggest any further testing? Were there any relevant groups that were not represented in the evaluation dataset?",
"recommendations": "Are there additional recommendations for model use? What are the ideal characteristics of an evaluation dataset for this model?"
}
}
model_doc.show_json()
{'model_details': {'variable': 'Variable(s) being forecasted or nowcasted',
'jurisdiction': 'Jurisdiction(s) of the variable being forecasted or nowcasted',
'developer': 'Person or organisation developing the model',
'datetime': '2023-06-22 09:05:40 ',
'version': 'Model version',
'type': 'Model type',
'pipeline': 'Description of the pipeline steps being used',
'info': 'Information about training algorithms, parameters, fairness constraints or other applied approaches, and features',
'econometric_model': 'Information about the econometric model or technique',
'paper': 'Paper or other resource for more information',
'citation': 'Citation details',
'license': 'License',
'contact': 'Where to send questions or comments about the model'},
'intended_use': {'primary_uses': 'Primary intended uses',
'primary_users': 'Primary intended users',
'out_of_scope': 'Out-of-scope use cases'},
'factors': {'relevant': 'Relevant factors',
'evaluation': 'Evaluation factors'},
'metrics': {'performance_measures': 'Model performance measures',
'estimation_approaches': 'How are the evaluation metrics calculated? Include information on the cross-validation approach, if used'},
'data': {'datasets': 'Datasets',
'preprocessing': 'Preprocessing',
'cutoff_date': 'Cut-off date that separates training from evaluation data'},
'ethical_considerations': {'sensitive_data': 'Does the model use any sensitive data (e.g., protected classes)?',
'risks_and_harms': 'What risks may be present in model usage? Try to identify the potential recipients, likelihood, and magnitude of harms. If these cannot be determined, note that they were considered but remain unknown',
'use_cases': 'Are there any known model use cases that are especially fraught?',
'additional_information': 'If possible, this section should also include any additional ethical considerations that went into model development, for example, review by an external board, or testing with a specific community.'},
'caveats_recommendations': {'caveats': 'For example, did the results suggest any further testing? Were there any relevant groups that were not represented in the evaluation dataset?',
'recommendations': 'Are there additional recommendations for model use? What are the ideal characteristics of an evaluation dataset for this model?'}}
ForecastCard
ForecastCard (file_path:str='', autofill:bool=True, indent_level:int|None=2)
A gingado Documenter for forecasting or nowcasting use cases
| | Type | Default | Details |
|---|---|---|---|
| file_path | str | | Path for the JSON file with the documentation |
| autofill | bool | True | Whether the Documenter object should autofill when created |
| indent_level | int \| None | 2 | Level of indentation during serialisation to JSON |
autofill_template
autofill_template ()
Create an empty model card template, then fill it with information that is automatically obtained from the system
Basic functioning of model documentation
After a Documenter object, such as ModelCard
or ForecastCard
is instantiated, the user can see the underlying template with the method show_template
, as below:
model_doc = ModelCard(autofill=False)
assert model_doc.show_template(indent=False) == ModelCard.template
model_doc.show_template()
{
"model_details": {
"field_description": "Basic information about the model",
"developer": "Person or organisation developing the model",
"datetime": "Model date",
"version": "Model version",
"type": "Model type",
"info": "Information about training algorithms, parameters, fairness constraints or other applied approaches, and features",
"paper": "Paper or other resource for more information",
"citation": "Citation details",
"license": "License",
"contact": "Where to send questions or comments about the model"
},
"intended_use": {
"field_description": "Use cases that were envisioned during development",
"primary_uses": "Primary intended uses",
"primary_users": "Primary intended users",
"out_of_scope": "Out-of-scope use cases"
},
"factors": {
"field_description": "Factors could include demographic or phenotypic groups, environmental conditions, technical attributes, or others",
"relevant": "Relevant factors",
"evaluation": "Evaluation factors"
},
"metrics": {
"field_description": "Metrics should be chosen to reflect potential real world impacts of the model",
"performance_measures": "Model performance measures",
"thresholds": "Decision thresholds",
"variation_approaches": "Variation approaches"
},
"evaluation_data": {
"field_description": "Details on the dataset(s) used for the quantitative analyses in the documentation",
"datasets": "Datasets",
"motivation": "Motivation",
"preprocessing": "Preprocessing"
},
"training_data": {
"field_description": "May not be possible to provide in practice. When possible, this section should mirror 'Evaluation Data'. If such detail is not possible, minimal allowable information should be provided here, such as details of the distribution over various factors in the training datasets.",
"training_data": "Information on training data"
},
"quant_analyses": {
"field_description": "Quantitative Analyses",
"unitary": "Unitary results",
"intersectional": "Intersectional results"
},
"ethical_considerations": {
"field_description": "Ethical considerations that went into model development, surfacing ethical challenges and solutions to stakeholders. Ethical analysis does not always lead to precise solutions, but the process of ethical contemplation is worthwhile to inform on responsible practices and next steps in future work.",
"sensitive_data": "Does the model use any sensitive data (e.g., protected classes)?",
"human_life": "Is the model intended to inform decisions about matters central to human life or flourishing - e.g., health or safety? Or could it be used in such a way?",
"mitigations": "What risk mitigation strategies were used during model development?",
"risks_and_harms": "What risks may be present in model usage? Try to identify the potential recipients,likelihood, and magnitude of harms. If these cannot be determined, note that they were considered but remain unknown",
"use_cases": "Are there any known model use cases that are especially fraught?",
"additional_information": "If possible, this section should also include any additional ethical considerations that went into model development, for example, review by an external board, or testing with a specific community."
},
"caveats_recommendations": {
"field_description": "Additional concerns that were not covered in the previous sections",
"caveats": "For example, did the results suggest any further testing? Were there any relevant groups that were not represented in the evaluation dataset?",
"recommendations": "Are there additional recommendations for model use? What are the ideal characteristics of an evaluation dataset for this model?"
}
}
The method show_json
prints the Documenter’s current documentation; unfilled fields retain the descriptions from the original template:
model_doc = ModelCard(autofill=True)
model_doc.show_json()
{'model_details': {'developer': 'Person or organisation developing the model',
'datetime': '2023-06-22 09:05:41 ',
'version': 'Model version',
'type': 'Model type',
'info': 'Information about training algorithms, parameters, fairness constraints or other applied approaches, and features',
'paper': 'Paper or other resource for more information',
'citation': 'Citation details',
'license': 'License',
'contact': 'Where to send questions or comments about the model'},
'intended_use': {'primary_uses': 'Primary intended uses',
'primary_users': 'Primary intended users',
'out_of_scope': 'Out-of-scope use cases'},
'factors': {'relevant': 'Relevant factors',
'evaluation': 'Evaluation factors'},
'metrics': {'performance_measures': 'Model performance measures',
'thresholds': 'Decision thresholds',
'variation_approaches': 'Variation approaches'},
'evaluation_data': {'datasets': 'Datasets',
'motivation': 'Motivation',
'preprocessing': 'Preprocessing'},
'training_data': {'training_data': 'Information on training data'},
'quant_analyses': {'unitary': 'Unitary results',
'intersectional': 'Intersectional results'},
'ethical_considerations': {'sensitive_data': 'Does the model use any sensitive data (e.g., protected classes)?',
'human_life': 'Is the model intended to inform decisions about matters central to human life or flourishing - e.g., health or safety? Or could it be used in such a way?',
'mitigations': 'What risk mitigation strategies were used during model development?',
'risks_and_harms': 'What risks may be present in model usage? Try to identify the potential recipients,likelihood, and magnitude of harms. If these cannot be determined, note that they were considered but remain unknown',
'use_cases': 'Are there any known model use cases that are especially fraught?',
'additional_information': 'If possible, this section should also include any additional ethical considerations that went into model development, for example, review by an external board, or testing with a specific community.'},
'caveats_recommendations': {'caveats': 'For example, did the results suggest any further testing? Were there any relevant groups that were not represented in the evaluation dataset?',
'recommendations': 'Are there additional recommendations for model use? What are the ideal characteristics of an evaluation dataset for this model?'}}
The template is protected from editing once a Documenter has been created. This way, even if a user inadvertently changes the template, the Documenter's functionality is not affected.
model_doc.template = None
assert model_doc.show_template(indent=False) == ModelCard.template
model_doc.show_template()
{
"model_details": {
"field_description": "Basic information about the model",
"developer": "Person or organisation developing the model",
"datetime": "Model date",
"version": "Model version",
"type": "Model type",
"info": "Information about training algorithms, parameters, fairness constraints or other applied approaches, and features",
"paper": "Paper or other resource for more information",
"citation": "Citation details",
"license": "License",
"contact": "Where to send questions or comments about the model"
},
"intended_use": {
"field_description": "Use cases that were envisioned during development",
"primary_uses": "Primary intended uses",
"primary_users": "Primary intended users",
"out_of_scope": "Out-of-scope use cases"
},
"factors": {
"field_description": "Factors could include demographic or phenotypic groups, environmental conditions, technical attributes, or others",
"relevant": "Relevant factors",
"evaluation": "Evaluation factors"
},
"metrics": {
"field_description": "Metrics should be chosen to reflect potential real world impacts of the model",
"performance_measures": "Model performance measures",
"thresholds": "Decision thresholds",
"variation_approaches": "Variation approaches"
},
"evaluation_data": {
"field_description": "Details on the dataset(s) used for the quantitative analyses in the documentation",
"datasets": "Datasets",
"motivation": "Motivation",
"preprocessing": "Preprocessing"
},
"training_data": {
"field_description": "May not be possible to provide in practice. When possible, this section should mirror 'Evaluation Data'. If such detail is not possible, minimal allowable information should be provided here, such as details of the distribution over various factors in the training datasets.",
"training_data": "Information on training data"
},
"quant_analyses": {
"field_description": "Quantitative Analyses",
"unitary": "Unitary results",
"intersectional": "Intersectional results"
},
"ethical_considerations": {
"field_description": "Ethical considerations that went into model development, surfacing ethical challenges and solutions to stakeholders. Ethical analysis does not always lead to precise solutions, but the process of ethical contemplation is worthwhile to inform on responsible practices and next steps in future work.",
"sensitive_data": "Does the model use any sensitive data (e.g., protected classes)?",
"human_life": "Is the model intended to inform decisions about matters central to human life or flourishing - e.g., health or safety? Or could it be used in such a way?",
"mitigations": "What risk mitigation strategies were used during model development?",
"risks_and_harms": "What risks may be present in model usage? Try to identify the potential recipients,likelihood, and magnitude of harms. If these cannot be determined, note that they were considered but remain unknown",
"use_cases": "Are there any known model use cases that are especially fraught?",
"additional_information": "If possible, this section should also include any additional ethical considerations that went into model development, for example, review by an external board, or testing with a specific community."
},
"caveats_recommendations": {
"field_description": "Additional concerns that were not covered in the previous sections",
"caveats": "For example, did the results suggest any further testing? Were there any relevant groups that were not represented in the evaluation dataset?",
"recommendations": "Are there additional recommendations for model use? What are the ideal characteristics of an evaluation dataset for this model?"
}
}
Users can find which fields in their templates are still open by using the method open_questions
. The levels of the template are reflected in the resulting dictionary, with double underscores separating the different dictionary levels in the underlying template.
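The double-underscore convention can be parsed back into the template's nested structure with plain Python; a small illustration (independent of gingado, with hypothetical field names):

```python
# flattened keys in the style returned by open_questions()
open_fields = [
    'model_details__developer',
    'model_details__version',
    'caveats_recommendations__recommendations',
]

# group the open items by their template section
nested = {}
for field in open_fields:
    section, _, item = field.partition('__')
    nested.setdefault(section, []).append(item)

print(nested)
# {'model_details': ['developer', 'version'], 'caveats_recommendations': ['recommendations']}
```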
Below we see that after inputting information for the item caveats
in the section caveats_recommendations
, this item does not appear in the results of the open_questions
method.
model_doc.fill_info({'caveats_recommendations': {'caveats': 'This is another test'}})
assert model_doc.json_doc['caveats_recommendations']['caveats'] == "This is another test"
# note that caveats_recommendations__caveats is no longer considered an open question
# after being filled in through `fill_info`.
print([oq for oq in model_doc.open_questions() if oq.startswith('caveats')])
['caveats_recommendations__recommendations']
And now the complete result of the open_questions
method:
model_doc.open_questions()
['model_details__developer',
'model_details__version',
'model_details__type',
'model_details__info',
'model_details__paper',
'model_details__citation',
'model_details__license',
'model_details__contact',
'intended_use__primary_uses',
'intended_use__primary_users',
'intended_use__out_of_scope',
'factors__relevant',
'factors__evaluation',
'metrics__performance_measures',
'metrics__thresholds',
'metrics__variation_approaches',
'evaluation_data__datasets',
'evaluation_data__motivation',
'evaluation_data__preprocessing',
'training_data__training_data',
'quant_analyses__unitary',
'quant_analyses__intersectional',
'ethical_considerations__sensitive_data',
'ethical_considerations__human_life',
'ethical_considerations__mitigations',
'ethical_considerations__risks_and_harms',
'ethical_considerations__use_cases',
'ethical_considerations__additional_information',
'caveats_recommendations__recommendations']
If the user wants to fill in an empty field such as the ones identified above by the method open_questions
, the user simply needs to pass to the method fill_info
a dictionary with the corresponding information. Depending on the template, the dictionary may be nested.
It is technically possible to assign new information directly to the attribute json_doc
, but this should be avoided in favour of the method fill_info
. The latter tests whether the new information is valid according to the documentation template and also enables filling in more than one question at the same time. In addition, information assigned directly to json_doc
is not logged, and may inadvertently create new entries that are not part of the template (eg, if a new dictionary key is created due to a typo).
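The typo hazard is easy to reproduce with a plain dictionary standing in for the documentation attribute (illustrative, not gingado code):

```python
# a plain dict standing in for json_doc (illustrative content)
json_doc = {'model_details': {'developer': 'Person or organisation developing the model'}}

# a misspelt key silently creates a new entry instead of raising an error
json_doc['model_detials'] = {'developer': 'Jane Doe'}

print(sorted(json_doc))
# ['model_details', 'model_detials']
```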
The template serves to provide specific instances of the Documenter object with a form-like structure, indicating which fields are open and thus require some answers or information. Consequently, the template does not change when the actual document object changes after information is added by fill_info
.
new_info = {
    'metrics': {'performance_measures': "This is a test"},
    'caveats_recommendations': {'caveats': "This is another test"}
}
model_doc.fill_info(new_info)
print([model_doc.json_doc['metrics'], ModelCard.template['metrics']])
assert model_doc.show_template(indent=False) == ModelCard.template
[{'performance_measures': 'This is a test', 'thresholds': 'Decision thresholds', 'variation_approaches': 'Variation approaches'}, {'field_description': 'Metrics should be chosen to reflect potential real world impacts of the model', 'performance_measures': 'Model performance measures', 'thresholds': 'Decision thresholds', 'variation_approaches': 'Variation approaches'}]
Reading information from models
gingado
’s ggdModelDocumentation
base class is able to extract information from machine learning models from a number of widely used libraries and make it available to the Documenter objects. This is done through the method read_model
, which recognises whether the model is a gingado
object or any of scikit-learn
, keras
, or fastai
models and reads the model characteristics appropriately. For filling in information from other models (eg, pytorch
or even models coded from scratch, machine learning or not), the user can benefit from the method fill_model_info
that every Documenter should have, as demonstrated below.
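For instance, one could collect the relevant attributes of such a model into a dictionary and pass that to fill_model_info. The extraction step below is a sketch; the class and attribute names are made up for illustration:

```python
class ScratchModel:
    """A stand-in for a model coded from scratch (hypothetical)."""
    def __init__(self):
        self.n_params = 42
        self.loss = "mse"

model = ScratchModel()

# collect the model's public attributes into a dictionary...
model_info = {k: v for k, v in vars(model).items() if not k.startswith("_")}

# ...which could then be passed to any Documenter, for example:
# model_doc.fill_model_info(model_info)
print(model_info)
```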
In the case of ModelCard
, this information is included under model_details
, item info
. But the model information could be saved in another area of a custom Documenter.
Note that the model-specific information saved differs depending on the model's original library.
Preliminaries
The mock dataset below is used to construct models using different libraries, to demonstrate how they are read by Documenters.
from sklearn.datasets import make_classification

# some mock-up data
X, y = make_classification()
X.shape, y.shape
((100, 20), (100,))
gingado Benchmark
from gingado.benchmark import ClassificationBenchmark

# the gingado benchmark
gingado_clf = ClassificationBenchmark(verbose_grid=1).fit(X, y)
Fitting 10 folds for each of 6 candidates, totalling 60 fits
# a new instance of ModelCard is created and used to document the model
model_doc_gingado = ModelCard()
model_doc_gingado.read_model(gingado_clf.benchmark)
print(model_doc_gingado.show_json()['model_details']['info'])
# but given that gingado Benchmark objects already document the best model at every fit,
# we can check that they are equal:
assert model_doc_gingado.show_json()['model_details']['info'] == gingado_clf.model_documentation.show_json()['model_details']['info']
{'_estimator_type': 'classifier', 'best_estimator_': RandomForestClassifier(oob_score=True), 'best_index_': 0, 'best_params_': {'max_features': 'sqrt', 'n_estimators': 100}, 'best_score_': 0.99, 'classes_': array([0, 1]), 'cv_results_': {'mean_fit_time': array([0.13181503, 0.29619505, 0.1136914 , 0.28267403, 0.12027018,
0.29762466]), 'std_fit_time': array([0.01625579, 0.02733223, 0.00146566, 0.00806355, 0.0079422 ,
0.02163048]), 'mean_score_time': array([0.00838451, 0.01815953, 0.0072778 , 0.01760452, 0.00771282,
0.01794319]), 'std_score_time': array([0.0007298 , 0.00164848, 0.00029095, 0.00087866, 0.00064812,
0.00179062]), 'param_max_features': masked_array(data=['sqrt', 'sqrt', 'log2', 'log2', None, None],
mask=[False, False, False, False, False, False],
fill_value='?',
dtype=object), 'param_n_estimators': masked_array(data=[100, 250, 100, 250, 100, 250],
mask=[False, False, False, False, False, False],
fill_value='?',
dtype=object), 'params': [{'max_features': 'sqrt', 'n_estimators': 100}, {'max_features': 'sqrt', 'n_estimators': 250}, {'max_features': 'log2', 'n_estimators': 100}, {'max_features': 'log2', 'n_estimators': 250}, {'max_features': None, 'n_estimators': 100}, {'max_features': None, 'n_estimators': 250}], 'split0_test_score': array([0.9, 0.9, 0.9, 0.9, 0.9, 0.9]), 'split1_test_score': array([1., 1., 1., 1., 1., 1.]), 'split2_test_score': array([1., 1., 1., 1., 1., 1.]), 'split3_test_score': array([1., 1., 1., 1., 1., 1.]), 'split4_test_score': array([1., 1., 1., 1., 1., 1.]), 'split5_test_score': array([1., 1., 1., 1., 1., 1.]), 'split6_test_score': array([1., 1., 1., 1., 1., 1.]), 'split7_test_score': array([1., 1., 1., 1., 1., 1.]), 'split8_test_score': array([1., 1., 1., 1., 1., 1.]), 'split9_test_score': array([1., 1., 1., 1., 1., 1.]), 'mean_test_score': array([0.99, 0.99, 0.99, 0.99, 0.99, 0.99]), 'std_test_score': array([0.03, 0.03, 0.03, 0.03, 0.03, 0.03]), 'rank_test_score': array([1, 1, 1, 1, 1, 1], dtype=int32)}, 'multimetric_': False, 'n_features_in_': 20, 'n_splits_': 10, 'refit_time_': 0.13797831535339355, 'scorer_': <function _passthrough_scorer>}
scikit-learn
from sklearn.ensemble import RandomForestClassifier

sklearn_clf = RandomForestClassifier().fit(X, y)
model_doc_sklearn = ModelCard()
model_doc_sklearn.read_model(sklearn_clf)
print(model_doc_sklearn.show_json()['model_details']['info'])
{'_estimator_type': 'classifier', 'base_estimator_': DecisionTreeClassifier(), 'classes_': array([0, 1]), 'estimators_': [DecisionTreeClassifier(max_features='sqrt', random_state=644629283), DecisionTreeClassifier(max_features='sqrt', random_state=773222412), ..., DecisionTreeClassifier(max_features='sqrt', random_state=844801010)], 'feature_importances_': array([0.01427207, 0.0160448 , 0.07213446, 0.00679389, 0.01636618,
       0.0119022 , 0.01342794, 0.16113749, 0.01067974, 0.47137566,
       0.01179527, 0.07481836, 0.00827858, 0.02065695, 0.01097749,
       0.01589413, 0.02206046, 0.01543262, 0.00987852, 0.01607318]), 'n_classes_': 2, 'n_features_': 20, 'n_features_in_': 20, 'n_outputs_': 1}
Keras
from tensorflow import keras
keras_clf = keras.Sequential()
keras_clf.add(keras.layers.Dense(16, activation='relu', input_shape=(20,)))
keras_clf.add(keras.layers.Dense(8, activation='relu'))
keras_clf.add(keras.layers.Dense(1, activation='sigmoid'))
keras_clf.compile(optimizer='sgd', loss='binary_crossentropy')
keras_clf.fit(X, y, batch_size=10, epochs=10)
model_doc_keras = ModelCard()
model_doc_keras.read_model(keras_clf)
model_doc_keras.show_json()['model_details']['info']
'{"class_name": "Sequential", "config": {"name": "sequential", "layers": [{"class_name": "InputLayer", "config": {"batch_input_shape": [null, 20], "dtype": "float32", "sparse": false, "ragged": false, "name": "dense_input"}}, {"class_name": "Dense", "config": {"name": "dense", "trainable": true, "batch_input_shape": [null, 20], "dtype": "float32", "units": 16, "activation": "relu", "use_bias": true, "kernel_initializer": {"class_name": "GlorotUniform", "config": {"seed": null}}, "bias_initializer": {"class_name": "Zeros", "config": {}}, "kernel_regularizer": null, "bias_regularizer": null, "activity_regularizer": null, "kernel_constraint": null, "bias_constraint": null}}, {"class_name": "Dense", "config": {"name": "dense_1", "trainable": true, "dtype": "float32", "units": 8, "activation": "relu", "use_bias": true, "kernel_initializer": {"class_name": "GlorotUniform", "config": {"seed": null}}, "bias_initializer": {"class_name": "Zeros", "config": {}}, "kernel_regularizer": null, "bias_regularizer": null, "activity_regularizer": null, "kernel_constraint": null, "bias_constraint": null}}, {"class_name": "Dense", "config": {"name": "dense_2", "trainable": true, "dtype": "float32", "units": 1, "activation": "sigmoid", "use_bias": true, "kernel_initializer": {"class_name": "GlorotUniform", "config": {"seed": null}}, "bias_initializer": {"class_name": "Zeros", "config": {}}, "kernel_regularizer": null, "bias_regularizer": null, "activity_regularizer": null, "kernel_constraint": null, "bias_constraint": null}}]}, "keras_version": "2.8.0", "backend": "tensorflow"}'
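Because Documenters store the model information as JSON, outputs like the string above can be inspected or post-processed with standard tooling. The following sketch parses a trimmed-down, illustrative sample of such a string (the layer entries here are abbreviated; the real string contains each layer's full configuration):

```python
import json

# Abbreviated, illustrative sample of the JSON string stored by the Documenter
model_info = (
    '{"class_name": "Sequential", "config": {"name": "sequential", '
    '"layers": [{"class_name": "InputLayer"}, {"class_name": "Dense"}]}, '
    '"keras_version": "2.8.0", "backend": "tensorflow"}'
)

# Parse the string and pull out the layer types
parsed = json.loads(model_info)
layer_types = [layer["class_name"] for layer in parsed["config"]["layers"]]
print(parsed["class_name"])   # Sequential
print(layer_types)            # ['InputLayer', 'Dense']
```

The same approach can feed third-party tools that convert JSON into HTML, PDF or other formats.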
Other models
Native support for automatic documentation of other model types, such as those from fastai
or pytorch
, is expected in future versions. Until then, models coded from scratch by the user, as well as any other model, can be documented by passing the relevant information as an argument to the Documenter's fill_model_info
method, either as a string or as a dictionary. For example:
import numpy as np
import torch
import torch.nn.functional as F

class MockDataset(torch.utils.data.Dataset):
    def __init__(self, X, y):
        self.X = torch.from_numpy(X.astype(np.float32))
        self.y = torch.from_numpy(y.astype(np.float32))
        self.len = self.X.shape[0]
    def __len__(self):
        return self.len
    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

class PytorchNet(torch.nn.Module):
    def __init__(self):
        super(PytorchNet, self).__init__()
        self.layer1 = torch.nn.Linear(20, 16)
        self.layer2 = torch.nn.Linear(16, 8)
        self.layer3 = torch.nn.Linear(8, 1)
    def forward(self, x):
        x = torch.relu(self.layer1(x))
        x = torch.relu(self.layer2(x))
        x = torch.sigmoid(self.layer3(x))
        return x

pytorch_clf = PytorchNet()
dataloader = MockDataset(X, y)

loss_func = torch.nn.BCELoss()
optimizer = torch.optim.SGD(pytorch_clf.parameters(), lr=0.001, momentum=0.9)

for epoch in range(10):
    running_loss = 0.0
    for i, data in enumerate(dataloader, 0):
        _X, _y = data
        optimizer.zero_grad()
        y_pred_epoch = pytorch_clf(_X)
        loss = loss_func(y_pred_epoch, _y.reshape(1))
        loss.backward()
        optimizer.step()
model_doc_pytorch = ModelCard()
model_doc_pytorch.fill_model_info("This model is a neural network consisting of two fully connected layers and ending in a linear layer with a sigmoid activation")
model_doc_pytorch.show_json()['model_details']['info']
'This model is a neural network consisting of two fully connected layers and ending in a linear layer with a sigmoid activation'
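When more granularity is needed than a single string, a dictionary can map specific documentation fields to their content. Conceptually, filling a two-level template amounts to a nested dictionary update; the following gingado-free sketch illustrates the idea (the section and field names below are hypothetical, not the actual template keys):

```python
# Hypothetical two-level template, mirroring the structure Documenters use
template = {
    "model_details": {"info": None, "developer": None},
    "intended_use": {"primary_uses": None},
}

# User-provided information, keyed by section and field
user_info = {
    "model_details": {"info": "Feed-forward classifier with sigmoid output"},
    "intended_use": {"primary_uses": "Illustration of model documentation"},
}

# Merge the user-provided entries into the template, section by section
for section, fields in user_info.items():
    template[section].update(fields)

print(template["model_details"]["info"])
```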
Creating a custom Documenter
gingado
users can easily transform their model documentation needs into a Documenter object. The main advantages of doing this are:
- the documentation template becomes a “recyclable” object that can be saved, loaded, and used in other models or code routines; and
- model documentation can be more closely aligned with model creation and training, thus decreasing the probability that the model and its documentation diverge during the process of model development.
A gingado
Documenter must:
- subclass ggdModelDocumentation
(or implement all its methods if the user does not want to keep a dependency to gingado
),
- include the actual template for the documentation as a dictionary (with at most two levels of keys) in a class attribute called template
,
- ensure that template
complies with JSON specifications,
- have file_path
, autofill
and indent_level
as arguments in __init__
,
- follow the scikit-learn
convention of storing the __init__
parameters in self
attributes with the same name, and
- implement the autofill_template
method using the fill_info
method to set the automatically filled information fields.
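The requirements above can be sketched as follows. This minimal example implements the methods directly rather than subclassing ggdModelDocumentation, so it carries no gingado dependency; the template fields and the fill_info signature (section, field, value) are illustrative assumptions and may differ from gingado's actual API:

```python
import json

class MinimalDocumenter:
    """Sketch of a custom Documenter following the conventions listed above."""

    # The documentation template: a dict with at most two levels of keys,
    # compliant with JSON specifications. Field names are illustrative.
    template = {
        "model_details": {"info": None, "developer": None},
        "considerations": {"limitations": None},
    }

    def __init__(self, file_path="model_doc.json", autofill=False, indent_level=2):
        # scikit-learn convention: store __init__ parameters under the same names
        self.file_path = file_path
        self.autofill = autofill
        self.indent_level = indent_level
        # Deep-copy the template via a JSON round trip so instances don't share state
        self.json_doc = json.loads(json.dumps(self.template))
        if self.autofill:
            self.autofill_template()

    def fill_info(self, section, field, value):
        self.json_doc[section][field] = value

    def autofill_template(self):
        # A real Documenter would fetch this information from the model itself
        self.fill_info("model_details", "info", "(automatically filled)")

    def show_json(self):
        return self.json_doc

    def save_json(self):
        with open(self.file_path, "w") as f:
            json.dump(self.json_doc, f, indent=self.indent_level)

doc = MinimalDocumenter(autofill=True)
doc.fill_info("considerations", "limitations", "Illustrative example only")
print(doc.show_json()["model_details"]["info"])  # (automatically filled)
```

Keeping the template as a class attribute makes the documentation structure itself reusable: any instance starts from the same schema, while each instance's json_doc holds the filled-in content.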