Dataset package#

This package exposes Datasets of various Samples, both primary (Common Criteria, FIPS) and auxiliary (CVEs, CPEs, …).

This documentation does not provide a full API reference for all members of the dataset package. Instead, it concentrates on the datasets that are immediately exposed to users, namely CCDataset, FIPSDataset, and their abstract base class Dataset.

Tip

Examples related to this package can be found in the Common Criteria notebook and the FIPS notebook.

CCDataset#

class sec_certs.dataset.dataset.Dataset(certs={}, root_dir=PosixPath('/this/is/dummy/nonexisting/path'), name=None, description='', state=None, auxillary_datasets=None)#

Base class for datasets of certificates from the CC and FIPS 140 schemes. Lays out the public functions, the processing pipeline, and common operations on the dataset and its certificates.

class DatasetInternalState(meta_sources_parsed: 'bool' = False, artifacts_downloaded: 'bool' = False, pdfs_converted: 'bool' = False, auxillary_datasets_processed: 'bool' = False, certs_analyzed: 'bool' = False)#
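The five flags of DatasetInternalState read like a checklist of processing stages. As a minimal sketch (not the library's implementation), the snippet below mirrors those flags in a stand-alone dataclass and maps each flag to the method that plausibly sets it. The names `StateSketch`, `STAGE_METHODS` and `pending_stages` are hypothetical, and the flag-to-method mapping is an assumption inferred only from the names on this page.

```python
from dataclasses import dataclass, fields


@dataclass
class StateSketch:
    """Stand-alone mirror of the flags listed for DatasetInternalState."""

    meta_sources_parsed: bool = False
    artifacts_downloaded: bool = False
    pdfs_converted: bool = False
    auxillary_datasets_processed: bool = False
    certs_analyzed: bool = False


# Assumed flag -> method mapping, inferred from the names only.
STAGE_METHODS = {
    "meta_sources_parsed": "get_certs_from_web",
    "artifacts_downloaded": "download_all_artifacts",
    "pdfs_converted": "convert_all_pdfs",
    "auxillary_datasets_processed": "process_auxillary_datasets",
    "certs_analyzed": "analyze_certificates",
}


def pending_stages(state):
    """Return the method names for every flag that is still False, in field order."""
    return [STAGE_METHODS[f.name] for f in fields(state) if not getattr(state, f.name)]
```

For a state where only the metadata has been parsed, `pending_stages(StateSketch(meta_sources_parsed=True))` lists the four remaining method names. Note that get_certs_from_web() is documented on CCDataset rather than on the base class.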
analyze_certificates()#
Does two things:
  • Extracts data from certificates (keywords, etc.)

  • Computes various heuristics on the certificates.

property auxillary_datasets_dir#

Path to the directory with auxiliary datasets.

property certs_dir#

Returns directory that holds files associated with certificates

compute_cpe_heuristics(download_fresh_cpes=False)#

Computes matching CPEs for the certificates.

Computes CVEs for the certificates, given their CPE matches.

convert_all_pdfs(fresh=True)#

Converts all pdf artifacts to txt, given the certification scheme.

copy_dataset(new_root_dir)#

Copies all dataset files to new_root_dir and adjusts all paths internally. Keeps the artifacts in the original location.

Parameters
  • new_root_dir (str | Path) – Path to the directory where the new dataset shall be stored

download_all_artifacts(fresh=True)#

Downloads all artifacts related to certification in the given scheme.

enrich_automated_cpes_with_manual_labels()#

Prior to CVE matching, it is wise to expand the database of automatic CPE matches with those that were manually assigned.

classmethod from_web(url, progress_bar_desc, filename)#

Fetches a fully processed dataset instance from static site that hosts it.

get_keywords_df(var)#

Get dataframe of keyword hits for attribute (var) that is member of PdfData class.

move_dataset(new_root_dir)#

Moves all dataset files to new_root_dir and adjusts all paths internally. Deletes the artifacts from the original location.

Parameters
  • new_root_dir (str | Path) – Path to the directory where the new dataset shall be stored
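The difference between copy_dataset() and move_dataset() is whether the original location survives. The sketch below illustrates that distinction with plain `shutil` calls on a temporary directory. It is not how the library implements either method, and the file name `dataset.json` is a hypothetical placeholder.

```python
import shutil
import tempfile
from pathlib import Path


def demo_copy_vs_move():
    """Show that a copy leaves the source in place while a move removes it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        original = root / "original"
        original.mkdir()
        (original / "dataset.json").write_text("{}")  # hypothetical file name

        # Copy: the source directory is still there afterwards.
        shutil.copytree(original, root / "copied")
        source_exists_after_copy = original.exists()

        # Move: the source directory is gone afterwards.
        moved = root / "moved"
        shutil.move(str(original), str(moved))
        source_exists_after_move = original.exists()

        return (
            source_exists_after_copy,
            source_exists_after_move,
            (moved / "dataset.json").exists(),
        )
```

With a real dataset, the analogous choice would be `dset.copy_dataset(new_root_dir)` when the old location should stay usable and `dset.move_dataset(new_root_dir)` when it should be cleaned up.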

abstract process_auxillary_datasets(download_fresh=False)#

Processes all auxiliary datasets (CPE, CVE, …) that are required during computation.

property root_dir#

Directory that will hold the serialized dataset files.

update_with_certs(certs)#

Enriches the dataset with the given certificates.

Parameters
  • certs (List[CommonCriteriaCert]) – New certificates to include in the dataset

property web_dir#

Path to certification artifacts posted on the web.

class sec_certs.dataset.CCDataset(certs={}, root_dir=PosixPath('/this/is/dummy/nonexisting/path'), name=None, description='', state=None, auxillary_datasets=None)#

Class that holds CommonCriteriaCert objects. Serializable into JSON, pandas, and a dictionary. Provides basic certificate manipulations and dataset transformations. It also has many private methods that perform internal operations; feel free to use them.

property active_csv_tuples#

Returns a list of Tuple[str, Path] pairs, where the first element is the name of a CSV file and the second element is its Path. The files correspond to the CSV files downloaded from the CC website that list all active certificates.

property active_html_tuples#

Returns a list of Tuple[str, Path] pairs, where the first element is the name of an HTML file and the second element is its Path. The files correspond to the HTML files parsed from the CC website that list all active certificates.

property archived_csv_tuples#

Returns a list of Tuple[str, Path] pairs, where the first element is the name of a CSV file and the second element is its Path. The files correspond to the CSV files downloaded from the CC website that list all archived certificates.

property archived_html_tuples#

Returns a list of Tuple[str, Path] pairs, where the first element is the name of an HTML file and the second element is its Path. The files correspond to the HTML files parsed from the CC website that list all archived certificates.
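The four `*_tuples` properties share the same (name, Path) shape, so one helper can consume any of them. The following is a small sketch with a hypothetical helper name, `missing_files`, that reports which listed files are not present on disk. It assumes only the documented return shape.

```python
from pathlib import Path
from typing import List, Tuple


def missing_files(tuples: List[Tuple[str, Path]]) -> List[str]:
    """Return the names of files whose path does not exist on disk."""
    return [name for name, path in tuples if not Path(path).exists()]
```

Assuming the properties return plain lists, a call such as `missing_files(dset.active_csv_tuples + dset.archived_csv_tuples)` would show which CSV files still need to be downloaded.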

classmethod from_web_latest()#

Fetches the fresh snapshot of CCDataset from seccerts.org

get_certs_from_web(to_download=True, keep_metadata=True, get_active=True, get_archived=True)#

Downloads the CSV and HTML files that hold lists of certificates from the Common Criteria website. Parses these files, constructs CommonCriteriaCert objects, and fills the dataset with them.

Parameters
  • to_download (bool) – If CSV and HTML files shall be downloaded (or existing files utilized), defaults to True

  • keep_metadata (bool) – If CSV and HTML files shall be kept on disk after download, defaults to True

  • get_active (bool) – If active certificates shall be parsed, defaults to True

  • get_archived (bool) – If archived certificates shall be parsed, defaults to True
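The four flags can be combined to narrow what gets fetched. As a hedged sketch, the hypothetical helper `fetch_active_only` below restricts parsing to active certificates and optionally reuses files that are already on disk. It relies only on the keyword arguments documented above and accepts any object that exposes get_certs_from_web(), such as a CCDataset instance.

```python
def fetch_active_only(dset, reuse_local_files=False):
    """Populate ``dset`` with active certificates only.

    ``dset`` is expected to expose get_certs_from_web() with the keyword
    arguments documented above, e.g. a CCDataset instance.
    """
    dset.get_certs_from_web(
        to_download=not reuse_local_files,  # False -> utilize existing CSV/HTML files
        keep_metadata=True,                 # keep the CSV/HTML files on disk
        get_active=True,
        get_archived=False,                 # skip archived certificates
    )
    return dset
```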

property mu_dataset_dir#

Returns directory that holds dataset of maintenance updates

property mu_dataset_path#

Returns the path to the JSON file that holds the dataset of maintenance updates

property pp_dataset_path#

Returns directory that holds files associated with Protection profiles

process_auxillary_datasets(download_fresh=False)#

Processes all auxiliary datasets needed during computation. On top of base-class processing, CC handles protection profiles and maintenance updates.

process_maintenance_updates(to_download=True)#

Downloads a dataset of maintenance updates or loads it from JSON. Runs analysis on that dataset if it’s not completed.

Returns

CCDatasetMaintenanceUpdates – The resulting dataset of maintenance updates

process_protection_profiles(to_download=True, keep_metadata=True)#

Downloads a new snapshot of the dataset with processed protection profiles (if it doesn’t exist) and links PPs with certificates within self. Assigns PPs to all certificates.

Parameters
  • to_download (bool) – If dataset should be downloaded or fetched from json, defaults to True

  • keep_metadata (bool) – If json related to the PP dataset should be kept on drive, defaults to True

Raises

RuntimeError – When building of PPDataset fails
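Because process_protection_profiles() is documented to raise RuntimeError when building the PPDataset fails, callers may want to treat that failure as non-fatal. The sketch below shows one way to do so. The helper name `link_protection_profiles` and the choice to log and continue are assumptions, not behavior prescribed by the library.

```python
import logging

logger = logging.getLogger(__name__)


def link_protection_profiles(dset, to_download=True):
    """Try to link protection profiles; return True on success, False on failure."""
    try:
        dset.process_protection_profiles(to_download=to_download, keep_metadata=True)
    except RuntimeError as exc:
        # Documented failure mode: building of PPDataset failed.
        logger.warning("Protection profile processing failed: %s", exc)
        return False
    return True
```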

property reports_dir#

Returns directory that holds files associated with certification reports

property reports_pdf_dir#

Returns directory that holds PDFs associated with certification reports

property reports_txt_dir#

Returns directory that holds TXTs associated with certification reports

property targets_dir#

Returns directory that holds files associated with security targets

property targets_pdf_dir#

Returns directory that holds PDFs associated with security targets

property targets_txt_dir#

Returns directory that holds TXTs associated with security targets
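The paired `*_pdf_dir` and `*_txt_dir` properties make it possible to check which PDFs have not yet been converted. The sketch below assumes that both directories are flat and that a converted file keeps the stem of its PDF with a `.txt` suffix. That naming scheme is an assumption and is not confirmed by this page. The helper name `unconverted_pdfs` is hypothetical.

```python
from pathlib import Path


def unconverted_pdfs(pdf_dir, txt_dir):
    """Return sorted PDF file names that have no same-stem .txt counterpart."""
    # Assumption: "report.pdf" converts to "report.txt" in the other directory.
    txt_stems = {p.stem for p in Path(txt_dir).glob("*.txt")}
    return sorted(p.name for p in Path(pdf_dir).glob("*.pdf") if p.stem not in txt_stems)
```

Under those assumptions, `unconverted_pdfs(dset.targets_pdf_dir, dset.targets_txt_dir)` would list the security targets that still await conversion.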

to_pandas()#

Return self serialized into pandas DataFrame
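Combining from_web_latest() with to_pandas() gives a short path from the hosted snapshot to a DataFrame. The sketch below uses only those two documented calls. The helper names are hypothetical, and `latest_cc_frame` needs `sec_certs` installed and network access, so the import is deferred until the function is called.

```python
def latest_snapshot_frame(dataset_cls):
    """Fetch the latest processed snapshot and return it as a DataFrame.

    ``dataset_cls`` must provide from_web_latest() and its instances
    must provide to_pandas(), as documented for CCDataset.
    """
    dset = dataset_cls.from_web_latest()
    return dset.to_pandas()


def latest_cc_frame():
    """Convenience wrapper for CCDataset; requires sec_certs and a network connection."""
    from sec_certs.dataset import CCDataset  # imported lazily on purpose

    return latest_snapshot_frame(CCDataset)
```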

FIPSDataset#

class sec_certs.dataset.FIPSDataset(certs={}, root_dir=PosixPath('/this/is/dummy/nonexisting/path'), name=None, description='', state=None, auxillary_datasets=None)#

Class for processing of FIPSCertificate samples. Inherits from ComplexSerializableType and base abstract Dataset class.

classmethod from_web_latest()#

Fetches the fresh snapshot of FIPSDataset from mirror.

process_auxillary_datasets(download_fresh=False)#

Processes all auxiliary datasets (CPE, CVE, …) that are required during computation.