Sample package#

This package mostly holds the data objects of primary interest (Common Criteria, FIPS) and auxiliary objects such as CPE and CVE. The objects mostly hold data and support serialization, but can also perform some basic transformations.

Tip

The examples related to this package can be found at common criteria notebook and fips notebook.

CCCertificate#

class sec_certs.sample.CCCertificate(status, category, name, manufacturer, scheme, security_level, not_valid_before, not_valid_after, report_link, st_link, cert_link, manufacturer_web, protection_profiles, maintenance_updates, state, pdf_data, heuristics)#

Data structure for a Common Criteria certificate. Contains several inner classes that layer the data logic. Can be serialized to and from JSON (ComplexSerializableType) or pandas (PandasSerializableType). It is the basic element of CCDataset. The functionality is mostly limited to holding data and to transformations that the certificate can perform on itself. The CCDataset class then builds on this functionality.

class DocumentState(download_ok: 'bool' = False, convert_garbage: 'bool' = False, convert_ok: 'bool' = False, extract_ok: 'bool' = False, pdf_hash: 'str | None' = None, txt_hash: 'str | None' = None, _pdf_path: 'Path | None' = None, _txt_path: 'Path | None' = None)#
class Heuristics(extracted_versions=None, cpe_matches=None, verified_cpe_matches=None, related_cves=None, cert_lab=None, cert_id=None, st_references=<factory>, report_references=<factory>, annotated_references=None, extracted_sars=None, direct_transitive_cves=None, indirect_transitive_cves=None, scheme_data=None)#

Class for various heuristics related to CCCertificate

class InternalState(report=None, st=None, cert=None)#

Holds the internal state of the certificate, i.e. whether the downloads and conversions of its individual components succeeded. Also holds information about errors and paths to the files.

class MaintenanceReport(maintenance_date, maintenance_title, maintenance_report_link, maintenance_st_link)#

Object for holding maintenance reports.

class PdfData(report_metadata=None, st_metadata=None, cert_metadata=None, report_frontpage=None, st_frontpage=None, cert_frontpage=None, report_keywords=None, st_keywords=None, cert_keywords=None, report_filename=None, st_filename=None, cert_filename=None)#

Class that holds data extracted from pdf files.

property cert_lab#

Returns labs for which certificate data was parsed.

filename_cert_id(scheme)#

Get cert_id candidates from the matches in the report filename and cert filename.

frontpage_cert_id(scheme)#

Get cert_id candidate from the frontpage of the report.

keywords_cert_id(scheme)#

Get cert_id candidates from the keywords matches in the report and cert.

metadata_cert_id(scheme)#

Get cert_id candidates from the report metadata.

property actual_sars#

Computes actual SARs. First, the SARs implied by the EAL are computed. Then, these are augmented with heuristically extracted SARs.

Return Optional[Set[SAR]]:

set of actual SARs of the certificate, None if empty
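The two-step computation can be illustrated with a minimal sketch. The `SAR` dataclass, the abbreviated `SARS_IMPLIED_BY_EAL` table, and the rule that the higher level wins per SAR family are all assumptions made for illustration, not the library's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SAR:
    """Hypothetical stand-in for a security assurance requirement."""

    family: str
    level: int


# Hypothetical, heavily abbreviated table; a real EAL package implies many more SARs.
SARS_IMPLIED_BY_EAL = {
    "EAL2": {SAR("ADV_ARC", 1), SAR("AVA_VAN", 2)},
}


def actual_sars(eal: str | None, extracted: set[SAR] | None) -> set[SAR] | None:
    """Start from the EAL-implied SARs, then let extracted SARs add or raise entries."""
    by_family = {sar.family: sar for sar in SARS_IMPLIED_BY_EAL.get(eal or "", set())}
    for sar in extracted or set():
        current = by_family.get(sar.family)
        # Assumed rule: an extracted SAR wins only if it is new or has a higher level.
        if current is None or sar.level > current.level:
            by_family[sar.family] = sar
    # An empty result is reported as None, mirroring the documented return value.
    return set(by_family.values()) or None


result = actual_sars("EAL2", {SAR("AVA_VAN", 3), SAR("ALC_FLR", 1)})
```

Here `result` holds the implied `ADV_ARC` entry, the raised `AVA_VAN` entry, and the newly added `ALC_FLR` entry.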

compute_heuristics_cert_id()#

Compute the heuristics cert_id of this cert, using several methods.

The candidate cert_ids are extracted from the frontpage, PDF metadata, filename, and keywords matches.

Finally, the cert_id is canonicalized.
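One way to picture this is as a small voting scheme, sketched below. The per-source scores, the summing rule, and the `canonicalize` helper are hypothetical, and the certificate ids are illustrative only. The library may weigh and canonicalize candidates differently.

```python
from __future__ import annotations

from collections import Counter


def canonicalize(cert_id: str) -> str:
    """Hypothetical canonical form: no whitespace, upper-case."""
    return "".join(cert_id.split()).upper()


def pick_cert_id(candidates_by_source: dict[str, dict[str, float]]) -> str | None:
    """Sum the scores each source gives to a candidate, then canonicalize the winner."""
    totals = Counter()
    for candidates in candidates_by_source.values():
        for cert_id, score in candidates.items():
            totals[cert_id] += score
    if not totals:
        return None
    best, _ = totals.most_common(1)[0]
    return canonicalize(best)


# Illustrative candidates from the four sources named above.
sources = {
    "frontpage": {"BSI-DSZ-CC-1234": 1.5},
    "metadata": {"BSI-DSZ-CC-1234": 1.0},
    "filename": {"bsi-dsz-cc-9999": 1.0},
    "keywords": {},
}
chosen = pick_cert_id(sources)
```

Keeping the candidates grouped by source makes it easy to see which extraction method contributed to the final choice.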

compute_heuristics_cert_lab()#

Stores the heuristically obtained evaluation laboratory in the corresponding attribute of the heuristics class.

compute_heuristics_version()#

Stores the heuristically obtained version of the certified product in the corresponding attribute of the heuristics class.

static convert_cert_pdf(cert)#

Converts the pdf certificate to txt, given the certificate. Staticmethod to allow for parallelization.

Parameters:

cert (CCCertificate) – cert to convert the certificate for

Return CCCertificate:

the modified certificate with updated state
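The note about parallelization can be made concrete with a short sketch. `ToyCert` and its `convert_pdf` method are hypothetical stand-ins, and a thread pool is used here only to keep the example self-contained. The point is that a static method taking and returning a certificate is a plain callable that a pool can map over a list.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class ToyCert:
    """Hypothetical stand-in for a certificate with a conversion flag."""

    name: str
    convert_ok: bool = False

    @staticmethod
    def convert_pdf(cert: ToyCert) -> ToyCert:
        # A real conversion would read the PDF and write a txt file here.
        cert.convert_ok = True
        return cert


certs = [ToyCert("a"), ToyCert("b"), ToyCert("c")]

# The static method is handed to map() directly; results come back in input order.
with ThreadPoolExecutor(max_workers=2) as pool:
    certs = list(pool.map(ToyCert.convert_pdf, certs))
```

Returning the modified certificate matters when a process pool is used instead: the worker operates on a copy, so the caller has to collect the returned objects rather than rely on in-place changes.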

static convert_report_pdf(cert)#

Converts the pdf certification report to txt, given the certificate. Staticmethod to allow for parallelization.

Parameters:

cert (CCCertificate) – cert to convert the pdf report for

Return CCCertificate:

the modified certificate with updated state

static convert_st_pdf(cert)#

Converts the pdf security target to txt, given the certificate. Staticmethod to allow for parallelization.

Parameters:

cert (CCCertificate) – cert to convert the pdf security target for

Return CCCertificate:

the modified certificate with updated state

property dgst#

Computes the primary key of the sample using the first 16 bytes of the SHA-256 digest.
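A truncated SHA-256 key of this kind can be sketched with the standard library. Which certificate fields are hashed, how they are joined, and whether the truncation applies to raw bytes or to hex characters are assumptions here. The `short_digest` helper and the sample values are hypothetical.

```python
import hashlib


def short_digest(*fields: str) -> str:
    """Hex-encode the first 16 bytes of the SHA-256 digest of the joined fields."""
    joined = "|".join(fields)  # hypothetical choice of separator and fields
    return hashlib.sha256(joined.encode("utf-8")).digest()[:16].hex()


key = short_digest("Smart Card", "Example Product 1.0", "https://example.com/report.pdf")
```

The key is deterministic, so the same certificate fields always map to the same identifier across runs.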

static download_pdf_cert(cert)#

Downloads pdf of the certificate. Staticmethod to allow for parallelization.

Parameters:

cert (CCCertificate) – cert to download the pdf of

Return CCCertificate:

returns the modified certificate with updated state

static download_pdf_report(cert)#

Downloads pdf of certification report given the certificate. Staticmethod to allow for parallelization.

Parameters:

cert (CCCertificate) – cert to download the pdf report for

Return CCCertificate:

returns the modified certificate with updated state

static download_pdf_st(cert)#

Downloads pdf of security target given the certificate. Staticmethod to allow for parallelization.

Parameters:

cert (CCCertificate) – cert to download the pdf security target for

Return CCCertificate:

returns the modified certificate with updated state

property eal#

Returns EAL of certificate if it was extracted, None otherwise.

static extract_cert_pdf_keywords(cert)#

Matches regular expressions in the txt obtained from the certificate and extracts the matches into an attribute. Staticmethod to allow for parallelization.

Parameters:

cert (CCCertificate) – certificate to extract the keywords for.

Return CCCertificate:

the modified certificate with extracted keywords.

static extract_cert_pdf_metadata(cert)#

Extracts metadata from certificate pdf given the certificate. Staticmethod to allow for parallelization.

Parameters:

cert (CCCertificate) – cert to extract the metadata for.

Return CCCertificate:

the modified certificate with updated state

static extract_report_pdf_frontpage(cert)#

Extracts data from certification report pdf frontpage given the certificate. Staticmethod to allow for parallelization.

Parameters:

cert (CCCertificate) – cert to extract the frontpage data for.

Return CCCertificate:

the modified certificate with updated state

static extract_report_pdf_keywords(cert)#

Matches regular expressions in the txt obtained from the certification report and extracts the matches into an attribute. Staticmethod to allow for parallelization.

Parameters:

cert (CCCertificate) – certificate to extract the keywords for.

Return CCCertificate:

the modified certificate with extracted keywords.

static extract_report_pdf_metadata(cert)#

Extracts metadata from certification report pdf given the certificate. Staticmethod to allow for parallelization.

Parameters:

cert (CCCertificate) – cert to extract the metadata for.

Return CCCertificate:

the modified certificate with updated state

static extract_st_pdf_keywords(cert)#

Matches regular expressions in the txt obtained from the security target and extracts the matches into an attribute. Staticmethod to allow for parallelization.

Parameters:

cert (CCCertificate) – certificate to extract the keywords for.

Return CCCertificate:

the modified certificate with extracted keywords.

static extract_st_pdf_metadata(cert)#

Extracts metadata from security target pdf given the certificate. Staticmethod to allow for parallelization.

Parameters:

cert (CCCertificate) – cert to extract the metadata for.

Return CCCertificate:

the modified certificate with updated state

classmethod from_dict(dct)#

Deserializes dictionary into CCCertificate
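The dictionary round trip behind JSON serialization can be illustrated with a toy class. `ToyCert`, its three fields, and its `to_dict` method are hypothetical. The real certificate has many more attributes and nested inner classes.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class ToyCert:
    """Hypothetical, much smaller stand-in for a certificate."""

    name: str
    scheme: str
    security_level: list[str]

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, dct: dict) -> ToyCert:
        # Rebuild the object from the plain dictionary produced by to_dict().
        return cls(**dct)


original = ToyCert("Example Product", "DE", ["EAL4+"])
restored = ToyCert.from_dict(json.loads(json.dumps(original.to_dict())))
```

After the round trip through a JSON string, `restored` compares equal to `original`.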

classmethod from_html_row(row, status, category)#

Creates a CC sample from an HTML row of the commoncriteriaportal.org webpage.

merge(other, other_source=None)#

Merges this sample with another CC sample, on the assumption that the two come from different sources, e.g., CSV and HTML. The HTML source is assumed to have better protection profiles, so they overwrite the CSV information. The other values are sanity-checked.
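The merge rule can be sketched on a toy record. `ToyCert`, its fields, and the decision to return a list of conflicting field names are hypothetical. The sketch only assumes what is stated above: HTML protection profiles win, and the remaining values are checked for consistency.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyCert:
    """Hypothetical stand-in with two scalar fields and a set of protection profiles."""

    name: str
    manufacturer: str | None = None
    protection_profiles: set[str] = field(default_factory=set)

    def merge(self, other: ToyCert, other_source: str | None = None) -> list[str]:
        """Merge `other` into self and return the names of conflicting fields."""
        conflicts = []
        for attr in ("name", "manufacturer"):
            mine, theirs = getattr(self, attr), getattr(other, attr)
            if mine is None:
                setattr(self, attr, theirs)  # fill a gap from the other record
            elif theirs is not None and mine != theirs:
                conflicts.append(attr)  # sanity check failed; keep own value
        if other_source == "html":
            # Assumed rule: HTML protection profiles replace the CSV ones.
            self.protection_profiles = set(other.protection_profiles)
        return conflicts


csv_cert = ToyCert("Product X", None, {"PP old"})
html_cert = ToyCert("Product X", "ACME", {"PP new"})
conflicts = csv_cert.merge(html_cert, other_source="html")
```

In this run there are no conflicts, the missing manufacturer is filled in from the HTML record, and the protection profiles are replaced.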

property pandas_tuple#

Returns tuple of attributes meant for pandas serialization

set_local_paths(report_pdf_dir, st_pdf_dir, cert_pdf_dir, report_txt_dir, st_txt_dir, cert_txt_dir)#

Sets paths to files given the requested directories

Parameters:
  • report_pdf_dir (Optional[Union[str, Path]]) – Directory where pdf reports shall be stored

  • st_pdf_dir (Optional[Union[str, Path]]) – Directory where pdf security targets shall be stored

  • cert_pdf_dir (Optional[Union[str, Path]]) – Directory where pdf certificates shall be stored

  • report_txt_dir (Optional[Union[str, Path]]) – Directory where txt reports shall be stored

  • st_txt_dir (Optional[Union[str, Path]]) – Directory where txt security targets shall be stored

  • cert_txt_dir (Optional[Union[str, Path]]) – Directory where txt certificates shall be stored
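A minimal sketch of how such paths might be derived from the directories follows. The `local_paths` helper, the reduction to two directories, and the `<dgst>.pdf` / `<dgst>.txt` naming scheme are assumptions for illustration only.

```python
from __future__ import annotations

from pathlib import Path


def local_paths(
    dgst: str, pdf_dir: str | Path | None, txt_dir: str | Path | None
) -> dict[str, Path | None]:
    """Derive per-certificate file paths; a directory of None leaves that path unset."""
    return {
        "pdf": Path(pdf_dir) / f"{dgst}.pdf" if pdf_dir is not None else None,
        "txt": Path(txt_dir) / f"{dgst}.txt" if txt_dir is not None else None,
    }


paths = local_paths("0123456789abcdef", "reports/pdf", None)
```

Accepting both `str` and `Path` and normalizing to `Path` mirrors the `Optional[Union[str, Path]]` parameter types listed above.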

FIPSCertificate#

class sec_certs.sample.FIPSCertificate(cert_id, web_data=None, pdf_data=None, heuristics=None, state=None)#

Data structure for a FIPS 140 certificate. Contains several inner classes that layer the data logic. Can be serialized to and from JSON (ComplexSerializableType). It is the basic element of FIPSDataset. The functionality is mostly limited to holding data and to transformations that the certificate can perform on itself. The FIPSDataset class then builds on this functionality.

class Heuristics(algorithms=<factory>, extracted_versions=<factory>, cpe_matches=None, verified_cpe_matches=None, related_cves=None, policy_prunned_references=<factory>, module_prunned_references=<factory>, policy_processed_references=<factory>, module_processed_references=<factory>, direct_transitive_cves=None, indirect_transitive_cves=None)#

Data structure that holds data obtained by processing the certificate and applying various heuristics.

property algorithm_numbers#

Returns numbers of algorithms

class InternalState(module_download_ok=False, policy_download_ok=False, policy_convert_garbage=False, policy_convert_ok=False, module_extract_ok=False, policy_extract_ok=False, policy_pdf_hash=None, policy_txt_hash=None)#

Holds state of the FIPSCertificate

class PdfData(keywords=<factory>, policy_metadata=<factory>)#

Data structure that holds data obtained from scanning pdf files (or their converted txt documents).

property certlike_algorithm_numbers#

Returns numbers of certificates from keywords["fips_certlike"]["Certlike"]
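Extracting numbers from the nested keywords dictionary could look roughly like this. The shape of the dictionary (matched text mapped to an occurrence count) and the convention that numbers follow a `#` are assumptions. The sample matches are made up.

```python
import re

# Hypothetical keyword matches: matched text mapped to its number of occurrences.
keywords = {
    "fips_certlike": {
        "Certlike": {"AES #1234": 2, "HMAC-SHA-256 #567": 1, "SHA-1": 4},
    },
}


def certlike_numbers(keywords: dict) -> set:
    """Collect every number that follows a '#' in the cert-like matches."""
    matches = keywords.get("fips_certlike", {}).get("Certlike", {})
    return {number for text in matches for number in re.findall(r"#\s*(\d+)", text)}


numbers = certlike_numbers(keywords)
```

Matches without a `#`, such as the plain `SHA-1` entry, contribute nothing to the result.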

class ValidationHistoryEntry(date: 'date', validation_type: "Literal['initial', 'update']", lab: 'str')#
class WebData(module_name=None, validation_history=None, vendor_url=None, vendor=None, certificate_pdf_url=None, module_type=None, standard=None, status=None, level=None, caveat=None, exceptions=None, embodiment=None, description=None, tested_conf=None, hw_versions=None, fw_versions=None, sw_versions=None, mentioned_certs=None, historical_reason=None, date_sunset=None, revoked_reason=None, revoked_link=None)#

Data structure for data obtained from scanning certificate webpage at NIST.gov

compute_heuristics_version()#

Heuristically computes the version of the product.

static convert_policy_pdf(cert)#

Converts the policy pdf to txt.

property dgst#

Returns primary key of the certificate, its id.

static extract_policy_pdf_keywords(cert)#

Extract keywords from policy document

static extract_policy_pdf_metadata(cert)#

Extract the PDF metadata from the security policy.

static get_algorithms_from_policy_tables(cert)#

Retrieves IDs of algorithms from tables inside security policy pdfs. External library is used to handle this.

prune_referenced_cert_ids()#

This method goes through all IDs (numbers) that correspond to FIPS certificates and are stored in pdf_data.keywords or web_data.mentioned_certs. It prunes these attributes and fills the attributes heuristics.module_prunned_references and heuristics.policy_prunned_references. These variables are further processed and Reference objects are created from them.
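The pruning step can be pictured as a simple set filter. The criteria used below (drop the certificate's own id and any number that is also a known algorithm number) are assumptions for illustration. The `prune_cert_ids` helper and the sample ids are hypothetical.

```python
def prune_cert_ids(candidates: set, own_id: str, algorithm_numbers: set) -> set:
    """Drop the certificate's own id and any number already known as an algorithm number."""
    return {c for c in candidates if c != own_id and c not in algorithm_numbers}


policy_refs = prune_cert_ids(
    {"100", "200", "1234", "3000"},
    own_id="3000",
    algorithm_numbers={"1234"},
)
```

The surviving ids are the ones that plausibly refer to other certificates and could later be turned into reference objects.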