penguin.patch_minimizer module¶

class penguin.patch_minimizer.PatchMinimizer(proj_dir, config_path, output_dir, timeout, max_iters, nworkers, verbose, minimization_target='webserver_start')[source]¶

Bases: object

calculate_network_data(run_index)[source]¶: From a run directory, calculate the bytes sent, the bytes received, and the entropy of the data received from the guest. Consumes vpn_{ip}_port files for data amounts (as CSV) and vpn_response_{ip}_port files for entropy. Note that ip may be an IPv6 address wrapped in []. Colons in names are replaced with underscores (which looks odd for IPv6).

config_still_viable(run_index)[source]¶: Compare the results from this run to our baseline and determine whether the config is still viable. If not, we return False, indicating that this config is not valid for our minimization target.

static dicts_overlap(dict1, dict2)[source]¶: Returns a dict of the overlapping keys and values

static diff_dicts(dict1, dict2)[source]¶: Returns the difference dict1 - dict2, i.e., the keys and values that are in dict1 but not in dict2.
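The two dictionary helpers above can be illustrated with a minimal sketch. It assumes flat dictionaries and treats an entry as shared only when both the key and the value match; the function names `overlap_sketch` and `diff_sketch` and the option names are hypothetical, and the module's own implementation may treat nested values differently.

```python
def overlap_sketch(dict1: dict, dict2: dict) -> dict:
    """Entries whose key is in both dicts with an equal value (flat dicts only)."""
    return {k: v for k, v in dict1.items() if k in dict2 and dict2[k] == v}


def diff_sketch(dict1: dict, dict2: dict) -> dict:
    """Entries of dict1 that are not present in dict2 with the same value."""
    return {k: v for k, v in dict1.items() if k not in dict2 or dict2[k] != v}


# Hypothetical option names, for illustration only.
a = {"opt_a": 1, "opt_b": 2, "opt_c": 3}
b = {"opt_a": 1, "opt_b": 20}

shared = overlap_sketch(a, b)
only_in_a = diff_sketch(a, b)
```

Under these assumptions, every entry of `a` lands in exactly one of the two results, which is the property that makes splitting patches lossless.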

establish_baseline()[source]¶

For our very first run, we’ll establish the baseline.

First we validate that the provided config meets our expectations, and raise an exception if it does not.

If the minimization target is webserver_start:

it must already start a webserver; otherwise we can’t minimize

If the minimization target is coverage:

it must produce coverage information (e.g., it should typically have the auto_explore patch)
actual requirement: it should have the vpn, nmap, and coverage plugins

If the minimization target is network_traffic:

it must generate network traffic
actual requirement: it should have the vpn and nmap plugins

After validating the config, run the baseline and do an initial static minimization by removing any pseudofiles that are never used. Split these into new patches and drop them from patches_to_test.

Finally, update self.patches_to_test.
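The per-target plugin requirements listed above can be sketched as a lookup table. This is only an illustration: the names `REQUIRED_PLUGINS` and `missing_plugins` are hypothetical, and how the real method reads the enabled plugins from the config is not shown in this page.

```python
# Plugin requirements as stated in the description above.
REQUIRED_PLUGINS = {
    "webserver_start": set(),  # no plugin requirement is stated for this target
    "coverage": {"vpn", "nmap", "coverage"},
    "network_traffic": {"vpn", "nmap"},
}


def missing_plugins(target, enabled_plugins):
    """Return the set of required plugins that are not enabled for this target."""
    if target not in REQUIRED_PLUGINS:
        raise ValueError(f"unknown minimization target: {target}")
    return REQUIRED_PLUGINS[target] - set(enabled_plugins)


# A config with only vpn and nmap enabled cannot be minimized for coverage.
missing = missing_plugins("coverage", ["vpn", "nmap"])
```

A non-empty result would correspond to the case where validation raises an exception before the baseline run starts.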

static filter_conflicts(final_dict, patch, path='')[source]¶: Filter out any keys that are in the final_dict but with a different value
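One possible reading of this filter is sketched below. It assumes the patch is the dictionary being filtered, that nested dictionaries are compared key by key, and that a key is dropped when final_dict holds a different value for it. The name `drop_conflicts` is hypothetical, and the `path` argument of the real method is omitted because its use is not described here.

```python
def drop_conflicts(final_dict: dict, patch: dict) -> dict:
    """Return a copy of patch without keys that final_dict sets to another value."""
    kept = {}
    for key, value in patch.items():
        if key not in final_dict:
            # final_dict says nothing about this key, so there is no conflict.
            kept[key] = value
        elif isinstance(value, dict) and isinstance(final_dict[key], dict):
            # Compare nested sections key by key; drop the section if nothing survives.
            nested = drop_conflicts(final_dict[key], value)
            if nested:
                kept[key] = nested
        elif final_dict[key] == value:
            kept[key] = value
    return kept


# Hypothetical config fragments, for illustration only.
final = {"env": {"A": "1", "B": "2"}, "arch": "arm"}
patch = {"env": {"A": "1", "B": "9", "C": "3"}, "arch": "mips"}
filtered = drop_conflicts(final, patch)
```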

get_best_patchset()[source]¶: If we don’t assume orthogonality, we have to run every combination of patches. We could then pick the best one this way.
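To show what "every combination of patches" implies, here is a small sketch that enumerates all non-empty subsets of a patch list. The function name `all_patchsets` and the patch file names are hypothetical, and the real method may order or prune candidates differently.

```python
from itertools import combinations


def all_patchsets(patches):
    """Yield every non-empty subset of patches, smallest subsets first."""
    for size in range(1, len(patches) + 1):
        for combo in combinations(patches, size):
            yield list(combo)


patches = ["a.yaml", "b.yaml", "c.yaml"]  # hypothetical patch names
candidates = list(all_patchsets(patches))
```

With n patches there are 2**n - 1 such subsets, so exhaustive search becomes expensive quickly. That cost is the reason orthogonality between patches is worth preserving.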

static lists_overlap(list1, list2)[source]¶

static percentile(data, percentile)[source]¶

remove_shadowed_options()[source]¶: Walk through each patch and remove any options that the unpatched config would overwrite. We remove the old patch and generate a new one.

run()[source]¶

run_config(patchset, run_index)[source]¶: This function runs a single configuration and returns the score. It runs in parallel, so be careful with shared resources.

run_configs(patchsets)[source]¶

Parameters: patchsets (List[str])

split_overlapping_patches()[source]¶: If we have overlapping patches, we attempt to preserve orthogonality by splitting them, ensuring that each unique configuration option is in only one patch. However, in a real config only the last option is considered, so we should throw away options that are not the last one.
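The "only the last option is considered" rule can be sketched as follows. The sketch assumes patches are an ordered list of (name, options) pairs with flat option dictionaries; the function name `keep_last_option` and the patch names are hypothetical. It shows only the last-one-wins filtering, not how the real method writes out new patch files.

```python
def keep_last_option(patches):
    """Keep each option key only in the last patch that sets it.

    patches: ordered list of (name, options) pairs with flat option dicts.
    """
    seen = set()
    result = []
    # Walk from last to first so that later patches claim their keys first.
    for name, options in reversed(patches):
        kept = {k: v for k, v in options.items() if k not in seen}
        seen.update(options)
        result.append((name, kept))
    result.reverse()  # restore the original patch order
    return result


patches = [
    ("patch_a", {"x": 1, "y": 2}),
    ("patch_b", {"y": 3, "z": 4}),
]
deduped = keep_last_option(patches)
```

After this step no option key appears in more than one patch, so removing a patch can no longer silently change which value of a shared option takes effect.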

verify_coverage(run_index)[source]¶: Given the results of a run, determine if it’s still viable based on coverage as compared to the baseline

verify_net_traffic(run_index)[source]¶: Given the results of a run, determine if it’s still viable based on network traffic as compared to the baseline

verify_www_started(run_index)[source]¶: Check netbinds log to ensure we saw a webserver bind

penguin.patch_minimizer.calculate_entropy(buffer)[source]¶

Parameters: buffer (bytes)
Return type: float
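The signature above takes bytes and returns a float. A common way to compute such a value is Shannon entropy over byte frequencies, sketched below. This is an assumption about the metric rather than a copy of the module's code, and the name `shannon_entropy` is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(buffer: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not buffer:
        return 0.0
    total = len(buffer)
    counts = Counter(buffer)  # maps each byte value to its number of occurrences
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


constant = shannon_entropy(b"aaaa")           # a single repeated byte
two_symbols = shannon_entropy(b"abab")        # two equally likely bytes
uniform = shannon_entropy(bytes(range(256)))  # every byte value exactly once
```

Under this definition, a response made of one repeated byte scores 0, while data that uses every byte value equally often scores 8, so higher values indicate more varied response content.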

penguin.patch_minimizer.minimize(proj_dir, config_path, output_dir, timeout, max_iters=1000, nworkers=1, verbose=False)[source]¶