pengutils.utils package

Pengutils Utils Package

This package contains general-purpose utility modules for querying, filtering, and processing data from the Penguin database. It provides reusable functions and CLI tools for working with event data (execution, syscall, read, write, file descriptor events) as well as other Penguin-related analysis and database tasks.

Usage

You can run these utilities in either of two ways:

  1. Inside the Penguin container: All CLI tools and modules are available by default.

  2. Outside the Penguin container: Clone the repository and install from source to use the utilities independently:

    git clone https://github.com/rehosting/penguin.git
    cd penguin/pengutils
    pip install .
    

Database Expectations

All CLI commands expect to find a plugins.db SQLite database inside the specified results directory (default: ./results/latest/). This database is generated by Penguin’s data collection pipeline and contains all relevant data for querying. If plugins.db is missing, commands will fail with an error.
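Before running a command, it can help to confirm that the database is where the tools expect it. The following is a minimal sketch, not part of pengutils: the helper name list_tables is hypothetical, and it uses only Python's standard sqlite3 module. It assumes nothing about the schema beyond plugins.db being a regular SQLite file, and simply reports whichever tables it finds.

```python
import sqlite3
from pathlib import Path


def list_tables(results_dir="./results/latest"):
    """Return the table names found in <results_dir>/plugins.db.

    Hypothetical helper (not a pengutils API). Raises FileNotFoundError
    if plugins.db is missing from the results directory.
    """
    db_path = Path(results_dir) / "plugins.db"
    if not db_path.is_file():
        raise FileNotFoundError(f"plugins.db not found in {db_path.parent}")

    # Open read-only via a file: URI so the check cannot modify collected data.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling list_tables() with no arguments checks the default ./results/latest/ location; pass another directory to inspect a different run.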

Running Commands

Run commands from the root of your Penguin workspace, or from any directory where the default results path (./results/latest/) resolves. If your data is stored elsewhere, pass its location with the --results option.

Example CLI usage

execs --procname myproc --fd 3 --filename log.txt --output results.txt
reads --procname myproc --fd 4 --filename input.txt --output reads.txt
writes --procname myproc --fd 5 --filename output.txt --output writes.txt
syscalls --procname myproc --syscall open --errors --output syscalls.txt
fds --procname myproc --fd 3 --output fds.txt
tasks --results ./results/latest --output tasks.txt
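The same commands can be scripted. The sketch below is an illustration only: it wraps the tasks invocation shown above, using just the --results and --output options from that example. The function names, the ./reports output directory, and the assumption that ./results holds one subdirectory per run are all hypothetical and not part of pengutils.

```python
import shutil
import subprocess
from pathlib import Path


def build_tasks_argv(results_dir, output):
    """Build the argument list for `tasks --results <dir> --output <file>`."""
    return ["tasks", "--results", str(results_dir), "--output", str(output)]


def run_tasks_for_all(results_root="./results", out_dir="./reports"):
    """Run `tasks` once per run directory under results_root that has a plugins.db.

    Assumption: results_root contains one subdirectory per run. If `latest`
    is a link to another run directory, that run is processed twice.
    Returns the list of output files written (empty if nothing was run).
    """
    root = Path(results_root)
    if shutil.which("tasks") is None or not root.is_dir():
        return []  # CLI not on PATH, or no results directory to scan

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for run_dir in sorted(p for p in root.iterdir() if (p / "plugins.db").is_file()):
        target = out / f"tasks_{run_dir.name}.txt"
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(build_tasks_argv(run_dir, target), check=True)
        written.append(target)
    return written
```

Passing the arguments as a list rather than a shell string avoids quoting problems when directory names contain spaces.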

Each module provides command-line interfaces and helper functions for specific Penguin data types and analysis tasks.

Submodules