# BSim Database

> Behavioral similarity search for functions and executables

## Overview

Ghidra's **BSim (Behavioral Similarity)** database lets reverse engineers ingest metadata about previously analyzed executables into a central server or a local database. Analysts can then query the database to quickly find previously seen functions and libraries in new, unknown executables.

## Key Features

<CardGroup cols={2}>
  <Card title="Compilation Tolerant" icon="shield-check">
    Queries tolerate variations in function compilation
  </Card>

  <Card title="Fast Indexing" icon="bolt">
    All records are indexed for quick queries, even with millions of functions
  </Card>

  <Card title="Decompiler-Based" icon="code">
    Uses p-code from Ghidra's decompiler for robust matching
  </Card>

  <Card title="Nearest Neighbor" icon="magnifying-glass">
    Supports fuzzy matching with configurable similarity thresholds
  </Card>
</CardGroup>

## How It Works

### Feature Extraction

BSim extracts features from a **concise description of function data-flow**, not explicit machine instructions:

* Based on Ghidra's intermediate representation language (**p-code**)
* Generated by the Ghidra decompiler
* Graph-based abstract syntax tree representation
* Normalized to minimize compilation variation impact

### Normalized Comparisons

The resulting function descriptions are normalized to tolerate variations due to:

<AccordionGroup>
  <Accordion title="Equivalent Instructions">
    Different machine instructions that perform the same operation
  </Accordion>

  <Accordion title="Storage Locations">
    Variations in register allocation, stack usage, and memory locations
  </Accordion>

  <Accordion title="Instruction Ordering">
    Compiler-dependent instruction reordering
  </Accordion>

  <Accordion title="Compiler Transformations">
    Many forms of compiler optimization and transformation
  </Accordion>

  <Accordion title="Obfuscation">
    Even some forms of deliberate code obfuscation
  </Accordion>
</AccordionGroup>
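To make the storage-location case concrete, here is a toy sketch. It is not BSim's actual algorithm, which works on the decompiler's data-flow representation. The class name, the four-field operation layout, and the `v0`, `v1`, ... naming scheme are assumptions chosen for illustration. The sketch renames storage locations in order of first appearance, so two builds that differ only in register allocation reduce to the same normal form.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StorageNormalizationSketch {

    // Returns the canonical name for a storage location, assigning
    // v0, v1, ... in order of first appearance.
    static String canonical(Map<String, String> names, String storage) {
        String name = names.get(storage);
        if (name == null) {
            name = "v" + names.size();
            names.put(storage, name);
        }
        return name;
    }

    // Each op is {opcode, input1, input2, output}. Concrete register names
    // are replaced by canonical names so only the data-flow shape remains.
    static List<String> normalize(List<String[]> ops) {
        Map<String, String> names = new HashMap<>();
        List<String> result = new ArrayList<>();
        for (String[] op : ops) {
            String in1 = canonical(names, op[1]);
            String in2 = canonical(names, op[2]);
            String out = canonical(names, op[3]);
            result.add(op[0] + " " + in1 + " " + in2 + " -> " + out);
        }
        return result;
    }

    public static void main(String[] args) {
        // The same computation under two different register allocations.
        List<String[]> buildA = Arrays.asList(
            new String[] {"INT_ADD", "EAX", "EBX", "ECX"},
            new String[] {"INT_MULT", "ECX", "EAX", "EDX"});
        List<String[]> buildB = Arrays.asList(
            new String[] {"INT_ADD", "R8", "R9", "R10"},
            new String[] {"INT_MULT", "R10", "R8", "R11"});

        System.out.println(normalize(buildA));
        System.out.println(normalize(buildA).equals(normalize(buildB)));
    }
}
```

Both builds normalize to the same two lines, so any feature derived from the normal form is identical for them. The other variations listed above (instruction ordering, optimizations) need graph-level normalization that this sketch does not attempt.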

### Text Retrieval Strategies

Records are indexed using text retrieval strategies enabling:

* **Nearest neighbor queries**: Features don't need exact matches
* **Configurable similarity**: Set percentage thresholds for matches
* **Functional tolerance**: Match even when source code has changed slightly
* **Fast queries**: Indexing keeps single-function lookups quick as the database grows

<Info>
  For a database containing **millions of functions**, results from querying a single function typically return in **a fraction of a second**.
</Info>
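The nearest-neighbor idea can be sketched with cosine similarity over sparse feature vectors. This is a minimal illustration and not BSim's implementation: the class and method names, the feature identifiers (`h1`, `h2`, ...), and the linear scan over the stored functions are all assumptions. A real deployment relies on the database index rather than comparing against every record.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class NearestNeighborSketch {

    // Builds a sparse feature-count vector from a list of feature identifiers.
    static Map<String, Integer> features(String... hashes) {
        Map<String, Integer> counts = new HashMap<>();
        for (String h : hashes) {
            counts.merge(h, 1, Integer::sum);
        }
        return counts;
    }

    // Euclidean length of a sparse vector.
    static double length(Map<String, Integer> v) {
        double sum = 0.0;
        for (int count : v.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity: 1.0 for identical vectors, 0.0 for no shared features.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double denom = length(a) * length(b);
        return denom == 0.0 ? 0.0 : dot / denom;
    }

    // Returns the name of the stored function most similar to the query,
    // or null if the database is empty.
    static String nearest(Map<String, Integer> query,
                          Map<String, Map<String, Integer>> database) {
        String best = null;
        double bestScore = -1.0;
        for (Map.Entry<String, Map<String, Integer>> e : database.entrySet()) {
            double score = cosine(query, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> database = new LinkedHashMap<>();
        database.put("parse_header_patched", features("h1", "h2", "h4"));
        database.put("unrelated", features("h9"));

        // The query shares two of three features with the patched function.
        Map<String, Integer> query = features("h1", "h2", "h3");
        String best = nearest(query, database);
        System.out.println(best + " " + cosine(query, database.get(best)));
    }
}
```

The query has no exact counterpart in the database, yet the patched variant still scores about 0.67 and is returned as the closest match. This is the behavior the "functional tolerance" bullet describes.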

## Database Technologies

BSim supports three database backends:

| Backend           | Use Case    | Features                                        |
| ----------------- | ----------- | ----------------------------------------------- |
| **PostgreSQL**    | Production  | Robust, multi-connection, fault-tolerant server |
| **Elasticsearch** | Distributed | Scalable across clusters, distributed indexing  |
| **H2 (local)**    | Development | Convenience for small personal collections      |

<Note>
  PostgreSQL server software is currently only supported on **Linux** and **macOS**. Elasticsearch must be obtained separately. H2 databases are supported on all platforms.
</Note>

## Integration with Ghidra

### Ghidra Server Integration

<Steps>
  <Step title="Repository Integration">
    Ingest from Ghidra Server or local project repositories
  </Step>

  <Step title="Query Results">
    Results reference executables within repositories
  </Step>

  <Step title="Command-Line Tools">
    Easy ingestion using the `bsim` command script
  </Step>
</Steps>

### Plugin Client

Ghidra includes a plugin client that integrates:

* Query dialog directly in the main CodeBrowser
* Results windows with side-by-side comparison
* Direct navigation to matching functions

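```bash
# Command-line ingestion example
# createdatabase takes a configuration template; medium_nosize is one of the bundled templates
bsim createdatabase postgresql://localhost/mydb medium_nosize
# Generate signatures from a local Ghidra project and commit them to the database
bsim generatesigs ghidra:/path/to/ghidra/project --bsim postgresql://localhost/mydb
```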

### Query API

Ghidra provides a Java API for:

* Incorporating queries into analyst scripts
* Programmatic ingestion of executables
* Marshaling queries and results between Ghidra sessions and BSim servers

```java
// Source: BSimOverview.html:168-177
// The API allows queries and ingest to be incorporated
// into analyst scripts, marshaling data between an active
// Ghidra session and a BSim server
```

## Database Configuration

BSim databases can be configured for different scenarios:

<CardGroup cols={2}>
  <Card title="Database Setup" icon="database">
    Create and configure PostgreSQL, H2, or Elasticsearch backends
  </Card>

  <Card title="Feature Weights" icon="weight-scale">
    Customize feature weights for domain-specific matching
  </Card>

  <Card title="Ingest Process" icon="arrow-down-to-bracket">
    Batch ingest executables from repositories
  </Card>

  <Card title="Query Interface" icon="magnifying-glass-chart">
    Interactive and programmatic query options
  </Card>
</CardGroup>

## Querying BSim Database

### Query Types

1. **Single Function Query**: Search for similar functions to a specific function
2. **Batch Query**: Query multiple functions at once
3. **Overview Query**: Count the database matches for every function in the current program
4. **Executable Query**: Find similar executables in the database

### Query Parameters

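<ParamField path="similarity" type="number" default="0.7">
  Minimum similarity score, from 0.0 to 1.0, that a match must reach to be returned. A score of 1.0 means the two feature vectors are identical.
</ParamField>

<ParamField path="confidence" type="number" default="0.0">
  Minimum confidence score. Raising it filters out matches that rest on little evidence, such as very small functions that reach a high similarity by chance.
</ParamField>

<ParamField path="maxResults" type="number" default="100">
  Maximum number of matches to return for each queried function
</ParamField>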
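How the three parameters interact can be sketched as a filter over candidate matches. This is an illustrative model, not the plugin's code: the `Match` class, the `filter` method, and the sample function names and scores are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class QueryFilterSketch {

    // A candidate match with its two scores (hypothetical structure).
    static final class Match {
        final String name;
        final double similarity;
        final double confidence;

        Match(String name, double similarity, double confidence) {
            this.name = name;
            this.similarity = similarity;
            this.confidence = confidence;
        }
    }

    // Keeps matches that meet both thresholds, orders them by descending
    // similarity, and truncates the list to maxResults entries.
    static List<Match> filter(List<Match> candidates, double similarity,
                              double confidence, int maxResults) {
        List<Match> kept = new ArrayList<>();
        for (Match m : candidates) {
            if (m.similarity >= similarity && m.confidence >= confidence) {
                kept.add(m);
            }
        }
        kept.sort((a, b) -> Double.compare(b.similarity, a.similarity));
        return new ArrayList<>(kept.subList(0, Math.min(maxResults, kept.size())));
    }

    public static void main(String[] args) {
        List<Match> candidates = Arrays.asList(
            new Match("memcpy_variant", 0.95, 40.0),
            new Match("strlen_like", 0.72, 3.0),
            new Match("tiny_stub", 1.00, 0.5),
            new Match("unrelated", 0.40, 12.0));

        // Defaults from the parameters above: 0.7, 0.0, 100.
        for (Match m : filter(candidates, 0.7, 0.0, 100)) {
            System.out.println(m.name + " " + m.similarity + " " + m.confidence);
        }
    }
}
```

With the defaults, `tiny_stub` ranks first even though its confidence is low. Raising the confidence threshold to 1.0 drops it while keeping the two better-supported matches.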

### Using the Plugin

<Steps>
  <Step title="Select Function">
    Right-click on a function in the CodeBrowser listing
  </Step>

  <Step title="Launch Query">
    Select **BSim → Search for Similar Functions**
  </Step>

  <Step title="Configure Search">
    Set similarity threshold and other parameters
  </Step>

  <Step title="Review Results">
    Examine matches in the results window with similarity scores
  </Step>
</Steps>

## Ingesting Executables

### Prerequisites

* Executables must be analyzed in Ghidra
* Functions must decompile successfully, because features are generated by the decompiler
* Database must be created and accessible

### Ingest Workflow

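```bash
# Create the database from a configuration template (medium_nosize is one of the bundled templates)
bsim createdatabase postgresql://localhost:5432/malware_db medium_nosize

# Generate signatures from a Ghidra Server repository and commit them to the database
bsim generatesigs ghidra://server/repo --bsim postgresql://localhost:5432/malware_db

# Generate signatures from a local project, addressed as ghidra:/<project-dir>/<project-name>
bsim generatesigs ghidra:/home/user/ghidra_project --bsim postgresql://localhost:5432/malware_db
```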

### Ingest Options

<AccordionGroup>
  <Accordion title="Function Filtering">
    Filter which functions to ingest based on size, complexity, or other criteria
  </Accordion>

  <Accordion title="Metadata Tags">
    Associate custom metadata tags with ingested executables
  </Accordion>

  <Accordion title="Batch Processing">
    Process large numbers of executables automatically
  </Accordion>

  <Accordion title="Update Mode">
    Re-ingest updated executables without duplicates
  </Accordion>
</AccordionGroup>

## Command-Line Reference

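The `bsim` command-line utility creates databases, ingests signatures, and manages stored records. The available commands depend on the Ghidra release; running `bsim` with no arguments prints the usage for your installation. Commonly used commands include:

| Command           | Description                                                           |
| ----------------- | --------------------------------------------------------------------- |
| `createdatabase`  | Initialize a new BSim database from a configuration template          |
| `generatesigs`    | Generate function signatures from a Ghidra repository or project      |
| `commitsigs`      | Commit previously generated signature files to a database             |
| `generateupdates` | Generate metadata updates for executables already in a database       |
| `commitupdates`   | Commit previously generated updates to a database                     |
| `delete`          | Remove executables from a database                                    |
| `listexes`        | List the executables stored in a database                             |
| `listfuncs`       | List the functions of a stored executable                             |
| `dumpsigs`        | Export stored signatures to files                                     |
| `setmetadata`     | Set the database name, owner, and description                         |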

<Warning>
  Ensure you have proper database permissions before running administrative commands.
</Warning>

## Advanced Features

### Features and Weights

Customize how BSim weights different aspects of function behavior:

* **Data-flow features**: Weight importance of data operations
* **Control-flow features**: Emphasize branching patterns
* **Call graph features**: Consider function call relationships
* **Constant features**: Factor in constant values used

### Performance Optimization

<Tip>
  For optimal performance with large databases:

  * Use PostgreSQL or Elasticsearch for production
  * Configure appropriate database indexes
  * Allocate sufficient memory to database server
  * Use batch queries when analyzing multiple functions
</Tip>

## Use Cases

<CardGroup cols={2}>
  <Card title="Malware Analysis" icon="virus">
    Identify known malware families and variants
  </Card>

  <Card title="Vulnerability Research" icon="bug">
    Find vulnerable code patterns across executables
  </Card>

  <Card title="Library Detection" icon="books">
    Recognize commercial and open-source libraries
  </Card>

  <Card title="Code Reuse" icon="recycle">
    Track code reuse and software lineage
  </Card>
</CardGroup>

## Source Code References

```bash
# Main implementation (paths are relative to the root of the Ghidra source tree)
Ghidra/Features/BSim/

# Help documentation
Ghidra/Features/BSim/src/main/help/help/topics/BSim/

# Database schemas
Ghidra/Features/BSim/data/

# Command-line scripts
Ghidra/Features/BSim/support/
```

## Database Maintenance

### Regular Tasks

* **Backup**: Regularly backup your BSim database
* **Vacuum**: Run database optimization (PostgreSQL)
* **Monitor**: Track database size and query performance
* **Update**: Keep ingested executables synchronized with analysis

### Troubleshooting

<AccordionGroup>
  <Accordion title="Slow Queries">
    Check database indexes, increase memory allocation, or optimize feature weights
  </Accordion>

  <Accordion title="Connection Issues">
    Verify network connectivity, database server status, and authentication credentials
  </Accordion>

  <Accordion title="Ingest Failures">
    Ensure executables have been imported and fully analyzed in Ghidra and that their functions decompile without errors
  </Accordion>
</AccordionGroup>

## Next Steps

<CardGroup cols={2}>
  <Card title="Debugger" icon="bug" href="/features/debugger">
    Perform dynamic analysis on executables
  </Card>

  <Card title="Version Tracking" icon="code-compare" href="/features/version-tracking">
    Track changes between program versions
  </Card>
</CardGroup>
