# data_model_validation

**Repository Path**: mirrors_mitre/data_model_validation

## Basic Information

- **Project Name**: data_model_validation
- **Description**: This tool performs data validation and reporting for a dataset using the ACL CODI data model.
- **Primary Language**: Python (Jupyter Notebook)
- **License**: Apache License 2.0
- **Default Branch**: main

## Statistics

- **Created**: 2025-04-25
- **Last Updated**: 2026-05-17

## README

# ACL CODI Data Model Validation Script

This tool performs data validation and reporting for a dataset using the ACL CODI data model.

Checks currently implemented:
 - Does the table exist?
 - Does the table include all of the defined attributes? Are the attributes in the requested format?
 - How many rows are in the table?
 - Are the primary keys in each table unique?
 - Are the attributes with allowed values populated with entries that align with the value sets?
 - Are any required attributes ever missing?
 - Are any individuals represented in SDE tables without corresponding demographic information?
 - Are the record volumes and trends plausible?
 - Availability of enrollment and delivery data for assets
 - Availability of enrollment and delivery data for programs
 - Number of asset types represented
 - Number of programs represented
 - Missing demographic values
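
A few of the checks above can be illustrated with a minimal sketch. It uses only the Python standard library and is not the notebook's actual implementation. The table names, column names (`person_id`, `sex`, `birth_date`, `enrollment_id`), and the allowed value set are hypothetical placeholders; the real CODI data model may differ.

```python
import csv
import io
from collections import Counter


def load_rows(csv_text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def duplicate_keys(rows, key):
    """Return primary-key values that appear more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def invalid_values(rows, column, allowed):
    """Return populated values in `column` that fall outside the allowed value set."""
    return sorted({row[column] for row in rows
                   if row[column] and row[column] not in allowed})


def missing_count(rows, column):
    """Count rows where a required attribute is empty."""
    return sum(1 for row in rows if not row[column].strip())


def orphan_ids(child_rows, parent_rows, key):
    """Return IDs used in a child table that have no row in the parent table."""
    known = {row[key] for row in parent_rows}
    return sorted({row[key] for row in child_rows} - known)


# Hypothetical sample data; real table and column names may differ.
DEMOGRAPHIC_CSV = """person_id,sex,birth_date
1,F,1950-01-01
2,M,
2,X,1948-07-12
"""
ENROLLMENT_CSV = """enrollment_id,person_id
10,1
11,3
"""

demographic = load_rows(DEMOGRAPHIC_CSV)
enrollment = load_rows(ENROLLMENT_CSV)

print(duplicate_keys(demographic, "person_id"))          # ['2']
print(invalid_values(demographic, "sex", {"F", "M"}))    # ['X']
print(missing_count(demographic, "birth_date"))          # 1
print(orphan_ids(enrollment, demographic, "person_id"))  # ['3']
```

Each helper returns the offending values rather than a pass/fail flag, which makes the results easy to list in a report.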

## Setup and Running
The validation is implemented as a Jupyter notebook so that results and images appear in one place. A wrapper script (`run.bat` on Windows, `run.sh` on Mac/Linux) runs the notebook against a folder containing the relevant CSVs and exports an HTML report.

### Set up a virtual environment _(Optional, but recommended)_

It can be helpful to set up a virtual environment to isolate project dependencies from system dependencies.
There are a few libraries that can do this, but this documentation will stick with `venv` since that is included
in the Python Standard Library.

```shell
# Navigate to the project folder
cd data_model_validation/
# Create a virtual environment in a `venv/` folder
python -m venv venv/
# Activate the virtual environment
## Mac/Linux:
source venv/bin/activate
## Windows:
venv\Scripts\activate
```

### Installing dependencies

```shell
pip install --upgrade pip
pip install -r requirements.txt
```

### Running

```shell
# Windows:
run.bat path-to-folder-with-data

# Mac/Linux:
./run.sh path-to-folder-with-data
```

Examples:
```shell
# Windows:
run.bat C:\Data\

# Mac/Linux:
./run.sh /home/user/codiData/
```

The output will be created in `validation_results.html`.
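
For readers curious how a wrapper could execute a notebook and export HTML, the following is a rough sketch using the `jupyter nbconvert` command line. It is not a description of what `run.sh` or `run.bat` actually do. The notebook file name `validation.ipynb` and the `CODI_DATA_DIR` environment variable are assumptions made for illustration only.

```python
import os
import subprocess
from pathlib import Path


def build_command(notebook, output_base="validation_results"):
    """Build an nbconvert command that executes a notebook and exports HTML.

    nbconvert appends the `.html` extension to the output base name.
    """
    return [
        "jupyter", "nbconvert",
        "--to", "html",
        "--execute",
        "--output", output_base,
        str(notebook),
    ]


def run_validation(notebook, data_dir):
    """Run the notebook with the data folder exposed via an environment variable.

    CODI_DATA_DIR is a hypothetical variable name; the real notebook may
    receive the folder path in a different way.
    """
    env = dict(os.environ, CODI_DATA_DIR=str(Path(data_dir).resolve()))
    subprocess.run(build_command(notebook), env=env, check=True)


# Show the command that would be run for a hypothetical notebook name.
print(" ".join(build_command("validation.ipynb")))
```

Passing `check=True` makes the wrapper fail loudly if the notebook raises an error, rather than leaving a stale or partial report behind.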

Copyright 2025 The MITRE Corporation. All Rights Reserved. Approved for Public Release: 25-0288. Distribution Unlimited.


## License

Copyright 2025 The MITRE Corporation

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.