> 📦 **Note:** This documentation refers to the `wine-analysis-package` branch, which contains the most accessible and minimal version of the GC-MS Wine Analysis tools. > It is intended for testing and basic usage. > Other branches may contain experimental or extended versions # General Documentation Welcome to the **Wine Analysis Library** documentation! ## Overview The Wine Analysis Library is a comprehensive toolkit designed for analyzing and processing wine-related data. The library provides various modules to facilitate data loading, preprocessing, dimensionality reduction, classification, and visualization of wine chromatograms and related datasets. ### Key Features - **Data Loading & Preprocessing**: Load and preprocess wine datasets efficiently using custom utilities. - **Dimensionality Reduction**: Apply various dimensionality reduction techniques like PCA (Principal Component Analysis) to simplify complex datasets. - **Classification**: Use machine learning classifiers to categorize wine samples based on their chemical compositions or other features. - **Visualization**: Generate informative visualizations, including chromatograms and scatter plots, to explore and present the data effectively. - **Analysis**: Perform detailed analysis on wine data, including peak detection and alignment across samples. ## Installation This repository contains multiple development branches for different use cases and experimental pipelines. The wine-analysis-package branch is the simplest and most stable version, specifically intended for basic GC-MS data analysis workflows. It includes the core functionalities for chromatogram preprocessing, alignment, classification, and visualization, and is ideal for most users working with wine or chemical analysis datasets. To use this version, make sure to clone and switch to this branch: ```bash # Clone the repository and switch to the correct branch git clone https://github.com/pougetlab/wine_analysis.git cd wine_analysis git checkout wine-analysis-package # (Optional) Create and activate a virtual environment python3 -m venv .venv source .venv/bin/activate # On Windows: .venv\Scripts\activate # Install the package in editable mode pip install -e . # Install dependencies pip install -r requirements.txt ``` Some modules in this library may require extra dependencies that are not automatically listed in requirements.txt. If you encounter import errors when running scripts, make sure to install the following commonly used packages: ```bash pip install torch torchvision pynndescent netCDF4 seaborn umap-learn tqdm scikit-optimize ``` ## Preparing the GC-MS Data Before running the analysis scripts, your GC-MS data must be prepared in a specific directory structure. ### Required Format Each sample must be stored in its own `.D` folder For example: ``` datasets/ ├── PINOT_NOIR/ │ ├── Sample1.D/ │ ├── Sample2.D/ │ └── ... └── ... ``` Then, within each sample there should be a CSV file like this: ![csv_content.png](images/csv_content.png) , where the first column is the retention time and the next columns are the intensity signals of each m/z channel (starting at 40 in this example). ## Running Scripts To execute one of the analysis scripts, navigate to the root of the project (where the scripts/ directory is located) and run the script using Python. For example, to run the Pinot Noir classification pipeline: ```bash python scripts/pinot_noir/train_test_pinot_noir.py ``` Note: Each script is documented in detail in the corresponding section of the online documentation.