Creating Custom Dimension Reduction Models in Acadia

Introduction

Dimension reduction is a core technique in data science for working with high-dimensional data. It maps the data to fewer dimensions while preserving as much information as possible. In Acadia, you can implement custom dimension reduction models to tailor this process to specific datasets or analytical requirements. This guide walks you through creating a custom dimension reduction model in Acadia.

Implementing a Custom Model

To implement a custom dimension reduction model, you will extend a base class provided by Acadia and override the necessary methods to apply your dimension reduction algorithm.

Base Class

All custom dimension reduction models should inherit from the DimensionReductionModel abstract base class. This class requires the implementation of the reduce_dimensions method, which defines how the dimensions of the data should be reduced.

from abc import ABC, abstractmethod
from acadia.types import IdToEmbeddingDictType
 
class DimensionReductionModel(ABC):
    @abstractmethod
    def reduce_dimensions(
        self, datum_id_to_embedding_dict: IdToEmbeddingDictType
    ) -> IdToEmbeddingDictType:
        """
        Abstract method that reduces the dimensions of embeddings.
 
        Args:
            datum_id_to_embedding_dict (IdToEmbeddingDictType): A dictionary mapping datum IDs to their embeddings.
 
        Returns:
            IdToEmbeddingDictType: A dictionary mapping datum IDs to their reduced embeddings.
        """
        pass
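
The effect of the abstract method can be seen in a short sketch: Python refuses to instantiate a subclass that does not override reduce_dimensions. The base class below is a local stand-in that mirrors the one above so the snippet runs without Acadia installed. The type alias (datum IDs as strings, embeddings as lists of floats) and the names IncompleteModel, IdentityModel, and can_instantiate are assumptions made for illustration, not part of Acadia.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

# Assumed stand-in for acadia.types.IdToEmbeddingDictType:
# datum ID (str) -> embedding (list of floats).
IdToEmbeddingDictType = Dict[str, List[float]]


class DimensionReductionModel(ABC):
    """Local stand-in mirroring the abstract base class shown above."""

    @abstractmethod
    def reduce_dimensions(
        self, datum_id_to_embedding_dict: IdToEmbeddingDictType
    ) -> IdToEmbeddingDictType:
        ...


class IncompleteModel(DimensionReductionModel):
    """Hypothetical subclass that forgets to override reduce_dimensions."""


class IdentityModel(DimensionReductionModel):
    """Hypothetical subclass that returns the embeddings unchanged."""

    def reduce_dimensions(
        self, datum_id_to_embedding_dict: IdToEmbeddingDictType
    ) -> IdToEmbeddingDictType:
        return dict(datum_id_to_embedding_dict)


def can_instantiate(model_class) -> bool:
    """Return True if the class can be instantiated, False if abc blocks it."""
    try:
        model_class()
    except TypeError:
        # Raised by abc when an abstract method has not been overridden.
        return False
    return True


print(can_instantiate(IdentityModel))    # True
print(can_instantiate(IncompleteModel))  # False
```

Because the check happens at instantiation time, a missing override surfaces as soon as the model object is created rather than later, when the model is first applied to data.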

Example Implementation

Here is an example of a simple dimension reduction model that truncates each embedding to a specified number of dimensions and optionally normalizes the result by dividing each dimension by its maximum value across all embeddings.

from acadia.types import IdToEmbeddingDictType
from acadia.models.dimension_reduction_models.base import DimensionReductionModel
 
class MockDimensionReductionModel(DimensionReductionModel):
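    def __init__(self, dims=2, normalize=True, hyperparameter_1=0.5, hyperparameter_2=0.5):
        self.dims = dims            # number of leading dimensions to keep
        self.normalize = normalize  # whether to normalize after truncation
        # hyperparameter_1 and hyperparameter_2 are stored but not used by this example.
        self.hyperparameter_1 = hyperparameter_1
        self.hyperparameter_2 = hyperparameter_2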
 
    def normalize_id_to_embedding(self, id_to_embedding_dict: IdToEmbeddingDictType) -> IdToEmbeddingDictType:
        # Divide each dimension by its maximum value across all embeddings.
        # A maximum of 0 is replaced by 1 to avoid division by zero.
        max_values = [
            max(embedding[i] for embedding in id_to_embedding_dict.values()) or 1
            for i in range(self.dims)
        ]
        return {
            datum_id: [value / max_value for value, max_value in zip(embedding, max_values)]
            for datum_id, embedding in id_to_embedding_dict.items()
        }
 
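    def reduce_dimensions(self, datum_id_to_embedding_dict: IdToEmbeddingDictType) -> IdToEmbeddingDictType:
        # The parameter name matches the abstract method so keyword calls work on any model.
        # Keep only the first `dims` components of each embedding.
        reduced_embeddings = {
            datum_id: embedding[:self.dims]
            for datum_id, embedding in datum_id_to_embedding_dict.items()
        }
        if self.normalize:
            reduced_embeddings = self.normalize_id_to_embedding(reduced_embeddings)
        return reduced_embeddings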
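
To make the behavior concrete, the two steps can be traced on a small input. The sketch below re-implements truncation and max-normalization as standalone functions so that it runs without Acadia installed, with a guard against a zero maximum. The function names, the Embeddings alias, and the sample data are illustrative assumptions, and embeddings are assumed to be plain lists of floats.

```python
from typing import Dict, List

# Assumed shape of IdToEmbeddingDictType: datum ID -> list of floats.
Embeddings = Dict[str, List[float]]


def truncate(embeddings: Embeddings, dims: int) -> Embeddings:
    """Keep only the first `dims` components of each embedding."""
    return {datum_id: vector[:dims] for datum_id, vector in embeddings.items()}


def normalize_by_max(embeddings: Embeddings, dims: int) -> Embeddings:
    """Divide each dimension by its maximum value across all embeddings."""
    max_values = [
        max(vector[i] for vector in embeddings.values()) or 1  # avoid dividing by 0
        for i in range(dims)
    ]
    return {
        datum_id: [value / max_value for value, max_value in zip(vector, max_values)]
        for datum_id, vector in embeddings.items()
    }


# Illustrative sample data: three 4-dimensional embeddings.
sample = {
    "a": [2.0, 4.0, 9.0, 1.0],
    "b": [1.0, 8.0, 3.0, 7.0],
    "c": [4.0, 2.0, 5.0, 6.0],
}

truncated = truncate(sample, dims=2)
reduced = normalize_by_max(truncated, dims=2)

print(truncated)  # {'a': [2.0, 4.0], 'b': [1.0, 8.0], 'c': [4.0, 2.0]}
print(reduced)    # {'a': [0.5, 0.5], 'b': [0.25, 1.0], 'c': [1.0, 0.25]}
```

Note that dividing by the maximum maps values into the range 0 to 1 only when all values are non-negative. Truncation also discards whatever information the dropped dimensions carried, which is why this model is suited to demonstration rather than real analysis.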

Tips for Implementing Custom Models

Understanding Data: Know your data well to choose or develop the most appropriate dimension reduction technique.
Normalization: Consider whether to normalize data before or after dimension reduction, as it can significantly impact the results.
Parameter Tuning: Experiment with different parameters to find the optimal settings for your specific needs.
Validation: Always validate your model to ensure it performs well and preserves necessary information after dimension reduction.
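
One possible way to approach the validation tip is sketched below. It checks that every datum ID survives with the expected number of dimensions, then measures how well pairwise Euclidean distances are preserved using a Pearson correlation. This is only one of many possible checks, not a method prescribed by Acadia. All function names are hypothetical, and embeddings are assumed to be plain lists of floats.

```python
import math
from itertools import combinations
from typing import Dict, List

# Assumed shape of IdToEmbeddingDictType: datum ID -> list of floats.
Embeddings = Dict[str, List[float]]


def check_structure(original: Embeddings, reduced: Embeddings, dims: int) -> bool:
    """Every datum ID must survive, and each reduced embedding must have `dims` components."""
    same_ids = set(original) == set(reduced)
    right_size = all(len(vector) == dims for vector in reduced.values())
    return same_ids and right_size


def pairwise_distances(embeddings: Embeddings) -> List[float]:
    """Euclidean distance for every unordered pair of IDs, in sorted-ID order."""
    ids = sorted(embeddings)
    return [math.dist(embeddings[a], embeddings[b]) for a, b in combinations(ids, 2)]


def pairwise_distance_pearson(original: Embeddings, reduced: Embeddings) -> float:
    """Pearson correlation between original and reduced pairwise distances.

    Values near 1.0 suggest that relative distances were largely preserved.
    """
    xs = pairwise_distances(original)
    ys = pairwise_distances(reduced)
    if len(xs) < 2:
        raise ValueError("need at least three embeddings to correlate distances")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined when all distances are equal
    return cov / math.sqrt(var_x * var_y)


# Illustrative data: the third dimension is constant, so dropping it loses nothing.
original = {"a": [0.0, 0.0, 0.0], "b": [3.0, 4.0, 0.0], "c": [6.0, 0.0, 0.0]}
reduced = {datum_id: vector[:2] for datum_id, vector in original.items()}

print(check_structure(original, reduced, dims=2))
print(pairwise_distance_pearson(original, reduced))
```

A check like this can be run both with and without normalization to see how much each choice distorts the distances that downstream tasks such as clustering depend on.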

Integration into Acadia

Once your custom model is implemented, it can be integrated into Acadia’s data processing pipeline. Use it to reduce dimensions of embeddings in datasets, improving efficiency in downstream tasks such as clustering, visualization, or machine learning.

By creating custom dimension reduction models, you can fine-tune the preprocessing steps to better fit your analytical goals and data characteristics, leveraging Acadia's flexible architecture.
