Creating Custom Tagging Models in Acadia

Overview

In Acadia, tagging models automatically apply structured labels (tags) to datums within datasets. A model uses predefined logic or a machine learning algorithm to decide which topics are relevant to each datum. Acadia provides some built-in tagging models, and the framework also supports custom tagging models, so you can tailor the tagging logic to specific data characteristics or business requirements.

What is a Custom Tagging Model?

A custom tagging model in Acadia is a user-defined model that implements the Acadia tagging interface. It enables users to define their own logic for how data points in a dataset are tagged with topics. This flexibility is crucial for applications where standard tagging models do not suffice due to unique data types or complex tagging criteria.

Implementing a Custom Tagging Model

To create a custom tagging model, you must define a class that extends the TaggingModel abstract base class (ABC) provided by Acadia. This class must implement the generate_tag_tuples method, which is responsible for generating the tags for each datum in the dataset.

Requirements

  • Inherit from TaggingModel: Your custom model must inherit from the TaggingModel ABC.
  • Implement generate_tag_tuples: You need to provide a concrete implementation of the generate_tag_tuples method, which yields batches of tag tuples.
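
Both requirements follow from how Python abstract base classes behave. The sketch below uses a stand-in class named TaggingModelSketch, not Acadia's real TaggingModel, and assumes that generate_tag_tuples is declared as an abstract method. Under that assumption, a subclass that omits the method cannot be instantiated, while a subclass that implements it can.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, Iterator, List, Tuple


class TaggingModelSketch(ABC):
    """Stand-in for Acadia's TaggingModel (hypothetical, for illustration only)."""

    @abstractmethod
    def generate_tag_tuples(
        self, datums: Iterable[Any], topics: List[Any]
    ) -> Iterator[List[Tuple[Any, Any, Dict[str, Any]]]]:
        """Yield batches (lists) of (datum, topic, tag_content) tuples."""


class IncompleteModel(TaggingModelSketch):
    """Inherits from the base class but does not implement the method."""


class TagEverythingModel(TaggingModelSketch):
    """Tags every datum with every topic, one batch per datum."""

    def generate_tag_tuples(self, datums, topics):
        for datum in datums:
            yield [(datum, topic, {"source": "sketch"}) for topic in topics]


def can_instantiate(model_cls) -> bool:
    """Return True if the class can be instantiated without a TypeError."""
    try:
        model_cls()
    except TypeError:
        return False
    return True
```

Calling can_instantiate(IncompleteModel) returns False because Python refuses to instantiate a class that still has abstract methods, which is why the concrete implementation is mandatory.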

Understanding TagContentType, TagTupleType, and TagTupleBatchType

  • TagContentType: This is a dictionary type where each key-value pair represents metadata associated with the tag. The keys (TagContentKey) are strings that denote the attribute names, and the values (TagContentValue) can be any data type relevant to the tagging context.

  • TagTupleType: This tuple represents a single instance of a tag application, consisting of three elements: a Datum object, a Topic object, and a TagContentType dictionary containing the tag's metadata.

  • TagTupleBatchType: A list of TagTupleType instances. Each list is one batch of tags, produced each time generate_tag_tuples yields. Batching helps process large datasets efficiently by grouping tag assignments.
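
The three types nest inside one another, which can be easier to see in code. The sketch below is an approximation built from the descriptions above: DatumStub and TopicStub are hypothetical stand-ins for Acadia's Datum and Topic classes, their fields are invented for illustration, and the real alias definitions in acadia.types may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class DatumStub:
    """Hypothetical stand-in for acadia.database.schemas.Datum."""
    id: int
    text: str


@dataclass(frozen=True)
class TopicStub:
    """Hypothetical stand-in for acadia.database.schemas.Topic."""
    name: str


# Aliases mirroring the descriptions above; the real definitions may differ.
TagContentKey = str
TagContentValue = Any
TagContentType = Dict[TagContentKey, TagContentValue]
TagTupleType = Tuple[DatumStub, TopicStub, TagContentType]
TagTupleBatchType = List[TagTupleType]

# One tag application: the datum, the topic, and the tag's metadata.
tag_tuple: TagTupleType = (
    DatumStub(id=1, text="Invoice overdue"),
    TopicStub(name="billing"),
    {"confidence": 0.9, "matched_on": "invoice"},
)

# One batch: a list of tag tuples yielded together.
batch: TagTupleBatchType = [tag_tuple]
```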

Step-by-Step Guide

  1. Import Necessary Classes: Import the TaggingModel ABC from Acadia's tagging models module.

    from acadia.models.tagging_models.base import TaggingModel

  2. Define the Custom Model: Define your model by extending TaggingModel. Implement the required methods.

    from typing import Generator, List
    from acadia.database.schemas import Datum, Topic
    from acadia.types import TagTupleBatchType, TagContentType
    from sqlalchemy.orm import Query

    class MyCustomTaggingModel(TaggingModel):
        def generate_tag_tuples(
            self, datums: Query, topics: List[Topic]
        ) -> Generator[TagTupleBatchType, None, None]:
            # Yield one batch (a list of tag tuples) per datum.
            for datum in datums:
                batch: TagTupleBatchType = []
                for topic in topics:
                    # some_custom_logic is a placeholder for your own
                    # function that decides whether the tag applies.
                    if some_custom_logic(datum, topic):
                        tag_content: TagContentType = {"example_meta": "value"}
                        batch.append((datum, topic, tag_content))
                # Skip datums that produced no tags.
                if batch:
                    yield batch

  3. Integrate with Acadia: Once your custom model is defined, you can use it just like any built-in model.

    # Assumes acadia is already imported and dataset is an existing dataset.
    custom_model = MyCustomTaggingModel()
    acadia.tag.tag_datums(dataset, custom_model)

Best Practices

  • Testing: Thoroughly test your custom tagging model with different datasets to ensure it behaves as expected.
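  • Performance Optimization: Consider the efficiency of your tagging logic, especially with large datasets. A model that checks every datum against every topic does work proportional to the number of datums times the number of topics, so keep the per-pair check cheap.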
  • Documentation: Document your custom model's logic and configurations clearly for maintenance and future modifications.
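
One way to act on the testing and performance advice is to keep the tagging logic runnable against plain Python objects, so it can be checked without a database. The sketch below makes several assumptions: the keyword-matching rule is a hypothetical example, the SimpleNamespace objects stand in for Datum and Topic (the text and name attributes are illustrative), and the batch_size parameter is not part of Acadia's API. A real model would also inherit from TaggingModel.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, Iterator, List, Tuple

TagTuple = Tuple[Any, Any, Dict[str, Any]]


class KeywordTaggingSketch:
    """Hypothetical model: tags a datum when the topic name occurs in its text."""

    def __init__(self, batch_size: int = 2) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.batch_size = batch_size

    def generate_tag_tuples(
        self, datums: Iterable[Any], topics: List[Any]
    ) -> Iterator[List[TagTuple]]:
        batch: List[TagTuple] = []
        for datum in datums:
            text = datum.text.lower()
            for topic in topics:
                if topic.name.lower() in text:
                    batch.append((datum, topic, {"matched_on": topic.name}))
                    if len(batch) == self.batch_size:
                        yield batch
                        batch = []  # start a fresh list; do not reuse the yielded one
        if batch:
            yield batch  # flush the final, possibly smaller, batch


# Plain objects standing in for database rows.
datums = [
    SimpleNamespace(text="Billing error on invoice"),
    SimpleNamespace(text="Login page is down"),
    SimpleNamespace(text="Billing and login both failed"),
]
topics = [SimpleNamespace(name="billing"), SimpleNamespace(name="login")]

batches = list(KeywordTaggingSketch(batch_size=2).generate_tag_tuples(datums, topics))
batch_sizes = [len(b) for b in batches]
tagged_pairs = [(d.text, t.name) for b in batches for d, t, _ in b]
```

Accumulating tuples into fixed-size batches bounds how many tags are held in memory at once, and asserting on batch_sizes and tagged_pairs gives a quick regression check when the tagging rule changes.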

By creating custom tagging models, users can leverage the full potential of Acadia's tagging capabilities to meet specialized data tagging needs, enhancing the accuracy and relevance of data categorization.
