Creating Topic Trees

Overview

In Acadia, topic trees organize and categorize the data in a dataset through hierarchical relationships between topics. Each tree is a nested dictionary: a topic holds its subtopics, which can in turn hold their own. This makes it possible to break a dataset down by category at whatever level of detail the analysis needs.

Defining a Topic Tree Dictionary

A topic tree dictionary describes a single topic and, nested inside it, all of its subtopics.

Format of a Topic Tree Dictionary

Each topic in the dictionary is defined with three keys:

  • name: The name of the topic, serving as a unique identifier.
  • description: A brief explanation of what the topic covers.
  • children: A list of dictionaries, each representing a subtopic in the same format. Leaf topics can omit this key, as the subtopics in the example below do.
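
The format above can be checked before a tree is handed to Acadia. The following is a minimal sketch in plain Python, not part of the Acadia API: validate_topic is a hypothetical helper, and it assumes that name and description must be non-empty strings and that children may be omitted on leaf topics.

```python
from typing import Any


def validate_topic(topic: dict[str, Any], path: str = "") -> list[str]:
    """Return a list of problems found in a topic dictionary (empty if none)."""
    problems: list[str] = []
    name = topic.get("name")
    here = f"{path}/{name}" if path else str(name)

    # Assumption: "name" and "description" must be non-empty strings.
    for key in ("name", "description"):
        value = topic.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{here}: '{key}' must be a non-empty string")

    # Assumption: "children" may be omitted on leaf topics.
    children = topic.get("children", [])
    if not isinstance(children, list):
        problems.append(f"{here}: 'children' must be a list")
        return problems

    # Check every subtopic recursively, extending the path as we descend.
    for child in children:
        if isinstance(child, dict):
            problems.extend(validate_topic(child, here))
        else:
            problems.append(f"{here}: each child must be a dictionary")
    return problems
```

An empty result means the topic and all of its subtopics follow the format. Each problem is prefixed with the path of names leading to the offending topic.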

Example of a Topic Tree Dictionary

The following list defines two root topics, each with two subtopics:

topic_tree_example = [
    {
        "name": "Topic 1",
        "description": "This is the first topic",
        "children": [
            {
                "name": "Subtopic 1",
                "description": "This is the first subtopic",
            },
            {
                "name": "Subtopic 2",
                "description": "This is the second subtopic",
            },
        ],
    },
    {
        "name": "Topic 2",
        "description": "This is the second topic",
        "children": [
            {
                "name": "Subtopic 3",
                "description": "This is the third subtopic",
            },
            {
                "name": "Subtopic 4",
                "description": "This is the fourth subtopic",
            },
        ],
    }
]

Each root topic is a general category, and each entry under children narrows it to a more specific subtopic.
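
To make the hierarchy concrete, a tree in this format can be walked recursively. The sketch below is plain Python and does not use Acadia. iter_topic_paths is a hypothetical helper that yields the path of names from a root topic down to each topic, and the input is a shortened version of the example above.

```python
from typing import Any, Iterator


def iter_topic_paths(
    topics: list[dict[str, Any]], prefix: tuple[str, ...] = ()
) -> Iterator[tuple[str, ...]]:
    """Yield the path of names from a root topic to every topic, depth-first."""
    for topic in topics:
        path = prefix + (topic["name"],)
        yield path
        # Leaf topics have no "children" key, so fall back to an empty list.
        yield from iter_topic_paths(topic.get("children", []), path)


trimmed_example = [
    {
        "name": "Topic 1",
        "description": "This is the first topic",
        "children": [
            {"name": "Subtopic 1", "description": "This is the first subtopic"},
            {"name": "Subtopic 2", "description": "This is the second subtopic"},
        ],
    },
    {"name": "Topic 2", "description": "This is the second topic"},
]

for path in iter_topic_paths(trimmed_example):
    print(" > ".join(path))
```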

Creating and Utilizing Topic Trees

There are two ways to create topic trees in Acadia:

  1. Manually Defining Topic Trees: Users can directly write their own topic tree dictionaries, structuring them according to the needs of their dataset and analysis. This manual method provides complete control over the categorization and hierarchical structure of the data.

  2. Using a Topic Generator: For dynamic topic tree creation, users can employ a TopicGeneratorModel. This model analyzes the dataset and automatically generates a list of root topics based on the data's inherent characteristics and relationships.
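
For the manual route, writing nested dictionaries by hand gets repetitive as a tree grows. One option is a small constructor function. The sketch below is plain Python, and make_topic is a hypothetical helper rather than an Acadia function. It assumes leaf topics can omit the children key.

```python
from typing import Any, Optional


def make_topic(
    name: str,
    description: str,
    children: Optional[list[dict[str, Any]]] = None,
) -> dict[str, Any]:
    """Build one topic dictionary in the name/description/children format."""
    topic: dict[str, Any] = {"name": name, "description": description}
    # Only add "children" when there are subtopics, so leaves stay minimal.
    if children:
        topic["children"] = list(children)
    return topic


root_topics = [
    make_topic(
        "Topic 1",
        "This is the first topic",
        [
            make_topic("Subtopic 1", "This is the first subtopic"),
            make_topic("Subtopic 2", "This is the second subtopic"),
        ],
    ),
]
```

The resulting root_topics list has the same shape as a hand-written topic tree dictionary.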

Loading and Using Topic Tree Dictionaries

  • load_example_root_topics(): This function loads a set of predefined root topics, demonstrating the structure and utility of topic trees.
  • generate_topics(dataset, topic_generator_model): Generates a list of root topic trees from the dataset using the specified model.

Adding Topic Trees to Datasets

Once defined or generated, topic trees are integrated into datasets as root topics using the add_root_topics function. Each entry in the list passed to this function represents a separate root node in the topic tree, potentially branching into its own set of subtopics.

import acadia

# `dataset` is assumed to be an existing Acadia dataset.
# Load the predefined example topics; a manually written or generated list works the same way.
list_of_example_root_topics = acadia.topic.load_example_root_topics()
topic_tree = acadia.topic.add_root_topics(dataset, list_of_example_root_topics)
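
Because each list entry becomes its own root node and name serves as a unique identifier, it can be useful to check for duplicate root names when combining a manually written list with a generated one. The sketch below is plain Python. combine_root_topics is a hypothetical helper, not an Acadia function, and this page does not say whether Acadia enforces uniqueness itself.

```python
from typing import Any


def combine_root_topics(*topic_lists: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Concatenate root-topic lists, rejecting duplicate root names."""
    combined: list[dict[str, Any]] = []
    seen: set[str] = set()
    for topics in topic_lists:
        for topic in topics:
            name = topic["name"]
            # Assumption: root names must be unique across the combined list.
            if name in seen:
                raise ValueError(f"duplicate root topic name: {name!r}")
            seen.add(name)
            combined.append(topic)
    return combined
```

The combined list can then be passed to add_root_topics in place of a single list.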

Topic trees segment a dataset into named categories. This supports targeted analysis and data wrangling, and is most useful for datasets with diverse and extensive content.