Practical AI for Everyday Decision‑Making

Introduction

AI doesn’t have to be opaque, mysterious, over‑engineered, or powered by massive deep‑learning models to be effective. Some of the most useful systems are built by thoughtfully combining classical machine‑learning techniques with explicit, human‑readable rules. This edition presents a Python program called “Budget Categorizer & Spending Insights AI” as a tangible example of this approach in action. The application takes everyday financial transactions and transforms them into clear, actionable insights, demonstrating how hybrid AI systems can support real decision‑making while remaining interpretable and maintainable.

The Problem: Data Without Insight

Most people generate dozens—sometimes hundreds—of financial transactions each month. Individually, these transactions are easy to understand. Collectively, they become noise. Patterns are hard to see, anomalies go unnoticed, and spending habits remain opaque.

The goal of this project was simple but meaningful:
turn raw transaction data into understandable categories, highlight unusual behavior, and provide concrete guidance a human can act on.

A Hybrid AI Approach That Works

Instead of relying on deep learning, the system deliberately uses a hybrid AI architecture, blending machine learning with expert‑system logic:

Machine Learning

  • TF‑IDF converts merchant names and transaction descriptions into numerical features.
  • Logistic Regression predicts spending categories (e.g., Groceries, Dining, Utilities).
  • KMeans clustering uncovers latent spending patterns that can inform future enhancements.
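
As a rough sketch of how these three pieces fit together (the merchant strings and labels below are illustrative, not the application's actual data):

```python
# Minimal sketch: TF-IDF features feeding a Logistic Regression
# classifier and a KMeans clusterer on toy transaction text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

texts = ["Safeway groceries", "Starbucks coffee", "Shell fuel", "Chipotle lunch"]
labels = ["Groceries", "Dining", "Transportation", "Dining"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)  # sparse TF-IDF feature matrix

clf = LogisticRegression(max_iter=200).fit(X, labels)
print(clf.predict(vectorizer.transform(["Starbucks latte"])))  # likely 'Dining'

km = KMeans(n_clusters=2, random_state=42, n_init=10).fit(X)
print(km.labels_)  # one cluster id per transaction
```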

Expert System Logic

  • Explicit merchant‑to‑category rules encode domain knowledge (e.g., “Starbucks” → Dining).
  • Threshold‑based rules flag anomalies, such as unusually high dining expenses.
  • The system generates human‑readable explanations, showing whether a decision came from a rule, a model, or both.

When both rules and model predictions are available, the system prioritizes symbolic knowledge—ensuring predictable and explainable outcomes.

Why the Architecture Matters

The real strength of this project isn’t any single algorithm; it’s how the pieces are orchestrated. The application follows a clear, goal‑driven execution flow:

    Load data → build features → cluster → classify → apply rules → generate insights → report

This mirrors concepts from symbolic planning, where each step transforms the system’s knowledge state and contributes to a coherent end goal. The result is an AI pipeline that is easy to reason about, debug, and extend.
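
One way to sketch that goal-driven flow is as a chain of small stage functions threaded through a shared state dictionary (the stage names and state keys here are illustrative):

```python
# Illustrative pipeline: each stage takes the accumulated state and
# returns an updated one, mirroring a symbolic-planning knowledge state.
def load_data(state):
    return {**state, "rows": [("Starbucks", 7.50)]}

def build_features(state):
    return {**state, "features": [merchant.lower() for merchant, _ in state["rows"]]}

def apply_rules(state):
    return {**state, "flags": [amount > 500 for _, amount in state["rows"]]}

state = {}
for stage in (load_data, build_features, apply_rules):
    state = stage(state)  # each step enriches the knowledge state

print(state["flags"])  # [False]
```

Because each stage only adds to the state, any intermediate step can be inspected or swapped out without disturbing the rest of the chain.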

Interpretability as a First-Class Feature

A key design choice was to favor interpretability over complexity.

For small, structured text data like merchant names and transaction descriptions, classical machine‑learning methods are:

  • Sufficiently accurate
  • Easier to explain
  • Less costly to maintain

By avoiding unnecessary deep learning, the system remains transparent. Users don’t just see what was flagged—they see why. This is critical for trust, especially in decision‑support systems.

Why This Matters for Interoperability

From an interoperability and platform‑engineering perspective, this project reinforces an important lesson:

    Effective AI is about choosing the right techniques—not the most fashionable ones.

By combining symbolic reasoning with statistical learning, the system achieves:

  • Clear boundaries between logic and learned behavior
  • Explainable outcomes suitable for enterprise contexts
  • A modular design that can evolve without re‑architecting everything

This is a pattern worth repeating across internal tools, platforms, and AI‑enabled services.

Code Listing

# File: main.py
# Written by: Angel Hernandez
# This project is a Budget Categorizer & Spending Insights AI application

import os
import sys
import subprocess
import textwrap

class DependencyChecker:
    @staticmethod
    def ensure_package(package_name):
        # Some packages install under a different import name
        # (e.g. scikit-learn imports as sklearn).
        import_name = {"scikit-learn": "sklearn"}.get(package_name, package_name)
        try:
            __import__(import_name)
        except ImportError:
            print(f"Installing missing package: {package_name}...")
            subprocess.check_call([sys.executable, "-m", "pip", "install", package_name])
            print(f"Package '{package_name}' was installed successfully.")

class SyntheticDataFactory:
    @staticmethod
    def get_data():
        return [
            ("2026-02-01", "Safeway", "Groceries and household items", 85.23, "Groceries"),
            ("2026-02-02", "Starbucks", "Coffee and snack", 7.50, "Dining"),
            ("2026-02-03", "Shell", "Gas station fuel", 52.10, "Transportation"),
            ("2026-02-04", "Netflix", "Monthly subscription", 15.99, "Entertainment"),
            ("2026-02-05", "Amazon", "Electronics purchase", 220.00, "Shopping"),
            ("2026-02-06", "PUD", "Electric utility bill", 130.45, "Utilities"),
            ("2026-02-07", "Safeway", "Groceries", 95.10, "Groceries"),
            ("2026-02-08", "Chipotle", "Lunch burrito", 13.25, "Dining"),
            ("2026-02-09", "Lyft", "Ride to airport", 48.75, "Transportation"),
            ("2026-02-10", "Disney+", "Streaming subscription", 9.99, "Entertainment"),
            ("2026-02-11", "Home Depot", "Home improvement supplies", 180.00, "Shopping"),
            ("2026-02-12", "City Water", "Water utility bill", 60.30, "Utilities"),
            ("2026-02-13", "Safeway", "Groceries and snacks", 72.40, "Groceries"),
            ("2026-02-14", "Local Bistro", "Dinner with family", 120.00, "Dining"),
            ("2026-02-15", "Shell", "Snacks and drinks", 40.00, "Transportation"),
            ("2026-02-16", "Spotify", "Music subscription", 10.99, "Entertainment"),
            ("2026-02-17", "Target", "Clothing and toys", 150.00, "Shopping"),
            ("2026-02-18", "Gas & Power Co", "Gas utility bill", 90.00, "Utilities"),
            ("2026-02-19", "Fancy Steakhouse", "Anniversary dinner", 550.00, "Dining"),
            ("2026-02-20", "Seafood Restaurant", "Celebration dinner", 620.00, "Dining"),
        ]

class RuleEngine:
    def __init__(self):
        self.__dining_anomaly_threshold = 500.0
        self.__merchant_category_map = {
            "safeway": "Groceries",
            "whole foods": "Groceries",
            "starbucks": "Dining",
            "chipotle": "Dining",
            "shell": "Transportation",
            "lyft": "Transportation",
            "uber": "Transportation",
            "netflix": "Entertainment",
            "disney+": "Entertainment",
            "spotify": "Entertainment",
            "pud": "Utilities",
            "city water": "Utilities",
            "gas & power co": "Utilities",
            "home depot": "Shopping",
            "target": "Shopping",
            "amazon": "Shopping",
        }

    def __infer_category_from_merchant(self, merchant: str):
        m = merchant.strip().lower()
        for key, cat in self.__merchant_category_map.items():
            if key in m:
                return cat
        return None

    def apply_rules(self, row, model_category: str | None):
        merchant = str(row.get("merchant", ""))
        amount = float(row.get("amount", 0.0))

        explanation_parts = []

        rule_cat = self.__infer_category_from_merchant(merchant)
        if rule_cat:
            explanation_parts.append(f"Rule: Merchant '{merchant}' implies category '{rule_cat}'.")

        anomaly_flag = False
        final_cat = rule_cat or model_category or "Uncategorized"

        if final_cat == "Dining" and amount > self.__dining_anomaly_threshold:
            anomaly_flag = True
            explanation_parts.append(
                f"Rule: Amount {amount:.2f} > {self.__dining_anomaly_threshold:.2f} → Flag as unusual."
            )

        if model_category and not rule_cat:
            explanation_parts.append(f"Model: Logistic Regression predicted '{model_category}'.")

        explanation = " ".join(explanation_parts) if explanation_parts else "No rules fired."

        return {
            "rule_category": rule_cat,
            "final_category": final_cat,
            "is_anomaly": anomaly_flag,
            "explanation": explanation,
        }

class BudgetCategorizer:
    def __init__(self):
        import numpy as np
        import pandas as pd
        from sklearn.cluster import KMeans
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.preprocessing import LabelEncoder
        from sklearn.linear_model import LogisticRegression

        self.__np = np
        self.__pd = pd
        self.__kmeans = KMeans
        self.__label_encoder = LabelEncoder
        self.__tfidf_vectorizer = TfidfVectorizer
        self.__logistic_regression = LogisticRegression

    def __load_transactions(self, path: str | None = None):
        if path is not None and os.path.exists(path):
            retval = self.__pd.read_csv(path)
        else:
            retval = self.__pd.DataFrame(
                SyntheticDataFactory.get_data(),
                columns=["date", "merchant", "description", "amount", "category"],
            )
        retval.columns = [c.strip().lower() for c in retval.columns]

        retval["merchant"] = retval["merchant"].fillna("")
        retval["description"] = retval["description"].fillna("")
        retval["amount"] = retval["amount"].fillna(0.0).astype(float)

        if "category" not in retval.columns:
            retval["category"] = self.__np.nan

        return retval

    def __build_text_features(self, df):
        texts = (df["merchant"].astype(str) + " " + df["description"].astype(str)).tolist()
        vectorizer = self.__tfidf_vectorizer(stop_words="english", max_features=500)
        x_text = vectorizer.fit_transform(texts)
        return vectorizer, x_text

    def __perform_clustering(self, x_text, n_clusters: int = 4):
        kmeans = self.__kmeans(n_clusters=n_clusters, random_state=42, n_init=10)
        cluster_labels = kmeans.fit_predict(x_text)
        return kmeans, cluster_labels

    def __train_classifier(self, df, x_text):
        mask = df["category"].notna().to_numpy()
        x_labeled = x_text[mask]
        y_labeled = df.loc[mask, "category"].astype(str)
        label_encoder = self.__label_encoder()
        y_encoded = label_encoder.fit_transform(y_labeled)
        clf = self.__logistic_regression(max_iter=300, class_weight="balanced")
        clf.fit(x_labeled, y_encoded)
        return clf, label_encoder

    @staticmethod
    def __predict_categories(df, vectorizer, clf, label_encoder):
        texts = (df["merchant"].astype(str) + " " + df["description"].astype(str)).tolist()
        x_text = vectorizer.transform(texts)
        y_pred = clf.predict(x_text)
        return label_encoder.inverse_transform(y_pred)

    @staticmethod
    def __heuristic_select_category(rule_cat, model_cat):
        return rule_cat or model_cat or "Uncategorized"
    @staticmethod
    def __generate_insights(df):
        retval = {}
        recommendations = []
        cat_totals = df.groupby("final_category")["amount"].sum().sort_values(ascending=False)
        retval["category_totals"] = cat_totals
        retval["anomalies"] = df[df["is_anomaly"]]
        total_spend = df["amount"].sum()

        for cat, total in cat_totals.items():
            share = total / total_spend if total_spend > 0 else 0.0
            if share > 0.30 and cat not in ["Utilities"]:
                plan_steps = [
                    f"Review recurring subscriptions or habits in '{cat}'.",
                    f"Set a monthly budget cap for '{cat}' at {0.8 * total:.2f}.",
                    f"Track '{cat}' transactions weekly and pause non-essential purchases.",
                ]
                recommendations.append({"category": cat, "share": share, "plan_steps": plan_steps})
        retval["recommendations"] = recommendations
        return retval

    @staticmethod
    def __print_report(df, insights, records_to_show):
        wrapper = textwrap.TextWrapper(width=80)
        print("\n================ Budget Categorizer & Spending Insights Report ================\n")
        print("Sample of categorized transactions:")
        # Format amounts on a display copy so the caller's DataFrame keeps floats.
        display_df = df.copy()
        display_df["amount"] = display_df["amount"].apply(lambda x: f"${x:,.2f}")
        print(display_df[["date", "merchant", "description", "amount", "final_category", "is_anomaly"]].head(records_to_show))
        print()
        print("Category totals:")
        for cat, total in insights["category_totals"].items():
            print(f" - {cat}: ${total:.2f}")
        print()

        if len(insights["anomalies"]) > 0:
            print("Unusual expenses flagged:")
            for _, row in insights["anomalies"].iterrows():
                print(
                    f" - {row['date']} | {row['merchant']} | {row['description']} | "
                    f"${row['amount']:.2f} | Category: {row['final_category']}"
                )
            print()
        else:
            print("No anomalies detected.\n")

        if insights["recommendations"]:
            print("High-impact categories and suggested budgeting plans:")
            for rec in insights["recommendations"]:
                print(f"\nCategory '{rec['category']}' accounts for {rec['share'] * 100:.1f}% of spending.")
                print("\nSuggested plan:\n")
                for step in rec["plan_steps"]:
                    for line in wrapper.wrap(f"- {step}"):
                        print(" " + line)
            print()
        else:
            print("No categories exceed the planning threshold.\n")

        print("=========================================================================\n")

    def run_orchestration(self):
        explanations = []
        anomaly_flags = []
        final_categories = []
        csv_path = sys.argv[1] if len(sys.argv) > 1 else None
        records_to_show = input("Enter the number of records to show (default is 20): ").strip()
        records_to_show = int(records_to_show) if records_to_show else 20

        print("Loading transactions...")
        df = self.__load_transactions(csv_path)

        print("Building text features (TF-IDF)...")
        vectorizer, x_text = self.__build_text_features(df)

        print("Running KMeans clustering...")
        _, cluster_labels = self.__perform_clustering(x_text, n_clusters=4)
        df["cluster"] = cluster_labels

        print("Training Logistic Regression classifier...")
        clf, label_encoder = self.__train_classifier(df, x_text)

        print("Predicting categories...")
        df["model_category"] = self.__predict_categories(df, vectorizer, clf, label_encoder)

        print("Applying rule-based expert system...")
        rule_engine = RuleEngine()

        for _, row in df.iterrows():
            model_cat = row["model_category"]
            rule_result = rule_engine.apply_rules(row, model_cat)
            final_categories.append(self.__heuristic_select_category(rule_result["rule_category"], model_cat))
            anomaly_flags.append(rule_result["is_anomaly"])
            explanations.append(rule_result["explanation"])

        df["final_category"] = final_categories
        df["is_anomaly"] = anomaly_flags
        df["explanation"] = explanations

        print("Generating insights...")
        insights = self.__generate_insights(df)
        self.__print_report(df, insights, records_to_show)

class TestCaseRunner:
    @staticmethod
    def run_test():
        budget_categorizer = BudgetCategorizer()
        budget_categorizer.run_orchestration()

def clear_screen():
    command = 'cls' if os.name == 'nt' else 'clear'
    os.system(command)

def main():
    try:
        dependencies = ['numpy', 'pandas', 'scikit-learn']
        for d in dependencies:
            DependencyChecker.ensure_package(d)
        clear_screen()
        print('*** Budget Categorizer & Spending Insights AI Application ***\n')
        TestCaseRunner.run_test()
    except Exception as e:
        print(e)

if __name__ == '__main__':
    main()

Program’s Output

Conclusion

Not every problem needs deep learning. Many problems need clarity. Hybrid AI systems—where rules, models, and structured workflows coexist—remain highly relevant. They offer a pragmatic path to building AI that people can understand, trust, and actually use. That’s not just good AI engineering. It’s good system design.

