Reddit Persona Analytics.

Social NLP Analysis

An end-to-end ETL pipeline that scrapes Reddit activity through the PRAW API, processes 100+ posts per user, and generates structured psychological persona reports with Llama3-70B served through the Groq API. The public repository has 1 external fork.

Year: 2025

Role: ML Engineer

Type: NLP / ETL Pipeline

Status: Public · 1 Fork

[01] Overview

Turning Reddit signals into structured human psychology.

The pipeline applies multi-library NLP preprocessing (NLTK, TextBlob, vaderSentiment, spaCy) for sentiment scoring, keyword extraction, and Big Five personality trait estimation, with citation tracking that links each estimate back to its source posts. It is fully modular, with separate scraper, processor, analyzer, and output modules.
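As a rough illustration of the sentiment scoring step, the sketch below merges a VADER compound score and a TextBlob polarity/subjectivity pair into a single record. The `SentimentScore` class, the `combine_sentiment` function, the simple averaging rule, and the 0.05 cutoff are all assumptions made for this example; the project may weight or label the signals differently. The library calls are mentioned only in the docstring so the block runs on the standard library alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentimentScore:
    polarity: float      # -1.0 (negative) .. 1.0 (positive)
    subjectivity: float  # 0.0 (objective) .. 1.0 (subjective)
    label: str           # "positive", "neutral", or "negative"


def combine_sentiment(vader_compound: float,
                      textblob_polarity: float,
                      textblob_subjectivity: float,
                      threshold: float = 0.05) -> SentimentScore:
    """Average the two polarity signals and attach a coarse label.

    In a real run the inputs would typically come from
    SentimentIntensityAnalyzer().polarity_scores(text)["compound"]
    (vaderSentiment) and TextBlob(text).sentiment (TextBlob).
    Here they are passed in as plain floats.
    """
    polarity = (vader_compound + textblob_polarity) / 2.0
    if polarity >= threshold:
        label = "positive"
    elif polarity <= -threshold:
        label = "negative"
    else:
        label = "neutral"
    return SentimentScore(round(polarity, 4), textblob_subjectivity, label)
```

Keeping the combination rule in one small function makes it easy to swap the average for a weighted blend if one scorer turns out to be more reliable on Reddit-style text.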

An optional Streamlit UI enables non-technical stakeholders to explore results without code.

ETL pipeline scraping Reddit via PRAW API into PostgreSQL
VADER Sentiment + TextBlob for multi-dimensional emotion scoring
Big Five personality trait estimation with citation tracking
Groq LLM (Llama3-70B) for structured persona report generation
Streamlit dashboard for interactive persona visualization
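One way to picture "trait estimation with citation tracking" is a structure where every trait score carries the permalinks of the posts that contributed to it. The sketch below is a toy version under loud assumptions: the `TRAIT_CUES` word lists are invented for illustration and are not a validated psychometric lexicon, and `TraitEvidence` and `estimate_traits` are hypothetical names, not the project's actual analyzer.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Toy cue words per trait: purely illustrative, NOT a validated lexicon.
TRAIT_CUES = {
    "openness": {"curious", "imagine", "art"},
    "conscientiousness": {"plan", "schedule", "deadline"},
    "extraversion": {"party", "friends", "meetup"},
    "agreeableness": {"thanks", "appreciate", "sorry"},
    "neuroticism": {"worried", "anxious", "stressed"},
}


@dataclass
class TraitEvidence:
    hits: int = 0
    # Permalinks of the posts that supported this trait.
    citations: list = field(default_factory=list)


def estimate_traits(posts):
    """Count cue-word matches per trait and record which posts matched.

    posts: iterable of dicts with 'permalink' and 'text' keys.
    Returns a dict mapping trait name -> TraitEvidence; traits with no
    matching post are omitted.
    """
    evidence = defaultdict(TraitEvidence)
    for post in posts:
        words = set(post["text"].lower().split())
        for trait, cues in TRAIT_CUES.items():
            matched = words & cues
            if matched:
                evidence[trait].hits += len(matched)
                evidence[trait].citations.append(post["permalink"])
    return dict(evidence)
```

Because each `TraitEvidence` keeps its own citation list, a downstream report can quote the exact posts behind a claim instead of presenting an unexplained score.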

Python · Groq API · PRAW · NLTK · SpaCy · TextBlob · PostgreSQL · Streamlit

[ DATA EXTRACTION ]
└── PRAW API ──► Scrape Posts + Comments
        │
        ▼
[ ETL LAYER ]
├── Clean & Tokenize (NLTK / SpaCy)
├── Sentiment Score (VADER + TextBlob)
├── Keyword Extraction
└── Store JSON → PostgreSQL
        │
        ▼
[ NLP ANALYSIS ]
├── Big Five Personality Estimation
├── Citation Tracking per Trait
└── Behavioral Pattern Recognition
        │
        ▼
[ LLM REPORT GENERATION ]
└── Groq API (Llama3-70B)
    ── Prompt Engineering
    ── Structured JSON Persona Output
        │
        ▼
[ PRESENTATION ]
└── Streamlit UI Dashboard (Interactive)
    ── Persona visualization
    ── No-code exploration
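The stage flow in the diagram can be sketched as a chain of injected callables that ends by validating the model's structured JSON reply. Everything named here is an assumption for illustration: `REQUIRED_KEYS` is a hypothetical persona schema, `build_prompt`, `parse_persona`, and `run_pipeline` are invented helpers, and the actual Reddit scraping, PostgreSQL storage, and Groq request are deliberately left out so the example stays self-contained.

```python
import json

# Hypothetical persona schema; the real report format may differ.
REQUIRED_KEYS = {"username", "summary", "traits", "citations"}


def build_prompt(username, analysis):
    """Turn the analysis dict into an instruction asking for JSON output."""
    return (
        f"Write a persona for Reddit user {username} as a JSON object with "
        f"keys {sorted(REQUIRED_KEYS)}. Base it only on this analysis:\n"
        f"{json.dumps(analysis, sort_keys=True)}"
    )


def parse_persona(raw):
    """Parse the model's reply and reject it if required keys are missing."""
    persona = json.loads(raw)
    if not isinstance(persona, dict):
        raise ValueError("persona reply must be a JSON object")
    missing = REQUIRED_KEYS - persona.keys()
    if missing:
        raise ValueError(f"persona is missing keys: {sorted(missing)}")
    return persona


def run_pipeline(username, scrape, process, analyze, generate):
    """Chain the stages; each stage is passed in as a plain callable.

    scrape(username) -> posts, process(posts) -> cleaned,
    analyze(cleaned) -> analysis dict, generate(prompt) -> raw JSON text.
    """
    posts = scrape(username)
    cleaned = process(posts)
    analysis = analyze(cleaned)
    raw = generate(build_prompt(username, analysis))
    return parse_persona(raw)
```

Passing the stages in as arguments mirrors the modular layout and lets each one be replaced with a stub in tests, for example a `generate` function that returns a canned JSON string instead of calling an LLM.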
