Turning Reddit signals into structured persona profiles.
Applied multi-library NLP preprocessing (NLTK, TextBlob, vaderSentiment, spaCy) for sentiment scoring, keyword extraction, and Big Five personality trait estimation with citation tracking. The pipeline is modular, with separate scraper, processor, analyzer, and output modules.
An optional Streamlit UI enables non-technical stakeholders to explore results without code.
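
The scraper / processor / analyzer / output split described above could look roughly like the sketch below. This is an illustration only, not the project's actual code: the names `Post`, `Analysis`, `scrape`, `process`, `analyze`, `render`, and `run_pipeline` are hypothetical, the scraper takes pre-fetched dicts instead of calling PRAW, and the "keyword" rule is a placeholder for the real NLP stages.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Post:
    """A raw Reddit item as a scraper stage might emit it (hypothetical shape)."""
    post_id: str
    author: str
    text: str


@dataclass
class Analysis:
    """Analyzer output for one post; the fields are illustrative only."""
    post_id: str
    tokens: List[str]
    keyword_count: int


def scrape(raw_items: Iterable[dict]) -> List[Post]:
    # Stand-in for the PRAW-backed scraper: accepts already-fetched dicts.
    return [Post(r["id"], r["author"], r["text"]) for r in raw_items]


def process(post: Post) -> List[str]:
    # Stand-in for NLP preprocessing: split on whitespace, strip punctuation, lowercase.
    stripped = (t.strip(".,!?") for t in post.text.split())
    return [t.lower() for t in stripped if t]


def analyze(post: Post, tokens: List[str]) -> Analysis:
    # Placeholder analyzer: treats tokens longer than 4 characters as "keywords".
    return Analysis(post.post_id, tokens, sum(len(t) > 4 for t in tokens))


def render(results: List[Analysis]) -> str:
    # Output stage: one plain-text summary line per post.
    return "\n".join(f"{r.post_id}: {r.keyword_count} keyword(s)" for r in results)


def run_pipeline(raw_items: Iterable[dict]) -> str:
    # Each stage only depends on the previous stage's return type,
    # so any one of them can be swapped out independently.
    posts = scrape(raw_items)
    results = [analyze(p, process(p)) for p in posts]
    return render(results)
```

Because each stage is a plain function over simple data types, the analyzer or output stage can be replaced (for example by a Streamlit view) without touching the scraper.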
- ETL pipeline that pulls Reddit data via PRAW and loads it into PostgreSQL
- VADER + TextBlob for multi-dimensional sentiment scoring (compound, polarity, subjectivity)
- Big Five personality trait estimation with citation tracking
- Groq LLM (Llama 3 70B) for structured persona report generation
- Streamlit dashboard for interactive persona visualization
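
As a rough illustration of the VADER + TextBlob scoring step, the sketch below merges both libraries' outputs into one record. It assumes the `vaderSentiment` and `textblob` packages are installed. `SentimentRecord`, `combine_scores`, `score_text`, and the `consensus` / `agree` fields are hypothetical names and combination rules chosen for this example, not necessarily what the project does.

```python
from dataclasses import dataclass


@dataclass
class SentimentRecord:
    compound: float      # VADER compound score, in [-1, 1]
    polarity: float      # TextBlob polarity, in [-1, 1]
    subjectivity: float  # TextBlob subjectivity, in [0, 1]
    consensus: float     # hypothetical: mean of compound and polarity
    agree: bool          # hypothetical: True when both scores fall on the same side of zero


def combine_scores(vader_scores: dict, polarity: float, subjectivity: float) -> SentimentRecord:
    # Pure function, so the merging rule can be tested without the NLP libraries.
    compound = vader_scores["compound"]
    return SentimentRecord(
        compound=compound,
        polarity=polarity,
        subjectivity=subjectivity,
        consensus=(compound + polarity) / 2,
        agree=(compound >= 0) == (polarity >= 0),
    )


def score_text(text: str) -> SentimentRecord:
    # Imported lazily so combine_scores stays usable without these dependencies.
    from textblob import TextBlob
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    vader_scores = SentimentIntensityAnalyzer().polarity_scores(text)
    sentiment = TextBlob(text).sentiment
    return combine_scores(vader_scores, sentiment.polarity, sentiment.subjectivity)
```

Calling `score_text("I love this subreddit")` would return one `SentimentRecord` per post. The `agree` flag is one simple way to surface posts where the two scorers disagree and may deserve manual review.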