[Python/AI] Hugging Face Sentiment Analysis Web Application: Code Analysis - Part 2

code2772 2024. 11. 17. 14:11


0. Full Code

https://hunseop2772.tistory.com/373

[Python/AI] Building a Sentiment Analysis Web Application with Hugging Face and Streamlit - Part 1: Code


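1. Project Architecture

1.1 Overall Structure

sentiment_analysis/
├── src/
│   ├── analyzer.py     # core sentiment analysis logic
│   ├── utils.py        # utility functions
│   └── app.py          # Streamlit web interface
└── requirements.txt    # dependency management

This layout follows the Separation of Concerns principle:

analyzer.py: core business logic (model loading and inference)
utils.py: reusable utility functions (data shaping and charts)
app.py: presentation layer (Streamlit UI)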

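2. analyzer.py Walkthrough

2.1 The SentimentAnalyzer Class

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

class SentimentAnalyzer:
    def __init__(self):
        # Multilingual BERT fine-tuned to predict a 1-5 star rating
        self.model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
        self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)
        self.model = AutoModelForSequenceClassification.from_pretrained(
            self.model_name
        )

Key points:

Model initialization
- AutoTokenizer: converts text into the token IDs the model expects
- AutoModelForSequenceClassification: loads the pretrained BERT model together with its classification head

GPU usage

self.device = "cuda" if torch.cuda.is_available() else "cpu"
self.model = self.model.to(self.device)

torch.cuda.is_available(): detects at runtime whether a CUDA GPU can be used
to(self.device): moves the model weights to the selected device (GPU memory when CUDA is available, otherwise the model stays on the CPU)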

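2.2 The Text Analysis Method

def analyze(self, text: str) -> Dict[str, Any]:
    try:
        inputs = self.tokenizer(
            text,
            return_tensors="pt",
            truncation=True,
            max_length=512
        ).to(self.device)  # inputs must live on the same device as the model
        # (excerpt: inference and result handling continue inside this try block)

Analysis process:

Tokenization
- Splits the text into tokens
- truncation=True: input longer than max_length is cut off automatically
- max_length=512: the maximum input length of the BERT model, counted in tokens

Prediction

with torch.no_grad():
    outputs = self.model(**inputs)
    scores = torch.softmax(outputs.logits, dim=1)

torch.no_grad(): disables gradient tracking, which reduces memory use and computation during inference
softmax: converts the output logits into a probability distribution over the classes

Result handling

prediction = torch.argmax(scores).item() + 1
confidence = torch.max(scores).item()

argmax: selects the index of the most probable class; adding 1 maps the 0-based index to the model's 1-5 star rating
max: the probability of that class, used as the confidence value
item(): converts a single-element tensor into a Python scalar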
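
The arithmetic behind the softmax, argmax, and +1 steps can be checked without loading a model. The following is a minimal pure-Python sketch, not the post's implementation: the helper names softmax and to_prediction and the example logits are made up for illustration, and the five-class layout is assumed from the 1-5 star model described above.

```python
import math
from typing import Dict, List, Union


def softmax(logits: List[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_prediction(logits: List[float]) -> Dict[str, Union[int, float]]:
    """Mirror argmax(...) + 1 and max(...) from the analyze method."""
    probs = softmax(logits)
    best_index = max(range(len(probs)), key=probs.__getitem__)
    return {
        "stars": best_index + 1,          # 0-based class index -> 1-5 stars
        "confidence": probs[best_index],  # probability of the chosen class
    }


# Hypothetical logits for one review, one value per star rating (1 to 5).
example_logits = [-1.2, -0.4, 0.3, 1.9, 2.4]
result = to_prediction(example_logits)
print(result["stars"])
```

Because the last logit is the largest, the predicted rating here is 5 stars, and the confidence is simply the softmax probability of that class.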

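3. utils.py Walkthrough

3.1 Data Processing

from typing import Any, Dict, List

import pandas as pd

def prepare_dataframe(texts: List[str], results: List[Dict[str, Any]]) -> pd.DataFrame:
    return pd.DataFrame({
        'Text': texts,
        'Sentiment': [r['label'] for r in results],
        'Confidence': [f"{r['score']:.2%}" for r in results]
    })

Main features:

Converts the analysis results into a pandas DataFrame
Builds each column concisely with a list comprehension
Formats the score as a percentage with two decimals (0.8765 becomes "87.65%") for readable output; the column then holds strings, so any numeric sorting should happen before formatting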

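3.2 Visualization

import plotly.graph_objects as go

def create_sentiment_chart(score: float) -> go.Figure:
    fig = go.Figure()
    fig.add_trace(
        go.Bar(
            x=['Negative', 'Neutral', 'Positive'],
            # Only the last bar reflects the model output; 0.3 is a fixed placeholder
            y=[0.3, 0.3, score],
            marker_color=['#FF6B6B', '#4ECDC4', '#45B7D1']
        )
    )
    return fig

Plotly chart characteristics:

Interactive visualization (hover and zoom)
Custom color for each bar via marker_color
Responsive layout that resizes with its container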
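
The chart above plots fixed values for two of its three bars. One way to feed it real numbers is to collapse the five star probabilities into three buckets. This is a sketch under an assumed mapping (1-2 stars negative, 3 stars neutral, 4-5 stars positive); the function name star_probs_to_buckets and the example probabilities are hypothetical and do not come from the original code.

```python
from typing import Dict, List


def star_probs_to_buckets(probs: List[float]) -> Dict[str, float]:
    """Collapse five star-rating probabilities into three sentiment buckets."""
    if len(probs) != 5:
        raise ValueError("expected five probabilities, one per star rating")
    return {
        "Negative": probs[0] + probs[1],  # 1 and 2 stars (assumed mapping)
        "Neutral": probs[2],              # 3 stars
        "Positive": probs[3] + probs[4],  # 4 and 5 stars
    }


# Made-up probability distribution over 1-5 stars.
example_probs = [0.05, 0.10, 0.15, 0.30, 0.40]
buckets = star_probs_to_buckets(example_probs)
print(list(buckets))
```

The keys and values of the returned dictionary could then be passed as the x and y arguments of go.Bar in place of the hard-coded list.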

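4. app.py Walkthrough

4.1 Session Management

def init_session_state() -> None:
    # Create the analyzer once per browser session and reuse it across reruns
    if 'analyzer' not in st.session_state:
        st.session_state.analyzer = SentimentAnalyzer()

Session state management:

Reuses the model instance across script reruns instead of reloading it on every interaction
Avoids repeated model loads within a session (each session still holds its own instance)
Keeps state alive for the lifetime of the session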

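4.2 Multi-Page Interface

analysis_type = st.sidebar.radio(
    "Select analysis type",
    ["Single text", "Multiple texts", "File upload"]
)

Interface characteristics:

Navigation through the sidebar: the selected value decides which view the script renders (a mode switch within one script, not Streamlit's separate multipage feature)
Radio buttons make the available modes visible at a glance
Flexible input: single text, multiple texts, or an uploaded file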

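4.3 Processing with a Loading Indicator

with st.spinner("Analyzing..."):
    result = st.session_state.analyzer.analyze(text)

Processing characteristics:

Shows a loading state while the block runs; the call itself is synchronous and blocks the script until analyze returns
Improves the user experience by making the wait visible
Error handling wraps this call (see section 6)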

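5. Performance Optimization Points

5.1 Batch Processing

def batch_analyze(self, texts: List[str], batch_size: int = 16):
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        # (excerpt: the batch is processed here)

Optimization strategy:

Bounded memory use: only batch_size texts are processed at a time
Better GPU utilization: several texts share one pass instead of one call per text
Shorter total processing time, typically, compared with analyzing texts one by one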
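
The chunking loop can be completed into something runnable. The following is a minimal sketch, assuming a caller-supplied predict_batch callable that takes a list of texts and returns one result per text; predict_batch, iter_batches, and the fake predictor are hypothetical names used only for illustration, and the real method would tokenize each batch and run the model instead.

```python
from typing import Any, Callable, Dict, Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def batch_analyze(
    texts: Sequence[str],
    predict_batch: Callable[[List[str]], List[Dict[str, Any]]],
    batch_size: int = 16,
) -> List[Dict[str, Any]]:
    """Run predict_batch on each chunk and collect results in input order."""
    results: List[Dict[str, Any]] = []
    for batch in iter_batches(texts, batch_size):
        results.extend(predict_batch(batch))
    return results


def fake_predict_batch(batch: List[str]) -> List[Dict[str, Any]]:
    """Stand-in for a model call; returns one dummy result per text."""
    return [{"text": t, "length": len(t)} for t in batch]


texts = [f"review {n}" for n in range(5)]
batches = list(iter_batches(texts, 2))
results = batch_analyze(texts, fake_predict_batch, batch_size=2)
print(len(batches), len(results))
```

With five texts and a batch size of 2, the loop produces three batches, and the last one holds the single leftover text.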

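5.2 Caching

@st.cache_resource
def load_model():
    # cache_resource keeps one shared instance, which suits objects such as models
    return SentimentAnalyzer()

Cache characteristics:

Shorter model loading time: the model is loaded once and the same instance is reused across reruns and sessions
Better resource usage: st.cache_resource is the decorator intended for global resources such as ML models, whereas st.cache_data serializes the return value and hands back a copy on each call
Faster responses after the first load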
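
The load-once behaviour can be illustrated outside Streamlit. The sketch below swaps in functools.lru_cache from the standard library as a stand-in for Streamlit's cache decorator, so it shows the idea rather than Streamlit's own mechanism; FakeAnalyzer is a hypothetical placeholder for the real SentimentAnalyzer.

```python
from functools import lru_cache


class FakeAnalyzer:
    """Placeholder for an expensive-to-build model wrapper."""

    load_count = 0  # counts how many times the constructor ran

    def __init__(self) -> None:
        FakeAnalyzer.load_count += 1


@lru_cache(maxsize=1)
def load_model() -> FakeAnalyzer:
    """Build the analyzer on the first call and return the same object afterwards."""
    return FakeAnalyzer()


first = load_model()
second = load_model()
print(first is second, FakeAnalyzer.load_count)
```

Both calls return the same object and the constructor runs only once, which is the property the cached load_model relies on.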

6. Error Handling

6.1 Exception Handling

try:
    result = analyzer.analyze(text)
except Exception as e:
    st.error(f"Error during analysis: {e}")

Main cases handled:

Model loading errors
Invalid input data
Out-of-memory errors
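
The error cases listed above can be separated so that each one produces its own message. This is a minimal sketch, not the post's code: safe_analyze, the message texts, and the two stand-in analyze functions are hypothetical, and in the app the returned message would be passed to st.error.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Result = Tuple[Optional[Dict[str, Any]], Optional[str]]


def safe_analyze(analyze_fn: Callable[[str], Dict[str, Any]], text: Any) -> Result:
    """Return (result, None) on success or (None, error_message) on failure."""
    # Reject bad input before touching the model.
    if not isinstance(text, str) or not text.strip():
        return None, "Input text is empty."
    try:
        return analyze_fn(text), None
    except MemoryError:
        return None, "Not enough memory to analyze this input."
    except Exception as exc:  # anything else, e.g. a model that failed to load
        return None, f"Error during analysis: {exc}"


def working_analyze(text: str) -> Dict[str, Any]:
    """Stand-in analyzer that always succeeds."""
    return {"label": "5 stars", "score": 0.9}


def failing_analyze(text: str) -> Dict[str, Any]:
    """Stand-in analyzer that always fails."""
    raise RuntimeError("model not loaded")


ok = safe_analyze(working_analyze, "great product")
empty = safe_analyze(working_analyze, "   ")
failed = safe_analyze(failing_analyze, "great product")
print(ok[1], empty[1], failed[1])
```

Returning the message instead of raising keeps the UI code to a single if/else around st.error.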

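7. Performance Measurement

7.1 Measuring Processing Time

# measure_time is a timing decorator; its definition is not shown in this excerpt
@measure_time
def analyze_batch(texts):
    return analyzer.batch_analyze(texts)

Monitored items:

Processing time
Memory usage
GPU utilization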
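
The measure_time decorator is used above without its definition. One possible shape for it is sketched below, assuming it only needs wall-clock time; the last_elapsed attribute and the stand-in analyze_batch body are additions for illustration and are not taken from the original code.

```python
import functools
import time
from typing import Any, Callable


def measure_time(func: Callable[..., Any]) -> Callable[..., Any]:
    """Print and record how long each call to func takes."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failures are timed too.
            wrapper.last_elapsed = time.perf_counter() - start
            print(f"{func.__name__} took {wrapper.last_elapsed:.4f}s")

    wrapper.last_elapsed = None  # no call measured yet
    return wrapper


@measure_time
def analyze_batch(texts):
    """Stand-in for the real batch analysis."""
    return [len(t) for t in texts]


lengths = analyze_batch(["good", "bad"])
```

time.perf_counter is used rather than time.time because it is a monotonic clock meant for measuring short durations.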

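8. Possible Improvements

8.1 Code Optimization

Model optimization
- Model quantization
- TorchScript conversion
- ONNX conversion

Memory management

import gc
import torch

def clean_memory():
    gc.collect()                  # free unreachable Python objects first
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # release cached, unused GPU memory

Parallel processing

from concurrent.futures import ThreadPoolExecutor

def parallel_process(texts):
    with ThreadPoolExecutor() as executor:
        # executor.map returns results in the same order as the inputs
        results = list(executor.map(analyze_single, texts))
    return results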
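
The thread-pool pattern can be tried with a stand-in for analyze_single, which the excerpt does not define. In this sketch the analyze function is passed in as a parameter; fake_analyze_single and the max_workers default of 4 are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def parallel_process(
    texts: List[str],
    analyze_single: Callable[[str], Dict[str, Any]],
    max_workers: int = 4,
) -> List[Dict[str, Any]]:
    """Analyze texts on a thread pool and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map preserves input order even when tasks finish out of order
        return list(executor.map(analyze_single, texts))


def fake_analyze_single(text: str) -> Dict[str, Any]:
    """Stand-in for a per-text model call."""
    return {"text": text, "length": len(text)}


results = parallel_process(["a", "bb", "ccc"], fake_analyze_single)
print([r["length"] for r in results])
```

Whether threads actually speed up inference depends on how much of the work runs outside the Python interpreter lock, so it is worth measuring this against the batch approach from section 5.1.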

9. Test Cases

9.1 Unit Tests

def test_sentiment_analyzer():
    analyzer = SentimentAnalyzer()
    result = analyzer.analyze("test text")
    # Key names must match what analyze() returns; section 3.1 reads 'label' and 'score'
    assert 'sentiment' in result
    assert 'confidence' in result

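Conclusion

This project is a robust AI web application with the following characteristics:

Modular design
- Clear separation of concerns
- Reusable components
- Extensible structure
Optimized performance
- GPU usage
- Batch processing
- Caching strategy
User-friendly UI
- Intuitive interface
- Real-time feedback
- Multiple input modes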

Attribution, NonCommercial, NoDerivatives
