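A Complete Guide to Java Multithreading and Concurrent Programming

Overview

Any Java backend developer needs a deep understanding of multithreading and concurrent programming. A web server inherently has to handle many client requests at the same time, and if thread safety is not guaranteed along the way, serious problems such as lost data, deadlocks, and degraded performance can follow. This article works through Java's multithreading mechanisms systematically, from the basics to advanced topics, together with patterns and best practices that can be applied directly in production code.

1. Processes vs. Threads

In an operating system, a process is an instance of a running program and has its own independent memory space (code, data, heap, stack). Processes cannot directly access each other's memory; inter-process communication (IPC) requires a separate mechanism such as pipes, sockets, or shared memory.

A thread, by contrast, is the smallest unit of execution that runs inside a process. Threads belonging to the same process share the heap and the code area, but each has its own stack and program counter (PC). Because of this sharing, exchanging data between threads is much faster and cheaper than IPC, but it is also the root cause of synchronization problems on shared resources.

In Java, the main thread is created automatically when the JVM starts, and additional threads can be created from it to do work in parallel. The JVM itself also runs several background threads internally, such as garbage collector threads and JIT compiler threads.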

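Aspect	Process	Thread
Memory	Independent memory space	Shares the heap; each thread has its own stack
Creation cost	High	Low
Context switching	Expensive	Relatively cheap
Communication	Requires IPC	Can communicate directly through shared memory
Stability	A crash in one process does not affect other processes	A fatal fault in one thread can bring down the whole process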

2. The Thread Class and the Runnable Interface

There are two main ways to create a thread in Java.

2.1 Extending the Thread Class

public class MyThread extends Thread {
    @Override
    public void run() {
        for (int i = 0; i < 5; i++) {
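            System.out.println(Thread.currentThread().getName() + " - count: " + i);
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the interrupt flag
                System.out.println("Thread was interrupted.");
                return; // stop instead of continuing the loop with the flag still set
            }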
        }
    }

    public static void main(String[] args) {
        MyThread thread1 = new MyThread();
        MyThread thread2 = new MyThread();
        thread1.setName("worker-1");
        thread2.setName("worker-2");
        thread1.start(); // call start(), not run(), to execute on a new thread
        thread2.start();
    }
}

Calling start() makes the JVM create a new thread and execute run() on that thread. If run() is called directly, it executes on the current thread like any ordinary method, so no multithreading takes place.
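The difference between start() and run() can be checked by recording which thread actually executes the task. The following is a minimal sketch; the class name StartVsRunDemo and the helper executedBy are hypothetical names chosen for illustration.

```java
// Hypothetical demo class; names are illustrative and not part of the article's examples.
class StartVsRunDemo {
    /** Returns the name of the thread that actually executed the task. */
    static String executedBy(boolean useStart) throws InterruptedException {
        String[] executor = new String[1];
        Thread worker = new Thread(
                () -> executor[0] = Thread.currentThread().getName(), "worker");
        if (useStart) {
            worker.start(); // a new thread named "worker" executes the task
            worker.join();  // join() makes the write to executor[0] visible here
        } else {
            worker.run();   // plain method call: the task runs on the calling thread
        }
        return executor[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("run():   " + executedBy(false)); // name of the calling thread
        System.out.println("start(): " + executedBy(true));  // worker
    }
}
```

With run(), the reported name is that of the caller (for example main), which shows that no new thread was involved.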

2.2 Implementing the Runnable Interface

public class MyRunnable implements Runnable {
    private final String taskName;

    public MyRunnable(String taskName) {
        this.taskName = taskName;
    }

    @Override
    public void run() {
        for (int i = 0; i < 5; i++) {
            System.out.println(taskName + " running - " + i);
        }
    }

    public static void main(String[] args) {
        Thread thread1 = new Thread(new MyRunnable("Task A"));
        Thread thread2 = new Thread(new MyRunnable("Task B"));
        thread1.start();
        thread2.start();

        // A lambda expression makes this more concise
        Thread thread3 = new Thread(() -> {
            System.out.println("Thread created from a lambda: " + Thread.currentThread().getName());
        });
        thread3.start();
    }
}

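In practice, implementing Runnable is the recommended approach. Java supports only single inheritance of classes, so a class that extends Thread cannot extend anything else. Runnable also separates the task logic from the threading mechanism, which allows more flexible designs and integrates naturally with higher-level APIs such as ExecutorService.

2.3 The Callable Interface

Runnable has two limitations: it returns no value and cannot throw checked exceptions. The Callable<V> interface was introduced to address both.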

import java.util.concurrent.*;

public class CallableExample {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Callable<Integer> sumTask = () -> {
            int sum = 0;
            for (int i = 1; i <= 100; i++) {
                sum += i;
            }
            return sum;
        };

        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<Integer> future = executor.submit(sumTask);
        System.out.println("Sum of 1 to 100: " + future.get()); // 5050
        executor.shutdown();
    }
}

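3. Thread Lifecycle

A Java thread is always in one of the six states defined by the Thread.State enum. Knowing each state and what triggers the transitions between them is the foundation of concurrent programming.

NEW: The thread object has been created but start() has not been called yet.
RUNNABLE: start() has been called and the thread is eligible to run. This covers both a thread that is actually executing on a CPU and one that is waiting for the OS scheduler to give it CPU time.
BLOCKED: The thread is waiting to acquire a monitor lock. It enters this state when it tries to enter a synchronized block whose lock is already held by another thread.
WAITING: The thread is waiting indefinitely for another thread to perform a particular action. It enters this state through Object.wait(), Thread.join(), or LockSupport.park().
TIMED_WAITING: The thread is waiting for up to a specified time. Calls such as Thread.sleep(millis), Object.wait(timeout), and Thread.join(timeout) put it in this state.
TERMINATED: The thread has finished, either because run() completed or because an uncaught exception was thrown.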

public class ThreadLifecycleDemo {
    public static void main(String[] args) throws InterruptedException {
        Object lock = new Object();

        Thread thread = new Thread(() -> {
            synchronized (lock) {
                try {
                    lock.wait(); // transitions to WAITING
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        System.out.println("State right after creation: " + thread.getState()); // NEW

        thread.start();
        Thread.sleep(50); // crude way to give the thread time to reach wait()
        System.out.println("State after wait(): " + thread.getState()); // WAITING (expected)

        synchronized (lock) {
            lock.notify(); // wake the waiting thread
        }

        thread.join();
        System.out.println("State after termination: " + thread.getState()); // TERMINATED
    }
}

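A solid grasp of thread state transitions is a great help when analyzing a thread dump to diagnose deadlocks or performance bottlenecks. In production, the jstack tool shows the current state and stack trace of every thread in a JVM.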
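The lifecycle demo above covers NEW, WAITING, and TERMINATED but not BLOCKED, which is the state that most often shows up in thread dumps of contended code. The following sketch observes it by holding a monitor while a second thread tries to enter it. The class and method names are hypothetical, and polling with Thread.sleep(1) is just one simple way to wait for the transition.

```java
// Hypothetical demo class; names are illustrative.
class BlockedStateDemo {
    /** Returns the state of a thread that is trying to enter a monitor held by the caller. */
    static Thread.State observeContender() throws InterruptedException {
        Object lock = new Object();
        Thread contender = new Thread(() -> {
            synchronized (lock) {
                // nothing to do: the point is only to need the monitor
            }
        }, "contender");

        Thread.State observed;
        synchronized (lock) {
            contender.start();
            // Poll until the contender is stuck on the monitor we are holding.
            // Thread.sleep does not release the monitor.
            while (contender.getState() != Thread.State.BLOCKED) {
                Thread.sleep(1);
            }
            observed = contender.getState();
        }
        contender.join(); // the monitor is free now, so the contender can finish
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("State while waiting for the monitor: " + observeContender()); // BLOCKED
    }
}
```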

4. The synchronized Keyword and Monitor Locks

synchronized is the most basic synchronization mechanism in Java and guarantees mutual exclusion over a critical section. Every Java object has an associated monitor, and synchronized acquires and releases the lock on that monitor.

4.1 Method-Level Synchronization

public class BankAccount {
    private int balance;

    public BankAccount(int initialBalance) {
        this.balance = initialBalance;
    }

    // Synchronized instance method: uses the monitor lock of this
    public synchronized void deposit(int amount) {
        balance += amount;
        System.out.println(Thread.currentThread().getName() + " deposit: " + amount + ", balance: " + balance);
    }

    public synchronized void withdraw(int amount) {
        if (balance >= amount) {
            balance -= amount;
            System.out.println(Thread.currentThread().getName() + " withdrawal: " + amount + ", balance: " + balance);
        } else {
            System.out.println(Thread.currentThread().getName() + " withdrawal failed: insufficient balance");
        }
    }

    public synchronized int getBalance() {
        return balance;
    }
}

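4.2 Block-Level Synchronization

Synchronizing an entire method can hold the lock over a wider scope than necessary and hurt performance. Block-level synchronization keeps the critical section as small as possible.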

public class FineGrainedSync {
    private final Object depositLock = new Object();
    private final Object withdrawLock = new Object();
    private int balance;

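    public void deposit(int amount) {
        // Pre-processing (no synchronization needed)
        validateAmount(amount);

        synchronized (depositLock) {
            // Only the critical section is synchronized. Any other code that
            // touches balance must take this same lock to stay thread-safe.
            balance += amount;
        }

        // Post-processing (no synchronization needed)
        logTransaction("deposit", amount);
    }

    public void complexOperation() {
        // Acquire multiple locks in order (always the same order, to avoid deadlock)
        synchronized (depositLock) {
            synchronized (withdrawLock) {
                // Work that needs both locks
            }
        }
    }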

    private void validateAmount(int amount) {
        if (amount <= 0) throw new IllegalArgumentException("Amount must be positive.");
    }

    private void logTransaction(String type, int amount) {
        System.out.println(type + ": " + amount);
    }
}

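4.3 static synchronized

A static synchronized method uses a class-level lock: the monitor of the class's Class object. Calls to static synchronized methods of the same class are therefore mutually exclusive regardless of how many instances exist. This lock is distinct from the per-instance lock used by non-static synchronized methods, so the two kinds of method do not exclude each other.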

public class Counter {
    private static int count = 0;

    // Class-level lock: uses the monitor of the Counter.class object
    public static synchronized void increment() {
        count++;
    }

    // Same effect
    public static void incrementBlock() {
        synchronized (Counter.class) {
            count++;
        }
    }
}

It also matters that synchronized is reentrant. A thread that requests a lock it already holds acquires it immediately, so a synchronized method can call another synchronized method on the same object.
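Reentrancy is easy to verify with a small class whose synchronized method calls another synchronized method on the same object. This is a minimal sketch with a hypothetical class name; if the lock were not reentrant, incrementTwice() would block forever on its first inner call.

```java
// Hypothetical demo class; names are illustrative.
class ReentrantCounter {
    private int value;

    public synchronized int incrementTwice() {
        increment(); // re-acquires the monitor this thread already holds
        increment();
        return value;
    }

    public synchronized void increment() {
        value++;
    }

    public static void main(String[] args) {
        System.out.println(new ReentrantCounter().incrementTwice()); // 2
    }
}
```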

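5. The volatile Keyword

The volatile keyword guarantees the visibility of a variable across threads. In a multithreaded program, a thread may work with a value held in a CPU cache or register instead of re-reading it from main memory, so a change made by one thread is not necessarily seen promptly by another.

A write to a volatile variable is guaranteed to be visible to any thread that subsequently reads it (a happens-before relationship in the Java Memory Model). The compiler and the JVM are also not allowed to reorder memory operations around a volatile access in ways that would break this guarantee.

public class VolatileExample {
    // Without volatile, the loop below may never see the update and spin forever
    private volatile boolean running = true;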

    public void start() {
        new Thread(() -> {
            int count = 0;
            while (running) {
                count++;
            }
            System.out.println("Stopped. Iterations: " + count);
        }).start();
    }

    public void stop() {
        running = false; // the write is visible to the thread reading running
    }

    public static void main(String[] args) throws InterruptedException {
        VolatileExample example = new VolatileExample();
        example.start();
        Thread.sleep(1000);
        example.stop();
    }
}

However, volatile does not guarantee atomicity. For example, count++ on a volatile int count consists of three steps (read, modify, write), so it is not thread-safe. When an atomic increment is needed, use AtomicInteger or synchronized.
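The lost-update problem described above can be made visible by incrementing a volatile int and an AtomicInteger side by side from several threads. The sketch below uses hypothetical names; the volatile result varies from run to run, so only an upper bound is stated for it.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; names are illustrative.
class VolatileCounterRace {
    private static volatile int volatileCount;
    private static final AtomicInteger atomicCount = new AtomicInteger();

    /** Runs the given number of threads; each increments both counters perThread times. */
    static int[] run(int threads, int perThread) throws InterruptedException {
        volatileCount = 0;
        atomicCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    volatileCount++;               // read-modify-write: updates can be lost
                    atomicCount.incrementAndGet(); // one atomic operation
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return new int[] { volatileCount, atomicCount.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = run(4, 100_000);
        System.out.println("volatile: " + r[0] + " (at most 400000, often less)");
        System.out.println("atomic:   " + r[1]); // 400000
    }
}
```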

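volatile is appropriate in situations such as the following:

As a flag variable (as in the example above, where one thread writes and other threads only read)
On the instance field in the Double-Checked Locking pattern
When a write to the variable does not depend on its current value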
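The Double-Checked Locking case from the list above can be sketched as follows. ConfigRegistry is a hypothetical class used only for illustration; the essential part is that the instance field is volatile.

```java
// Hypothetical lazily initialized singleton; the name is illustrative.
class ConfigRegistry {
    // volatile is what makes the pattern safe: without it another thread could
    // observe a non-null reference to an object that is not fully constructed.
    private static volatile ConfigRegistry instance;

    private ConfigRegistry() { }

    static ConfigRegistry getInstance() {
        ConfigRegistry local = instance;      // first check, without locking
        if (local == null) {
            synchronized (ConfigRegistry.class) {
                local = instance;             // second check, under the lock
                if (local == null) {
                    local = new ConfigRegistry();
                    instance = local;
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        System.out.println(getInstance() == getInstance()); // true
    }
}
```

Reading the field into a local variable keeps the number of volatile reads on the fast path to one.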

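6. wait(), notify(), and notifyAll()

These three methods are defined on the Object class and are used for cooperative communication between threads. They may only be called while the calling thread holds the monitor lock of the object they are invoked on, that is, from inside a synchronized block or method on that object; otherwise they throw IllegalMonitorStateException.

wait(): Puts the current thread into a waiting state and releases the monitor lock.
notify(): Wakes up one of the waiting threads (which one is not guaranteed).
notifyAll(): Wakes up all waiting threads.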
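The monitor ownership requirement can be demonstrated directly: calling notify() without holding the monitor throws IllegalMonitorStateException, while the same call inside a synchronized block succeeds. A minimal sketch with hypothetical names:

```java
// Hypothetical demo class; names are illustrative.
class MonitorOwnershipDemo {
    /** Returns true if notify() outside a synchronized block is rejected. */
    static boolean notifyWithoutLockFails() {
        Object monitor = new Object();
        try {
            monitor.notify(); // the current thread does not own the monitor
            return false;
        } catch (IllegalMonitorStateException expected) {
            return true;
        }
    }

    /** Returns true if notify() inside a synchronized block completes normally. */
    static boolean notifyWithLockSucceeds() {
        Object monitor = new Object();
        synchronized (monitor) {
            monitor.notify(); // legal; there are no waiters, so nothing happens
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("without lock rejected: " + notifyWithoutLockFails()); // true
        System.out.println("with lock accepted:    " + notifyWithLockSucceeds()); // true
    }
}
```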

Implementing the Producer-Consumer Pattern

import java.util.LinkedList;
import java.util.Queue;

public class ProducerConsumer {
    private final Queue<Integer> buffer = new LinkedList<>();
    private final int capacity;

    public ProducerConsumer(int capacity) {
        this.capacity = capacity;
    }

    public void produce(int item) throws InterruptedException {
        synchronized (buffer) {
            // Use a while loop (guards against spurious wakeups)
            while (buffer.size() == capacity) {
                System.out.println("Buffer is full. Producer waiting...");
                buffer.wait();
            }
            buffer.offer(item);
            System.out.println("Produced: " + item + ", buffer size: " + buffer.size());
            buffer.notifyAll(); // wake up consumer threads
        }
    }

    public int consume() throws InterruptedException {
        synchronized (buffer) {
            while (buffer.isEmpty()) {
                System.out.println("Buffer is empty. Consumer waiting...");
                buffer.wait();
            }
            int item = buffer.poll();
            System.out.println("Consumed: " + item + ", buffer size: " + buffer.size());
            buffer.notifyAll(); // wake up producer threads
            return item;
        }
    }

    public static void main(String[] args) {
        ProducerConsumer pc = new ProducerConsumer(5);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 20; i++) {
                    pc.produce(i);
                    Thread.sleep(50);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "producer");

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 20; i++) {
                    pc.consume();
                    Thread.sleep(100);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "consumer");

        producer.start();
        consumer.start();
    }
}

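Calling wait() inside a while loop rather than an if is an essential practice. Spurious wakeups can occur in Java, so the condition must be re-checked after waking up. The official Java documentation explicitly recommends this pattern.

7. The java.util.concurrent Package

Introduced in Java 5, the java.util.concurrent package provides high-level concurrency utilities that are far more powerful and easier to use than low-level synchronized/wait/notify. Modern Java code should make full use of them.

7.1 ExecutorService

Instead of creating and managing threads by hand, using a thread pool reduces thread creation cost and manages resources efficiently.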

import java.util.concurrent.*;
import java.util.List;
import java.util.ArrayList;

public class ExecutorServiceExample {
    public static void main(String[] args) throws Exception {
        // Create a fixed-size thread pool
        ExecutorService executor = Executors.newFixedThreadPool(4);

        // Submit a Runnable task
        executor.execute(() -> {
            System.out.println("Runnable running: " + Thread.currentThread().getName());
        });

        // Submit a Callable task and receive its result
        Future<String> future = executor.submit(() -> {
            Thread.sleep(1000);
            return "task complete";
        });
        System.out.println("Result: " + future.get()); // blocking call

        // Submit several tasks at once
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            final int taskId = i;
            tasks.add(() -> {
                Thread.sleep(100);
                return taskId * taskId;
            });
        }
        List<Future<Integer>> futures = executor.invokeAll(tasks);
        for (Future<Integer> f : futures) {
            System.out.println("Result: " + f.get());
        }

        // Always call shutdown
        executor.shutdown();
        if (!executor.awaitTermination(60, TimeUnit.SECONDS)) {
            executor.shutdownNow();
        }
    }
}

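The main thread pools offered by the Executors factory class are:

newFixedThreadPool(n): A fixed-size thread pool. However many tasks are submitted, the thread count never exceeds n; excess tasks wait in a queue.
newCachedThreadPool(): Creates threads as needed and reuses idle ones when available. Threads that stay idle for 60 seconds are removed.
newSingleThreadExecutor(): Processes tasks sequentially on a single thread.
newScheduledThreadPool(n): A thread pool that can run tasks after a given delay or periodically.

In practice, constructing a ThreadPoolExecutor directly and tuning it is recommended over the Executors factory methods, because newCachedThreadPool() places no upper bound on the number of threads and newFixedThreadPool() places no upper bound on the work queue, either of which can lead to memory problems.

7.2 CompletableFuture

CompletableFuture, introduced in Java 8, is a core tool for asynchronous programming. It flexibly supports callback-based asynchronous processing, task composition, and exception handling.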

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CompletableFutureExample {
    private static final ExecutorService executor = Executors.newFixedThreadPool(4);

    public static void main(String[] args) {
        // Create an asynchronous task and chain stages onto it
        CompletableFuture<String> result = CompletableFuture
            .supplyAsync(() -> {
                System.out.println("Step 1: fetch data - " + Thread.currentThread().getName());
                return fetchDataFromDB();
            }, executor)
            .thenApplyAsync(data -> {
                System.out.println("Step 2: transform data - " + Thread.currentThread().getName());
                return transformData(data);
            }, executor)
            .thenApplyAsync(transformed -> {
                System.out.println("Step 3: format result - " + Thread.currentThread().getName());
                return formatResult(transformed);
            }, executor)
            .exceptionally(ex -> {
                System.err.println("Error: " + ex.getMessage());
                return "default value";
            });

        // Combine two asynchronous tasks
        CompletableFuture<String> userFuture = CompletableFuture.supplyAsync(() -> "user info");
        CompletableFuture<String> orderFuture = CompletableFuture.supplyAsync(() -> "order info");

        CompletableFuture<String> combined = userFuture.thenCombine(orderFuture,
            (user, order) -> user + " + " + order);

        System.out.println("Combined result: " + combined.join());

        // Wait until all tasks have completed
        CompletableFuture<Void> allOf = CompletableFuture.allOf(userFuture, orderFuture);
        allOf.join();

        // Wait for the chain as well: stages still pending when the executor
        // shuts down would be rejected
        System.out.println("Chained result: " + result.join());
        executor.shutdown();
    }

    private static String fetchDataFromDB() { return "raw data"; }
    private static String transformData(String data) { return data + " -> transformed"; }
    private static String formatResult(String data) { return "[" + data + "]"; }
}

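CompletableFuture's style of composing non-blocking stages is close to the model used by reactive libraries such as Project Reactor, on which Spring WebFlux is built, so it is an API every backend developer should be comfortable with.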
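A common follow-up to allOf() is collecting the individual results, since allOf() itself only yields CompletableFuture<Void>. The sketch below shows one way to do this; AllOfResults and squaresOf are hypothetical names, and the tasks run on the common pool because no executor is passed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Hypothetical demo class; names are illustrative.
class AllOfResults {
    /** Squares each input asynchronously and gathers the results in input order. */
    static CompletableFuture<List<Integer>> squaresOf(List<Integer> inputs) {
        List<CompletableFuture<Integer>> futures = inputs.stream()
                .map(n -> CompletableFuture.supplyAsync(() -> n * n))
                .collect(Collectors.toList());

        return CompletableFuture
                .allOf(futures.toArray(new CompletableFuture[0]))
                // Every future is complete at this point, so join() does not block.
                .thenApply(ignored -> futures.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        System.out.println(squaresOf(Arrays.asList(1, 2, 3)).join()); // [1, 4, 9]
    }
}
```

The result order follows the order of the futures list, not the order in which the tasks happen to finish.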

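8. The Lock Interface and ReentrantLock

The java.util.concurrent.locks.Lock interface offers a more flexible locking mechanism than synchronized. Its best-known implementation, ReentrantLock, supports the following additional features.

A timeout can be set on a lock acquisition attempt (tryLock(timeout, unit))
Interruptible lock acquisition is supported (lockInterruptibly())
A fairness policy can be configured
Condition objects allow a finer-grained wait/notify mechanism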

import java.util.concurrent.locks.*;
import java.util.concurrent.TimeUnit;

public class ReentrantLockExample {
    private final ReentrantLock lock = new ReentrantLock(true); // fair mode
    private final Condition notEmpty = lock.newCondition();
    private final Condition notFull = lock.newCondition();
    private final int[] buffer;
    private int count, putIndex, takeIndex;

    public ReentrantLockExample(int capacity) {
        this.buffer = new int[capacity];
    }

    public void put(int item) throws InterruptedException {
        lock.lock();
        try {
            while (count == buffer.length) {
                notFull.await(); // wait until the buffer has free space
            }
            buffer[putIndex] = item;
            putIndex = (putIndex + 1) % buffer.length;
            count++;
            notEmpty.signal(); // signal only threads waiting for data (consumers)
        } finally {
            lock.unlock(); // always release in finally
        }
    }

    public int take() throws InterruptedException {
        lock.lock();
        try {
            while (count == 0) {
                notEmpty.await(); // wait until data arrives
            }
            int item = buffer[takeIndex];
            takeIndex = (takeIndex + 1) % buffer.length;
            count--;
            notFull.signal(); // signal only threads waiting for space (producers)
            return item;
        } finally {
            lock.unlock();
        }
    }

    // Timed attempt with tryLock: gives up if the lock is not acquired within the timeout
    public boolean tryPut(int item, long timeout) throws InterruptedException {
        if (lock.tryLock(timeout, TimeUnit.MILLISECONDS)) {
            try {
                if (count < buffer.length) {
                    buffer[putIndex] = item;
                    putIndex = (putIndex + 1) % buffer.length;
                    count++;
                    notEmpty.signal();
                    return true;
                }
                return false;
            } finally {
                lock.unlock();
            }
        }
        return false;
    }
}

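With Condition, unlike notifyAll(), only the threads waiting on a specific condition are woken, which avoids needless wakeups and improves performance. In the example above, notEmpty.signal() wakes only a consumer thread and notFull.signal() wakes only a producer thread.

ReadWriteLock

When reads are far more frequent than writes, a ReadWriteLock allows concurrent reads and can improve performance considerably.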

import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.HashMap;
import java.util.Map;

public class ThreadSafeCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final ReadWriteLock rwLock = new ReentrantReadWriteLock();

    public V get(K key) {
        rwLock.readLock().lock(); // multiple threads can read concurrently
        try {
            return cache.get(key);
        } finally {
            rwLock.readLock().unlock();
        }
    }

    public void put(K key, V value) {
        rwLock.writeLock().lock(); // writes get exclusive access
        try {
            cache.put(key, value);
        } finally {
            rwLock.writeLock().unlock();
        }
    }
}

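9. Concurrent Collections

The java.util.concurrent package provides a range of collections that are safe to use from multiple threads. Under contention they generally perform much better than collections wrapped with Collections.synchronizedXxx(), which guard every operation with a single lock.

9.1 ConcurrentHashMap

ConcurrentHashMap uses a fine-grained locking strategy, segment-based in Java 7 and node-based in Java 8+, so that multiple threads can access different buckets at the same time.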

import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentHashMapExample {
    private final ConcurrentHashMap<String, Integer> wordCount = new ConcurrentHashMap<>();

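    // Each method below shows one atomic update API; use one per operation.

    // merge: atomically insert 1 or add 1 to the existing count
    public void countWithMerge(String word) {
        wordCount.merge(word, 1, Integer::sum);
    }

    // compute: atomic update with a remapping function
    public void countWithCompute(String word) {
        wordCount.compute(word, (key, value) -> value == null ? 1 : value + 1);
    }

    // putIfAbsent: insert only when the key is missing
    public void registerWord(String word) {
        wordCount.putIfAbsent(word, 0);
    }

    public void processInParallel() {
        // Parallel bulk operations (Java 8+)
        // The first argument is the parallelism threshold
        // (the operation runs in parallel once the map has at least this many elements)
        wordCount.forEach(2, (key, value) -> {
            System.out.println(key + ": " + value);
        });

        // Search: returns the first non-null result, or null if there is none
        String result = wordCount.search(2, (key, value) -> {
            return value > 100 ? key : null;
        });
        System.out.println("Word with count > 100: " + result);

        // Reduce: returns null for an empty map, so keep the boxed type
        Integer total = wordCount.reduce(2,
            (key, value) -> value,
            Integer::sum);
        System.out.println("Total: " + total);
    }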
}

9.2 CopyOnWriteArrayList

CopyOnWriteArrayList copies its entire internal array on every write. Reads need no locking at all and are very fast, but writes are expensive. It is therefore a good fit when reads overwhelmingly outnumber writes.
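One consequence of the copy-on-write strategy is that an iterator works on the snapshot of the array taken when iteration started. The sketch below, with hypothetical names, appends to the list while iterating over it: the loop visits only the original elements and no exception is thrown.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class; names are illustrative.
class SnapshotIterationDemo {
    /** Appends one element per iteration step and returns how many elements the loop visited. */
    static int visitWhileAppending(List<Integer> list) {
        int visited = 0;
        for (Integer value : list) {
            list.add(value * 10); // structural modification during iteration
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        List<Integer> cow = new CopyOnWriteArrayList<>(Arrays.asList(1, 2, 3));
        System.out.println(visitWhileAppending(cow)); // 3: the iterator saw the original snapshot
        System.out.println(cow);                      // [1, 2, 3, 10, 20, 30]
    }
}
```

Passing a plain ArrayList to the same method throws ConcurrentModificationException on the second iteration step.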

import java.util.concurrent.CopyOnWriteArrayList;
import java.util.List;

public class EventListenerManager {
    // Pattern: listeners are registered or removed rarely, but every event iterates over all of them
    private final List<EventListener> listeners = new CopyOnWriteArrayList<>();

    public void addListener(EventListener listener) {
        listeners.add(listener); // copies the internal array and appends the new element
    }

    public void removeListener(EventListener listener) {
        listeners.remove(listener);
    }

    public void fireEvent(Event event) {
        // No ConcurrentModificationException is thrown during iteration
        for (EventListener listener : listeners) {
            listener.onEvent(event);
        }
    }

    interface EventListener { void onEvent(Event event); }
    static class Event { }
}

9.3 BlockingQueue

BlockingQueue is an interface that makes the producer-consumer pattern very simple to implement.

import java.util.concurrent.*;

public class BlockingQueueExample {
    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(10);

        // Producer
        new Thread(() -> {
            try {
                for (int i = 0; i < 100; i++) {
                    queue.put("message-" + i); // blocks automatically when the queue is full
                    System.out.println("Produced: message-" + i);
                }
                queue.put("STOP"); // shutdown signal
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();

        // Consumer
        new Thread(() -> {
            try {
                while (true) {
                    String message = queue.take(); // blocks automatically when the queue is empty
                    if ("STOP".equals(message)) break;
                    System.out.println("Consumed: " + message);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }).start();
    }
}

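10. Deadlock, Livelock, and Starvation

10.1 Deadlock

A deadlock is a state in which two or more threads are blocked forever, each waiting for a resource held by another. A deadlock can occur only when all four conditions hold at once: mutual exclusion, hold and wait, no preemption, and circular wait.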

public class DeadlockExample {
    private final Object lockA = new Object();
    private final Object lockB = new Object();

    public void method1() {
        synchronized (lockA) {
            System.out.println("Thread1: acquired lockA");
            try { Thread.sleep(100); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            synchronized (lockB) { // waits for lockB, which Thread2 holds
                System.out.println("Thread1: acquired lockB");
            }
        }
    }

    public void method2() {
        synchronized (lockB) {
            System.out.println("Thread2: acquired lockB");
            try { Thread.sleep(100); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            synchronized (lockA) { // waits for lockA, which Thread1 holds
                System.out.println("Thread2: acquired lockA");
            }
        }
    }
}

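Deadlock prevention strategies:

Fixed lock ordering: Make every thread acquire locks in the same order.
Timeouts: Use tryLock(timeout, unit) and give up if the lock cannot be acquired within the time limit.
Deadlock detection: ThreadMXBean can detect deadlocks at runtime.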

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class DeadlockDetector {
    public static void detectDeadlock() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long[] deadlockedThreads = bean.findDeadlockedThreads();
        if (deadlockedThreads != null) {
            System.err.println("Deadlock detected! Threads involved: " + deadlockedThreads.length);
        }
    }
}
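The fixed lock ordering strategy from the list above can be sketched with a money transfer between two accounts. Everything here is illustrative: the Account class and its id field are assumptions introduced only to give the locks a stable global order.

```java
// Hypothetical demo class; names are illustrative.
class OrderedTransfer {
    static final class Account {
        final int id;   // assumed unique; used only to order lock acquisition
        long balance;
        Account(int id, long balance) { this.id = id; this.balance = balance; }
    }

    static void transfer(Account from, Account to, long amount) {
        // Always lock the account with the smaller id first, whatever the direction.
        Account first = from.id < to.id ? from : to;
        Account second = first == from ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }

    /** Two threads transfer in opposite directions; returns the combined balance afterwards. */
    static long runOpposingTransfers(int rounds) throws InterruptedException {
        Account a = new Account(1, 1_000);
        Account b = new Account(2, 1_000);
        Thread t1 = new Thread(() -> { for (int i = 0; i < rounds; i++) transfer(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < rounds; i++) transfer(b, a, 1); });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return a.balance + b.balance;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runOpposingTransfers(10_000)); // 2000
    }
}
```

If transfer() locked from first and to second, the two threads would acquire the monitors in opposite orders, which is exactly the pattern in DeadlockExample.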

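10.2 Livelock

A livelock is a state in which threads keep yielding to each other and, as a result, make no actual progress. Unlike in a deadlock, the threads are not blocked, but it is just as much of a problem because no useful work gets done. Introducing random delays or a backoff strategy is a common remedy.

10.3 Starvation

Starvation occurs when a particular thread never manages to acquire a resource and waits indefinitely. It can happen when a low-priority thread keeps being pushed back by higher-priority threads, or because of an unfair lock policy. Using ReentrantLock in fair mode (new ReentrantLock(true)) favors the longest-waiting thread and so mitigates starvation, though throughput may drop somewhat.

11. ThreadLocal

ThreadLocal keeps an independent copy of a variable for each thread, achieving thread safety without synchronization. Because the data is not shared between threads, it is ideal for storing per-thread state.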

public class ThreadLocalExample {
    // Each thread holds its own SimpleDateFormat instance
    private static final ThreadLocal<java.text.SimpleDateFormat> dateFormatter =
        ThreadLocal.withInitial(() -> new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ss"));

    // Stores the transaction context
    private static final ThreadLocal<String> transactionId = new ThreadLocal<>();

    public static String formatDate(java.util.Date date) {
        return dateFormatter.get().format(date);
    }

    public static void setTransactionId(String txId) {
        transactionId.set(txId);
    }

    public static String getTransactionId() {
        return transactionId.get();
    }

    public static void clear() {
        // In a thread pool environment, always call remove (prevents memory leaks)
        dateFormatter.remove();
        transactionId.remove();
    }

    public static void main(String[] args) {
        Runnable task = () -> {
            try {
                String txId = "TX-" + Thread.currentThread().getId();
                setTransactionId(txId);
                System.out.println(Thread.currentThread().getName()
                    + " transaction: " + getTransactionId());
                System.out.println(Thread.currentThread().getName()
                    + " date: " + formatDate(new java.util.Date()));
            } finally {
                clear(); // always clean up
            }
        };

        new Thread(task, "worker-1").start();
        new Thread(task, "worker-2").start();
    }
}

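In the Spring Framework, RequestContextHolder uses ThreadLocal internally to bind the current HTTP request to the thread. Spring Security's SecurityContextHolder also uses a ThreadLocal strategy by default.

Caution: In an environment that uses a thread pool, threads are reused, so ThreadLocal.remove() must be called when a task finishes to make sure no data from the previous task is left behind. Otherwise memory can leak, or one request can end up reading another request's data, which is a serious bug.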
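The stale-data hazard described above can be reproduced with a single-thread executor, where two consecutive tasks necessarily run on the same worker thread. This is a minimal sketch; the class name, the CURRENT_USER variable, and the value "user-A" are all hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; names are illustrative.
class ThreadLocalLeakDemo {
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    /** Runs two tasks on the same pooled thread and returns what the second task observes. */
    static String secondTaskSees(boolean cleanUp) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable firstRequest = () -> {
                CURRENT_USER.set("user-A");
                try {
                    // ... request handling would go here ...
                } finally {
                    if (cleanUp) {
                        CURRENT_USER.remove();
                    }
                }
            };
            pool.submit(firstRequest).get(); // wait until the first task has finished

            // The second task never sets a value; it only reads what is there.
            Callable<String> secondRequest = CURRENT_USER::get;
            return pool.submit(secondRequest).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("without remove(): " + secondTaskSees(false)); // user-A
        System.out.println("with remove():    " + secondTaskSees(true));  // null
    }
}
```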

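12. The atomic Package

The java.util.concurrent.atomic package provides classes that perform atomic operations without locks (lock-free), built on CAS (Compare-And-Swap). CAS is an atomic instruction supported at the hardware level that updates a location to a new value only if its current value matches the expected value.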

import java.util.concurrent.atomic.*;

public class AtomicExample {
    private final AtomicInteger counter = new AtomicInteger(0);
    private final AtomicLong totalAmount = new AtomicLong(0);
    private final AtomicBoolean isRunning = new AtomicBoolean(true);
    private final AtomicReference<String> currentState = new AtomicReference<>("INIT");

    public void incrementCounter() {
        counter.incrementAndGet();       // atomic increment
        counter.getAndAdd(5);             // atomic addition
        counter.updateAndGet(x -> x * 2); // atomically apply a function (Java 8+)
    }

    // Example: a CAS-based non-blocking stack
    private final AtomicReference<Node> top = new AtomicReference<>();

    public void push(int value) {
        Node newNode = new Node(value);
        Node oldTop;
        do {
            oldTop = top.get();
            newNode.next = oldTop;
        } while (!top.compareAndSet(oldTop, newNode));
        // If the CAS fails (another thread modified top first), retry
    }

    public Integer pop() {
        Node oldTop;
        Node newTop;
        do {
            oldTop = top.get();
            if (oldTop == null) return null;
            newTop = oldTop.next;
        } while (!top.compareAndSet(oldTop, newTop));
        return oldTop.value;
    }

    static class Node {
        final int value;
        Node next;
        Node(int value) { this.value = value; }
    }
}

LongAdder and LongAccumulator

Under high contention, LongAdder performs better than AtomicLong. Internally, LongAdder spreads the value across several cells to reduce contention, and sum() adds up the values of all cells.

import java.util.concurrent.atomic.LongAdder;

public class HighContentionCounter {
    // Suited to a counter incremented concurrently by many threads
    private final LongAdder requestCount = new LongAdder();

    public void onRequest() {
        requestCount.increment(); // typically faster than AtomicLong.incrementAndGet() under high contention
    }

    public long getTotalRequests() {
        return requestCount.sum(); // not an atomic snapshot: updates made during the call may be missed
    }
}
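The heading above also names LongAccumulator, which generalizes LongAdder from addition to an arbitrary accumulation function. The sketch below tracks a running maximum with Long::max; MaxLatencyTracker and its methods are hypothetical names, and the function passed to LongAccumulator should be side-effect free.

```java
import java.util.concurrent.atomic.LongAccumulator;

// Hypothetical demo class; names are illustrative.
class MaxLatencyTracker {
    // Accumulates with Long::max, starting from the identity value Long.MIN_VALUE.
    private final LongAccumulator maxLatencyMs = new LongAccumulator(Long::max, Long.MIN_VALUE);

    void record(long latencyMs) {
        maxLatencyMs.accumulate(latencyMs);
    }

    long max() {
        return maxLatencyMs.get();
    }

    /** Records each sample from its own thread and returns the largest one. */
    static long maxOf(long... samples) throws InterruptedException {
        MaxLatencyTracker tracker = new MaxLatencyTracker();
        Thread[] threads = new Thread[samples.length];
        for (int i = 0; i < samples.length; i++) {
            long sample = samples[i];
            threads[i] = new Thread(() -> tracker.record(sample));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return tracker.max();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(maxOf(12, 87, 45)); // 87
    }
}
```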

13. The Fork/Join Framework

The Fork/Join framework is a parallel processing framework introduced in Java 7, designed to run divide-and-conquer algorithms efficiently on multiple threads. It uses a work-stealing algorithm to balance load across threads automatically.

import java.util.concurrent.*;

public class ParallelSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000;
    private final long[] array;
    private final int start;
    private final int end;

    public ParallelSum(long[] array, int start, int end) {
        this.array = array;
        this.start = start;
        this.end = end;
    }

    @Override
    protected Long compute() {
        int length = end - start;
        if (length <= THRESHOLD) {
            // Base case: sum sequentially
            long sum = 0;
            for (int i = start; i < end; i++) {
                sum += array[i];
            }
            return sum;
        }

        // Recursive case: split the work in half
        int mid = start + length / 2;
        ParallelSum leftTask = new ParallelSum(array, start, mid);
        ParallelSum rightTask = new ParallelSum(array, mid, end);

        leftTask.fork();  // schedule the left task asynchronously (another worker may steal it)
        Long rightResult = rightTask.compute(); // run the right task on the current thread
        Long leftResult = leftTask.join();      // wait for the left task's result

        return leftResult + rightResult;
    }

    public static void main(String[] args) {
        long[] array = new long[10_000_000];
        for (int i = 0; i < array.length; i++) {
            array[i] = i + 1;
        }

        ForkJoinPool pool = new ForkJoinPool(); // parallelism defaults to the number of available processors
        ParallelSum task = new ParallelSum(array, 0, array.length);

        long startTime = System.currentTimeMillis();
        long result = pool.invoke(task);
        long endTime = System.currentTimeMillis();

        System.out.println("Sum: " + result);
        System.out.println("Elapsed: " + (endTime - startTime) + "ms");
        System.out.println("Parallelism level: " + pool.getParallelism());
    }
}

Java 8 parallel streams use the common ForkJoinPool internally, and Arrays.parallelSort() is also built on the Fork/Join framework. Fork/Join is a poor fit for I/O-bound work, however; it is most effective for pure CPU-bound computation.
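For comparison with the hand-written RecursiveTask, the same kind of sum can be expressed as a parallel stream, which delegates the splitting and joining to the common pool. A minimal sketch with hypothetical names:

```java
import java.util.stream.LongStream;

// Hypothetical demo class; names are illustrative.
class ParallelStreamSum {
    /** Sums 1..n using a parallel stream. */
    static long sumTo(long n) {
        return LongStream.rangeClosed(1, n)
                .parallel() // the range is split and processed by ForkJoinPool.commonPool()
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(sumTo(10_000_000)); // 50000005000000
    }
}
```

Because the common pool is shared by the whole JVM, blocking calls inside a parallel stream can starve unrelated parallel work, which is one reason the I/O-bound caveat applies here too.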

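14. Multithreading Best Practices in Production

14.1 Use Immutable Objects

An immutable object's state never changes after construction, so it is inherently thread-safe. No synchronization is needed at all, which is good for both performance and safety.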

public final class ImmutableOrder {
    private final String orderId;
    private final String productName;
    private final int quantity;
    private final java.time.LocalDateTime createdAt;

    public ImmutableOrder(String orderId, String productName, int quantity) {
        this.orderId = orderId;
        this.productName = productName;
        this.quantity = quantity;
        this.createdAt = java.time.LocalDateTime.now();
    }

    // Getters only, no setters
    public String getOrderId() { return orderId; }
    public String getProductName() { return productName; }
    public int getQuantity() { return quantity; }
    public java.time.LocalDateTime getCreatedAt() { return createdAt; }

    // When a change is needed, create a new object (note: this constructor also resets createdAt)
    public ImmutableOrder withQuantity(int newQuantity) {
        return new ImmutableOrder(this.orderId, this.productName, newQuantity);
    }
}

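14.2 Using Thread Pools Correctly

import java.util.concurrent.*;

public class ThreadPoolBestPractice {
    // Recommended in practice: configure ThreadPoolExecutor directly
    private final ThreadPoolExecutor executor = new ThreadPoolExecutor(
        4,                      // corePoolSize: number of core threads
        8,                      // maximumPoolSize: upper bound, reached only once the queue is full
        60L, TimeUnit.SECONDS,  // keep-alive time for idle threads above the core size
        new LinkedBlockingQueue<>(1000), // work queue (always bound its size)
        new ThreadFactory() {
            // newThread may be called from several threads, so use an atomic counter
            private final java.util.concurrent.atomic.AtomicInteger count =
                new java.util.concurrent.atomic.AtomicInteger();
            @Override
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, "order-worker-" + count.getAndIncrement());
                t.setDaemon(false);
                return t;
            }
        },
        new ThreadPoolExecutor.CallerRunsPolicy() // rejection policy: the submitting thread runs the task
    );

    // Rules of thumb, to be validated by measurement:
    // CPU-bound: number of cores + 1
    // I/O-bound: cores * 2, or cores * (1 + wait time / service time)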
    public static int optimalPoolSize(boolean isCpuBound) {
        int cores = Runtime.getRuntime().availableProcessors();
        return isCpuBound ? cores + 1 : cores * 2;
    }

    public void shutdown() {
        executor.shutdown();
        try {
            if (!executor.awaitTermination(30, TimeUnit.SECONDS)) {
                executor.shutdownNow();
                if (!executor.awaitTermination(30, TimeUnit.SECONDS)) {
                    System.err.println("The thread pool did not terminate cleanly.");
                }
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
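The I/O-bound sizing formula quoted in the comment above, cores * (1 + wait time / service time), can be worked through with concrete numbers. The helper below is a hypothetical sketch; the 90 ms wait and 10 ms service figures are made-up inputs, and the result is only a starting point to be tuned by measurement.

```java
// Hypothetical helper; names and sample figures are illustrative.
class PoolSizing {
    /** cores * (1 + wait / service), rounded up. */
    static int ioBoundPoolSize(int cores, double waitMs, double serviceMs) {
        return (int) Math.ceil(cores * (1 + waitMs / serviceMs));
    }

    public static void main(String[] args) {
        // 8 cores, each task waits 90 ms on I/O for every 10 ms of CPU work:
        // 8 * (1 + 90 / 10) = 80 threads
        System.out.println(ioBoundPoolSize(8, 90, 10)); // 80

        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("This machine: " + ioBoundPoolSize(cores, 90, 10));
    }
}
```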

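14.3 Tips for Debugging Concurrency Problems

Concurrency bugs are hard to reproduce and tricky to debug. The following tools and techniques help.

Thread dump analysis: The jstack <pid> command shows the state and stack trace of every thread. If a deadlock has occurred, it is detected automatically and reported in the output.
Concurrency testing: A CountDownLatch can make many threads start working at the same moment, which helps reproduce race conditions.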

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrencyTest {
    public static void main(String[] args) throws InterruptedException {
        int threadCount = 100;
        CountDownLatch startLatch = new CountDownLatch(1);
        CountDownLatch endLatch = new CountDownLatch(threadCount);
        AtomicInteger counter = new AtomicInteger(0);

        for (int i = 0; i < threadCount; i++) {
            new Thread(() -> {
                try {
                    startLatch.await(); // every thread waits here until released
                    counter.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    endLatch.countDown();
                }
            }).start();
        }

        startLatch.countDown(); // release all threads at once
        endLatch.await();       // wait for all threads to finish
        System.out.println("Final counter: " + counter.get()); // always 100
    }
}

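Static analysis tools: IntelliJ IDEA's threading inspections, SpotBugs (the successor to FindBugs), and Google's Error Prone can flag potential concurrency bugs without running the code.

14.4 Summary of Core Principles

Principle	Description
Minimize mutable state	Make heavy use of immutable objects and final fields
Minimize sharing	Reduce the mutable state shared between threads; consider ThreadLocal
Use high-level APIs	Prefer the tools in java.util.concurrent over raw synchronized
Minimize lock scope	Keep critical sections as narrow as possible to reduce contention
Document	Clearly document the thread-safety level of classes and methods
Test	Unit tests alone are not enough for concurrent code; combine them with load and stress tests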

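Conclusion

Java multithreading is a broad and deep subject. This article covered a wide range, from basic thread creation, through low-level synchronization mechanisms such as synchronized and volatile, to the high-level tools in java.util.concurrent and production best practices.

The key is to minimize shared mutable state and, where sharing is unavoidable, to choose a synchronization tool at the right level. Use volatile for a simple flag, AtomicInteger for an atomic counter, ReentrantLock or concurrent collections for complex business logic, and CompletableFuture for asynchronous workflows. Developing an eye for the best tool in each situation is what matters.

In Spring Boot backend development in particular, the framework manages thread pools internally, but this knowledge becomes essential as soon as custom asynchronous processing or access to shared resources is involved. Java also continues to gain new concurrency models, such as virtual threads (Project Loom, finalized in Java 21) and structured concurrency, so the best approach is to build a solid foundation first and learn the new techniques on top of it.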