Seaborn

Matplotlib를 기반으로 더 다양한 시각화 방법을 제공하는 라이브러리

  • 커널밀도그림
  • 카운트그림
  • 캣그림
  • 스트립그림
  • 히트맵

Seaborn Import 하기

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

커널밀도그림 (Kernel Density Plot)

  • 히스토그램과 같은 연속적인 분포를 곡선화해서 그린 그림
  • sns.kdeplot()
# in Histogram

x = np.arange(0, 22, 2)
y = np.random.randint(0, 20, 20)

plt.hist(y, bins=x)
(array([1., 2., 5., 0., 2., 2., 3., 3., 1., 1.]),
 array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20]),
 <BarContainer object of 10 artists>)

png

# kdeplot

sns.kdeplot(y, shade=True)

plt.show()

png

카운트그림 (Count plot)

  • 범주형 column의 빈도수를 시각화 -> Groupby 후의 도수를 하는 것과 동일한 효과
  • sns.countplot()
vote_df = pd.DataFrame({"name":['Andy', 'Bob', 'Cat'], "vote":[True, True, False]})
vote_df
name vote
0 Andy True
1 Bob True
2 Cat False
# in matplotlib barplot

vote_count = vote_df.groupby('vote').count()
vote_count
name
vote
False 1
True 2
plt.bar(x=[False, True], height=vote_count['name'])
plt.show()

png

# sns의 countplot

sns.countplot(x=vote_df['vote'])

plt.show()

png

캣그림 (Cat Plot)

  • 숫자형 변수와 하나 이상의 범주형 변수의 관계를 보여주는 함수
  • sns.catplot()
covid = pd.read_csv("./country_wise_latest.csv")

covid.head(5)
Country/Region Confirmed Deaths Recovered Active New cases New deaths New recovered Deaths / 100 Cases Recovered / 100 Cases Deaths / 100 Recovered Confirmed last week 1 week change 1 week % increase WHO Region
0 Afghanistan 36263 1269 25198 9796 106 10 18 3.50 69.49 5.04 35526 737 2.07 Eastern Mediterranean
1 Albania 4880 144 2745 1991 117 6 63 2.95 56.25 5.25 4171 709 17.00 Europe
2 Algeria 27973 1163 18837 7973 616 8 749 4.16 67.34 6.17 23691 4282 18.07 Africa
3 Andorra 907 52 803 52 10 0 0 5.73 88.53 6.48 884 23 2.60 Europe
4 Angola 950 41 242 667 18 1 0 4.32 25.47 16.94 749 201 26.84 Africa
s = sns.catplot(x='WHO Region', y='Confirmed' , data=covid) # hue 속성은 점들 간의 분포를 나눠서 다룰 수 있다(?) - EDA에서 다뤄볼 것..
s.fig.set_size_inches(10, 6)
plt.show() # strip plot 형태.

s = sns.catplot(x='WHO Region', y='Confirmed' , data=covid, kind="violin")
s.fig.set_size_inches(10, 6)
plt.show() # violin plot 형태.

png

png

스트립그림 (Strip Plot)

  • scatter plot과 유사하게 데이터의 수치를 표현하는 그래프
  • sns.stripplot()
sns.stripplot(x="WHO Region", y="Recovered", data=covid)

plt.show()

png

# cf) swarmplot
# 값이 겹치면 양옆으로 분산 시켜준 형태

s = sns.swarmplot(x="WHO Region", y="Recovered", data=covid)

plt.show()

C:\Users\Taeho Kim\AppData\Local\Programs\Python\Python38\lib\site-packages\seaborn\categorical.py:1296: UserWarning: 22.7% of the points cannot be placed; you may want to decrease the size of the markers or use stripplot.
  warnings.warn(msg, UserWarning)
C:\Users\Taeho Kim\AppData\Local\Programs\Python\Python38\lib\site-packages\seaborn\categorical.py:1296: UserWarning: 69.6% of the points cannot be placed; you may want to decrease the size of the markers or use stripplot.
  warnings.warn(msg, UserWarning)
C:\Users\Taeho Kim\AppData\Local\Programs\Python\Python38\lib\site-packages\seaborn\categorical.py:1296: UserWarning: 79.2% of the points cannot be placed; you may want to decrease the size of the markers or use stripplot.
  warnings.warn(msg, UserWarning)
C:\Users\Taeho Kim\AppData\Local\Programs\Python\Python38\lib\site-packages\seaborn\categorical.py:1296: UserWarning: 54.3% of the points cannot be placed; you may want to decrease the size of the markers or use stripplot.
  warnings.warn(msg, UserWarning)
C:\Users\Taeho Kim\AppData\Local\Programs\Python\Python38\lib\site-packages\seaborn\categorical.py:1296: UserWarning: 31.2% of the points cannot be placed; you may want to decrease the size of the markers or use stripplot.
  warnings.warn(msg, UserWarning)

png

히트맵 (Heatmap)

  • 데이터의 행렬을 색상으로 표현해주는 그래프
  • sns.heatmap()
# 히트맵 예제

covid.corr()

Confirmed Deaths Recovered Active New cases New deaths New recovered Deaths / 100 Cases Recovered / 100 Cases Deaths / 100 Recovered Confirmed last week 1 week change 1 week % increase
Confirmed 1.000000 0.934698 0.906377 0.927018 0.909720 0.871683 0.859252 0.063550 -0.064815 0.025175 0.999127 0.954710 -0.010161
Deaths 0.934698 1.000000 0.832098 0.871586 0.806975 0.814161 0.765114 0.251565 -0.114529 0.169006 0.939082 0.855330 -0.034708
Recovered 0.906377 0.832098 1.000000 0.682103 0.818942 0.820338 0.919203 0.048438 0.026610 -0.027277 0.899312 0.910013 -0.013697
Active 0.927018 0.871586 0.682103 1.000000 0.851190 0.781123 0.673887 0.054380 -0.132618 0.058386 0.931459 0.847642 -0.003752
New cases 0.909720 0.806975 0.818942 0.851190 1.000000 0.935947 0.914765 0.020104 -0.078666 -0.011637 0.896084 0.959993 0.030791
New deaths 0.871683 0.814161 0.820338 0.781123 0.935947 1.000000 0.889234 0.060399 -0.062792 -0.020750 0.862118 0.894915 0.025293
New recovered 0.859252 0.765114 0.919203 0.673887 0.914765 0.889234 1.000000 0.017090 -0.024293 -0.023340 0.839692 0.954321 0.032662
Deaths / 100 Cases 0.063550 0.251565 0.048438 0.054380 0.020104 0.060399 0.017090 1.000000 -0.168920 0.334594 0.069894 0.015095 -0.134534
Recovered / 100 Cases -0.064815 -0.114529 0.026610 -0.132618 -0.078666 -0.062792 -0.024293 -0.168920 1.000000 -0.295381 -0.064600 -0.063013 -0.394254
Deaths / 100 Recovered 0.025175 0.169006 -0.027277 0.058386 -0.011637 -0.020750 -0.023340 0.334594 -0.295381 1.000000 0.030460 -0.013763 -0.049083
Confirmed last week 0.999127 0.939082 0.899312 0.931459 0.896084 0.862118 0.839692 0.069894 -0.064600 0.030460 1.000000 0.941448 -0.015247
1 week change 0.954710 0.855330 0.910013 0.847642 0.959993 0.894915 0.954321 0.015095 -0.063013 -0.013763 0.941448 1.000000 0.026594
1 week % increase -0.010161 -0.034708 -0.013697 -0.003752 0.030791 0.025293 0.032662 -0.134534 -0.394254 -0.049083 -0.015247 0.026594 1.000000
sns.heatmap(covid.corr())
<AxesSubplot:>

png