[3rd_week-day3]Python으로 데이터 다루기 Ⅱ - seaborn
Seaborn
Matplotlib를 기반으로 더 다양한 시각화 방법을 제공하는 라이브러리
- 커널밀도그림
- 카운트그림
- 캣그림
- 스트립그림
- 히트맵
Seaborn Import 하기
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
커널밀도그림 (Kernel Density Plot)
- 히스토그램과 같은 연속적인 분포를 곡선화해서 그린 그림
sns.kdeplot()
# in Histogram
x = np.arange(0, 22, 2)
y = np.random.randint(0, 20, 20)
plt.hist(y, bins=x)
(array([1., 2., 5., 0., 2., 2., 3., 3., 1., 1.]),
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]),
<BarContainer object of 10 artists>)
# kdeplot
sns.kdeplot(y, shade=True)
plt.show()
카운트그림 (Count plot)
- 범주형 column의 빈도수를 시각화 -> Groupby 후의 도수를 하는 것과 동일한 효과
- sns.countplot()
vote_df = pd.DataFrame({"name":['Andy', 'Bob', 'Cat'], "vote":[True, True, False]})
vote_df
name | vote | |
---|---|---|
0 | Andy | True |
1 | Bob | True |
2 | Cat | False |
# in matplotlib barplot
vote_count = vote_df.groupby('vote').count()
vote_count
name | |
---|---|
vote | |
False | 1 |
True | 2 |
plt.bar(x=[False, True], height=vote_count['name'])
plt.show()
# sns의 countplot
sns.countplot(x=vote_df['vote'])
plt.show()
캣그림 (Cat Plot)
- 숫자형 변수와 하나 이상의 범주형 변수의 관계를 보여주는 함수
sns.catplot()
covid = pd.read_csv("./country_wise_latest.csv")
covid.head(5)
Country/Region | Confirmed | Deaths | Recovered | Active | New cases | New deaths | New recovered | Deaths / 100 Cases | Recovered / 100 Cases | Deaths / 100 Recovered | Confirmed last week | 1 week change | 1 week % increase | WHO Region | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Afghanistan | 36263 | 1269 | 25198 | 9796 | 106 | 10 | 18 | 3.50 | 69.49 | 5.04 | 35526 | 737 | 2.07 | Eastern Mediterranean |
1 | Albania | 4880 | 144 | 2745 | 1991 | 117 | 6 | 63 | 2.95 | 56.25 | 5.25 | 4171 | 709 | 17.00 | Europe |
2 | Algeria | 27973 | 1163 | 18837 | 7973 | 616 | 8 | 749 | 4.16 | 67.34 | 6.17 | 23691 | 4282 | 18.07 | Africa |
3 | Andorra | 907 | 52 | 803 | 52 | 10 | 0 | 0 | 5.73 | 88.53 | 6.48 | 884 | 23 | 2.60 | Europe |
4 | Angola | 950 | 41 | 242 | 667 | 18 | 1 | 0 | 4.32 | 25.47 | 16.94 | 749 | 201 | 26.84 | Africa |
s = sns.catplot(x='WHO Region', y='Confirmed' , data=covid) # hue 속성은 점들 간의 분포를 나눠서 다룰 수 있다(?) - EDA에서 다뤄볼 것..
s.fig.set_size_inches(10, 6)
plt.show() # strip plot 형태.
s = sns.catplot(x='WHO Region', y='Confirmed' , data=covid, kind="violin")
s.fig.set_size_inches(10, 6)
plt.show() # violin plot 형태.
스트립그림 (Strip Plot)
- scatter plot과 유사하게 데이터의 수치를 표현하는 그래프
sns.stripplot()
sns.stripplot(x="WHO Region", y="Recovered", data=covid)
plt.show()
# cf) swarmplot
# 값이 겹치면 양옆으로 분산 시켜준 형태
s = sns.swarmplot(x="WHO Region", y="Recovered", data=covid)
plt.show()
C:\Users\Taeho Kim\AppData\Local\Programs\Python\Python38\lib\site-packages\seaborn\categorical.py:1296: UserWarning: 22.7% of the points cannot be placed; you may want to decrease the size of the markers or use stripplot.
warnings.warn(msg, UserWarning)
C:\Users\Taeho Kim\AppData\Local\Programs\Python\Python38\lib\site-packages\seaborn\categorical.py:1296: UserWarning: 69.6% of the points cannot be placed; you may want to decrease the size of the markers or use stripplot.
warnings.warn(msg, UserWarning)
C:\Users\Taeho Kim\AppData\Local\Programs\Python\Python38\lib\site-packages\seaborn\categorical.py:1296: UserWarning: 79.2% of the points cannot be placed; you may want to decrease the size of the markers or use stripplot.
warnings.warn(msg, UserWarning)
C:\Users\Taeho Kim\AppData\Local\Programs\Python\Python38\lib\site-packages\seaborn\categorical.py:1296: UserWarning: 54.3% of the points cannot be placed; you may want to decrease the size of the markers or use stripplot.
warnings.warn(msg, UserWarning)
C:\Users\Taeho Kim\AppData\Local\Programs\Python\Python38\lib\site-packages\seaborn\categorical.py:1296: UserWarning: 31.2% of the points cannot be placed; you may want to decrease the size of the markers or use stripplot.
warnings.warn(msg, UserWarning)
히트맵 (Heatmap)
- 데이터의 행렬을 색상으로 표현해주는 그래프
sns.heatmap()
# 히트맵 예제
covid.corr()
Confirmed | Deaths | Recovered | Active | New cases | New deaths | New recovered | Deaths / 100 Cases | Recovered / 100 Cases | Deaths / 100 Recovered | Confirmed last week | 1 week change | 1 week % increase | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Confirmed | 1.000000 | 0.934698 | 0.906377 | 0.927018 | 0.909720 | 0.871683 | 0.859252 | 0.063550 | -0.064815 | 0.025175 | 0.999127 | 0.954710 | -0.010161 |
Deaths | 0.934698 | 1.000000 | 0.832098 | 0.871586 | 0.806975 | 0.814161 | 0.765114 | 0.251565 | -0.114529 | 0.169006 | 0.939082 | 0.855330 | -0.034708 |
Recovered | 0.906377 | 0.832098 | 1.000000 | 0.682103 | 0.818942 | 0.820338 | 0.919203 | 0.048438 | 0.026610 | -0.027277 | 0.899312 | 0.910013 | -0.013697 |
Active | 0.927018 | 0.871586 | 0.682103 | 1.000000 | 0.851190 | 0.781123 | 0.673887 | 0.054380 | -0.132618 | 0.058386 | 0.931459 | 0.847642 | -0.003752 |
New cases | 0.909720 | 0.806975 | 0.818942 | 0.851190 | 1.000000 | 0.935947 | 0.914765 | 0.020104 | -0.078666 | -0.011637 | 0.896084 | 0.959993 | 0.030791 |
New deaths | 0.871683 | 0.814161 | 0.820338 | 0.781123 | 0.935947 | 1.000000 | 0.889234 | 0.060399 | -0.062792 | -0.020750 | 0.862118 | 0.894915 | 0.025293 |
New recovered | 0.859252 | 0.765114 | 0.919203 | 0.673887 | 0.914765 | 0.889234 | 1.000000 | 0.017090 | -0.024293 | -0.023340 | 0.839692 | 0.954321 | 0.032662 |
Deaths / 100 Cases | 0.063550 | 0.251565 | 0.048438 | 0.054380 | 0.020104 | 0.060399 | 0.017090 | 1.000000 | -0.168920 | 0.334594 | 0.069894 | 0.015095 | -0.134534 |
Recovered / 100 Cases | -0.064815 | -0.114529 | 0.026610 | -0.132618 | -0.078666 | -0.062792 | -0.024293 | -0.168920 | 1.000000 | -0.295381 | -0.064600 | -0.063013 | -0.394254 |
Deaths / 100 Recovered | 0.025175 | 0.169006 | -0.027277 | 0.058386 | -0.011637 | -0.020750 | -0.023340 | 0.334594 | -0.295381 | 1.000000 | 0.030460 | -0.013763 | -0.049083 |
Confirmed last week | 0.999127 | 0.939082 | 0.899312 | 0.931459 | 0.896084 | 0.862118 | 0.839692 | 0.069894 | -0.064600 | 0.030460 | 1.000000 | 0.941448 | -0.015247 |
1 week change | 0.954710 | 0.855330 | 0.910013 | 0.847642 | 0.959993 | 0.894915 | 0.954321 | 0.015095 | -0.063013 | -0.013763 | 0.941448 | 1.000000 | 0.026594 |
1 week % increase | -0.010161 | -0.034708 | -0.013697 | -0.003752 | 0.030791 | 0.025293 | 0.032662 | -0.134534 | -0.394254 | -0.049083 | -0.015247 | 0.026594 | 1.000000 |
sns.heatmap(covid.corr())
<AxesSubplot:>