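Chapter 5 Visual Analysis of Data
 5.1 Drawing Special Statistical Charts
 5.1.1 Mathematical Function Plots
 5.1.2 Bubble Chart
 5.1.3 3D Surface Plot
 5.1.4 3D Scatter Plot
 5.2 Seaborn Statistical Plotting
 5.2.1 Basic Concepts
 5.2.2 Common Statistical Charts
 5.3 ggplot Plotting System
 5.3.1 Quick Plots with qplot
 5.3.2 Basic Plotting with ggplot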

Chapter 5 Visual Analysis of Data

5.1 Drawing Special Statistical Charts

5.1.1 Mathematical Function Plots

(1) Elementary function plots

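In [11]:
import matplotlib.pyplot as plt            # load the basic plotting package
plt.rcParams['font.sans-serif']=['KaiTi']  # KaiTi font for Chinese labels (SimHei is an alternative)
plt.rcParams['axes.unicode_minus']=False   # display minus signs correctly in figures
import numpy as np # load the numpy package
import math        # load the math package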
In [12]:
x=np.linspace(0,2*math.pi);x # 50 evenly spaced points on [0, 2*pi]
Out[12]:
array([0.        , 0.12822827, 0.25645654, 0.38468481, 0.51291309,
       0.64114136, 0.76936963, 0.8975979 , 1.02582617, 1.15405444,
       1.28228272, 1.41051099, 1.53873926, 1.66696753, 1.7951958 ,
       1.92342407, 2.05165235, 2.17988062, 2.30810889, 2.43633716,
       2.56456543, 2.6927937 , 2.82102197, 2.94925025, 3.07747852,
       3.20570679, 3.33393506, 3.46216333, 3.5903916 , 3.71861988,
       3.84684815, 3.97507642, 4.10330469, 4.23153296, 4.35976123,
       4.48798951, 4.61621778, 4.74444605, 4.87267432, 5.00090259,
       5.12913086, 5.25735913, 5.38558741, 5.51381568, 5.64204395,
       5.77027222, 5.89850049, 6.02672876, 6.15495704, 6.28318531])
In [13]:
plt.plot(x,np.sin(x)) #y=sinx
Out[13]:
[<matplotlib.lines.Line2D at 0x233019a9dd8>]
In [14]:
plt.plot(x,np.cos(x)) #y=cosx
Out[14]:
[<matplotlib.lines.Line2D at 0x233012f09e8>]
In [15]:
plt.plot(x[1:],np.log(x[1:])) #y=lnx; skip x[0]=0, where ln x is undefined
Out[15]:
[<matplotlib.lines.Line2D at 0x233019fa128>]
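
Because `np.linspace(0,2*math.pi)` starts at exactly 0, where ln x is undefined, `np.log` evaluates to `-inf` at the first grid point and NumPy emits a divide-by-zero `RuntimeWarning`. The following is a minimal sketch of one way to avoid this, assuming it is acceptable to drop the non-positive points before plotting; the names `x_pos` and `y` are illustrative and do not come from the notebook.

```python
import numpy as np

x = np.linspace(0, 2 * np.pi)   # 50 points by default; the first one is exactly 0

# Keep only the strictly positive points, where ln(x) is defined.
x_pos = x[x > 0]
y = np.log(x_pos)

print(len(x), len(x_pos))       # 50 49
print(np.isfinite(y).all())     # True
```

Passing `x_pos` and `y` to `plt.plot` then draws the logarithm without the warning.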
In [16]:
plt.plot(x,np.exp(x)) #y=e^x
Out[16]:
[<matplotlib.lines.Line2D at 0x233012db7b8>]

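(2) Polar-coordinate plot (with formula)

In [17]:
t=np.linspace(0,2*math.pi)   # angle parameter on [0, 2*pi]
x=2*np.sin(t)                # horizontal semi-axis 2
y=3*np.cos(t)                # vertical semi-axis 3
plt.plot(x,y)
plt.text(0,0,r'$\frac{x^2}{2^2}+\frac{y^2}{3^2}=1$',fontsize=20)
Out[17]:
Text(0,0,'$\\frac{x^2}{2^2}+\\frac{y^2}{3^2}=1$')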
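
The curve x = 2 sin t, y = 3 cos t is an ellipse with semi-axes 2 and 3, so its implicit equation is x^2/2^2 + y^2/3^2 = 1. The short check below is only a numerical sanity test of that identity on the same grid of `t` values; the variable name `lhs` is illustrative.

```python
import numpy as np

t = np.linspace(0, 2 * np.pi)
x = 2 * np.sin(t)
y = 3 * np.cos(t)

# Every point should satisfy x^2/2^2 + y^2/3^2 = 1 up to rounding error.
lhs = x**2 / 2**2 + y**2 / 3**2
print(np.allclose(lhs, 1.0))                    # True

# With the semi-axes left unsquared the identity does not hold.
print(np.allclose(x**2 / 2 + y**2 / 3, 1.0))    # False
```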

5.1.2 Bubble Chart

In [18]:
import pandas as pd
BSdata=pd.read_excel('DaPy_data.xlsx','BSdata')
# x: 身高 (height), y: 体重 (weight), bubble area s: 支出 (expenditure)
plt.scatter(BSdata['身高'], BSdata['体重'], s=BSdata['支出']);
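
The `s` argument of `plt.scatter` is interpreted as marker area in points squared, so raw data values can produce bubbles that are too small or too large to read. A possible remedy is to rescale the values to a fixed range of areas first. The helper `bubble_sizes`, its default limits, and the sample values below are hypothetical and only illustrate the idea.

```python
import numpy as np

def bubble_sizes(values, min_area=20.0, max_area=400.0):
    """Linearly rescale values to marker areas (points^2) for plt.scatter(s=...)."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:                       # all values equal: use a mid-sized bubble
        return np.full(values.shape, (min_area + max_area) / 2)
    return min_area + (values - values.min()) / span * (max_area - min_area)

# Hypothetical expenditure values standing in for BSdata['支出'].
spend = [5.0, 10.0, 20.0, 50.0]
sizes = bubble_sizes(spend)
print(sizes.min(), sizes.max())         # 20.0 400.0
```

With the real data, the result of `bubble_sizes(BSdata['支出'])` could be passed as `s=` in the cell above.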

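5.1.3 3D Surface Plot

In [19]:
from mpl_toolkits.mplot3d import Axes3D   # registers the '3d' projection on older Matplotlib
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')   # Axes3D(fig) no longer attaches itself in recent Matplotlib
X = np.arange(-4, 4, 0.5)
Y = np.arange(-4, 4, 0.5)
X, Y = np.meshgrid(X, Y)   # 2D coordinate grids
Z = np.sqrt(X**2 + Y**2)   # distance from the origin: a cone
ax.plot_surface(X, Y, Z);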
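
`plot_surface` expects `X`, `Y` and `Z` as 2D arrays of the same shape, which is what `np.meshgrid` provides. The sketch below only inspects the grid built in the cell above; the lowercase names `x` and `y` for the 1D axes are illustrative.

```python
import numpy as np

x = np.arange(-4, 4, 0.5)        # 16 values: -4.0, -3.5, ..., 3.5
y = np.arange(-4, 4, 0.5)
X, Y = np.meshgrid(x, y)         # both arrays are 16 x 16
Z = np.sqrt(X**2 + Y**2)         # height = distance from the origin

print(X.shape, Z.shape)          # (16, 16) (16, 16)
print(Z.min())                   # 0.0, reached at the grid point (0, 0)
```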

5.1.4 3D Scatter Plot

In [20]:
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(BSdata['身高'], BSdata['体重'], BSdata['支出'])
Out[20]:
<mpl_toolkits.mplot3d.art3d.Path3DCollection at 0x23301d77a90>

5.2 Seaborn Statistical Plotting

5.2.2 Common Statistical Charts

In [21]:
# first install the seaborn package on the system
!pip install seaborn
Requirement already satisfied: seaborn in d:\programfile\lib\site-packages (0.8.1)
Requirement already satisfied: numpy in d:\programfile\lib\site-packages (from seaborn) (1.16.2)
Requirement already satisfied: scipy in d:\programfile\lib\site-packages (from seaborn) (1.3.1)
Requirement already satisfied: matplotlib in d:\programfile\lib\site-packages (from seaborn) (2.2.2)
Requirement already satisfied: pandas in d:\programfile\lib\site-packages (from seaborn) (0.25.3)
5.2.2.1 Box plot (boxplot)
In [22]:
import seaborn as sns  # load the seaborn package
sns.boxplot(x=BSdata['身高'])
Out[22]:
<matplotlib.axes._subplots.AxesSubplot at 0x233119f1eb8>
In [23]:
# vertical box plot: pass the data as y instead of x
sns.boxplot(y=BSdata['身高']) 
Out[23]:
<matplotlib.axes._subplots.AxesSubplot at 0x23311a55a58>
In [24]:
# box plots grouped by 性别 (gender)
sns.boxplot(x='性别', y='身高',data=BSdata) 
Out[24]:
<matplotlib.axes._subplots.AxesSubplot at 0x23311aa9630>
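
To read the box plots, it helps to know which numbers they encode. The sketch below computes the quartiles, whisker ends and outliers, assuming the common 1.5 x IQR whisker rule (the default in Matplotlib and seaborn). The function `box_stats` and the sample `heights` are hypothetical; the sample is not taken from BSdata.

```python
import numpy as np

def box_stats(values, whis=1.5):
    """Return the quantities a box plot draws, using the whis * IQR fence rule."""
    values = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = values[(values >= lo_fence) & (values <= hi_fence)]
    return {
        "q1": q1, "median": med, "q3": q3,
        # Whiskers end at the most extreme observations inside the fences.
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        # Points beyond the fences are drawn individually as outliers.
        "outliers": values[(values < lo_fence) | (values > hi_fence)],
    }

heights = [158, 160, 162, 165, 167, 170, 172, 175, 198]  # hypothetical heights in cm
stats = box_stats(heights)
print(stats["q1"], stats["median"], stats["q3"])   # 162.0 167.0 172.0
print(stats["outliers"])                           # [198.]
```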
5.2.2.2 Violin plot (violinplot)
In [25]:
sns.violinplot(x='开设', y='支出', hue='性别', data=BSdata) 
Out[25]:
<matplotlib.axes._subplots.AxesSubplot at 0x23311b264a8>
5.2.2.3 Strip plot (stripplot)
In [26]:
sns.stripplot(x='性别', y='身高', data=BSdata, jitter=True) 
Out[26]:
<matplotlib.axes._subplots.AxesSubplot at 0x23313bb9908>
5.2.2.4 Bar plot (barplot)
In [27]:
sns.barplot(x='性别', y='身高', data=BSdata, ci=0, palette="Blues_d") 
Out[27]:
<matplotlib.axes._subplots.AxesSubplot at 0x23313c16898>
5.2.2.5 Count plot (countplot)
In [28]:
sns.countplot(x='性别', hue="开设", data=BSdata) 
Out[28]:
<matplotlib.axes._subplots.AxesSubplot at 0x23313c5d940>
5.2.2.6 Grouped relationship plot (factorplot)
In [29]:
# seaborn 0.9 renamed factorplot to catplot and the size argument to height
sns.factorplot(x='性别', col="开设", col_wrap=3, data=BSdata, kind="count", size=2.5, aspect=.8) 
Out[29]:
<seaborn.axisgrid.FacetGrid at 0x23313c55198>
5.2.2.7 Probability distribution plot (distplot)
In [30]:
sns.distplot(BSdata['身高'], kde=True, bins=20, rug=True); 
In [31]:
sns.jointplot(x='身高', y='体重', data=BSdata); 
In [32]:
# for several variables at once
sns.pairplot(BSdata[['身高','体重','支出']]);
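
When `distplot` overlays a KDE curve (`kde=True`), the histogram bars are scaled to a density rather than raw counts so that both share one vertical axis. The sketch below reproduces that normalization with `np.histogram(..., density=True)` for 20 bins. The random sample is a hypothetical stand-in for BSdata['身高'], and its mean and spread are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
heights = rng.normal(loc=168, scale=8, size=200)   # hypothetical sample

# density=True rescales the bar heights so that the total bar area equals 1.
density, edges = np.histogram(heights, bins=20, density=True)
widths = np.diff(edges)

print(len(density), len(edges))                    # 20 21
print(np.isclose((density * widths).sum(), 1.0))   # True
```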

5.3 ggplot Plotting System

In [33]:
# install the plotnine package on the system
!pip install plotnine
Collecting plotnine
  Downloading plotnine-0.7.1-py3-none-any.whl (4.4 MB)
ERROR: Exception:
pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.
In [34]:
from plotnine import *    # load all ggplot functions
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-34-57207855e582> in <module>()
----> 1 from plotnine import *    # load all ggplot functions

ModuleNotFoundError: No module named 'plotnine'

5.3.1 Quick Plots with qplot

5.3.1.1 Histogram
In [ ]:
qplot(x='身高',data=BSdata, geom='histogram')+ theme_grey(base_family = 'KaiTi')
5.3.1.2 Bar chart
In [ ]:
qplot('开设',data=BSdata, geom='bar')+ theme_grey(base_family = 'KaiTi')
5.3.1.3 Scatter plot
In [ ]:
qplot('身高', '体重', data=BSdata, color='性别') + theme_grey(base_family = 'KaiTi')

5.3.2 Basic Plotting with ggplot

5.3.2.2 Layer concept
In [ ]:
# draw the empty Cartesian coordinate system
GP=ggplot(BSdata,aes(x='身高',y='体重'))+ theme_grey(base_family = 'KaiTi');GP 
In [ ]:
# add a point layer
GP + geom_point() + theme_grey(base_family = 'KaiTi')
In [ ]:
# add a line layer
GP + geom_line()+ theme_grey(base_family = 'KaiTi')
5.3.2.3 Common plots

(1) Histogram

In [ ]:
ggplot(BSdata,aes(x='身高'))+ geom_histogram()+ theme_grey(base_family = 'KaiTi')

(2) Scatter plot

In [ ]:
ggplot(BSdata,aes(x='身高',y='体重')) + geom_point() + theme_grey(base_family = 'KaiTi')
In [ ]:
# draw different groups with different markers (shape) or colors (color)
ggplot(BSdata,aes(x='身高',y='体重',color='性别'))+geom_point()+ theme_grey(base_family = 'KaiTi')

(3) Line plot

In [ ]:
ggplot(BSdata,aes(x='支出'))+geom_line(aes(y='身高')) + theme_grey(base_family = 'KaiTi')
In [ ]:
# share one coordinate system and draw several y variables
ggplot(BSdata,aes(x='支出'))+geom_line(aes(y='身高')) +geom_line(aes(y='体重'))+ theme_grey(base_family = 'KaiTi')

(4) Faceted plot

In [ ]:
ggplot(BSdata,aes(x='身高',y='体重'))+geom_point()+facet_wrap('性别') + theme_grey(base_family = 'KaiTi')
5.3.2.4 Plot themes
In [ ]:
# pass the font to theme_bw itself; adding theme_grey afterwards would replace theme_bw
ggplot(BSdata,aes(x='身高',y='体重',color='性别'))+geom_point()+theme_bw(base_family = 'KaiTi')