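To meet the strong demand for data science training among faculty and students at finance and economics universities, and to support the renovation of library spaces, the China Finance and Economics Education Resource Sharing Alliance (中国财经教育资源共享联盟) plans to launch the "Data Science Space" project. This Finance and Economics Data Science Bootcamp is an important part of that project. Recordings of the bootcamp and other course files will be published on the finance and economics MOOC platform: https://www.cjmooc.com.cn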

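If a package turns out to be missing while you run the notebook, create a new cell, enter the following command, and then restart the kernel.
- !pip install <package-name> --user -i https://pypi.tuna.tsinghua.edu.cn/simple
The data files are already deployed on the hands-on platform, in the same directory as this notebook, so they can be loaded directly by file name.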
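
Before running the pip command above, it can help to check which packages are actually missing. The following is a minimal sketch using only the standard library; the helper name `is_installed` and the list of module names are illustrative assumptions, not part of the course material.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


# Modules used later in this notebook, plus one from the standard library.
for name in ("json", "pyecharts", "jieba"):
    status = "available" if is_installed(name) else "missing, install it with pip"
    print(f"{name}: {status}")
```

Only the modules reported as missing need the `!pip install` step and the kernel restart.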

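Introduction to Pyecharts¶

Overview¶

ECharts is a data visualization library open-sourced by Baidu. Its strong interactivity and carefully designed charts have earned it recognition from many developers. Python is an expressive language that is well suited to data processing. pyecharts was created to bring the two together: data analysis in Python, visualization with ECharts.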

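Features
- Concise API design with support for method chaining
- More than 30 common chart types
- Support for the mainstream notebook environments, Jupyter Notebook and JupyterLab
- Easy integration with mainstream web frameworks such as Flask and Django
- Highly flexible configuration options for building polished charts
- Detailed documentation and examples that help developers get started quickly
- More than 400 map files plus native Baidu Map support for geographic data visualization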

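Installation¶

Note: the pyecharts.org site does not keep versioned documentation; what you see there is always the documentation for the latest release. If the documentation does not match the version you are using, update pyecharts.

Use the following code to check the installed version.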

import pyecharts

print(pyecharts.__version__)

1.9.0
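
The code in this tutorial imports from `pyecharts.charts` and `pyecharts.options`, which is the 1.x style of the API; to my knowledge the 0.5.x releases used a different, incompatible import style. A minimal sketch for comparing a version string against a minimum follows. The helper `version_tuple` is a hypothetical name, and it simply ignores non-numeric parts such as pre-release tags.

```python
def version_tuple(version: str) -> tuple:
    """Turn a dotted version string such as '1.9.0' into (1, 9, 0).

    Parts that are not plain digits (for example 'rc1') are skipped.
    """
    return tuple(int(part) for part in version.split(".") if part.isdigit())


current = version_tuple("1.9.0")  # the version string printed above
print(current)                    # (1, 9, 0)
print(current >= (1, 0, 0))       # True
```

In a notebook, `pyecharts.__version__` can be passed in place of the literal string.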

To install:
- pip install pyecharts
If the download is slow, consider installing from a mirror:
- pip install pyecharts -i https://pypi.tuna.tsinghua.edu.cn/simple
To upgrade:
- pip install --upgrade pyecharts
Likewise, if the download is slow, consider upgrading from a mirror:
- pip install --upgrade pyecharts -i https://pypi.tuna.tsinghua.edu.cn/simple

Global Options¶

Global options are set with the set_global_opts method.

Pyecharts Examples¶

Bar Chart¶

from pyecharts.charts import Bar, Map  # chart classes
from pyecharts import options as opts  # configuration options
import pandas as pd
import re
import random

bar = Bar()
bar.add_xaxis(["衬衫", "羊毛衫", "雪纺衫", "裤子", "高跟鞋", "袜子"])

bar.add_yaxis("商家A", [5, 20, 36, 10, 75, 90])
bar.add_yaxis("商家B", [10, 20, 36, 10, 75, 90])

bar.set_series_opts(label_opts=opts.LabelOpts(is_show=True))

bar.set_global_opts(
    title_opts=opts.TitleOpts(title="商品售卖情况"),
    toolbox_opts=opts.ToolboxOpts(),
)

# render() writes a local HTML file; by default it creates render.html in the current directory
# A path can also be passed in, e.g. bar.render("mycharts.html")
# bar.render()
bar.render_notebook()  # display the chart directly in the notebook

Building the same chart with method chaining

c = (
    Bar()
    .add_xaxis(["衬衫", "羊毛衫", "雪纺衫", "裤子", "高跟鞋", "袜子"])
    .add_yaxis("商家A", [5, 20, 36, 10, 75, 90])
    .add_yaxis("商家B", [10, 20, 36, 10, 75, 90])
    .set_series_opts(label_opts=opts.LabelOpts(is_show=True))
    .set_global_opts(
        title_opts=opts.TitleOpts(title="商品售卖情况"),
        toolbox_opts=opts.ToolboxOpts(),
    )
)

c.render_notebook()
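
Method chaining works when each configuration method returns the object it was called on. The sketch below illustrates the idea with a hypothetical `MiniChart` class; it is not part of pyecharts and only mimics the shape of the calls above.

```python
class MiniChart:
    """Hypothetical stand-in for a chart class, used only to show chaining."""

    def __init__(self):
        self.x = []
        self.series = {}

    def add_xaxis(self, values):
        self.x = list(values)
        return self  # returning self is what makes the next call possible

    def add_yaxis(self, name, values):
        self.series[name] = list(values)
        return self


chart = (
    MiniChart()
    .add_xaxis(["a", "b"])
    .add_yaxis("s1", [1, 2])
    .add_yaxis("s2", [3, 4])
)
print(chart.series)  # {'s1': [1, 2], 's2': [3, 4]}
```

The step-by-step style and the chained style build the same object; chaining simply avoids repeating the variable name.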

Scatter Plot¶

import pyecharts.options as opts
from pyecharts.charts import Scatter


data = [
    [10.0, 8.04],
    [8.0, 6.95],
    [13.0, 7.58],
    [9.0, 8.81],
    [11.0, 8.33],
    [14.0, 9.96],
    [6.0, 7.24],
    [4.0, 4.26],
    [12.0, 10.84],
    [7.0, 4.82],
    [5.0, 5.68],
]
# data.sort(key=lambda x: x[0])
x_data = [d[0] for d in data]
y_data = [d[1] for d in data]

c = (
    Scatter()
    .add_xaxis(xaxis_data=x_data)
    .add_yaxis(
        series_name="1",
        y_axis=y_data,
        color='yellow',
        symbol_size=20,
    )
    .add_yaxis(
        series_name="2",
        y_axis=y_data,
        color='red',
        symbol_size=20,
    )
    .set_global_opts(
        xaxis_opts=opts.AxisOpts(
            type_="value", splitline_opts=opts.SplitLineOpts(is_show=True)
        ),
        yaxis_opts=opts.AxisOpts(
            type_="value",
            axistick_opts=opts.AxisTickOpts(is_show=True),
            splitline_opts=opts.SplitLineOpts(is_show=True),
        ),
    )

)
c.render_notebook()
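
The commented-out sort and the two list comprehensions in the cell above can also be written in one pass. This is a small sketch using the first four points of `data`; whether sorting is needed for the scatter plot itself is not stated in the source, so treat it as optional.

```python
# First four [x, y] pairs from the data list above.
points = [[10.0, 8.04], [8.0, 6.95], [13.0, 7.58], [9.0, 8.81]]

# Sort by the x value, then split the pairs into two parallel lists.
points_sorted = sorted(points, key=lambda p: p[0])
x_data, y_data = (list(column) for column in zip(*points_sorted))

print(x_data)  # [8.0, 9.0, 10.0, 13.0]
print(y_data)  # [6.95, 8.81, 8.04, 7.58]
```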

Using the visual map option (visualmap)

from pyecharts import options as opts
from pyecharts.charts import Scatter
from pyecharts.faker import Faker  # Faker provides built-in sample data; run help(Faker) for details

c = (
    Scatter()
    .add_xaxis(Faker.choose())
    .add_yaxis("商家A", Faker.values())
    .add_yaxis("商家B", Faker.values())
    .set_global_opts(
        title_opts=opts.TitleOpts(title="Scatter-VisualMap(Size)"),
        visualmap_opts=opts.VisualMapOpts(type_="size", max_=150, min_=20),
    )
)
c.render_notebook()
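
With `type_="size"`, the visual map turns a data value into a symbol size. The sketch below shows the underlying idea as a plain linear interpolation. It is a simplification, not the ECharts implementation: the output range of 20 to 50 is an assumed default, and out-of-range values are simply clamped here, whereas ECharts can style them separately.

```python
def scale_size(value, data_min=20, data_max=150, size_min=20, size_max=50):
    """Linearly map a value in [data_min, data_max] to [size_min, size_max].

    size_min and size_max are assumptions chosen for illustration.
    """
    clamped = max(data_min, min(data_max, value))
    ratio = (clamped - data_min) / (data_max - data_min)
    return size_min + ratio * (size_max - size_min)


for v in (20, 85, 150, 400):
    print(v, scale_size(v))
```

Values near `min_` get the smallest symbols and values near `max_` the largest, which is why `min_` and `max_` should bracket the actual data.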

Line Chart¶

from pyecharts.charts import Line

c = (
    Line()
    .add_xaxis(Faker.choose())
    .add_yaxis('line1',Faker.values())
    .add_yaxis('line2',Faker.values())
    .set_global_opts(
        title_opts=opts.TitleOpts(title='折线图')
    )
)
c.render_notebook()

3D Charts¶

import random
from pyecharts.charts import Bar3D


data = [(i, j, random.randint(0, 12)) for i in range(6) for j in range(24)]
data

[(0, 0, 2),
 (0, 1, 0),
 (0, 2, 12),
 (0, 3, 11),
 (0, 4, 0),
 (0, 5, 8),
 (0, 6, 10),
 (0, 7, 1),
 (0, 8, 0),
 (0, 9, 4),
 (0, 10, 8),
 (0, 11, 4),
 (0, 12, 0),
 (0, 13, 6),
 (0, 14, 7),
 (0, 15, 9),
 (0, 16, 10),
 (0, 17, 6),
 (0, 18, 4),
 (0, 19, 6),
 (0, 20, 4),
 (0, 21, 8),
 (0, 22, 6),
 (0, 23, 3),
 (1, 0, 4),
 (1, 1, 9),
 (1, 2, 12),
 (1, 3, 3),
 (1, 4, 4),
 (1, 5, 2),
 (1, 6, 0),
 (1, 7, 9),
 (1, 8, 1),
 (1, 9, 3),
 (1, 10, 2),
 (1, 11, 5),
 (1, 12, 11),
 (1, 13, 9),
 (1, 14, 5),
 (1, 15, 1),
 (1, 16, 4),
 (1, 17, 5),
 (1, 18, 12),
 (1, 19, 7),
 (1, 20, 8),
 (1, 21, 5),
 (1, 22, 9),
 (1, 23, 2),
 (2, 0, 8),
 (2, 1, 9),
 (2, 2, 2),
 (2, 3, 4),
 (2, 4, 11),
 (2, 5, 5),
 (2, 6, 8),
 (2, 7, 5),
 (2, 8, 11),
 (2, 9, 12),
 (2, 10, 0),
 (2, 11, 11),
 (2, 12, 8),
 (2, 13, 2),
 (2, 14, 5),
 (2, 15, 6),
 (2, 16, 7),
 (2, 17, 5),
 (2, 18, 8),
 (2, 19, 6),
 (2, 20, 1),
 (2, 21, 10),
 (2, 22, 8),
 (2, 23, 10),
 (3, 0, 8),
 (3, 1, 5),
 (3, 2, 2),
 (3, 3, 6),
 (3, 4, 5),
 (3, 5, 9),
 (3, 6, 2),
 (3, 7, 11),
 (3, 8, 4),
 (3, 9, 11),
 (3, 10, 6),
 (3, 11, 11),
 (3, 12, 4),
 (3, 13, 10),
 (3, 14, 10),
 (3, 15, 8),
 (3, 16, 0),
 (3, 17, 6),
 (3, 18, 12),
 (3, 19, 12),
 (3, 20, 5),
 (3, 21, 9),
 (3, 22, 2),
 (3, 23, 12),
 (4, 0, 0),
 (4, 1, 8),
 (4, 2, 10),
 (4, 3, 8),
 (4, 4, 0),
 (4, 5, 11),
 (4, 6, 10),
 (4, 7, 3),
 (4, 8, 11),
 (4, 9, 10),
 (4, 10, 6),
 (4, 11, 2),
 (4, 12, 5),
 (4, 13, 5),
 (4, 14, 1),
 (4, 15, 0),
 (4, 16, 0),
 (4, 17, 9),
 (4, 18, 3),
 (4, 19, 12),
 (4, 20, 5),
 (4, 21, 3),
 (4, 22, 2),
 (4, 23, 11),
 (5, 0, 12),
 (5, 1, 3),
 (5, 2, 10),
 (5, 3, 6),
 (5, 4, 8),
 (5, 5, 1),
 (5, 6, 0),
 (5, 7, 12),
 (5, 8, 3),
 (5, 9, 8),
 (5, 10, 9),
 (5, 11, 9),
 (5, 12, 8),
 (5, 13, 12),
 (5, 14, 1),
 (5, 15, 4),
 (5, 16, 0),
 (5, 17, 11),
 (5, 18, 3),
 (5, 19, 12),
 (5, 20, 4),
 (5, 21, 12),
 (5, 22, 6),
 (5, 23, 1)]

c = (
    Bar3D()
    .add(
        "",
        [[d[1], d[0], d[2]] for d in data],
        xaxis3d_opts=opts.Axis3DOpts(Faker.clock, type_="category"),
        yaxis3d_opts=opts.Axis3DOpts(Faker.week_en, type_="category"),
        zaxis3d_opts=opts.Axis3DOpts(type_="value"),
    )
    .set_global_opts(
        visualmap_opts=opts.VisualMapOpts(max_=20),
        title_opts=opts.TitleOpts(title="3D图形示例"),
    )
)
c.render_notebook()
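
Because the data above comes from `random.randint`, every run produces a different chart. A minimal sketch of a reproducible variant follows; the seed value 42 and the variable names are arbitrary choices for illustration. It also spells out the reordering done in the `add` call, where each `(day, hour, value)` tuple becomes `[hour, day, value]` so that the hour lands on the x axis.

```python
import random

rng = random.Random(42)  # fixed seed so the same values appear on every run

# Same shape as the tutorial data: 6 days x 24 hours, values from 0 to 12.
data = [(day, hour, rng.randint(0, 12)) for day in range(6) for hour in range(24)]

# Reorder each tuple to [x, y, z] = [hour, day, value] for the 3D bar chart.
bar3d_data = [[hour, day, value] for day, hour, value in data]

print(len(bar3d_data))     # 144
print(bar3d_data[25][:2])  # [1, 1]
```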

Epidemic Map¶

df = pd.read_csv('data.csv')
df

attr = [re.sub('省|市|自治区|维吾尔|壮族|回族','',x) for x in  list(df['provinceName'])]
value = df['province_confirmedCount']
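
The `re.sub` call above shortens the full province names so that they match the short names used by the map. A small sketch of what the pattern does on a few names from the dataset follows; the helper name `short_name` is an assumption for illustration.

```python
import re

# Same alternation as in the cell above, compiled once.
SUFFIX_PATTERN = re.compile("省|市|自治区|维吾尔|壮族|回族")


def short_name(full_name: str) -> str:
    """Strip administrative suffixes and ethnic-group qualifiers."""
    return SUFFIX_PATTERN.sub("", full_name)


samples = ["湖北省", "北京市", "新疆维吾尔自治区", "宁夏回族自治区", "内蒙古自治区", "澳门"]
print([short_name(name) for name in samples])
# ['湖北', '北京', '新疆', '宁夏', '内蒙古', '澳门']
```

Note that the pattern removes these substrings wherever they occur, which is fine for province-level names but would need care for names that legitimately contain them.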

sequence = list(zip(attr,value))
def map_visualmap(sequence, series_name) -> Map:
    c = (
        Map()
        .add(series_name, sequence, "china")
        .set_global_opts(
            title_opts=opts.TitleOpts(title="全国疫情确诊情况"),
            visualmap_opts=opts.VisualMapOpts(
                max_=80000,
                is_piecewise=True,
                # piece boundaries should not overlap, so the fifth piece ends at 9999
                pieces=[
                    {'min': 1, 'max': 9},
                    {'min': 10, 'max': 99},
                    {'min': 100, 'max': 499},
                    {'min': 500, 'max': 999},
                    {'min': 1000, 'max': 9999},
                    {'min': 10000, 'color': 'red'},
                ],
            ),
        )
    )
    return c

c = map_visualmap(sequence,'全国疫情确诊情况')
c.render_notebook()
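
A piecewise visual map assigns each value to one of a fixed set of ranges. The sketch below shows that lookup in plain Python. The ranges are written without overlap (1000 to 9999, then 10000 and above), which is an assumption about the intended boundaries, and `find_piece` is a hypothetical helper, not a pyecharts function.

```python
PIECES = [
    {"min": 1, "max": 9},
    {"min": 10, "max": 99},
    {"min": 100, "max": 499},
    {"min": 500, "max": 999},
    {"min": 1000, "max": 9999},
    {"min": 10000},  # open-ended top bucket
]


def find_piece(value):
    """Return the index of the piece containing value, or None if no piece matches."""
    for index, piece in enumerate(PIECES):
        if piece["min"] <= value <= piece.get("max", float("inf")):
            return index
    return None


# Confirmed counts taken from the dataset: 澳门, 江西省, 广东省, 湖北省.
for count in (2, 935, 1350, 67217):
    print(count, find_piece(count))
```

Only 湖北省, with 67217 confirmed cases, falls into the last bucket, which is the one drawn in red.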

National Social Science Fund Knowledge Graph¶

Use word co-occurrence to uncover the relationships between keywords.

df2 = pd.read_csv('data2.csv')
df2.head()

import jieba  # used for Chinese word segmentation
import numpy as np

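Define the functions used to draw the graph

This function builds the node data for the graph and returns (keyword, frequency) pairs.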

def get_nodes(items):
    texts = []  # all tokens produced by segmentation
    for item in items:
        # keep tokens longer than one character that are not numbers
        temp = [x for x in jieba.cut(item, cut_all=True) if len(x) > 1 and not_num(x)]
        texts += temp

    mset = set(texts)
    keys = []
    values = []

    stop_list = ['研究']  # stop-word list

    for item in mset:
        # keep words above the frequency threshold that are not stop words
        if texts.count(item) > 5 and item not in stop_list:
            keys.append(item)
            values.append(texts.count(item))

    return zip(keys, values)
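
The frequency filter in `get_nodes` calls `texts.count` once or twice per distinct word, which rescans the whole token list each time. A possible alternative is sketched below with `collections.Counter`, which counts everything in one pass. It assumes the tokens have already been produced by segmentation; `count_keywords` and its parameters are hypothetical names, and the sample tokens are made up.

```python
from collections import Counter


def count_keywords(tokens, min_count=5, stop_words=("研究",)):
    """Return (word, count) pairs for words seen more than min_count times."""
    counts = Counter(tokens)
    return [
        (word, n)
        for word, n in counts.items()
        if n > min_count and word not in stop_words
    ]


# Made-up tokens: one frequent keyword, one stop word, one rare word.
tokens = ["贸易"] * 6 + ["研究"] * 10 + ["治理"] * 3
print(count_keywords(tokens))  # [('贸易', 6)]
```

Returning a list instead of a `zip` object also means the result can be iterated more than once.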

This function counts the links between two nodes.

def get_links(key1,key2,items): 
    '''Number of titles in which both keywords appear (the link weight).'''
    counts = 0
#     print('{} {} {}'.format(key1,key2,items))
    for item in items:
        if key1 in item and key2 in item:
            counts = counts + 1
    return counts
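
Applied to every pair of keywords, this count gives the co-occurrence weights used as links in the graph. The following is a minimal, self-contained sketch of that pairing step using `itertools.combinations`; the function name `cooccurrence` and the three sample titles are assumptions for illustration.

```python
from itertools import combinations


def cooccurrence(keywords, titles):
    """Count, for each keyword pair, the titles that contain both keywords."""
    counts = {}
    for first, second in combinations(keywords, 2):
        n = sum(1 for title in titles if first in title and second in title)
        if n:  # pairs that never co-occur produce no link
            counts[(first, second)] = n
    return counts


titles = ["国际贸易法治研究", "贸易利益与法治", "医患关系研究"]
print(cooccurrence(["贸易", "法治", "关系"], titles))
# {('贸易', '法治'): 2}
```

Keyword matching here is a plain substring test on the title, the same as in `get_links`, so it does not depend on how the title was segmented.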

This function draws the graph.

def draw_graph(items,datas):
    '''Draw the knowledge graph.'''
    def link_color_map(x):  # map a co-occurrence count to a link color
        if x < 5:
            return '#DAA520'
        if x < 10:
            return '#40E0D0'
        if x < 15:
            return '#FFD700'
        else:
            return '#FF7F50'
    
    def node_color_map(x):
        if x < 5:
            return '#000000'
        if x < 10:
            return '#00FFFF'
        if x < 15:
            return '#4169E1'
        if x < 20:
            return '#F08080'
        if x < 25:
            return '#FFFF00'
        if x < 30:
            return '#00FF00'
        if x < 35:
            return '#FF00FF'
        return '#FF1493'
    
    nodes_data = [{'name':x[0],'symbolSize':x[1],'itemStyle' :{'normal':{'color':node_color_map(x[1])}}} for x in datas]
    #nodes_data = [opts.GraphNode(name=x[0],symbol_size=x[1],item_style =opts.ItemStyleOpts(color=node_color_map(x[1]))) for x in datas]
    keys = []
    for x in datas:
        keys.append(x[0])
    links_data = []
    for i in range(len(keys)-1):
        for j in range(i+1,len(keys)):
            thecount = get_links(keys[i],keys[j],items)
            if thecount == 0:
                continue
            links_data.append(opts.GraphLink(source=keys[i],target=keys[j],linestyle_opts = opts.LineStyleOpts(curve=0.3,width=thecount*0.2,color=link_color_map(thecount))))
    c = (
        Graph()
        .add(
            '',
            nodes_data,
            links_data,
            repulsion=4000,
        )
    )
    return c

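Cleaning the title data

# Filter out numeric tokens from the titles
def not_num(x):
    try:
        int(x)
    except ValueError:  # int() raises ValueError for non-numeric strings
        return True
    return False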
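
A few sample tokens show what this filter treats as a number. The function is restated here (catching `ValueError` explicitly) so that the block runs on its own; the sample tokens are made up.

```python
def not_num(x):
    """Return True if x cannot be parsed as an integer."""
    try:
        int(x)
    except ValueError:
        return True
    return False


for token in ["2018", "贸易", "3.5", "１２"]:
    print(token, not_num(token))
# 2018 False
# 贸易 True
# 3.5 True
# １２ False
```

Two details are worth knowing: decimal strings such as "3.5" are not integers and therefore pass the filter, while full-width digits such as "１２" are accepted by `int` and are filtered out.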

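The dataset covers six institutions: 上海财经大学 (Shanghai University of Finance and Economics), 中央财经大学 (Central University of Finance and Economics), 中南财经政法大学 (Zhongnan University of Economics and Law), 西南财经大学 (Southwestern University of Finance and Economics), 北京大学 (Peking University), and 清华大学 (Tsinghua University).

Substitute any of these names into the code below to view the results for that institution.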

from pyecharts.charts import Graph
temp_df = df2[df2['工作单位']=='上海财经大学']
node_value = get_nodes(temp_df['项目名称'])
graph = draw_graph(temp_df['项目名称'],list(node_value))  # named graph to avoid shadowing the built-in map
graph.render_notebook()  # rendering may take a while, please be patient

Building prefix dict from the default dictionary ...
Dumping model to file cache C:\Users\大川\AppData\Local\Temp\jieba.cache
Loading model cost 0.707 seconds.
Prefix dict has been built successfully.

Output of df (contents of data.csv):

	Unnamed: 0	provinceName	province_confirmedCount
0	0	澳门	2
1	1	西藏自治区	1
2	2	青海省	18
3	3	海南省	168
4	4	天津市	136
5	5	贵州省	146
6	6	江西省	935
7	7	宁夏回族自治区	74
8	8	云南省	174
9	9	北京市	414
10	10	陕西省	245
11	11	浙江省	1206
12	12	福建省	296
13	13	甘肃省	91
14	14	安徽省	990
15	15	吉林省	93
16	16	新疆维吾尔自治区	76
17	17	重庆市	576
18	18	山西省	133
19	19	河北省	318
20	20	黑龙江省	480
21	21	辽宁省	125
22	22	广西壮族自治区	252
23	23	湖北省	67217
24	24	山东省	758
25	25	四川省	538
26	26	内蒙古自治区	75
27	27	广东省	1350
28	28	河南省	1272
29	29	湖南省	1018
30	30	江苏省	631
31	31	上海市	338

Output of df2.head() (first rows of data2.csv):

	Unnamed: 0	Unnamed: 0.1	项目批准号	项目名称	工作单位
0	67	67	18ZDA069	全球价值链背景下中美新型大国贸易关系与贸易利益研究	上海财经大学
1	86	86	18ZDA088	大数据背景下医患关系的分析与政策研究	上海财经大学
2	880	880	18BFX099	新经济背景下融资犯罪的异化与治理研究	上海财经大学
3	954	954	18BFX173	城市生活垃圾分类的法律规制研究	上海财经大学
4	988	988	18BFX207	“逆全球化”风潮下国际贸易法治的困境、出路及中国的选择研究	上海财经大学
