第六节:pandas基础

  pandas 是基于 Numpy 构建的,让以 Numpy 为中心的应用变得更加简单。pandas是公认的数据处理利器,本章内容主要介绍DataFrame数据结构,在此基础上进行数据处理。除了DataFrame格式,pandas 还包括series、Panel。

格式 数组 释义
Series 一维数组 与Numpy中的一维array类似。
DataFrame 二维的表格型数据结构 可以将DataFrame理解为Series的容器
Panel 三维的数组 可以理解为DataFrame的容器

  开始之前,我们首先掌握导入pandas库,方式如下:

In [1]:
import pandas as pd #pd是pandas的别名

  注意:以下内容必须在导入pandas库之后才能运行。

一、Series和DataFrame介绍

   1.Series
   由一组数据和与之相关的索引组成。可通过传递一个list对象来创建一个Series,pandas会默认创建整型索引。
   创建一个Series:

In [2]:
a=[1,3,5,7,6,8]
a[2]
Out[2]:
5
In [3]:
s = pd.Series([1,3,5,7,6,8])
print(s)
0    1
1    3
2    5
3    7
4    6
5    8
dtype: int64
In [4]:
s.plot()
Out[4]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fcb2e05c9b0>
In [5]:
s.plot.bar()
Out[5]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fcb2bd07ef0>
In [6]:
s.plot.pie()
Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fcb2bd444e0>
In [7]:
s.hist()
Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fcb2b5e76d8>

  获取 Series 的索引:

In [8]:
s.index.values
Out[8]:
array([0, 1, 2, 3, 4, 5])
In [9]:
s=pd.Series([1,3,5,7,6,8],index=['a','b','c','d','e','f'])
print(s)
a    1
b    3
c    5
d    7
e    6
f    8
dtype: int64
In [10]:
s['f']
Out[10]:
8
In [11]:
t=pd.Series([10,11,12],index=['c','b','a'])
print(t)
c    10
b    11
a    12
dtype: int64
In [12]:
[1,2,3]+[4,5,6]
Out[12]:
[1, 2, 3, 4, 5, 6]
In [20]:
t+s#Series同index的元素值相加
Out[20]:
a    13.0
b    14.0
c    15.0
d     NaN
e     NaN
f     NaN
dtype: float64
In [19]:
s=pd.Series([1,3,5,7,6,8],index=['a','b','c','d','e','f'])
print(s)
t=pd.Series([10,11,12],index=['c','b','a'])
print(t)
a    1
b    3
c    5
d    7
e    6
f    8
dtype: int64
c    10
b    11
a    12
dtype: int64

  2.DataFrame
  DataFrame是一个表格型的数据结构,它含有一组有序的列,每一列的数据结构都是相同的,而不同的列之间则可以是不同的数据结构。DataFrame中的每一行是一个记录,名称为Index的一个元素,而每一列则为一个字段,是这个记录的一个属性,DataFrame既有行索引也有列索引。
  创建DataFrame
  首先来看如何从字典创建DataFrame。

In [22]:
d = {'one': [1, 2, 3], 'two': [1, 2, 3]}
df = pd.DataFrame(d)
print(df)
   one  two
0    1    1
1    2    2
2    3    3
In [23]:
d = {'one': [1, 2, 3], 'two': [1, 2, 3]}
df = pd.DataFrame(d,index=['a', 'b', 'c'])
print(df)
   one  two
a    1    1
b    2    2
c    3    3
In [ ]:
      A B C
one   2 9 4
two   7 5 3
three 6 1 8
In [24]:
d = {'A':[2,7,6],'B':[9,5,1],'C':[4,3,8]}
df=pd.DataFrame(d,index=['one','two','three'])
print(df)
       A  B  C
one    2  9  4
two    7  5  3
three  6  1  8

  可以使用dataframe.index和dataframe.columns来查看DataFrame的行和列,dataframe.values则以数组的形式返回DataFrame的元素:

In [25]:
print(df.index) #查看行
print(df.columns) #查看列
print(df.values) #查看元素
Index(['one', 'two', 'three'], dtype='object')
Index(['A', 'B', 'C'], dtype='object')
[[2 9 4]
 [7 5 3]
 [6 1 8]]

  DataFrame从值是数组的字典创建时,其各个数组的长度需要相同,加强印象,可参考以下报错的例子。

In [26]:
d = {'one': [1, 2], 'two': [1, 2, 3]}
df = pd.DataFrame(d,index=['a', 'b', 'c'])
print(df)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/pandas/core/internals/managers.py in create_block_manager_from_arrays(arrays, names, axes)
   1692     try:
-> 1693         blocks = form_blocks(arrays, names, axes)
   1694         mgr = BlockManager(blocks, axes)

/opt/conda/lib/python3.6/site-packages/pandas/core/internals/managers.py in form_blocks(arrays, names, axes)
   1762     if len(items_dict["IntBlock"]):
-> 1763         int_blocks = _multi_blockify(items_dict["IntBlock"])
   1764         blocks.extend(int_blocks)

/opt/conda/lib/python3.6/site-packages/pandas/core/internals/managers.py in _multi_blockify(tuples, dtype)
   1844 
-> 1845         values, placement = _stack_arrays(list(tup_block), dtype)
   1846 

/opt/conda/lib/python3.6/site-packages/pandas/core/internals/managers.py in _stack_arrays(tuples, dtype)
   1874     for i, arr in enumerate(arrays):
-> 1875         stacked[i] = _asarray_compat(arr)
   1876 

ValueError: could not broadcast input array from shape (3) into shape (2)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-26-6714bd1fff47> in <module>
      1 d = {'one': [1, 2], 'two': [1, 2, 3]}
----> 2 df = pd.DataFrame(d,index=['a', 'b', 'c'])
      3 print(df)

/opt/conda/lib/python3.6/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    408             )
    409         elif isinstance(data, dict):
--> 410             mgr = init_dict(data, index, columns, dtype=dtype)
    411         elif isinstance(data, ma.MaskedArray):
    412             import numpy.ma.mrecords as mrecords

/opt/conda/lib/python3.6/site-packages/pandas/core/internals/construction.py in init_dict(data, index, columns, dtype)
    255             arr if not is_datetime64tz_dtype(arr) else arr.copy() for arr in arrays
    256         ]
--> 257     return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    258 
    259 

/opt/conda/lib/python3.6/site-packages/pandas/core/internals/construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype)
     85     axes = [ensure_index(columns), index]
     86 
---> 87     return create_block_manager_from_arrays(arrays, arr_names, axes)
     88 
     89 

/opt/conda/lib/python3.6/site-packages/pandas/core/internals/managers.py in create_block_manager_from_arrays(arrays, names, axes)
   1696         return mgr
   1697     except ValueError as e:
-> 1698         construction_error(len(arrays), arrays[0].shape, axes, e)
   1699 
   1700 

/opt/conda/lib/python3.6/site-packages/pandas/core/internals/managers.py in construction_error(tot_items, block_shape, axes, e)
   1716         raise ValueError("Empty data passed with indices specified.")
   1717     raise ValueError(
-> 1718         "Shape of passed values is {0}, indices imply {1}".format(passed, implied)
   1719     )
   1720 

ValueError: Shape of passed values is (2, 2), indices imply (3, 2)

  如果DataFrame的值是非数组时,没有这一限制,且自动将缺失值补成NaN。如下示例

In [29]:
d= [{'a': 1.6, 'b': 2}, {'a': 3, 'b': 6, 'c': 9}]
df = pd.DataFrame(d)
print(df)
     a  b    c
0  1.6  2  NaN
1  3.0  6  9.0
In [33]:
d= [{'a': 1.6, 'b': 2}, {'a': 3, 'b': 6, 'c': 9}]
df = pd.DataFrame(d,index=['one','two'])
print(df)
       a  b    c
one  1.6  2  NaN
two  3.0  6  9.0
In [38]:
df['a'].plot()
Out[38]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fcb29cdd978>
In [39]:
df.plot()
Out[39]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fcb29c6dc18>

  在实际处理数据时,有时需要创建一个空的DataFrame,可以这么做:

In [31]:
df = pd.DataFrame()
print(df)
Empty DataFrame
Columns: []
Index: []

  另一种创建DataFrame的方法十分有用,那就是使用concat函数创建DataFrame,其主要是通过两个行或列相同的DataFrame链接成一个。

In [40]:
a= [{'a': 1.6, 'b': 2}, {'a': 3, 'b': 6}]
df1 = pd.DataFrame(a)
b= [{'a': 4, 'b': 5}]
df2 = pd.DataFrame(b)
df = pd.concat([df1, df2], axis=0)
print(df1)
print(df2)
print(df)
     a  b
0  1.6  2
1  3.0  6
   a  b
0  4  5
     a  b
0  1.6  2
1  3.0  6
0  4.0  5
In [41]:
df = pd.concat([df1, df2], axis=1)
print(df)
     a  b    a    b
0  1.6  2  4.0  5.0
1  3.0  6  NaN  NaN

  注意:concat函数内有axis参数,其中的axis=1表示按列进行合并,axis=0表示按行合并

二、数据查看

 先从简单的csv导入数据到DataFrame。

In [140]:
# 获取csv文件中的浦发银行的开盘价、最高价、最低价、收盘价,获取格式即为DataFrame
price=pd.read_csv('stock1-105.csv',encoding='gbk',usecols=['date','open','high','low','close'])
price=price.set_index('date')
price
Out[140]:
open high low close
date
2017-01-03 16.21 16.44 16.17 16.30
2017-01-04 16.29 16.35 16.18 16.33
2017-01-05 16.30 16.38 16.24 16.30
2017-01-06 16.30 16.30 16.13 16.18
2017-01-09 16.24 16.29 16.13 16.20
2017-01-10 16.18 16.24 16.14 16.19
2017-01-11 16.24 16.24 16.15 16.16
2017-01-12 16.18 16.20 16.11 16.12
2017-01-13 16.10 16.29 16.10 16.27
2017-01-16 16.23 16.60 16.10 16.56
2017-01-17 16.46 16.54 16.37 16.40
2017-01-18 16.42 16.55 16.36 16.48
2017-01-19 16.43 16.64 16.43 16.54
2017-01-20 16.58 16.66 16.50 16.60
2017-01-23 16.66 16.69 16.51 16.57
2017-01-24 16.58 16.70 16.58 16.69
2017-01-25 16.69 16.74 16.61 16.69
2017-01-26 16.69 16.84 16.61 16.74
2017-02-03 16.82 16.85 16.62 16.63
2017-02-06 16.75 16.78 16.66 16.66
2017-02-07 16.65 16.74 16.63 16.67
2017-02-08 16.63 16.69 16.54 16.67
2017-02-09 16.68 16.77 16.59 16.72
2017-02-10 16.76 16.84 16.70 16.78
2017-02-13 16.88 16.90 16.78 16.85
2017-02-14 16.87 16.88 16.74 16.75
2017-02-15 16.82 16.93 16.75 16.84
2017-02-16 16.88 16.91 16.76 16.78
2017-02-17 16.78 16.82 16.59 16.64
2017-02-20 16.62 16.98 16.62 16.91
... ... ... ... ...
2017-08-07 12.87 12.95 12.83 12.93
2017-08-08 12.91 12.93 12.83 12.87
2017-08-09 12.84 12.88 12.75 12.76
2017-08-10 12.74 12.88 12.70 12.79
2017-08-11 12.72 12.73 12.46 12.68
2017-08-14 12.62 12.63 12.52 12.56
2017-08-15 12.58 12.76 12.56 12.59
2017-08-16 12.56 12.56 12.47 12.49
2017-08-17 12.49 12.55 12.47 12.53
2017-08-18 12.50 12.57 12.46 12.51
2017-08-21 12.50 12.57 12.46 12.51
2017-08-22 12.51 12.53 12.41 12.43
2017-08-23 12.44 12.66 12.41 12.57
2017-08-24 12.58 12.66 12.45 12.47
2017-08-25 12.47 12.80 12.47 12.78
2017-08-28 12.83 12.96 12.75 12.89
2017-08-29 12.85 12.98 12.77 12.96
2017-08-30 12.91 12.98 12.82 12.87
2017-08-31 12.83 12.84 12.65 12.71
2017-09-01 12.68 12.87 12.68 12.77
2017-09-04 12.78 12.84 12.65 12.78
2017-09-05 12.78 13.09 12.78 13.03
2017-09-06 12.99 13.05 12.90 12.96
2017-09-07 12.94 13.02 12.82 12.85
2017-09-08 12.85 13.12 12.83 13.03
2017-09-11 13.15 13.15 12.95 12.99
2017-09-12 13.00 13.04 12.91 13.04
2017-09-13 13.01 13.01 12.86 12.87
2017-09-14 12.88 12.98 12.81 12.85
2017-09-15 12.82 12.86 12.75 12.83

174 rows × 4 columns

In [50]:
df=pd.read_csv('/resources/data/TRD_Dalyr.csv')
df
Out[50]:
Stkcd Trddt Opnprc Hiprc Loprc Clsprc Dnshrtrd Dnvaltrd Dsmvosd Dsmvtll Adjprcwd
0 600600 2019-01-02 34.91 35.68 34.12 34.50 2866826 9.993306e+07 24009019.79 24009019.79 57.993438
1 600600 2019-01-03 34.80 34.89 32.93 33.30 3137838 1.058104e+08 23173923.45 23173923.45 55.976275
2 600600 2019-01-04 32.19 33.93 32.19 33.40 2816908 9.389348e+07 23243514.81 23243514.81 56.144372
3 600600 2019-01-07 33.49 35.18 33.10 34.86 4165242 1.435954e+08 24259548.69 24259548.69 58.598587
4 600600 2019-01-08 34.72 34.72 33.20 33.49 4465646 1.502569e+08 23306147.03 23306147.03 56.295659
5 600600 2019-01-09 33.51 34.68 33.28 33.75 4591794 1.564405e+08 23487084.57 23487084.57 56.732711
6 600600 2019-01-10 33.73 33.73 32.87 33.18 4342328 1.436151e+08 23090413.81 23090413.81 55.774559
7 600600 2019-01-11 33.20 33.60 32.90 33.21 3027414 1.004160e+08 23111291.22 23111291.22 55.824988
8 600600 2019-01-14 33.26 33.28 32.65 33.09 2502931 8.233961e+07 23027781.59 23027781.59 55.623272
9 600600 2019-01-15 32.98 36.40 32.83 36.40 10810126 3.784860e+08 25331255.66 25331255.66 61.187280
10 600600 2019-01-16 36.00 36.87 35.86 36.10 6596757 2.388105e+08 25122481.57 25122481.57 60.682989
11 600600 2019-01-17 36.50 36.50 35.80 35.80 4330457 1.564373e+08 24913707.49 24913707.49 60.178698
12 600600 2019-01-18 36.00 36.47 35.76 36.19 4685634 1.695381e+08 25185113.80 25185113.80 60.834276
13 600600 2019-01-21 36.17 38.76 36.09 36.78 7439315 2.771725e+08 25595702.83 25595702.83 61.826048
14 600600 2019-01-22 36.50 36.74 35.80 36.17 2824918 1.021891e+08 25171195.53 25171195.53 60.800657
15 600600 2019-01-23 36.00 36.68 35.81 36.30 2863203 1.040513e+08 25261664.30 25261664.30 61.019183
16 600600 2019-01-24 36.22 36.81 36.04 36.58 2333696 8.502156e+07 25456520.11 25456520.11 61.489854
17 600600 2019-01-25 36.57 37.56 36.16 36.83 3176591 1.168643e+08 25630498.51 25630498.51 61.910096
18 600600 2019-01-28 37.00 37.94 36.60 36.96 4363808 1.625587e+08 25720967.28 25720967.28 62.128622
19 600600 2019-01-29 36.80 38.38 36.10 38.13 4956990 1.859601e+08 26535186.22 26535186.22 64.095356
20 600600 2019-01-30 37.54 38.68 37.16 37.19 3985083 1.509411e+08 25881027.42 25881027.42 62.515245
21 600600 2019-01-31 37.05 38.12 37.05 37.73 2556359 9.631578e+07 26256820.77 26256820.77 63.422968
22 600600 2019-02-01 37.77 38.20 37.50 38.10 3193602 1.211643e+08 26514308.81 26514308.81 64.044927
23 600600 2019-02-11 38.10 39.00 37.50 38.70 4417674 1.690389e+08 26931856.98 26931856.98 65.053509
24 600600 2019-02-12 38.67 38.70 38.10 38.45 4249343 1.627973e+08 26757878.57 26757878.57 64.633267
25 600600 2019-02-13 38.40 38.70 38.25 38.55 3804185 1.463872e+08 26827469.94 26827469.94 64.801364
26 600600 2019-02-14 38.48 40.24 38.38 39.46 4301795 1.699792e+08 27460751.33 27460751.33 66.331046
27 600600 2019-02-15 39.39 39.39 38.20 38.33 3445361 1.336287e+08 26674368.94 26674368.94 64.431551
28 600600 2019-02-18 38.38 39.51 38.35 39.06 3449336 1.341243e+08 27182385.88 27182385.88 65.658659
29 600600 2019-02-19 39.14 39.37 37.55 38.20 5970855 2.285474e+08 26583900.17 26583900.17 64.213025
... ... ... ... ... ... ... ... ... ... ... ...
404 600600 2020-08-31 95.44 99.50 87.36 87.36 16336429 1.485624e+09 60795013.58 61948165.58 149.643013
405 600600 2020-09-01 87.36 87.66 82.50 86.30 12650742 1.073644e+09 60057345.15 61196505.15 147.827290
406 600600 2020-09-02 85.28 86.63 83.75 84.45 7235267 6.149912e+08 58769904.96 59884644.96 144.658339
407 600600 2020-09-03 84.45 86.86 83.55 85.34 6967213 5.962448e+08 59389268.07 60515756.07 146.182861
408 600600 2020-09-04 83.50 83.56 79.61 81.50 9595196 7.745201e+08 56716959.79 57792759.79 139.605146
409 600600 2020-09-07 80.57 81.30 77.40 78.40 7119585 5.663550e+08 54559627.57 55594507.57 134.295012
410 600600 2020-09-08 78.88 79.29 74.55 77.01 8062892 6.141779e+08 53592307.65 54608839.65 131.914016
411 600600 2020-09-09 75.88 76.19 73.70 74.98 4712071 3.520896e+08 52179603.00 53169339.00 128.436734
412 600600 2020-09-10 76.00 76.99 74.77 75.24 5188079 3.926766e+08 52360540.54 53353708.54 128.882100
413 600600 2020-09-11 74.68 77.11 73.82 76.77 4146573 3.145263e+08 53425288.38 54438652.38 131.502908
414 600600 2020-09-14 77.50 80.08 76.88 77.95 7192602 5.670469e+08 54246466.45 55275406.45 133.524185
415 600600 2020-09-15 77.77 79.50 76.23 78.85 4127403 3.231186e+08 54872788.70 55913608.70 135.065837
416 600600 2020-09-16 78.65 80.17 77.67 78.65 3399733 2.676203e+08 54733605.98 55771785.98 134.723248
417 600600 2020-09-17 78.00 78.00 75.49 75.90 5078531 3.865731e+08 52819843.53 53821723.53 130.012645
418 600600 2020-09-18 75.77 76.21 74.26 76.21 4966692 3.739078e+08 53035576.75 54041548.75 130.543658
419 600600 2020-09-21 76.20 76.20 74.28 74.92 4789182 3.579874e+08 52137848.19 53126792.19 128.333957
420 600600 2020-09-22 74.00 74.99 73.50 74.30 3666905 2.729507e+08 51706381.74 52687141.74 127.271930
421 600600 2020-09-23 74.30 74.64 73.36 74.15 3852423 2.854876e+08 51601994.70 52580774.70 127.014988
422 600600 2020-09-24 74.15 74.35 71.40 72.41 5948893 4.317549e+08 50391105.01 51346917.01 124.034461
423 600600 2020-09-25 72.41 73.67 70.00 73.51 6215862 4.502747e+08 51156609.99 52126941.99 125.918702
424 600600 2020-09-28 73.00 73.50 71.17 71.49 5088792 3.657537e+08 49750864.48 50694532.48 122.458550
425 600600 2020-09-29 71.98 72.50 71.17 72.08 4640643 3.343638e+08 50161453.51 51112909.51 123.469188
426 600600 2020-09-30 72.50 76.08 72.21 75.11 8449161 6.319768e+08 52270071.77 53261523.77 128.659416
427 600600 2020-10-09 75.83 76.81 74.37 74.62 6983298 5.230091e+08 51929074.10 52914058.10 127.820072
428 600600 2020-10-12 74.50 75.19 73.17 75.10 8951159 6.663092e+08 52263112.64 53254432.64 128.642286
429 600600 2020-10-13 74.85 76.70 74.08 76.49 7075436 5.376659e+08 53230432.56 54240100.56 131.023282
430 600600 2020-10-14 75.39 76.45 72.96 74.15 9407050 6.974287e+08 51601994.70 52580774.70 127.014987
431 600600 2020-10-15 74.15 76.60 73.37 76.00 6741385 5.082867e+08 52889434.89 53892634.89 130.183938
432 600600 2020-10-16 75.44 76.88 75.02 75.40 4823905 3.662679e+08 52471886.72 53467166.72 129.156170
433 600600 2020-10-19 76.16 76.16 74.00 74.53 3630250 2.712269e+08 51866441.88 52850237.88 127.665907

434 rows × 11 columns

In [57]:
df=df.set_index('Trddt')
df
Out[57]:
Stkcd Opnprc Hiprc Loprc Clsprc Dnshrtrd Dnvaltrd Dsmvosd Dsmvtll Adjprcwd
Trddt
2019-01-02 600600 34.91 35.68 34.12 34.50 2866826 9.993306e+07 24009019.79 24009019.79 57.993438
2019-01-03 600600 34.80 34.89 32.93 33.30 3137838 1.058104e+08 23173923.45 23173923.45 55.976275
2019-01-04 600600 32.19 33.93 32.19 33.40 2816908 9.389348e+07 23243514.81 23243514.81 56.144372
2019-01-07 600600 33.49 35.18 33.10 34.86 4165242 1.435954e+08 24259548.69 24259548.69 58.598587
2019-01-08 600600 34.72 34.72 33.20 33.49 4465646 1.502569e+08 23306147.03 23306147.03 56.295659
2019-01-09 600600 33.51 34.68 33.28 33.75 4591794 1.564405e+08 23487084.57 23487084.57 56.732711
2019-01-10 600600 33.73 33.73 32.87 33.18 4342328 1.436151e+08 23090413.81 23090413.81 55.774559
2019-01-11 600600 33.20 33.60 32.90 33.21 3027414 1.004160e+08 23111291.22 23111291.22 55.824988
2019-01-14 600600 33.26 33.28 32.65 33.09 2502931 8.233961e+07 23027781.59 23027781.59 55.623272
2019-01-15 600600 32.98 36.40 32.83 36.40 10810126 3.784860e+08 25331255.66 25331255.66 61.187280
2019-01-16 600600 36.00 36.87 35.86 36.10 6596757 2.388105e+08 25122481.57 25122481.57 60.682989
2019-01-17 600600 36.50 36.50 35.80 35.80 4330457 1.564373e+08 24913707.49 24913707.49 60.178698
2019-01-18 600600 36.00 36.47 35.76 36.19 4685634 1.695381e+08 25185113.80 25185113.80 60.834276
2019-01-21 600600 36.17 38.76 36.09 36.78 7439315 2.771725e+08 25595702.83 25595702.83 61.826048
2019-01-22 600600 36.50 36.74 35.80 36.17 2824918 1.021891e+08 25171195.53 25171195.53 60.800657
2019-01-23 600600 36.00 36.68 35.81 36.30 2863203 1.040513e+08 25261664.30 25261664.30 61.019183
2019-01-24 600600 36.22 36.81 36.04 36.58 2333696 8.502156e+07 25456520.11 25456520.11 61.489854
2019-01-25 600600 36.57 37.56 36.16 36.83 3176591 1.168643e+08 25630498.51 25630498.51 61.910096
2019-01-28 600600 37.00 37.94 36.60 36.96 4363808 1.625587e+08 25720967.28 25720967.28 62.128622
2019-01-29 600600 36.80 38.38 36.10 38.13 4956990 1.859601e+08 26535186.22 26535186.22 64.095356
2019-01-30 600600 37.54 38.68 37.16 37.19 3985083 1.509411e+08 25881027.42 25881027.42 62.515245
2019-01-31 600600 37.05 38.12 37.05 37.73 2556359 9.631578e+07 26256820.77 26256820.77 63.422968
2019-02-01 600600 37.77 38.20 37.50 38.10 3193602 1.211643e+08 26514308.81 26514308.81 64.044927
2019-02-11 600600 38.10 39.00 37.50 38.70 4417674 1.690389e+08 26931856.98 26931856.98 65.053509
2019-02-12 600600 38.67 38.70 38.10 38.45 4249343 1.627973e+08 26757878.57 26757878.57 64.633267
2019-02-13 600600 38.40 38.70 38.25 38.55 3804185 1.463872e+08 26827469.94 26827469.94 64.801364
2019-02-14 600600 38.48 40.24 38.38 39.46 4301795 1.699792e+08 27460751.33 27460751.33 66.331046
2019-02-15 600600 39.39 39.39 38.20 38.33 3445361 1.336287e+08 26674368.94 26674368.94 64.431551
2019-02-18 600600 38.38 39.51 38.35 39.06 3449336 1.341243e+08 27182385.88 27182385.88 65.658659
2019-02-19 600600 39.14 39.37 37.55 38.20 5970855 2.285474e+08 26583900.17 26583900.17 64.213025
... ... ... ... ... ... ... ... ... ... ...
2020-08-31 600600 95.44 99.50 87.36 87.36 16336429 1.485624e+09 60795013.58 61948165.58 149.643013
2020-09-01 600600 87.36 87.66 82.50 86.30 12650742 1.073644e+09 60057345.15 61196505.15 147.827290
2020-09-02 600600 85.28 86.63 83.75 84.45 7235267 6.149912e+08 58769904.96 59884644.96 144.658339
2020-09-03 600600 84.45 86.86 83.55 85.34 6967213 5.962448e+08 59389268.07 60515756.07 146.182861
2020-09-04 600600 83.50 83.56 79.61 81.50 9595196 7.745201e+08 56716959.79 57792759.79 139.605146
2020-09-07 600600 80.57 81.30 77.40 78.40 7119585 5.663550e+08 54559627.57 55594507.57 134.295012
2020-09-08 600600 78.88 79.29 74.55 77.01 8062892 6.141779e+08 53592307.65 54608839.65 131.914016
2020-09-09 600600 75.88 76.19 73.70 74.98 4712071 3.520896e+08 52179603.00 53169339.00 128.436734
2020-09-10 600600 76.00 76.99 74.77 75.24 5188079 3.926766e+08 52360540.54 53353708.54 128.882100
2020-09-11 600600 74.68 77.11 73.82 76.77 4146573 3.145263e+08 53425288.38 54438652.38 131.502908
2020-09-14 600600 77.50 80.08 76.88 77.95 7192602 5.670469e+08 54246466.45 55275406.45 133.524185
2020-09-15 600600 77.77 79.50 76.23 78.85 4127403 3.231186e+08 54872788.70 55913608.70 135.065837
2020-09-16 600600 78.65 80.17 77.67 78.65 3399733 2.676203e+08 54733605.98 55771785.98 134.723248
2020-09-17 600600 78.00 78.00 75.49 75.90 5078531 3.865731e+08 52819843.53 53821723.53 130.012645
2020-09-18 600600 75.77 76.21 74.26 76.21 4966692 3.739078e+08 53035576.75 54041548.75 130.543658
2020-09-21 600600 76.20 76.20 74.28 74.92 4789182 3.579874e+08 52137848.19 53126792.19 128.333957
2020-09-22 600600 74.00 74.99 73.50 74.30 3666905 2.729507e+08 51706381.74 52687141.74 127.271930
2020-09-23 600600 74.30 74.64 73.36 74.15 3852423 2.854876e+08 51601994.70 52580774.70 127.014988
2020-09-24 600600 74.15 74.35 71.40 72.41 5948893 4.317549e+08 50391105.01 51346917.01 124.034461
2020-09-25 600600 72.41 73.67 70.00 73.51 6215862 4.502747e+08 51156609.99 52126941.99 125.918702
2020-09-28 600600 73.00 73.50 71.17 71.49 5088792 3.657537e+08 49750864.48 50694532.48 122.458550
2020-09-29 600600 71.98 72.50 71.17 72.08 4640643 3.343638e+08 50161453.51 51112909.51 123.469188
2020-09-30 600600 72.50 76.08 72.21 75.11 8449161 6.319768e+08 52270071.77 53261523.77 128.659416
2020-10-09 600600 75.83 76.81 74.37 74.62 6983298 5.230091e+08 51929074.10 52914058.10 127.820072
2020-10-12 600600 74.50 75.19 73.17 75.10 8951159 6.663092e+08 52263112.64 53254432.64 128.642286
2020-10-13 600600 74.85 76.70 74.08 76.49 7075436 5.376659e+08 53230432.56 54240100.56 131.023282
2020-10-14 600600 75.39 76.45 72.96 74.15 9407050 6.974287e+08 51601994.70 52580774.70 127.014987
2020-10-15 600600 74.15 76.60 73.37 76.00 6741385 5.082867e+08 52889434.89 53892634.89 130.183938
2020-10-16 600600 75.44 76.88 75.02 75.40 4823905 3.662679e+08 52471886.72 53467166.72 129.156170
2020-10-19 600600 76.16 76.16 74.00 74.53 3630250 2.712269e+08 51866441.88 52850237.88 127.665907

434 rows × 10 columns

In [58]:
df['Clsprc'].plot()
Out[58]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fcb2a162940>

取上证指数(000001.SH)从20160101到20201019的开盘价、最高价、最低价、收盘价

画出以上收盘价折线图

求所有天的日收益率,并画出直方图

In [76]:
df=pd.read_csv('/resources/data/IDX_Idxtrd.csv')
df.columns=['Code','Date','Open','High','Low','Close']
#df=df.rename(columns={'Idxtrd05':'Close'})
df=df.set_index('Date')
df
Out[76]:
Code Open High Low Close
Date
2016-01-04 1 3536.59 3538.69 3295.74 3296.26
2016-01-05 1 3196.65 3328.14 3189.61 3287.71
2016-01-06 1 3291.20 3362.97 3288.93 3361.84
2016-01-07 1 3309.66 3309.66 3115.89 3125.00
2016-01-08 1 3194.63 3235.45 3056.88 3186.41
2016-01-11 1 3131.85 3166.22 3016.70 3016.70
2016-01-12 1 3026.16 3047.66 2978.46 3022.86
2016-01-13 1 3041.11 3059.02 2949.29 2949.60
2016-01-14 1 2874.05 3012.29 2867.55 3007.65
2016-01-15 1 2988.05 3001.71 2883.87 2900.97
2016-01-18 1 2847.54 2945.45 2844.70 2913.84
2016-01-19 1 2914.41 3012.07 2906.40 3007.74
2016-01-20 1 2993.01 3016.28 2951.92 2976.69
2016-01-21 1 2934.39 2998.79 2880.09 2880.48
2016-01-22 1 2911.11 2931.36 2851.73 2916.56
2016-01-25 1 2934.08 2955.78 2911.83 2938.52
2016-01-26 1 2907.72 2911.99 2743.84 2749.79
2016-01-27 1 2756.08 2768.77 2638.30 2735.56
2016-01-28 1 2711.16 2740.54 2647.49 2655.66
2016-01-29 1 2652.85 2755.37 2649.79 2737.60
2016-02-01 1 2730.98 2735.26 2655.62 2688.85
2016-02-02 1 2687.98 2755.16 2687.98 2749.57
2016-02-03 1 2719.57 2746.07 2696.88 2739.25
2016-02-04 1 2751.43 2793.30 2751.31 2781.02
2016-02-05 1 2783.08 2790.06 2762.16 2763.49
2016-02-15 1 2684.96 2760.36 2682.09 2746.20
2016-02-16 1 2758.58 2840.62 2758.58 2836.57
2016-02-17 1 2829.76 2868.70 2824.36 2867.34
2016-02-18 1 2881.78 2893.22 2857.70 2862.89
2016-02-19 1 2854.90 2872.72 2840.49 2860.02
... ... ... ... ... ...
2020-08-28 1 3346.29 3405.88 3339.65 3403.81
2020-08-31 1 3416.55 3442.74 3395.47 3395.68
2020-09-01 1 3389.74 3410.61 3381.71 3410.61
2020-09-02 1 3420.47 3421.40 3377.21 3404.80
2020-09-03 1 3404.03 3425.63 3374.26 3384.98
2020-09-04 1 3336.41 3360.11 3328.55 3355.37
2020-09-07 1 3349.92 3368.25 3285.63 3292.59
2020-09-08 1 3301.22 3324.39 3276.44 3316.42
2020-09-09 1 3281.00 3289.62 3238.56 3254.63
2020-09-10 1 3282.23 3285.47 3227.64 3234.82
2020-09-11 1 3225.78 3262.50 3220.54 3260.35
2020-09-14 1 3275.92 3283.53 3261.15 3278.81
2020-09-15 1 3277.13 3297.69 3263.75 3295.68
2020-09-16 1 3293.17 3302.46 3271.08 3283.92
2020-09-17 1 3277.32 3290.41 3248.50 3270.44
2020-09-18 1 3270.91 3338.32 3268.53 3338.09
2020-09-21 1 3348.90 3350.59 3313.07 3316.94
2020-09-22 1 3290.67 3320.23 3265.70 3274.30
2020-09-23 1 3278.84 3289.76 3264.89 3279.71
2020-09-24 1 3262.88 3265.35 3221.24 3223.18
2020-09-25 1 3234.37 3239.47 3208.04 3219.42
2020-09-28 1 3224.98 3238.18 3210.89 3217.54
2020-09-29 1 3231.86 3242.79 3219.79 3224.36
2020-09-30 1 3232.71 3244.91 3202.34 3218.05
2020-10-09 1 3262.61 3280.51 3260.19 3272.08
2020-10-12 1 3287.33 3359.15 3286.11 3358.47
2020-10-13 1 3353.12 3361.83 3334.50 3359.75
2020-10-14 1 3353.63 3353.63 3332.96 3340.78
2020-10-15 1 3342.92 3354.58 3330.00 3332.18
2020-10-16 1 3334.46 3348.95 3319.11 3336.36

1164 rows × 5 columns

In [68]:
df['Close'].plot()
Out[68]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fcb29f5eef0>
In [73]:
import numpy as np
df['Return']=np.nan
df
Out[73]:
Code Open High Low Close Return
Date
2016-01-04 1 3536.59 3538.69 3295.74 3296.26 NaN
2016-01-05 1 3196.65 3328.14 3189.61 3287.71 NaN
2016-01-06 1 3291.20 3362.97 3288.93 3361.84 NaN
2016-01-07 1 3309.66 3309.66 3115.89 3125.00 NaN
2016-01-08 1 3194.63 3235.45 3056.88 3186.41 NaN
2016-01-11 1 3131.85 3166.22 3016.70 3016.70 NaN
2016-01-12 1 3026.16 3047.66 2978.46 3022.86 NaN
2016-01-13 1 3041.11 3059.02 2949.29 2949.60 NaN
2016-01-14 1 2874.05 3012.29 2867.55 3007.65 NaN
2016-01-15 1 2988.05 3001.71 2883.87 2900.97 NaN
2016-01-18 1 2847.54 2945.45 2844.70 2913.84 NaN
2016-01-19 1 2914.41 3012.07 2906.40 3007.74 NaN
2016-01-20 1 2993.01 3016.28 2951.92 2976.69 NaN
2016-01-21 1 2934.39 2998.79 2880.09 2880.48 NaN
2016-01-22 1 2911.11 2931.36 2851.73 2916.56 NaN
2016-01-25 1 2934.08 2955.78 2911.83 2938.52 NaN
2016-01-26 1 2907.72 2911.99 2743.84 2749.79 NaN
2016-01-27 1 2756.08 2768.77 2638.30 2735.56 NaN
2016-01-28 1 2711.16 2740.54 2647.49 2655.66 NaN
2016-01-29 1 2652.85 2755.37 2649.79 2737.60 NaN
2016-02-01 1 2730.98 2735.26 2655.62 2688.85 NaN
2016-02-02 1 2687.98 2755.16 2687.98 2749.57 NaN
2016-02-03 1 2719.57 2746.07 2696.88 2739.25 NaN
2016-02-04 1 2751.43 2793.30 2751.31 2781.02 NaN
2016-02-05 1 2783.08 2790.06 2762.16 2763.49 NaN
2016-02-15 1 2684.96 2760.36 2682.09 2746.20 NaN
2016-02-16 1 2758.58 2840.62 2758.58 2836.57 NaN
2016-02-17 1 2829.76 2868.70 2824.36 2867.34 NaN
2016-02-18 1 2881.78 2893.22 2857.70 2862.89 NaN
2016-02-19 1 2854.90 2872.72 2840.49 2860.02 NaN
... ... ... ... ... ... ...
2020-08-28 1 3346.29 3405.88 3339.65 3403.81 NaN
2020-08-31 1 3416.55 3442.74 3395.47 3395.68 NaN
2020-09-01 1 3389.74 3410.61 3381.71 3410.61 NaN
2020-09-02 1 3420.47 3421.40 3377.21 3404.80 NaN
2020-09-03 1 3404.03 3425.63 3374.26 3384.98 NaN
2020-09-04 1 3336.41 3360.11 3328.55 3355.37 NaN
2020-09-07 1 3349.92 3368.25 3285.63 3292.59 NaN
2020-09-08 1 3301.22 3324.39 3276.44 3316.42 NaN
2020-09-09 1 3281.00 3289.62 3238.56 3254.63 NaN
2020-09-10 1 3282.23 3285.47 3227.64 3234.82 NaN
2020-09-11 1 3225.78 3262.50 3220.54 3260.35 NaN
2020-09-14 1 3275.92 3283.53 3261.15 3278.81 NaN
2020-09-15 1 3277.13 3297.69 3263.75 3295.68 NaN
2020-09-16 1 3293.17 3302.46 3271.08 3283.92 NaN
2020-09-17 1 3277.32 3290.41 3248.50 3270.44 NaN
2020-09-18 1 3270.91 3338.32 3268.53 3338.09 NaN
2020-09-21 1 3348.90 3350.59 3313.07 3316.94 NaN
2020-09-22 1 3290.67 3320.23 3265.70 3274.30 NaN
2020-09-23 1 3278.84 3289.76 3264.89 3279.71 NaN
2020-09-24 1 3262.88 3265.35 3221.24 3223.18 NaN
2020-09-25 1 3234.37 3239.47 3208.04 3219.42 NaN
2020-09-28 1 3224.98 3238.18 3210.89 3217.54 NaN
2020-09-29 1 3231.86 3242.79 3219.79 3224.36 NaN
2020-09-30 1 3232.71 3244.91 3202.34 3218.05 NaN
2020-10-09 1 3262.61 3280.51 3260.19 3272.08 NaN
2020-10-12 1 3287.33 3359.15 3286.11 3358.47 NaN
2020-10-13 1 3353.12 3361.83 3334.50 3359.75 NaN
2020-10-14 1 3353.63 3353.63 3332.96 3340.78 NaN
2020-10-15 1 3342.92 3354.58 3330.00 3332.18 NaN
2020-10-16 1 3334.46 3348.95 3319.11 3336.36 NaN

1164 rows × 6 columns

In [74]:
for i in range(1,len(df)):
    df['Return'][i]=df['Close'][i]/df['Close'][i-1]-1
df
/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  
Out[74]:
Code Open High Low Close Return
Date
2016-01-04 1 3536.59 3538.69 3295.74 3296.26 NaN
2016-01-05 1 3196.65 3328.14 3189.61 3287.71 -0.002594
2016-01-06 1 3291.20 3362.97 3288.93 3361.84 0.022548
2016-01-07 1 3309.66 3309.66 3115.89 3125.00 -0.070450
2016-01-08 1 3194.63 3235.45 3056.88 3186.41 0.019651
2016-01-11 1 3131.85 3166.22 3016.70 3016.70 -0.053261
2016-01-12 1 3026.16 3047.66 2978.46 3022.86 0.002042
2016-01-13 1 3041.11 3059.02 2949.29 2949.60 -0.024235
2016-01-14 1 2874.05 3012.29 2867.55 3007.65 0.019681
2016-01-15 1 2988.05 3001.71 2883.87 2900.97 -0.035470
2016-01-18 1 2847.54 2945.45 2844.70 2913.84 0.004436
2016-01-19 1 2914.41 3012.07 2906.40 3007.74 0.032226
2016-01-20 1 2993.01 3016.28 2951.92 2976.69 -0.010323
2016-01-21 1 2934.39 2998.79 2880.09 2880.48 -0.032321
2016-01-22 1 2911.11 2931.36 2851.73 2916.56 0.012526
2016-01-25 1 2934.08 2955.78 2911.83 2938.52 0.007529
2016-01-26 1 2907.72 2911.99 2743.84 2749.79 -0.064226
2016-01-27 1 2756.08 2768.77 2638.30 2735.56 -0.005175
2016-01-28 1 2711.16 2740.54 2647.49 2655.66 -0.029208
2016-01-29 1 2652.85 2755.37 2649.79 2737.60 0.030855
2016-02-01 1 2730.98 2735.26 2655.62 2688.85 -0.017808
2016-02-02 1 2687.98 2755.16 2687.98 2749.57 0.022582
2016-02-03 1 2719.57 2746.07 2696.88 2739.25 -0.003753
2016-02-04 1 2751.43 2793.30 2751.31 2781.02 0.015249
2016-02-05 1 2783.08 2790.06 2762.16 2763.49 -0.006303
2016-02-15 1 2684.96 2760.36 2682.09 2746.20 -0.006257
2016-02-16 1 2758.58 2840.62 2758.58 2836.57 0.032907
2016-02-17 1 2829.76 2868.70 2824.36 2867.34 0.010848
2016-02-18 1 2881.78 2893.22 2857.70 2862.89 -0.001552
2016-02-19 1 2854.90 2872.72 2840.49 2860.02 -0.001002
... ... ... ... ... ... ...
2020-08-28 1 3346.29 3405.88 3339.65 3403.81 0.016029
2020-08-31 1 3416.55 3442.74 3395.47 3395.68 -0.002388
2020-09-01 1 3389.74 3410.61 3381.71 3410.61 0.004397
2020-09-02 1 3420.47 3421.40 3377.21 3404.80 -0.001704
2020-09-03 1 3404.03 3425.63 3374.26 3384.98 -0.005821
2020-09-04 1 3336.41 3360.11 3328.55 3355.37 -0.008747
2020-09-07 1 3349.92 3368.25 3285.63 3292.59 -0.018710
2020-09-08 1 3301.22 3324.39 3276.44 3316.42 0.007237
2020-09-09 1 3281.00 3289.62 3238.56 3254.63 -0.018632
2020-09-10 1 3282.23 3285.47 3227.64 3234.82 -0.006087
2020-09-11 1 3225.78 3262.50 3220.54 3260.35 0.007892
2020-09-14 1 3275.92 3283.53 3261.15 3278.81 0.005662
2020-09-15 1 3277.13 3297.69 3263.75 3295.68 0.005145
2020-09-16 1 3293.17 3302.46 3271.08 3283.92 -0.003568
2020-09-17 1 3277.32 3290.41 3248.50 3270.44 -0.004105
2020-09-18 1 3270.91 3338.32 3268.53 3338.09 0.020685
2020-09-21 1 3348.90 3350.59 3313.07 3316.94 -0.006336
2020-09-22 1 3290.67 3320.23 3265.70 3274.30 -0.012855
2020-09-23 1 3278.84 3289.76 3264.89 3279.71 0.001652
2020-09-24 1 3262.88 3265.35 3221.24 3223.18 -0.017236
2020-09-25 1 3234.37 3239.47 3208.04 3219.42 -0.001167
2020-09-28 1 3224.98 3238.18 3210.89 3217.54 -0.000584
2020-09-29 1 3231.86 3242.79 3219.79 3224.36 0.002120
2020-09-30 1 3232.71 3244.91 3202.34 3218.05 -0.001957
2020-10-09 1 3262.61 3280.51 3260.19 3272.08 0.016790
2020-10-12 1 3287.33 3359.15 3286.11 3358.47 0.026402
2020-10-13 1 3353.12 3361.83 3334.50 3359.75 0.000381
2020-10-14 1 3353.63 3353.63 3332.96 3340.78 -0.005646
2020-10-15 1 3342.92 3354.58 3330.00 3332.18 -0.002574
2020-10-16 1 3334.46 3348.95 3319.11 3336.36 0.001254

1164 rows × 6 columns

In [85]:
df['Return']=df['Close']/df['Close'].shift()-1
df['Return'].hist()
Out[85]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fcb29de95c0>

  以下为数据查看常用的八项操作:

  1.查看前几条数据:

In [88]:
price.head(10)
Out[88]:
open high low close
date
2017-01-03 16.21 16.44 16.17 16.30
2017-01-04 16.29 16.35 16.18 16.33
2017-01-05 16.30 16.38 16.24 16.30
2017-01-06 16.30 16.30 16.13 16.18
2017-01-09 16.24 16.29 16.13 16.20
2017-01-10 16.18 16.24 16.14 16.19
2017-01-11 16.24 16.24 16.15 16.16
2017-01-12 16.18 16.20 16.11 16.12
2017-01-13 16.10 16.29 16.10 16.27
2017-01-16 16.23 16.60 16.10 16.56

  2.查看后几条数据:

In [90]:
price.tail(10)
Out[90]:
open high low close
date
2017-09-04 12.78 12.84 12.65 12.78
2017-09-05 12.78 13.09 12.78 13.03
2017-09-06 12.99 13.05 12.90 12.96
2017-09-07 12.94 13.02 12.82 12.85
2017-09-08 12.85 13.12 12.83 13.03
2017-09-11 13.15 13.15 12.95 12.99
2017-09-12 13.00 13.04 12.91 13.04
2017-09-13 13.01 13.01 12.86 12.87
2017-09-14 12.88 12.98 12.81 12.85
2017-09-15 12.82 12.86 12.75 12.83

  3.查看 DataFrame 的索引

In [91]:
price.index
Out[91]:
Index(['2017-01-03', '2017-01-04', '2017-01-05', '2017-01-06', '2017-01-09',
       '2017-01-10', '2017-01-11', '2017-01-12', '2017-01-13', '2017-01-16',
       ...
       '2017-09-04', '2017-09-05', '2017-09-06', '2017-09-07', '2017-09-08',
       '2017-09-11', '2017-09-12', '2017-09-13', '2017-09-14', '2017-09-15'],
      dtype='object', name='date', length=174)

  4.查看 DataFrame 的列名

In [92]:
price.columns
Out[92]:
Index(['open', 'high', 'low', 'close'], dtype='object')

  5.查看 DataFrame 的值

In [93]:
price.values
Out[93]:
array([[16.21, 16.44, 16.17, 16.3 ],
       [16.29, 16.35, 16.18, 16.33],
       [16.3 , 16.38, 16.24, 16.3 ],
       [16.3 , 16.3 , 16.13, 16.18],
       [16.24, 16.29, 16.13, 16.2 ],
       [16.18, 16.24, 16.14, 16.19],
       [16.24, 16.24, 16.15, 16.16],
       [16.18, 16.2 , 16.11, 16.12],
       [16.1 , 16.29, 16.1 , 16.27],
       [16.23, 16.6 , 16.1 , 16.56],
       [16.46, 16.54, 16.37, 16.4 ],
       [16.42, 16.55, 16.36, 16.48],
       [16.43, 16.64, 16.43, 16.54],
       [16.58, 16.66, 16.5 , 16.6 ],
       [16.66, 16.69, 16.51, 16.57],
       [16.58, 16.7 , 16.58, 16.69],
       [16.69, 16.74, 16.61, 16.69],
       [16.69, 16.84, 16.61, 16.74],
       [16.82, 16.85, 16.62, 16.63],
       [16.75, 16.78, 16.66, 16.66],
       [16.65, 16.74, 16.63, 16.67],
       [16.63, 16.69, 16.54, 16.67],
       [16.68, 16.77, 16.59, 16.72],
       [16.76, 16.84, 16.7 , 16.78],
       [16.88, 16.9 , 16.78, 16.85],
       [16.87, 16.88, 16.74, 16.75],
       [16.82, 16.93, 16.75, 16.84],
       [16.88, 16.91, 16.76, 16.78],
       [16.78, 16.82, 16.59, 16.64],
       [16.62, 16.98, 16.62, 16.91],
       [16.88, 17.08, 16.81, 16.88],
       [16.88, 16.9 , 16.73, 16.75],
       [16.78, 16.83, 16.64, 16.69],
       [16.67, 16.75, 16.66, 16.71],
       [16.69, 16.72, 16.53, 16.59],
       [16.58, 16.67, 16.53, 16.59],
       [16.58, 16.62, 16.52, 16.56],
       [16.62, 16.62, 16.39, 16.41],
       [16.42, 16.43, 16.32, 16.35],
       [16.37, 16.53, 16.35, 16.41],
       [16.38, 16.45, 16.37, 16.43],
       [16.4 , 16.44, 16.35, 16.39],
       [16.37, 16.4 , 16.22, 16.22],
       [16.23, 16.28, 16.17, 16.23],
       [16.23, 16.34, 16.16, 16.34],
       [16.34, 16.35, 16.24, 16.26],
       [16.24, 16.28, 16.17, 16.24],
       [16.27, 16.35, 16.25, 16.29],
       [16.3 , 16.32, 16.14, 16.2 ],
       [16.24, 16.25, 16.13, 16.14],
       [16.15, 16.16, 15.98, 16.  ],
       [15.98, 15.98, 15.7 , 15.78],
       [15.8 , 15.92, 15.79, 15.88],
       [15.85, 16.  , 15.83, 15.96],
       [15.97, 16.13, 15.9 , 16.04],
       [16.11, 16.13, 15.97, 16.01],
       [16.01, 16.11, 15.8 , 15.87],
       [15.88, 15.9 , 15.72, 15.81],
       [15.78, 16.05, 15.77, 16.01],
       [16.05, 16.2 , 15.89, 16.16],
       [16.09, 16.18, 16.03, 16.1 ],
       [16.06, 16.16, 16.02, 16.1 ],
       [16.12, 16.12, 15.94, 15.97],
       [15.97, 15.99, 15.81, 15.89],
       [15.88, 15.96, 15.75, 15.89],
       [15.88, 15.89, 15.73, 15.75],
       [15.76, 15.78, 15.42, 15.51],
       [15.53, 15.6 , 15.3 , 15.56],
       [15.5 , 15.52, 15.3 , 15.3 ],
       [15.25, 15.27, 15.11, 15.11],
       [15.12, 15.14, 14.9 , 14.91],
       [14.92, 15.07, 14.85, 15.05],
       [15.05, 15.11, 14.91, 15.  ],
       [15.02, 15.1 , 14.99, 15.05],
       [15.06, 15.11, 15.  , 15.05],
       [15.05, 15.25, 15.03, 15.21],
       [15.15, 15.22, 15.08, 15.21],
       [15.21, 15.22, 15.13, 15.16],
       [15.16, 15.16, 15.05, 15.08],
       [15.07, 15.07, 14.9 , 14.98],
       [14.95, 14.98, 14.52, 14.92],
       [14.78, 14.9 , 14.51, 14.86],
       [14.69, 14.84, 14.66, 14.76],
       [14.76, 14.85, 14.55, 14.61],
       [14.57, 14.79, 14.55, 14.75],
       [14.75, 15.23, 14.73, 15.21],
       [15.27, 15.43, 15.16, 15.27],
       [15.23, 15.29, 15.08, 15.27],
       [15.24, 15.26, 15.07, 15.21],
       [15.1 , 15.14, 14.95, 15.01],
       [15.04, 15.08, 14.91, 15.02],
       [15.02, 15.06, 14.93, 15.04],
       [15.  , 15.57, 14.98, 15.43],
       [15.38, 15.52, 15.21, 15.47],
       [11.75, 12.93, 11.72, 12.93],
       [12.81, 12.91, 12.54, 12.84],
       [12.75, 13.05, 12.66, 12.84],
       [12.78, 12.97, 12.69, 12.92],
       [12.93, 13.01, 12.74, 12.89],
       [12.79, 12.88, 12.51, 12.85],
       [12.8 , 12.84, 12.66, 12.8 ],
       [12.81, 12.99, 12.78, 12.84],
       [12.76, 12.86, 12.67, 12.84],
       [12.79, 13.2 , 12.77, 12.93],
       [12.9 , 13.14, 12.76, 12.78],
       [12.78, 12.78, 12.56, 12.59],
       [12.6 , 12.64, 12.31, 12.46],
       [12.41, 12.46, 12.28, 12.38],
       [12.38, 12.46, 12.27, 12.35],
       [12.36, 12.43, 12.3 , 12.4 ],
       [12.41, 12.44, 12.22, 12.29],
       [12.4 , 12.4 , 12.26, 12.34],
       [12.33, 12.84, 12.29, 12.57],
       [12.55, 12.65, 12.36, 12.64],
       [12.64, 12.82, 12.55, 12.64],
       [12.64, 12.73, 12.53, 12.7 ],
       [12.66, 12.91, 12.6 , 12.79],
       [12.81, 12.86, 12.7 , 12.76],
       [12.7 , 12.73, 12.59, 12.65],
       [12.64, 12.65, 12.47, 12.56],
       [12.55, 12.58, 12.41, 12.55],
       [12.5 , 12.65, 12.47, 12.62],
       [12.62, 12.72, 12.51, 12.66],
       [12.62, 12.69, 12.55, 12.6 ],
       [12.59, 12.64, 12.52, 12.53],
       [12.51, 12.94, 12.51, 12.8 ],
       [12.8 , 13.06, 12.74, 12.86],
       [12.82, 13.29, 12.77, 13.27],
       [13.34, 13.8 , 13.34, 13.59],
       [13.62, 14.02, 13.28, 13.76],
       [13.65, 13.72, 13.4 , 13.55],
       [13.56, 13.78, 13.48, 13.69],
       [13.67, 13.7 , 13.53, 13.62],
       [13.61, 13.63, 13.38, 13.45],
       [13.45, 13.74, 13.36, 13.67],
       [13.62, 13.75, 13.41, 13.51],
       [13.45, 13.7 , 13.36, 13.38],
       [13.38, 13.47, 13.28, 13.38],
       [13.37, 13.49, 13.3 , 13.47],
       [13.45, 13.52, 13.31, 13.36],
       [13.42, 13.49, 13.32, 13.43],
       [13.44, 13.6 , 13.39, 13.44],
       [13.42, 13.42, 13.04, 13.08],
       [13.09, 13.1 , 12.85, 12.87],
       [12.87, 12.95, 12.83, 12.93],
       [12.91, 12.93, 12.83, 12.87],
       [12.84, 12.88, 12.75, 12.76],
       [12.74, 12.88, 12.7 , 12.79],
       [12.72, 12.73, 12.46, 12.68],
       [12.62, 12.63, 12.52, 12.56],
       [12.58, 12.76, 12.56, 12.59],
       [12.56, 12.56, 12.47, 12.49],
       [12.49, 12.55, 12.47, 12.53],
       [12.5 , 12.57, 12.46, 12.51],
       [12.5 , 12.57, 12.46, 12.51],
       [12.51, 12.53, 12.41, 12.43],
       [12.44, 12.66, 12.41, 12.57],
       [12.58, 12.66, 12.45, 12.47],
       [12.47, 12.8 , 12.47, 12.78],
       [12.83, 12.96, 12.75, 12.89],
       [12.85, 12.98, 12.77, 12.96],
       [12.91, 12.98, 12.82, 12.87],
       [12.83, 12.84, 12.65, 12.71],
       [12.68, 12.87, 12.68, 12.77],
       [12.78, 12.84, 12.65, 12.78],
       [12.78, 13.09, 12.78, 13.03],
       [12.99, 13.05, 12.9 , 12.96],
       [12.94, 13.02, 12.82, 12.85],
       [12.85, 13.12, 12.83, 13.03],
       [13.15, 13.15, 12.95, 12.99],
       [13.  , 13.04, 12.91, 13.04],
       [13.01, 13.01, 12.86, 12.87],
       [12.88, 12.98, 12.81, 12.85],
       [12.82, 12.86, 12.75, 12.83]])

  6.使用 describe() 函数对于数据的快速统计汇总:

In [94]:
price.describe()
Out[94]:
open high low close
count 174.000000 174.000000 174.000000 174.000000
mean 14.545747 14.657241 14.436609 14.556034
std 1.668836 1.642246 1.672180 1.648151
min 11.750000 12.400000 11.720000 12.290000
25% 12.810000 12.942500 12.740000 12.842500
50% 14.975000 15.070000 14.875000 14.990000
75% 16.230000 16.287500 16.137500 16.200000
max 16.880000 17.080000 16.810000 16.910000

  7.对数据的转置:

In [95]:
price.T
Out[95]:
date 2017-01-03 2017-01-04 2017-01-05 2017-01-06 2017-01-09 2017-01-10 2017-01-11 2017-01-12 2017-01-13 2017-01-16 ... 2017-09-04 2017-09-05 2017-09-06 2017-09-07 2017-09-08 2017-09-11 2017-09-12 2017-09-13 2017-09-14 2017-09-15
open 16.21 16.29 16.30 16.30 16.24 16.18 16.24 16.18 16.10 16.23 ... 12.78 12.78 12.99 12.94 12.85 13.15 13.00 13.01 12.88 12.82
high 16.44 16.35 16.38 16.30 16.29 16.24 16.24 16.20 16.29 16.60 ... 12.84 13.09 13.05 13.02 13.12 13.15 13.04 13.01 12.98 12.86
low 16.17 16.18 16.24 16.13 16.13 16.14 16.15 16.11 16.10 16.10 ... 12.65 12.78 12.90 12.82 12.83 12.95 12.91 12.86 12.81 12.75
close 16.30 16.33 16.30 16.18 16.20 16.19 16.16 16.12 16.27 16.56 ... 12.78 13.03 12.96 12.85 13.03 12.99 13.04 12.87 12.85 12.83

4 rows × 174 columns

  8.按列对 DataFrame 进行排序

In [101]:
price.sort_values(by=['open','close'] , ascending=[False,True])
Out[101]:
open high low close
date
2017-02-22 16.88 16.90 16.73 16.75
2017-02-16 16.88 16.91 16.76 16.78
2017-02-13 16.88 16.90 16.78 16.85
2017-02-21 16.88 17.08 16.81 16.88
2017-02-14 16.87 16.88 16.74 16.75
2017-02-03 16.82 16.85 16.62 16.63
2017-02-15 16.82 16.93 16.75 16.84
2017-02-17 16.78 16.82 16.59 16.64
2017-02-23 16.78 16.83 16.64 16.69
2017-02-10 16.76 16.84 16.70 16.78
2017-02-06 16.75 16.78 16.66 16.66
2017-02-27 16.69 16.72 16.53 16.59
2017-01-25 16.69 16.74 16.61 16.69
2017-01-26 16.69 16.84 16.61 16.74
2017-02-09 16.68 16.77 16.59 16.72
2017-02-24 16.67 16.75 16.66 16.71
2017-01-23 16.66 16.69 16.51 16.57
2017-02-07 16.65 16.74 16.63 16.67
2017-02-08 16.63 16.69 16.54 16.67
2017-03-02 16.62 16.62 16.39 16.41
2017-02-20 16.62 16.98 16.62 16.91
2017-03-01 16.58 16.62 16.52 16.56
2017-02-28 16.58 16.67 16.53 16.59
2017-01-20 16.58 16.66 16.50 16.60
2017-01-24 16.58 16.70 16.58 16.69
2017-01-17 16.46 16.54 16.37 16.40
2017-01-19 16.43 16.64 16.43 16.54
2017-03-03 16.42 16.43 16.32 16.35
2017-01-18 16.42 16.55 16.36 16.48
2017-03-08 16.40 16.44 16.35 16.39
... ... ... ... ...
2017-09-01 12.68 12.87 12.68 12.77
2017-06-28 12.66 12.91 12.60 12.79
2017-07-03 12.64 12.65 12.47 12.56
2017-06-26 12.64 12.82 12.55 12.64
2017-06-27 12.64 12.73 12.53 12.70
2017-08-14 12.62 12.63 12.52 12.56
2017-07-07 12.62 12.69 12.55 12.60
2017-07-06 12.62 12.72 12.51 12.66
2017-06-14 12.60 12.64 12.31 12.46
2017-07-10 12.59 12.64 12.52 12.53
2017-08-24 12.58 12.66 12.45 12.47
2017-08-15 12.58 12.76 12.56 12.59
2017-08-16 12.56 12.56 12.47 12.49
2017-07-04 12.55 12.58 12.41 12.55
2017-06-23 12.55 12.65 12.36 12.64
2017-08-22 12.51 12.53 12.41 12.43
2017-07-11 12.51 12.94 12.51 12.80
2017-08-18 12.50 12.57 12.46 12.51
2017-08-21 12.50 12.57 12.46 12.51
2017-07-05 12.50 12.65 12.47 12.62
2017-08-17 12.49 12.55 12.47 12.53
2017-08-25 12.47 12.80 12.47 12.78
2017-08-23 12.44 12.66 12.41 12.57
2017-06-20 12.41 12.44 12.22 12.29
2017-06-15 12.41 12.46 12.28 12.38
2017-06-21 12.40 12.40 12.26 12.34
2017-06-16 12.38 12.46 12.27 12.35
2017-06-19 12.36 12.43 12.30 12.40
2017-06-22 12.33 12.84 12.29 12.57
2017-05-25 11.75 12.93 11.72 12.93

174 rows × 4 columns

  注意:sort_values函数内置参数有by和ascending,by参数是排序指定列,ascending是排序顺序,False是从大到小,True是从小到大。

三、选择数据

  依旧采用上个小节案例,继续讲述选择数据的八项基本操作。
   1.选择一列数据,选取开盘价这列数据:

In [102]:
price['open']
Out[102]:
date
2017-01-03    16.21
2017-01-04    16.29
2017-01-05    16.30
2017-01-06    16.30
2017-01-09    16.24
              ...  
2017-09-11    13.15
2017-09-12    13.00
2017-09-13    13.01
2017-09-14    12.88
2017-09-15    12.82
Name: open, Length: 174, dtype: float64
In [103]:
price.open
Out[103]:
date
2017-01-03    16.21
2017-01-04    16.29
2017-01-05    16.30
2017-01-06    16.30
2017-01-09    16.24
              ...  
2017-09-11    13.15
2017-09-12    13.00
2017-09-13    13.01
2017-09-14    12.88
2017-09-15    12.82
Name: open, Length: 174, dtype: float64

  同学们动手试试price.open~
  它与price['open']是等效的!

  2.选择多列数据:

In [108]:
price[['open','high','close']]
Out[108]:
open high close
date
2017-01-03 16.21 16.44 16.30
2017-01-04 16.29 16.35 16.33
2017-01-05 16.30 16.38 16.30
2017-01-06 16.30 16.30 16.18
2017-01-09 16.24 16.29 16.20
2017-01-10 16.18 16.24 16.19
2017-01-11 16.24 16.24 16.16
2017-01-12 16.18 16.20 16.12
2017-01-13 16.10 16.29 16.27
2017-01-16 16.23 16.60 16.56
2017-01-17 16.46 16.54 16.40
2017-01-18 16.42 16.55 16.48
2017-01-19 16.43 16.64 16.54
2017-01-20 16.58 16.66 16.60
2017-01-23 16.66 16.69 16.57
2017-01-24 16.58 16.70 16.69
2017-01-25 16.69 16.74 16.69
2017-01-26 16.69 16.84 16.74
2017-02-03 16.82 16.85 16.63
2017-02-06 16.75 16.78 16.66
2017-02-07 16.65 16.74 16.67
2017-02-08 16.63 16.69 16.67
2017-02-09 16.68 16.77 16.72
2017-02-10 16.76 16.84 16.78
2017-02-13 16.88 16.90 16.85
2017-02-14 16.87 16.88 16.75
2017-02-15 16.82 16.93 16.84
2017-02-16 16.88 16.91 16.78
2017-02-17 16.78 16.82 16.64
2017-02-20 16.62 16.98 16.91
... ... ... ...
2017-08-07 12.87 12.95 12.93
2017-08-08 12.91 12.93 12.87
2017-08-09 12.84 12.88 12.76
2017-08-10 12.74 12.88 12.79
2017-08-11 12.72 12.73 12.68
2017-08-14 12.62 12.63 12.56
2017-08-15 12.58 12.76 12.59
2017-08-16 12.56 12.56 12.49
2017-08-17 12.49 12.55 12.53
2017-08-18 12.50 12.57 12.51
2017-08-21 12.50 12.57 12.51
2017-08-22 12.51 12.53 12.43
2017-08-23 12.44 12.66 12.57
2017-08-24 12.58 12.66 12.47
2017-08-25 12.47 12.80 12.78
2017-08-28 12.83 12.96 12.89
2017-08-29 12.85 12.98 12.96
2017-08-30 12.91 12.98 12.87
2017-08-31 12.83 12.84 12.71
2017-09-01 12.68 12.87 12.77
2017-09-04 12.78 12.84 12.78
2017-09-05 12.78 13.09 13.03
2017-09-06 12.99 13.05 12.96
2017-09-07 12.94 13.02 12.85
2017-09-08 12.85 13.12 13.03
2017-09-11 13.15 13.15 12.99
2017-09-12 13.00 13.04 13.04
2017-09-13 13.01 13.01 12.87
2017-09-14 12.88 12.98 12.85
2017-09-15 12.82 12.86 12.83

174 rows × 3 columns

  注意:price[['open','close']]中['open','close']是一个由两个字符串(列名)组成的列表,会自动对应到整个DataFrame表结构中,获取到相应的数据。
  同学们试试price['open','close'],看看能不能获取到数据~

  3.选择多行:

In [112]:
price[0:3]
Out[112]:
open high low close
date
2017-01-03 16.21 16.44 16.17 16.30
2017-01-04 16.29 16.35 16.18 16.33
2017-01-05 16.30 16.38 16.24 16.30

  4.按index选取多行:

In [113]:
price['2017-01-24':'2017-02-25']
Out[113]:
open high low close
date
2017-01-24 16.58 16.70 16.58 16.69
2017-01-25 16.69 16.74 16.61 16.69
2017-01-26 16.69 16.84 16.61 16.74
2017-02-03 16.82 16.85 16.62 16.63
2017-02-06 16.75 16.78 16.66 16.66
2017-02-07 16.65 16.74 16.63 16.67
2017-02-08 16.63 16.69 16.54 16.67
2017-02-09 16.68 16.77 16.59 16.72
2017-02-10 16.76 16.84 16.70 16.78
2017-02-13 16.88 16.90 16.78 16.85
2017-02-14 16.87 16.88 16.74 16.75
2017-02-15 16.82 16.93 16.75 16.84
2017-02-16 16.88 16.91 16.76 16.78
2017-02-17 16.78 16.82 16.59 16.64
2017-02-20 16.62 16.98 16.62 16.91
2017-02-21 16.88 17.08 16.81 16.88
2017-02-22 16.88 16.90 16.73 16.75
2017-02-23 16.78 16.83 16.64 16.69
2017-02-24 16.67 16.75 16.66 16.71

  5.使用标签选取数据:

   price.loc[行标签,列标签]
   price.loc['a':'b'] #选取 ab 两行数据
   price.loc[:,'open'] #选取 open 列的数据
   price.loc 的第一个参数是行标签,第二个参数为列标签,两个参数既可以是列表也可以是单个字符,如果两个参数都为列表则返回的是 DataFrame,否则,则为 Series。

In [114]:
price.loc['2017-01-24','open']
Out[114]:
16.58
In [115]:
price.loc['2017-01-24':'2017-01-25']
Out[115]:
open high low close
date
2017-01-24 16.58 16.70 16.58 16.69
2017-01-25 16.69 16.74 16.61 16.69
In [116]:
price.loc[:, 'open']
Out[116]:
date
2017-01-03    16.21
2017-01-04    16.29
2017-01-05    16.30
2017-01-06    16.30
2017-01-09    16.24
              ...  
2017-09-11    13.15
2017-09-12    13.00
2017-09-13    13.01
2017-09-14    12.88
2017-09-15    12.82
Name: open, Length: 174, dtype: float64
In [119]:
price.loc['2017-01-24':'2017-02-25',['open','close']]
Out[119]:
open close
date
2017-01-24 16.58 16.69
2017-01-25 16.69 16.69
2017-01-26 16.69 16.74
2017-02-03 16.82 16.63
2017-02-06 16.75 16.66
2017-02-07 16.65 16.67
2017-02-08 16.63 16.67
2017-02-09 16.68 16.72
2017-02-10 16.76 16.78
2017-02-13 16.88 16.85
2017-02-14 16.87 16.75
2017-02-15 16.82 16.84
2017-02-16 16.88 16.78
2017-02-17 16.78 16.64
2017-02-20 16.62 16.91
2017-02-21 16.88 16.88
2017-02-22 16.88 16.75
2017-02-23 16.78 16.69
2017-02-24 16.67 16.71

  6..使用位置选取数据:

   price.iloc[行位置,列位置]
   price.iloc[1,1] #选取第二行,第二列的值,返回的为单个值
   price.iloc[[0,2],:] #选取第一行及第三行的数据
   price.iloc[0:2,:] #选取第一行到第三行(不包含)的数据
   price.iloc[:,1] #选取所有记录的第二列的值,返回的为一个Series
   price.iloc[1,:] #选取第一行数据,返回的为一个Series

In [120]:
price.iloc[1,1] # 选取第二行,第二列的值,返回的为单个值
Out[120]:
16.35
In [121]:
price.iloc[[0,2],:] # 选取第一行及第三行的数据
Out[121]:
open high low close
date
2017-01-03 16.21 16.44 16.17 16.3
2017-01-05 16.30 16.38 16.24 16.3
In [123]:
price.iloc[0:20,:] # 选取第一行到第三行(不包含)的数据
Out[123]:
open high low close
date
2017-01-03 16.21 16.44 16.17 16.30
2017-01-04 16.29 16.35 16.18 16.33
2017-01-05 16.30 16.38 16.24 16.30
2017-01-06 16.30 16.30 16.13 16.18
2017-01-09 16.24 16.29 16.13 16.20
2017-01-10 16.18 16.24 16.14 16.19
2017-01-11 16.24 16.24 16.15 16.16
2017-01-12 16.18 16.20 16.11 16.12
2017-01-13 16.10 16.29 16.10 16.27
2017-01-16 16.23 16.60 16.10 16.56
2017-01-17 16.46 16.54 16.37 16.40
2017-01-18 16.42 16.55 16.36 16.48
2017-01-19 16.43 16.64 16.43 16.54
2017-01-20 16.58 16.66 16.50 16.60
2017-01-23 16.66 16.69 16.51 16.57
2017-01-24 16.58 16.70 16.58 16.69
2017-01-25 16.69 16.74 16.61 16.69
2017-01-26 16.69 16.84 16.61 16.74
2017-02-03 16.82 16.85 16.62 16.63
2017-02-06 16.75 16.78 16.66 16.66
In [126]:
price.iloc[:,1:3] # 选取所有记录的第一列的值,返回的为一个Series
Out[126]:
high low
date
2017-01-03 16.44 16.17
2017-01-04 16.35 16.18
2017-01-05 16.38 16.24
2017-01-06 16.30 16.13
2017-01-09 16.29 16.13
2017-01-10 16.24 16.14
2017-01-11 16.24 16.15
2017-01-12 16.20 16.11
2017-01-13 16.29 16.10
2017-01-16 16.60 16.10
2017-01-17 16.54 16.37
2017-01-18 16.55 16.36
2017-01-19 16.64 16.43
2017-01-20 16.66 16.50
2017-01-23 16.69 16.51
2017-01-24 16.70 16.58
2017-01-25 16.74 16.61
2017-01-26 16.84 16.61
2017-02-03 16.85 16.62
2017-02-06 16.78 16.66
2017-02-07 16.74 16.63
2017-02-08 16.69 16.54
2017-02-09 16.77 16.59
2017-02-10 16.84 16.70
2017-02-13 16.90 16.78
2017-02-14 16.88 16.74
2017-02-15 16.93 16.75
2017-02-16 16.91 16.76
2017-02-17 16.82 16.59
2017-02-20 16.98 16.62
... ... ...
2017-08-07 12.95 12.83
2017-08-08 12.93 12.83
2017-08-09 12.88 12.75
2017-08-10 12.88 12.70
2017-08-11 12.73 12.46
2017-08-14 12.63 12.52
2017-08-15 12.76 12.56
2017-08-16 12.56 12.47
2017-08-17 12.55 12.47
2017-08-18 12.57 12.46
2017-08-21 12.57 12.46
2017-08-22 12.53 12.41
2017-08-23 12.66 12.41
2017-08-24 12.66 12.45
2017-08-25 12.80 12.47
2017-08-28 12.96 12.75
2017-08-29 12.98 12.77
2017-08-30 12.98 12.82
2017-08-31 12.84 12.65
2017-09-01 12.87 12.68
2017-09-04 12.84 12.65
2017-09-05 13.09 12.78
2017-09-06 13.05 12.90
2017-09-07 13.02 12.82
2017-09-08 13.12 12.83
2017-09-11 13.15 12.95
2017-09-12 13.04 12.91
2017-09-13 13.01 12.86
2017-09-14 12.98 12.81
2017-09-15 12.86 12.75

174 rows × 2 columns

In [127]:
price.iloc[1,:] # 选取第一行数据,返回的为一个Series
Out[127]:
open     16.29
high     16.35
low      16.18
close    16.33
Name: 2017-01-04, dtype: float64

  7.更广义的切片方式是使用.ix,它自动根据给到的索引类型判断是使用位置还是标签进行切片

   price.ix[1,1]
   price.ix['a':'b']

In [39]:
price.ix[1,1]
/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:1: FutureWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
  """Entry point for launching an IPython kernel.
/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py:947: FutureWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
  return getattr(section, self.name)[new_key]
Out[39]:
16.35
In [40]:
price.ix['2017-01-24':'2017-01-25']
/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:1: FutureWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
  """Entry point for launching an IPython kernel.
Out[40]:
open high low close
date
2017-01-24 16.58 16.70 16.58 16.69
2017-01-25 16.69 16.74 16.61 16.69
In [41]:
price.ix['2017-01-24','open']
/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:1: FutureWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
  """Entry point for launching an IPython kernel.
Out[41]:
16.58
In [42]:
price.ix[1,'open']
/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:1: FutureWarning: 
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
  """Entry point for launching an IPython kernel.
Out[42]:
16.29
In [47]:
price.ix['2018-01-24',0]
Out[47]:
764.46000000000004

  8.通过逻辑指针进行数据切片:

   price[逻辑条件]
   price[price.one >= 2] #单个逻辑条件
   price[(price.one >=1 ) & (df.one < 3) ] #多个逻辑条件组合

In [130]:
price.open>16.5
Out[130]:
date
2017-01-03    False
2017-01-04    False
2017-01-05    False
2017-01-06    False
2017-01-09    False
              ...  
2017-09-11    False
2017-09-12    False
2017-09-13    False
2017-09-14    False
2017-09-15    False
Name: open, Length: 174, dtype: bool
In [141]:
#筛选出 open 大于 16.5的数据
price[price.open > 16.5][['open','close']]
Out[141]:
open close
date
2017-01-20 16.58 16.60
2017-01-23 16.66 16.57
2017-01-24 16.58 16.69
2017-01-25 16.69 16.69
2017-01-26 16.69 16.74
2017-02-03 16.82 16.63
2017-02-06 16.75 16.66
2017-02-07 16.65 16.67
2017-02-08 16.63 16.67
2017-02-09 16.68 16.72
2017-02-10 16.76 16.78
2017-02-13 16.88 16.85
2017-02-14 16.87 16.75
2017-02-15 16.82 16.84
2017-02-16 16.88 16.78
2017-02-17 16.78 16.64
2017-02-20 16.62 16.91
2017-02-21 16.88 16.88
2017-02-22 16.88 16.75
2017-02-23 16.78 16.69
2017-02-24 16.67 16.71
2017-02-27 16.69 16.59
2017-02-28 16.58 16.59
2017-03-01 16.58 16.56
2017-03-02 16.62 16.41
In [135]:
print((price.open > 16.4))
print((price.close < 16.6))
date
2017-01-03    False
2017-01-04    False
2017-01-05    False
2017-01-06    False
2017-01-09    False
              ...  
2017-09-11    False
2017-09-12    False
2017-09-13    False
2017-09-14    False
2017-09-15    False
Name: open, Length: 174, dtype: bool
date
2017-01-03    True
2017-01-04    True
2017-01-05    True
2017-01-06    True
2017-01-09    True
              ... 
2017-09-11    True
2017-09-12    True
2017-09-13    True
2017-09-14    True
2017-09-15    True
Name: close, Length: 174, dtype: bool
In [142]:
#筛选出 open 大于 16.4 的数据,并且 close 小于 16.6 的数据
price[(price.open > 16.4) & (price.close < 16.6)]
Out[142]:
open high low close
date
2017-01-17 16.46 16.54 16.37 16.40
2017-01-18 16.42 16.55 16.36 16.48
2017-01-19 16.43 16.64 16.43 16.54
2017-01-23 16.66 16.69 16.51 16.57
2017-02-27 16.69 16.72 16.53 16.59
2017-02-28 16.58 16.67 16.53 16.59
2017-03-01 16.58 16.62 16.52 16.56
2017-03-02 16.62 16.62 16.39 16.41
2017-03-03 16.42 16.43 16.32 16.35
In [145]:
#使用 条件过来更改数据。
price[price>16.5]
Out[145]:
open high low close
date
2017-01-03 NaN NaN NaN NaN
2017-01-04 NaN NaN NaN NaN
2017-01-05 NaN NaN NaN NaN
2017-01-06 NaN NaN NaN NaN
2017-01-09 NaN NaN NaN NaN
2017-01-10 NaN NaN NaN NaN
2017-01-11 NaN NaN NaN NaN
2017-01-12 NaN NaN NaN NaN
2017-01-13 NaN NaN NaN NaN
2017-01-16 NaN 16.60 NaN 16.56
2017-01-17 NaN 16.54 NaN NaN
2017-01-18 NaN 16.55 NaN NaN
2017-01-19 NaN 16.64 NaN 16.54
2017-01-20 16.58 16.66 NaN 16.60
2017-01-23 16.66 16.69 16.51 16.57
2017-01-24 16.58 16.70 16.58 16.69
2017-01-25 16.69 16.74 16.61 16.69
2017-01-26 16.69 16.84 16.61 16.74
2017-02-03 16.82 16.85 16.62 16.63
2017-02-06 16.75 16.78 16.66 16.66
2017-02-07 16.65 16.74 16.63 16.67
2017-02-08 16.63 16.69 16.54 16.67
2017-02-09 16.68 16.77 16.59 16.72
2017-02-10 16.76 16.84 16.70 16.78
2017-02-13 16.88 16.90 16.78 16.85
2017-02-14 16.87 16.88 16.74 16.75
2017-02-15 16.82 16.93 16.75 16.84
2017-02-16 16.88 16.91 16.76 16.78
2017-02-17 16.78 16.82 16.59 16.64
2017-02-20 16.62 16.98 16.62 16.91
... ... ... ... ...
2017-08-07 NaN NaN NaN NaN
2017-08-08 NaN NaN NaN NaN
2017-08-09 NaN NaN NaN NaN
2017-08-10 NaN NaN NaN NaN
2017-08-11 NaN NaN NaN NaN
2017-08-14 NaN NaN NaN NaN
2017-08-15 NaN NaN NaN NaN
2017-08-16 NaN NaN NaN NaN
2017-08-17 NaN NaN NaN NaN
2017-08-18 NaN NaN NaN NaN
2017-08-21 NaN NaN NaN NaN
2017-08-22 NaN NaN NaN NaN
2017-08-23 NaN NaN NaN NaN
2017-08-24 NaN NaN NaN NaN
2017-08-25 NaN NaN NaN NaN
2017-08-28 NaN NaN NaN NaN
2017-08-29 NaN NaN NaN NaN
2017-08-30 NaN NaN NaN NaN
2017-08-31 NaN NaN NaN NaN
2017-09-01 NaN NaN NaN NaN
2017-09-04 NaN NaN NaN NaN
2017-09-05 NaN NaN NaN NaN
2017-09-06 NaN NaN NaN NaN
2017-09-07 NaN NaN NaN NaN
2017-09-08 NaN NaN NaN NaN
2017-09-11 NaN NaN NaN NaN
2017-09-12 NaN NaN NaN NaN
2017-09-13 NaN NaN NaN NaN
2017-09-14 NaN NaN NaN NaN
2017-09-15 NaN NaN NaN NaN

174 rows × 4 columns

  观察可以发现,price 中大于 16.5 的数都变为 NaN。

In [146]:
#我们还可以把大于 16.5 的数赋值为1.
price[price > 16.5] = 1
price
Out[146]:
open high low close
date
2017-01-03 16.21 16.44 16.17 16.30
2017-01-04 16.29 16.35 16.18 16.33
2017-01-05 16.30 16.38 16.24 16.30
2017-01-06 16.30 16.30 16.13 16.18
2017-01-09 16.24 16.29 16.13 16.20
2017-01-10 16.18 16.24 16.14 16.19
2017-01-11 16.24 16.24 16.15 16.16
2017-01-12 16.18 16.20 16.11 16.12
2017-01-13 16.10 16.29 16.10 16.27
2017-01-16 16.23 1.00 16.10 1.00
2017-01-17 16.46 1.00 16.37 16.40
2017-01-18 16.42 1.00 16.36 16.48
2017-01-19 16.43 1.00 16.43 1.00
2017-01-20 1.00 1.00 16.50 1.00
2017-01-23 1.00 1.00 1.00 1.00
2017-01-24 1.00 1.00 1.00 1.00
2017-01-25 1.00 1.00 1.00 1.00
2017-01-26 1.00 1.00 1.00 1.00
2017-02-03 1.00 1.00 1.00 1.00
2017-02-06 1.00 1.00 1.00 1.00
2017-02-07 1.00 1.00 1.00 1.00
2017-02-08 1.00 1.00 1.00 1.00
2017-02-09 1.00 1.00 1.00 1.00
2017-02-10 1.00 1.00 1.00 1.00
2017-02-13 1.00 1.00 1.00 1.00
2017-02-14 1.00 1.00 1.00 1.00
2017-02-15 1.00 1.00 1.00 1.00
2017-02-16 1.00 1.00 1.00 1.00
2017-02-17 1.00 1.00 1.00 1.00
2017-02-20 1.00 1.00 1.00 1.00
... ... ... ... ...
2017-08-07 12.87 12.95 12.83 12.93
2017-08-08 12.91 12.93 12.83 12.87
2017-08-09 12.84 12.88 12.75 12.76
2017-08-10 12.74 12.88 12.70 12.79
2017-08-11 12.72 12.73 12.46 12.68
2017-08-14 12.62 12.63 12.52 12.56
2017-08-15 12.58 12.76 12.56 12.59
2017-08-16 12.56 12.56 12.47 12.49
2017-08-17 12.49 12.55 12.47 12.53
2017-08-18 12.50 12.57 12.46 12.51
2017-08-21 12.50 12.57 12.46 12.51
2017-08-22 12.51 12.53 12.41 12.43
2017-08-23 12.44 12.66 12.41 12.57
2017-08-24 12.58 12.66 12.45 12.47
2017-08-25 12.47 12.80 12.47 12.78
2017-08-28 12.83 12.96 12.75 12.89
2017-08-29 12.85 12.98 12.77 12.96
2017-08-30 12.91 12.98 12.82 12.87
2017-08-31 12.83 12.84 12.65 12.71
2017-09-01 12.68 12.87 12.68 12.77
2017-09-04 12.78 12.84 12.65 12.78
2017-09-05 12.78 13.09 12.78 13.03
2017-09-06 12.99 13.05 12.90 12.96
2017-09-07 12.94 13.02 12.82 12.85
2017-09-08 12.85 13.12 12.83 13.03
2017-09-11 13.15 13.15 12.95 12.99
2017-09-12 13.00 13.04 12.91 13.04
2017-09-13 13.01 13.01 12.86 12.87
2017-09-14 12.88 12.98 12.81 12.85
2017-09-15 12.82 12.86 12.75 12.83

174 rows × 4 columns

  使用isin()方法来过滤在指定列中的数据,案例延续上面赋值后的price

In [148]:
# 选取 high 列中数为 1 和 774.00的数。
price[price['high'].isin([1,16.24])]
Out[148]:
open high low close
date
2017-01-10 16.18 16.24 16.14 16.19
2017-01-11 16.24 16.24 16.15 16.16
2017-01-16 16.23 1.00 16.10 1.00
2017-01-17 16.46 1.00 16.37 16.40
2017-01-18 16.42 1.00 16.36 16.48
2017-01-19 16.43 1.00 16.43 1.00
2017-01-20 1.00 1.00 16.50 1.00
2017-01-23 1.00 1.00 1.00 1.00
2017-01-24 1.00 1.00 1.00 1.00
2017-01-25 1.00 1.00 1.00 1.00
2017-01-26 1.00 1.00 1.00 1.00
2017-02-03 1.00 1.00 1.00 1.00
2017-02-06 1.00 1.00 1.00 1.00
2017-02-07 1.00 1.00 1.00 1.00
2017-02-08 1.00 1.00 1.00 1.00
2017-02-09 1.00 1.00 1.00 1.00
2017-02-10 1.00 1.00 1.00 1.00
2017-02-13 1.00 1.00 1.00 1.00
2017-02-14 1.00 1.00 1.00 1.00
2017-02-15 1.00 1.00 1.00 1.00
2017-02-16 1.00 1.00 1.00 1.00
2017-02-17 1.00 1.00 1.00 1.00
2017-02-20 1.00 1.00 1.00 1.00
2017-02-21 1.00 1.00 1.00 1.00
2017-02-22 1.00 1.00 1.00 1.00
2017-02-23 1.00 1.00 1.00 1.00
2017-02-24 1.00 1.00 1.00 1.00
2017-02-27 1.00 1.00 1.00 1.00
2017-02-28 1.00 1.00 1.00 1.00
2017-03-01 1.00 1.00 1.00 1.00
2017-03-02 1.00 1.00 16.39 16.41
2017-03-06 16.37 1.00 16.35 16.41

510500.OF 南方中证500ETF

510510.OF 广发中证500ETF

510520.OF 诺安中证500ETF

510560.OF 国寿安保中证500ETF

510580.OF 易方达中证500

000905.SH 中证500

etf=['510500.OF','510510.OF','510520.OF','510560.OF','510580.OF']

  1. 获取各自2017年1月1日以来的日收益率

  2. 获得各自的年跟踪误差TE

    TE=std(ETF日收益率-中证500日收益率)*sqrt(250)

    std()

  3. 哪个做的最好?

In [198]:
idx=pd.read_csv('/resources/data/IDX_Idxtrd_000905.csv')
idx.columns=['Code','Date','Open','High','Low','Close']
idx=idx.set_index('Date')
idxret=idx['Close']/idx['Close'].shift()-1
idxret=idxret[idxret.index>='2018-01-01']
idxret
Out[198]:
Date
2018-01-02    0.013024
2018-01-03    0.008847
2018-01-04    0.004585
2018-01-05   -0.000045
2018-01-08    0.004508
                ...   
2020-10-12    0.028816
2020-10-13    0.001562
2020-10-14   -0.006568
2020-10-15   -0.007426
2020-10-16   -0.004854
Name: Close, Length: 676, dtype: float64
In [199]:
etf=pd.read_csv('/resources/data/Fund_NAV.csv')
etf=etf.set_index('TradingDate')
etf
Out[199]:
Symbol AccumulativeNAV
TradingDate
2018-01-02 510560 0.6205
2018-01-02 510520 1.6321
2018-01-02 510580 1.0146
2018-01-02 510500 1.8709
2018-01-02 510510 1.7612
2018-01-03 510560 0.6259
2018-01-03 510520 1.6457
2018-01-03 510580 1.0233
2018-01-03 510500 1.8874
2018-01-03 510510 1.7764
2018-01-04 510520 1.6529
2018-01-04 510510 1.7844
2018-01-04 510560 0.6287
2018-01-04 510580 1.0278
2018-01-04 510500 1.8958
2018-01-05 510520 1.6534
2018-01-05 510560 0.6287
2018-01-05 510580 1.0277
2018-01-05 510510 1.7843
2018-01-05 510500 1.8957
2018-01-08 510560 0.6316
2018-01-08 510500 1.9041
2018-01-08 510520 1.6610
2018-01-08 510580 1.0320
2018-01-08 510510 1.7920
2018-01-09 510510 1.7920
2018-01-09 510520 1.6606
2018-01-09 510560 0.6315
2018-01-09 510580 1.0319
2018-01-09 510500 1.9039
... ... ...
2020-09-29 510510 1.8165
2020-09-29 510560 0.6223
2020-09-30 510580 1.0389
2020-09-30 510510 1.8055
2020-09-30 510500 1.9114
2020-09-30 510560 0.6187
2020-10-09 510580 1.0667
2020-10-09 510510 1.8538
2020-10-09 510560 0.6350
2020-10-09 510500 1.9628
2020-10-12 510560 0.6532
2020-10-12 510580 1.0974
2020-10-12 510500 2.0197
2020-10-12 510510 1.9073
2020-10-13 510580 1.0991
2020-10-13 510500 2.0229
2020-10-13 510510 1.9102
2020-10-13 510560 0.6541
2020-10-14 510510 1.8980
2020-10-14 510580 1.0923
2020-10-14 510500 2.0099
2020-10-14 510560 0.6499
2020-10-15 510560 0.6451
2020-10-15 510580 1.0858
2020-10-15 510510 1.8841
2020-10-15 510500 1.9955
2020-10-16 510580 1.0806
2020-10-16 510510 1.8748
2020-10-16 510560 0.6420
2020-10-16 510500 1.9856

3170 rows × 2 columns

In [200]:
etfsymbols=[510500,510510,510560,510580]
etflist=[]
for i in etfsymbols:
    e=pd.DataFrame(etf[etf['Symbol']==i]['AccumulativeNAV'])
    e.columns=[str(i)]
    etflist.append(e)
df=pd.concat(etflist,axis=1)
ret=df/df.shift()-1
ret.sub(idxret,axis=0).std()*sqrt(250)
/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:7: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

  import sys
Out[200]:
510500    0.004283
510510    0.005752
510560    0.005596
510580    0.015394
dtype: float64
In [ ]: