Friendly reminder: this case study is for learning and research purposes only and does not constitute investment advice.
Bitcoin price data is a time series, so much of the work on Bitcoin price prediction uses LSTM models.
LSTM is a deep learning model that is particularly suited to time series data (or other data with a temporal, spatial, or structural order, such as films and sentences), which makes it a natural choice for modeling cryptocurrency price movements.
This article describes how to fit an LSTM to historical data in order to predict Bitcoin's future price.
```python
import pandas as pd
import numpy as np

from sklearn.preprocessing import MinMaxScaler, LabelEncoder
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

from matplotlib import pyplot as plt
%matplotlib inline  # Jupyter magic: render plots inline (notebook only)
```
1. Loading the data
Read the daily BTC trading data.
```python
# Load the daily BTC data (file name as given in the original)
data = pd.read_csv(filepath_or_buffer="btc_data_day")
```
The data contains 1,380 rows, with columns such as Date, Open, High, Low, Close, Volume (BTC), Volume (Currency), and Weighted Price. Every column except Date is of type float64.
```python
data.info()
```
Look at the first 10 rows.
```python
data.head(10)
```
2. Data visualization
Plotting the weighted price with matplotlib shows the distribution and trend of the data. The plot contains a stretch where the value is 0, so we need to check whether the data contains anomalies.
```python
plt.plot(data['WeightedPrice'], label='Price')
plt.ylabel('Price')
plt.legend()
plt.show()
```
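The zero-valued stretch visible in the plot can also be located programmatically. The sketch below runs on a small made-up frame rather than the real file: it finds rows where the price is exactly 0 and shows one possible cleanup (treating 0 as missing and carrying the last valid price forward). The helper name `find_zero_rows` and the forward-fill choice are assumptions of this sketch, not the author's method.

```python
import numpy as np
import pandas as pd

def find_zero_rows(df, column):
    """Hypothetical helper: rows where `column` is exactly 0 (likely data gaps)."""
    return df[df[column] == 0]

# Made-up toy data standing in for the real BTC file
toy = pd.DataFrame({
    "Date": pd.date_range("2020-01-01", periods=5, freq="D"),
    "WeightedPrice": [7200.0, 0.0, 0.0, 7350.5, 7410.2],
})

zeros = find_zero_rows(toy, "WeightedPrice")
print(len(zeros))  # 2

# One possible cleanup (assumption): treat 0 as missing, carry the last price forward
cleaned = toy["WeightedPrice"].replace(0, np.nan).ffill()
print(cleaned.tolist())  # [7200.0, 7200.0, 7200.0, 7350.5, 7410.2]
```

Forward-filling keeps the series continuous without inventing new price levels; dropping the affected rows would be an equally valid choice for daily data.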
4. Splitting the data into training and test sets
Normalize the data to the range [0, 1].
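```python
# Drop the non-numeric Date column and keep the values as a NumPy array
data_set = data.drop('Date', axis=1).values
data_set = data_set.astype('float32')
# Scale every column independently to [0, 1]
mms = MinMaxScaler(feature_range=(0, 1))
data_set = mms.fit_transform(data_set)
```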
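`MinMaxScaler` with `feature_range=(0, 1)` maps each column to (x - min) / (max - min), using that column's own minimum and maximum. The NumPy sketch below reproduces that formula on a tiny made-up array and inverts it for a single column, which is what is needed to turn scaled predictions back into prices. The function names are illustrative only. With the fitted scaler above, the same single-column inversion for the target should be `predict * mms.data_range_[6] + mms.data_min_[6]`, since `MinMaxScaler.inverse_transform` expects all columns rather than just one.

```python
import numpy as np

def min_max_scale(x):
    """Scale each column of x to [0, 1] via (x - min) / (max - min).

    Note: a constant column would give max - min = 0; sklearn guards against
    that case, this sketch does not.
    """
    col_min = x.min(axis=0)
    col_range = x.max(axis=0) - col_min
    return (x - col_min) / col_range, col_min, col_range

def inverse_column(scaled, col_min, col_range, j):
    """Map scaled values of column j back to the original units."""
    return scaled * col_range[j] + col_min[j]

# Made-up values: column 0 plays the role of a price, column 1 of a volume
prices = np.array([[100.0, 1.0],
                   [150.0, 3.0],
                   [200.0, 5.0]])
scaled, mn, rng = min_max_scale(prices)
print(scaled)                                    # each column becomes [0, 0.5, 1]
print(inverse_column(scaled[:, 0], mn, rng, 0))  # [100. 150. 200.]
```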
Split the data in time order: 80% for training and 20% for testing.
```python
ratio = 0.8
train_size = int(len(data_set) * ratio)
test_size = len(data_set) - train_size
# Keep time order: the first 80% of rows for training, the rest for testing
train, test = data_set[0:train_size, :], data_set[train_size:len(data_set), :]
```
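With the 1,380 rows described earlier, the 80:20 split works out to int(1380 * 0.8) = 1104 training rows and 276 test rows. The sketch below checks that arithmetic on a placeholder array of the same shape, assuming 7 numeric columns remain after dropping Date; `chronological_split` is a hypothetical helper name. Because the rows are not shuffled, the test period lies entirely after the training period, which is the setting a price forecast has to work in.

```python
import numpy as np

def chronological_split(data, ratio=0.8):
    """Split rows in time order: the first `ratio` for training, the rest for testing."""
    train_size = int(len(data) * ratio)
    return data[:train_size], data[train_size:]

# Placeholder with the dataset's shape; the values themselves do not matter here
dummy = np.zeros((1380, 7), dtype="float32")
train_part, test_part = chronological_split(dummy)
print(train_part.shape, test_part.shape)  # (1104, 7) (276, 7)
```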
Create the training and test samples using a 1-day window.
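```python
def create_dataset(data):
    window = 1       # number of past days in each input sample
    label_index = 6  # column index of the prediction target
    x, y = [], []
    for i in range(len(data) - window):
        x.append(data[i:(i + window), :])        # shape (window, n_features)
        y.append(data[i + window, label_index])  # target value on the next day
    return np.array(x), np.array(y)
```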
```python
train_x, train_y = create_dataset(train)
test_x, test_y = create_dataset(test)
```
This time we use a simple model with the following structure.
Here we need to explain the LSTM input shape. The input has the dimensions (batch_size, time_steps, features), where time_steps is the length of the time window covered by each input sample. We use a 1-day window and the data is daily, so time_steps is 1.
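To make the (batch_size, time_steps, features) shape concrete, the sketch below applies the same windowing logic as `create_dataset` to a toy array of 10 days by 7 features. Exposing `window` and `label_index` as parameters is an assumption added here so that the effect of a longer window is visible; with `window=1` each sample has exactly one time step, and its label is column 6 of the following day.

```python
import numpy as np

def create_dataset(data, window=1, label_index=6):
    """Build (samples, time_steps, features) inputs and next-step labels."""
    x, y = [], []
    for i in range(len(data) - window):
        x.append(data[i:i + window, :])          # `window` consecutive rows
        y.append(data[i + window, label_index])  # target column of the next row
    return np.array(x), np.array(y)

toy = np.arange(10 * 7, dtype="float32").reshape(10, 7)  # 10 days, 7 features
x1, y1 = create_dataset(toy, window=1)
print(x1.shape, y1.shape)  # (9, 1, 7) (9,)
x3, y3 = create_dataset(toy, window=3)
print(x3.shape, y3.shape)  # (7, 3, 7) (7,)
```

The middle dimension is the time_steps value the LSTM layer sees; a longer window trades fewer samples for more history per sample.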
Long short-term memory (LSTM) is a special kind of RNN, designed mainly to mitigate the vanishing and exploding gradient problems that arise when training on long sequences. A brief introduction to LSTM follows.
As the LSTM network structure diagram shows, an LSTM cell is itself a small model made up of 3 sigmoid activation functions, 2 tanh activation functions, 3 multiplications, and 1 addition.
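The component count above can be checked against a single LSTM time step written out in NumPy. This is a minimal from-scratch sketch, not Keras's implementation: stacking the four weight blocks in the order forget, input, candidate, output is an arbitrary choice here and does not match Keras's internal weight layout, and the sizes and random weights are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W has shape (4*H, H + D) and b has shape (4*H,); the blocks are stacked
    as forget, input, candidate, output (an assumption of this sketch).
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b  # all four pre-activations at once
    f = sigmoid(z[0:H])                # forget gate      (sigmoid 1)
    i = sigmoid(z[H:2 * H])            # input gate       (sigmoid 2)
    c_tilde = np.tanh(z[2 * H:3 * H])  # candidate values (tanh 1)
    o = sigmoid(z[3 * H:4 * H])        # output gate      (sigmoid 3)
    c_t = f * c_prev + i * c_tilde     # multiplications 1 and 2, the single addition
    h_t = o * np.tanh(c_t)             # tanh 2 and multiplication 3
    return h_t, c_t

rng = np.random.default_rng(0)
H, D = 4, 7  # hidden units and input features (toy sizes)
W = rng.normal(scale=0.1, size=(4 * H, H + D))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(5, D)):  # run 5 time steps
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```

Because the output gate lies in (0, 1) and tanh in (-1, 1), every entry of `h_t` stays strictly between -1 and 1.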
Cell state
The cell state is the core of the LSTM. It is the black horizontal line running along the top of the diagram above, and below it are the gates discussed next. The cell state is updated according to the output of each gate; walking through the gates one by one shows how the cell state flows.
An LSTM can remove information from, or add information to, the cell state through structures called gates, which selectively decide what information passes through. A gate consists of a sigmoid layer followed by a pointwise (element-wise) multiplication. The sigmoid layer outputs values between 0 and 1: 0 means let nothing through, and 1 means let everything through. An LSTM has three gates that control the cell state. Let's look at each one.
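A single gate, a sigmoid layer followed by pointwise multiplication, fits in a few lines. The pre-activation values below are made up to show the three regimes: a large negative input closes the gate, 0 lets half through, and a large positive input opens it fully. The function name `gate` is illustrative.

```python
import numpy as np

def gate(pre_activation, values):
    """Sigmoid layer followed by pointwise multiplication."""
    g = 1.0 / (1.0 + np.exp(-pre_activation))  # each entry lies in (0, 1)
    return g * values

values = np.array([2.0, 2.0, 2.0])
out = gate(np.array([-20.0, 0.0, 20.0]), values)
print(out)  # approximately [0, 1, 2]: closed, half open, open
```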
Forget gate
The first step in an LSTM is to decide what information to discard from the cell state. This is handled by a sigmoid unit called the forget gate. Let's look at the animation.
The forget gate looks at $h_{t-1}$ and $x_t$ and outputs a vector of values between 0 and 1, one for each entry in the cell state $C_{t-1}$: 1 means keep that entry completely, and 0 means discard it completely.
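Written out in the standard LSTM notation (assumed here: $\sigma$ is the sigmoid function, $W_f$ and $b_f$ are the forget gate's weights and bias, and $[h_{t-1}, x_t]$ is the concatenation of the previous output and the current input), the forget gate is:

$$
f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)
$$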
Input gate
The next step is to decide what new information to add to the cell state, which is done by the input gate. Let's look at the animation first.

$h_{t-1}$ and $x_t$ are fed into a sigmoid layer (the input gate, $i_t$) and a tanh layer that produces candidate values $\tilde{C}_t$. Because the sigmoid output lies between 0 and 1, wherever the input gate outputs 0 the corresponding candidate value is not added to the cell state, and wherever it outputs 1 the candidate value is added in full. In this way the input gate selectively adds new information to the cell state.
The mathematical formulation is:
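In the standard LSTM notation (assumed here, with $\odot$ denoting element-wise multiplication and $f_t$ the forget gate's output), the input gate, the candidate values, and the cell-state update are:

$$
\begin{aligned}
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
\end{aligned}
$$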
Output gate
After updating the cell state, we use the inputs $h_{t-1}$ and $x_t$ to decide what the cell outputs. The inputs pass through a sigmoid layer called the output gate, which produces the gating values; the cell state passes through a tanh layer, which produces a vector of values between -1 and 1. Multiplying this vector by the output gate's values gives the final output of the LSTM cell. The animation is shown below.
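Completing the standard notation (assumed here, with $C_t$ the updated cell state and $\odot$ element-wise multiplication), the output gate and the cell's output $h_t$ are:

$$
\begin{aligned}
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$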
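```python
def create_model():
    model = Sequential()
    # 50 LSTM units; input shape is (time_steps, features)
    model.add(LSTM(50, input_shape=(train_x.shape[1], train_x.shape[2])))
    model.add(Dense(1))  # single output: the predicted (scaled) value
    model.compile(loss='mae', optimizer='adam')
    return model

model = create_model()
```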
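As a sanity check on model size, the number of trainable parameters can be worked out by hand. A standard LSTM layer has four blocks (three gates plus the candidate values), each with input weights, recurrent weights, and a bias, giving 4 * (units * (units + features) + units) parameters. Assuming 7 input features (the numeric columns left after dropping Date), the sketch below gives 11,600 for `LSTM(50)` and 51 for `Dense(1)`, 11,651 in total, which is the figure `model.summary()` should report for this configuration. The helper names are illustrative.

```python
def lstm_param_count(units, features):
    """Trainable weights in a standard LSTM layer: 4 blocks, each with
    input weights, recurrent weights, and a bias vector."""
    return 4 * (units * (units + features) + units)

def dense_param_count(inputs, outputs):
    """Weights plus biases of a fully connected layer."""
    return inputs * outputs + outputs

lstm = lstm_param_count(50, 7)    # 11600
dense = dense_param_count(50, 1)  # 51
print(lstm, dense, lstm + dense)  # 11600 51 11651
```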
```python
history = model.fit(train_x, train_y, epochs=80, batch_size=64,
                    validation_data=(test_x, test_y), verbose=1, shuffle=False)
```
```python
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='test')
plt.legend()
plt.show()
```
```python
predict = model.predict(test_x)
plt.plot(predict, label='predict')
plt.plot(test_y, label='ground truth')
plt.legend()
plt.show()
```
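The plot gives a visual comparison, but a single number is easier to track across runs. The sketch below computes MAE (the same quantity as the 'mae' training loss) and RMSE with NumPy; the function names are illustrative. Both inputs are flattened first, because `model.predict` returns an array of shape (n, 1) while `test_y` has shape (n,), and subtracting them directly would broadcast to an (n, n) matrix. The results are in scaled units; since min-max scaling is linear, multiplying them by `mms.data_range_[6]` should express them in price units, assuming column 6 is the target as in `create_dataset`.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between two series (inputs are flattened first)."""
    y_true, y_pred = np.ravel(y_true), np.ravel(y_pred)
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large misses more than MAE does."""
    y_true, y_pred = np.ravel(y_true), np.ravel(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy values shaped like model.predict output (n, 1) and test_y (n,)
y_true = np.array([0.1, 0.2, 0.3])
y_pred = np.array([[0.1], [0.25], [0.2]])
print(mae(y_true, y_pred))   # about 0.05
print(rmse(y_true, y_pred))  # about 0.0645
```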
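At present, predicting Bitcoin's long-term price movements with machine learning is still very difficult, and this article should be read only as a learning example. Later examples will be published online as demo images on 瞬间池云, where interested users can try them directly.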
Title: Real-time Bitcoin price prediction using the LSTM framework
Link: https://www.btchangqing.cn/21372.html
Updated: 2021-05-29