## 1. Introduction

Crude oil is a strategic natural resource and a commodity linked to many financial instruments, such as futures, options, and bonds. While most of these instruments have short maturities, some oil-linked contracts have long pricing maturities. In addition, crude oil prices exhibit nonlinear behavior.

The nonlinear behavior of oil prices has been widely discussed and analyzed in the literature. Barone-Adesi et al. [1] suggested a semiparametric method to examine the structure of oil prices. Adrangi et al. [2] detected a low-dimensional chaotic structure in oil prices. Lahmiri [3], Bildirici and Sonustun [4], Komijani et al. [5], and He [6] also found evidence of chaos in oil prices. Bildirici et al. [7] proposed a hybrid modelling technique based on the LSTARGARCH and LSTM models to analyze the volatility of oil prices.

Apart from the studies on volatility, the works of [8,9] are particularly relevant. Gibson and Schwartz [8] show that “the mean reverting tendency as well as the variability of its changes requires a stochastic representation in order to price oil-linked securities accurately”. In [9], a two-factor model is developed for pricing financial and real assets contingent on the price of oil. For valuing futures contracts, the model parameters were estimated using data from January 1984 to November 1988, and the model was tested on out-of-sample data from November 1988 to May 1989. The purpose of the current work is to offer an approach to pricing based on the Lie method.

In this paper, we employ the Lie algebra method to solve the stochastic differential equation (SDE) of a short-term model of the oil price. We posit that the oil price is governed by an SDE on a curved state space and develop oil price models on the ${S}^{2}$ manifold using matrix representations and differential operators. In the late 19th century, Lie discovered that many special methods for solving differential equations (DEs) are particular cases of a general integration procedure based on the invariance of a DE under a continuous group of symmetries. Today, Lie groups have a deep impact on many branches of mathematics, mechanics, and robotics.

In mathematical finance in particular, a few papers have employed the Lie method to gain insight into the structure of the related partial differential equations. The use of general differentiable manifolds in interest rate models appears in [10,11,12]. Gazizov and Ibragimov [11] applied the Lie method to the Black–Scholes–Merton equation. Lo and Hui [13] and Carr et al. [14] constructed a concrete example of a short-rate model on the circle ${S}^{1}$. Park et al. [15] tested the proposition that the nonlinear and random behavior of interest rates is governed by a stochastic differential equation model on a curved state space. They developed short-term interest rate models on the ${S}^{1}$ and ${S}^{2}$ manifolds using matrix representations instead of differential operator representations of Lie groups.
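As a minimal illustration of what "matrix representations" of a Lie group mean in this setting (our own sketch, not code from [15] or from this paper), the snippet below builds the standard basis $E_1, E_2, E_3$ of the Lie algebra so(3) as 3×3 skew-symmetric matrices and maps Lie-algebra elements to rotation matrices in SO(3) with the exponential map (Rodrigues' formula). Rotating the north pole $e_3$ by such a matrix gives a point on ${S}^{2}$. The helper names `hat`, `expm_so3`, and `bracket` are hypothetical.

```python
import numpy as np

def hat(w):
    """Map a vector w in R^3 to its skew-symmetric matrix in so(3), so hat(w) @ v = w x v."""
    wx, wy, wz = w
    return np.array([[0.0, -wz, wy],
                     [wz, 0.0, -wx],
                     [-wy, wx, 0.0]])

# Standard basis of so(3): generators of rotations about the x, y, and z axes.
E1, E2, E3 = (hat(e) for e in np.eye(3))

def expm_so3(W):
    """Exponential map so(3) -> SO(3) via Rodrigues' formula."""
    w = np.array([W[2, 1], W[0, 2], W[1, 0]])  # recover the rotation vector
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3) + W  # first-order approximation near the identity
    return (np.eye(3)
            + np.sin(theta) / theta * W
            + (1.0 - np.cos(theta)) / theta**2 * (W @ W))

def bracket(A, B):
    """Matrix commutator [A, B] = AB - BA, the Lie bracket of so(3)."""
    return A @ B - B @ A

# A rotation built from a Lie-algebra element, and the point it assigns on S^2.
R = expm_so3(0.3 * E1 - 0.7 * E2 + 0.5 * E3)
point_on_sphere = R @ np.array([0.0, 0.0, 1.0])
```

The relation `bracket(E1, E2) ≈ E3` (and its cyclic permutations) is the defining commutation structure of so(3), and `R.T @ R ≈ I` with unit determinant confirms that the exponential lands in SO(3).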

In this paper, we use the daily WTI crude oil spot price over the period from 2 January 1986 to 5 April 2021. The selected period includes several important events that affected the oil price, such as economic crises (e.g., 2001 and 2008), the US military intervention in Iraq, and COVID-19. These factors contribute to nonlinear behavior in the oil price and in the relationship between spot and futures prices.

Therefore, modelling dynamic processes and solving stochastic differential equations (SDEs) are important. As is well known, a differential equation may admit a group of symmetries, that is, transformations that map solutions to solutions, and these continuous symmetries form a Lie group. In this paper, we employ a model on the ${S}^{2}$ manifold that uses matrix representations instead of differential operator representations of Lie algebras. As emphasized in [15], the drift and noise volatility terms of stochastic state equations are usually constructed to reflect various observed phenomena; we instead keep these terms simple and choose a curved underlying state space. Park et al. [15] and Goard [16] used the ordinary least squares (OLS) method for parameter estimation. We favored the nonlinear least squares (NLS) method because of the nonlinear behavior observed in the specified period, and we also report OLS estimates for comparison.
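Equation (5) is not reproduced in this section, so the sketch below uses a hypothetical two-parameter model $y = a e^{bx}$ purely to contrast the two estimation strategies: OLS applied to a linearized (log-transformed) version of the model, and NLS that minimizes the squared residuals of the original nonlinear model, here via plain Gauss–Newton iterations started from the OLS estimates. The model and the function names are our assumptions, not the paper's specification.

```python
import numpy as np

def model(x, a, b):
    # Hypothetical nonlinear model used only to illustrate OLS vs NLS.
    return a * np.exp(b * x)

def fit_ols_loglinear(x, y):
    """OLS on the linearized form log y = log a + b x (requires y > 0)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
    return np.exp(coef[0]), coef[1]

def fit_nls(x, y, a0, b0, n_iter=50, tol=1e-12):
    """NLS: minimize sum (y - a exp(b x))^2 with Gauss-Newton iterations."""
    theta = np.array([a0, b0], dtype=float)
    for _ in range(n_iter):
        a, b = theta
        r = y - model(x, a, b)  # residuals on the original scale
        # Jacobian of the model with respect to (a, b).
        J = np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])
        step, *_ = np.linalg.lstsq(J, r, rcond=None)  # linearized LS step
        theta = theta + step
        if np.linalg.norm(step) < tol:
            break
    return tuple(theta)
```

Because the log transform changes the error structure, the two criteria generally produce different estimates; NLS targets the residual sum of squares on the original scale directly, so it typically attains the smaller value of that criterion.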

As our primary contribution, we propose the use of LSTM networks for forecasting in the domain obtained by the Lie method. Specifically, we propose the hybrid Lie-LSTM$_{\mathrm{OLS}}$ and Lie-LSTM$_{\mathrm{NLS}}$ models for more reliable forecasting than standard regression methods in this domain. The forecasting performance of the proposed hybrid models is compared with that of the Lie$_{\mathrm{OLS}}$ and Lie$_{\mathrm{NLS}}$ standard regression models on the WTI oil price data.

The paper is organized as follows. In Section 2, the orthogonal matrix Lie groups and algebras are introduced, and the oil price model is defined on the Lie group SO(3). In Section 3, the data are presented together with some descriptive statistics. In Section 4, the results are presented and discussed, and the last section concludes.

## 3. Data and Some Descriptive Statistics

In this paper, we use the daily West Texas Intermediate (WTI) crude oil price dataset obtained from FRED Economic Data. It covers the period from 2 January 1986 to 5 April 2021.

The published oil price is the spot price given as

As in [15], we used Monte Carlo simulation to evaluate the above expectation.
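The expectation referred to above is not reproduced in this section. As a hedged sketch of the Monte Carlo step only, assume (our assumption, in the spirit of left-invariant SDEs on matrix Lie groups) a state $R_t \in$ SO(3) driven, in the Stratonovich sense, by $dR_t = R_t(\widehat{\mu}\,dt + \sigma\,\widehat{dW_t})$, where $\widehat{\cdot}$ maps $\mathbb{R}^3$ to so(3). The code simulates it with a geometric (Lie-group) Euler scheme, $R_{k+1} = R_k \exp(\widehat{\mu}\,\Delta t + \sigma\,\widehat{\Delta W_k})$, which keeps every iterate exactly in SO(3), and averages a user-supplied observable $f(R_T)$ over paths. The drift vector, volatility, observable, and function names are placeholders, not the paper's calibrated model.

```python
import numpy as np

def hat_batch(w):
    """Map an (N, 3) array of vectors to an (N, 3, 3) array of so(3) matrices."""
    W = np.zeros(w.shape[:-1] + (3, 3))
    W[..., 0, 1], W[..., 0, 2] = -w[..., 2], w[..., 1]
    W[..., 1, 0], W[..., 1, 2] = w[..., 2], -w[..., 0]
    W[..., 2, 0], W[..., 2, 1] = -w[..., 1], w[..., 0]
    return W

def expm_so3_batch(w):
    """Rodrigues' formula applied to each rotation vector in an (N, 3) array."""
    theta = np.linalg.norm(w, axis=-1)[..., None, None]
    W = hat_batch(w)
    small = theta < 1e-12
    safe = np.where(small, 1.0, theta)  # avoid division by zero
    A = np.where(small, 1.0, np.sin(safe) / safe)
    B = np.where(small, 0.5, (1.0 - np.cos(safe)) / safe**2)
    return np.eye(3) + A * W + B * (W @ W)

def mc_expectation(f, drift, sigma, T=1.0, n_steps=50, n_paths=2000, seed=0):
    """Estimate E[f(R_T)] by simulating paths with the geometric Euler scheme."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    R = np.broadcast_to(np.eye(3), (n_paths, 3, 3)).copy()  # all paths start at I
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, 3))
        R = R @ expm_so3_batch(drift * dt + sigma * dW)  # stays in SO(3)
    return f(R).mean()
```

For zero drift, the exact value of $E[e_3^\top R_T e_3]$ (the height of the rotated north pole on ${S}^{2}$) is $e^{-\sigma^2 T}$, which offers a simple sanity check of such a simulator.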

First, the descriptive statistics of the WTI oil price data were computed, and unit root tests were applied. The statistics are shown in Table 1. The data exhibit excess kurtosis and therefore cannot be modelled by a normal distribution, as confirmed by the Jarque–Bera (JB) test. The departure from normality appears to stem mainly from excess kurtosis rather than from skewness.
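The JB statistic combines the sample skewness $S$ and kurtosis $K$ as $JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$ and is asymptotically $\chi^2_2$ under normality, whose survival function is $e^{-x/2}$. A minimal NumPy sketch (the function name is hypothetical, and statistical packages may apply slightly different small-sample conventions):

```python
import numpy as np

def jarque_bera(x):
    """Return (skewness, kurtosis, JB statistic, asymptotic p-value)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    m2 = np.mean(d**2)
    skew = np.mean(d**3) / m2**1.5
    kurt = np.mean(d**4) / m2**2  # equals 3 for a normal distribution
    jb = n / 6.0 * (skew**2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = np.exp(-jb / 2.0)  # chi-square(2) survival function
    return skew, kurt, jb, p_value
```

Heavy-tailed data (kurtosis well above 3) drive the second term up, which is the pattern described for the oil price series.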

From the unit root test results in Table 1, it can be seen that the $H_0$ hypothesis can be accepted for all variables at the level. The ADF and KSS tests suggest the stationarity of the data at the level.

Next, the results of the nonlinearity tests are presented in Table 2 and Table 3. In Table 2, Teraesvirta’s neural network test, White’s neural network test, the likelihood ratio test for threshold nonlinearity, and Tsay’s test for nonlinearity all indicate that the linear form is misspecified. The Teraesvirta and White tests give results similar to those of the Tsay test.

The BDS test (Brock et al. [20]) in Table 3 also suggests that the linear functional form is misspecified for the variables.

## 4. Models and Results

The Lie parameters in Equation (5) were estimated using the OLS and NLS methods. Table 4 shows the estimates of the Lie parameters. The ${\alpha}_{2}$ estimates obtained with the two methods are very similar, whereas the ${\alpha}_{3}$ estimates differ considerably. The AIC values of the two models are similar to each other.
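For least-squares fits with Gaussian errors, a common form of the AIC is $n \ln(\mathrm{RSS}/n) + 2k$, where $k$ is the number of estimated parameters. The section does not state which variant was used (software packages differ by additive constants), so the sketch below, with a hypothetical function name, is an assumption; AIC values are only comparable across models fitted to the same observations.

```python
import numpy as np

def aic_least_squares(residuals, n_params):
    """Gaussian-likelihood AIC up to an additive constant: n*ln(RSS/n) + 2k."""
    r = np.asarray(residuals, dtype=float)
    n = r.size
    rss = float(np.sum(r**2))
    return n * np.log(rss / n) + 2 * n_params
```

With the same number of parameters in both fits, the AIC comparison reduces to comparing residual sums of squares, which is consistent with similar AIC values when the two fits leave residuals of similar size.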

Interestingly, the Lie$_{\mathrm{OLS}}$ model passes the RESET, BP, and ARCH tests only narrowly, with test statistics very close to their critical values. By contrast, the Lie$_{\mathrm{NLS}}$ model performs better on these diagnostic tests than the Lie$_{\mathrm{OLS}}$ model.

Next, the forecasting performance of the Lie$_{\mathrm{OLS}}$ and Lie$_{\mathrm{NLS}}$ models was analyzed, and LSTM was used to improve it. To apply the LSTM model, the dataset was partitioned into an in-sample training set covering 2 January 1986 to 20 October 2019 and an out-of-sample test set covering 21 October 2019 to 5 April 2021.
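A chronological split keeps the test period strictly after the training period, so no future information leaks into training. A minimal NumPy sketch using the cut-off date stated above (the function name is hypothetical, and it assumes days without a published price have already been removed from the series):

```python
import numpy as np

def chronological_split(dates, values, last_train_day="2019-10-20"):
    """Split a series into in-sample (<= last_train_day) and out-of-sample parts.

    No shuffling is performed, so every test date lies after every training date.
    """
    dates = np.asarray(dates, dtype="datetime64[D]")
    values = np.asarray(values, dtype=float)
    in_sample = dates <= np.datetime64(last_train_day)
    return (dates[in_sample], values[in_sample]), (dates[~in_sample], values[~in_sample])
```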

The configuration of our LSTM network is as follows:

- Input samples consist of sequence segments of 30 timesteps, each with 1 feature (price).
- The input layer is connected to an LSTM layer with 25 hidden units and a dropout value of 0.20.
- The LSTM output feeds a dense output layer with one neuron and a linear activation function.
- Training is performed in batches of 32 samples.
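The listed configuration can be sketched in Python with Keras as follows. This is our reconstruction, not the authors' code: the optimizer, loss, number of epochs, and any price scaling are not stated in the text and are assumptions, and "dropout value of 0.20" is interpreted here as the LSTM layer's input dropout (a separate `Dropout` layer would be another reading). The helper `make_windows` (a hypothetical name) turns the price series into 30-step input sequences with next-day targets.

```python
import numpy as np

WINDOW = 30  # timesteps per input sequence, as in the listed configuration

def make_windows(series, window=WINDOW):
    """Turn a 1-D series into (samples, window, 1) inputs and next-step targets."""
    s = np.asarray(series, dtype=float)
    X = np.stack([s[i:i + window] for i in range(len(s) - window)])
    y = s[window:]  # the value immediately following each window
    return X[..., None], y  # add the single feature axis (price)

def build_lstm(window=WINDOW):
    """Keras model matching the listed layers (requires TensorFlow to be installed)."""
    from tensorflow import keras  # imported lazily so make_windows works without TF

    model = keras.Sequential([
        keras.Input(shape=(window, 1)),
        keras.layers.LSTM(25, dropout=0.20),  # 25 hidden units, input dropout 0.20
        keras.layers.Dense(1, activation="linear"),
    ])
    model.compile(optimizer="adam", loss="mse")  # optimizer and loss are assumptions
    return model

# Training would then use batches of 32, e.g. model.fit(X_train, y_train, batch_size=32, ...),
# with the number of epochs left unspecified because the text does not state it.
```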

The model giving the lowest RMSE and MAE values is deemed the most successful model.
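For reference, with forecast errors $e_t$ the two criteria are $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_t e_t^2}$ and $\mathrm{MAE} = \frac{1}{n}\sum_t |e_t|$. Because RMSE squares the errors, it penalizes occasional large misses more heavily than MAE, so the two criteria can rank models differently. A minimal sketch (function names are ours):

```python
import numpy as np

def rmse(actual, forecast):
    """Root mean squared error."""
    e = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean(e**2)))

def mae(actual, forecast):
    """Mean absolute error."""
    e = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(e)))
```

For actual values [1, 2, 3, 4] and forecasts [1, 2, 3, 8], the single error of 4 gives an RMSE of 2 but an MAE of 1.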

#### 4.1. In-Sample Forecast Results

Table 5 presents the results of the LSTM method integrated with the Lie$_{\mathrm{OLS}}$ model or the Lie$_{\mathrm{NLS}}$ model. As references for comparison, the results of the Lie$_{\mathrm{OLS}}$ and Lie$_{\mathrm{NLS}}$ models (employing traditional regression techniques) are also presented.

It can be observed from Table 5 that the Lie-LSTM$_{\mathrm{NLS}}$ model gives more successful results than the Lie-LSTM$_{\mathrm{OLS}}$ model. More importantly, both the Lie-LSTM$_{\mathrm{OLS}}$ and Lie-LSTM$_{\mathrm{NLS}}$ models give more successful results than the Lie$_{\mathrm{OLS}}$ and Lie$_{\mathrm{NLS}}$ models used for reference.

#### 4.2. Out-of-Sample Forecast Results

To explore forecast accuracy at horizons of T+10 and T+20 working days, the RMSE and MAE values of the Lie$_{\mathrm{OLS}}$, Lie$_{\mathrm{NLS}}$, Lie-LSTM$_{\mathrm{OLS}}$, and Lie-LSTM$_{\mathrm{NLS}}$ models were computed; they are reported in Table 6. The out-of-sample results indicate that Lie-LSTM$_{\mathrm{NLS}}$ provides the highest out-of-sample forecast accuracy.

#### 4.3. Test for Forecast Accuracy

The Wilcoxon signed-rank and Diebold–Mariano (DM) tests were applied to test the null hypothesis $H_0$ that two competing models have equal forecast accuracy.

In Table 7 and Table 8, most p-values of the DM and Wilcoxon test statistics are 0.00, that is, significant at the 1% level. Since the p-value is below 0.05 in most cases, $H_0$ is rejected for most model pairs. For both tests, the p-value exceeds 0.05 only for the RMSE comparison of the Lie$_{\mathrm{OLS}}$ and Lie$_{\mathrm{NLS}}$ models; hence, these two models are comparable in terms of RMSE performance.
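A minimal sketch of the DM statistic, assuming a loss differential $d_t = |e_{1,t}|^p - |e_{2,t}|^p$ (squared error for $p = 2$, absolute error for $p = 1$), a long-run variance built from autocovariances up to lag $h-1$ for $h$-step forecasts, and a two-sided p-value from the standard normal. The loss and horizon behind Tables 7 and 8 are not stated here, so these choices and the function name are assumptions. The Wilcoxon signed-rank test is available in SciPy as `scipy.stats.wilcoxon`.

```python
import math
import numpy as np

def diebold_mariano(e1, e2, h=1, power=2):
    """DM test of equal forecast accuracy for two forecast-error series.

    Returns (DM statistic, two-sided p-value from the standard normal).
    A positive statistic means model 1 has the larger average loss.
    """
    d = np.abs(np.asarray(e1, float)) ** power - np.abs(np.asarray(e2, float)) ** power
    T = d.size
    d_bar = d.mean()
    dc = d - d_bar
    # Long-run variance: autocovariances up to lag h-1 (rectangular truncation).
    gamma = [np.dot(dc[k:], dc[:T - k]) / T for k in range(h)]
    lrv = gamma[0] + 2.0 * sum(gamma[1:])
    if lrv <= 0:
        raise ValueError("non-positive long-run variance estimate")
    dm = d_bar / math.sqrt(lrv / T)
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))  # P(|Z| > |dm|)
    return dm, p_value
```

For $h > 1$ the truncated variance estimate can become negative in small samples; a kernel-based estimator such as Newey–West is a common alternative in that case.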

## 5. Discussion and Conclusions

We have proposed hybrid models for analyzing a short-term model of the oil price over the period from 2 January 1986 to 5 April 2021. In our basic model, the state space is built on the Lie group SO(3), a differentiable manifold that acts by rotations on the unit sphere ${S}^{2}$; the sphere can be identified with the quotient SO(3)/SO(2). We have additionally integrated this model with an LSTM network and evaluated the forecasting performance of the resulting hybrids.

The previous works [8,9] differ significantly from the current work: [8] only discussed the mean reverting tendency of the oil price, whereas the model of [9] relies on a two-factor model for pricing financial and real assets contingent on the price of oil. Specifically, [9] estimates the parameters of joint stochastic processes, based on spot prices and the net convenience yield, that model oil-contingent claims and futures contracts, and uses this model to value futures contracts. In contrast, our work develops a short-term model and solves it with a hybrid of the Lie method and an LSTM network. Although Lie algebras have been used in many interest rate models in the literature, to the best of our knowledge the current work is the first to use Lie algebras for oil price modelling. The distributions of oil price time series exhibit positive skewness. Although other methods could be used to analyze these processes, this feature can be readily addressed with our proposed Lie model, which we attribute to modelling the oil price on the ${S}^{2}$ manifold using matrix representations and differential operators in the Lie$_{\mathrm{OLS}}$ and Lie$_{\mathrm{NLS}}$ models. Furthermore, modelling with the Lie-LSTM methods yields good forecasting results.

The Lie-LSTM$_{\mathrm{OLS}}$ and Lie-LSTM$_{\mathrm{NLS}}$ methods facilitate numerical computations for stochastic differential equations on differentiable manifolds. By employing the SO(3) group structure, oil prices with positive skewness, high kurtosis, and a large JB statistic can be described geometrically. Moreover, jointly using the Lie and LSTM methods makes it possible to increase forecasting performance by representing the complex structure of the data.

As Park et al. [15] observed for interest rates, the Lie group models in the current work indicate that closed-form formulas are the exception rather than the rule for oil price prediction, so one should resort to numerical approaches for such prediction. Additionally, whereas previous works ([15,16] for bond pricing) employed OLS alone to estimate the model parameters, the current work investigated which of the OLS and NLS methods works best with the Lie method. Each of the resulting models, Lie$_{\mathrm{OLS}}$ and Lie$_{\mathrm{NLS}}$, was integrated with the LSTM network to obtain a hybrid model with improved forecasting performance. Specifically, in forecasting prices 10 and 20 working days ahead, the models incorporating LSTM yielded smaller RMSE and MAE values. According to the Wilcoxon signed-rank and Diebold–Mariano tests, the Lie-LSTM$_{\mathrm{NLS}}$ model was the most successful of the four models considered in terms of forecasting performance.

In this study, we have shown that the Lie method is a valuable tool for analyzing a short-term model of the spot price of oil. The proposed model could also be used to analyze the relationship between futures and spot prices of commodities other than oil.