Predicting Gold vs Dollar Prices: A Data-Driven Forecasting Approach

Forecasting the price of Gold against the US Dollar is a complex yet fascinating challenge, influenced by numerous economic, geopolitical, and market factors. In this project, I set out to build a robust predictive system to forecast Gold/USD prices using high-frequency data, aiming to empower data-driven decisions in trading and financial analysis.

Project Goals

Gold prices are highly volatile and influenced by various macroeconomic and geopolitical factors, making accurate prediction difficult yet valuable for traders and analysts. To address this, I developed multiple forecasting models using Python and key libraries such as scikit-learn and Facebook's Prophet. The models I implemented include:

  • Ordinary Least Squares (OLS) Linear Regression
  • Ridge Regression (with and without cross-validation)
  • Lasso Regression
  • Meta Prophet (time series forecasting)

Each model was trained on historical Gold/USD price data with a timestamp resolution of minutes and evaluated for prediction accuracy and stability.

Data Collection and Preprocessing

The dataset, sourced from Kaggle, comprised detailed historical gold price records, each stamped with a precise date and time in the format YYYY.MM.DD HH:MM. To make the data compatible with machine learning techniques, the timestamps were converted into numerical values using the Unix epoch time representation. This transformation allowed the models to interpret temporal patterns as numeric input features.

Additionally, the data was scaled using standard normalization techniques to improve model stability and convergence during training.
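In rough outline, the preprocessing looked like the following sketch (the file name and column names are illustrative, not the exact ones used in the project):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Load the raw Kaggle export (file and column names are assumptions).
df = pd.read_csv("gold_usd_5min.csv")

# Parse the "YYYY.MM.DD HH:MM" timestamps and convert them to Unix epoch
# seconds so the models can treat time as a continuous numeric feature.
df["Date"] = pd.to_datetime(df["Date"], format="%Y.%m.%d %H:%M")
df["timestamp"] = df["Date"].astype("int64") // 10**9

# Standard-normalize the timestamp feature to improve stability and convergence.
scaler = StandardScaler()
X = scaler.fit_transform(df[["timestamp"]].to_numpy())
y = df["Price"].to_numpy()
```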

Methodology

To capture the complex dynamics of gold price fluctuations, I designed a pipeline that prepares the data and feeds it into multiple predictive models. Each model was trained on the processed data to learn the underlying patterns and generate price forecasts.

1. Ordinary Least Squares (OLS) Linear Regression and Its Application in Gold/USD Price Prediction

What is Ordinary Least Squares (OLS) Linear Regression?

Ordinary Least Squares (OLS) Linear Regression is one of the fundamental techniques in statistical modeling and machine learning used to understand the relationship between a dependent variable and one or more independent variables. It assumes a linear relationship of the form:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \varepsilon

Where:

  • y is the dependent variable (target/output),
  • x_i are the independent variables (features),
  • \beta_i are coefficients that represent the impact of each feature,
  • \varepsilon is the error term (residuals).

The goal of OLS is to find the coefficients \beta_i that minimize the sum of squared residuals, the differences between the observed values and the predicted values:

\min_{\beta} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2

This approach finds the best-fitting line through the data points, the one that minimizes the total squared error.
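As a quick numerical illustration (separate from the project code), the OLS solution can be computed directly with NumPy on toy data:

```python
import numpy as np

# Toy data: y roughly follows 2 + 3x with some noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y_toy = 2 + 3 * x + rng.normal(scale=0.5, size=x.size)

# Design matrix with an intercept column; lstsq solves the least-squares
# problem min_beta ||X beta - y||^2 in a numerically stable way.
X_design = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X_design, y_toy, rcond=None)
print(beta)  # approximately [2, 3]
```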

Why Use OLS for Predicting Gold Prices?

OLS Linear Regression provides a straightforward, interpretable model which can serve as a baseline in forecasting tasks. Despite its simplicity, it helps capture linear trends and dependencies in the time-series data and is computationally efficient, making it a good starting point before exploring more complex methods.

How I Used OLS in the Gold vs Dollar Prediction Project

1. Data Preparation:

The historical Gold price data included timestamps and corresponding prices at 5-minute intervals. To feed the OLS model, the date/time values were converted into numerical Unix epoch timestamps, allowing the model to treat time as a continuous numeric feature. The price data was scaled to normalize value ranges, improving model stability.

2. Feature and Target Definition:

  • Feature (X): The Unix timestamp representing the exact time of the recorded price.
  • Target (y): The Gold price in USD at that time.

3. Training the Model:

Using the processed data, the OLS model was trained by fitting a linear equation that best relates the timestamp (time progression) to the Gold price. This training process computed the coefficients (\beta_0 and \beta_1) minimizing prediction errors on the training set.

4. Model Prediction:

After training, the OLS model was used to predict future Gold prices by inputting timestamps for future time points (e.g., every 5 minutes over the next 24 hours). The predicted prices form a linear trend line extrapolating from the historical data.
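A minimal sketch of this training-and-extrapolation step, assuming the `df`, `scaler`, `X`, and `y` objects from the preprocessing sketch earlier:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Fit the OLS baseline: price as a linear function of time.
ols = LinearRegression()
ols.fit(X, y)

# Future timestamps: every 5 minutes (300 s) over the next 24 hours.
last_ts = df["timestamp"].iloc[-1]
future_ts = np.arange(last_ts + 300, last_ts + 24 * 3600 + 1, 300).reshape(-1, 1)

# Apply the same scaler used in training before predicting.
future_pred = ols.predict(scaler.transform(future_ts))
```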

5. Evaluation and Visualization:

The OLS predictions were plotted alongside actual historical prices to visually assess the model’s fit. While OLS captures the general trend, it struggles with volatility and non-linear patterns, highlighting the need for more advanced models.

Benefits and Limitations of Using OLS in This Context

  • Benefits:
      o Simple and interpretable coefficients allow understanding of the linear trend.
      o Fast training and prediction, suitable for baseline comparison.
      o Requires minimal computational resources.
  • Limitations:
      o Assumes linearity, which may not hold in volatile financial time series.
      o Sensitive to outliers and noise common in market data.
      o Cannot capture complex non-linear relationships or sudden market shifts.

Summary:

OLS Linear Regression served as a foundational model in the project, helping establish a baseline for predicting Gold prices based on time progression. It provided valuable insights into the overall trend but also pointed to the necessity of more sophisticated approaches to handle the intricate dynamics of the Gold/USD market.

2. Ridge Regression and Its Application in Gold/USD Price Prediction

What is Ridge Regression?

Ridge Regression is a type of linear regression that addresses some of the limitations of Ordinary Least Squares (OLS), especially when the predictor variables are highly correlated (multicollinearity) or when the model tends to overfit the training data. It introduces a regularization term to the loss function that penalizes large coefficients.

The Ridge Regression objective function is:

\min_{\beta} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2 + \alpha \sum_{j=1}^{n} \beta_j^2

Where:

  • The first term is the usual residual sum of squares (like OLS).
  • The second term is the L2 regularization penalty, where \alpha is a hyperparameter controlling the amount of shrinkage applied to the coefficients.
  • Larger \alpha values cause coefficients to shrink more towards zero, reducing model complexity and preventing overfitting.

Why Use Ridge Regression for Predicting Gold Prices?

Financial time series like Gold prices can be noisy, with potential multicollinearity among derived features or time-based predictors. Ridge Regression helps improve the model's generalization by controlling coefficient size, which stabilizes predictions on unseen data and reduces sensitivity to outliers or noisy fluctuations.

How I Used Ridge Regression in the Gold vs Dollar Prediction Project

1. Data Processing:

As with OLS, the Gold price data was timestamped and scaled. To improve model robustness, additional engineered features (e.g., lagged prices, moving averages) could have been included to enrich the feature set, but the core feature remained the numeric timestamp.

2. Model Training:

I trained the Ridge Regression model using the same features and target values. The regularization parameter \alpha was tuned through cross-validation to find an optimal balance between bias and variance, ensuring the model neither overfits nor underfits the data.
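A hedged sketch of this tuning step, using scikit-learn's RidgeCV on the timestamp feature from earlier (the alpha grid is illustrative):

```python
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

# Hold out the most recent observations for validation (no shuffling for time series).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# RidgeCV evaluates each candidate alpha with cross-validation and keeps the best one.
ridge = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0])
ridge.fit(X_train, y_train)

print("selected alpha:", ridge.alpha_)
print("validation R^2:", ridge.score(X_test, y_test))
```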

3. Coefficient Shrinkage:

Ridge Regression’s penalty term shrank the regression coefficients toward zero without forcing any to become exactly zero, thus maintaining all features but reducing their influence if they were not strongly predictive.

4. Prediction and Evaluation:

The model predicted Gold prices on test data and future timestamps similarly to OLS but exhibited smoother predictions less sensitive to noise or minor data fluctuations. The reduced variance improved forecast stability, especially in periods of volatile price changes.

5. Comparison with OLS:

Ridge Regression often outperformed OLS in terms of mean squared error on validation data, thanks to its regularization that helped prevent overfitting and improved generalization.

Benefits and Limitations of Using Ridge Regression in This Context

  • Benefits:
      o Controls model complexity and reduces overfitting via L2 regularization.
      o Handles multicollinearity better than OLS by shrinking correlated feature coefficients.
      o Provides more stable and reliable predictions on noisy financial data.
  • Limitations:
      o Coefficients are shrunk but never zeroed out, so it does not perform feature selection.
      o The choice of \alpha is crucial; improper tuning can lead to underfitting or overfitting.
      o Still assumes a linear relationship between features and target.

Summary:

Ridge Regression needs more testing. The predictions drop sharply in the first five minutes and then stay consistent, moving only slightly from day to day, which looks realistic; however, that initial drop suggests the model was not trained on recent data, which may explain the behavior, since the Gold price has been rising steadily for the last five months.

3. Lasso Regression and Its Role in Gold/USD Price Prediction

What is Lasso Regression?

Lasso Regression (Least Absolute Shrinkage and Selection Operator) is a regularized linear regression technique that performs both coefficient shrinkage and feature selection. Unlike Ridge Regression, which uses an L2 penalty (squares of coefficients), Lasso uses an L1 penalty — the sum of the absolute values of the coefficients.

The Lasso objective function is:

\min_{\beta} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{n} |\beta_j|

Where:

  • The first term is the residual sum of squares as in OLS.
  • The second term is the L1 regularization penalty, controlled by the hyperparameter \lambda.
  • This L1 penalty tends to shrink some coefficients exactly to zero, effectively selecting a simpler model with fewer features.

Why Use Lasso Regression for Predicting Gold Prices?

In financial datasets, many features (including engineered ones like lagged variables, moving averages, or technical indicators) can be redundant or irrelevant. Lasso helps by automatically performing feature selection during training, removing unimportant variables and reducing model complexity. This is especially useful when working with many features or noisy data.

How I Used Lasso Regression in the Gold vs Dollar Prediction Project

1. Feature Engineering:

The dataset included not only the main predictor (e.g., timestamps) but also additional derived features representing trends, volatility, or past values. Lasso helped identify which of these features were most relevant to predicting Gold prices by shrinking insignificant coefficients to zero.

2. Model Training and Tuning:

I trained the Lasso Regression model on the training data, tuning the \lambda parameter (called alpha in scikit-learn) via cross-validation. This tuning balanced sparsity (fewer features) against predictive accuracy.

3. Feature Selection and Sparsity:

Due to the L1 penalty, some coefficients became exactly zero, effectively eliminating those features from the model. This improved model interpretability by highlighting only the most influential factors driving Gold price changes.
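A minimal sketch of the tuning and selection steps with LassoCV; `X_feat` stands for a feature matrix that includes the engineered columns and is a placeholder name, not the project's actual variable:

```python
import numpy as np
from sklearn.linear_model import LassoCV

# LassoCV searches a path of alpha (lambda) values with 5-fold cross-validation.
lasso = LassoCV(cv=5)
lasso.fit(X_feat, y)

print("selected alpha:", lasso.alpha_)

# Features whose coefficients were shrunk exactly to zero are effectively dropped.
kept = np.flatnonzero(lasso.coef_)
print("features kept:", kept, "out of", X_feat.shape[1])
```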

4. Prediction and Performance:

The Lasso model produced predictions that were more robust to noise by ignoring irrelevant or redundant features. This sparsity helped prevent overfitting, leading to improved generalization on test data compared to standard linear regression.

5. Comparisons with Other Models:

Lasso often provided a more interpretable and compact model than Ridge Regression, especially when many candidate features were involved. It sometimes traded a slight decrease in raw predictive power for better simplicity and insight.

Benefits and Limitations of Using Lasso Regression Here

  • Benefits:
      o Performs automatic feature selection by zeroing out less important coefficients.
      o Reduces model complexity, making results easier to interpret.
      o Helps prevent overfitting by regularizing coefficients.
      o Useful when dealing with high-dimensional feature spaces.
  • Limitations:
      o Can be unstable if features are highly correlated; it may arbitrarily select one feature and ignore others.
      o Requires careful tuning of the regularization parameter \lambda.
      o Assumes linearity in relationships between features and target.

Summary:

Lasso Regression behaved much like Ridge in this experiment. I plan to retest it to determine whether the issue lies with the training dataset.

4. Meta Prophet and Its Role in Gold/USD Price Prediction

What is Meta Prophet?

Meta Prophet (commonly known as Facebook Prophet) is an open-source forecasting tool developed by Meta (formerly Facebook), designed to handle time series data with strong seasonal effects and several seasons of historical data. It is particularly useful for business forecasting tasks where trends, seasonality, and holidays or events influence the target variable.

Prophet models time series data by decomposing it into three main components:

  • Trend: Long-term increase or decrease in the data.
  • Seasonality: Periodic fluctuations (daily, weekly, yearly).
  • Holidays/Events: Effects of special days or events.

It uses an additive regression model with piecewise linear or logistic growth trends and flexible seasonality modeled using Fourier series.

Why Use Meta Prophet for Predicting Gold Prices?

Financial time series like Gold prices can have complex patterns, including:

  • Long-term trends due to economic factors.
  • Seasonal effects based on market cycles or geopolitical events.
  • Sudden spikes or dips due to unexpected global news.

Prophet excels in capturing these components even when data has missing points or irregularities, making it a good candidate for forecasting Gold prices.

How I Used Meta Prophet in the Gold vs Dollar Prediction Project

1. Data Preparation:

The Gold price data was formatted into the specific structure Prophet requires: a dataframe with columns ‘ds’ (date/time) and ‘y’ (target variable, Gold price).
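In code, this reshaping is essentially a rename (a sketch, assuming the `df` built in the preprocessing step, with its `Date` and `Price` columns):

```python
# Prophet expects exactly two columns: 'ds' (datetime) and 'y' (target value).
prophet_df = df[["Date", "Price"]].rename(columns={"Date": "ds", "Price": "y"})
```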

2. Trend and Seasonality Modeling:

Prophet automatically detected and modeled the underlying trend of the Gold prices over time. It also captured seasonal patterns such as possible monthly or quarterly cycles in price movement.

3. Incorporating Holidays or Events:

If relevant financial or geopolitical events impacting Gold prices were known, these could be added as custom holidays/events in the model to improve prediction accuracy.

4. Fitting the Model:

I trained Prophet on historical Gold prices to learn the trend and seasonality components. Its piecewise linear trend fitting helped accommodate shifts in economic conditions or market regimes.

5. Forecasting:

Using the trained Prophet model, I forecasted future Gold prices over the desired horizon (e.g., next 30 days). The model output included confidence intervals reflecting prediction uncertainty.
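A sketch of this fit-and-forecast step, assuming the `prophet_df` frame above and the current `prophet` package name (older installs import from `fbprophet`); the 30-day daily horizon mirrors the example in the text:

```python
from prophet import Prophet

# Fit the additive trend + seasonality model on the historical prices.
model = Prophet()
model.fit(prophet_df)

# Extend the timeline 30 days ahead and predict; yhat_lower/yhat_upper
# bound the uncertainty interval around the point forecast yhat.
future = model.make_future_dataframe(periods=30, freq="D")
forecast = model.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```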

6. Model Evaluation:

I compared Prophet’s forecasts with actual test data to measure accuracy using metrics such as MAE and RMSE. Prophet often performed well in capturing seasonal fluctuations and sudden changes compared to simple linear models.
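For reference, these metrics can be computed with scikit-learn; `y_actual` and `y_predicted` are placeholder names for aligned arrays of observed and forecast prices:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

mae = mean_absolute_error(y_actual, y_predicted)
rmse = np.sqrt(mean_squared_error(y_actual, y_predicted))
print(f"MAE: {mae:.2f}  RMSE: {rmse:.2f}")
```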

Benefits and Limitations of Using Meta Prophet Here

  • Benefits:
      o Automatically models multiple seasonalities and trends without extensive feature engineering.
      o Handles missing data and outliers gracefully.
      o User-friendly, with straightforward tuning options for trend changepoints and seasonality modes.
      o Provides intuitive visualization of components (trend, seasonality, holidays).
  • Limitations:
      o Assumes an additive model structure, so it may not fully capture complex nonlinear relationships.
      o Requires sufficient historical data with seasonal patterns to perform optimally.
      o Less suited for very high-frequency intraday trading data.

Summary:

Meta Prophet, the best performer among the models tested, provided a robust, interpretable time series forecasting approach that modeled the underlying trends and seasonal effects in Gold prices. It enhanced the prediction pipeline by offering a complementary, domain-informed perspective beyond classical regression techniques.

Visualization and User Interaction

I also developed a dynamic visualization dashboard using Plotly, enabling interactive exploration of predicted prices over a specified future horizon. Users can input starting dates and base price levels to customize forecasts.

The interactive chart displays predicted Gold price movements over the next 24 hours, with 5-minute granularity. Features include:

  • Clean, responsive time axis with readable date and time labels
  • Price lines clearly depicting forecast trends
  • Hover tooltips providing detailed price information at each timestamp

This visualization empowers users to gain intuitive insights into forecasted price behavior and supports informed decision-making.
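A rough sketch of how such a chart can be assembled with Plotly; `future_ts` and `future_pred` are assumed to come from one of the regression sketches above, not from the dashboard's actual code:

```python
import pandas as pd
import plotly.graph_objects as go

# Convert the 5-minute epoch timestamps back to readable datetimes for the x-axis.
future_times = pd.to_datetime(future_ts.ravel(), unit="s")

fig = go.Figure(go.Scatter(
    x=future_times, y=future_pred, mode="lines",
    name="Predicted Gold/USD",
    hovertemplate="%{x|%Y-%m-%d %H:%M}<br>%{y:.2f} USD",  # tooltip per timestamp
))
fig.update_layout(
    title="Gold/USD forecast: next 24 hours (5-minute steps)",
    xaxis_title="Time", yaxis_title="Price (USD)",
)
fig.show()
```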

Challenges and Insights

Working with high-frequency financial data for predicting Gold vs Dollar prices presented several challenges:

  • Managing precise timestamp conversions and ensuring consistent scaling across datasets.
  • Dealing with noisy and volatile price signals, which can confuse predictive models and reduce accuracy.
  • Balancing model complexity to avoid overfitting, while still capturing meaningful patterns for better generalization.

Despite these challenges, the project underscored the critical role of comprehensive data preprocessing, flexible visualization, and iterative model refinement in building effective forecasting solutions.

Among the models tested, Meta Prophet performed particularly well. This success can be attributed to its robust handling of trend changes and seasonal components inherent in financial time series, allowing it to adapt to market fluctuations without excessive manual tuning.

The Ordinary Least Squares (OLS) regression also showed promising results, indicating strong linear relationships in the data. Its performance is expected to improve further with increased training data, which would help the model better generalize to unseen price movements.

On the other hand, Ridge and Lasso regressions did not perform as strongly in this experiment. This may be due to their sensitivity to noisy data and the limited size of the training set, suggesting that these regularization-based methods might require larger or more feature-rich datasets to unlock their full potential.

Overall, the experience highlighted that no single model is universally best; combining insights from multiple approaches and refining data quality is key to improving financial forecasting accuracy.

Future Directions

There are several promising avenues to enhance the project further:

  • Incorporate additional features such as trading volume, macroeconomic indicators, and technical analysis metrics to enrich the predictive input.
  • Develop a unified training framework that allows multiple models to be trained simultaneously on the same dataset, enabling identification of intersecting patterns and feeding these insights into a correction layer to improve overall prediction accuracy.
  • Experiment with ensemble modelling techniques to combine the strengths of diverse predictors and reduce individual model biases.
  • Implement real-time prediction pipelines and deploy forecasting as a scalable service for practical applications.
  • Explore anomaly detection methods to flag unusual market conditions that can significantly impact price forecasts.
  • In the longer term, build a custom deep learning model using TensorFlow designed specifically for Gold vs Dollar price prediction, leveraging time gaps and sequence modelling to capture complex temporal dependencies and improve forecast precision.