Machine Learning: Used to train models like neural networks, improving accuracy by minimizing error functions.
Deep Learning: Vital for optimizing complex neural network architectures in tasks like image recognition and natural language processing.
Reinforcement Learning: Enables agents to learn optimal strategies by updating policies or value functions in interaction with environments.
Optimization Problems: Applies to various domains, including physics, engineering, finance, and healthcare, to minimize costs or maximize utility functions.
Computer Vision: Key for training convolutional neural networks to classify objects, detect features, and segment images accurately.
Finance and Economics: Helps in financial modeling, algorithmic trading, and risk management by optimizing trading strategies and pricing financial instruments.
Healthcare and Biology: Utilized in drug discovery, genomic analysis, and medical image analysis to optimize models for diagnosis and treatment planning.
Parameter Estimation: Used in maximum likelihood and maximum a posteriori estimation to find model parameters that best fit observed data.
Regression: Employed in linear and logistic regression to minimize errors between observed and predicted values.
Regularization: Implements L1 and L2 regularization to prevent overfitting in models.
Dimensionality Reduction: Utilized in PCA and factor analysis to reduce data dimensionality while preserving information.
Clustering: Optimizes objective functions in k-means and Gaussian mixture models for better cluster formation.
Survival Analysis: Estimates parameters in survival models like Cox proportional hazards model for analyzing time-to-event data.
\[\begin{align} \hat{y}(x_i) = \theta_0 + \theta_1 x_i \end{align}\]
\[\begin{align} J(\theta_0,\theta_1) = \overbrace{\frac{1}{N}\sum_{i=1}^N\left(\hat{y}(x_i)-y_i\right)^2}^{\text{MSE}} \end{align}\]
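For concreteness, the cost function can be written out as a small R helper. This is only an illustrative sketch: the data vectors x and y below are made-up toy values, not taken from the text.

# Toy data (hypothetical, for illustration only)
x <- c(0.1, 0.3, 0.5, 0.7, 0.9)
y <- c(1.2, 1.9, 2.4, 3.1, 3.6)

# Mean squared error of the line theta_0 + theta_1 * x
mse <- function(theta_0, theta_1, x, y){
  pred <- theta_0 + theta_1 * x
  mean((pred - y)^2)
}

mse(0, 2, x, y)   # cost of the guess theta_0 = 0, theta_1 = 2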
To solve for the parameters, we take the partial derivatives of the MSE with respect to \(\theta_0\) and \(\theta_1\):
\[\begin{align} \frac{\partial \text{MSE}}{\partial \theta_0} &= -\frac{2}{N} \sum_{i=1}^N (y_i - \hat{y}_i) \\ \frac{\partial \text{MSE}}{\partial \theta_1} &= -\frac{2}{N} \sum_{i=1}^N (y_i - \hat{y}_i) x_i \end{align}\]
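A quick way to sanity-check these derivatives is to compare them with a finite-difference approximation. The sketch below reuses the hypothetical mse(), x, and y from the example above.

# Analytic gradient of the MSE (matches the formulas above)
grad_mse <- function(theta_0, theta_1, x, y){
  res <- y - (theta_0 + theta_1 * x)
  c(-2 * mean(res), -2 * mean(res * x))
}

eps <- 1e-6
# Central finite differences at (theta_0, theta_1) = (0, 2)
fd_0 <- (mse(0 + eps, 2, x, y) - mse(0 - eps, 2, x, y)) / (2 * eps)
fd_1 <- (mse(0, 2 + eps, x, y) - mse(0, 2 - eps, x, y)) / (2 * eps)
grad_mse(0, 2, x, y)   # analytic gradient
c(fd_0, fd_1)          # numerical approximation, should agree closely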
\(\theta_0\) and \(\theta_1\) are then iteratively updated:
\[\begin{align} \theta_0 &\leftarrow \theta_0 - \alpha \frac{\partial \text{MSE}}{\partial \theta_0} = \theta_0 + \alpha \left( \frac{2}{N} \sum_{i=1}^N (y_i - \hat{y}_i) \right)\\ \theta_1 &\leftarrow \theta_1 - \alpha \frac{\partial \text{MSE}}{\partial \theta_1} = \theta_1 + \alpha \left( \frac{2}{N} \sum_{i=1}^N (y_i - \hat{y}_i) x_i \right) \end{align}\]
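These update rules can be implemented directly in R. The first function computes the update directions (the negative gradient), and the second applies one update step: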
# Compute the update directions for the current parameters,
# i.e. the negative gradient of the MSE
gradient_desc <- function(theta_0, theta_1, x, y){
  N <- length(x)
  pred <- theta_1 * x + theta_0          # model predictions y_hat
  res <- y - pred                        # residuals y_i - y_hat_i
  delta_theta_0 <- (2/N) * sum(res)      # negative of dMSE/d(theta_0)
  delta_theta_1 <- (2/N) * sum(res * x)  # negative of dMSE/d(theta_1)
  return(c(delta_theta_0, delta_theta_1))
}
# Perform one gradient-descent update of theta_0 and theta_1
minimize_function <- function(theta_0, theta_1, x, y, alpha){
  gd <- gradient_desc(theta_0, theta_1, x, y)
  d_theta_0 <- gd[1] * alpha            # learning rate times update direction
  d_theta_1 <- gd[2] * alpha
  new_theta_0 <- theta_0 + d_theta_0    # step against the gradient
  new_theta_1 <- theta_1 + d_theta_1
  return(c(new_theta_0, new_theta_1))
}
alpha <- 0.1   # learning rate
iter <- 100    # number of gradient-descent iterations
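Putting the pieces together, a minimal training loop could look like the sketch below; it reuses the toy x and y assumed earlier and starts both parameters at zero.

# Minimal training loop (sketch; x, y are the toy data assumed above)
theta_0 <- 0   # initial intercept
theta_1 <- 0   # initial slope
for (i in 1:iter) {
  params  <- minimize_function(theta_0, theta_1, x, y, alpha)
  theta_0 <- params[1]
  theta_1 <- params[2]
}
c(theta_0, theta_1)   # parameter estimates after iter updates

With these settings the estimates move toward the least-squares fit of the toy data; increasing iter (or tuning alpha) brings them closer, while a learning rate that is too large makes the updates diverge.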