Great algorithm for analyzing timeseries data or array of numbers.
An algorithm for finding the contiguous subarray within a one-dimensional array of numbers that has the largest sum. It is called Kadane’s algorithm.
How does the algorithm work? It looks for a global maximum of positive-sum on any sub-array, regardless of its starting position or its length. Numbers must be next or together in a sequence (without any number left out).
With formula as: local_maximum at index i is the maximum of A[i] and the sum of A[i] and local_maximum at index i-1. So, you calculate the sum of every possible subarray ending with the element array[n-1]. Then, we would calculate the sum of every possible subarray ending with array[n-2], array[n-3] and so on up to array[0].
With a simple function we can iterate through the array (vector) of values:
#sample vector
v <- c(-3,-8, 1, -2, 1, 5, -3, -4, 3, 10, -2, 4, 1)
kadane <- function(v){
max_so_far = -999999999999999 #some obnoxiously big number
max_ending_here = 0
for (i in 1:length(v)) {
max_ending_here = max_ending_here + v[i]
if (max_so_far < max_ending_here ){
max_so_far = max_ending_here
}
if (max_ending_here < 0) {
max_ending_here = 0
}
}
return (max_so_far)
}
Run the function with:
# run the function
kadane(v)
The function can also return the starting and ending positions of the array.
kadane_with_position <- function(v){
max_so_far = -999999999999999 #some obnoxiously big number
max_ending_here = 0
subarray_start = 0
subarray_end = 0
int_s = 0
for (i in 1:length(v)) {
max_ending_here = max_ending_here + v[i]
if (max_so_far < max_ending_here ){
max_so_far = max_ending_here
subarray_start = int_s
subarray_end = i
}
if (max_ending_here < 0) {
max_ending_here = 0
int_s = i + 1
}
}
cat("Sum is: ", max_so_far, " with starting position: ", subarray_start, " and ending: ", subarray_end)
}
# run the function
kadane_with_position(v)
Kadane’s algorithm has also an interesting time complexity function. With O(n) for an array, it can explode exponentially when using 2D matrices: O(n^3).
R library Adagio offers this function with ability to calculate over matrices.
As always, code is available at Github in the same Useless_R_function repository. Check Github for future updates.
Happy R-coding and stay healthy!“
[…] by data_admin [This article was first published on R – TomazTsql, and kindly contributed to R-bloggers]. (You can report issue about the content on this page […]
LikeLike
[…] [This article was first published on R – TomazTsql, and kindly contributed to R-bloggers]. (Anda dapat melaporkan masalah tentang konten di halaman […]
LikeLike
Hi, thanks for sharing! I do have two questions.
(1) Could you briefly describe what the second if expression does?
(2) Is there a way to add a condition limiting the length of the subarrays? For instance, if there is an array with 100 elements and one is interested in the greatest sub array with at maximum 20 contiguous elements.
LikeLike