With some AVL vendors, multiple vehicles or operators may be logged to the same trip ID at the same time. This may be acceptable in some scenarios (e.g., a vehicle or operator handoff mid-trip). Other times, it may be an error, with distinct (trip, vehicle, operator) tuples running simultaneously. This function identifies both scenarios and gives the option to remove one or both.
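To make the two scenarios concrete, here is a minimal base R sketch that tells a mid-trip handoff apart from a simultaneous overlap. It is an illustration under stated assumptions, not the package's implementation: the toy data, the `classify_trip()` helper, and the status labels are all hypothetical, and treating a shared boundary timestamp as non-overlapping is an arbitrary choice made for this sketch.

```r
# Toy AVL data (hypothetical): trip "A" has two vehicles whose time spans
# overlap, trip "B" has a mid-trip handoff, and trip "C" has one vehicle.
toy <- data.frame(
  trip_id_performed = c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C"),
  vehicle_id        = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  event_timestamp   = as.POSIXct("2026-02-16 11:00:00", tz = "UTC") +
    60 * c(0, 10, 5, 15, 0, 10, 11, 20, 0, 10)
)

# Time span (first and last observation) of each (trip, vehicle) subtrip
ts_num <- as.numeric(toy$event_timestamp)
keys   <- toy[c("trip_id_performed", "vehicle_id")]
starts <- aggregate(list(start = ts_num), by = keys, FUN = min)
ends   <- aggregate(list(end = ts_num), by = keys, FUN = max)
spans  <- merge(starts, ends)

# Hypothetical helper: classify one trip from the spans of its subtrips
classify_trip <- function(s) {
  if (nrow(s) < 2) {
    return("single_vehicle")
  }
  s <- s[order(s$start), ]
  # Once sorted by start, two spans overlap if any subtrip starts
  # before the latest end time seen among the earlier subtrips
  latest_prior_end <- cummax(s$end)[-nrow(s)]
  if (any(s$start[-1] < latest_prior_end)) "overlapping" else "non_overlapping"
}

status <- sapply(split(spans, spans$trip_id_performed), classify_trip)
print(status)
```

In this sketch, a trip labeled "overlapping" corresponds to the simultaneous case described above, while "non_overlapping" corresponds to a handoff, where one vehicle's observations end before the next vehicle's begin.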

Usage

clean_overlapping_subtrips(
  distance_df,
  check_operator = FALSE,
  remove_single_observations = TRUE,
  remove_non_overlapping = FALSE,
  return_removals = FALSE
)

Arguments

distance_df

A dataframe of linearized AVL data. Must include event_timestamp, trip_id_performed, and vehicle_id. Optionally, may include operator_id.

check_operator

Optional. A boolean: should overlaps of multiple operator_ids be checked for? Default is FALSE.

remove_single_observations

Optional. A boolean: should subtrips with only one observation be removed? Default is TRUE.

remove_non_overlapping

Optional. A boolean: should trips with multiple vehicles or operators that do not overlap be removed? Default is FALSE.

return_removals

Optional. A boolean: should the function return a dataframe of trips removed and why? Default is FALSE.
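As a rough illustration of what a single-observation subtrip looks like, the base R sketch below counts observations per (trip, vehicle) pair and keeps only pairs with more than one. The toy data and variable names are hypothetical, and the package may identify subtrips differently (for example, by also grouping on operator_id when check_operator = TRUE).

```r
# Toy data (hypothetical): vehicle 2 on trip "A" and vehicle 3 on trip "B"
# each appear only once.
toy_obs <- data.frame(
  trip_id_performed = c("A", "A", "A", "B"),
  vehicle_id        = c(1, 1, 2, 3),
  event_timestamp   = as.POSIXct("2026-02-16 11:00:00", tz = "UTC") +
    60 * c(0, 1, 2, 3)
)

# For every row, the number of observations in its (trip, vehicle) subtrip
n_obs <- ave(
  seq_len(nrow(toy_obs)),
  toy_obs$trip_id_performed,
  toy_obs$vehicle_id,
  FUN = length
)

# Keep only rows belonging to subtrips with more than one observation
kept <- toy_obs[n_obs > 1, ]
print(kept)
```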

Value

The input distance_df, with violating trips removed. If return_removals = TRUE, a dataframe of trip IDs and the reason each was identified for removal.

Examples

# Get input data
c53_dists <- new_transittraj_data("get_linear_distances")
dim(c53_dists)
#> [1] 639  11

# Run function
c53_no_overlaps <- clean_overlapping_subtrips(distance_df = c53_dists)
dim(c53_no_overlaps)
#> [1] 639  11
head(c53_no_overlaps)
#>   location_ping_id vehicle_id trip_id_performed service_date route_id
#> 1             1586       5516          13437100   2026-02-16      C53
#> 2             1667       5516          13437100   2026-02-16      C53
#> 3             1694       5516          13437100   2026-02-16      C53
#> 4             1775       5516          13437100   2026-02-16      C53
#> 5             2018       5516          13437100   2026-02-16      C53
#> 6             2261       5516          13437100   2026-02-16      C53
#>   direction_id  speed trip_stop_sequence     event_timestamp stop_id distance
#> 1            0 6.4008                  2 2026-02-16 11:08:31   13111  0.00000
#> 2            0 0.0000                  2 2026-02-16 11:09:01   13111  2.08491
#> 3            0 0.0000                  2 2026-02-16 11:09:11   13111  2.08491
#> 4            0 0.0000                  2 2026-02-16 11:09:41   13111  2.08491
#> 5            0 0.0000                  2 2026-02-16 11:11:12   13111  2.08491
#> 6            0 0.0000                  2 2026-02-16 11:12:43   13111  2.08491