How Long Is a Long Tail?
Using long-tail data efficiently means tracking those low-impact but potentially costly claims that can develop over decades. But what defines the length of this data, and why do carriers’ and bureaus’ definitions vary so widely?
September 25, 2023
In statistics, a long tail describes a portion of a data set with many occurrences far from the central part of the distribution. In claims data, this can describe lengthy, developing claims with long settlement periods, such as in workers’ compensation claims that age with an injured worker.
“The notion of a long tail is associated mainly with claim development and the estimation of an entity’s ultimate claim liabilities,” said Leslie Martin, Senior Pricing Actuary at Safety National. “For example, carriers will have significantly more years of data for estimating outstanding liabilities, whereas bureaus collect data from members for the specific purposes they are authorized or permitted to provide by regulations or their membership.”
What is considered a long tail?
Datasets used for insurance analysis vary by line of business and intended use. For instance, property coverage, a short-tail line, typically involves claims that settle very quickly, so analysis often relies on a small number of short periods of data to estimate ultimate claim liabilities and evaluate trends. In more complex lines of business, like workers’ compensation, carriers may learn of a claim within a short period of time but not know its realized value for decades because of factors such as the age of the injured worker, the type of injury, and the injured worker’s comorbidities and response to medical treatment. These claims may include coverage for the remainder of the injured worker’s life, depending on state benefit structures.
Depending on how long a carrier has operated and on its data practices, it may have over 40 years of available claims information. New carriers, or carriers new to writing longer-tail lines of business, may need to supplement their data with assumptions or external data to fully recognize the long-tail exposure.
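To make the idea of supplementing limited data concrete, the sketch below uses a simple chain-ladder style projection, a common development technique that this article does not itself name. Every figure in it is made up, and the tail factor is purely an assumption standing in for "assumptions or external data"; it is an illustration, not any carrier's or bureau's actual method.

```python
import math

# Hypothetical cumulative paid losses (in $000s) for an older accident year,
# evaluated at 12, 24, 36, and 48 months. All figures are made up.
mature_year_paid = [100.0, 150.0, 180.0, 198.0]

# Age-to-age development factors observed on the older year:
# 12-to-24, 24-to-36, and 36-to-48 months.
age_to_age = [
    later / earlier
    for earlier, later in zip(mature_year_paid, mature_year_paid[1:])
]


def project(latest_paid: float, factors: list[float], tail_factor: float = 1.0) -> float:
    """Develop the latest paid amount through the given factors, then apply a tail factor."""
    return latest_paid * math.prod(factors) * tail_factor


# A more recent accident year, so far evaluated only at 24 months (hypothetical).
recent_year_paid_at_24 = 200.0
remaining_factors = age_to_age[1:]  # 24-to-36 and 36-to-48 months

# Projection limited to the development the carrier has actually observed.
observed_only = project(recent_year_paid_at_24, remaining_factors)

# Same projection with an ASSUMED tail factor for development beyond 48 months,
# standing in for external data or judgment.
assumed_tail_factor = 1.25
with_tail = project(recent_year_paid_at_24, remaining_factors, assumed_tail_factor)

print(f"Observed data only: {observed_only:,.0f}")
print(f"With assumed tail:  {with_tail:,.0f}")
```

With these made-up inputs, the estimate that stops at the last observed evaluation is about 264, while the one that includes the assumed tail is about 330. The gap is the long-tail exposure that a short data history alone would not recognize.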
What issues might arise from shorter-term bureau data?
Rating bureaus typically work with a shorter dataset because they use what they receive from their members and what they need to provide their services. For a long-tail line of business like workers’ compensation, the amount of historical data available has been limited by the computing technology available at the time of collection, what was requested, and what members were able to provide. An important consideration for the industry and regulators is the cost-benefit tradeoff of bureau data collection: collecting more data without sufficient benefit only increases system costs.
Bureaus do collect recent data on older long-tail claims, but there are limited evaluations of those particular claims. The available data provides details useful in assessing current trends but may not be sufficient to meet their members’ needs for evaluating long-term trends or ultimate claim liabilities.
A primary goal of the bureaus is to establish rates or loss costs as allowed by state regulations. For workers’ compensation, the bureaus also promulgate experience modifications in most states. In addition, many bureaus provide analysis based on data they collect, which benefits customers who may not have the data or wherewithal to perform the analyses they need. However, because of the bureaus’ influence, there is potential for misinterpretation when users do not understand the different datasets available and the limitations of the data.
Bureau data may also leave out self-insureds, who rely on long-tail data to understand their total liability. Trend data can help directionally for a current accident year, but self-insured groups must grasp the impact of developing claims. These groups might not have historical claims data that can provide insight into whether a claim may remain open for another 30 to 40 years. The absence of self-insureds from bureau datasets leaves out large portions of the population, which affects how applicable the bureaus’ analysis is in specific industries, like public entities.
How might rated ages affect long-tail data?
For severe workers’ compensation injuries, the basis for estimating the ultimate value of a claim may differ between datasets. Carriers and third-party administrators (TPAs) may evaluate the claim using a rated age instead of the injured worker’s actual age. A rated age is often determined from the life expectancy tables of disabled individuals. Generally, a rated age is higher than the injured worker’s actual age, so the assumed life expectancy is lower. With advancements in medical treatment, many injured workers may survive beyond the life expectancy implied by the rated age, which can appear as larger development later in the life of the claim than under an estimate based on the injured worker’s actual age. This treatment can differ from data reported to the rating bureaus, which are based on specified reporting rules and often include discounting the indemnity portion of the claim.
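A small sketch can show why a rated age leads to later development. The life expectancy rule here (years remaining until an assumed terminal age), the ages, and the benefit amount are all hypothetical placeholders, not values from any life table or reporting rule, and the calculation ignores discounting and benefit escalation.

```python
# Hypothetical illustration only: the life expectancy rule and dollar amounts
# below are made-up assumptions, not actuarial tables or bureau rules.

def remaining_years(age: float, terminal_age: float = 85.0) -> float:
    """Toy life expectancy: years remaining until an assumed terminal age."""
    return max(0.0, terminal_age - age)


def undiscounted_reserve(annual_benefit: float, age_used: float) -> float:
    """Annual benefit times expected remaining years, with no discounting."""
    return annual_benefit * remaining_years(age_used)


actual_age = 40           # injured worker's actual age (assumed)
rated_age = 55            # assumed rated age, higher than the actual age
annual_benefit = 50_000   # assumed annual indemnity plus medical cost

reserve_at_actual_age = undiscounted_reserve(annual_benefit, actual_age)
reserve_at_rated_age = undiscounted_reserve(annual_benefit, rated_age)

# If the worker in fact survives as long as the actual-age estimate implies,
# this shortfall would emerge as development late in the life of the claim.
late_development = reserve_at_actual_age - reserve_at_rated_age

print(f"Reserve using actual age: {reserve_at_actual_age:,.0f}")
print(f"Reserve using rated age:  {reserve_at_rated_age:,.0f}")
print(f"Potential late development: {late_development:,.0f}")
```

Under these assumptions the rated-age estimate is lower than the actual-age estimate, and the difference only becomes visible if the injured worker outlives the rated-age expectancy.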