Performance Benchmarks

Execution speed is rarely the primary consideration when selecting statistical software; correctness, interpretability, and ease of use usually take precedence. However, computational efficiency becomes relevant when working with large datasets, conducting simulation studies, or iterating through model specifications during exploratory analysis.

This article documents the computational performance of summata relative to established alternatives. The benchmarks presented here are intended as a reference for users whose workflows involve performance-sensitive operations, and as a record of the design tradeoffs inherent in different implementation approaches.

Methodology

All benchmarks were conducted using the microbenchmark package under the following conditions:

Iterations: 5–20 per benchmark, adjusted for computational intensity
Dataset sizes: 500 to 10,000 observations
Data structure: Simulated clinical trial data with continuous, categorical, and time-to-event variables
Predictors: 14 variables for screening benchmarks

Datasets were generated using a fixed random seed to ensure reproducibility. Timing measurements exclude package loading and data generation. All packages were tested using default parameters unless otherwise noted.

Two summata configurations are benchmarked throughout: the default configuration, which uses profile likelihood confidence intervals for GLM models and includes full formatting (QC statistics, sample sizes, reference rows); and a minimal configuration (summata_minimal), which uses Wald CIs and disables optional output features. This distinction is important because profile likelihood CIs dominate GLM execution time, and the minimal configuration provides a way to measure summata’s formatting overhead in isolation. Note that finalfit and broom also use profile likelihood CIs by default for GLM models.

library(summata)
library(microbenchmark)
library(ggplot2)
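
The setup described above could look roughly like the following sketch. The function name `simulate_trial()`, the variable names, and the distributions are illustrative assumptions, not the generator used for the published numbers, and `time_ms()` is a dependency-free stand-in for what `microbenchmark` does more carefully.

```r
# Hypothetical generator for trial-like data; names and distributions are
# assumptions made for illustration only.
simulate_trial <- function(n, seed = 42) {
  set.seed(seed)  # fixed seed so repeated calls return identical data
  age <- rnorm(n, mean = 60, sd = 10)
  arm <- factor(sample(c("control", "treatment"), n, replace = TRUE))
  stage <- factor(sample(c("I", "II", "III"), n, replace = TRUE))
  # Binary outcome whose log-odds depend on age and arm
  lp <- -3 + 0.04 * age + 0.5 * (arm == "treatment")
  event <- rbinom(n, size = 1, prob = plogis(lp))
  # Exponential event times with independent uniform censoring
  t_event <- rexp(n, rate = 0.05 * exp(0.3 * (arm == "treatment")))
  t_cens <- runif(n, min = 6, max = 36)
  data.frame(
    age = age, arm = arm, stage = stage, event = event,
    time = pmin(t_event, t_cens),
    status = as.integer(t_event <= t_cens)
  )
}

# Median wall-clock time in milliseconds over `times` repetitions.
time_ms <- function(f, times = 5) {
  elapsed <- vapply(seq_len(times), function(i) {
    system.time(f())[["elapsed"]]
  }, numeric(1))
  median(elapsed) * 1000
}

trial <- simulate_trial(1000)  # generated once, outside the timed region
time_ms(function() glm(event ~ age + arm + stage, family = binomial, data = trial))
```

Generating the data before calling the timer mirrors the statement that timing measurements exclude data generation.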

Descriptive Tables

Descriptive summary tables represent a common first step in data analysis. The following packages provide comparable functionality with differing implementation strategies.

Package	Function	Implementation Notes
`summata`	`desctable()`	`data.table` operations
`arsenal`	`tableby()`	Formula-based interface
`tableone`	`CreateTableOne()`	Matrix-based computation
`finalfit`	`summary_factorlist()`	`tidyverse` ecosystem
`gtsummary`	`tbl_summary()`	`gt` table framework

Dataset Size	`summata`	`arsenal`	`tableone`	`finalfit`	`gtsummary`
n = 1,000	42 ms	64 ms	46 ms	429 ms	2,901 ms
n = 5,000	57 ms	77 ms	79 ms	442 ms	2,929 ms
n = 10,000	73 ms	98 ms	126 ms	464 ms	3,001 ms

The observed timing differences reflect underlying implementation choices. Packages built on data.table or base R matrix operations (summata, tableone, arsenal) exhibit lower overhead than those employing more extensive formatting pipelines (gtsummary). The gtsummary package prioritizes output flexibility and gt integration, which introduces additional computational cost.

Survival Tables

Survival probability tables summarize Kaplan-Meier estimates at specified time points.

Package	Function	Notes
`summata`	`survtable()`	Formatted output
manual	`survival::survfit()`	Raw computation
`gtsummary`	`tbl_survfit()`	`gt` integration

Dataset Size	`summata`	`gtsummary`	manual
n = 1,000	21 ms	266 ms	6 ms
n = 5,000	35 ms	271 ms	11 ms
n = 10,000	52 ms	274 ms	14 ms

Direct survfit() computation provides a baseline for the minimum time required. The difference between raw computation and formatted output reflects the cost of table construction and presentation logic.
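
As a rough illustration of the "manual" baseline, the sketch below computes Kaplan-Meier estimates at two time points with `survival::survfit()` and assembles a small table by hand. The simulated data, the chosen time points (12 and 24), and the column names are assumptions for the example, not the benchmark's actual inputs.

```r
library(survival)

set.seed(1)
n <- 1000
arm <- factor(sample(c("control", "treatment"), n, replace = TRUE))
t_event <- rexp(n, rate = ifelse(arm == "treatment", 0.03, 0.05))
t_cens <- runif(n, min = 12, max = 48)
surv_data <- data.frame(
  arm = arm,
  time = pmin(t_event, t_cens),
  status = as.integer(t_event <= t_cens)
)

# Raw computation: fit the Kaplan-Meier curves and read them off at fixed times
km <- survfit(Surv(time, status) ~ arm, data = surv_data)
km_at <- summary(km, times = c(12, 24))

# Minimal table construction; formatted-output functions add further logic here
km_table <- data.frame(
  group = km_at$strata,
  time = km_at$time,
  survival = round(km_at$surv, 3),
  lower = round(km_at$lower, 3),
  upper = round(km_at$upper, 3)
)
km_table
```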

Regression Output

The following benchmarks compare functions that extract and format regression coefficients. Each package produces tables suitable for publication, though with varying levels of default formatting. Compared functions are as follows:

Package	Function	Notes
`summata`	`fit()`	Profile likelihood CIs, QC stats, counts, and reference rows
`summata_minimal`	`fit(..., conf_method = "wald", show_n = FALSE, show_events = FALSE, reference_rows = FALSE, keep_qc_stats = FALSE)`	Wald CIs, reduced output
`finalfit`	`glmuni()` + `fit2df()`	Profile likelihood CIs (default)
`broom`	`tidy()`	Profile likelihood CIs via `confint()` dispatch
`gtsummary`	`tbl_regression()`	`gt` formatting

Logistic Regression

Dataset Size	`summata_minimal`	`summata`	`finalfit`	`broom`	`gtsummary`
n = 500	18 ms	174 ms	147 ms	150 ms	1,344 ms
n = 1,000	22 ms	234 ms	212 ms	214 ms	1,399 ms
n = 5,000	37 ms	840 ms	749 ms	936 ms	2,153 ms
n = 10,000	45 ms	1,532 ms	1,562 ms	1,564 ms	2,756 ms

The default summata configuration uses profile likelihood confidence intervals for GLM models, as do finalfit and broom::tidy(). The three packages show comparable performance for logistic regression because profiling dominates execution time for all of them. The summata_minimal configuration uses Wald CIs instead, skipping the profiling step entirely, and achieves the fastest extraction times at all sample sizes. At large n, the repeated model refits required for profiling account for most of the run time, causing all profile-based packages to converge toward similar timings.
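
The cost difference between the two interval types can be seen with base R alone. In this sketch, `confint()` on a `glm` object returns profile likelihood intervals, while `confint.default()` returns Wald intervals; the simulated data and coefficient values are assumptions chosen only for illustration.

```r
set.seed(2024)
n <- 2000
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * x1 - 0.4 * x2))
model <- glm(y ~ x1 + x2, family = binomial)

# Profile likelihood intervals: the model is refitted repeatedly with each
# coefficient held at a series of fixed values.
t_profile <- system.time(
  ci_profile <- suppressMessages(confint(model))
)[["elapsed"]]

# Wald intervals: estimate +/- z * SE from the fitted model, no refitting.
t_wald <- system.time(
  ci_wald <- confint.default(model)
)[["elapsed"]]

round(cbind(profile_lower = ci_profile[, 1], wald_lower = ci_wald[, 1]), 3)
c(profile_sec = t_profile, wald_sec = t_wald)
```

With a well-behaved model and a sample of this size the two sets of limits are typically close, which is why the Wald option is attractive when speed matters more than small-sample accuracy.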

Linear Regression

Dataset Size	`summata_minimal`	`summata`	`finalfit`	`broom`	`gtsummary`
n = 500	20 ms	35 ms	6 ms	6 ms	1,179 ms
n = 1,000	21 ms	36 ms	7 ms	6 ms	1,193 ms
n = 5,000	27 ms	43 ms	9 ms	9 ms	1,192 ms
n = 10,000	38 ms	49 ms	13 ms	12 ms	1,230 ms

For linear models, broom::tidy() and finalfit achieve faster coefficient extraction due to lower formatting overhead. All three packages use exact t-distribution CIs for lm objects (via confint.lm()), so the timing difference reflects formatting features (reference rows, QC statistics) rather than CI computation.
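
The exact t-distribution interval needs no iteration, which is why CI computation is cheap for `lm` objects. The sketch below, on simulated data assumed for the example, reproduces `confint()` by hand from the coefficient table.

```r
set.seed(7)
n <- 500
x <- rnorm(n)
g <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
y <- 1 + 2 * x + 0.5 * (g == "b") + rnorm(n)
lm_fit <- lm(y ~ x + g)

# Closed-form interval: estimate +/- t quantile * standard error
est <- coef(summary(lm_fit))
t_crit <- qt(0.975, df = df.residual(lm_fit))
ci_manual <- cbind(
  est[, "Estimate"] - t_crit * est[, "Std. Error"],
  est[, "Estimate"] + t_crit * est[, "Std. Error"]
)

# Compare against the built-in method for lm objects
all.equal(unname(ci_manual), unname(confint(lm_fit)))
```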

Poisson Regression

Dataset Size	`summata_minimal`	`summata`	`finalfit`	`broom`	`gtsummary`
n = 500	20 ms	155 ms	135 ms	144 ms	1,293 ms
n = 1,000	24 ms	201 ms	181 ms	184 ms	1,325 ms
n = 5,000	34 ms	595 ms	613 ms	612 ms	1,868 ms
n = 10,000	47 ms	1,351 ms	1,409 ms	1,402 ms	2,577 ms

Poisson regression shows the same profile likelihood pattern as logistic regression: the default summata, finalfit, and broom all use profile CIs and show comparable performance. The summata_minimal configuration with Wald CIs is consistently the fastest option.

Cox Regression

Dataset Size	`summata_minimal`	`summata`	`finalfit`	`broom`	`gtsummary`
n = 500	17 ms	34 ms	7 ms	12 ms	1,149 ms
n = 1,000	21 ms	38 ms	9 ms	14 ms	1,161 ms
n = 5,000	40 ms	61 ms	25 ms	30 ms	1,227 ms

Cox models use Wald CIs regardless of the conf_method setting (the standard approach in survival analysis), so the timing difference between summata and summata_minimal reflects formatting overhead only. finalfit and broom achieve faster extraction with less formatting.
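
For reference, Wald intervals for hazard ratios can be computed directly from the coefficients and their standard errors. This is a minimal sketch on simulated data (all variable names and parameter values are assumptions); it shows the standard calculation, not summata's internal code.

```r
library(survival)

set.seed(11)
n <- 800
age <- rnorm(n, mean = 60, sd = 10)
trt <- rbinom(n, size = 1, prob = 0.5)
t_event <- rexp(n, rate = 0.04 * exp(0.02 * (age - 60) - 0.5 * trt))
t_cens <- runif(n, min = 6, max = 60)
cox_data <- data.frame(
  age = age, trt = trt,
  time = pmin(t_event, t_cens),
  status = as.integer(t_event <= t_cens)
)
cox_fit <- coxph(Surv(time, status) ~ age + trt, data = cox_data)

# Wald interval on the log-hazard scale, then exponentiated
beta <- coef(cox_fit)
se <- sqrt(diag(vcov(cox_fit)))
z <- qnorm(0.975)
hr_table <- cbind(
  HR = exp(beta),
  lower = exp(beta - z * se),
  upper = exp(beta + z * se)
)
round(hr_table, 3)
```

The limits agree with the `lower .95` and `upper .95` columns of `summary(cox_fit)$conf.int`.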

Mixed-Effects Models

Mixed-effects models present a useful comparison case because the underlying model fitting (via lme4) dominates execution time regardless of the wrapper package.

Package	Function	Notes
`summata`	`fit(..., model_type = "lmer")`	Unified interface
`summata_minimal`	`fit(..., model_type = "lmer", conf_method = "wald", show_n = FALSE, show_events = FALSE, reference_rows = FALSE, keep_qc_stats = FALSE)`	Reduced output
`finalfit`	`lmmixed()` + `fit2df()`	Two-step process
`broom.mixed`	`tidy()`	Minimal extraction
`gtsummary`	`tbl_regression()`	`gt` formatting

Dataset Size	`summata_minimal`	`summata`	`finalfit`	`broom.mixed`	`gtsummary`
n = 500	35 ms	58 ms	26 ms	31 ms	1,141 ms
n = 1,000	37 ms	60 ms	28 ms	34 ms	1,133 ms
n = 5,000	52 ms	76 ms	43 ms	47 ms	1,185 ms

The relatively narrow spread among summata, finalfit, and broom reflects the dominance of model fitting time. Differences in wrapper overhead become proportionally less significant as the underlying computation grows.

Univariable Screening

Univariable screening—fitting separate models for each predictor—provides a test case for operations involving many repeated model fits.

Package	Function	Notes
`summata`	`uniscreen()`	Parallel-capable
`summata_minimal`	`uniscreen(..., conf_method = "wald", show_n = FALSE, show_events = FALSE, reference_rows = FALSE)`	Wald CIs, reduced output
`finalfit`	`glmuni()` + `fit2df()`	Sequential
`broom`	Loop + `tidy()`	Manual implementation
`arsenal`	`modelsum()`	Formula interface
`gtsummary`	`tbl_uvregression()`	`gt` formatting

Screening 14 predictors:

Dataset Size	`summata_minimal`	`summata`	`finalfit`	`broom`	`arsenal`	`gtsummary`
n = 500	117 ms	319 ms	360 ms	440 ms	877 ms	12,963 ms
n = 1,000	134 ms	351 ms	477 ms	558 ms	1,128 ms	13,025 ms
n = 5,000	196 ms	624 ms	1,763 ms	1,799 ms	3,401 ms	14,001 ms

Profile likelihood profiling is repeated for each of the 14 predictor models, so the default summata configuration takes 2.6–3.2× as long as summata_minimal. The profile-based configurations (summata default, finalfit, broom) perform similarly at n = 500 (319–440 ms), but default summata pulls ahead as n grows and is about 2.8–2.9× faster than finalfit and broom at n = 5,000. With Wald CIs, summata_minimal is the fastest option at all sample sizes: 2.6–3.2× faster than default summata and 3.1–9.0× faster than finalfit, due to data.table vectorization and parallel model fitting.
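
The "Loop + `tidy()`" style of screening can be approximated in base R as follows. This is a sketch under stated assumptions: the data are simulated, `screen_one()` is a hypothetical helper, and Wald intervals are used so that no profiling occurs. It is sequential and does not represent `uniscreen()` internals.

```r
set.seed(99)
n <- 1000
screen_data <- data.frame(
  age = rnorm(n, mean = 60, sd = 10),
  bmi = rnorm(n, mean = 27, sd = 4),
  smoker = factor(sample(c("no", "yes"), n, replace = TRUE)),
  stage = factor(sample(c("I", "II", "III"), n, replace = TRUE))
)
screen_data$event <- rbinom(
  n, size = 1,
  prob = plogis(-2 + 0.03 * screen_data$age + 0.6 * (screen_data$smoker == "yes"))
)

# Fit one single-predictor logistic model and return its odds ratios
screen_one <- function(predictor, data, outcome = "event") {
  model <- glm(reformulate(predictor, response = outcome),
               family = binomial, data = data)
  ci <- confint.default(model)  # Wald CIs: no profiling
  keep <- names(coef(model)) != "(Intercept)"
  data.frame(
    predictor = predictor,
    term = names(coef(model))[keep],
    odds_ratio = unname(exp(coef(model)[keep])),
    lower = unname(exp(ci[keep, 1])),
    upper = unname(exp(ci[keep, 2])),
    p_value = unname(coef(summary(model))[keep, "Pr(>|z|)"]),
    row.names = NULL
  )
}

# One model per predictor; results are bound once at the end
predictors <- setdiff(names(screen_data), "event")
screening <- do.call(rbind, lapply(predictors, screen_one, data = screen_data))
screening
```

Swapping `confint.default()` for `confint()` in the helper would add one round of profiling per predictor, which is the cost that accumulates across the 14 models in the benchmark.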

Complete Workflow

The combined univariable screening and multivariable modeling workflow represents a common analytical pattern in statistical research.

Package	Approach	Notes
`summata`	`fullfit()`	Single function
`summata_minimal`	`fullfit(..., conf_method = "wald", show_n = FALSE, show_events = FALSE, reference_rows = FALSE)`	Wald CIs, reduced output
`finalfit`	`finalfit()`	Single function
manual	Loop + `glm()` + `broom::tidy()` + `rbind()`	Custom
`gtsummary`	`tbl_uvregression()` + `tbl_regression()` + `tbl_merge()`	Multi-step

Dataset Size	`summata_minimal`	`summata`	`finalfit`	manual	`gtsummary`
n = 500	123 ms	450 ms	207 ms	407 ms	9,655 ms
n = 1,000	136 ms	549 ms	200 ms	541 ms	9,726 ms
n = 5,000	196 ms	1,479 ms	209 ms	1,889 ms	11,092 ms

Both the default summata configuration and finalfit use profile likelihood CIs for GLM workflows, but their timings diverge: finalfit() stays near 200 ms at every tested size, while default summata grows from 450 ms at n = 500 to 1,479 ms at n = 5,000 (2.2–7.1× slower). Part of the difference reflects summata’s additional features (QC statistics, reference rows, complete-case sample sizes), whereas finalfit additionally produces a descriptive statistics table. The summata_minimal configuration with Wald CIs is the fastest single-function option at every tested size, completing the combined analysis in roughly 60–70% of the time finalfit requires at n = 500–1,000 and approaching parity (196 ms vs. 209 ms) at n = 5,000.

Forest Plots

Forest plot generation combines data extraction with graphical rendering.

Package	Function	Notes
`summata`	`coxforest()`	Integrated table and plot
`survminer`	`ggforest()`	Survival-focused
manual	Custom `ggplot2`	Maximum flexibility

Dataset Size	`summata`	`survminer`	manual
n = 500	203 ms	345 ms	57 ms
n = 1,000	198 ms	338 ms	54 ms
n = 5,000	203 ms	329 ms	53 ms

The manual approach produces only the graphical element, while summata and survminer generate integrated displays with coefficient tables. The relatively constant timing across dataset sizes indicates that plot rendering, rather than data processing, dominates execution time. The three graphical outputs also differ substantially in appearance, and those differences are likely to outweigh timing when selecting a plotting function.

Relative Performance

The following figures summarize timing ratios across benchmarks. Values greater than 1 indicate the comparison package requires more time than the baseline.

Relative to `summata` (default, profile likelihood CIs)

Relative to `summata_minimal` (Wald CIs, no QC stats)

Summary of Ratios

Ratios relative to summata (default):

Benchmark	`gtsummary`	`finalfit`	`arsenal`
Descriptive Tables	41–70×	6–10×	1.4–1.5×
Survival Tables	5–12×	—	—
Logistic Regression	2–8×	0.8–1.0×	—
Poisson Regression	2–8×	0.9–1.0×	—
Linear Regression	25–34×	0.2–0.3×	—
Cox Regression	20–34×	0.2–0.4×	—
Mixed-Effects	16–20×	0.5–0.6×	—
Univariable Screening	22–41×	1.1–2.8×	2.8–5.5×
Complete Workflow	8–21×	0.1–0.5×	—

Ratios relative to summata_minimal (Wald CIs):

Benchmark	`gtsummary`	`finalfit`	`summata` (default)
Logistic Regression	58–74×	8–35×	10–34×
Poisson Regression	55–63×	7–30×	8–29×
Linear Regression	32–59×	0.3×	1.3–1.7×
Cox Regression	31–68×	0.4–0.6×	1.5–2.0×
Mixed-Effects	23–33×	0.8–0.9×	1.5–1.7×
Univariable Screening	71–111×	3.1–9.0×	2.6–3.2×
Complete Workflow	57–78×	1.1–1.7×	3.7–7.5×

For GLM models (logistic and Poisson), summata_minimal outperforms all alternatives by a wide margin: roughly 7–35× faster than both finalfit and broom. This is because summata_minimal is the only configuration that uses Wald CIs; finalfit, broom, and the default summata all use profile likelihood CIs, which accounts for their comparable timings.

For linear, Cox, and mixed-effects models, where all packages use the same CI method (exact t-distribution for lm, Wald for Cox and mixed-effects), the timing gap between summata and summata_minimal is narrow (1.3–2.0×) and reflects formatting overhead only.
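
The ratio ranges can be recomputed from the timing tables. The sketch below does so for the logistic regression benchmark, using the values reported earlier in this article; `ratio_range()` is a hypothetical helper written for this example.

```r
# Timings (ms) copied from the logistic regression table
logistic <- data.frame(
  n = c(500, 1000, 5000, 10000),
  summata_minimal = c(18, 22, 37, 45),
  summata = c(174, 234, 840, 1532),
  finalfit = c(147, 212, 749, 1562),
  broom = c(150, 214, 936, 1564),
  gtsummary = c(1344, 1399, 2153, 2756)
)

# Smallest and largest ratio of each package to the baseline across sizes
ratio_range <- function(timings, baseline) {
  others <- setdiff(names(timings), c("n", baseline))
  out <- t(sapply(others, function(pkg) {
    range(timings[[pkg]] / timings[[baseline]])
  }))
  colnames(out) <- c("min_ratio", "max_ratio")
  out
}

round(ratio_range(logistic, baseline = "summata_minimal"), 1)
round(ratio_range(logistic, baseline = "summata"), 1)
```

Changing `baseline` switches between the two summary tables above; a value below 1 means the comparison package was faster than the baseline at that size.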

Scaling Characteristics

The relationship between dataset size and execution time provides insight into algorithmic complexity. Near-linear scaling (execution time proportional to n) indicates efficient implementation, while superlinear scaling may suggest operations with O(n²) complexity, such as repeated rbind() calls or element-wise data frame construction.
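
The `rbind()` pattern mentioned above can be demonstrated with a small, self-contained comparison. Both functions are hypothetical helpers for this sketch; they build the same data frame, but the first copies the accumulated result on every iteration, while the second binds all pieces in a single call.

```r
rows <- lapply(1:1000, function(i) data.frame(id = i, value = sqrt(i)))

# Growing pattern: each rbind() copies everything accumulated so far
grow_by_rbind <- function(rows) {
  out <- rows[[1]]
  for (r in rows[-1]) out <- rbind(out, r)
  out
}

# Collect-then-bind pattern: pieces stay in a list and are combined once
bind_once <- function(rows) do.call(rbind, rows)

t_grow <- system.time(a <- grow_by_rbind(rows))[["elapsed"]]
t_once <- system.time(b <- bind_once(rows))[["elapsed"]]

all.equal(a, b)
c(grow_sec = t_grow, once_sec = t_once)
```

Exact timings depend on the machine, so none are shown here; the point is that the growing pattern does more copying as the number of pieces increases.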

Observed scaling factors for summata (ratio of time at the largest tested n to time at n = 1,000; univariable screening was benchmarked only up to n = 5,000):

Operation	Largest n	Scaling Factor	Expected for O(n)
Descriptive tables	10,000	1.7×	10×
Logistic regression	10,000	6.5×	10×
Univariable screening	5,000	1.8×	5×

The sublinear scaling reflects that fixed per-call overhead (object construction, table formatting) constitutes a significant fraction of total time at smaller dataset sizes. Logistic regression shows nearer-to-linear scaling because profiling refits the model repeatedly, and the cost of each IRLS iteration grows with n.
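
A scaling factor and a log-log slope can be derived from the reported timings as follows. The slope is an additional diagnostic introduced for this sketch, not a figure from the benchmark output: it is about 1 for O(n) behaviour, about 2 for O(n²), and close to 0 when fixed overhead dominates.

```r
n <- c(500, 1000, 5000, 10000)
logistic_ms <- c(174, 234, 840, 1532)  # default summata, logistic regression table

# Scaling factor: time at n = 10,000 divided by time at n = 1,000
scaling_factor <- logistic_ms[n == 10000] / logistic_ms[n == 1000]

# Slope of log(time) against log(n) across all four sizes
slope <- unname(coef(lm(log(logistic_ms) ~ log(n)))[2])

round(c(scaling_factor = scaling_factor, loglog_slope = slope), 2)
```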

Implementation Notes

The performance characteristics documented here reflect specific implementation choices:

summata: Built on data.table for data manipulation, with coefficient extraction optimized for common model classes. Default configuration uses profile likelihood CIs for GLM/negbin models (matching finalfit and broom). The conf_method = "wald" option skips profiling entirely, producing a configuration faster than any alternative tested for GLM models, univariable screening, and the complete workflow.

gtsummary: Prioritizes output flexibility through the gt table framework. The additional abstraction layers enable extensive customization but increase computational overhead.

finalfit: Balances functionality and performance with a tidyverse-compatible interface. Uses profile likelihood CIs by default for GLM models (confint_type = "profile"). The finalfit() function is particularly optimized for the combined workflow.

arsenal: Uses formula-based syntax familiar to SAS users. Performance varies by operation type.

broom: Provides minimal coefficient extraction with limited formatting. Uses profile likelihood CIs for GLM models via stats::confint() dispatch. Suitable as a building block for custom pipelines.

Effect of Output Options

By default, summata regression functions compute profile likelihood confidence intervals (for GLM and negative binomial models), sample sizes, event counts, QC statistics, and reference rows for categorical variables. These features produce more complete and accurate output for publication but add computational overhead. For performance-sensitive applications, these options can be disabled.

The summata_minimal configuration shown in the benchmarks represents:

fit(data, outcome, predictors, 
    conf_method = "wald",
    show_n = FALSE, 
    show_events = FALSE, 
    reference_rows = FALSE,
    keep_qc_stats = FALSE)

The conf_method parameter can also be set globally for an entire session:

options(summata.conf_method = "wald")
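
Session-wide defaults of this kind are generally read with `getOption()`. The sketch below shows that general R mechanism using a hypothetical helper, `resolve_conf_method()`; it is not taken from summata's source, and the fallback value `"profile"` is an assumption based on the default described in this article.

```r
# Hypothetical helper: an explicit argument wins, otherwise the session
# option is used, otherwise the assumed default "profile".
resolve_conf_method <- function(conf_method = NULL) {
  if (is.null(conf_method)) {
    conf_method <- getOption("summata.conf_method", default = "profile")
  }
  match.arg(conf_method, choices = c("profile", "wald"))
}

old <- options(summata.conf_method = "wald")
resolve_conf_method()           # uses the session-wide option
resolve_conf_method("profile")  # explicit argument overrides the option
options(old)                    # restore the previous state
```

Saving the return value of `options()` and restoring it afterwards keeps a temporary setting from leaking into later analyses in the same session.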

The impact of each option varies by model type:

Option	GLM/negbin models	Linear/Cox/mixed models
`conf_method = "wald"`	Large effect (eliminates profile likelihood profiling)	Minimal effect (Wald already used for Cox/mixed; exact t is fast for lm)
`keep_qc_stats = FALSE`	Moderate effect (skips C-statistic, Hosmer-Lemeshow)	Small effect
`show_n/show_events = FALSE`	Small effect	Small effect
`reference_rows = FALSE`	Small effect	Small effect

For logistic and Poisson models at n = 1,000, the minimal configuration is approximately 8–11× faster than the default (22 ms vs. 234 ms for logistic, 24 ms vs. 201 ms for Poisson), with the majority of the difference attributable to conf_method. For linear and Cox models, the difference is roughly 1.5–2×, reflecting formatting overhead only.

The choice between configurations depends on the use case:

Publication tables: Default settings provide profile likelihood CIs and complete output ready for manuscripts
Simulation studies: conf_method = "wald" reduces per-iteration overhead substantially for GLM models
Exploratory analysis: Either setting is appropriate; conf_method = "wald" is recommended when iterating through many model specifications

Practical Considerations

The timing differences documented here range from negligible (tens of milliseconds) to substantial (several seconds). The practical significance depends on context:

Interactive analysis: Differences under about 500 ms rarely interrupt the workflow
Batch processing: Cumulative differences matter when processing many datasets
Simulation studies: Per-iteration overhead compounds across thousands of replicates
Teaching and demonstration: Faster feedback loops improve the interactive experience
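
To make the compounding effect concrete, the following arithmetic scales the per-fit logistic regression timings at n = 1,000 to a simulation study. The replicate count of 10,000 is an assumption chosen for illustration.

```r
# Per-fit timings (ms) from the logistic regression table at n = 1,000
per_fit_ms <- c(summata_minimal = 22, summata = 234, gtsummary = 1399)

replicates <- 10000  # assumed number of simulation replicates

# Total extraction time in minutes, ignoring everything else in the loop
total_minutes <- per_fit_ms * replicates / 1000 / 60
round(total_minutes, 1)
```

Under these assumptions a difference of a few hundred milliseconds per fit becomes a difference of tens of minutes per study.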

Package selection should primarily reflect functional requirements, syntax preferences, and ecosystem compatibility. Performance considerations become relevant only when computational constraints are binding.

Reproducibility

The benchmark script is available in the package repository at inst/benchmarks/benchmarks.R. Execution produces:

Individual PNG figures for each benchmark category
Summary figures (benchmark_speedup.png, benchmark_speedup_minimal.png)
CSV files with detailed timing data

Results will vary across systems due to differences in hardware, R version, and package versions.

Session Information

This benchmark was run under the following conditions:

R version 4.5.2 (2025-10-31)
Platform: x86_64-unknown-linux-gnu

Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblasp-r0.3.30.so;  LAPACK version 3.12.0

Void Linux x86_64
Linux 6.12.63_1
Intel(R) Core(TM) i5-4670K (4) @ 3.80 GHz
NVIDIA GeForce GTX 970 [Discrete]