Example-Analysis

Morphological relationships of bullseye puffer Sphoeroides annulatus and their potential use in fishery management

Bullseye puffer Sphoeroides annulatus (Jenyns 1842).

Introduction

The Morefi package: Morphological Relationships Fitted by Robust Regression. It is a methodological package developed in R to analyze the submitted article:

Aguirre-Villaseñor, H., Morales-Bojórquez, E., Cisneros-Mata (FISH13711). Biometric relationships as a fisheries management tool: A case study of the bullseye puffer (Sphoeroides annulatus. Tetraodontidae). Fisheries Research.

The objective of this study is to provide quantitative data on some morphological relationships to predict:

the biomass of different weight categories,
the fillet yield from body measurements and
the suitability of the fillet yield to be used as a management reference point for the bullseye puffer Sphoeroides annulatus.

For this, two exercises are proposed using 250 g of fillet (benchmark) as a reference: (i) how many organisms are required to obtain the benchmark by changing the average length? (ii) taking into account that five is the maximum number of organisms of a species allowed for recreational fishing per person per day, what is the average size to reach the benchmark?

Tutorial objective

The goal of this tutorial is to replicate the analysis performed in the submitted article Aguirre et al. (FISH13711) step by step, while explaining the functionality of the Morefi package Morphological Relationships Fitted by Robust Regression, developed in R. This package streamlines the methodology and improves the clarity and impact of the results, as well as the graphical presentations, including personalized tables and figures.

Instaling Morefi

The Morefi package includes functions that facilitate data analysis and ensure reproducibility of results.For more information about the functions, please visit the Morefi GitHub page.

To install this package from GitHub do the following from the R console:

if (!require("Morefi")){
        library(devtools)
        install_github("Macrurido/Morefi")
library(Morefi)        
}

When Morefi is installed, it either updates the necessary package or installs it if it is not already present.

To make sure that all the necessary packages for this example are up to date, they are defined in the packages vector. If a package is already installed, it will be loaded; if not, it will be installed and then loaded.

packages <- c("broom", "dplyr","ggplot2", "ggtext", "glue",
              "investr", "kableExtra", "latex2exp", "Morefi",
              "patchwork", "purrr", "reshape2", "robustbase")

for (i in packages) {
  if (!require(i, character.only = TRUE)){
    install.packages(i, dependencies = TRUE)
    library(i, character.only = TRUE)
  }  # End if
} # End for

theme_papers()

The theme_papers is created to standardize the graphic format, used as base theme_bw. The theme_set() function is used to apply the theme to all charts automatically. It’s not necessary to add + theme_papers() after each ggplot snippet.

To call the theme_papers() function and automatically apply it to all figures in this document, use the following statement:

theme_papers <- Morefi::theme_papers()
theme_set(theme_papers)

Data

In Morefi, the data utilized in Aguirre-Villaseñor et al. (Submitted) is incorporated.

In the file Botete there is the biometric data of the bullseye puffer S. annulatus, with the following variables, three lengths (cm): total (LT), standard (LS) and body trunk (LB); and three weights (g): total (WT); body trunk (WB) and fillet (Wfi). Additionally, there is a categorical variable called ‘Fleet’ that distinguishes between artisanal (fresh) and trawling (frozen-thawed), represented by the values 1 and 2, respectively.

To access the data file, the data frame is stored in mydata.

#> 'data.frame':    1397 obs. of  7 variables:
#>  $ LT   : num  21 22 12 21 12 28.4 29.3 34.8 31.7 31.5 ...
#>  $ LS   : num  17.3 17.6 10 17.3 10 23 24.2 27.5 25.5 26 ...
#>  $ LB   : num  17 15 10 15 8.5 20 20.5 27 23 24 ...
#>  $ WT   : num  189.1 213.4 33.4 236.7 36 ...
#>  $ WB   : num  115.6 86.7 15.4 100.4 18.9 ...
#>  $ Wfi  : num  50.7 55.5 6.9 65 9.1 ...
#>  $ Fleet: Factor w/ 2 levels "1","2": 2 2 2 1 2 1 2 1 1 1 ...

A second dataset, Botete_land, contains records of pufferfish catches landed along the Mexican Pacific coast in 2023, measured in kilograms (kg). It includes information about the landed weight categorized as landed, as well as its conversion to live weight, referred to as biomass, for each landing presentation. The dataset outlines total weight (WT), body trunk weight (WB), and fillet weight (Wfi). Additionally, it specifies whether the catch was fresh or frozen-thawed. The dataset also includes the value of the landed catch in Mexican pesos (Pesos $) and the value per kilogram of landed weight (Pesos_kg) (SIPESCA, 2024).

To access the data file, the data frame is stored in catch.

#>          Category  Landed   Biomass    Pesos  Pesos_kg
#> 1        WT_Fresh  509354  509354.0  33079.5  41.49873
#> 2        WB_Fresh  449646  494610.6  58048.0  52.05232
#> 3 WB_Fresh_Thawed  361404  451755.0  37997.0  54.77369
#> 4       Wfi_Fresh   47990   95980.0  16485.0 112.83715
#> 5           Total 1368394 1551699.6 145609.5  50.97446

Metodology

Defining the models

Eight relationships are fitted, six to a simple linear model: LT vs LS, LT vs LB, LS vs LB, WT vs WB, WT vs Wfi, WB vs Wfi; and three to a potential model: LT vs WT, LT vs WB, LT vs Wfi.

To perform each regression, a list named modelos is created to specify the variables that will be included. These variables are then stored in a vector.

modelos <- list (LTvs.LS  =  c(1,2),
                 LTvs.LB  =  c(1,3),
                 LSvs.LB  =  c(2,3),
                 WTvs.WB  =  c(4,5),
                 WTvs.Wfi =  c(4,6),
                 WBvs.Wfi =  c(5,6),
                 LTvs.WT  =  c(1,4),
                 LTvs.WB  =  c(1,5),
                 LTvs.Wfi =  c(1,6))

Labellers

The labelers designated for use in this script are outlined below.

Model labelers

For the model labelers, the parameters include subscripts, and the customized labels for the plots are stored in the variable my_labeller.


my_labeller <- as_labeller(c(LT_LS=  "L[T]-L[S]",
                             LT_LB=  "L[T]-L[B]",
                             LS_LB=  "L[S]-L[B]",
                             WT_WB=  "W[T]-W[B]",
                             WT_Wfi= "W[T]-W[fi]",
                             WB_Wfi= "W[B]-W[fi]",
                             LT_WT=  "L[T]-W[T]",
                             LT_WB=  "L[T]-W[B]",
                             LT_Wfi= "L[T]-W[fi]"),
                           default = label_parsed)

Variable labelers for the plots.

For the variable labelers, the variables include subscripts, and the customized labels for the plots are stored in the variable my_labeller.

my_labeller2 <- as_labeller(c(LT=  "L[T]",
                             LS=  "L[S]",
                             LB=  "L[B]",
                             WT=  "W[T]x",
                             WB=  "W[B]",
                             Wfi= "W[fi]"),
                             default = label_parsed)

Variable labelers for the tables.

The ‘row_name’ holds labels for the models used in the tables, which will replace the current values in the ‘Models’ column of the data frame for the table design. Defining model labels using subscript characters for the tables.

# Defining model labels using subscript characters for the tables.
row_name <- c("$L_T$ vs. $L_S$", "$L_T$ vs. $L_B$",
              "$L_S$ vs. $L_B$", "$W_T$ vs. $W_B$",
              "$W_T$ vs. $W_fi$", "$W_B$ vs. $W_fi$",
              "$L_T$ vs. $W_T$", "$L_T$ vs. $W_B$",
              "$L_T$ vs. $W_T$")

Regression results.

Empty tables and lists to fill, throughout the cycles.

To organize and present the results of the regression analysis, several tables were created based on List_base.

To store the results for each model from every data source, nested lists were developed. This structure allows for a hierarchical or multi-level organization of the data. At the first level, the datasets are categorized into three groups: Fresh, Frozen-thawed (Frozen), and Total sample (Total). At the second level, sub-lists provide the results from various fitted models model_list.

The number of items in a list and their names depend on the fitted model of x against y.

# Create empty model list  
model_list <- setNames(lapply(names(modelos), function(x) NULL), names(modelos))

List_base <- list("Fresh"= model_list,
                  "Frozen"= model_list,
                  "Total" = model_list)

Using List_base

Summary Regression Table.

To store the summary regression, a customised table T1 is generated. The type of function model (Linear or Potential), and its variables, are indicated in the first two columns. The third column indicates the data source used (Fresh, Fresh-Thawed, or Total). By row it is stored: the Intercept, its confidence interval (CI95%); the Slope; its (CI95%); the adjusted robust coefficient of determination R2,adj,w,a and the degrees of freedom DF.

    # An empty table to save summary regression results   
 
T1 <- data.frame(matrix(NA, nrow = 3, ncol = 9))
names(T1) <- c("Function", "Model","Categories",
               "Intercept", "CI95%","Slope",
               "CI95%", "R2","DF")
T1
#>   Function Model Categories Intercept CI95% Slope CI95% R2 DF
#> 1       NA    NA         NA        NA    NA    NA    NA NA NA
#> 2       NA    NA         NA        NA    NA    NA    NA NA NA
#> 3       NA    NA         NA        NA    NA    NA    NA NA NA

To store the results of each fitted model, a list called List_Tables was created. Each item in the list contains the customized table T1, which is initially filled with NA values.


# Create model list including empty T1 table
T1_model_list <- setNames(lapply(1:length(modelos), function(x)
T1),names(modelos))

#List_Tables <- list("Fresh"= T1_model_list,
#                    "Frozen"= T1_model_list,
#                    "Total" = T1_model_list)
List_Tables <- T1_model_list

Coincident Curves Test Table.

List_TCCT store the values and calculus for the Coincident Curves Test

To store the results of each model regressions, an empty matrix of 4 rows and 8 columns are created. The table CCT stores the values and calculus for the Coincident Curves Test between weight categories: Fresh, Frozen, Total and Joined (sum of values of Fresh, Frozen): the residual sum of squares RSS, degree of freedom DF, Analysis of the Residual Sum of Squares ARSS, F-table, p-value and decision criteria for $\alpha$ = 0.05 Criteria.

To store the results of each fitted model, a list called List_TCCT was created. Each item in the list contains the customized table CCT, which is initially filled with NA values.

catego <- c("Fresh","Frozen","Total")

nrow <- length(catego)+1
            
CCT <- (matrix(NA,nrow=nrow,ncol=8))
CCT[,2] <-  c(catego,"Joined")
colnames(CCT) <- c("Model","Data","RSS","DF","ARSS","F-table",
                   "p-value","Criteria")

# Create model list including empty T1 table
List_TCCT <- setNames(lapply(1:length(modelos), function(x)
CCT),names(modelos))

CCT

Regression results

The figure list List_Figures is stored in a list for each category: 1) fresh; 2) frozen-thawed and 3) total.

For each category list, the elements are named using the vectors values of X_names: L_Tvs. L_S, L_Tvs. L_B, L_Svs. L_B, W_Tvs. W_B, W_Tvs. W_fi, W_Bvs. W_fi, L_Tvs. W_T, L_Tvs. W_B, L_Tvs. W_fi)

The List_wi store the the weighted values $w_i$ from each model.

List_dfa_long

For each landed category, the data frames in List_fit were transformed into long-form data frames and stored in List_fit_long.

          
X_list <- vector(mode='list', length=length(modelos))
         
X_names <- c("LT_LS",
             "LT_LB",
             "LS_LB",
             "WT_WB",
             "WT_Wfi",
             "WB_Wfi",
             "LT_WT",
             "LT_WB",
             "LT_Wfi")
names(X_list) <- X_names

Biometrics as a fishery management tool

Hugo Aguirre-Villaseñor

25 June 2025