Hydrodynamic and TS Structure Dataset of the São Francisco and Parnaiba Brazilian Rivers

This dataset was obtained by running a numerical simulation with the ROMS model in the region of the mouths and plumes of the São Francisco and Parnaiba rivers. It covers two scenarios, one that takes the tides into account and one that does not. It consists of 50 files in NetCDF format, 25 for each scenario. Of these 25 files, 1 holds the spatial discretization parameters of the input, 12 correspond to the numerical simulation of the Parnaiba River, and the remaining 12 to the São Francisco River. The files contain the daily averages of all input and output parameters of the simulations. The ROMSTOOLS scripting package was used to prepare the input data. The surface forcings for the ROMS model were taken from the COADS dataset; the WOA09 and SODA datasets were used as initial and lateral boundary conditions, respectively. The bathymetry was taken from ETOPO2 and the tide from the TPXO7 satellite product. The hydrodynamic and temperature-salinity (TS) structure data in this dataset allow physical oceanographers to study the thermodynamics of the waters and other physical processes occurring in the mouth and plume regions of the São Francisco and Parnaiba rivers. They also allow chemical and biological oceanographers to study the behavior of nutrients, chlorophyll and primary productivity.


DATA IMPORTANCE
• This dataset is useful to physical oceanographers and can be used to study the behavior of currents and the TS structure of the ocean in the waters surrounding Northeastern Brazil, more specifically in the region of the mouths and plumes of the Parnaiba and São Francisco rivers;
• Other ocean researchers, such as chemical oceanographers and marine biologists, can use the water mass dynamics reproduced in this dataset to study the behavior of biogeochemical properties such as pH, dissolved O2, PO4, NO3, dissolved Si, dissolved Fe, marine species, primary productivity and chlorophyll concentration, zooplankton, and phytoplankton in relation to the simulated physical processes.

MATERIALS AND METHODS
The study was carried out using the IRD-UCLA version of the Regional Ocean Modeling System (ROMS model; SHCHEPETKIN; MCWILLIAMS, 2003, 2005; PENVEN et al., 2006), which has been extensively and successfully tested in the Northeastern Brazilian region (SILVA et al., 2005; ARAUJO, 2009; SILVA et al., 2009; BOURLES, 2010; VARONA, 2018; VARONA et al., 2018, 2019; DE SANTANA et al., 2020). This circulation model is a split-explicit, free-surface model that makes the Boussinesq and hydrostatic assumptions when solving the primitive equations. The code uses a third-order upstream-biased scheme for advection, which provides lateral diffusivity/viscosity (SHCHEPETKIN; MCWILLIAMS, 2005).
In the vertical, the model is discretized on a sigma, or topography-following, stretched coordinate system. The grid is isotropic and does not introduce any asymmetry in the horizontal dissipation of turbulence, allowing a fair representation of mesoscale dynamics (PENVEN et al., 2006). The bottom topography is derived from a 2 min resolution database (SMITH; SANDWELL, 1997). Although the latest version of the model includes a pressure gradient scheme associated with a specific equation of state to limit errors in the computation of the pressure gradient (SHCHEPETKIN; MCWILLIAMS, 2003), the bathymetry has been filtered in order to keep the "slope parameter" r ≤ 0.25. The model has 50 vertical levels, and the vertical s-coordinate is stretched for boundary layer resolution. In the configurations analyzed in this manuscript, we chose to distribute the sigma levels more homogeneously along the water column by activating the optimizing function of the vertical coordinate system, which ensures an increased resolution in the subsurface and a fair smoothing of the tracer fields. All the external forcing functions of the model are derived from climatologies of oceanic and/or atmospheric parameters. At the surface, the model heat and fresh water fluxes are extracted from the COADS climatology (DA SILVA; YOUNG-MOLLING; LEVITUS, 1994).
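The bathymetry filtering criterion can be illustrated with a minimal sketch. It assumes the common definition of the slope parameter between two adjacent cells of depths h1 and h2, r = |h1 - h2| / (h1 + h2); the text does not state which definition was used, and the function names and the toy depth grid below are hypothetical.

```python
def slope_parameter(h_a: float, h_b: float) -> float:
    """Slope parameter r between two adjacent depths (positive, in metres).

    Assumed definition: r = |h_a - h_b| / (h_a + h_b).
    """
    return abs(h_a - h_b) / (h_a + h_b)


def max_slope_parameter(depths: list[list[float]]) -> float:
    """Largest r over all horizontally adjacent cell pairs of a 2-D depth grid."""
    n_rows, n_cols = len(depths), len(depths[0])
    r_max = 0.0
    for j in range(n_rows):
        for i in range(n_cols):
            if i + 1 < n_cols:  # neighbour to the east
                r_max = max(r_max, slope_parameter(depths[j][i], depths[j][i + 1]))
            if j + 1 < n_rows:  # neighbour to the north
                r_max = max(r_max, slope_parameter(depths[j][i], depths[j + 1][i]))
    return r_max


# Hypothetical 2 x 3 depth grid in metres, not taken from the dataset.
toy_depths = [[100.0, 150.0, 200.0],
              [120.0, 180.0, 300.0]]
r_max = max_slope_parameter(toy_depths)
needs_smoothing = r_max > 0.25  # threshold quoted in the text
```

A grid whose largest r exceeds 0.25 would need further smoothing before use; the smoothing filter itself is not sketched here.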
The model is initialized from an ocean at rest, with temperature and salinity from the World Ocean Atlas (WOA) for the month of January. For the wind stress, a monthly mean climatology is computed from QuikSCAT scatterometer data. The deliberate choice of a climatological wind forcing for a simulation that is forced for multiple years is in accordance with the focus on equilibrium dynamics. Moreover, it allows for an investigation of intrinsic, or unforced, system variability. At the four lateral boundaries, an active, implicit, upstream-biased radiation condition connects the model solution to the surroundings (MARCHESIELLO; MCWILLIAMS; SHCHEPETKIN, 2001). The vertical mixing scheme is based on the KPP parametrization of Large, McWilliams and Doney (1994), while the horizontal mixing is parametrized as a linear combination of Laplacian and biharmonic mixing scaled with the grid size.
The model was integrated for 5 years, and the output was averaged and stored daily. Two horizontal resolutions were set: a medium-resolution simulation at Δx = Δy = 1/12°, where Δx and Δy are the distances between grid nodes in longitude and latitude respectively, which is fully mesoscale resolving; and a 1/36° grid nested within it, which partly permits submesoscale dynamics and completely resolves mesoscale features. The latter, highest-resolution solution consists of a two-way nested configuration with two child grids embedded within a parent domain (CAPUANO et al. 2022b) and geographically centered over the deltaic regions of the São Francisco and Parnaiba estuaries (Fig. 1; Table 1). Results presented in this paper come from the analysis of the last 4 years of each simulation; the first year was discarded as the spin-up period required to reach statistical equilibrium and for the eddy kinetic energy (EKE = (σ_u² + σ_v²)/2, where u and v are the zonal and meridional velocity components, and σ designates the standard deviation of their instantaneous values) to reach a plateau. Figure 1 shows the geographical extensions of the three domains making up our nested simulation. In order to maximize computing efficiency, the simulation employs the two-way AGRIF embedding capability of the ROMS model (DEBREU; MAZAURIC, 2006), which is designed such that the output from the lower-resolution 'parent' domain (CAPUANO et al. 2022b) provides boundary conditions for the higher-resolution 'child' domains nested within it, and the 'child' domains in turn feed back to the parent domain (Table 1; CAPUANO et al. 2022b). This technique allows for more consistent boundary conditions than in-situ products based on often temporally and spatially scarce measurements, and is far less costly than running the parent domain at the resolution of either one of the two child domains.
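The spin-up diagnostic can be sketched as follows, assuming that the eddy kinetic energy at a grid point is half the sum of the variances of the zonal and meridional velocities (the squared standard deviations of their instantaneous values). The use of the population variance, the function name and the sample velocities are assumptions made for illustration only.

```python
from statistics import pvariance


def eddy_kinetic_energy(u: list[float], v: list[float]) -> float:
    """EKE = (var(u) + var(v)) / 2 for velocity time series at one grid point.

    u, v: zonal and meridional velocities in m/s; result in m^2/s^2.
    The population variance is an assumption, not stated in the text.
    """
    return 0.5 * (pvariance(u) + pvariance(v))


# Hypothetical daily velocities (m/s), not taken from the dataset.
u_series = [0.10, 0.30, 0.10, 0.30]
v_series = [-0.20, 0.00, 0.00, -0.20]
eke = eddy_kinetic_energy(u_series, v_series)
```

Evaluating this quantity over successive model years and checking that it stops growing is one way to judge when the spin-up period has ended.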
As previously mentioned, all the simulations presented here are based on the same depth dataset, parameters, and forcing, in order to render them as comparable as possible. The ROMS model input files were prepared with the MATLAB script set ROMSTOOLS (PENVEN et al., 2003, 2006), which allows the creation of river discharge grids, surface forcings, and initial and boundary conditions.

DATA DESCRIPTION
This dataset is the result of the research carried out by Capuano et al. (2022a). It is the numerical output of a simulation of the hydro-thermodynamics in the region of the mouths and plumes of the São Francisco and Parnaiba rivers. All the files in this dataset are in the self-describing NetCDF format, to which new metadata was added using the freeware tool packages NCO (NetCDF Operators; ZENDER, 2008) and CDO (Climate Data Operators; SCHULZWEIDA, 2006). All output files in this dataset can be visualized with the tools in the Roms_tools/Visualization_tools/ directory, which is a component of the MATLAB ROMSTOOLS package. Among the scripts in this directory there is a Graphical User Interface (GUI) called ROMS_GUI (PENVEN et al., 2003, 2006).

Dataset
This dataset is composed of two sets of NetCDF files that correspond to the numerical outputs of the ROMS model for the child grids of the mouths and plumes of the Parnaiba and São Francisco rivers, for a total of 50 NetCDF files. Every set contains 25 files: one file for the bathymetry grid, 12 files with the simulation grids where tides are taken into account, and 12 where they are not. Each of these sets of 12 files corresponds to the numerical output of the daily averages of all parameters in each month of a year. For this climatology of daily averages, all months were considered to have 30 days. All these files are organized in a three-level directory structure (Fig. 2). In the first level are the files roms_grd.nc.<idChild> (files with spatial discretization data for all grids in the numerical simulation with the ROMS model), with idChild = 1 for the Parnaiba River and idChild = 2 for the São Francisco River. In the second level, there are two directories named "Tide" and "noTide", corresponding to the simulations where the tide is taken into account and those where it is not.
Within each of these directories there are two more, named "Parnaiba_river" and "SaoFrancisco_river", which correspond to the simulations in child grids 1 and 2 respectively. Each of these four directories contains 12 files that follow the naming pattern roms_avg_Y10M<idMonth>.nc.<idChild>, where idMonth is the month of the year (1 = January, 2 = February, ..., 12 = December).
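The directory layout described above can be enumerated with a short sketch. The root directory name "dataset" is hypothetical, and the month index is assumed to be written without zero padding, as the pattern roms_avg_Y10M<idMonth>.nc.<idChild> suggests.

```python
from pathlib import PurePosixPath

# Child grid id -> directory name, as given in the text.
CHILD_DIRS = {1: "Parnaiba_river", 2: "SaoFrancisco_river"}
SCENARIOS = ("Tide", "noTide")


def dataset_paths(root: str = "dataset") -> list[PurePosixPath]:
    """List the expected relative paths of all files in the dataset.

    `root` is a hypothetical name for the top-level directory.
    """
    base = PurePosixPath(root)
    # First level: one grid file per child grid.
    paths = [base / f"roms_grd.nc.{child}" for child in CHILD_DIRS]
    # Second and third levels: scenario, then river, then 12 monthly files.
    for scenario in SCENARIOS:
        for child, river_dir in CHILD_DIRS.items():
            for month in range(1, 13):
                name = f"roms_avg_Y10M{month}.nc.{child}"
                paths.append(base / scenario / river_dir / name)
    return paths


paths = dataset_paths()
```

The list holds 2 grid files plus 2 scenarios × 2 rivers × 12 months = 48 output files, which matches the stated total of 50. If each monthly file holds 30 daily records, as the 30-day convention implies, every simulation of one river contributes 360 daily averages.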
Varona and Araujo (2022) give a detailed description of the ROMS model output files. Table 2 shows the coordinate system that defines the geographical location of all parameters. The ocean current velocity field is located on two coordinate systems, one for the zonal component (lon_u, lat_u) and another for the meridional component (lon_v, lat_v), and both differ from the geographic reference system of the rest of the parameters (lon_rho, lat_rho). There are three mask parameters: mask_u and mask_v are the masks for the zonal and meridional components of the currents, and mask_rho is the mask for the remaining parameters. The mask_rho parameter is present in all NetCDF files. Table 3 describes all input and output parameters of the numerical simulations performed with the ROMS model. The ocean surface parameters and those defined at a single depth level are three-dimensional grids, because they depend only on geographic position and time; the rest of the parameters are four-dimensional grids, since they additionally depend on depth.
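The choice of coordinates and mask for a given parameter can be sketched as a small helper. The coordinate and mask names come from the text; the dimension names used to tell the grids apart (xi_u, eta_v, and vertical dimensions starting with "s_") are assumptions based on usual ROMS output conventions and should be checked against the file headers.

```python
def grid_names(dims: tuple[str, ...]) -> dict[str, str]:
    """Return the longitude, latitude and mask names matching a variable.

    `dims` is the tuple of dimension names of the variable; the names
    "xi_u" and "eta_v" are assumed markers of the u and v grids.
    """
    if "xi_u" in dims:
        point = "u"      # zonal velocity grid
    elif "eta_v" in dims:
        point = "v"      # meridional velocity grid
    else:
        point = "rho"    # all remaining parameters
    return {"lon": f"lon_{point}", "lat": f"lat_{point}", "mask": f"mask_{point}"}


def depends_on_depth(dims: tuple[str, ...]) -> bool:
    """True for four-dimensional parameters that carry a vertical axis."""
    return any(name.startswith("s_") for name in dims)


# Hypothetical dimension tuples for a zonal velocity and a surface field.
u_dims = ("time", "s_rho", "eta_rho", "xi_u")
surface_dims = ("time", "eta_rho", "xi_rho")
u_grid = grid_names(u_dims)
surface_grid = grid_names(surface_dims)
```

With a NetCDF reader, the dimension tuple of each variable could be passed to these helpers to pick the matching coordinates before plotting or masking land points.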