NUMBER OF LEAVES NEEDED TO MODEL LEAF AREA IN JACK BEAN PLANTS USING LEAF DIMENSIONS

Leaf area estimation models based on linear leaf dimensions are an important method because their application is not destructive to the leaves. For these models to be reliable, the model parameters must be estimated accurately, which requires that the models be generated from an adequate sample size (number of leaves). The objective of this study was to determine the number of leaves necessary to accurately model the leaf area of jack beans (Y), determined by digital photos, as a function of the width of the central leaflet (x), using a power model (Y = ax^b) generated through an iterative process. Accordingly, an experiment was performed in a 256 m² area. A total of 745 leaves were randomly collected at six different crop development stages (29, 43, 57, 73, 87 and 101 days after emergence). Each leaf consisted of a left, central and right leaflet. The width of the central leaflet (x) was measured on the 745 leaves. Leaf area (sum of the area of the left, central and right leaflets; Y) was then determined using a digital photo method. The number of leaves necessary for the estimation of the parameters a and b and the coefficient of determination (R²) of the power model was determined through resampling with replacement. The power model (Ŷ = 4.2049x^1.8215, R² = 0.9701), based on the width of the central leaflet, was determined to be adequate for estimating jack bean leaf area. Data collected from a sample of 200 leaves were determined to be sufficient for constructing an accurate power model for the leaf area of jack beans (Y) as a function of the width of the central leaflet (x), based on determinations of leaf area using digital photos.


INTRODUCTION
Jack bean (Canavalia ensiformis) crops have a high nitrogen fixation capacity and weed control efficiency (FERNANDES et al., 1999), which reduces soil and nitrogen loss (PANSAK et al., 2008), and also have heavy-metal phytoremediation potential for elements such as lead (PEREIRA et al., 2010) and copper (ZANCHETA et al., 2011). The seeds of the jack bean are nutritious and can be used in human and animal food (BENÍTEZ et al., 2013; MARIMUTHU; GURUMOORTHI, 2013; SASIPRIYA; SIDDHURAJU, 2013) as long as they are correctly processed. Extracts from raw and processed jack bean seeds also exhibit antioxidant properties (SOWNDHARARAJAN et al., 2011), and the urease produced by the plant may be an important alternative remedy for the control of fungal diseases (POSTAL et al., 2012) and insects (DEFFERRARI et al., 2011).
Leaf area is important for determining plant growth and has a direct relationship with light interception, photosynthetic efficiency, evapotranspiration rates and plant response to fertilizers and irrigation (BLANCO; FOLEGATTI, 2005). Therefore, leaf area should be measured accurately. Leaf area can be measured directly or indirectly. In general, the direct measurement of leaf area is destructive and requires special equipment, such as leaf area integrators, scanners or digital cameras (BLANCO; FOLEGATTI, 2005), which are expensive and overly complicated for use in simpler and more basic studies (DEMIRSOY et al., 2005). Alternatively, leaf area can be estimated indirectly using leaf area estimation models based on linear leaf dimensions such as length, width or length × width.
Models generated based on the linear leaf dimensions are generally accurate, as has been shown for grapevines (WILLIAMS; MARTINSON, 2003), strawberry (DEMIRSOY et al., 2005), coffee (ANTUNES et al., 2008), Barbados nut (POMPELLI et al., 2012), forage turnip (CARGNELUTTI FILHO et al., 2012a), velvet bean (CARGNELUTTI FILHO et al., 2012b) and jack bean (TOEBE et al., 2012). These models allow for the quantification of leaf area in a non-destructive way with a lower financial investment. Additionally, leaf area can be estimated for different growth and crop development periods.
Of the different types of leaf area estimation models that have been studied for use with agricultural crops, power models (Y = ax^b), in which leaf area (Y) is a function of a linear leaf dimension (x) (such as length, width or length × width), have been determined to be accurate and adequate for several crops. Antunes et al. (2008), Cargnelutti Filho et al. (2012a) and Pompelli et al. (2012) tested several linear and non-linear models and found that power models based on the product of leaf length × width were the most precise models for estimating leaf area in coffee, forage turnip and the Barbados nut, respectively. Williams and Martinson (2003) tested nine different leaf area estimation models for two grapevine cultivars and found that a power model based on leaf width was the best-fitting model. Power models based on the width of the central leaflet were found to be adequate for the estimation of leaf area in velvet bean (CARGNELUTTI FILHO et al., 2012b) and jack bean (TOEBE et al., 2012) crops. Antunes et al. (2008) and Pompelli et al. (2012) found that the best models were based on the product of leaf length × width. These authors suggested that, although models based on a single dimension (length or width) exhibit high accuracy, they may produce biased estimates, especially in the case of small or large leaves, and the residuals may not follow a normal distribution. However, Williams and Martinson (2003), Cargnelutti Filho et al. (2012b) and Toebe et al. (2012) all recommend using models based on a single linear leaf dimension (width) due to the ease of measurement and the high level of accuracy. According to Williams and Martinson (2003) and Zhang and Pan (2011), using a single linear leaf dimension also prevents potential problems of collinearity between the independent variables in the model.
For leaf area estimation models to have high accuracy and reliability, and to avoid bias in the models, it is necessary to use a representative sample (a certain number of leaves) from the plant population used to generate the model. Zhang and Pan (2011) used samples of between 202 and 476 leaves to generate leaf area estimation models for a tree species. Pompelli et al. (2012) used 1,200 leaves from Barbados nut plants to generate leaf area models and used two independent samples of 300 leaves for the validation of the generated models. According to these authors, previous models for the Barbados nut, which were generated using samples of 89 and 250 leaves, may not have estimated leaf area correctly due to an insufficient number of leaves. The authors aimed to determine the sample size (number of leaves) necessary for generating an accurate model, and they found 415 leaves to be an adequate sample size for the Barbados nut. Antunes et al. (2008) used 1,563 leaves to generate a leaf area estimation model for coffee and 388 leaves for validating the model. These authors also concluded that approximately 200 leaves were an adequate sample size for generating power models for this species. For the velvet bean (CARGNELUTTI FILHO et al., 2012b) and jack bean (TOEBE et al., 2012), samples of 650 and 605 leaves, respectively, were used to generate leaf area models, and samples of 140 leaves were used for validation of the models.
Leaf area estimation models for the jack bean have been previously generated and validated. A power model (Ŷ = 3.7046x^1.8747, R² = 0.9757) based on the width of the central leaflet (x) was found to adequately estimate leaf area (Y), determined through the use of digital photos (TOEBE et al., 2012). However, these authors did not determine the adequate sample size (number of leaves) needed for the generation of the model. In addition, the parameters a and b and the coefficient of determination (R²) of the Y = ax^b function were determined following a logarithmic transformation for linearization of the Y = ax^b function (STEEL et al., 1997). A similar procedure has been used by other authors (WILLIAMS; MARTINSON, 2003; ANTUNES et al., 2008). The use of an iterative process for nonlinear models may have resulted in a better fit of the models generated in those studies, minimizing the error sum of squares, especially when the number of leaves was small. Thus, the objective of our study was to determine the sample size (number of leaves) needed for the construction of a power model (Y = ax^b) generated by an iterative process that would estimate leaf area (Y) in jack beans as a function of the width of the central leaflet (x), using digital photos to measure total leaf area.

Field experiment
In this study, a field experiment was performed in a 256 m² area planted with jack bean (Canavalia ensiformis). The spacing between rows was 0.5 m, with 0.125 m between plants in the row, for a total of 16 plants m⁻². Sowing was performed on November 12, 2010 with a base fertilizer application of 40 kg ha⁻¹ N, 150 kg ha⁻¹ P₂O₅ and 100 kg ha⁻¹ K₂O. The day of emergence was November 22, 2010, at which time 50% of the seeds had emerged. Ninety and 95 leaves were collected 29 and 43 days after emergence (DAE), respectively. At each of the remaining sampling times (57, 73, 87 and 101 DAE), 140 leaves were randomly collected in the experimental area. A total of 745 leaves, each consisting of three leaflets (left, central and right), were collected.

Variable measurement and model generation
The width of the central leaflet (x) of each of the 745 leaves was measured with a millimeter ruler (Figure 1). All leaves (each consisting of three leaflets) were photographed with a Sony DSC-W110 digital camera. The resulting images were processed using Sigma Scan Pro v. 5.0 (Jandel Scientific, 1991) software to determine leaf area (sum of the leaf area of the left, central and right leaflets; Y) according to the digital photo method. Measures of central tendency and variability and kurtosis and skewness coefficients were calculated for each measured variable (x and Y) for each sampling period (29, 43, 57, 73, 87 and 101 DAE) and for the data from all 745 leaves. Additionally, the normality of the data was tested using the Kolmogorov-Smirnov test, and frequency histograms and scatterplots were generated. Leaf area for the 745 leaves (Y), measured using the digital photos, was used to construct a power model (Y = ax^b) for leaf area as a function of the width of the central leaflet (x). The model was generated using an iterative process until convergence was achieved.
The data set generated from this sample of 745 leaves was previously used in a preliminary study by Toebe et al. (2012). In that study, the authors used data from 605 leaves to generate linear, quadratic and power models (by linearization of the model variables) of leaf area of jack beans as a function of the width, length and length × width of the central leaflet (TOEBE et al., 2012). Data from 140 leaves were used in model validation. Using digital photos to determine leaf area, the authors concluded that a power model (Y = ax^b) as a function of the width of the central leaflet (x) was the most adequate model for estimating leaf area (Y). In this complementary study, our objective was to determine the number of leaves needed to model Y as a function of x using a power model (Y = ax^b). We used data from all the leaves in the sample (n = 745) to generate our models so that the models were based on the most representative sample possible. Furthermore, the model had been validated in the previous study and needed no further validation.
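As an illustration only, the two ways of fitting the power model mentioned in this section (linearization by logarithmic transformation and an iterative fit that minimizes the error sum of squares on the original scale) can be sketched in Python. The text does not state which iterative algorithm was used, so a Gauss-Newton scheme with step halving is assumed here. The data are synthetic (not the 745 measured leaves), and the function names and the generating values 4.2 and 1.82 are hypothetical.

```python
import math
import random

def fit_loglinear(x, y):
    """Closed-form least squares on ln(Y) = ln(a) + b*ln(x); returns (a, b)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx
    return math.exp(my - b * mx), b

def sse(x, y, a, b):
    """Error sum of squares of Y = a*x^b on the original scale."""
    return sum((yi - a * xi ** b) ** 2 for xi, yi in zip(x, y))

def r_squared(x, y, a, b):
    """Coefficient of determination on the original scale."""
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse(x, y, a, b) / sst

def fit_iterative(x, y, max_iter=200, tol=1e-10):
    """Gauss-Newton with step halving (an assumed algorithm), started
    from the log-linear estimates and stopped at convergence."""
    a, b = fit_loglinear(x, y)
    current = sse(x, y, a, b)
    for _ in range(max_iter):
        f = [xi ** b for xi in x]                            # d(model)/da
        g = [a * fi * math.log(xi) for fi, xi in zip(f, x)]  # d(model)/db
        r = [yi - a * fi for yi, fi in zip(y, f)]            # residuals
        sff = sum(u * u for u in f)
        sfg = sum(u * v for u, v in zip(f, g))
        sgg = sum(v * v for v in g)
        sfr = sum(u * w for u, w in zip(f, r))
        sgr = sum(v * w for v, w in zip(g, r))
        det = sff * sgg - sfg * sfg
        da = (sgg * sfr - sfg * sgr) / det                   # solve the 2x2
        db = (sff * sgr - sfg * sfr) / det                   # normal equations
        step = 1.0
        while step > 1e-8:                                   # halve the step until
            trial = sse(x, y, a + step * da, b + step * db)  # the SSE does not rise
            if trial <= current:
                break
            step /= 2
        else:
            break                                            # no improving step left
        a, b = a + step * da, b + step * db
        converged = current - trial <= tol * current
        current = trial
        if converged:
            break
    return a, b

# Synthetic leaves: widths and areas are made up for the example.
rng = random.Random(1)
x = [rng.uniform(3.0, 20.0) for _ in range(200)]
y = [4.2 * xi ** 1.82 * (1 + rng.gauss(0, 0.05)) for xi in x]

a_log, b_log = fit_loglinear(x, y)
a_it, b_it = fit_iterative(x, y)
print("log-linear:", a_log, b_log, sse(x, y, a_log, b_log))
print("iterative: ", a_it, b_it, sse(x, y, a_it, b_it), r_squared(x, y, a_it, b_it))
```

Because the iterative fit starts from the log-linear estimates and only accepts steps that do not increase the error sum of squares, its error sum of squares on the original scale cannot be larger than that of the linearized fit.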

Determination of sample size (number of leaves)
In this study, the sample size (number of leaves) required to model the leaf area of jack bean (Y) (determined using digital photos) as a function of the width of the central leaflet (x) using a power model (Y = ax^b) was determined through resampling with replacement. Estimation of the model parameters was performed using an iterative process. For resampling, we used 791 simulated sample sizes, starting with an initial sample size of 10 leaves (which was considered the minimum sample size for construction of the model). The remaining sample sizes were tested in increments of one, up to a sample size of 800 leaves. Therefore, sample sizes of 10, 11, 12, ..., 800 leaves were tested.
For each simulated sample size, 3,000 resamplings with replacement were performed. Estimates of the parameters a and b and the coefficient of determination (R²) of the power model (Y = ax^b) were obtained for each resampling. To minimize the error sum of squares, the estimates of a, b and R² were obtained by iteration until convergence was achieved. The values established for the convergence criterion were: iterations = 200; step size = 1; tolerance = 0.0000000001. Thus, for each sample size, 3,000 estimates of a, b and R² were obtained, and the 2.5% percentile, mean and 97.5% percentile of these estimates were determined. The 95% confidence interval (ACI) was calculated as the difference between the 97.5% percentile and the 2.5% percentile. For a, b and R², the ACI for the smallest sample size (10 leaves, ACI_10) was considered to be the reference condition of 100%, i.e., the maximum ACI value, representing minimal accuracy of the estimates of a, b and R². The accuracy gain (AG_i, %) obtained by adding i leaves to the reference sample (i = 1, 2, ..., 790, corresponding to sample sizes of 11, 12, ..., 800 leaves) was calculated as AG_i = 100 - (ACI_i/ACI_10) × 100, where ACI_i is the size of the 95% confidence interval for the sample sizes 11, 12, ..., 800 leaves. In this study, a minimum accuracy gain (AG_i) of 79.41% for the estimates of a, b and R² was adopted as the criterion for defining the sample size.
The 2.5% percentile, the mean, and the 97.5% percentile of a, b and R² were plotted by sample size in 10-leaf intervals for better visual representation. Because the results for each sample size were too extensive to present in a table, the accuracy gain for each 10-leaf interval was shown. Statistical analyses were performed in R (R Development Core Team, 2012) and Microsoft Office Excel® software.
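The resampling procedure of this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code of the study: the leaves are synthetic, only three sample sizes and 300 resamplings are used to keep the run short, and the model is fitted with a closed-form log-linear regression as a stand-in for the iterative fit used in the study. All function and variable names are hypothetical.

```python
import math
import random

def fit_power(x, y):
    """Stand-in fit: least squares on ln(Y) = ln(a) + b*ln(x).
    Returns (a, b, R2), with R2 computed on the original scale."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    a = math.exp(my - b * mx)
    mean_y = sum(y) / len(y)
    sse = sum((yi - a * xi ** b) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1 - sse / sst

def percentile(values, p):
    """Percentile p (0 to 100) with linear interpolation between order statistics."""
    s = sorted(values)
    k = (len(s) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

def interval_widths(x, y, n, resamples, rng):
    """ACI of a, b and R2 for samples of n leaves: the 97.5% percentile
    minus the 2.5% percentile of the resampled estimates."""
    estimates = ([], [], [])
    for _ in range(resamples):
        picks = rng.choices(range(len(x)), k=n)   # resampling with replacement
        fit = fit_power([x[i] for i in picks], [y[i] for i in picks])
        for store, value in zip(estimates, fit):
            store.append(value)
    return [percentile(e, 97.5) - percentile(e, 2.5) for e in estimates]

# Synthetic data set of 745 leaves (made-up values, not the measured leaves).
rng = random.Random(2)
x = [rng.uniform(3.0, 20.0) for _ in range(745)]
y = [4.2 * xi ** 1.82 * (1 + rng.gauss(0, 0.05)) for xi in x]

sizes = (10, 50, 200)   # the study tested 10, 11, ..., 800 leaves
aci = {n: interval_widths(x, y, n, 300, rng) for n in sizes}

# Accuracy gain relative to the 10-leaf reference: AG = 100 - (ACI_n / ACI_10) * 100
gain = {n: [100 - w / w10 * 100 for w, w10 in zip(aci[n], aci[10])] for n in sizes}
for n in sizes:
    print(n, [round(g, 1) for g in gain[n]])   # gains for a, b and R2
```

Replacing `fit_power` with an iterative least-squares fit and looping over all sample sizes with 3,000 resamplings each would bring the sketch closer to the procedure described above, at a much higher computational cost.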
For leaf area (Y), the kurtosis was not different from three and the skewness was not different from zero in three sampling periods (50%) and five sampling periods (83.33%), respectively (Table 1). The data from each of the six sampling periods had a normal distribution (p > 0.05). Across the entire sample of 745 leaves, the skewness was not different from zero, and the data were well fitted to a normal distribution curve. Thus, even with a high number of observations (n = 745 leaves), the data for leaf area had a normal distribution. Consistent with our previous inferences, width of the central leaflet (x) and leaf area (Y) for the leaves collected during the six sampling periods (29, 43, 57, 73, 87 and 101 DAE) were, in general, well fitted to a normal distribution (Table 1). This result can also be observed in the frequency histograms for these two variables (Figures 2 and 3). Therefore, our data set appeared to be sufficient for generating models of leaf area as a function of the width of the central leaflet and for the study of sample size. The scatterplot of width of the central leaflet (x) against leaf area (Y) for the entire data set from the 745 jack bean leaves showed a non-linear association between the two variables (Figure 4). This result indicates that the power model was adequate, in accordance with the previous report by Toebe et al. (2012). In this study, the power model generated using an iterative process was Ŷ = 4.2049x^1.8215, and the high coefficient of determination (R² = 0.9701) indicated that the accuracy of the model was high.

Sample size (number of leaves)
Using the average of the 3,000 estimates for parameters a and b and the coefficient of determination (R²) of the power model (Y = ax^b) for the leaf area of jack bean (Y), generated by the resampling of 800 leaves with replacement, the following model was obtained: Ŷ = 4.2026x^1.8220 with R² = 0.9702. The estimates obtained through resampling were similar to the estimates produced by the model generated from the data set derived from the sample of 745 leaves (Ŷ = 4.2049x^1.8215, R² = 0.9701). For the 3,000 resamples of 10 leaves (the smallest sample size used in this study), the 95% confidence interval (ACI) for a, b and R² was 6.0757, 0.5786 and 0.0819, respectively, and the mean for a, b and R² was 4.3915, 1.8296 and 0.9697, respectively (Figures 5, 6 and 7 and Table 2). At the opposite end of the range of sample sizes, for the 3,000 resamplings of 800 leaves (the largest sample size used), the 95% confidence interval (ACI) for a, b and R² was 0.6219, 0.0604 and 0.0076, respectively. The mean for a, b and R² was 4.2026, 1.8220 and 0.9702, respectively (Figures 5, 6 and 7 and Table 2).

Table 2. Values for the 95% confidence interval (ACI) and accuracy gain (AG_i, %) of the estimates of parameters a and b in the power model (Y = ax^b) of leaf area for the jack bean (Canavalia ensiformis). Leaf area was determined using digital photos, and the model represents leaf area as a function of the width of the central leaflet (x). The coefficient of determination (R²) for the sample sizes 10, 11, 12, ..., 800 leaves is also shown. AG_i = 100 - (ACI_i/ACI_10) × 100, where ACI_i is the size of the 95% confidence interval for the sample sizes 11, 12, ..., 800 leaves, and ACI_10 is the size of the 95% confidence interval for the reference sample size, i.e., 10 leaves.
Although the mean estimates for a, b and R² were similar for the 10-leaf and 800-leaf samples, the larger confidence intervals obtained for the 10-leaf sample compared to the 800-leaf sample indicate that the estimation of the model parameters with the 10-leaf sample had a lower accuracy. This result indicates that an insufficient sample size may result in biased estimates of leaf area and that models generated using data from a small number of leaves should not be used to determine leaf area. Generating an accurate model requires first defining an adequate sample size.
The size of the 95% confidence intervals (ACI) for the estimates of a, b and R² gradually decreased with an increasing number of leaves in the sample (Figures 5, 6 and 7 and Table 2). This result was expected, and it indicates that estimation accuracy, and consequently the chance of obtaining more reliable models, improves with an increasing number of leaves in the sample. A visual evaluation of Figures 5, 6 and 7 shows that there was a pronounced decrease in the ACI up to a sample size of approximately 200 leaves. Thereafter, the ACI continued to decrease, but the decreases were smaller, indicating that the additional workload required for measuring more than 200 leaves results in negligible improvement to the accuracy of the estimation of the model parameters. A visual evaluation therefore suggests that a sample of 200 leaves is sufficient for estimating a, b and R² of the power model.
An increase in sample size from 10 to 20 leaves resulted in accuracy gains of 35.49% [100 - (3.9194/6.0757) × 100], 34.98% [100 - (0.3762/0.5786) × 100] and 36.70% [100 - (0.0518/0.0819) × 100] for the estimates of a, b and R², respectively. An increase from 10 to 30 leaves resulted in accuracy gains of a = 46.57%, b = 44.76% and R² = 49.50% (Table 2). The accuracy gains with an increasing number of leaves in the sample were similar for a, b and R². In addition, the gains were more pronounced with an increase from 10 to 20 leaves than with an increase from 20 to 30 leaves, and so on. Accuracy gains close to 80% (a = 79.85%, b = 79.41% and R² = 81.08%) were obtained from increasing the sample size from 10 to 200 leaves. With an increase from 10 to 800 leaves, the accuracy gains were approximately 90% (a = 89.76%, b = 89.56% and R² = 90.77%). Therefore, an increase in accuracy of only approximately 10% was gained when the sample size increased from 200 to 800 leaves (an increase of 600 leaves). Although researchers should aim to use the largest number of leaves possible to maximize the reliability of the models, 200 leaves appears to be a reasonable sample size for estimating the parameters of a power model for jack bean leaves.
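The accuracy gains quoted above follow directly from AG_i = 100 - (ACI_i/ACI_10) × 100. A short check with the interval sizes reported in the text (four decimal places) is given below; the function and dictionary names are hypothetical.

```python
def accuracy_gain(aci_i, aci_10):
    """AG_i = 100 - (ACI_i / ACI_10) * 100, in percent."""
    return 100 - (aci_i / aci_10) * 100

# 95% confidence interval sizes (ACI) as reported in the text
aci_10 = {"a": 6.0757, "b": 0.5786, "R2": 0.0819}    # reference, 10 leaves
aci_20 = {"a": 3.9194, "b": 0.3762, "R2": 0.0518}    # 20 leaves
aci_800 = {"a": 0.6219, "b": 0.0604, "R2": 0.0076}   # 800 leaves

gain_20 = {k: accuracy_gain(aci_20[k], aci_10[k]) for k in aci_10}
gain_800 = {k: accuracy_gain(aci_800[k], aci_10[k]) for k in aci_10}

for k in aci_10:
    print(f"{k}: 10 -> 20 leaves = {gain_20[k]:.2f}%, 10 -> 800 leaves = {gain_800[k]:.2f}%")
```

With these rounded inputs the gains for a and b match the reported values to two decimal places. The gains for R² come out a few hundredths of a percentage point away from the reported 36.70% and 90.77%, presumably because the reported values were computed from unrounded interval sizes.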

Characterization of the data set and the leaf area model
Although our experiment was performed in a single environment, we suggest that our data set was representative of leaf size in jack bean crops because we used a large number of leaves (n = 745), we took measurements at six different crop development stages and our leaf sample showed large variation in leaf width and area. Therefore, we suggest that our data set was reliable and adequate for generating models and determining an adequate sample size (number of leaves). To construct models and analyze sample size, Pompelli et al. (2012) used 1,200 Barbados nut leaves from different sites and different seasons, and Antunes et al. (2008) used 1,563 coffee leaves from eight different genotypes. Demirsoy et al. (2005) used two strawberry cultivars to generate leaf area models and observed that the models could be used to accurately estimate leaf area for seven different tested cultivars. Blanco and Folegatti (2005) generated leaf area estimation models for cucumber based on linear dimensions of the leaves and observed that models generated from data corresponding to different levels of salinity and/or grafting conditions could be used for any conditions (of grafting or salinity). This indicated that their data were representative of leaf size for cucumber plants.
In this study, the data used to generate the model and perform the analysis of sample size generally showed a good fit to a normal distribution curve, with the exception of a few cases in which there were slight deviations from normality. These deviations from the normal distribution curve may have been associated with the high number of observations in our data set (n = 745 leaves), as previously described by Toebe et al. (2012). Therefore, we suggest that the data used in this study were reliable for generating leaf area estimation models and analyzing sample size.
Although we used the same data set as Toebe et al. (2012), the small differences in the a, b and R² estimates between our model (Ŷ = 4.2049x^1.8215, R² = 0.9701 with n = 745 leaves) and the model obtained by Toebe et al. (2012) (Ŷ = 3.7046x^1.8747, R² = 0.9757 with n = 605 leaves) can be attributed to the following: 1) in our study, the estimates were obtained using the data from all leaves (n = 745 leaves), whereas Toebe et al. (2012) used a random sample of 605 leaves for the generation of the model and used the remaining 140 leaves for the validation of the model, and 2) in our study, the estimates for a, b and R² were obtained using an iterative process until convergence was achieved, with the goal of minimizing the error sum of squares, whereas Toebe et al. (2012) estimated them after a logarithmic transformation for linearization of the model. The value of Pearson's linear correlation coefficient for the leaf area estimates of the two models was r = 0.9999 (p = 0.000), indicating that both models can be used. However, the use of our model (Ŷ = 4.2049x^1.8215, R² = 0.9701), which was generated using an iterative process using the entire data set (which is more representative of jack bean leaf size), should be favored over the Toebe et al. (2012) model. Power models for the estimation of leaf area based on leaf width have also been shown to be highly accurate for other crops, as observed by Williams and Martinson (2003) for two grapevine cultivars (R² ≥ 0.9632) and by Cargnelutti Filho et al. (2012b) for the velvet bean (R² = 0.9886).
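To give a sense of how close the two equations are in practice, the sketch below evaluates both at a few central leaflet widths. The widths are hypothetical values chosen only for illustration (assumed to be in cm, the unit used in Figure 2), and the function names are not from the original study.

```python
def area_this_study(width):
    """Y-hat = 4.2049 * x^1.8215 (n = 745 leaves, iterative fit)."""
    return 4.2049 * width ** 1.8215

def area_toebe_2012(width):
    """Y-hat = 3.7046 * x^1.8747 (n = 605 leaves, linearized fit)."""
    return 3.7046 * width ** 1.8747

widths = [5.0, 10.0, 15.0]   # hypothetical central leaflet widths
ratios = [area_toebe_2012(w) / area_this_study(w) for w in widths]

for w, r in zip(widths, ratios):
    print(f"x = {w:4.1f}: this study = {area_this_study(w):7.1f}, "
          f"Toebe et al. (2012) = {area_toebe_2012(w):7.1f}, ratio = {r:.3f}")
```

For these widths the two estimates differ by a few percent at most. Because the exponent of the Toebe et al. (2012) equation is larger, its estimate is the lower of the two for narrow leaflets and the higher of the two for wide leaflets.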

Sample size (number of leaves)
In this study, a sample of 10 leaves was insufficient to build accurate power models for the estimation of leaf area for the jack bean. The increase in the number of leaves used to generate the model reduced the confidence intervals for the parameters a and b and the coefficient of determination of the model, resulting in accuracy gains. There was a pronounced increase in the accuracy gain with increases in the sample size up to a sample of 200 leaves, beyond which the gain was negligible (Figures 5, 6 and 7 and Table 2). Based on this information, the use of a 200-leaf sample is recommended for generating power models for leaf area estimation based on the width of the central leaflet of jack bean leaves. Models generated with data from a smaller number of leaves can be inaccurate and their parameters may be biased. Models generated with a larger number of leaves are more accurate, but the accuracy gain resulting from increasing the sample size beyond 200 leaves tends to be progressively smaller, and does not compensate for the extra resources that must be used to obtain measurements from the larger sample size. A sample size of 200 leaves was also considered optimal for the estimation of coffee leaf area (ANTUNES et al., 2008). Pompelli et al. (2012), however, recommended the use of approximately 415 leaves for accurate estimation of leaf area for the Barbados nut. According to these authors, previously published models using 89- and 250-leaf samples may not have been accurate and may have resulted in biased parameter estimates.
Although the sample size determined in this study to be adequate for the jack bean plant was similar to the sample size determined to be adequate for coffee (n = 200 leaves; Antunes et al., 2008), it was smaller than the sample size suggested by Pompelli et al. (2012) for the Barbados nut. These differences in adequate sample size may be related to differences in modeling methods, in the independent variables used and in the crops tested. Considering the high importance of leaf area estimation models in agronomy studies, the high number of models that are generated and the specific characteristics of each crop, we suggest that further studies of sample size should be conducted to obtain robust models for other agricultural crops. Such studies would lend support and reliability to published models.

CONCLUSIONS
For the jack bean plant, the power model Ŷ = 4.2049x^1.8215 (R² = 0.9701), based on the width of the central leaflet, is an adequate model for estimating leaf area as determined using digital photos.
Measurements from a sample of 200 leaves were sufficient for generating accurate power models for estimating the leaf area (Y) of jack beans as a function of the width of the central leaflet (x), with leaf area determined using digital photos.

do Rio Grande do Sul) for financial support and a scientific initiation scholarship.

Figure 1. Representation of a leaf of the jack bean (Canavalia ensiformis), consisting of a left, central and right leaflet. The location of the measurement of the width of the central leaflet (x) is shown.

Figure 2. Frequency histogram and normal distribution curve of the width of the central leaflet (x), in cm, of the jack bean (Canavalia ensiformis) leaves collected at 29, 43, 57, 73, 87 and 101 days after emergence (DAE).

Figure 5. The 2.5% percentile, mean and 97.5% percentile for 3,000 estimates of parameter a in the power model (Y = ax^b) for the leaf area (Y) of the jack bean (Canavalia ensiformis) as a function of the width of the central leaflet (x), generated using a digital photo method for leaf area. Percentiles and mean are shown for each sample size (number of leaves, in 10-leaf intervals).

Figure 6. The 2.5% percentile, mean and 97.5% percentile for 3,000 estimates of parameter b in the power model (Y = ax^b) for the leaf area (Y) of the jack bean (Canavalia ensiformis) as a function of the width of the central leaflet (x), generated using a digital photo method for leaf area. Percentiles and mean are shown for each sample size (number of leaves, in 10-leaf intervals).

Figure 7. The 2.5% percentile, mean and 97.5% percentile for 3,000 estimates of the coefficient of determination (R²) in the power model (Y = ax^b) for the leaf area (Y) of the jack bean (Canavalia ensiformis) as a function of the width of the central leaflet (x), generated using a digital photo method for leaf area. Percentiles and mean are shown for each sample size (number of leaves, in 10-leaf intervals).

Table 1. Number of leaves (n), minimum, maximum, mean, coefficient of variation (CV), variance, kurtosis, skewness and Kolmogorov-Smirnov test p-value for width of the central leaflet (x) and leaf area (sum of the areas of the left, central and right leaflets) (Y), determined using digital photos of 745 leaves of jack bean (Canavalia ensiformis).