# Glossary

## A

```{glossary}
Accuracy
    A metric for evaluating {term}`classification` models: the fraction of predictions the model got correct. Mathematically, accuracy is defined as:

    - $\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}$

    For {term}`Binary Classification`, it is defined as:

    - $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$

    where:

    - TP = number of true positives (see {term}`True Positive Rate (TP)`)
    - TN = number of true negatives (see {term}`True Negative Rate (TN)`)
    - FP = number of false positives (see {term}`False Positive Rate (FP)`)
    - FN = number of false negatives (see {term}`False Negative Rate (FN)`)

    A binary example with benign and malignant tumor detection rates is shown in the following image:

    ![binary accuracy example](./images/binary_accuracy_example.png)

AUROC
    The Area Under the ROC Curve (AUROC) is a metric that measures the performance of a {term}`classification` model across all possible classification thresholds. It is calculated as the two-dimensional area underneath the ROC curve from (0,0) to (1,1). AUROC ranges from 0 to 1, with higher values indicating better performance. An example scale might be:

    - 0: A perfectly inaccurate test
    - 0.1–0.4: Unacceptable. Inaccurate a majority of the time
    - 0.5: A random model or no discrimination
    - 0.6: Unacceptable. Low discrimination
    - 0.7–0.8: Acceptable
    - 0.8–0.9: Excellent
    - 1: A perfect model that can correctly distinguish between all positive and negative class points

Artificial Intelligence (AI)
    Artificial Intelligence, or AI, is technology that enables computers and machines to simulate human intelligence and problem-solving capabilities. AI systems are modeled after the decision-making processes of the human brain and can ‘learn’ from available data, making increasingly accurate classifications or predictions over time. For the applications in {term}`DataEval`, neural networks are the main modeling method.
    See {term}`Neural Network`.

Aspect Ratio
    For images, the ratio of the width to the height, both in pixels:

    - $\text{Aspect Ratio} = \frac{\text{width}}{\text{height}}$

    See {term}`Image Size`.

Autoencoder
    An autoencoder is a type of artificial {term}`neural network` that learns efficient encodings of unlabeled data through {term}`unsupervised learning`. An autoencoder learns two functions: an encoding function that transforms the input data into a {term}`latent space`, and a decoding function that recreates the input data from the encoded representation. Autoencoders are typically used for {term}`dimensionality reduction`.

Average Pooling
    A type of {term}`pooling layer` that calculates the average value from a group of pixel values produced by a {term}`convolutional layer`. Typically used in a {term}`convolutional neural network` to reduce the dimensionality between layers.
```

## B

```{glossary}
Balance
    A measure of co-occurrence of metadata factors with class labels. Metadata factors that spuriously correlate with individual classes may allow a model to learn shortcut relationships rather than the salient properties of each class.

Bayes Error Rate (BER)
    In statistical classification, the Bayes error rate is the lowest possible error rate for any classifier of a random outcome (into, for example, one of two categories) and is analogous to the {term}`irreducible error`. A number of approaches to estimating the Bayes error rate exist. In general, it is impossible to compute the exact value of the Bayes error rate.

Bias
    The systematic error or deviation in a model's predictions from the actual outcomes. Bias can arise from various sources, such as a skewed or imbalanced dataset, incomplete feature representation, or the use of biased algorithms.

Binary Classification
    Binary classification is a fundamental task in {term}`machine learning`, where the goal is to categorize data into one of two classes or categories.

Black-box Shift Estimation (BBSE)
    A method for measuring {term}`label shift