In-Browser Semantic Search Demo

A searchable library of 42 machine-learning concepts. Search by meaning, not keywords. A transformer-based model converts each document in the library into an embedding vector. When you search, your query is embedded the same way, and cosine similarity is used to find the best matches.

To search, load a ~25 MB embedding model (all-MiniLM-L6-v2) directly from Hugging Face to your browser. All inference is local and the model is cached after first load. You can browse the library of all documents below.

Full Document Library

Fundamentals

Supervised vs. unsupervised learning

Supervised learning trains on labeled examples to predict a known target, like spam-or-not-spam. Unsupervised learning finds structure in unlabeled data, such as grouping similar customers. Most business problems with a clear outcome to predict are supervised.
Training, validation, and test sets

Data is split three ways: a training set to fit the model, a validation set to tune settings and compare models, and a held-out test set used only once to estimate real-world performance. Never let test data influence training.
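
A minimal sketch of the three-way split, assuming the rows can be shuffled freely (no time ordering or grouping). The function name and the 70/15/15 proportions are illustrative choices, not part of the demo.

```python
import random

def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then cut the rows into train / validation / test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(100))
```

Every row lands in exactly one of the three lists, so nothing from the test set can reach training.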
Overfitting and underfitting

A model overfits when it memorizes the training data and fails on new examples, and underfits when it is too simple to capture the pattern at all. The classic sign of overfitting is excellent training accuracy but poor validation accuracy.
The bias–variance tradeoff

Bias is error from overly simple assumptions; variance is sensitivity to small changes in the training data. Lowering one often raises the other, so the goal is the balance that minimizes total error on data the model has never seen.
Cross-validation

K-fold cross-validation splits data into k parts, training on k-1 and testing on the remaining fold, rotating through all of them. Averaging the scores gives a more reliable estimate of performance than a single split, especially on smaller datasets.
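
The rotation can be sketched as an index generator. This assumes the data has already been shuffled; `k_fold_indices` is a hypothetical helper name.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs so each sample is tested exactly once."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, 3))
```

Train a model on each `train_idx`, score it on the matching `test_idx`, and average the k scores.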
Regularization (L1 and L2)

Regularization discourages overly complex models by penalizing large weights. L2 (ridge) shrinks weights smoothly, while L1 (lasso) can push some to exactly zero and effectively select features. Both are common tools for reducing overfitting.
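
One way to see the difference is through the shrinkage step each penalty induces. The sketch below applies that step once to a toy weight list; the function names and the strength value of 0.1 are assumptions for illustration, and a real optimizer would interleave this with gradient updates.

```python
def l2_shrink(weights, strength):
    """Shrinkage step for an L2 penalty: every weight is scaled toward zero."""
    return [w / (1.0 + strength) for w in weights]

def l1_soft_threshold(weights, strength):
    """Shrinkage step for an L1 penalty: weights smaller than `strength` become zero."""
    shrunk = []
    for w in weights:
        magnitude = max(abs(w) - strength, 0.0)
        shrunk.append(magnitude if w > 0 else -magnitude)
    return shrunk

weights = [0.05, -0.3, 2.0]
ridge_like = l2_shrink(weights, 0.1)          # all weights shrink, none reach zero
lasso_like = l1_soft_threshold(weights, 0.1)  # the 0.05 weight is zeroed out
```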

Data

Feature engineering

Feature engineering means building informative inputs from raw data — extracting the day of week from a timestamp, or combining two columns into a ratio. Strong features often improve results more than switching to a fancier model.
Feature scaling and normalization

Many algorithms assume features are on comparable scales, so we standardize them to zero mean and unit variance, or rescale to a 0–1 range. Distance-based and gradient-based models are especially sensitive to inputs left on wildly different scales.
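
Both rescalings fit in a few lines. This sketch assumes a single numeric feature that is not constant (otherwise the divisions would fail); the sample values are made up.

```python
import statistics

def standardize(values):
    """Rescale to zero mean and unit variance (population standard deviation)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def min_max_scale(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

incomes = [10.0, 20.0, 30.0, 40.0, 50.0]
z_scores = standardize(incomes)
unit_range = min_max_scale(incomes)
```

In practice the mean, standard deviation, minimum, and maximum should be computed on the training set only and then reused for validation and test data.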
Handling missing data

Real datasets have gaps. Common strategies are dropping rows, filling with the mean or median, or predicting the missing value with another model. The right choice depends on why the data is missing and how much of it is absent.
Encoding categorical variables

Models need numbers, so categories like color or country are converted using one-hot encoding for unordered values or ordinal encoding when there is a natural ranking. Categories with many distinct values may use target or embedding-based encodings.
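
A small sketch of the two basic encodings. The helper names and the example categories are illustrative.

```python
def one_hot(values):
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows

def ordinal(values, order):
    """Map each value to its position in an explicitly ranked category list."""
    rank = {c: i for i, c in enumerate(order)}
    return [rank[v] for v in values]

categories, rows = one_hot(["red", "blue", "red"])
sizes = ordinal(["low", "high", "medium"], order=["low", "medium", "high"])
```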
Class imbalance

When one class is rare — like fraud among normal transactions — a model can post high accuracy while ignoring it entirely. Fixes include resampling the data, applying class weights, and judging the model with precision and recall instead of accuracy.
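
The accuracy trap is easy to reproduce with made-up numbers: 10 fraud cases among 1,000 transactions and a model that always predicts "normal".

```python
# 1 = fraud, 0 = normal; the counts are invented for illustration.
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000  # a "model" that never flags fraud

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = caught / sum(y_true)  # fraction of real fraud cases that were flagged
```

Accuracy comes out at 99% while recall is zero, which is why recall and precision are the better lens here.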
Data leakage

Leakage happens when information unavailable at prediction time sneaks into training, producing scores that look great but collapse in production. A frequent cause is computing statistics across the entire dataset before splitting into train and test.
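
A toy illustration of that frequent cause, using invented numbers: the mean used for centering should come from the training rows alone.

```python
import statistics

train = [1.0, 2.0, 3.0]
test = [100.0]

# Leaky: the statistic is computed before splitting, so it "sees" the test row.
leaky_mean = statistics.fmean(train + test)

# Clean: fit the statistic on training data, then apply it to both sets.
clean_mean = statistics.fmean(train)
train_centered = [x - clean_mean for x in train]
test_centered = [x - clean_mean for x in test]
```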

Algorithms

Linear regression

Linear regression predicts a continuous number by fitting a straight-line relationship between the inputs and the output. It is fast, interpretable, and a strong baseline for regression problems before reaching for anything more complex.
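
For a single input feature, the least-squares line has a closed form. The sketch below assumes one feature and at least two distinct x values; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```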
Logistic regression

Despite the name, logistic regression is a classification method that estimates the probability an example belongs to a class. It is a reliable, interpretable baseline for binary problems such as customer churn or click prediction.
Decision trees

A decision tree makes predictions by asking a sequence of yes/no questions about the features. Trees are easy to interpret and handle mixed data types, but a single deep tree tends to overfit the training data.
Random forests

A random forest trains many decision trees on random subsets of the data and features, then combines their predictions by majority vote for classification or by averaging for regression. This ensemble is accurate, robust, and needs little tuning, which makes it a popular first model for tabular data.
Gradient boosting

Gradient boosting builds trees one after another, each correcting the errors of the last. Libraries like XGBoost and LightGBM are consistently strong performers on tabular data, both in prediction competitions and in production projects.
k-nearest neighbors

k-nearest neighbors classifies a point by the majority label among its closest examples. It needs no real training, but it slows down on large datasets and depends heavily on sensible feature scaling and a good distance metric.
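
A brute-force sketch, assuming Euclidean distance and features that are already on comparable scales. The points and labels are made up.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
near_origin = knn_predict(points, labels, (0.5, 0.5))
far_corner = knn_predict(points, labels, (5.5, 5.5))
```

Every prediction sorts the whole training set, which is the slowdown on large datasets described above.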
k-means clustering

k-means is an unsupervised method that groups data into k clusters by repeatedly assigning each point to the nearest center and then recomputing those centers. It is widely used for customer segmentation and quick exploratory analysis.
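
The assign-then-recompute loop can be sketched as follows. The starting centers are passed in explicitly here; real implementations usually pick them randomly or with a smarter initialization, and stop once assignments no longer change.

```python
import math

def k_means(points, centers, n_iters=10):
    """Alternate between assigning points to the nearest center and re-averaging."""
    for _ in range(n_iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # New center = coordinate-wise mean; an empty cluster keeps its old center.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters

points = [(0, 0), (0, 2), (10, 10), (10, 12)]
centers, clusters = k_means(points, centers=[(0, 0), (10, 10)])
```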
Cosine similarity

Cosine similarity measures how alike two vectors are by the angle between them rather than their length, scoring near 1 when they point the same way, near 0 when they are unrelated, and near -1 when they point in opposite directions. It is the standard way to compare text embeddings, and ranking by it is what powers semantic search and retrieval. Its complement, cosine distance (1 minus the similarity), measures the same thing flipped: smaller means more similar. Many libraries and vector databases expect this form so that results sort nearest-first.
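
A minimal sketch of the scoring and ranking step. The 3-dimensional vectors are invented for illustration (real embeddings have hundreds of dimensions), and `rank_documents` is a hypothetical helper, not the demo's actual code.

```python
import math

def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return (title, similarity) pairs, most similar first."""
    scored = [(title, cosine_similarity(query_vec, vec)) for title, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

docs = {
    "Dropout": [0.9, 0.1, 0.0],
    "k-means clustering": [0.0, 0.2, 0.9],
}
ranking = rank_documents([1.0, 0.0, 0.1], docs)
```

Sorting by `1 - similarity` in ascending order would give the same ranking in cosine-distance form.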
Vector databases and nearest-neighbor search

A vector database stores embeddings and quickly finds the closest ones to a query using approximate nearest-neighbor search, trading a little accuracy for speed at scale. It is the storage and retrieval engine behind production semantic search and RAG systems.

Evaluation

Precision, recall, and F1 score

Precision is the fraction of predicted positives that are actually correct; recall is the fraction of actual positives the model managed to catch. F1 is the harmonic mean of the two. Which matters more depends on whether false positives or false negatives are costlier.
The confusion matrix

A confusion matrix tabulates true positives, false positives, true negatives, and false negatives. It is the foundation for nearly every classification metric and shows exactly what kinds of mistakes a model is making.
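
The four counts, and the precision, recall, and F1 derived from them, can be sketched for binary labels (1 = positive, 0 = negative). The function names and sample labels are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision_recall_f1(tp, fp, fn):
    """Derive the three metrics, returning 0.0 where a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
```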
ROC curve and AUC

The ROC curve plots the true-positive rate against the false-positive rate across all thresholds, and AUC condenses it into one number. An AUC of 0.5 is random guessing, while 1.0 means the model perfectly separates the classes.
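
AUC can also be read as the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counting as half. The sketch below computes it that way by comparing every pair, which is fine for small data but quadratic in size.

```python
def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```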
Regression metrics: RMSE, MAE, R²

For numeric predictions, mean absolute error is easy to interpret, root mean squared error punishes large mistakes more harshly, and R² reports the fraction of variance explained. Choose the metric that matches the real cost of being wrong.
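
All three metrics follow directly from the prediction errors. A sketch with made-up values, in which a single large miss pulls RMSE above MAE:

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired lists of targets and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total variance around the mean
    r2 = 1 - ss_res / ss_tot
    return mae, rmse, r2

mae, rmse, r2 = regression_metrics([1, 2, 3, 4], [1, 2, 3, 6])
```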
Choosing the right evaluation metric

Accuracy can be misleading, particularly with imbalanced classes. The metric should reflect the actual goal — catching rare events, ranking items, or minimizing costly errors — and the team should agree on it before any modeling begins.

Deep Learning

Neural network fundamentals

A neural network stacks layers of simple units that each compute a weighted sum followed by a nonlinear function. Stacking many layers lets the network learn complex patterns directly from raw inputs such as images, audio, or text.
Gradient descent and learning rate

Gradient descent improves a model by nudging its weights in the direction that reduces error. The learning rate sets the step size: too large and training diverges, too small and it crawls. It is often the single most important knob to tune.
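
The effect of the step size shows up even on a one-dimensional toy problem. The sketch minimizes f(x) = (x - 3)^2, whose gradient is 2(x - 3); the three learning rates are chosen only to show the three regimes.

```python
def gradient_descent(grad, start, learning_rate, n_steps):
    """Repeatedly step against the gradient and return the final position."""
    x = start
    for _ in range(n_steps):
        x = x - learning_rate * grad(x)
    return x

def grad(x):
    return 2 * (x - 3)  # gradient of (x - 3)^2, minimized at x = 3

good = gradient_descent(grad, 0.0, learning_rate=0.1, n_steps=100)      # converges near 3
slow = gradient_descent(grad, 0.0, learning_rate=0.001, n_steps=100)    # barely moves
diverged = gradient_descent(grad, 0.0, learning_rate=1.1, n_steps=100)  # overshoots and blows up
```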
Activation functions

Activation functions inject the nonlinearity that lets networks model complex relationships. ReLU is the usual default for hidden layers, while sigmoid and softmax convert outputs into probabilities for binary and multi-class predictions.
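
The three functions named above, sketched for plain Python floats. The branch in `sigmoid` and the max-subtraction in `softmax` are standard tricks to avoid overflow, not something the text above specifies.

```python
import math

def relu(x):
    """Pass positive inputs through unchanged and clamp negatives to zero."""
    return max(0.0, x)

def sigmoid(x):
    """Squash a real number into (0, 1); branch to avoid overflow in exp."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def softmax(logits):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
```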
Dropout

Dropout randomly switches off a fraction of neurons during each training step, forcing the network not to depend on any single path. It is a simple, effective regularization technique for reducing overfitting in deep models.
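
A sketch of the common "inverted" variant, which is an assumption beyond the description above: surviving activations are scaled up during training so that nothing needs to change at inference time.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate`; scale survivors by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return list(activations)  # inference: pass everything through unchanged
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
dropped = dropout([1.0] * 1000, rate=0.5, rng=rng)
untouched = dropout([1.0, 2.0], rate=0.5, rng=rng, training=False)
```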
Transfer learning and fine-tuning

Rather than training from scratch, transfer learning starts from a model already trained on a huge dataset and adapts it to your task with relatively little data. Fine-tuning this way is the standard approach for most modern vision and language work.
Embeddings

An embedding maps something like a word, image, or product into a vector of numbers so that similar items land close together in that space. Embeddings power semantic search, recommendations, and feeding text into downstream models.

NLP & LLMs

Tokenization

Tokenization breaks text into smaller pieces — whole words or subword chunks — that a model can turn into numbers. The tokenizer affects vocabulary size, sequence length, and how gracefully the model handles rare or invented words.
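
Subword splitting can be sketched as greedy longest-match against a vocabulary. The tiny vocabulary here is invented, and real tokenizers add details this sketch omits, such as continuation markers and learned vocabularies of tens of thousands of pieces.

```python
def subword_tokenize(word, vocab):
    """Split a word into the longest vocabulary pieces, scanning left to right."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is found in the vocabulary.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            return ["[UNK]"]  # no piece matches: treat the whole word as unknown
        pieces.append(word[start:end])
        start = end
    return pieces

vocab = {"token", "ization", "un", "believ", "able"}
known = subword_tokenize("tokenization", vocab)
rare = subword_tokenize("unbelievable", vocab)
```

A word the vocabulary has never seen whole can still be represented by familiar pieces, which is how subword tokenizers cope with rare or invented words.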
Transformers

The transformer is the architecture behind modern language models. Its attention mechanism lets the model weigh how strongly each word relates to every other word, capturing long-range context far better than older sequence models did.
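
The core of attention is scaled dot-product attention. The sketch below is a single head with no learned projections, and the query, key, and value vectors are toy numbers.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """For each query, return a weighted average of the value vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)  # how strongly this position attends to each other one
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

keys = [[10.0, 0.0], [0.0, 10.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
out = attention([[10.0, 0.0]], keys, values)  # query matches the first key
```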
Retrieval-augmented generation (RAG)

RAG improves a language model by first retrieving relevant documents with vector search and inserting them into the prompt. This grounds answers in real sources, reduces hallucination, and adds new knowledge without retraining the model.
Prompt engineering

Prompt engineering is the craft of writing the instructions and examples given to a language model to get reliable output. Useful techniques include clear directions, a few worked examples, and asking the model to reason step by step.
Hallucination and grounding

A language model hallucinates when it produces something fluent but false. Grounding its answers in retrieved facts, citing sources, and constraining the output format are the main ways to keep generated responses trustworthy in production.

MLOps

Model deployment and inference serving

Deployment turns a trained model into a running service that returns predictions, usually behind an API. The key concerns are latency, throughput, scaling under load, and packaging the model together with its exact dependencies.
Model monitoring and data drift

Models decay as the world changes and incoming data drifts away from what they were trained on. Monitoring tracks prediction quality and input statistics over time so the team can retrain before performance quietly degrades.
Hyperparameter tuning

Hyperparameters are settings chosen before training, such as tree depth or learning rate. Grid search, random search, and smarter Bayesian methods automate the search for the combination that performs best on the validation data.
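
Grid search is the simplest of these to sketch. `score_fn` stands in for "train with these settings and return the validation score"; the toy scoring function below is invented so the example runs on its own.

```python
import itertools

def grid_search(param_grid, score_fn):
    """Try every combination in the grid and return the best (params, score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_validation_score(params):
    # Hypothetical stand-in for a real train-and-validate run.
    return -abs(params["max_depth"] - 4) - abs(params["learning_rate"] - 0.1)

grid = {"max_depth": [2, 4, 8], "learning_rate": [0.01, 0.1, 1.0]}
best_params, best_score = grid_search(grid, toy_validation_score)
```

Random search would sample combinations instead of enumerating them, which scales better as the number of hyperparameters grows.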
Experiment tracking and reproducibility

Experiment-tracking tools record the code, data version, settings, and metrics for every run so results can be compared and reproduced later. Fixing random seeds and pinning the environment keep experiments trustworthy and shareable across a team.
Batch vs. real-time inference

Batch inference scores large volumes of data on a schedule, while real-time inference answers individual requests within milliseconds. The choice shapes the architecture, the cost, and the latency requirements of the entire system.
