In standard linear regression, a single model is fit to the training data, and that same linear model applies to all the data. In local regression methods, multiple models are fit to different neighborhoods of the data, and a kernel function determines how much the “neighborhood data” contribute to the predicted value. A simple kernel method is k-nearest neighbors: for a given record with known predictor values and an unknown y, the y values of the k closest training records (closest in terms of their predictor values) are averaged, and that average is the predicted y for the record in question.
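The k-nearest-neighbors idea can be sketched in a few lines. This is a minimal illustration, assuming a single predictor and absolute distance; the function name and data are made up for the example:

```python
def knn_predict(x_train, y_train, x_new, k=3):
    """Average the y values of the k training records closest to x_new."""
    # Sort training records by distance to the query point.
    neighbors = sorted(zip(x_train, y_train), key=lambda p: abs(p[0] - x_new))
    # Equal-weight average of the k nearest y values.
    return sum(y for _, y in neighbors[:k]) / k

# Illustrative data: predict y at x = 3.5 from its two nearest neighbors.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]
print(knn_predict(x, y, 3.5, k=2))  # averages the y values at x = 3 and x = 4
```

Note that every neighbor counts equally here; the next paragraph describes why that can make the fitted curve rough.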
Often the kernel function assigns higher weights to closer records and lower weights to more distant ones. This has a smoothing effect: the regression curve is not “jerky,” as it is when neighbors are weighted equally. Chapter 6 of The Elements of Statistical Learning, available free online, provides an excellent exposition.
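A distance-weighted average can be sketched with a Gaussian kernel (this is one common choice, not the only one; the function name and bandwidth parameter are illustrative):

```python
import math

def kernel_predict(x_train, y_train, x_new, bandwidth=1.0):
    """Kernel-weighted average: nearer records receive larger weights."""
    # Gaussian weight for each training record, decaying with distance.
    weights = [math.exp(-((x - x_new) ** 2) / (2 * bandwidth ** 2))
               for x in x_train]
    # Weighted average of the y values, normalized by the total weight.
    return sum(w * y for w, y in zip(weights, y_train)) / sum(weights)
```

Because the weights fall off smoothly with distance, a record sliding out of the neighborhood loses influence gradually rather than all at once, which is what smooths the curve. The bandwidth controls how quickly influence decays.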