Changes in AIDA Fitting Scheme
This page describes proposed changes to the AIDA Fitting. "Main Idea" gives an overview of proposed changes, "Use Cases" contains examples of usage and "Detailed Description" has full specification of interfaces.
IFitFunction - no longer needed
IFitter - NEW, does the job of fitting
IFitterFactory - NEW, creates IFitter of specified type
IFitResult - NEW, holds fit results, is created by IFitter
IVariable - NEW, represents variable or parameter, has "Value", "Error", set of internal states, and can be connected to ITuple
In the most general terms, the fitting process requires a set of data (a DataSource) and a function in order to produce a result. The fit result does not have to be part of the fit itself: imagine making several fits and comparing the results later. Schematically, fitting can be represented by:
Result = Fitter.fit(DataSource, Function) ,
where different Fitters can do different kinds of fits. This approach provides a lot of flexibility and at the same time has enough structure to be easily used and implemented.
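The separation between fitter, data, function and result can be sketched in a few lines. This is only an illustration under stated assumptions: `LineFitSketch` and its nested `Result` class are hypothetical names, not AIDA types, the data source is reduced to two arrays, and the function is fixed to f(x) = a*x + b fitted by unweighted least squares.

```java
// Hypothetical stand-ins for illustration only; none of these are AIDA types.
final class LineFitSketch {

    // Immutable holder for the outcome of one fit, kept apart from the fitter.
    static final class Result {
        final double slope;
        final double intercept;

        Result(double slope, double intercept) {
            this.slope = slope;
            this.intercept = intercept;
        }
    }

    // Result = Fitter.fit(DataSource, Function), with the function fixed to a*x + b.
    // Closed-form unweighted least squares; x must hold at least two distinct values.
    static Result fit(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i];
            sy += y[i];
            sxx += x[i] * x[i];
            sxy += x[i] * y[i];
        }
        double slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        double intercept = (sy - slope * sx) / n;
        return new Result(slope, intercept);
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2, 3};
        Result first = fit(x, new double[] {1, 3, 5, 7});
        Result second = fit(x, new double[] {0, 1, 2, 3});
        // The fitter keeps no state, so both results stay available for comparison.
        System.out.println("first:  slope=" + first.slope + " intercept=" + first.intercept);
        System.out.println("second: slope=" + second.slope + " intercept=" + second.intercept);
    }
}
```

Because the result is a separate object, nothing in the function or the data has to change in order to keep several fit outcomes side by side.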
1. DataSource
With a few extensions, the ITuple interface can be used to represent a DataSource. New methods in ITupleFactory:
public ITuple create(IHistogram histogram)
public ITuple create(ICloud cloud)
provide an ITuple wrapper around IHistogram and ICloud, allowing uniform access to the data. Points in an ICloud and bins in an IHistogram are mapped to ITuple rows.
fit(IHistogram1D h, IFunction f);
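The mapping of histogram bins to tuple rows could look roughly like the sketch below. It is not the AIDA implementation: `HistogramRows` is a hypothetical helper, the histogram is reduced to plain arrays with equal-width bins, and the row layout {bin center, bin height, bin error} is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the AIDA ITupleFactory implementation.
final class HistogramRows {

    // Maps each bin of a 1D histogram with equal-width bins to one row:
    // {bin center, bin height, bin error}. The column layout is an assumption.
    static List<double[]> toRows(double lowerEdge, double upperEdge,
                                 double[] heights, double[] errors) {
        int bins = heights.length;
        double width = (upperEdge - lowerEdge) / bins;
        List<double[]> rows = new ArrayList<>();
        for (int i = 0; i < bins; i++) {
            double center = lowerEdge + (i + 0.5) * width;
            rows.add(new double[] {center, heights[i], errors[i]});
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up bin contents, used only to show the row layout.
        List<double[]> rows = toRows(0.0, 3.0, new double[] {2, 5, 1}, new double[] {1.0, 2.0, 1.0});
        for (double[] row : rows) {
            System.out.println(row[0] + " " + row[1] + " " + row[2]);
        }
    }
}
```

For an ICloud the same idea would give one row per point instead of one row per bin.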
2. Function
In AIDA context IFunction is used for three main purposes:
a) to plot or calculate value
b) to do chi-Square fits
c) to do unbinned Maximum Likelihood (ML) fits
Item c) needs some clarification. To be used in an ML fit, a function must be a PDF: always positive and normalized over the range of the dependent variables. The last requirement introduces a normalization factor N that depends on the analysis cuts.
Example: f(x) = ax + b, where x is the variable and a, b are fit parameters.
For a chi-Square fit we just have to find the set of parameters (a, b) that minimizes the sum of squares of weighted deviations. For an ML fit, f(x) has to be normalized to some value $N = \int_{x_{min}}^{x_{max}} f(x)\,dx$. In this case $N = \tfrac{1}{2} a (x_{max}^2 - x_{min}^2) + b (x_{max} - x_{min})$. Now N depends on the parameters a, b and on the range of x, so in order to get the correct result from the ML fit, the function f(x)/N should be used in the fit, not f(x).
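The normalization integral for this example can be written out step by step. The numerical values (a = 2, b = 1, range [0, 1]) are chosen here purely for illustration and do not come from the original text.

```latex
\begin{aligned}
N &= \int_{x_{min}}^{x_{max}} (a x + b)\,dx
   = \tfrac{1}{2} a \left(x_{max}^2 - x_{min}^2\right) + b \left(x_{max} - x_{min}\right), \\[4pt]
a = 2,\ b = 1,\ [x_{min}, x_{max}] = [0, 1]: \quad
N &= \tfrac{1}{2} \cdot 2 \cdot (1 - 0) + 1 \cdot (1 - 0) = 2, \\[4pt]
\frac{f(x)}{N} &= \frac{2x + 1}{2} = x + \tfrac{1}{2}, \qquad
\int_0^1 \left(x + \tfrac{1}{2}\right) dx = \tfrac{1}{2} + \tfrac{1}{2} = 1 .
\end{aligned}
```

If the cut on x changes, for example to [0, 2], the same a and b give N = 6, which is why N has to be recomputed whenever the parameters or the range change.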
For an ML fit to be efficient, the function should be able to calculate its own normalization factor for any point and any set of parameters that are likely to be used in a fit. To address this issue, two new methods are included in the IFunction interface:
public boolean supportsNormalization()
public void setNormalization(double N)
The first method returns "true" if the function can calculate its own normalization and "false" otherwise. The second method sets the normalization constant N (usually N=1) and turns the function into a PDF (in some cases the amplitude parameter of the function has to be set fixed, …). These methods should be used by IFitter rather than by the user.
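A minimal sketch of how a function might honor these two methods is shown below for f(x) = a*x + b. `LinearFunction` is a hypothetical class, not part of AIDA, and passing the range to the constructor is an assumption; in the proposal the range would come from the function's IVariable.

```java
// Hypothetical sketch of the two proposed methods for f(x) = a*x + b.
// LinearFunction is not an AIDA class; the range is a constructor argument by assumption.
final class LinearFunction {
    private final double a;
    private final double b;
    private final double xMin;
    private final double xMax;
    private double scale = 1.0; // multiplies the raw value a*x + b

    LinearFunction(double a, double b, double xMin, double xMax) {
        this.a = a;
        this.b = b;
        this.xMin = xMin;
        this.xMax = xMax;
    }

    // The integral of a*x + b is known analytically, so normalization is supported.
    boolean supportsNormalization() {
        return true;
    }

    // Rescales the function so that its integral over [xMin, xMax] equals n.
    void setNormalization(double n) {
        double integral = 0.5 * a * (xMax * xMax - xMin * xMin) + b * (xMax - xMin);
        scale = n / integral;
    }

    double value(double x) {
        return scale * (a * x + b);
    }

    // Midpoint-rule integral over [xMin, xMax], used only to check the normalization.
    double integral(int steps) {
        double width = (xMax - xMin) / steps;
        double sum = 0.0;
        for (int i = 0; i < steps; i++) {
            sum += value(xMin + (i + 0.5) * width) * width;
        }
        return sum;
    }

    public static void main(String[] args) {
        LinearFunction f = new LinearFunction(2.0, 1.0, 0.0, 1.0);
        f.setNormalization(1.0);
        System.out.println("integral after normalization: " + f.integral(1000));
    }
}
```

In this sketch a and b are fixed at construction. A real fit function would have to recompute the scale every time the fitter changes a parameter or the range, which is the efficiency concern raised above.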
3. Variable
IVariable is a new interface designed to help keep track of values, errors, states and ranges of various parameters and variables. IVariable can represent a variable or a parameter; it has a "Value", an "Error" and a set of internal states, and it can also take its values from a DataSource in some orderly fashion. It is important that IVariable can support a set of ranges, not just one (x_min, x_max) pair.
IVariable does not have a factory and cannot be created directly; instead, an IVariable can be returned by an IFunction or can be "derived" from an ITuple:
ITuple: public IVariable variable(String name, String label, IEvaluator ev)
IFunction: public IVariable variable(String name)
public void connect(IEvaluator ev)
public void connect(ITuple data)
4. Fitting
Here we introduce three new interfaces: IFitter, IFitterFactory and IFitResult.
In a standard AIDA fashion, IFitter is created by IFitterFactory with "create" method:
public IFitter create(String type),
where "type" describes what kind of fitter is created. Among other methods, IFitter has "fit" method that returns IFitResult:
public IFitResult fit(ITuple data, IFunction function).
IFitResult contains all results relevant to this particular fit. IFitter should have several "setup" methods that control, for example, which results are saved in IFitResult and how the fit is done.
Note that there is a problem with displaying fit results. IFunction can retain the parameter values obtained in the fit, and we can even put a switch in IPlotter (or in IFunction) to display the values and errors of IFunction's parameters, but IFunction cannot hold any fit-related information. Such information (chi-squared, fit quality, parameter scans, etc.) belongs in IFitResult, and it looks like it has to be extracted and displayed "by hand". One possibility is to make IFitResult a hash-bag, so that IFitter can drop into it any result it is configured to save and mark each individual result as "displayable" or "non-displayable". It would also be convenient for IFitResult to retain references to the histogram and function used in the fit. Then we could have an IPlotter.plot(IFitResult result) method that automatically displays the histogram, the function and all "displayable" information in the current region.
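The "hash-bag" idea can be illustrated with a small sketch. `ResultBag` and its methods are hypothetical and are not part of the proposed IFitResult interface; the stored values are made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "hash-bag" idea; ResultBag is not an AIDA interface.
final class ResultBag {

    // One stored result plus its "displayable" flag.
    private static final class Item {
        final Object value;
        final boolean displayable;

        Item(Object value, boolean displayable) {
            this.value = value;
            this.displayable = displayable;
        }
    }

    private final Map<String, Item> items = new LinkedHashMap<>();

    // The fitter drops in any result it is configured to save.
    void put(String name, Object value, boolean displayable) {
        items.put(name, new Item(value, displayable));
    }

    // Returns the stored value, or null if nothing was saved under this name.
    Object get(String name) {
        Item item = items.get(name);
        return item == null ? null : item.value;
    }

    // Names that a plotter could show automatically, in insertion order.
    List<String> displayableNames() {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Item> entry : items.entrySet()) {
            if (entry.getValue().displayable) {
                names.add(entry.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        ResultBag bag = new ResultBag();
        bag.put("chiSquared", 48.2, true);                        // made-up value
        bag.put("scan:mean", new double[] {1.0, 0.4, 0.9}, false); // made-up scan
        bag.put("ndf", 47, true);                                  // made-up value
        System.out.println(bag.displayableNames());
    }
}
```

A plot(IFitResult) method could then loop over the displayable names without knowing in advance which quantities a particular fitter produces.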
1. Below is a simple example of fitting a histogram with a Gaussian function, as it stands in the current version of AIDA: create a function factory, then a function, and then ask the function to do the fit.
import hep.aida.*;
import java.util.Random;

public class FitExample
{
    public static void main(String[] args)
    {
        // Create factories
        IAnalysisFactory analysisFactory = IAnalysisFactory.create();
        ITreeFactory treeFactory = analysisFactory.createTreeFactory();
        ITree tree = treeFactory.create();
        IPlotter plotter = analysisFactory.createPlotterFactory().create("Plot");
        IHistogramFactory histogramFactory = analysisFactory.createHistogramFactory(tree);
        IFunctionFactory functionFactory = analysisFactory.createFunctionFactory(tree);

        // Create 1D histogram
        IHistogram1D h1d = histogramFactory.create1D("Histogram 1D",50,-3,3);

        // Fill 1D histogram with Gaussian
        Random r = new Random();
        for (int i=0; i<5000; i++)
        {
            h1d.fill(r.nextGaussian());
        }

        // Create Gaussian fitting function
        IFitFunction f = functionFactory.createFit("Gaussian Fit", "Gaussian Fit", "G","amplitude=1., mean=0., sigma=1.");

        ////////////////////////////////////////////////////////////////
        // Other supported functions:
        // f = functionFactory.createFit("Exp Fit", "Exponential Fit", "E", "amplitude= ; origin= ; exponent= ");
        // f = functionFactory.createFit("BW Fit", "Breit-Wigner Fit", "BW","amplitude= ; origin= ; width= ");
        // f = functionFactory.createFit("Poly Fit", "Polynomial Fit", "P", "a4= , a3= , a2= , a1= , a0= ");
        ////////////////////////////////////////////////////////////////

        // Do Fit
        f.fit(h1d);

        // Show results
        plotter.createRegions(1,1,0);
        plotter.plot(h1d);
        plotter.plot(f);
        plotter.show();
    }
}
2. With the proposed AIDA modifications, the fit-related part of the example will change.
There are several extra steps: create the fitter factory and the fitter, configure the fitter, and ask the fitter to do the fitting. "IFitter.resetFunction(boolean state)" is set to "false" here so that the function is not reset to its original state after fitting is done. We can make this the default; I just wanted to show where fitter configuration goes. Only the fit-related part is shown here:
IFunctionFactory functionFactory = analysisFactory.createFunctionFactory(tree);
IFitterFactory fitterFactory = analysisFactory.createFitterFactory(tree);
IFunction f = functionFactory.create("Gauss", "1D Gaussian", "G");
IFitter fitter = fitterFactory.create("LSF");

// Configure fitter
fitter.resetFunction(false);

// Do fitting
IFitResult result = fitter.fit(h1d, f);

// Plot results, just the same as before
plotter.createRegions(1,1,0);
plotter.plot(h1d);
plotter.plot(f);
plotter.show();
3. Here is a more complicated example of a chi-Square fit using data from an ITuple.
Suppose the ITuple contains data from a beam position scan with the following columns:
x – wire position in ADC counts
signal – absolute beam intensity
sigma – error with which "signal" was measured
…
Each row of the ITuple corresponds to the next step in the "x" direction. We want to see how well the normalized beam intensity can be described by a double Gaussian in real detector coordinates between -1.5 and 1.5 cm. One way to do this would be to create an IHistogram and fill it with {x="0.05*x+18.6", Weight=signal/maxSignal, Error=sigma/maxSignal}, but IHistogram currently does not have a "setError" method, so we'll just fit the ITuple. The IXYData interface would be very useful in this particular case. An IEvaluator factory is needed here:
IEvaluatorFactory evaluatorFactory = analysisFactory.createEvaluatorFactory(tree);
IFunctionFactory functionFactory = analysisFactory.createFunctionFactory(tree);
IFitterFactory fitterFactory = analysisFactory.createFitterFactory(tree);

// Derive x coordinate, measurement, and error variables from tuple
IVariable x_cm = tuple.variable("x", "x in cm", evaluatorFactory.create(tuple, "0.05*x + 18.6"));
x_cm.setMinRange(-1.5);
x_cm.setMaxRange(1.5);
x_cm.setUnits("cm");
double maxSignal = tuple.columnMax(2);
IVariable meas = tuple.variable("meas", "scaled signal", evaluatorFactory.create(tuple, "signal/maxSignal"));
IVariable error = tuple.variable("error", "scaled sigma", evaluatorFactory.create(tuple, "sigma/maxSignal"));

// Create functions with variable already connected to tuple
IFunction f1 = functionFactory.create("Gauss1", "First Gaussian", "G", new IVariable[] {x_cm} );
IFunction f2 = functionFactory.create("Gauss2", "Second Gaussian", "G", new IVariable[] {x_cm} );
IFunction f_sum = functionFactory.add(f1,f2, 0.9);

// Create and configure fitter
IFitter fitter = fitterFactory.create("LSF");
fitter.resetFunction(false);

// Do fitting. This fit method takes function, measurement and measurement error variables
// The only variable, x_cm, is already connected to tuple.
IFitResult result = fitter.fit(f_sum, meas, error);

// Can plot only resulting function and histogram with wrong errors (or without errors)
// Also there is no corresponding "ITuple.project()" method to fill histogram with weight
plotter.createRegions(1,1,0);
plotter.plot(h1d);
plotter.plot(f_sum);
plotter.show();
4. Example of unbinned Maximum Likelihood analysis.
IEvaluatorFactory evaluatorFactory = analysisFactory.createEvaluatorFactory(tree);
IFunctionFactory functionFactory = analysisFactory.createFunctionFactory(tree);
IFitterFactory fitterFactory = analysisFactory.createFitterFactory(tree);

// Derive variables from tuple
IVariable cosTheta = tuple.variable("cosTheta", "cos of Theta");
cosTheta.setMinRange(-0.7);
cosTheta.setMaxRange(0.7);
IVariable prob_b = tuple.variable("prob_b", "probability to be b event", evaluatorFactory.create(tuple, "pB"));
IVariable prob_c = tuple.variable("prob_c", "probability to be c event", evaluatorFactory.create(tuple, "pC"));
IVariable ap_b = tuple.variable("ap_b", "analyzing power, b event", evaluatorFactory.create(tuple, "1/(1+exp(-abs(Qdiff)*alphaB))"));
IVariable ap_c = tuple.variable("ap_c", "analyzing power, c event", evaluatorFactory.create(tuple, "1/(1+exp(-abs(Qdiff)*alphaC))"));

// Register my function factory and create my function.
// Registering does not make much sense here
functionFactory.register(new MyFunctionFactory());
IFunction func = functionFactory.create("myFunction", "My fit Function", "myFunc", new IVariable[] {cosTheta, prob_b, prob_c, ap_b, ap_c} );

// Create filters and add them to the tuple
IFilter filter1 = filterFactory.create("Evis>20. && nTracks>=7 && abs(cosTheta)<0.7");
IFilter filter2 = filterFactory.create("(Q1!=0 || Q2!=0) && abs(Qdiff)<20");
tuple.addFilter("event cuts", filter1);
tuple.addFilter("extra cuts", filter2);

// Create and configure fitter
IFitter fitter = fitterFactory.create("MLF");
fitter.resetFunction(false);
fitter.keepScan("asymmetry");

// Do fitting
IFitResult result = fitter.fit(func);
IHistogram1D h = result.scan("asymmetry");
1. IVariables
public interface IVariable
{
    public String label();
    public String name();
    public void setValue(double value);
    public double value();
    public double error();

    // Set variable ranges; range() returns an array of {lower, upper} pairs
    public void setRange(double lower, double upper);
    public void addRange(double lower, double upper);
    public double[][] range();

    // If dependent is "true", this is a variable, if "false" - parameter
    public void setDependent(boolean state);
    public boolean isDependent();

    // States related to fitting
    public void setStep(double step);
    public void setFixed(boolean state);
    public boolean isFixed();

    // Even if bounds are defined, they don't have to be used
    public void setUseBounds(boolean state);
    public boolean useBounds();

    // IVariable can be connected to ITuple to derive its value
    // from the current ITuple row
    public boolean isConnected();
    public ITuple connection();
    public void connect(ITuple data);
    public void connect(IEvaluator ev);

    // Units can be used to annotate plot axis
    public void setUnits(String units);
    public String units();
}
There is no IVariableFactory class. IFunction can return its variables and parameters in the form of IVariable, or an IVariable can be "derived" from an ITuple.
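The multi-range behaviour of setRange(), addRange() and range() might be sketched as follows. `RangeSet` is a hypothetical helper, not an AIDA class, and the semantics used here (setRange replaces all ranges, addRange appends one, bounds are inclusive) are assumptions that the proposal does not spell out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the range handling described for IVariable; not an AIDA class.
final class RangeSet {
    private final List<double[]> ranges = new ArrayList<>();

    // Replaces all existing ranges with a single one (assumed semantics).
    void setRange(double lower, double upper) {
        ranges.clear();
        addRange(lower, upper);
    }

    // Adds one more allowed interval.
    void addRange(double lower, double upper) {
        if (lower > upper) {
            throw new IllegalArgumentException("lower bound exceeds upper bound");
        }
        ranges.add(new double[] {lower, upper});
    }

    // Returns a copy of the ranges as an array of {lower, upper} pairs.
    double[][] range() {
        double[][] copy = new double[ranges.size()][];
        for (int i = 0; i < copy.length; i++) {
            copy[i] = ranges.get(i).clone();
        }
        return copy;
    }

    // True if the value lies inside at least one range (bounds inclusive).
    boolean contains(double value) {
        for (double[] r : ranges) {
            if (value >= r[0] && value <= r[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        RangeSet x = new RangeSet();
        x.setRange(-1.5, -0.5);
        x.addRange(0.5, 1.5);
        System.out.println("ranges: " + x.range().length);
        System.out.println("contains 1.0: " + x.contains(1.0));
        System.out.println("contains 0.0: " + x.contains(0.0));
    }
}
```

A fitter could use such a check to skip tuple rows whose variable value falls in an excluded region, for example a gap between two detector sections.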
2. IFunctions
IFunctions should now be based on IVariables. To make it easier for the user, we still keep the "lazy" create(String name, String label, String type) method for IFunction creation. In that case the IVariables will be created by the IFunction internally. Alternatively, the user can supply a vector of pre-configured IVariables. I can imagine that this vector can be shorter than required, in which case the missing IVariables will be created by the IFunction internally.
public interface IFunctionFactory
{
    public IFunction create(String name, String label, String type);
    public IFunction create(String name, String label, String type, String options);
    public IFunction create(String name, String label, String type, IVariable[] variables);
    public IFunction create(String name, String label, String type, IVariable[] variables, IVariable[] parameters);

    public IFunction createScripted(String name, String label, String script);
    public IFunction createScripted(String name, String label, String script, IVariable[] variables);
    public IFunction createScripted(String name, String label, String script, IVariable[] variables, IVariable[] parameters);

    // Allows user to register new function factory
    // IUserFunctionFactory should have method "String[] types()" that
    // returns all IFunction types that this user factory can make
    public void register(IUserFunctionFactory userFunctionFactory);

    // Do arithmetic with functions. Example: "add" f_new=f1*p+f2*(1-p)
    // Note that "div" does not make much sense for PDFs
    public IFunction add(IFunction f1, IFunction f2, double p);
    public IFunction add(IFunction[] f, double[] p);
    public IFunction mul(IFunction f1, IFunction f2, double p);
    public IFunction mul(IFunction[] f, double p);
    public IFunction div(IFunction f1, IFunction f2);
}
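// The "add" rule f_new=f1*p+f2*(1-p) can be sketched with plain Java functions.
// This is a hedged illustration, not the AIDA implementation: FunctionArithmetic
// and gaussian() are hypothetical names, functions are reduced to one variable,
// and restricting p to [0, 1] is an assumption.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch of IFunctionFactory.add(f1, f2, p); not AIDA code.
final class FunctionArithmetic {

    // f_new(x) = p*f1(x) + (1-p)*f2(x); p in [0, 1] is an assumption of this sketch.
    static DoubleUnaryOperator add(DoubleUnaryOperator f1, DoubleUnaryOperator f2, double p) {
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1]");
        }
        return x -> p * f1.applyAsDouble(x) + (1.0 - p) * f2.applyAsDouble(x);
    }

    // Gaussian PDF normalized over the whole real line.
    static DoubleUnaryOperator gaussian(double mean, double sigma) {
        double norm = 1.0 / (sigma * Math.sqrt(2.0 * Math.PI));
        return x -> {
            double z = (x - mean) / sigma;
            return norm * Math.exp(-0.5 * z * z);
        };
    }

    public static void main(String[] args) {
        // Narrow core plus wide tail, mixed 90% / 10% as in the double-Gaussian use case.
        DoubleUnaryOperator sum = add(gaussian(0.0, 0.3), gaussian(0.0, 1.0), 0.9);
        System.out.println("f_sum(0) = " + sum.applyAsDouble(0.0));
    }
}
```

// With weights p and 1-p, the sum of two functions that are each normalized over
// the same range is again normalized over that range, which is what makes this
// form of "add" convenient for PDFs.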
public interface IFunction
{
    public int dimension();
    public String label();

    // Deal with variables and parameters
    public IVariable variable(String name);
    public IVariable[] variables();
    public IVariable parameter(String name);
    public IVariable[] parameters();

    // Get value based on internal state of the function
    public double value();

    // Get value of the function without changing its internal state
    public double value(double[] point);
    public double value(double[] point, double[] parameters);

    // Can be used by IFitter, maybe should be excluded from the user
    // interface together with support for derivatives
    public boolean supportsNormalization();
    public void setNormalization(double N);
}
Note that in order to change variable or parameter settings (names, values, limits, fixed, connection, ...) the user now has to talk to the corresponding IVariable directly.
3. ITuple
public interface ITupleFactory
{
    public ITuple create(String name, String label, String[] columnNames, Class[] columnType, String options);
    public ITuple create(String name, String label, String columns, String options);

    // New methods
    public ITuple create(String name, String label, IHistogram hist);
    public ITuple create(String name, String label, ICloud cloud);
    public ITuple create(String name, String label, ITuple tuple, IFilter filter);
    public ITuple create(String name, String label, ITuple tuple, IFilter[] filters);
    public ITuple chain(ITuple[] set);
    public ITuple merge(ITuple[] set);
}
Method "chain()" makes a chain of ITuples with the same internal structure, while "merge()" creates a union of columns from all ITuples in the set. The exact rules for how to do this must still be defined. An ITuple can also be created from another ITuple by applying filters.
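One possible reading of the difference between chain() and merge() is sketched below. Since the exact rules are still open, everything here is an assumption: `TupleCombine` is a hypothetical helper, a tuple is reduced to a map from column name to column values, chain() requires identical column names, and merge() requires equal row counts and keeps the first column when a name repeats.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of chain() vs merge(); a tuple is a map: column name -> values.
final class TupleCombine {

    // chain: every tuple must have the same columns; rows are appended. The set must be non-empty.
    static Map<String, List<Double>> chain(List<Map<String, List<Double>>> set) {
        Map<String, List<Double>> out = new LinkedHashMap<>();
        for (String name : set.get(0).keySet()) {
            out.put(name, new ArrayList<>());
        }
        for (Map<String, List<Double>> t : set) {
            if (!t.keySet().equals(out.keySet())) {
                throw new IllegalArgumentException("column mismatch");
            }
            for (Map.Entry<String, List<Double>> c : t.entrySet()) {
                out.get(c.getKey()).addAll(c.getValue());
            }
        }
        return out;
    }

    // merge: union of columns; assumes equal row counts, first column wins on a name clash.
    static Map<String, List<Double>> merge(List<Map<String, List<Double>>> set) {
        Map<String, List<Double>> out = new LinkedHashMap<>();
        int rows = -1;
        for (Map<String, List<Double>> t : set) {
            for (Map.Entry<String, List<Double>> c : t.entrySet()) {
                if (rows < 0) {
                    rows = c.getValue().size();
                }
                if (c.getValue().size() != rows) {
                    throw new IllegalArgumentException("row count mismatch");
                }
                out.putIfAbsent(c.getKey(), new ArrayList<>(c.getValue()));
            }
        }
        return out;
    }

    // Convenience helper: builds a one-column tuple.
    static Map<String, List<Double>> tuple(String name, double... values) {
        List<Double> column = new ArrayList<>();
        for (double v : values) {
            column.add(v);
        }
        Map<String, List<Double>> t = new LinkedHashMap<>();
        t.put(name, column);
        return t;
    }

    public static void main(String[] args) {
        System.out.println(chain(Arrays.asList(tuple("x", 1, 2), tuple("x", 3))));
        System.out.println(merge(Arrays.asList(tuple("x", 1, 2), tuple("y", 5, 6))));
    }
}
```

In short, under these assumptions chain() grows the number of rows and merge() grows the number of columns.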
We also need to include the following methods in ITuple:
public interface ITuple
{
    ...
    // New ITuple methods here:

    // Need ability to "derive" variable from an ITuple
    public IVariable variable(String name, String label);
    public IVariable variable(String name, String label, IEvaluator ev);

    // Need more "project" methods. Same with 2D and 3D histograms
    public boolean project(IHistogram1D hist, IEvaluator value);
    public boolean project(IHistogram1D hist, IEvaluator value, IEvaluator weight);
    public boolean project(IHistogram1D hist, IEvaluator value, IEvaluator weight, IFilter filter);
    public boolean project(IHistogram1D hist, IVariable value);
    public boolean project(IHistogram1D hist, IVariable value, IVariable weight);
    public boolean project(IHistogram1D hist, IVariable value, IVariable weight, IFilter filter);
}
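What a "project" call with value, weight and filter might do can be sketched as follows. This is an assumption-laden illustration rather than AIDA code: `ProjectSketch` is a hypothetical helper, rows are plain double arrays, the histogram is an array of equal-width bins over [lower, upper), and values outside that range are simply dropped.

```java
import java.util.function.Predicate;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch of a weighted, filtered projection of tuple rows onto a 1D histogram.
final class ProjectSketch {

    // Fills equal-width bins over [lower, upper) with weight(row) at value(row)
    // for every row that passes the filter. Returns the bin heights.
    static double[] project(double[][] rows, int bins, double lower, double upper,
                            ToDoubleFunction<double[]> value,
                            ToDoubleFunction<double[]> weight,
                            Predicate<double[]> filter) {
        double[] heights = new double[bins];
        double width = (upper - lower) / bins;
        for (double[] row : rows) {
            if (!filter.test(row)) {
                continue;
            }
            double v = value.applyAsDouble(row);
            if (v < lower || v >= upper) {
                continue; // underflow and overflow are ignored in this sketch
            }
            int bin = (int) ((v - lower) / width);
            if (bin >= bins) {
                bin = bins - 1; // guard against floating-point rounding at the upper edge
            }
            heights[bin] += weight.applyAsDouble(row);
        }
        return heights;
    }

    public static void main(String[] args) {
        // Made-up rows: column 0 is the value, column 1 is the weight.
        double[][] rows = {{0.2, 2.0}, {0.7, 3.0}, {0.8, 1.0}, {1.5, 9.0}};
        double[] h = project(rows, 2, 0.0, 1.0, r -> r[0], r -> r[1], r -> r[1] < 5.0);
        System.out.println(h[0] + " " + h[1]);
    }
}
```

The lambdas play the role that IEvaluator, IVariable and IFilter play in the proposed signatures.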
4. Fitting
public interface IFitterFactory
{
    // "type" can be something like LSF, UMLF ...
    public IFitter create(String type);

    // Allows user to register new fitter types and fitter factory
    // that makes them
    public IFitter register(String[] userTypes, IFitterFactory userFitterFactory);
}

public interface IFitter
{
    // Various fit control methods here
    public void keepScan(String parameterName);
    public void keepScan(String parameterName1, String parameterName2);
    public void resetFunction(boolean state);
    ...

    // Does fitting and returns results in a form of IFitResult object.
    // Fitter first connects all function's not-yet-connected variables to
    // ITuple
    public IFitResult fit(ITuple data, IFunction function);
    public IFitResult fit(IHistogram data, IFunction function);
    public IFitResult fit(ICloud data, IFunction function);

    // If all function's variables are already connected, can use this
    // method
    public IFitResult fit(IFunction function);

    // Specific for chi-Square fit
    public IFitResult fit(IHistogram data, IFunction f, IVariable meas, IVariable error);
    public IFitResult fit(IHistogram data, IFunction f, IVariable[] meas, IVariable[] error);

    // Returns IFitter type
    public String getType();
}
// Need more thinking here
public interface IFitResult
{
    public double chiSquared();
    public double degreeOfFreedom();
    public IVariable[] parameters();
    public IVariable parameter(String name);
    public IHistogram1D scan(String parameterName);
    public IHistogram2D scan(String parameterName1, String parameterName2);
}
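The two numeric quantities in IFitResult can be illustrated with the usual definitions for a chi-Square fit: chi-squared as the sum of squared deviations weighted by the measurement errors, and the number of degrees of freedom as the number of points minus the number of free parameters. `ChiSquareSketch` is a hypothetical helper, not an AIDA class, and it returns the degrees of freedom as an int although the proposed interface declares a double.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch of the quantities behind chiSquared() and degreeOfFreedom().
final class ChiSquareSketch {

    // chi2 = sum over points of ((y_i - f(x_i)) / sigma_i)^2
    static double chiSquared(double[] x, double[] y, double[] sigma, DoubleUnaryOperator f) {
        double chi2 = 0.0;
        for (int i = 0; i < x.length; i++) {
            double pull = (y[i] - f.applyAsDouble(x[i])) / sigma[i];
            chi2 += pull * pull;
        }
        return chi2;
    }

    // Degrees of freedom: fitted points minus free (non-fixed) parameters.
    static int degreesOfFreedom(int points, int freeParameters) {
        return points - freeParameters;
    }

    public static void main(String[] args) {
        // Made-up measurements compared with f(x) = 2x + 1.
        double[] x = {0, 1, 2};
        double[] y = {1, 3, 6};
        double[] sigma = {1, 1, 2};
        double chi2 = chiSquared(x, y, sigma, t -> 2.0 * t + 1.0);
        System.out.println("chi2 = " + chi2 + ", ndf = " + degreesOfFreedom(x.length, 2));
    }
}
```

Parameters marked as fixed through IVariable.setFixed(true) would not count as free parameters in this bookkeeping.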
Questions:
1. Should there be a single interface for chi-Square and ML fitters? Configuration and fitting procedure can be quite different.
2. How should IEvaluator handle external parameters, and do we need both IEvaluator and IVariable? Maybe IEvaluator should be hidden from the user?
3. How do we do simultaneous fits of several functions and fits with extra constraints?
4. It is possible to split IVariable into two different interfaces, one describing a variable and the other describing a parameter (see IVariable alternative).
...