Mathematical Programming Solver with SQLFlow
- Linear programming
- Nonlinear programming
- Mixed-integer linear programming
- Mixed-integer nonlinear programming
To understand what’s a mathematical programming problem, let’s take a look at this example (example originally published at ):
Giapetto’s Woodcarving, Inc., manufactures two types of wooden toys: soldiers and trains.
A soldier sells for $27 and uses $10 worth of raw materials. Each soldier that is manufactured increases Giapetto’s variable labor and overhead costs by $14. A train sells for $21 and uses $9 worth of raw materials. Each train built increases Giapetto’s variable labor and overhead costs by $10. The manufacture of wooden soldiers and trains requires two types of skilled labor: carpentry and finishing. A soldier requires 2 hours of finishing labor and 1 hour of carpentry labor. A train requires 1 hour of finishing labor and 1 hour of carpentry labor. Each week, Giapetto can obtain all the needed raw material but only 100 finishing hours and 80 carpentry hours. Demand for trains is unlimited, but at most 40 soldiers are bought each week. Giapetto wants to maximize weekly profit (revenues-costs).
Let
- x be the number of soldiers produced each week
- y be the number of trains produced each week
Then the objective is:
Maximise Z = (27 - 10 - 14)x + (21 - 9 - 10)y = 3x + 2y
The constraints are:
- 2x + y <= 100 (Finishing Constraint)
- x + y <= 80 (Carpentry Constraint)
- x <=40 (Constraint on demand for soldiers)
- x,y >= 0 (sign restriction)
To solve this problem, we can use tools like AMPL or open-source tools like , or Matlab, or .
- Using Matlab and R are quite the same, they require users to define the constraints as matrixes and call a function to get the result. They both have their own grammar, and you have to write code according to their language specifications.
- Using “AMPL like” high-level language to describe the problem, and call the corresponding solvers. Pyomo and AMPL have similar components in their grammar: sets, parameters, variables, objectives, constraints (https://pyomo.readthedocs.io/en/stable/pyomo\_modeling\_components/index.html).
So we can see that using AMPL is a modern and general way of describing mathematical programming problems. We can simply write the AMPL snippet to describe the above problem:
In order to extend SQL to have completely same ability of AMPL, the extended syntax should be able to describe objective and constraints while the input data table can store the params for each variable, and the rows in the table is naturally become the set we defined in AMPL.
Then we have the below table :
In the woodcarving
:
- The set X is row one and row two.
- We have one variable, and the variable name strings is stored in column
product
. In cases that have cross variables (like the example described at ), the table should have multiple string columns to store the variable names. - Other columns like
price
,materials_cost
are all params for the corresponding variable.
Then we can use below extended SQL syntax to describe above example:
SELECT * FROM woodcarving
TO MAXIMIZE SUM((price - materials_cost - other_cost) * amount)
CONSTRAINT SUM(finishing * amount) <= 100,
SUM(carpentry * amount) <= 80,
amount <= max_num
WITH variable="amount(product)",
var_type="Integers"
[USING glpk]
INTO result_table;
In the SQL statement:
CONSTRAINT ...
expression strings that describe the constraints, can have multipleCONSTRAINT
expressions separated by comma.WITH
attributes:variable="amount(product)"
: required, specify the variable definition,product
is the column that stores the variable name. Using comma to separate if there are multiple variables, e.g.shipment(plants,markets)
.var_type="Integers"
: optional, specify the variable type, there should only be one variable in current cases. The format is likevar_type="Type"
, the type can beIntegers
,NonNegativeIntegers
,Reals
etc. The default variable type isIntegers
.
USING
: optional, solver tool to use, default: glpk.- : set the result table name.
After the SQL statement finishes execution, the result table result_table
should look like:
product | amount |
---|---|
soldier | 20 |
train | 60 |
Combinatorial Optimization Problem (https://en.wikipedia.org/wiki/Combinatorial\_optimization) is a subset of mathematical optimization that is widely used in real life. Here we demostrate how to use a SQL statement to solve a typicall combinational optimization problem.
Plants capacity table:
Markets demand table:
markets demand marketA 130 marketB 60 Plants to markets distance table:
When we start to solve the problem, we’d like to join the tables beforehand:
Then we have a “joined” table like below to start the solving process:
plants markets distance capacity demand plantA marketA 140 100 130 plantA marketB 210 100 60 plantB marketA 300 90 130 plantB marketB 90 90 60
Then we can use below extended SQL syntax to describe above example:
SELECT src.plants, src.markets, src.distance, plants.capacity, markets.demand FROM transportation AS src
LEFT JOIN plants ON src.plants = plants.plants
LEFT JOIN markets ON src.markets = markets.markets
TO MINIMIZE SUM(shipment * distance * 90 / 1000)
CONSTRAINT SUM(shipment) <= capacity GROUP BY plants,
SUM(shipment) >= demand GROUP BY markets
WITH variable="shipment(plants,markets)",
var_type="Integers"
[USING glpk]
INTO result_table;
- In the above SQL statement, the syntax is quite the same as the single variable example, yet
- The
CONSTRAINT
including aGROUP BY
clause is a “partial aggregation constraint”, takeCONSTRAINT SUM(markets) <= capacity GROUP BY plants
as an example, it means:- for each plant,
- the sum of “shipment amount to each market from current plant” should be less than the current plant’s capacity.
Then after the solving job has completed, we should have below contents in the result_table
(the result column is a fake result for demonstration):
Support any aggregation functions accross rows that the programming solvers support. We may need to add support more syntax than SUM
in the future.
- Update our extended syntax parser to support
TO MAXIMIZE|MINIMIZE
clauses. - Add an IR struct to represent
TO MAXIMIZE|MINIMIZE
clause. - Create a table to store the result.
- Add code generator to generate code like below example to run, for different mathematical programming software, we may need to add different code generators. Since we extend SQL to have the same ability that AMPL has, we can almost connect to any software we like.
- The generated code should be able to output the result to the result table.
from pyomo.opt import SolverFactory
from pyomo.environ import (ConcreteModel, Var, Objective, maximize, Constraint, NonNegativeIntegers)
model = ConcreteModel()
# input dataframe
data_df = ... # Construct dataframe from DBMS driver here.
# variable
size = len(data_df.type)
model.x = Var(variables, within=NonNegativeIntegers)
# object
def objRule(model):
return sum([(data_df.price[i] - data_df.materials_cost[i] - data_df.other_cost[i]) * model.x[i] for i in model.x])
model.Obj = Objective(rule=objRule, sense=maximize)
# constrains
def rule1(model):
return sum([data_df.finishing[i] * model.x[i] for i in model.x]) <= 100
model.c1 = Constraint(rule=rule1)
def rule2(model):
return sum([data_df.carpentry[i] * model.x[i] for i in model.x]) <= 80
model.c2 = Constraint(rule=rule2)
def rule3(model):
return model.x[i] <= data_df.max_num[i] for i in model.x
model.c3 = Constraint(rule=rule3)
if __name__ == '__main__':
with SolverFactory('glpk', executable='glpsol') as solver:
results = solver.solve(model)
print(results)
model.display()
result_data = pd.DataFrame(columns=['var_id', 'var', 'x'])
result_data['var_id'] = [i for i in model.x]
result_data['x'] = [model.x[i]() for i in model.x]