Open Source Initiative: The good, the bad and the questions

The EPANET Open Source Initiative is great. A community that cares is the best thing that can happen to software like EPANET. Many people have shown their interest and support on this forum, and at the CCWI Conference held last week at De Montfort University (Leicester, UK) there was an interesting talk dedicated to this initiative. At the next WDSA Conference, to be held at “La Universidad de los Andes” (Cartagena, Colombia), a special session about this initiative will also take place, as announced by Juan Saldarriaga at the CCWI. The situation is perfect for giving real impetus to the project and I'm enthusiastic about it.

Nevertheless, reality is showing some hard facts. Despite the support of many people, there are only 13 contributors to the code (registered at the EPANET repository), and of them only 2 actively made changes in the code repository in the last month. Developers are missing, but vendors (people running businesses related to the result of the project) are missing too. Even with engaged developers, it is hard to keep a project alive in the long term without vendors involved (it is enough to take a look at other successful open source projects). We need to move on from the current state and we have to do it quickly.

I prefer to look at problems with optimism and to take part in driving changes for a better future, with a positive but critical eye on reality. From the discussions at the CCWI last week, three main points caught my attention: what license to adopt (this point was already addressed in this forum and I agree with @samhatchett), finding funds for the project (an excellent point brought to the discussion by Ivan Stoianov [Imperial College London]) and the “technology to use”. Regarding this last point, as far as I know there are developments done in Matlab, Python, Java, C/C++ and C#. There are also people in favor of using an object oriented approach (I defend this idea) and others who prefer to keep the approach used in the original EPANET code. The IT world has changed a lot since EPANET first appeared in 1993. If we had to rewrite the code, would we do it the same way? In which language? What would be the best approach to bring the project to life? I would like to hear the opinion of the community about it.

Thanks for sparking this conversation, @imontalvo - I’m glad that you are enthusiastic and optimistic! I could perhaps divide your comments into two categories: Structural and Procedural. Structural issues would include questions of who is involved, to what degree, how to interact and get more participation / interest / corporate cooperation. Procedural issues would essentially refer to the “road map” or product backlog.

I would argue that the road map is of more immediate importance, since that is going to draw the interest of people who have something to say about it, or have the hard-earned experience that this project needs. I’m very much looking forward to the discussion of both the ultimate goal (a modern, stable engine for hydraulic and water quality analysis) and the path that we take to get there.

@samhatchett I agree with the ultimate goal. The first step would be to have a first stable release with the few fixes and new functions we have so far. Are the language and OO questions structural or procedural issues? These questions could be answered once we know what the objectives are, and in my view this engine should be:

  1. non proprietary
  2. stable
  3. super fast
  4. cross platform
  5. easy to use, and from as many languages as possible
  6. easy to maintain and contribute to

I think I got the order too…

I totally agree about the importance of the roadmap and about first defining at least a preliminary idea of what we (as a community) would like to reach/have in our next version (@jjoseng made a very good point about it from the hydraulic point of view). I would suggest making a separate backlog, visible to everyone, with those expectations/ideas/desires/needs, and also a visible roadmap indicating what has been decided for the next release (out of everything already in the backlog). Then new potential contributors will have a better view of what they could do next.

Following the ideas of @eladsal I would add that the solution should also be:

  • testable
  • scalable (vertical and horizontal scalability)
  • easy to divide into different tasks for several developers
  • following the SOLID principles of object oriented design.

@imontalvo can you explain what you mean by ‘scalable’?

In my view object oriented is not an objective but a methodology which may, or may not, help us reach our objectives.

What I mean is the capacity of the new engine to scale so that it can efficiently handle a large number of water distribution system analyses. For example, in the context of water distribution system optimization using evolutionary algorithms, the evaluation of objective functions is done mainly sequentially, because with the current EPANET toolkit it is not directly possible to evaluate different scenarios/solutions in parallel. I agree that the current toolkit is very efficient at evaluating a single scenario/solution, but when we are talking about >100K evaluations it makes sense to use parallelization/distribution. On the other hand, I think that having an engine able to work in a distributed environment and/or a cloud environment would also be a good point to consider. That's why I was talking about both horizontal scalability (increasing capacity by adding more “machines” to the pool of resources) and vertical scalability (increasing capacity by adding CPUs/memory to a machine).
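To make the evolutionary-algorithm case concrete, here is a minimal Python sketch of evaluating a population of candidate solutions concurrently instead of one after another. Everything in it is an assumption for illustration: `Candidate`, `evaluate` and `evaluate_population` are hypothetical names, and `evaluate` is a toy stand-in for a hydraulic run, not a call into the EPANET toolkit. The pattern only works if each evaluation owns its state and shares nothing with the others.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate solution: one pipe diameter (mm) per pipe."""
    diameters: tuple


def evaluate(candidate):
    """Toy objective function standing in for a full hydraulic simulation.

    It only reads its own argument, so evaluations are independent.
    """
    return sum(d * d for d in candidate.diameters)


def evaluate_population(candidates, max_workers=4, executor_cls=ThreadPoolExecutor):
    """Evaluate all candidates through a worker pool, preserving input order."""
    with executor_cls(max_workers=max_workers) as pool:
        # map() hands each candidate to a worker and returns results in order.
        return list(pool.map(evaluate, candidates))


population = [Candidate((100, 150)), Candidate((200,)), Candidate((80, 80, 80))]
costs = evaluate_population(population)
```

The pool type is a parameter on purpose. A thread pool is used here to keep the sketch simple; for a CPU-bound engine, a process pool or a set of machines would follow the same shape, which is roughly the difference between vertical and horizontal scaling described above.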

@imontalvo Can we put it in EPANET’s terms?

I think that to scale horizontally means that the engine can solve the same large network using more than one computer node, while to scale vertically means, as you wrote, the ability of the engine to use more memory/CPU on the same computer node, right?

For a “large network” running steady state analysis (or extended period simulation the way EPANET does it today) it is probably not worth splitting the calculation across different computers (how much would we gain in performance for the effort?). Running another type of analysis (transient analysis/water hammer) could be something different, but it is not included in what EPANET does today, and anyway I think it would be enough in those cases to use several CPUs instead of a system distributed over several computers. Today there are many people working on online water distribution system analysis (including online calibration, event detection, source identification of contamination events, operation optimization…). In my opinion, a distributed platform working with a software as a service concept could be a good approach to support those calculations. This is where I think that a distributed environment (several computers managing data and running different scenarios/simulations/tasks) is the right way to go (and also, as I mentioned before, in the “offline” cases [EA optimization] where a large number of simulations is required). Note that the scope here would be totally different from a user running a single model on one computer.

I’m not saying that we need the engine to be able to split the calculations of a single network across a number of computers. I’m just trying to “translate” your scalability requirement into something I can understand. Do you think the EPANET engine should include a “distributed platform working with a software as a service”? What should the engine have to support this?

Perhaps what we are saying is that we want an architecture which allows the EPANET engine to run just as well in a standalone desktop or mobile application as in a scaled-up distributed system. I completely agree, but I also should say that we need to set manageable goals. I’m not really interested in committing to this ultimate vision of delivering a completely open-source SaaS platform for optimization. I would however be interested in delivering an EPANET that is fit for the job.

But the starting point is the same: a modern, stable simulation engine. How this engine becomes incorporated in desktop applications or server farms (or even the iPad Pro for that matter) will undoubtedly influence the development path.

The convenience of a software as a service approach depends on what the end goal is, what is expected from the new version, and what the needs of water utilities are. Let's set the goals first and then it will be possible to answer the question properly. If the answer is yes, it would be better to rewrite the engine on a platform/language/technology that supports the requirements.

In my mind, considering what have been the strengths of EPANET in the past should help guide its future development. It seems to me that EPANET has been popular because:

  • It has a free version with at least basic GUI functionality
  • The engine is fast
  • It is pretty stable
  • Software vendors can build a profitable business around adding tools and functionality to the basic engine, thus adding momentum and popularity to the platform.

I have only been in the modeling world for about 10 years, so there may be other factors that I am not thinking of from a historical perspective.

If it is decided to rewrite the software in another language, there are two facets: GUI and engine. Maybe a different approach is needed for each facet?

Exactly.

Yes, some of the goals are here and a few more here (but not the OO part in my view).

Good code does not have to be OO, but OO certainly could help a lot when writing software that could grow large over time. OO is not going to be a guarantee of success and it wouldn't be directly applicable to the existing code. In my opinion OO is something to consider if a good part of the engine is rewritten; otherwise it would probably be better to make the changes/development on top of the existing code. If I had to vote, I would go for an OO approach. I also totally agree with @markwilson about keeping the strengths EPANET has had in the past. Let's write a roadmap and get to work :smile:

I did not see a date for the first Steering Committee meeting. Is it scheduled?

For me, a good starting point of the roadmap would be to expose all network data to the user (get and set).
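As a rough illustration of what "get and set on all network data" could look like from the user's side, here is a small Python sketch. It is purely hypothetical: the `Network` class, its method names and the property names are invented for this example and are not the EPANET toolkit API.

```python
class Network:
    """Hypothetical in-memory network model (not the EPANET toolkit API)."""

    def __init__(self):
        # element id -> {property name -> value}
        self._elements = {}

    def add_element(self, element_id, **properties):
        """Register a node or link together with its initial properties."""
        self._elements[element_id] = dict(properties)

    def get(self, element_id, prop):
        """Return the current value of one property of one element."""
        return self._elements[element_id][prop]

    def set(self, element_id, prop, value):
        """Overwrite an existing property; unknown names raise KeyError."""
        if prop not in self._elements[element_id]:
            raise KeyError(f"{element_id!r} has no property {prop!r}")
        self._elements[element_id][prop] = value


net = Network()
net.add_element("P1", diameter=150.0, roughness=100.0)
net.set("P1", "diameter", 200.0)
```

The point of the sketch is symmetry: anything that can be read with `get` can also be written with `set`, so no part of the model is hidden from the user.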

I like that @eladsal, let's write it down and put it in a single place with all the proposals regarding the roadmap (and create a backlog too). @samhatchett, do you think it would be good to create it here on this site, or to use the GitHub wiki or something else? I think a clear view of the roadmap, as you mentioned before, is really important.

No idea about the steering committee meetings. What about the development committee? @eladsal? @samhatchett? Any meeting scheduled?

the steering committee has met a couple of times so far (this is composed of the Officers from the WDSA standing committee of the EWRI/ASCE). i would not say that things are organized enough to have formal minutes from the meetings, but in short they are generating ideas for how to gather interest and participation. the recent panel discussion at CCWI came out of those meetings, so i would say they are doing a great job getting the word out. if you have ideas or suggestions, @dboccelli is the WDSA chair and a good person to contact.

as for the development committee, we have not had formal meetings, but interact frequently on this forum and on the github project site. these discussions that take place here are the real activity and the future of the OSS development effort. i see the dev-com’s job as promoting a structure and a process to enable everyone to participate in a healthy, orderly, and productive manner.

as for the roadmap, i’m not sure what format would be best to have a central itemized list with some ability to poll and comment. any suggestions?

How about working towards a workshop at the WDSA conference (https://wdsa2016.uniandes.edu.co) in Cartagena in July 2016 for finalising the roadmap to a new version of EPANET? We can webcast the proceedings and take comments via email/Skype for those that can’t attend.