Visualization and data mining tools applied to algal biomass prediction in Illinois streams.
Large amounts of hydrologic, geographic, meteorological, water quality, soil type, land-use and many other types of data are available to water scientists and practitioners. These abundant and often multidimensional datasets can be analyzed using sophisticated modeling techniques that might require powerful computers to handle the computation.
Various data mining tools help us better understand the data and methods, better interpret the results, and more accurately predict the future values of hydrologic variables, and thus make better water planning and management decisions.
The Image Spatial Data Analysis (ISDA) group at the National Center for Supercomputing Applications (NCSA) has been working with the Illinois State Water Survey (ISWS) on a set of visualization and data mining tools for water resources research and applications.
The tools are applied to predict algal biomass using nutrients and other explanatory variables.
Several methods for extracting variables from remote sensing data, clustering variables, and modeling relationships between variables with data-driven models, such as Naive Bayes or decision trees, were explored with the observed nutrients, algal biomass and other data. Furthermore, solving the algal biomass prediction problem required several heterogeneous software tools to be executed and linked together with various datasets. Thus, we have also introduced a software process management technology for performing algal biomass prediction with heterogeneous visualization and data mining software tools.
The problem of algal biomass prediction in Illinois streams lies in explaining the variability in algal biomass, measured as chlorophyll a, based on nutrients (total or dissolved nitrogen, and total or dissolved phosphorus) and other variables (water velocity, canopy cover along the streambank, stream width/depth, etc.). Algae are either the direct or indirect cause of most problems related to nutrient enrichment.
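To make the idea of a data-driven model concrete, here is a minimal sketch of a Gaussian Naive Bayes classifier that assigns a stream sample to a "low" or "high" chlorophyll a class from three explanatory variables. It is an illustration only, not the project's software: the function names, the two-class framing, the choice of features and every number in the sample data are assumptions made for this example.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # A small floor on the variance avoids division by zero.
        variances = [
            max(sum((x - m) ** 2 for x in col) / n, 1e-9)
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one sample."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            # Log of the Gaussian density, summed over independent features.
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical samples (NOT real measurements):
# [total nitrogen mg/L, total phosphorus mg/L, canopy cover fraction]
rows = [
    [2.0, 0.05, 0.8], [2.5, 0.08, 0.7], [3.0, 0.06, 0.9],
    [8.0, 0.40, 0.2], [9.5, 0.55, 0.1], [7.5, 0.35, 0.3],
]
# Hypothetical chlorophyll a class for each sample.
labels = ["low", "low", "low", "high", "high", "high"]

model = fit_gaussian_nb(rows, labels)
print(predict(model, [8.5, 0.45, 0.2]))
print(predict(model, [2.2, 0.06, 0.8]))
```

A classifier of this kind treats the explanatory variables as conditionally independent given the class, which is a strong simplification for correlated quantities such as nitrogen and phosphorus; a decision tree would be one alternative that does not make that assumption.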
Our study uses a dataset for the entire state of Illinois, consisting of numerous nutrients, chlorophyll a (green) data and other variables. Although these long-term ambient datasets are incomplete and do not necessarily contain storm-event data, they represent the best currently available datasets for testing the results of this study in Illinois.
Technical
The algal biomass prediction problem can be described as a sequence of processing steps to establish data-driven models (relationships) between input variables and algal biomass growth, and to provide computer-assisted interpretation of the models supported by visualization for water scientists and practitioners. The flow of processing steps is illustrated above. The overarching goals of the analysis are (a) to predict algal biomass from multiple measurements gathered using water gauges, remote sensors and other instruments with unsupervised learning and supervised modeling techniques and (b) to improve users' understanding of algal biomass spatial and temporal variability.
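As a rough sketch of the unsupervised step in such a sequence, the example below standardizes two nutrient variables, groups sampling sites with k-means clustering, and then summarizes chlorophyll a per cluster. The text does not say which clustering algorithm was used, so k-means is an assumption here, as are the function names, the fixed starting centroids and all of the sample values.

```python
import math


def standardize(rows):
    """Rescale each column to zero mean and unit standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant (standard deviation of 0).
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, init_indices, max_iter=100):
    """Plain k-means; initial centroids are the points at init_indices."""
    centroids = [list(points[i]) for i in init_indices]
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid (Euclidean distance).
        new_assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # No point changed cluster, so the result is stable.
        assignment = new_assignment
        # Move each centroid to the mean of its current members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical site data (NOT real measurements):
# (total nitrogen mg/L, total phosphorus mg/L)
sites = [(2.0, 0.05), (2.5, 0.08), (3.0, 0.06),
         (8.0, 0.40), (9.5, 0.55), (7.5, 0.35)]
# Hypothetical chlorophyll a values for the same sites.
chlorophyll_a = [5.0, 7.0, 6.0, 40.0, 55.0, 46.0]

assignment = kmeans(standardize(sites), init_indices=[0, 5])

# Mean chlorophyll a within each cluster.
cluster_means = {}
for c in sorted(set(assignment)):
    values = [v for v, a in zip(chlorophyll_a, assignment) if a == c]
    cluster_means[c] = sum(values) / len(values)
print(assignment, cluster_means)
```

Standardizing first keeps the variable with the larger numeric range (nitrogen in this made-up data) from dominating the distance calculation. The starting centroids are fixed so that the sketch is deterministic; a practical run would typically try several initializations.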
- Peter Bajcsy, Research group ISDA, National Center for Supercomputing Applications, UIUC
- Momcilo Markus, Illinois State Water Survey
- Rob Kooper, ISDA, National Center for Supercomputing Applications, UIUC
- Luigi Marini, ISDA, National Center for Supercomputing Applications, UIUC
- David Clutter, ISDA, National Center for Supercomputing Applications, UIUC
- Qi Li, ISDA, Computer Science Department, UIUC
We acknowledge the support of the NCSA Faculty Fellow Program for this work.