There are a number of data providers that provide data files broken down into different geographical areas. In previous posts I have outlined how we can get these all into Postgis. But once they are in postgis (or any other database) you may wish to get these separate tables into one single global table. Clearly a union query will do this, however it can be time consuming to write the union query out as it simply has so many tables in it.
I used the code in the link to scan a directory and get all the filenames (in this case shape files of the UK road network) into a table that I called UKRoadLinks which had two fields PKID (primary long integer autonumber) and Filen text field where Filen were the filenames.
I then wrote the following function to write a text file that on completion will contain an sql union of all the tables listed in your recordset – I then copied and pasted this into Postgis database within which I had already imported all the sub tables to union the tables into a single copy. Alter the recordset source if for instance if you wish to use only a subset. The nice thing about this is if you have hundreds of tables to amalgamate there should be less likelyhood of you accidentally missing or misspelling table names.
Public Function createunionsqllinks()
Dim rst As DAO.Recordset
Set rst = CurrentDb.OpenRecordset("UKRoadLinks")
Dim fs, TextFile
Set fs = CreateObject("Scripting.FileSystemObject")
Set TextFile = fs.CreateTextFile("c:\data\sqlmerge.txt", True)
TextFile.WriteLine ("CREATE TABLE sqltomergetables AS ")
Do Until rst.EOF = True
TextFile.WriteLine (Chr$(40) & "select * from " & rst!Filen & Chr$(41) & " UNION ")
Demonstration environment and programs
Postgres Version : 11.2
QGIS desktop version : 3.4.4
My working through of a process to export a single database (structure and data) from a Postgres Instance, the database has PostGIS and pgrouting extensions enabled, followed by importing into in this example the same instance but in principle could be a different instance.
Access the command prompt (RUN AS ADMINISTRATOR)
PLEASE NOTE run the command prompt as administrator or you will get frequently get an ACCESS DENIED message after using pg dump command.
Navigate to the directory of the PostgresVersion from which you wish to export the database. This will typically be the bin subdirectory of the version of your postgres ( here 11 ). You can ensure that pg_dump.exe is here if you do a dir on the directory to reveal alternatively you could reference the full path to pgdump and then pass the parameters to it subsequently.
Next place in the parameters of the database what database you wish to export along with the name that you want to call the exported file and then hit return.
Pressing return depending on your security you should be asked for your password.
Once this is done it goes through a process of recreating the structure of the database then importing all the data
For me the first lines look like this
and the last look like this
Now looking at the instance as a whole we can see the imported database
and here I am displaying geographical information through QGIS to get an idea of the data and ensure that it appears to be all correct.
There are quite a lot of tutorials online on how to do this but most seem to skip over some details - I've tried to be as accurate as possible but depending on you setup there may be differences. Nonetheless this is an extremely important task to perform so worth practicing to get right.
This places all tables and fields into an excel file on a single worksheet as a single table.
Public Function TableDef()
Dim def As TableDef
Dim wb As Object
Dim xL As Object
Dim lngRow As Long
Dim f As Field
Set xL = CreateObject("Excel.Application")
xL.Visible = True
Set wb = xL.workbooks.Add
lngRow = 2
For Each def In CurrentDb.TableDefs
For Each f In def.Fields
.Range("A" & lngRow).Value = def.Name
.Range("B" & lngRow).Value = f.Name
.Range("C" & lngRow).Value = f.Type
.Range("D" & lngRow).Value = f.Size
.Range("E" & lngRow).Value = f.Required
lngRow = lngRow + 1
A function that will loop through an access database and export all tables to csv and xls.
Useful for subsequent import through QGIS into Postgres.
Public Function ExportAll()
Dim obj As AccessObject, dbs As Object
Dim strFolder As String
strFolder = "c:\"
Set dbs = Application.CurrentData
For Each obj In dbs.AllTables
If Left(obj.Name, 4) <> "MSys" Then
DoCmd.TransferText acExportDelim, , obj.Name, strFolder & obj.Name & ".csv", True
DoCmd.TransferSpreadsheet acExport, acSpreadsheetTypeExcel9, obj.Name, strFolder & obj.Name & ".xls", True
If using the dijkstra function with direction turned on it is important to identify the order in which the nodes of a vector line have been digitised. This is called the direction, dijkstra can use this with a reverse_cost attribute to handicap wrong movement along lines to such an extent that the correct path can be calculated around things like roundabouts.
Here is an example of the roundabout in Straiton in Edinburgh just North of the A720 bypass. While some of the lines have a correct anti clockwise orienation clearly some have been incorrectly digitised.
First we can see this by displaying the network in QGIS but using the styling to arrow the direction.
The function that can be used to reverse such inaccuracies if you can’t resort to buying a correct dataset try ST_REVERSE
RAISE NOTICE can provide the same function as Message Box in VBA ie you can use it to comment on the progress of a script. RAISE NOTICE is not supported by SQL so you can’t place it in scripts containing SQL they need to be in plpgsql scripts. This isn’t too much of a hassle as the way I am working at the moment I am calling the SQL anyway from plpgsql so I can place my message boxes in there.
No VBA Ok buttons.
CREATE OR REPLACE FUNCTION noticeexample() returns void as $$
RAISE NOTICE 'ONE FINE DAY IN THE MIDDLE OF THE NIGHT';
When creating a topology the geometry field cannot contain a Z coordinate.
OK but the Ordnance Survey Open Data highways layers containse a Z coordinate. Previously I had stripped this out using the latest version of QGIS which has a tick box in the front end that allows for import stripping of the z coordinate in the process. If you don’t have access to the latest QGIS version how can you strip out the z coordinates.
ALTER TABLE public.nuroadlink ADD COLUMN geom2(multilinestring,27700);
UPDATE public.nuroadlink SET geom2 = ST_FORCE2D(public.nuroadlink.geom);
ALTER TABLE public.nuroadlink drop column geom;
ALTER TABLE public.nuroadlink RENAME COLUMN geom2 TO geom;
I had assumed after I had created a working SQL Script I would just be able to wrap the whole thing easily into a function and then bang it would be off to the races.
My script really needed to be run in order and for some as yet undefined reason I was getting particular errors where a table would be created and then a following query would add or alter that table. It looked like the second query was trying to adapt the table prior to its creation with an inevitable error.
I managed to get it working by making each SQL Query a function and then scripting the functions consecutively in a separate function using the PERFORM instruction.
I incorporate into this the check_function_bodies switch which just allows the creation of sql referring to objects that may not be in existence yet.
SET LOCAL check_function_bodies TO FALSE;
CREATE OR REPLACE FUNCTION query01() returns void as $$
CREATE TABLE t001start
pkid serial primary key,
$$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION query02() returns void as $$
CREATE TABLE t002end
pkid serial primary key,
$$ LANGUAGE SQL;
And then subsequently I create a function that runs the functions.
CREATE OR REPLACE FUNCTION runallthequeries()
returns text as
RETURN 'process end';
I was trying to write a script that would allow me to measure distances to schools and my original script gradually built up tables that were subsequently deleted. Worked fine in one big sql script but when I tried to convert this into a function so that it could be more easily stored with the database I kept on getting error messages stating that it was not possible to create sql that referred to objects that did not exist. Postgres validates functions and will at default prevent creation of functions containing SQL that refers to objects not yet in existence.
Postgres does not however save dependencies for code in the function body. So although once the function is created the tables and views can be dropped (and the function still exists) in default you need a set of tables in place with default settings before the function can be created. One workaround would be to create dummy tables and views in advance and later drop them but this if often clunky and awkward. Luckily this validation can be turned off.
SET LOCAL check_function_bodies TO FALSE;
CREATE or REPLACE FUNCTION examplefunction() Returns void AS $$
CREATE TABLE t001 (pkid serial primary key, field1 varchar(20));
$$ LANGUAGE sql;
This parameter is normally on. When set to off, it disables validation of the function body string during CREATE FUNCTION. Disabling validation avoids side effects of the validation process and avoids false positives due to problems such as forward references. Set this parameter to off before loading functions on behalf of other users; pg_dump does so automatically.
Now lets go to QGIS connect to the PostGIS instance add the tables and create some test data manually.
Here I have added fields in green with bold number labels and plots in brown with smaller number labelling. The numbers represent the pkid fields.
Now here I can quickly run a query to identify the plots that are in fields
And it correctly identifies that plot 1 is within the fields layer.
But what would be great in an application is to have some kind of junction table that individual master records could display their children on. For this we need a junction table that links between the field and plots table showing the pkids from each.
SELECT t00002plots.pkid as Plotspkid,t00001fields.pkid as Fieldspkid
Now I will move plot 2 into field 3 and rerun the above.
The layer now looks like
and running the former query we get.
Now its possible to either create a junction table to hold this information..
CREATE TABLE t00010fieldplotjunction AS
SELECT t00002plots.pkid as Plotspkid,t00001fields.pkid as Fieldspkid
or we can create a view that will constantly calculate this everytime it is seen
CREATE VIEW v001FieldPlotJunction AS
SELECT t00002plots.pkid as Plotspkid,t00001fields.pkid as Fieldspkid
Now if I add a few more plots and fields and then pull up the view we shall see that everything has been adjusted
and running the view we now get
In some circumstances this calculation may be expensive so we may wish to run and create a junction table overnight other times we may be happy to do it fully dynamically. Of course in a front end you could query and filter such that only one record was compared against the fields plot at anytime. Very useful nonetheless.
In 2017 I was involved in an important work project to transfer all the records in a legacy system that was being deprecated by the vendor into another maintained system. We were in some ways fortunate because both systems had been designed by a single company and they were encouraging us to transfer. We had delayed transfer for several years already but were aware that we now had to move. The vendor did have some tools in place , had staff dedicated to such transfers and were offering favorable consultancy rates. The amount of data was not horrendous in computing terms but they were far far beyond the remit of the ability to cope with any sort of manual data correction and the system was an absolute core system upon which several departments completely depended. These were systems that all departments are in from the moment they start the work day to the end. Generally its unusual if they are down for more than 5 minutes in a month, all work pretty much stops when they stop and in no circumstances could they be down for more than a day without special dispensation and coordination to indicate to manage customer expectations.
The whole project was a success although it was challenging. Here is an outline of the steps we took. As ever order here is important in most of the steps.
Inform managers of all involved sections and ensure they are on board – identify and ring fence budget
Appoint project manager on vendor and client side
draw together team to perform transformation.
Draft time table creation of how long it will take putting in place planning for tutorials on systems and consultancy.
Request managers to put forward staff on all sides willing to be involved
Identify any omissions in knowledge and start to identify how this can be remedied. Kick off and complete acquisition of said staff.
Meeting with lead staff to confirm buy in. Request buy in from staff including ring fencing of holidays etc.. to ensure key staff are available at required times.
Set up test systems that all individuals have access to and ensure that the old and new systems can be viewed simultaneously by individuals. Ensure that the domain specialists can identify processes that will be mirrored from the old system to the new system
Give DBAs or those that will be doing data transfer access to databases of source so that they can start thinking of how they can pull out information.
Training for all individuals concerned in new systems.
In new system start tasking individuals with how they are going to do the simple processes – eg register a record approve a record alter a record and get reports out. If possible allow new champions to start to define things like reports.
Start making up any new lookup fields compared with old lookups and also start tasking individuals with creation of reports and letter that will need to be done.
Start mapping the data from old system to new system – excel spreadsheets can be used for this that show the data going from the old system and what fields they are going to go into in the new system. Divide this task up between domain users – this step needs to be done after old and new systems are on domain users machines. As part of this the applications in question should expose if possible the table and field names of the source and target fields. With the systems we were involved in this was possible both for the old and new systems.
For each form on the two systems try to identify the below
Source table.field Target table.field
Also get them to map the lookup table values if direct transfer is not possible or if alias id are used in these lookups.
Give both mapping documents to the ETL people to allow them to start writing the queries. It is unlikely that there will be a straight transfer across from table to table. While it would be expected that field and table names will be completely different it will be expected that table structure will in certain places be different in this respect it would be good to have a really nice schema diagram of both source and target.
Allow data individuals to write scripts that can be run live against present initial system – if necessary doesn’t need to be live live could copy every night and then perform on 1 day old database backend – which is what we did. This means work can go on in old system and then at a touch of a button.
Encourage DBAs to be able to run these scripts every day to ensure that running them for go live is absolutely no issue. Our scripts only took about half an hour to run so this wasn’t an issue. I was personally involved in writing the SQL for those and I had systems in place to cross tab the amount coming into each new table so I could see new records and information from the old system trickling manually into the system and then being transferred.
Test data input into new system
Check test data input into new system with reference to domain users.
Confirm go live date ensure staff available for issues
Go live to production and start all new procedures ensure staff technical and domain key players on hand to make flexible solutions to things
Project review on going maintenance and improvement of new system
After suitable time turn off of old system if possible.
I wanted to be able to run thousands of queries or hundreds from Postgres like I can in MS Access this didn’t turn out to be too difficult.
Here’s something that works firstly lets create a new database
CREATE DATABASE sqlloopdb;
You will then need to connect to the database.
Next I will create 2 tables with; One table to be called t001sqltarget – this is the table we shall change with queries. One table called t002sqlrun – this will contain queries that we will run.
Please note the field names are important as well but I will let you study them in the code.
I then have 4 inserts that place valid SQL strings into the field sqltorun.
CREATE TABLE t001sqltarget (pkid serial primary key, fieldforupdate varchar(1));
CREATE TABLE t002sqlrun (pkid serial primary key, sqltorun varchar(1000));
INSERT INTO t002sqlrun(sqltorun) values ('INSERT INTO t001sqltarget(fieldforupdate) values (1);');
INSERT INTO t002sqlrun(sqltorun) values ('INSERT INTO t001sqltarget(fieldforupdate) values (2);');
INSERT INTO t002sqlrun(sqltorun) values ('INSERT INTO t001sqltarget(fieldforupdate) values (3);');
INSERT INTO t002sqlrun(sqltorun) values ('INSERT INTO t001sqltarget(fieldforupdate) values (4);');
First lets run the above and see what we have. Below you can see that I create the database then connect to it before opening the editor from which I run the above code I then take a look at the tables in the database and run a select to return all the records within the t001sqltarget table of which there are none.
Now lets run the following code and then take a look at t001sqltarget.
FOR stmt IN
SELECT sqltorun FROM t002sqlrun
And after running there are 4 lines in the table
Every time I run the Do code four more records will be added to this table. Any SQL could be included in t002sqlrun and this is a nice demonstration of what I had previously able to do in MS Access and is massively powerful. It could be used for instance to calculate multiple measurements.
Here we take much of the work covered in post 010 and take the parts and user st_union to merge into a single record and place it in a table created by transforming a view into a table
Firstly go to your psql line and ensure that you are logged in with a username that you wish to be the owner of the table. In my case general
Now same measurement as before but this time we shall make a view out of the measurements then load that into a new table before deleting the view leaving us with the table with a combined measurement.
CREATE VIEW v001firstmeasurement AS SELECT seq, id1 AS node, id2 AS edge, cost, geom, agg
FROM pgr_dijkstra( 'SELECT id, source, target, st_length(geom) as cost FROM public.t01roadnetwork', 15883, 10967, false, false ) as di
JOIN public.t01roadnetwork pt ON di.id2 = pt.id ;
CREATE TABLE t003 as select sum(cost), st_union(geom) from v001firstmeasurement;
DROP VIEW v001firstmeasurement;
It is important in notepad to remove the blank spaces in the editor this looks as follows.
We then should then get some kind of confirmation that the view and table are created before the view is then dropped again. There might be a more efficient way of doing this but this was my first experiment.
And we can go back to QGIS 3.4 and display the now single line in our project.
Complete with now accurate measurement.
It should be noted that if you were wanting to do multiple line measurements you would need to step out of the create statement and use an insert statement for all subsequent insertions as follows.
insert into t003(sum,st_union) select sum(cost),st_union(geom) from v001firstmeasurement;
This would allow you to do multiple measurments.
I haven’t added up the measurement but it looks about right.
I had been using the 2010 WordPress standard theme as the basis for Round Up the Usual Suspects but decided it was time to upgrade. I decided to go for the 2016 WordPress standard theme as it is so well tested and with a relatively large back catalogue my primary concern was that I could port everything forward as easily as possible. I will be working on making it as user friendly as possible.
Objective here is to write a series of queries that can be used to measure the shortest distance between selected paired locations on a network such that the geometry of the routes can be calculated and displayed on a map.
For this particular tutorial you will need – QGIS 3 or higher and a version of Postgres I am using version 11.0 here (I have upgraded since my former posts). I believe this tutorial will work with previous versions but if you are following along now might be a good time to upgrade.
QGIS 3.4 or higher – needed as the Ordnance Survey road network geometry contains a z coordinate which will prevent the creation of the required geometry for measurement. QGIS 3 introduced the ability to save geometry excluding z coordinate. If you have a network without z coordinates you should not require this.
So let us first get the data. Here you tick the option in the top right hand corner – scroll to the bottom and submit your request after which you will be asked a few basic questions along with email address you wish the download to be sent to after a few minutes you should be sent the download link through your email – follow the instructions and you should be able to get the information
The information you are downloading is a block framework for the whole of the uk. When you unzip the download into a folder you will see multiple files. We will be using a section of the national dataset relating to Edinburgh – NT. Choose the block or selection that you are interested in. More blocks may take more time however.
Open QGIS Create a new project : eg EdinburghRouting.qgz Load in your chosen network block : eg NT_RoadLink.shp
Select the layer you just loaded in : eg NT_RoadLink.shp
and navigate to the following in the menu settings Layer / Save As
Fill out the Save Vector Layer as … dialog box IMPORTANT – ensure within the Geometry section Geometry type is set to LineString Include z-dimension is unticked
Give the new file a name : eg ntosroutingnetwork.shp
Within the layer dialog of QGIS your new layer should appear you can now remove the for NT_RoadLink shape file from the project
Next go to your version of PostgreSQL and using a superuser account create a new database : eg edinburghrouting
I would suggest you use lower casing as well
As a superuser ensure you add the postgis and pgrouting extensions.
Next I set up the following connection between the QGIS project and PostgreSQL
Personal tastes may vary but I like like to select Also list tables with no geometry Allow saving/loading QGIS projects in the database
OK the selection and you should now have a connection to the new database you just created.
QGIS has an excellent dbmanager window which we will use to load our new shape file which excludes the z layer into the new database we created in PostgreSQL
Ensuring that you have a connection to your localpostgis database hit the
Here I load the information into a new table t01roadnetwork
On pressing OK there will be delay after which if things go well you will receive the following message.
As ever it is good to check that things appear to be going well. Add the layer to your project and determine approximately whether import was successful.
Next back in psql command line and in an editor we are going to run 4 queries The first 2 add columns that are required in the shortest distance algorithm we shall use, the third will allow anyone to write an aggregation function to see the total cost of the route and the last creates a topology for the road network.
alter table public.t01roadnetwork add column source integer;
alter table public.t01roadnetwork add column target integer;
alter table public.t01roadnetwork add column agg smallint default 1;
select pgr_createTopology('public.t01roadnetwork', 0.0001, 'geom', 'id');
If things go correctly you should see the database engine start to create the topology and what I see is it gradually stepping through the creation process.
and on completion you should have something like the following:
A new table has been added to the edinburghrouting database and next step is to display the network and its vertices. In QGIS.
In QGIS we should see something like
The next thing that I like to do is to label the nodes so that for quick identification.
And look to the t01roadnetwork table and see if the columns are clear and present.
We are now ready to make a measurement. Here I choose the nodes 15883 and 10967
SELECT seq, id1 AS node, id2 AS edge, cost, geom , agg
'SELECT id, source, target, st_length(geom) as cost FROM public.t01roadnetwork',
15883, 10967, false, false
) as di
JOIN public.t01roadnetwork pt
ON di.id2 = pt.id ;
Now we can load this as a new layer and then improve the symbology
Doing this we get.
It should be noted that the line you see is a collection of lines. In my next post I will go through and indicate how we can amalgamate that into a single line for storage in a table.
Congratulations if you have got this far you should be able to measure the shortest distance between any two points on a valid network by altering the numbers.
Note: As of PostgreSQL 9.1, most procedural languages have been made into “extensions”, and should therefore be installed with CREATE EXTENSION not CREATE LANGUAGE. Direct use of CREATE LANGUAGE should now be confined to extension installation scripts. If you have a “bare” language in your database, perhaps as a result of an upgrade, you can convert it to an extension using CREATE EXTENSION langname FROM unpackage
As per the nature of recursion a function is a variable is a function.
Previously in 005 and 006 we wrote functions that returned subsets of queries they were effectively dynamic queries where I entered a parameter that was used in a select query. This effectively meant that although the function was returning a variable this was a query of a select statement.
What if we wish just to return a single value as in say translate centigrade to fahrenheit or some other calcuation.
In such case you simply state the function should return a variable and you state the variable type.
CREATE FUNCTION add(integer,integer) RETURNS integer
AS 'SELECT $1 + $2;'
RETURNS NULL ON NULL INPUT;
There are a few interesting things here which should be born in mind
In this case the addition is performed in SQL
I have to specifically name the language of the calculation (SQL) which suggests that if you stated another language it might accept it!
You still need to select the function to run it. This indicates that postgres 9.5 doesn’t executve functions as per some environments (ms access for example) – I have read that version 11 changed this and allows you to execute or perform a function. Ms Access you don’t even need to write execute simply the name of the function with the integers.
Variables are refererred to as to their input position unlike VBA where you dimension the variable and give it a name. I am unclear at the moment the advantages of the former or later but it is interesting nonetheless. I first came across something similar with autohotkey.
In 005 when last we left our intrepid explorers we were wondering if having defined an inline table that contains the definition of the selection purely in the user defined function how do we see what that selection is as it might not be a presently defined object. SetOF references an object we can execute independently of the function Table() – does not.
Well apparently magically you can run the following.
Like all platforms it is possible to create bespoke functions in Postgres
For the following I assume;
1. Postgres 9.5 is installed with the server running (syntax should be the same for other versions)
2. A database called exampledb has been created
3. In this database there exists table called t001landparcels with some records and a field called PKID
4. You are logged into the exampledb with a username called general that has been granted CREATEDB role.
5. You are in psql
The following can be used to create a simple function
CREATE FUNCTION getrecords(int) RETURNS SETOF t001landparcels AS $$
SELECT * FROM t001landparcels WHERE pkid <= $1;
$$ LANGUAGE SQL;
If you have been careful and done this exactly the same with my initial assumptions it should return
This tells us that the functioned has been created this will now exist in the schema the table is as a permanent addition.
We can identify the function by either listing all the functions and scrolling through
or listing the individual function
This should return your newly created function
Now to run the function – unlike MS Access you can’t simply run the function you need to allocate it to a select statement.
SELECT * FROM getrecords(2);
This should return everything you are looking for.
Now you should be able to drop the function using the following SQL
DROP FUNCTION getrecords(int);
Note how you have to define the function with its parameter I have read (no idea whether its true) that in version 10 of postgress you can simply use
DROP FUNCTION getrecords;
Writing code in psql does require accuracy so getting things to work does usually involve some experimentation. I have removed much of this from my screenshots!
An alternative is as follows
CREATE FUNCTION getrecords(int) RETURNS TABLE (pkid integer, parcelname text) as $$
SELECT pkid, parcelname FROM t001landparcels WHERE pkid <=$1;
$$ Language SQL;
This appears to result in the same answer I am not clear what the difference is yet – note the result would have been the same if I had defined the table with the additon of a geometry column.
Note I dropped the old getrecords function before I created this one. Not sure what would have happened if I had tried to create one over the other.
I found this second method in stackoverflow when investigating functions with the following to me slightly mysterious quote
This is effectively the same as using SETOF tablename, but declares the table structure inline instead of referencing an existing object, so joins and such will still work.
Which sounds to me as important but I’m struggling at present to understand its meaning!