
Develop and Extract Value from Open Data

Open data is fostering new opportunities for innovation, both in entrepreneurship and in public service. AWS embraces open data and provides the tools to develop and extract value from it in a single place. This includes hosting public datasets at no cost on Amazon Simple Storage Service (Amazon S3).

In this blog post, we explore a use case for government organizations using the OpenStreetMap (OSM) dataset: a free, editable map of the world, created and maintained by volunteers and available under an open license. Using open source tools, we generate and render custom maps for a government’s digital properties. By using Amazon S3, Amazon EC2, Amazon ECS, and a multi-tier architecture, the map tile server can run on efficient, highly available infrastructure.

Generate and render map tiles with OpenStreetMap

Governments often provide geographic information to users on their webpages. A country’s ministry of foreign affairs may display a map with the location and contact information of each of its embassies around the world. In other cases, cities may use maps to provide directions and other details about tourist attractions.

OSM can provide agencies with geographic services at no data licensing cost and with full control over how they use the data. This blog post explains how to use Amazon S3 to access OSM data, how to use an EC2 instance to generate and render OSM map tiles, and how to build a multi-tier, highly available architecture to serve map content.

Generating tiles from OSM data requires a number of open source tools. In addition to a PostgreSQL database, these include:

  • PostGIS: a spatial database extender for the PostgreSQL object-relational database
  • osm2pgsql: a tool that imports OSM data into PostGIS-enabled PostgreSQL databases
  • Mapnik: a C++ map-rendering toolkit with bindings for Node.js and Python
  • renderd: a rendering daemon that uses Mapnik to render OSM tiles
  • mod_tile: an Apache module that serves map tiles and asks renderd to render missing ones
  • OpenLayers: a JavaScript mapping library that supports tiled layers and markers

Using Docker containers to render OpenStreetMap tiles

OSM provides an overview of how to install, configure, and use these tools from scratch to render tiles. For the purpose of this post, we use a pre-built container image that bundles the tools needed to generate map tiles from OSM data.

For this example, we use a community container image from GitHub, built by the National Center for Atmospheric Research Earth Observing Laboratory (NCAR EOL) and based on the openstreetmap-docker-files image. The process is outlined below:

To begin, we start a Linux EC2 instance, where the rendering job will run. After the tiles are created, they can be moved to S3, and the EC2 instance can be stopped until the next time we need to generate tiles.

Because tile rendering is CPU-intensive, we recommend an instance from the compute-optimized C family. We also need an EBS volume attached to the instance with enough capacity for the raw OSM PBF data and the PostgreSQL database.

Estimating infrastructure requirements before rendering tiles

In this example, we render the OSM PBF extract for Spain from Geofabrik, which provides OSM map extracts. While we focus on a single country, the data for the whole planet is available directly on S3, accessible with any S3-compatible tool, and can be rendered with the same process.

We need around 100 GB of storage for Spain; if you are going to build the whole planet, plan for up to 400 GB. As shown in the image below, an EBS volume of 500 GiB is more than enough space to store the raw PBF data and the PostgreSQL database. After creating the volume in the console, attach it to the instance.

With the instance ready and the EBS volume attached, two steps remain before we can begin the actual rendering job:

  1. Install Docker and Docker Compose on the Amazon Linux instance. The container image runs using Docker Compose, so both are needed.
  2. Copy the PBF file with the OSM data to the new EBS volume. You can download it from the link Geofabrik provides, or copy it with the AWS CLI S3 commands described on the OpenStreetMap on AWS information site.

Time to render

We are now ready to start rendering. Initialize the PostgreSQL database with the following command:

docker-compose run osm initdb

After the database is ready, we can start to import the PBF data into the database with the following command:

docker-compose run osm import

Below is partial output from importing the Spain PBF file:

Processing: Node(75938k 303.8k/s) Way(6063k 13.21k/s) Relation(161710 388.73/s)  parse time: 1125s
All indexes on  planet_osm_point created  in 180s
Completed planet_osm_point
Creating osm_id index on  planet_osm_polygon
Creating indexes on  planet_osm_polygon finished
All indexes on  planet_osm_polygon created  in 208s
Completed planet_osm_polygon
Creating osm_id index on  planet_osm_line
Creating indexes on  planet_osm_line finished
All indexes on  planet_osm_line created  in 257s
Completed planet_osm_line

The import time depends on the instance type and the size of the PBF file; in our example, it took around seven hours. When all the data has been imported, we are ready to render the tiles with the following Docker Compose command:

docker-compose run osm render

Note: At this stage, follow the warning on the NCAR EOL container image wiki. Before rendering, make sure the /var/lib/mod_tile directory exists and that the container's www-data user has write permission on it. You can do this from a shell inside the container:

$ docker-compose run osm bash
docker # mkdir -p /var/lib/mod_tile/default
docker # chown www-data /var/lib/mod_tile/default

Assess the output

When the rendering job finishes, renderd prints a summary like the following:

Total for all tiles rendered
renderd[36]: DEBUG: Connection 0, fd 8 closed, now 0 left
Meta tiles rendered: Rendered 349528 tiles in 26755.87 seconds (9.78 tiles/s)
Total tiles rendered: Rendered 22369792 tiles in 26755.87 seconds (625.63 tiles/s)
Total tiles handled: Rendered 349528 tiles in 26755.87 seconds (9.78 tiles/s)
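
The numbers in this summary fit together: the total tile count is exactly 64 times the meta tile count (349,528 × 64 = 22,369,792), consistent with renderd drawing 8 × 8 blocks of tiles ("metatiles") in one pass and then slicing them. The sketch below assumes that 8 × 8 metatile size, which is mod_tile's default (the container's own configuration isn't shown here), and uses hypothetical function names to show the relationship and which metatile a given tile belongs to.

```javascript
// Sketch only. Assumption: renderd/mod_tile's default metatile size of 8 x 8 tiles.
const METATILE = 8;

// Each rendered metatile is cut into METATILE * METATILE individual 256 px tiles.
function tilesFromMetatiles(metaTiles, size = METATILE) {
  return metaTiles * size * size;
}

// Metatiles are aligned on multiples of the metatile size, so the metatile holding
// tile (x, y) starts at the nearest lower multiple of the size on each axis.
function metatileOrigin(x, y, size = METATILE) {
  return { x: x - (x % size), y: y - (y % size) };
}

console.log(tilesFromMetatiles(349528)); // 22369792
console.log(metatileOrigin(501, 385));   // { x: 496, y: 384 }
```

Rendering a larger image and slicing it is generally done so that neighbouring tiles share one set of database queries and consistent label placement across tile borders.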

Serve the map tiles

At this point, the tiles are rendered and we are ready to start serving them. The container image also includes Apache with the mod_tile module. Let’s bring it up with:

docker-compose up osm

We can now connect to the instance on port 8000 and check that the map displays correctly. Remember to allow inbound traffic on port 8000 in the instance’s security group. The map should display as shown below.


We can zoom in and out of the map up to the maximum zoom level specified in the Docker Compose YAML file. Let’s zoom in on Madrid, Spain’s capital.
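
The maximum zoom level drives most of the rendering cost, because each zoom level doubles the resolution on both axes: a full-world pyramid has 4^z tiles at zoom z. The sketch below counts tiles and metatiles for zoom levels 0 up to a chosen maximum; it assumes the standard 2^z × 2^z tiling and the 8 × 8 metatile size discussed earlier, and the function names are hypothetical.

```javascript
// Sketch: size of a full-world tile pyramid under the standard 2^z x 2^z tiling.
// Assumption: 8 x 8 metatiles (mod_tile's default), used only by metatilesUpToZoom.
const METATILE = 8;

// Individual tiles covering the world at zoom z (= 4^z).
function tilesAtZoom(z) {
  const perAxis = 2 ** z;
  return perAxis * perAxis;
}

// Individual tiles for zoom levels 0..maxZoom inclusive (= (4^(maxZoom+1) - 1) / 3).
function tilesUpToZoom(maxZoom) {
  let total = 0;
  for (let z = 0; z <= maxZoom; z++) total += tilesAtZoom(z);
  return total;
}

// Metatiles for zoom levels 0..maxZoom; below zoom 3 the world fits in one metatile.
function metatilesUpToZoom(maxZoom) {
  let total = 0;
  for (let z = 0; z <= maxZoom; z++) total += Math.ceil(2 ** z / METATILE) ** 2;
  return total;
}

console.log(tilesUpToZoom(12));     // 22369621
console.log(metatilesUpToZoom(12)); // 349528
```

Under these assumptions, zoom levels 0 to 12 come to 349,528 metatiles, the same count reported as "Meta tiles rendered" in the summary earlier, which is consistent with a whole-world render up to zoom 12 rather than Spain alone. Each extra zoom level roughly quadruples the total, so the maximum zoom is the main lever on rendering time and tile storage.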

Specify points of interest

As a final step, we set markers on the map to highlight points of interest. For this task, OpenLayers is the tool of choice: it lets us embed a map, with markers and related information, in a div element on a webpage.

The best way to show this at work is with a sample HTML/JavaScript file, shown below. Note the format of the OpenLayers.Layer.OSM constructor call:

var newL = new OpenLayers.Layer.OSM("Default", "/osm_tiles/${z}/${x}/${y}.png", {numZoomLevels: 12});

OpenLayers requests tiles from the tile server, in this case at the path /osm_tiles on the same host, using the tile pattern ${z}/${x}/${y}.png. Change this path to your own tile server’s URL. For more information, see the OpenLayers documentation for Layer.OSM.
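
The ${z}/${x}/${y} placeholders follow the standard OSM "slippy map" tile numbering: at zoom z, the world is a grid of 2^z × 2^z Web Mercator tiles counted from the top-left corner. As a hedged sketch (the function names are hypothetical, and /osm_tiles is simply the path from the snippet above), this is how a client could work out which tile URL covers a given longitude and latitude:

```javascript
// Sketch: longitude/latitude -> slippy-map tile (z, x, y) -> tile URL.
// Standard OSM tile numbering: Web Mercator, 2^z tiles per axis, origin at the top-left.
// No clamping: lon = 180 or |lat| beyond ~85.05 degrees falls outside the grid.
function lonLatToTile(lon, lat, z) {
  const n = 2 ** z; // tiles per axis at zoom z
  const latRad = (lat * Math.PI) / 180;
  const x = Math.floor(((lon + 180) / 360) * n);
  const y = Math.floor(
    ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
  );
  return { z, x, y };
}

// Fill the ${z}/${x}/${y} placeholders of an OpenLayers-style URL template.
// "${z}" here is a plain string, not a JavaScript template literal.
function tileUrl(template, { z, x, y }) {
  return template.replace("${z}", z).replace("${x}", x).replace("${y}", y);
}

// Madrid marker coordinates from the example page, at zoom 10.
const tile = lonLatToTile(-3.6896, 40.4531, 10);
console.log(tileUrl("/osm_tiles/${z}/${x}/${y}.png", tile)); // /osm_tiles/10/501/385.png
```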

<html>
  <head>
    <title>Dan OSM in AWS Blog</title>
    <style type="text/css">
      html, body, #basicMap {
          width: 100%;
          height: 100%;
          margin: 0;
      }
    </style>
    <script src="http://www.openlayers.org/api/OpenLayers.js"></script>
    <script>
      function init() {
        // Map in Spherical Mercator (EPSG:900913), showing coordinates in WGS 1984 (EPSG:4326)
        var options = {
          projection: new OpenLayers.Projection("EPSG:900913"),
          displayProjection: new OpenLayers.Projection("EPSG:4326"),
          units: "m",
          maxResolution: 156543.0339,
          maxExtent: new OpenLayers.Bounds(-20037508.34, -20037508.34,
                                           20037508.34, 20037508.34),
          numZoomLevels: 12,
          controls: [
            new OpenLayers.Control.Navigation(),
            new OpenLayers.Control.PanZoomBar(),
            new OpenLayers.Control.Permalink(),
            new OpenLayers.Control.ScaleLine(),
            new OpenLayers.Control.MousePosition(),
            new OpenLayers.Control.KeyboardDefaults()
          ]
        };
        var map = new OpenLayers.Map("basicMap", options);

        // Base layer: tiles served by mod_tile; numZoomLevels matches the map options
        var newL = new OpenLayers.Layer.OSM("Default", "/osm_tiles/${z}/${x}/${y}.png", {numZoomLevels: 12});
        map.addLayer(newL);

        // Marker position (longitude, latitude) in Madrid
        var lonLat = new OpenLayers.LonLat(-3.6896, 40.4531)
          .transform(
            new OpenLayers.Projection("EPSG:4326"), // transform from WGS 1984
            map.getProjectionObject()               // to Spherical Mercator Projection
          );
        var zoom = 10;

        // Marker layer holding a single marker at lonLat
        var markers = new OpenLayers.Layer.Markers("Markers");
        map.addLayer(markers);
        markers.addMarker(new OpenLayers.Marker(lonLat));

        // Center the map on the marker at the chosen zoom level
        map.setCenter(lonLat, zoom);
      }
    </script>
  </head>
  <body onload="init();">
    <div id="basicMap"></div>
  </body>
</html>

The marker’s position is set in the variable lonLat, and a new marker layer is created to hold it. The coordinates must be transformed from the WGS 1984 geographic coordinate system (EPSG:4326) to the Web Mercator projection (here EPSG:900913), which is the usual standard for web mapping apps.
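
The transform in the page is the spherical (Web) Mercator projection. The sketch below, assuming a sphere with the WGS 1984 semi-major axis of 6,378,137 m, shows that formula and where the map options’ constants come from: the ±20037508.34 in maxExtent is half the world’s width in metres, and the 156543.0339 in maxResolution is metres per pixel when the whole world fits in one 256-pixel tile at zoom 0. The names are illustrative and not part of OpenLayers.

```javascript
// Sketch: spherical ("Web") Mercator, the projection behind EPSG:900913 / EPSG:3857.
// Assumption: a sphere whose radius is the WGS 1984 semi-major axis, 6378137 m.
const EARTH_RADIUS_M = 6378137;

// Convert WGS 1984 longitude/latitude in degrees to Web Mercator metres.
function toWebMercator(lon, lat) {
  const lonRad = (lon * Math.PI) / 180;
  const latRad = (lat * Math.PI) / 180;
  return {
    x: EARTH_RADIUS_M * lonRad,
    y: EARTH_RADIUS_M * Math.log(Math.tan(Math.PI / 4 + latRad / 2)),
  };
}

// Half the world's width in metres: the +/-20037508.34 bounds used for maxExtent.
const HALF_WORLD_M = Math.PI * EARTH_RADIUS_M;

// Metres per pixel at zoom 0, when the whole world fits in one 256-pixel tile:
// the 156543.0339 used for maxResolution.
const MAX_RESOLUTION = (2 * HALF_WORLD_M) / 256;

console.log(HALF_WORLD_M.toFixed(2));         // 20037508.34
console.log(MAX_RESOLUTION.toFixed(4));       // 156543.0339
console.log(toWebMercator(-3.6896, 40.4531)); // Madrid marker in metres
```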

In the example, we used the longitude and latitude of one of Madrid’s best-known business districts. The figure below shows the result of the code, with the marker displayed on the map.

With the tiles rendered, we can start serving maps by deploying them to Apache servers running mod_tile. It is important to choose the right architecture to provide geographic services or to embed maps in webpages across digital properties.

Choose the right architecture

AWS makes it possible to run the mapping service on a highly available, efficient multi-tier architecture within an organization’s application logic. All the tile data can be transferred to Amazon S3 for persistent, durable storage.

Amazon CloudFront, the AWS global content delivery network (CDN) service that securely delivers data, videos, applications, and APIs to viewers with low latency and high transfer speeds, can make the user experience faster and smoother while remaining cost-effective.

The Apache mod_tile module can serve tiles stored in S3, so all Apache servers share a common tile store instead of each server holding its own copy of the tiles.

With AWS, we can run the Apache servers on EC2 instances spread across multiple Availability Zones to increase the availability of the service. A draft architecture is shown in the figure below.

We could also adopt a microservice-oriented architecture. This would involve customizing the container image we used for rendering so that tile serving runs as a microservice within the application. Amazon Elastic Container Service (Amazon ECS) and Amazon Elastic Container Service for Kubernetes (Amazon EKS) are worth considering for running these containers.

The potential of open data

Combining AWS and OSM, we have created a solution that can serve maps on government websites and can also be the basis for government organizations to create other map-based services for their citizens. OpenCycleMap, for example, is an OSM-based service that displays bike routes around the world; cities could offer a similar service using the approach shown here.

This is just one example of what AWS can deliver with open data. With new public datasets increasingly becoming available on AWS, the opportunities for new, innovative public services are endless.

A guest post by Daniel Bernao, Solutions Architect, AWS
