Geospatial Natural Language Processing


Natural language is messy enough on its own; throw the complexities of geospatial analysis into the mix and you end up with a lengthy blog post about an amazingly interesting area. Let’s get started.

There are numerous challenges that both traditional and modern NLP (Natural Language Processing) systems have to deal with. Here’s a sampling of some of the most prominent ones:

Building dependency trees: How are the different parts of speech in a sentence related?
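To make the idea concrete, here is a toy, hand-built dependency tree in plain Python. This is a sketch only; real systems such as spaCy or Stanza produce these structures from trained parsers:

```python
# A toy dependency tree for "My uncle checked the bank": each token stores
# the index of its head and the grammatical relation linking them.
tokens = ["My", "uncle", "checked", "the", "bank"]
heads = [1, 2, 2, 4, 2]  # index of each token's head; the root heads itself
relations = ["poss", "nsubj", "ROOT", "det", "obj"]

for tok, head, rel in zip(tokens, heads, relations):
    print(f"{tok:8} --{rel}--> {tokens[head]}")
```

Reading it off: "uncle" is the subject of "checked", "bank" is its object, and "My" and "the" attach to the nouns they modify.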

Word-sense disambiguation:

“I told my uncle to check if we had enough Tuna and Bass before we left the bank”

Would this sentence make any less sense if Bass referred to the sound attribute and the bank referred to a financial institution? Not grammatically, but most people who read this will realize this is likely referring to the type of fish and a riverbank. Many words have multiple meanings, and deciding between them is hard.
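A classic baseline here is the Lesk algorithm: pick the sense whose dictionary gloss shares the most words with the sentence. Below is a minimal sketch with hand-written glosses; real implementations draw glosses from WordNet:

```python
# Minimal Lesk-style disambiguation: score each sense by the word overlap
# between its gloss and the sentence, and keep the highest-scoring sense.
def lesk(word, context, glosses):
    context_words = set(context.lower().split())
    def overlap(sense):
        return len(context_words & set(glosses[word][sense].lower().split()))
    return max(glosses[word], key=overlap)

glosses = {
    "bass": {
        "fish": "a freshwater fish caught near the bank of a river",
        "sound": "the low frequency range of sound in music",
    }
}

sentence = "check if we had enough tuna and bass before we left the bank"
print(lesk("bass", sentence, glosses))  # -> fish
```

The fish sense wins because its gloss shares "bank" with the sentence; with a different context ("turn up the bass"), the sound sense would score higher.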

Co-reference resolution:

“He told me we were a bit short on Bass but they’d been less than tasty this season anyway”

The “He” is referring to the uncle and the “they” is referring to the fish. In general, it is not obvious who the pronoun is referring to, or the subject may be much earlier in the text (such as many speakers talking in sequence, with the last one addressing the group as “we”).
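A naive baseline resolves each pronoun to the nearest preceding mention of a compatible type. The sketch below is illustrative only; the mention types and the compatibility table are assumptions, and real coreference systems use learned features instead:

```python
# Recency-based pronoun resolution: walk backwards from the pronoun and
# return the first prior mention whose type the pronoun can refer to.
def resolve(mentions, pronoun_index):
    compatible = {"he": {"PERSON"}, "she": {"PERSON"}, "they": {"PERSON", "PLURAL"}}
    wanted = compatible[mentions[pronoun_index]["text"].lower()]
    for prior in reversed(mentions[:pronoun_index]):
        if prior["type"] in wanted:
            return prior["text"]
    return None

mentions = [
    {"text": "my uncle", "type": "PERSON"},
    {"text": "the bass", "type": "PLURAL"},
    {"text": "they", "type": "PRONOUN"},
]
print(resolve(mentions, 2))  # -> the bass
```

Recency happens to give the right answer here, but it fails exactly in the hard cases the paragraph above describes, such as a subject introduced much earlier in the text.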

Entity extraction:

“I’d venture to say my Apple earbuds would survive a dip in the river”

We realize here that the earbuds are made by a corporate entity, Apple, and not literally made from apples (although this depends on knowing more about the speaker).
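The simplest entity extractors are plain gazetteer lookups, which is exactly where the apples-versus-Apple ambiguity bites. A toy sketch:

```python
# Gazetteer-based tagging: capitalized tokens found in a known list are
# tagged with that list's entity type. Deliberately simplistic -- a rule
# like this cannot tell "Apple earbuds" from "apple pie" without context,
# which is why modern NER models condition on the surrounding words.
GAZETTEER = {"Apple": "ORG", "Oregon": "LOC"}

def extract_entities(text):
    return [(tok, GAZETTEER[tok]) for tok in text.split() if tok in GAZETTEER]

print(extract_entities("my Apple earbuds would survive a dip in the river"))
# -> [('Apple', 'ORG')]
```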

In some ways, you can say almost all unstructured data is spatial. If there are entities in your data, those entities exist in one or more locations. To make things easier to follow throughout the article, we will deal with three special cases of geospatial information being embedded in unstructured data — call these the “extremely lucky,” “just plain lucky,” and “neither lucky nor unlucky” cases.

If we’re extremely lucky, locations are given coordinates.

If we’re just plain lucky, locations are given as proper place names or addresses.

If we’re neither lucky nor unlucky, those locations are given relative to other locations (“The motel down the street from the town hall”). This is pretty common in informal conversations, and we’ll talk about this difficult case in a bit.
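To make the three cases concrete, here is a rough pattern-based classifier. The regular expressions are illustrative, not exhaustive:

```python
import re

# Rough patterns for the three cases: bare coordinates, street addresses,
# and relative descriptions anchored to another place.
COORD = re.compile(r"-?\d{1,3}\.\d+\s*,\s*-?\d{1,3}\.\d+")
ADDRESS = re.compile(r"\d+\s+\w+\s+(Street|St|Avenue|Ave|Road|Rd)\b", re.I)
RELATIVE = re.compile(r"\b(down the street from|next to|across from|between)\b", re.I)

def classify(text):
    if COORD.search(text):
        return "extremely lucky"
    if ADDRESS.search(text):
        return "just plain lucky"
    if RELATIVE.search(text):
        return "neither lucky nor unlucky"
    return "unknown"

print(classify("43.7276, -122.0410"))                            # extremely lucky
print(classify("220 Main Street"))                               # just plain lucky
print(classify("the motel down the street from the town hall"))  # neither lucky nor unlucky
```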

Traditional NER models have had a different set of problems. Grammatical rule-based systems required months of work by experienced computational linguists, while statistical models required large amounts of manually annotated training data. Both types of NER systems were brittle and demanded substantial tuning effort to perform well in a new domain.

If your JSON file was loaded correctly, you should see the following DataFrame output:
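The post’s original screenshot did not survive extraction; as a stand-in, here is a minimal sketch of loading labeled training records into a pandas DataFrame. The field names are assumptions, not the exact schema used in the project:

```python
import io
import json
import pandas as pd  # assumed to be installed; any recent version works

# Hypothetical labeled training records: each item pairs raw text with the
# entities annotated in it.
records = [
    {"text": "Robbery reported near 5th Avenue", "Address": "5th Avenue", "Crime": "Robbery"},
    {"text": "Vandalism at the lakeside trailhead", "Address": "lakeside trailhead", "Crime": "Vandalism"},
]

# Round-trip through JSON, the same way a file on disk would be read:
df = pd.read_json(io.StringIO(json.dumps(records)))
print(df)
```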

a table of training dynamics:

and a preview of the output once training is complete:

Once we’re satisfied with the model’s performance on our training and validation data, we can throw it out into the world and test it against data that wasn’t part of its training. To run inference on new text, we can use the following command:

Hotspot analysis for the crime reports

We’ve looked at how to build a slick workflow for training a deep learning model to extract entities, write the results to file, and automatically read these files and display analytics on a web map. This was all done using standard off-the-shelf packages. To scale this we’ll need a few more moving parts.

Yes, global awareness.

Global situational awareness dashboard, map symbology showing clusters of negative sentiment

Firstly, here’s a picture of the pipeline:

Let’s start with the first three steps. When an HTTP request is received (when one of our subscribed RSS feeds is updated), we process the request into a JSON format that looks like this:
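The original JSON sample was lost in extraction, so here is a hedged reconstruction, expressed as a Python dict, of what a processed RSS item might look like; every field name here is an assumption:

```python
# Hypothetical shape of a processed RSS item -- not the post's actual schema.
article = {
    "id": "feed-item-001",
    "source": "example-news-rss",
    "published": "2020-06-01T12:00:00Z",
    "title": "Flooding closes roads near the harbor",
    "body": "Heavy rain has closed several roads near the harbor district...",
    "url": "https://example.com/articles/001",
}
print(article["id"])  # -> feed-item-001
```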

Finally, we combine the article with the attributes returned by the Cognitive Services API and the NetOwl API and create a streaming output that can be fed to GeoEvent (more on this below). The final output looks like this:
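As with the input sample, the original output snippet is missing; this is a hypothetical sketch of the enriched record — sentiment plus geocoded entities — that would stream into GeoEvent. Field names and values are assumptions:

```python
# Hypothetical enriched record: the article plus sentiment (Cognitive
# Services) and located entities (NetOwl), ready for a streaming sink.
enriched = {
    "id": "feed-item-001",
    "title": "Flooding closes roads near the harbor",
    "sentiment": {"label": "negative", "score": 0.87},
    "entities": [
        {"text": "harbor district", "type": "LOCATION", "lat": 24.47, "lon": 54.37},
    ],
}
print(enriched["sentiment"]["label"])  # -> negative
```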

Building entity link charts from news articles in ArcGIS Pro

Let’s notch up the difficulty a bit more.

Imagine a government call center where citizens would call in to report all kinds of non-emergency situations — anything from potholes to having their office overrun by cats (a story for another time). However, imagine these citizens call in with only approximate or colloquial descriptions of the locations they’re referring to, something like “down the street from the cemetery,” “two blocks down from the central bank,” or “between the embassy and the park.” Note that this is basically the “neither lucky nor unlucky” scenario we described above, and the central focus of a project we got from the government of Abu Dhabi.

The operators at this call center relied on their own knowledge of the city, or had to reach out to multiple sources, to resolve the caller’s described location. With high call volumes, there was sometimes not enough time to pin down the specifics, so requests went into a backlog and required follow-up calls. They needed a way to get an approximate grasp of the locations being referred to, in near real time, during the call itself.

Firstly, we built a multi-scale grid over the city:

Then, we associated with each grid cell a probability (1/#cells, to start) of it being the location of interest. We then built a list of different types of location-related “evidence” that we’d use to update each grid cell’s probability of being the location of interest. This evidence was separated into several sub-types, such as address evidence (an exact street address), POI evidence (a central bank, bridge, port, etc.), directional evidence (N/S/E/W), distance evidence, street evidence, and several others. A mention of any of these types of evidence would prompt a geographic search against related features (such as searching for the polyline feature designating the mentioned street) and a corresponding probability update on the grid cells.
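The update scheme just described can be sketched as a basic Bayesian filter over the grid: multiply each cell’s prior by the likelihood of the observed evidence given that cell, then renormalize. The likelihood numbers below are made up for illustration:

```python
# Bayesian update over grid cells: posterior ∝ prior × likelihood.
def update(probs, likelihoods):
    posterior = [p * l for p, l in zip(probs, likelihoods)]
    total = sum(posterior)
    return [p / total for p in posterior]

n_cells = 4
probs = [1 / n_cells] * n_cells  # uniform prior: 1/#cells

# Evidence: "near the central bank" -- cells close to that POI get a
# higher likelihood than cells far from it.
poi_likelihood = [0.9, 0.6, 0.1, 0.1]
probs = update(probs, poi_likelihood)

# Evidence: "north of the river" -- only the first two cells qualify.
directional_likelihood = [0.8, 0.8, 0.05, 0.05]
probs = update(probs, directional_likelihood)

print(max(range(n_cells), key=lambda i: probs[i]))  # -> 0 (most likely cell)
```

Each new piece of evidence sharpens the distribution, which is what lets the map “light up” progressively as the caller keeps talking.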

The grid cells would be colored by their current probability so the operator would be able to, in near real-time, see parts of the maps light up as likely candidates for the location being mentioned on the phone. In this way, the operator could help form responses directly for the caller during the call.

Thanks for reading.
