Data Harvesting From News Stories in Near Real-Time

Our Global News Data Feed continuously harvests around 10,000 news sources from around the world. Entity-tagging shows promise in working with news stories.

@craigbrownphd: Data Harvesting From News Stories in Near Real-Time #BigData #Analytics

Recently, a client asked us if we could pull details about car accidents from news stories for a quick proof-of-concept project. They wanted to see if we could extract the names and ages of those involved, the location of the accident, and a time value for when the accident occurred.
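To make the target fields concrete, here is a minimal Python sketch of the kind of record the client described. It is not how Rosoka works and does not use any Rosoka API. The `AccidentRecord` class, the two regular expressions, and the sample sentence are all hypothetical, invented for illustration. The sketch only handles "Firstname Lastname, age" mentions and clock times; locations are left out because place names generally need a gazetteer or a trained model rather than a simple pattern.

```python
import re
from dataclasses import dataclass, field


@dataclass
class AccidentRecord:
    """Hypothetical container for fields the client asked about."""
    people: list = field(default_factory=list)  # (name, age) pairs
    times: list = field(default_factory=list)   # raw clock-time strings


# Assumed pattern: two capitalized words, a comma, then an age ("Jane Doe, 34").
PERSON_AGE = re.compile(r"\b([A-Z][a-z]+ [A-Z][a-z]+), (\d{1,3})\b")
# Assumed pattern: clock times such as "7:45 p.m." or "11:05 AM".
CLOCK_TIME = re.compile(r"\b(\d{1,2}:\d{2}\s?(?:a\.m\.|p\.m\.|AM|PM))")


def extract_accident_record(text: str) -> AccidentRecord:
    """Pull (name, age) pairs and clock times out of one news story."""
    people = [(name, int(age)) for name, age in PERSON_AGE.findall(text)]
    times = CLOCK_TIME.findall(text)
    return AccidentRecord(people=people, times=times)


# Made-up sentence, not taken from any real news story.
sample = (
    "Police said John Smith, 42, was driving north on Route 9 at about "
    "7:45 p.m. when his car struck a vehicle driven by Mary Jones, 29."
)
record = extract_accident_record(sample)
print(record.people)
print(record.times)
```

Patterns like these break quickly on real copy (middle initials, "a 42-year-old driver", relative times such as "Tuesday night"), which is one reason a dedicated entity extraction engine is a better fit than hand-written rules.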

Because we hadn’t tagged the entities (key words and information) they were looking for before, we grabbed a few random news articles mentioning car accidents and ran them through our entity extraction solution, Rosoka, to see if we could deliver what they were looking for. Learn what we were able to find and deliver for this client.

Carrying out this task presented some obstacles. Take the following two news stories for example:

A person can easily scan documents like these and spot entity data without much trouble. Doing this for 10,000 documents per day, however, quickly exceeds what a person can handle.

For this project, there were two main issues to address:

Our Global News Data Feed is the perfect fit to address both of these issues.

The data feed continuously and efficiently harvests around 10,000 news sources from around the world. By default, we extract over 15 unique entity types from every harvested story. For this project, we also set up new custom entities to tag.

After setting up a few new harvest configurations and performing entity tagging, we found that the results were unique and impressive. The project specifics are confidential, but we can say that the entity data needed is extremely time sensitive, so combining our depth in harvesting with Rosoka’s entity extraction was the perfect fit for this client’s needs.

We had a base system up and running for our client within hours. Now the client can take that data and apply their own “secret sauce” to leverage a brand new way to process open source news data.

There will be more details on this solution as the project moves forward and commercializes.
