Accern API
Authentication
curl "https://feed.accern.com/v3/alphas?token=TOKEN"
require 'uri'
require 'net/http'
url = URI("https://feed.accern.com/v3/alphas?token=TOKEN")
http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true
request = Net::HTTP::Get.new(url)
response = http.request(request)
puts response.read_body
import requests
url = "https://feed.accern.com/v3/alphas?token=TOKEN"
req = requests.get(url)
text_response = req.text # read response as Text
print(text_response)
json_response = req.json() # read response as JSON
print(json_response)
if (!require("jsonlite")) install.packages("jsonlite")
library(jsonlite)
url <- "https://feed.accern.com/v3/alphas?token=TOKEN"
response <- fromJSON(url)
print(response)
To authenticate, provide your authentication token in the URL. We send the authentication token in your welcome email. Make sure to replace TOKEN in the examples above with your own token.
Feed
curl "https://feed.accern.com/v3/alphas?token=TOKEN"
url = URI("https://feed.accern.com/v3/alphas?token=TOKEN")
url = "https://feed.accern.com/v3/alphas?token=TOKEN"
url <- "https://feed.accern.com/v3/alphas?token=TOKEN"
The above URL returns the most recent 100 documents.
[
{
"id": 1774184,
"article_id": {
"$oid": "589b44a569fe9f7f77024f4a"
},
"article_sentiment": 0.098,
"article_traffic": null,
"article_type": "blog",
"article_url": "http://feedproxy.google.com/~r/RedmondPie/~3/mF7K04DF1y4/",
"author_id": null,
"correlations": null,
"entities": [
{
"name": "Apple Inc.",
"type": "Public",
"index": "S&P 500, Russell 1000, Russell 3000, Wilshire 5000, BARRON'S 400, NASDAQ 100",
"region": "North America",
"sector": "Technology",
"ticker": "AAPL",
"country": "United States",
"exchange": "NASDAQ",
"industry": "Computer Manufacturing",
"entity_id": "EQ0010169500001000",
"global_id": "BBG000B9XRY4",
"competitors": [
"GOOG",
"HPQ"
]
},
{
"name": "Amazon.com, Inc.",
"type": "Public",
"index": "S&P 500, Russell 1000, Russell 3000, Wilshire 5000, NASDAQ 100",
"region": "North America",
"sector": "Consumer Services",
"ticker": "AMZN",
"country": "United States",
"exchange": "NASDAQ",
"industry": "Catalog/Specialty Distribution",
"entity_id": "EQ0021695200001000",
"global_id": "BBG000BVPV84",
"competitors": [
"AAPL",
"BKS"
]
}
],
"event_author_rank": [
{
"author_rank": 4,
"event_group": "Employment Actions"
},
{
"author_rank": 4,
"event_group": "Employment Actions"
}
],
"event_groups": [
{
"type": "Recruitment",
"group": "Employment Actions"
},
{
"type": "Layoff",
"group": "Employment Actions"
}
],
"event_impact_score": {
"overall": 40.88471673254282,
"on_entities": [
{
"entity": "AAPL",
"on_entity": 31
},
{
"entity": "AMZN",
"on_entity": 32
}
]
},
"event_source_rank": [
{
"event_group": "Employment Actions",
"source_rank": 6
},
{
"event_group": "Employment Actions",
"source_rank": 6
}
],
"event_summary": {
"group": "",
"theme": "",
"topic": "",
"action": "",
"sub-theme": "",
"acting_party": ""
},
"first_mention": false,
"harvested_at": "2017-02-08 16:17:39 UTC",
"overall_author_rank": 5,
"overall_source_rank": 6,
"source_id": null,
"story_id": {
"$oid": "589a6b6469fe9f7f70ac1df6"
},
"story_saturation": "high",
"story_sentiment": 0.072,
"story_shares": null,
"story_volume": 58
}
]
GET https://feed.accern.com/v3/alphas?token=TOKEN
By default this request will return the most recent 100 documents.
Filtering
Filter by last_id
curl "https://feed.accern.com/v3/alphas?last_id=1774184&token=TOKEN"
url = URI("https://feed.accern.com/v3/alphas?last_id=1774184&token=TOKEN")
url = "https://feed.accern.com/v3/alphas?last_id=1774184&token=TOKEN"
url <- "https://feed.accern.com/v3/alphas?last_id=1774184&token=TOKEN"
Filter by index
curl "https://feed.accern.com/v3/alphas?index=sp500&token=TOKEN"
url = URI("https://feed.accern.com/v3/alphas?index=sp500&token=TOKEN")
url = "https://feed.accern.com/v3/alphas?index=sp500&token=TOKEN"
url <- "https://feed.accern.com/v3/alphas?index=sp500&token=TOKEN"
Filter by multiple indexes
curl "https://feed.accern.com/v3/alphas?index=sp500,dow30&token=TOKEN"
url = URI("https://feed.accern.com/v3/alphas?index=sp500,dow30&token=TOKEN")
url = "https://feed.accern.com/v3/alphas?index=sp500,dow30&token=TOKEN"
url <- "https://feed.accern.com/v3/alphas?index=sp500,dow30&token=TOKEN"
Filter by ticker
curl "https://feed.accern.com/v3/alphas?ticker=amzn&token=TOKEN"
url = URI("https://feed.accern.com/v3/alphas?ticker=amzn&token=TOKEN")
url = "https://feed.accern.com/v3/alphas?ticker=amzn&token=TOKEN"
url <- "https://feed.accern.com/v3/alphas?ticker=amzn&token=TOKEN"
Filter by multiple tickers
curl "https://feed.accern.com/v3/alphas?ticker=aapl,amzn&token=TOKEN"
url = URI("https://feed.accern.com/v3/alphas?ticker=aapl,amzn&token=TOKEN")
url = "https://feed.accern.com/v3/alphas?ticker=aapl,amzn&token=TOKEN"
url <- "https://feed.accern.com/v3/alphas?ticker=aapl,amzn&token=TOKEN"
Parameter | Description |
---|---|
last_id | Returns the latest 100 documents that came after the provided id. Used to prevent duplicates while keeping in sync (see streaming section). |
index | Filters documents by the index, see below table for supported indexes. To filter by multiple indexes pass a comma separated list of indexes. |
ticker | Filters documents by ticker. To filter by multiple tickers pass a comma separated list of tickers. |
Allowed index values
index | expected query string value |
---|---|
S&P 500 | sp500 |
Russell 1000 | russell1000 |
Russell 3000 | russell3000 |
Wilshire 5000 | wilshire5000 |
Barron’s 400 | barrons400 |
DOW 30 | dow30 |
File Format
curl "https://feed.accern.com/v3/alphas.csv?token=TOKEN"
url = URI("https://feed.accern.com/v3/alphas.csv?token=TOKEN")
url = "https://feed.accern.com/v3/alphas.csv?token=TOKEN"
url <- "https://feed.accern.com/v3/alphas.csv?token=TOKEN"
By default the response from the API feed is in JSON format. If you append .csv to the path, you will get the data in CSV format instead.
CSV Columns |
---|
id |
article_id |
story_id |
harvested_at |
entities_name_1 |
entities_ticker_1 |
entities_global_id_1 |
entities_entity_id_1 |
entities_type_1 |
entities_exchange_1 |
entities_sector_1 |
entities_industry_1 |
entities_country_1 |
entities_region_1 |
entities_index_1 |
entities_competitors_1 |
entities_name_2 |
entities_ticker_2 |
entities_global_id_2 |
entities_entity_id_2 |
entities_type_2 |
entities_exchange_2 |
entities_sector_2 |
entities_industry_2 |
entities_country_2 |
entities_region_2 |
entities_index_2 |
entities_competitors_2 |
event_groups_group_1 |
event_groups_type_1 |
event_groups_group_2 |
event_groups_type_2 |
story_sentiment |
story_saturation |
story_volume |
first_mention |
article_type |
article_sentiment |
overall_source_rank |
event_source_rank_1 |
event_source_rank_2 |
overall_author_rank |
event_author_rank_1 |
event_author_rank_2 |
event_impact_score_overall |
event_impact_score_entity_1 |
event_impact_score_entity_2 |
event_summary_group |
event_summary_theme |
event_summary_topic |
event_summary_action |
event_summary_sub-theme |
event_summary_acting_party |
article_url |
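The CSV endpoint is convenient for loading the feed straight into analysis tools (for example, passing the URL to pandas' read_csv). The small helper below only builds the request URL; the helper name and the use of urlencode are our own, not part of the API.

```python
from urllib.parse import urlencode

BASE = "https://feed.accern.com/v3/alphas"

def feed_url(token, fmt="json", **filters):
    """Build a feed URL; fmt='csv' appends .csv to request CSV instead of JSON."""
    path = BASE + (".csv" if fmt == "csv" else "")
    return path + "?" + urlencode({"token": token, **filters})

print(feed_url("TOKEN", fmt="csv", ticker="aapl"))
# → https://feed.accern.com/v3/alphas.csv?token=TOKEN&ticker=aapl
```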
Streaming
To stay in sync with the API, make a request with last_id=[latest document id],
then grab the id of the latest document that comes back and repeat. We have packaged this logic up in our Accern gem; to install it, follow the instructions on the repo.
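The same request-and-repeat loop can be sketched in Python using the requests library from the earlier examples. The helper names and the polling interval are our own choices, not part of the API.

```python
import time
import requests

FEED = "https://feed.accern.com/v3/alphas"

def newest_id(documents, current_last_id):
    """Return the largest document id seen so far (ids are increasing integers)."""
    return max([current_last_id] + [doc["id"] for doc in documents])

def stream(token, last_id=0, poll_seconds=15):
    """Poll the feed, always asking for documents after the newest id seen."""
    while True:
        resp = requests.get(FEED, params={"token": token, "last_id": last_id})
        resp.raise_for_status()
        documents = resp.json()
        for doc in documents:
            yield doc
        last_id = newest_id(documents, last_id)
        time.sleep(poll_seconds)  # back off between polls when caught up
```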
Backfill
The API allows you to access data going back 30 days; anything older we provide via other means. To start from 30 days ago and move forward, first provide last_id=0, then continue to hit the API while updating the last_id query string parameter.
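The backfill amounts to running the cursor forward from last_id=0 until no more documents come back (treating an empty batch as the caught-up signal is our assumption). A sketch with a stubbed fetch function standing in for the HTTP call, using invented ids:

```python
def run_backfill(fetch, last_id=0):
    """Drain pages via fetch(last_id) -> list of docs until an empty batch."""
    docs = []
    while True:
        batch = fetch(last_id)
        if not batch:
            return docs
        docs.extend(batch)
        last_id = max(doc["id"] for doc in batch)

# Stubbed fetch standing in for the HTTP request, for illustration only:
pages = {0: [{"id": 1}, {"id": 2}], 2: [{"id": 3}], 3: []}
result = run_backfill(lambda last_id: pages.get(last_id, []))
print([d["id"] for d in result])  # → [1, 2, 3]
```

Once run_backfill returns, you would switch to periodic polling with the last id it reached.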
Accern Overview
The Accern API provides a comprehensive, REST-based interface for accessing all finance-related articles processed by our platform within the last 30 days.
Each article passes through our data pipeline, where entities such as equities and financial events are extracted and made available through the API.
We also include numerous insightful analytics such as sentiment, impact score, and story saturation.
NOTE: Further details are in the Accern Analytics section.
Data Attributes
Sample object (article)
The table below lists all the attributes in a single object (article) of the Accern API response.
{
"id": 1774184,
"article_id": {
"$oid": "589b44a569fe9f7f77024f4a"
},
"article_sentiment": 0.098,
"article_traffic": null,
"article_type": "blog",
"article_url": "http://feedproxy.google.com/~r/RedmondPie/~3/mF7K04DF1y4/",
"author_id": null,
"correlations": null,
"entities": [
{
"name": "Apple Inc.",
"type": "Public",
"index": "S&P 500, Russell 1000, Russell 3000, Wilshire 5000, BARRON'S 400, NASDAQ 100",
"region": "North America",
"sector": "Technology",
"ticker": "AAPL",
"country": "United States",
"exchange": "NASDAQ",
"industry": "Computer Manufacturing",
"entity_id": "EQ0010169500001000",
"global_id": "BBG000B9XRY4",
"competitors": [
"GOOG",
"HPQ"
]
},
{
"name": "Amazon.com, Inc.",
"type": "Public",
"index": "S&P 500, Russell 1000, Russell 3000, Wilshire 5000, NASDAQ 100",
"region": "North America",
"sector": "Consumer Services",
"ticker": "AMZN",
"country": "United States",
"exchange": "NASDAQ",
"industry": "Catalog/Specialty Distribution",
"entity_id": "EQ0021695200001000",
"global_id": "BBG000BVPV84",
"competitors": [
"AAPL",
"BKS"
]
}
],
"event_author_rank": [
{
"author_rank": 4,
"event_group": "Employment Actions"
},
{
"author_rank": 4,
"event_group": "Employment Actions"
}
],
"event_groups": [
{
"type": "Recruitment",
"group": "Employment Actions"
},
{
"type": "Layoff",
"group": "Employment Actions"
}
],
"event_impact_score": {
"overall": 40.88471673254282,
"on_entities": [
{
"entity": "AAPL",
"on_entity": 31
},
{
"entity": "AMZN",
"on_entity": 32
}
]
},
"event_source_rank": [
{
"event_group": "Employment Actions",
"source_rank": 6
},
{
"event_group": "Employment Actions",
"source_rank": 6
}
],
"event_summary": {
"group": "",
"theme": "",
"topic": "",
"action": "",
"sub-theme": "",
"acting_party": ""
},
"first_mention": false,
"harvested_at": "2017-02-08 16:17:39 UTC",
"overall_author_rank": 5,
"overall_source_rank": 6,
"source_id": null,
"story_id": {
"$oid": "589a6b6469fe9f7f70ac1df6"
},
"story_saturation": "high",
"story_sentiment": 0.072,
"story_shares": null,
"story_volume": 58
}
Attributes | Type | Description |
---|---|---|
id | integer | unique id for feed (1 or greater) |
article_id.$oid | string | unique id per article |
article_sentiment | decimal | determines if the article was written positively or negatively (-1.000 to 1.000) |
article_type | string | type of content (ex. blog, article) |
article_url | url string | original link to article |
entities | list | List of associated equities objects that are identified for this article |
entities_name | string | name of the company (8,000+ U.S. public equities) |
entities_type | string | classifies whether the company is publicly traded (ex. Public) |
entities_index | string | Comma-separated string of indices company is listed on |
entities_region | string | Region of the company’s headquarters |
entities_sector | string | Sector of the company |
entities_ticker | string | Ticker of the company |
entities_country | string | Country of company’s headquarters |
entities_exchange | string | Exchange the company is traded on |
entities_industry | string | Industry of the company |
entities_entity_id | string | Entity level ID of the company, derived from Bloomberg Open Symbology |
entities_global_id | string | Unique global ID of the company, derived from Bloomberg Open Symbology |
entities_competitors | list | List of top three competitors associated with the company |
event_author_rank | list | Each object indicates the author’s reliability in reporting on specific events |
event_groups | list | Each object has a major event group and a subsection of that group |
event_groups_type | string | A subsection of an event group for more detail |
event_groups_group | string | A major event i.e. event group |
event_impact_score | object | Calculates the article’s impact i.e. chance of affecting the associated company’s stock price |
event_impact_score_overall | decimal | Determines chance of event affecting stock prices in general by end of trading day |
event_impact_score_on_entities | list | Determines chance of event affecting associated company’s stock price by end of trading day |
event_source_rank | list | Each object indicates the source’s reliability in reporting on specific events |
event_summary_topic | string | Level 1 event category |
event_summary_group | string | Level 2 event category |
event_summary_theme | string | Level 3 event category |
event_summary_sub_theme | string | Level 4 event category |
event_summary_action | string | action of an event |
event_summary_acting_party | string | parties associated with the event |
first_mention | boolean | If this article is the first one to break this new story |
harvested_at | datetime | UTC formatted time when Accern received article |
overall_author_rank | integer | rank (1-10) of how reliable author is at releasing articles in general |
overall_source_rank | integer | rank (1-10) of how reliable source is at releasing articles in general |
story_saturation | string | how much exposure this story has currently. ex. high, mid, low |
story_sentiment | decimal | positive/negative sentiment score of the story, averaging the sentiment of related articles published so far |
story_volume | integer | number of articles associated with this story until now |
Data Coverage
A brief elaboration on the kinds of data we process and where it comes from, followed by a table of average daily statistics for the data processing pipeline.
Types of Data -
Accern acquires information from many types of online sources.
- Public News Websites
- Public Blogs
- Press Releases
- Financial Documents ex. SEC Filings
- Other Social Media ex. Tumblr
How we Acquire the Data -
Accern has multiple avenues for financial information. They include our own in-house web scrapers and data obtained via partnered data providers.
Data Providers: The majority of our data comes through our data providers, which currently monitor around 300 million-plus sources (websites).
Proprietary Scrapers: Accern's proprietary crawlers monitor around 500,000-plus public, high-alpha sources. NOTE: These important sources break market-moving news the fastest.
Quick Info on Data Pipeline -
Metric | Value |
---|---|
Total Websites Monitored | 300 million-plus |
Number of Articles Processed Each Day | 5 million-plus |
Number of Articles Delivered Each Day | 20,000-plus |
Format of Data Delivered | JSON, CSV |
Real-time Data Delivery Method | REST API, Web Portal |
Processing Time Per Article Published | 40 milliseconds |
Trading Analytics Derived Per Article | 10-plus |
Archive Date Range Available | 08/25/2012 to 08/19/2016 (4 years) |
Number of Archive Articles | 15 million-plus |
Archive Delivery Method | FTP, Dropbox |
Data Financial Asset Mapping | Tickers, Bloomberg ID |
Financial Assets Coverage | 8,000-plus U.S. public equities |
Financial Events Coverage | 1,000-plus financial events |
Story Classification
THOUSANDS of sources/authors post about the SAME news stories, resulting in MILLIONS of articles per day.
Accern groups articles talking about similar information (i.e. certain equities A, B, & C are involved in some financial event D) into UNIQUE STORIES. Accern pioneered this story classification process, allowing you to track how information flows online.
- You will know the current exposure of a financial news headline
- You will know the overall sentiment of this market event
- You will know if the source that just posted a rumor is reliable, based on the authenticity of its previously published rumor-related stories
- Many more analytics
In Depth: The story classification model is agnostic of article sentiment; it groups based only on the extracted entities and events. The semantic structure of the article is taken as input. The model identifies important themes and checks the last 2 weeks for similar themes; if a similar theme is found, the article is grouped with it. A combination of machine learning models is used to extract entities, which are then linked back to the 8,000-plus U.S. equities and 1,000-plus financial events.
Accern has compiled multiple entity dictionaries for linking names mentioned in media to the correct entities.
Proprietary Entities Dictionary Accern has developed a proprietary dictionary that maps all the ways financial entities are mentioned in the media to the 8,000-plus U.S. public equities.
The dictionary consists of over 150,000 company name variations and includes Bloomberg IDs, ticker symbols, etc. mapped back to these company names. How? - By looking at historical data, we identify the different ways companies have been mentioned, and use them to build and update this dictionary.
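The idea behind the entity dictionary can be illustrated as a lookup from normalized surface forms to canonical tickers. The variations and mapping below are invented for illustration; the real dictionary is far larger and proprietary.

```python
# Toy stand-in for the entity dictionary: many media surface forms
# map to one canonical equity. Entries here are illustrative only.
ENTITY_DICT = {
    "apple": "AAPL",
    "apple inc": "AAPL",
    "apple inc.": "AAPL",
    "amazon": "AMZN",
    "amazon.com": "AMZN",
    "amazon.com, inc.": "AMZN",
}

def link_entity(mention):
    """Normalize a mention and look it up; returns None if unknown."""
    return ENTITY_DICT.get(mention.strip().lower())

print(link_entity("Apple Inc."))  # → AAPL
```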
Proprietary Financial Events Dictionary Accern has developed a proprietary dictionary that maps over 30,000 financial event variations mentioned in media to an aggregated list of 1,000-plus financial events. How? - We worked with financial analysts/equity researchers to figure out an initial list of important financial events. Then, starting with this list and looking at historical data, we figured out different variations of event names.
Noise Cancellation
Accern processes millions of articles per day, a significant share of which ends up as spam, ads, coupons, etc. To tackle this issue, Accern has optimized its noise detection algorithms over the years. This noise cancellation mechanism takes in around 150M+ items (articles, blogs, etc.) every day and emits around 25k articles at the end.
Proprietary Pattern Recognition Spam Detector
Our proprietary noise cancellation mechanism identifies if the input article qualifies as spam and does not contain any insightful information.
How? - A compiled list of regular-expression-based phrases is used to classify articles as spam or not. Using machine learning models and pattern recognition, Accern automatically figures out which articles can be classified as spam based on the way their titles are written.
Example: Save Up to 50% on Your Purchases with Coupons at Amazon.com!
Accern uses pattern recognition to identify future articles talking about coupon codes from Amazon and automatically removes them.
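A minimal sketch of the pattern-based filtering described above, applied to titles. The patterns here are illustrative, not Accern's actual list.

```python
import re

# Illustrative spam-title patterns; the real list is proprietary and larger.
SPAM_PATTERNS = [
    re.compile(r"\bsave up to \d+%", re.IGNORECASE),
    re.compile(r"\bcoupon(s| codes?)?\b", re.IGNORECASE),
    re.compile(r"\bfree shipping\b", re.IGNORECASE),
]

def is_spam_title(title):
    """True if any spam pattern matches the article title."""
    return any(p.search(title) for p in SPAM_PATTERNS)

print(is_spam_title("Save Up to 50% on Your Purchases with Coupons at Amazon.com!"))
# → True
```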
Proprietary Blacklist of Spam Websites Accern has compiled a proprietary list of websites known to release spam and content irrelevant to financial markets investors.
How? - Looking at the historical archive of articles posted by different news sources, we calculate the probability that a newly published article from a source will turn out to be spam. If the probability is above a threshold, we add the source to our blacklist, which is constantly kept up to date.
Accern Analytics
Unique Stories (story_id)
What is it? Financial news stories are published on the web and social media in many forms, for example articles and SEC filings. We scour millions of these articles talking about financial events and group them into similar stories. Each story specifies 2 important details: the equities being talked about (currently among 8,000+ US public equities) and a description of the associated financial events (currently 1,000+ financial event distinctions available).
Quick Definition: a story is an event that involves a company.
How is it created? Every day we scrape a million-plus articles of various types such as blogs, SEC filings, and news articles. These mentions go through the story classification model to identify the associated equities (from the existing 8,000+ US public equities) and the associated financial event (1,000+ distinctions available). All mentions that share these entities are grouped together into their own stories. Each story, a combination of companies and events, is given a unique id (story_id).
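The grouping idea can be sketched by keying each document on its companies plus its financial event. The field names follow the feed's schema, but the keying itself is our simplification, not Accern's actual model.

```python
from collections import defaultdict

def story_key(doc):
    """A story is identified by its set of companies plus the event groups."""
    tickers = tuple(sorted(e["ticker"] for e in doc["entities"]))
    events = tuple(sorted((g["group"], g["type"]) for g in doc["event_groups"]))
    return (tickers, events)

def group_into_stories(docs):
    """Bucket documents that share the same (companies, events) key."""
    stories = defaultdict(list)
    for doc in docs:
        stories[story_key(doc)].append(doc)
    return stories

a = {"entities": [{"ticker": "AAPL"}],
     "event_groups": [{"group": "Legal Actions", "type": "Lawsuit"}]}
b = {"entities": [{"ticker": "AAPL"}],
     "event_groups": [{"group": "Legal Actions", "type": "Lawsuit"}]}
c = {"entities": [{"ticker": "AMZN"}],
     "event_groups": [{"group": "Legal Actions", "type": "Lawsuit"}]}
print(len(group_into_stories([a, b, c])))  # → 2
```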
Examples
An example of a big, viral story regarding Google and Legal Actions consists of articles below that are all talking about similar information.
‘Google alleges Uber stole its self-driving secrets’ by Livemint.com
‘Google accuses Uber of stealing self-drive technology’ by Business-standard.com
‘Lawsuit: Google self-driving car spinout Waymo claims Uber using stolen laser-mapping technology’ by geekwire.com
'Waymo: Uber stole our self-driving car tech’ by cnet.com
An example of a story reporting on a rumour about Apple, which was initially posted on a blog before showing up on other sources with bigger reach.
“Apple reportedly plans to 'significantly’ expand Seattle office after Turi acquisition” by bizjournals.com
“Apple plans expansion of artificial intelligence efforts in Seattle” by forums.imore.com
First Mention (first_mention)
What is it? Whenever a financial news story breaks out, many articles (mentions) get published about the same story. First Mention tells us if the article is the FIRST to break that new story.
Quick Definition: an article about a story that has not been mentioned on the internet for at least 2 weeks.
How is it created? The story classification model extracts a theme (a combination of event and companies) from the input article. It then searches for highly similar themes in the last 2 weeks.
If it finds one, it groups this article with the existing story (theme) and sets its first_mention to FALSE. Otherwise, it creates a new story and this article's first_mention attribute is set to TRUE.
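The 2-week lookback can be sketched as follows. Here `theme` is a hashable (companies, event) key and `recent_themes` maps themes to the time they were last seen; both are our simplification of the model described above.

```python
from datetime import datetime, timedelta

def is_first_mention(theme, harvested_at, recent_themes, window_days=14):
    """True if no similar theme was seen within the lookback window."""
    last_seen = recent_themes.get(theme)
    if last_seen is None:
        return True  # brand-new theme -> new story
    return harvested_at - last_seen > timedelta(days=window_days)

now = datetime(2017, 2, 8, 16, 17, 39)
seen = {("AAPL", "Recruitment"): now - timedelta(days=3)}
print(is_first_mention(("AAPL", "Recruitment"), now, seen))  # → False
print(is_first_mention(("GOOG", "Lawsuit"), now, seen))      # → True
```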
Examples
In brief, the power of knowing an article is the first to break a new story ->
date | headline | first_mention |
---|---|---|
08:50 AM 28 Feb | Xbox launches Netflix-like service for gamers | TRUE |
11:35 AM 28 Feb | GameStop stock price tanks after Microsoft announces new digital-gaming service | FALSE |
Sentiment (article_sentiment)
What is it? Sentiment score (-1 to +1) of the article based on its title and content.
Quick Definition: determines if the article was written positively or negatively by the author/editor.
How is it created? Sentiment analysis of articles involves 3 parallel models (bag of words + n-grams + deep learning).
Bag of Words involves a proprietary list of 300,000-plus positive and negative words, differently weighed, which are used to gauge a base sentiment of an article.
N-grams involves a proprietary list of positive and negative two- to three-word phrases which are used to gauge a more accurate sentiment in articles. These lists are compiled by financial analysts.
Next, a Deep Learning model predicts how much the article is positive or negative based on the vector representation of its text.
Finally, a meta model (ensemble learning) uses the output of all these 3 models to generate a final score.
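A toy version of the bag-of-words leg and the meta model can illustrate the structure. The lexicon, its weights, and the ensemble weights are invented for illustration; the real lists are proprietary.

```python
# Illustrative weighted lexicon (the real one has 300,000+ entries).
LEXICON = {"surge": 0.4, "growth": 0.3, "stable": 0.1,
           "strike": -0.4, "lawsuit": -0.5, "falls": -0.3}

def bag_of_words_sentiment(text):
    """Sum lexicon weights over the words, clipped to the documented range."""
    score = sum(LEXICON.get(w, 0.0) for w in text.lower().split())
    return max(-1.0, min(1.0, round(score, 3)))

def ensemble(scores, weights=(0.3, 0.3, 0.4)):
    """Meta-model sketch: a weighted average of the three model outputs."""
    return round(sum(s * w for s, w in zip(scores, weights)), 3)

print(bag_of_words_sentiment("Baidu revenue falls amid lawsuit"))  # → -0.8
```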
Examples
Snippets of articles with a negative sentiment about a publicly traded equity - Tesco Inc
Tesco strike to escalate - “Over 2-thousand staff in 22 Tesco stores will be on strike by the middle of next week. Another 24 stores were balloted for industrial action by Mandate over the past three nights - 6 agreed to join the 16 stores currently on the picket line. The retailer says the results of the ballot mean there’s an onus on the union to call-off the strike….”
Striking Tesco workers will not have Family Income Supplement suspended - “The ongoing strike at a number of Tesco stores has been suspended from this morning after both sides in the dispute agreed to attend discussions at the invitation of the Labour Court. Tesco has confirmed that it will not make any changes to pre-1996 terms and conditions whilst this process is ongoing. The Mandate trade union said all pickets will be suspended and the talks are expected to get under way this weekend….”
Tesco workers shouldn’t lose Family Income Supplement - “This would be a completely unfair and mean spirited move. I believe that there is scope under the current legislation for the Minister to direct that no such decision is made. ‘By engaging in strike action, the workers are already seeing a reduction in their take home pay; a cut to their FIS payment will devastate families….’”
Sentiment (story_sentiment)
What is it? An average of the sentiment scores of all articles that fall into this story so far. NOTE: If there is only one article in a story so far, then its first_mention=TRUE and story_sentiment equals article_sentiment.
Quick Definition: aggregates article sentiments and calculates the average sentiment for each story
How is it created? Article sentiments, calculated via the procedure explained in the Article Sentiment section above, are aggregated and averaged. As the story grows, the overall sentiment keeps changing and a trend can be captured over time.
Examples
Looking at articles reporting on company earnings of Baidu (BIDU), there exist mixed reviews from different sources. Interestingly, the overall story sentiment saturated to end up as positive as more and more articles were posted.
Initial Negative articles -
- Baidu’s quarterly revenue falls 2.6 percent - “Baidu Inc reported a second straight drop in quarterly revenue as regulatory scrutiny into healthcare and related advertisements continued to take a toll on the Chinese internet search giant. The drop, however, was within the 17.84-18.38 billion yuan range the company had previously forecast. Analysts estimate that healthcare accounts for about 20-30 percent of Baidu’s search revenue, which represents more than 80 percent of the company’s total sales….”
Dispersed Positive articles -
Baidu reports stable 2016 revenue growth - “Chinese Internet giant Baidu reported stable revenue growth in 2016, helped in part by artificial intelligence (AI) upgrades to its various products. Baidu continued to see stable user growth for its search and map services, with its mobile payment business Baidu Wallet attracting 100 million users by the end of 2016, surging 88 percent compared with 2015….”
Baidu posts bleak fourth quarter, but sees business reshuffle driving 2017 growth - “Baidu Inc’s revenue fell for a second straight quarter, hurt by a government crackdown on healthcare advertising, but the internet search giant expects a rebound this year as it retools to find growth outside its core ad business. The company has stumbled over the past few years - firstly from a cash-burning subsidy war with rivals such as Alibaba in businesses like food delivery, movie tickets and taxi hailing….”
Accern Rank (overall_source_rank)
What is it? Accern Rank identifies whether information from a source is posted promptly and whether that information will go viral (similar articles published by others). Rank 1 is lowest and Rank 10 is highest. In other words, it lets you know which sources are usually among the first to publish articles on a new story, and whether they have a knack for posting stories that become widespread.
Quick Definition:
overall_source_rank - determines if the SOURCE is reliable at releasing trending stories, where reliable indicates the ability to post early and trending indicates the potential for widespread stories.
overall_author_rank - determines if the AUTHOR is reliable at releasing trending stories.
How is it created? A graphical model takes into account historical data (past articles): how certain news appeared in the past and what the distribution of articles within a story looked like. It checks which sources posted faster in comparison to other sources that then posted contextually similar articles.
Examples
Overall Source Rank (High) - StreetInsider releases stories first, and their stories get republished by many other sources.
Overall Author Rank (Low-Mid) - John Paul* releases stories on StreetInsider first, but his stories don’t get republished by any other authors.
*
Accern Rank (event_source_rank)
What is it? These ranks are based on the same Accern Rank model, which tries to predict promptness and the likelihood of a story being republished. Rank 1 is lowest and Rank 10 is highest. event_source_rank/event_author_rank is more precise: for example, Tumblr posts rumors faster than others, while Bloomberg posts financial docs faster than others. Note that sources will have varied ranks for different events.
Quick Definition:
event_source_rank - determines if the SOURCE is reliable at releasing articles associated with a financial event.
event_author_rank - determines if the AUTHOR is reliable at releasing articles associated with a financial event.
How is it created? The ranking model is the same. Ranks are calculated by filtering based on financial events.
Examples
Event Source Rank (High) - StreetInsider releases lawsuit stories first, and their lawsuit stories get republished by many other sources.
Event Author Rank (Low-Mid) - John Paul* releases lawsuit stories on StreetInsider late, but his stories are republished by some authors.
Saturation (story_saturation)
What is it? The online exposure of the story, i.e. have a lot of people seen this story/information already? This is one of the useful metrics made possible by story classification.
Quick Definition: gauge the current potential exposure of a story
How is it created? Based on web traffic information (provided by Alexa Rank) for related articles and previous historical data, we predict the exposure of the story at different levels - low, mid, & high.
3-Step Process for Computing Saturation
Step 1: Accumulate web traffic per story. We accumulate total web traffic based on all related articles in a story.
Step 2: Average web traffic per story. We look back at all similar stories' total web traffic and take an average per story.
Step 3: Segment average story web traffic. We segment the average story web traffic into low, mid, and high saturation.
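The final segmentation step can be sketched as simple bucketing. The cutoffs below are invented; Accern derives its segmentation from historical traffic data.

```python
def saturation_level(avg_story_traffic, low_cut=1_000, high_cut=100_000):
    """Bucket average story web traffic into low / mid / high saturation.

    The cutoff values are illustrative assumptions, not Accern's thresholds.
    """
    if avg_story_traffic < low_cut:
        return "low"
    if avg_story_traffic < high_cut:
        return "mid"
    return "high"

print(saturation_level(250_000))  # → high
```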
Examples
Saturation (High) - story published on 100+ websites
Saturation (High) - story published on two major newswires
Saturation (Low) - story published on one medium-traffic website
Saturation (Low) - story published on five small websites
Impact (_overall)
What is it? When a certain type of story appears in the media, we calculate the probability that the stock price of the company moves up/down by more than 1% by EOD. The Overall Impact Score reflects how an event like Company Earnings generally has a higher impact than other events. It is an average across the entity impact scores for the different companies involved.
Quick Definition (event_impact_score_overall): determines if an event has a chance of affecting stock prices of companies in general by more than 1% at the end of the trading day.
How is it created? This is an example of a retrospective metric. Looking back through the historical archive, we evaluate how the market behaved for similar past events. In brief, by overlaying 3+ years of financial events data with stock price market data, we determine if the event has a chance of moving the stock price of companies in general by more than 1% by the end of the day.
Examples
- Event Impact Score Overall (High) - In the past 3 years, whenever a lawsuit happened, it affected the stock prices of companies in general by 1% or more EOD.
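The retrospective calculation can be sketched as the historical hit rate of >1% end-of-day moves, scaled to the 0-100 range the feed uses. The sample returns below are invented, and this is our simplification of the model, not Accern's actual method.

```python
def impact_score(price_moves, threshold=0.01):
    """Share of past occurrences of an event where the stock moved more than
    `threshold` (1%) by end of day, scaled to 0-100."""
    hits = sum(1 for m in price_moves if abs(m) > threshold)
    return round(100.0 * hits / len(price_moves), 2)

# Invented signed EOD returns observed after past instances of an event:
print(impact_score([0.021, -0.004, 0.015, 0.002, -0.032]))  # → 60.0
```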
Impact (_on_entities)
What is it? The Entity Impact Score is more precise, as it returns the probability that a particular event will affect SPECIFIC equities. For example, an event_impact_score_on_entities of 90 for Apple and a 'Mergers & Acquisitions' event conveys that this sort of story/theme has moved the market in the past and there is a high likelihood it will now as well. The same event can also vary in impact score for different companies.
Quick Definition (event_impact_score_on_entities): determines if an event has a chance of affecting the stock price of the mentioned company by more than 1% at the end of the trading day.
How is it created? Using the same procedure to calculate overall impact score of an event, this process filters based on every entity and calculates respective probabilities.
Examples
Sample Snippet
{
"....": "...",
"event_groups": [
{
"type": "Financial Results",
"group": "Company Earnings"
}
],
"event_impact_score": {
"overall": 48.55540720961282,
"on_entities": [
{
"entity": "EBAY",
"on_entity": 26
},
{
"entity": "AMZN",
"on_entity": 36
}
]
}
}
Considering the snippet above, we see that the results of Company Earnings reports have a lower impact on eBay and Amazon than on companies overall.
Similarly, an event involving Criminal Actions/Fraud may have a high overall impact (event_impact_score.overall), but certain entities like Google are impacted 50% less. (event_impact_score.on_entities.on_entity)