Please tell me about Lexalytics and how has it evolved into the company it is today.
Lexalytics has been a leader in AI-driven text analytics and NLP for more than 15 years. We process billions of documents per day for data analytics companies and enterprise data analyst teams across the globe. Since founding the company in 2003, we’ve delivered the world’s first commercial sentiment analysis engine in 2004, the world’s first Twitter/microblog-specific text analytics in 2010, the world’s first semantic understanding based on Wikipedia in 2011, and the world’s first unsupervised ML model for syntax analysis in 2014. We’re able to analyze text in more than 20 languages natively and have developed specialized, industry-specific versions of our software for industries including pharmaceuticals, hospitality, airlines, and e-commerce, and we are continuously pushing innovation in the space to drive value for our customers.
In addition to our in-house expertise, Lexalytics launched the Magic Machines AI Labs initiative in January 2017 in partnership with the University of Massachusetts Amherst’s Center for Data Science and Northwestern University’s Medill School of Journalism, Media and Integrated Marketing Communications to drive innovation in artificial intelligence and machine learning.
Why analyze text data, and what are the specific challenges behind analyzing so much of it?
Text data is ubiquitous throughout any organization: emails, employee surveys, applicants’ resumes, contracts, social media content, customer reviews, the list goes on. With so much content to deal with, it would be impossible to hire the number of people necessary to read, process and analyze it without the help of machines. But without analyzing that content, enterprises wouldn’t know what customers are saying, how employees are feeling, how products are performing in the marketplace and so on. This is why text analytics is so important to an organization.
The biggest challenge in analyzing text is accurately understanding the underlying insights in the data. “Natural language” text documents (like tweets or Facebook comments) may contain ambiguities, slang, misspellings and poor grammar, inconsistent semantics and syntax, industry-specific jargon, or sarcasm. Many text analytics systems require large, expensive data sets and sophisticated machine learning techniques to properly categorize and tag the data, along with constant tuning to keep up with the changes in language that happen over time.
How do you derive reliability and accuracy in your analysis? What is behind a good NLP engine?
One area where accuracy is really important in NLP is named entity extraction, where the system automatically pulls proper nouns (people, places, products, companies, titles, jobs, etc.) out of the text. Sentiment analysis -- determining whether a piece of content is positive, negative or neutral -- also demands high accuracy. Lexalytics systems come pre-installed with lists of entities and pre-trained machine learning models so that customers can get started immediately: people, places, dates, companies, products, jobs, and titles are all detected automatically. With Lexalytics machine learning models, customers can discover new competitors just entering the market, track the activity of spokespeople at competitors and customers, and catch new products at the moment of launch. Customers can also build their own lists of custom entities for tracking. Cuts of lumber, types of cancer, variants of a stereo model -- anything that a business considers an entity -- can be identified and tagged as such.
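To make the idea concrete, here is a minimal sketch of entity extraction using the open-source spaCy library rather than the Lexalytics engine itself (whose API is not shown here). The `en_core_web_sm` model, the sample sentence, and the custom `PRODUCT` pattern are illustrative assumptions, but the combination of pre-trained detection plus a customer-defined entity list mirrors the workflow described above.

```python
import spacy

# Illustrative only: a generic open-source pipeline, not the Lexalytics engine.
# Assumes the small English model is installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

# A customer-defined entity list, analogous to custom entities for tracking.
ruler = nlp.add_pipe("entity_ruler", before="ner")
ruler.add_patterns([{"label": "PRODUCT", "pattern": "Surface Laptop"}])

text = "Satya Nadella introduced the Surface Laptop at Microsoft's Redmond campus on Tuesday."
doc = nlp(text)

# The pre-trained model tags people, organizations, places, dates and so on;
# the ruler adds the custom product entity.
for ent in doc.ents:
    print(ent.text, ent.label_)
```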
When it comes to evaluating the sentiment (positive, negative, neutral) of a given text document, studies show that human analysts tend to agree about 80-85% of the time. This is what’s known as inter-rater agreement. That 80-85% mark is the baseline Lexalytics tries to meet or beat when training a sentiment scoring system, depending on the customer problem. This also means that there will always be some text documents that even two humans can't agree upon. For example, if someone says, “We’re switching to DirecTV,” this would be a positively-scored document for DirecTV, but negatively scored for Comcast. Context is key. Lexalytics has near-peak accuracy right out of the box, but depending on the use case, some additional tuning may be necessary to meet a customer’s goals.
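As a rough illustration of what that inter-rater baseline means in practice, here is a small sketch (with made-up labels, not Lexalytics data) that computes raw agreement between two hypothetical annotators, plus Cohen's kappa, the chance-corrected version of the same idea.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical sentiment labels from two human annotators on the same 10 documents.
rater_a = ["pos", "neg", "neu", "pos", "neg", "pos", "neu", "neg", "pos", "pos"]
rater_b = ["pos", "neg", "pos", "pos", "neg", "pos", "neu", "neg", "neu", "pos"]

# Raw inter-rater agreement: share of documents where both raters chose the same label.
agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

# Cohen's kappa corrects that figure for the agreement expected by chance alone.
kappa = cohen_kappa_score(rater_a, rater_b)

print(f"raw agreement: {agreement:.0%}, Cohen's kappa: {kappa:.2f}")  # 80% on this toy data
```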
Lexalytics mentions its capability to predict intent through text analytics. How is this achieved? Can intent be quantified?
Historically, text mining has been a retroactive process: analyzing text to see what’s already been expressed in the past. Lexalytics’s technology sends this paradigm on a 180-degree course change and can help predict what a customer might do based on what they’ve said. With intention extraction, we can determine the expressed intent of customers and reviewers, that is, whether a person will buy, quit, sell, or recommend a product.
Suppose that in your scores of customer reviews, a bunch of customers posted: “I bought the new laptop yesterday, but I don’t like it. It keeps crashing on me (I think it’s a hardware issue). I’ll be returning it tomorrow.” Your run-of-the-mill text analytics will easily extract basic information: the sentiment is “negative,” the object is “laptop,” and, if so configured, the content falls into a category like “errors” for the crashing. Lexalytics can take the analysis one step further by revealing that the customer bought the laptop but now intends to “return” or quit it. Our customer has just discovered that, because of the crashing issue, they are about to lose a customer of their own. With this insight, they can dig to the root of the problem and analyze their data in new ways.
Intentions are interesting because they can be used directly to develop new revenue streams, find leads, defend current revenue streams (by rapidly identifying potential customer churn), route social media support requests, along with many other possibilities.
How is the Lexalytics NLP engine different from, say, traditional methods when it comes to analyzing intent?
Other text mining systems use simple keyword analysis to indicate the presence of intent based on the presence of a word like “buy”. Lexalytics’s proprietary Syntax Matrix™ enables us to understand the structure of a sentence and identify intent without being restrained to keyword lists. Equally importantly, we extract all the contextual information our customers need to make a business decision on that intent, so they can plan and take action right away.
Going back to our laptop review example from above, a run-of-the-mill text analytics system will extract basic information, such as sentiment (negative) and the object/entity (“laptop”), and classify the content into a category like “errors”. Lexalytics’ intention extraction takes the analysis one step further by revealing that the customer bought the laptop, but now intends to return or “quit” it.
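The Syntax Matrix itself is proprietary, but the difference between keyword spotting and structure-aware intent detection can be sketched with a generic dependency parser such as spaCy. The verb lists and the first-person subject check below are illustrative assumptions, not the Lexalytics implementation.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # generic parser, standing in for real syntax analysis

reviews = [
    "I bought the new laptop yesterday, but I'll be returning it tomorrow.",
    "My friend told me to buy this laptop, but I never did.",
]

def keyword_intent(doc, verb_lemmas):
    # Naive approach: any form of the verb counts as intent, so both reviews fire.
    return any(tok.lemma_ in verb_lemmas for tok in doc)

def structural_intent(doc, verb_lemmas):
    # Structure-aware approach: only count the verb when the reviewer ("I"/"we")
    # is its grammatical subject. Exact parses depend on the model used.
    for tok in doc:
        if tok.lemma_ in verb_lemmas and tok.pos_ == "VERB":
            if any(c.dep_ == "nsubj" and c.lower_ in {"i", "we"} for c in tok.children):
                return True
    return False

for text in reviews:
    doc = nlp(text)
    print(text)
    print("  keyword 'buy':    ", keyword_intent(doc, {"buy"}))
    print("  structural buy:   ", structural_intent(doc, {"buy"}))
    print("  structural return:", structural_intent(doc, {"return", "quit"}))
```

On the first review the structural check finds both a purchase and an intent to return; on the second, the keyword check still fires on “buy” even though the reviewer never expressed any purchase intent.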
Could you give me some use cases/examples of how text analytics is providing insights to your clients?
Sure, here are a couple:
One example of how an enterprise analytics team works with Lexalytics is Microsoft’s Customer Market Research team (CMR), which is dedicated to the design, deployment and analysis of customer surveys. The team worked with us on developing a new set of best practices for integrating a different type of voice-of-customer data: social media content. Microsoft used the Lexalytics Intelligence Platform to analyze social media content and generate context-rich insights into how people feel about thousands of the company’s products. The team validated those results using our reporting tools, and then compared the net sentiment score to quantitative Likert scale survey data. Using this solution, the Customer Market Research team could compare how people talk about products and brands on social media, versus what they say in survey responses. Once they identified common discussion themes and topics, the team aggregated this information to use as a reliable, immediately actionable proxy for traditional survey responses, weeks ahead of receiving the actual surveys. This information helped Microsoft reduce survey spend by substituting social signals where possible, run better surveys by identifying gaps, and help other marketing and product teams make better-informed decisions.
Another example of how we work with customers is in the pharmaceutical space:
Pharmaceutical marketing teams around the world rely on AlternativesPharma to provide expert insights and recommendations related to the day-to-day challenges pharmaceutical brand managers face with regard to growing market share, demonstrating product value, increasing patient adherence and improving buy-in from healthcare professionals. AlternativesPharma draws these insights and recommendations from valuable, rarely tapped qualitative sources such as messages, comments and posts written by patients on social media, blogs and forums. Working with Lexalytics, AlternativesPharma performed a progressive analysis across their tens of thousands of data points. Lexalytics was then used to categorize the data into themes and sentiments, allowing the creation of “thematic maps.” These maps provide the company’s clients with valuable, actionable insights into patients’ emotions and behavior with regard to particular diseases and pharmaceutical products. The recommendations arising from the analysis have resulted in improvements and new approaches in how pharmaceutical companies communicate with healthcare providers, authorities and patients. For example, while introducing a new cancer therapy, one of AlternativesPharma’s customers decided to completely change the topics and tonality of their communications to patients, crafting a campaign that deeply resonated with the needs and expectations of their patients, ensuring buy-in.
What is your take on sentiment analysis for predicting returns in finance?
Lexalytics performed a lot of the early work in this area through our partnership with Thomson Reuters. What we learned is that sentiment can be used to predict returns in finance. In the best cases we saw between 30 and 40 basis points of advantage in algorithmic trading systems; typically the gains were smaller, around 10 to 20 basis points, but they were there.
The interesting thing about it was that the sentiment signal was often a negative indicator for trading systems. A positive bit of news around a company often signaled a short but quick drop in the equity price. Since we are not traders, this seemed counterintuitive to us, but the trading experts at Thomson Reuters indicated that many trading systems sell on good news.
The end result is that there is plenty of evidence that sentiment can be used as a signal in trading systems, but the returns aren’t enormous, so it hasn’t become a standard component of every hedge fund’s algorithmic trading system.
Another area we’re excited about regarding NLP is regulatory compliance -- reducing non-compliance risk by ensuring that financial advisors are making required disclosures and offering appropriate recommendations. Financial services firms must demonstrate that their employees are working in their clients’ best interests. These disclosure requirements may include commission disclosure, cost of credit disclosure or own-product disclosure.
Each disclosure, in turn, may contain a dozen or more sub-components. This adds up to a major burden for the service provider. On average, financial firms dedicate 10-15% of their workforce and spend a combined $270 billion on regulatory compliance annually.
Lexalytics is working to automate this process by applying its AI and machine learning models to the financial services space. We combine our semi-structured data parser with text analytics to quickly analyze long financial documents and extract all of the components: legal disclosures, asset allocation tables, statements of advice, client roles, and more. Because our natural language processing technology gives us an actual understanding of the underlying information, we can make complex connections between data points wherever and however they appear in the document. Then we use artificial intelligence to structure this data and prepare it for further analysis. We empower financial auditors to review all of their documents almost simultaneously, instead of spot-checking 1 in 100 documents. This substantially reduces non-compliance risk for financial services firms and banks.
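Lexalytics’ parser and models are not public, so as a toy stand-in, here is a sketch of the kind of disclosure check described above: scan a statement of advice for phrases tied to each required disclosure and flag anything missing. The disclosure names come from the answer above; the phrase lists and the sample document are invented for illustration and are nothing like a production compliance model.

```python
import re

# Each required disclosure maps to phrases that should appear somewhere in the
# document. Real compliance checks use far richer models than keyword patterns.
REQUIRED_DISCLOSURES = {
    "commission disclosure": [r"\bcommission\b", r"\bremuneration\b"],
    "cost of credit disclosure": [r"cost of credit", r"\binterest rate\b"],
    "own-product disclosure": [r"own product", r"\baffiliated\b"],
}

def check_disclosures(document: str) -> dict:
    """Return, for each required disclosure, whether any of its phrases was found."""
    text = document.lower()
    return {
        name: any(re.search(pattern, text) for pattern in patterns)
        for name, patterns in REQUIRED_DISCLOSURES.items()
    }

sample = (
    "Statement of Advice. We may receive a commission of up to 2% on this product. "
    "The product is issued by an affiliated entity of our firm."
)

for disclosure, present in check_disclosures(sample).items():
    print(f"{disclosure}: {'present' if present else 'MISSING'}")
```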
What's the most challenging moment in your entrepreneurial journey that has forced you to rethink the business?
Lexalytics has undergone two significant transformations in its history. The first was the advent of cloud computing, which brought a flood of new vendors into the market and forced us to create a cloud-based offering for our NLP. While this was a challenge, it would be unfair to call it a threat to the business, as it was easier for us to respond to this market evolution than for others. The second has been the arrival of free and nearly-free offerings from Google and Amazon, which has taken away the bottom end of the text analytics market, forced a re-examination of the business, and pushed us to accelerate the enterprise-friendly features of our technology to make it more business-focused.
What are you most excited about when it comes to future technology and what is in store for Lexalytics?
From a technology perspective, there is no doubt that the tight integration between our AI framework (AI Assembler) and our NLP engine (Salience) gives us a unique and exciting ability not only to auto-build machine learning models that tackle novel problems, but also to deploy these models to our NLP stack at the click of a button. Integration between our AI stack and our NLP stack is seamless, meaning that we can solve hard problems and deploy the solutions in a fraction of the time it used to take.
Lots of businesses are working with cutting-edge machine learning algorithms and can write the necessary plumbing to integrate content and build a viable model, but this takes time and energy. Further, it’s important to realize that building a viable model isn’t the same thing as deploying it in an operational environment; that too takes time and energy. With AI Assembler and Salience we have components to tackle every piece of the problem, allowing us to train, test and deploy novel AI in a quarter of the time it would take most providers.
Jeff Catlin has more than 20 years of experience in the fields of search, classification and text analytics products and services. He has held technical, managerial and senior management positions within a variety of companies including Thomson Financial and SovereignHill Software. Prior to the formation of Lexalytics, Jeff acted as the General Manager for the unstructured data group of LightSpeed Software where he was responsible for sales, marketing and development efforts for the Knowledge Appliance and iFocus products. Prior to joining LightSpeed, he was co-owner of PleasantStreet Technologies which produced a news-filtering product. Jeff graduated from UMass Amherst with a degree in Electrical Engineering in 1987.