Saturday, April 21, 2018

Fascinating Lecture on Uncovering Insight from Patent Data Using Text Mining with Applications to Cybersecurity

Yesterday, Dr. Davit Khachatryan, an Isenberg School PhD alum in Management Science who is now a Professor at Babson College, gave a captivating lecture in our UMass Amherst INFORMS Speaker Series. The title of his talk was "Uncovering Insight from Patent Data Using Text Mining."

Prior to joining the faculty at Babson College, Davit was a Senior Associate at the top consulting company PwC. It was a pleasure to welcome a very successful alum back to campus! Davit was the 8th and final guest speaker in our series for the 2017-2018 year. This year we also had the pleasure of hosting Professor Renata Konrad of WPI, Professors Shannon Roberts and Hari Balasubramanian of UMass Amherst, Professor Dmytro Matsypura (also an Isenberg School PhD alum and my former doctoral student) of the University of Sydney in Australia, Dr. Les Servi of MITRE, Professor Jim Orlin of MIT, and Professor Burcu Balcik of Ozyegin University in Turkey. As the Faculty Advisor to the award-winning UMass Amherst INFORMS Student Chapter, I help the students organize this Speaker Series, which adds tremendously to the intellectual life of the Isenberg School and UMass Amherst and also provides leadership and networking opportunities for the students. After each talk, I host a lunch at the UMass Amherst University Club so that we can get to know the speaker(s) in a relaxed setting and continue the discussions.

Deniz Besik, the President of the Student Chapter this year, introduced Professor Khachatryan, who then began his lecture.
He provided us with a fascinating introduction to the patenting process in the US, dating back to President George Washington, and noted that the first patent in the US was issued in 1790 to Samuel Hopkins and was signed by Washington. The first business processing patent was given to Herman Hollerith in 1889; his company later became part of the firm that was renamed IBM.

There were more than 100 million patents issued worldwide by 2016. In effect, Davit said, a patent is a government-sanctioned, short-term monopoly: the right to exclude others for 20 years. However, the contents of a patent are public knowledge.

He provided us with a list of what can't be patented:
  • laws of nature
  • natural phenomena
  • abstract ideas
and what can be - subject matter that is:
  • novel
  • non-obvious
  • useful
  • enabled. 
He provided us with an example of a patent and noted that a patent can be challenged for validity once it is granted. He singled out the Cisco vs. Cirrex Systems case, which Cisco won. He emphasized why validity matters: a patent not only needs to be granted but should stay granted. His research involves, in part, identifying the "drafting quality" of a patent and seeing, through text mining and statistics, whether the specifications in a patent application and the claims align.

The application that he described was to cybersecurity patents for business data processing, a topic of great interest to me, since our supernetwork team has been doing a lot of research and publishing in cybersecurity. Even my most recent talk, just last Monday at the INFORMS Analytics Conference in Baltimore, was on cybersecurity. Davit defined a novel measure of "drafting alignment" for assessing the consistency between the specifications and the claims in a patent. He presented his framework, developed with a co-author at Babson, in a chart.
Intriguingly, using a NIST keyword glossary, he identified all business processing cybersecurity patents since November 29, 2000, arriving at 2,379 such patents. He displayed a slide listing those responsible for these patents, which include many household-name companies.
His approach makes use of Latent Dirichlet Allocation (LDA), based on the work of Blei et al. (2003). Each document is modeled as a probabilistic mixture of topics, and he identified 300 cybersecurity topics in the patent data. He discussed text mining issues such as the preprocessing of data (the removal of Greek letters, for example), the removal of common, patent-related language, stemming, tokenization to unigrams (a vector of words), and the construction of a document-term matrix. I very much appreciated his measure, which, with credit also given to Hellinger, constructs a single number for each patent document.
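
For readers who want to see the moving parts, here is a minimal Python sketch of the kind of pipeline described in the talk: simple preprocessing, tokenization to unigrams, a document-term matrix, and a distance between two topic distributions. It is an illustration under stated assumptions, not the code or the exact measure from the talk. The stop-word list, the function names, and the example topic vectors are all hypothetical; stemming and the fitting of the LDA model are omitted; and I am assuming that the credit to Hellinger refers to the Hellinger distance between probability distributions, which equals 0 for identical distributions and 1 for distributions that share no topics.

```python
import math
import re
from collections import Counter

# Hypothetical stop list standing in for "common, patent-related language".
PATENT_STOPWORDS = {
    "the", "a", "an", "of", "and", "to", "is",
    "said", "wherein", "claim", "embodiment",
}


def tokenize(text):
    """Lowercase the text and keep ASCII alphabetic unigrams only.

    Greek letters, digits, and punctuation never match the pattern,
    so they are dropped; stop words are removed afterwards.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in PATENT_STOPWORDS]


def document_term_matrix(docs):
    """Return (vocabulary, rows), where rows[i][j] counts vocabulary[j] in docs[i]."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / 2.0)


if __name__ == "__main__":
    docs = [
        "The encrypted key is sent to the server.",
        "A server verifies the encrypted key and the β token.",
    ]
    vocabulary, rows = document_term_matrix(docs)
    print(vocabulary)
    print(rows)

    # Made-up topic mixtures for one patent's specification and claims.
    # In practice these would come from a fitted topic model such as LDA.
    spec_topics = [0.7, 0.2, 0.1]
    claim_topics = [0.6, 0.3, 0.1]
    print(hellinger(spec_topics, claim_topics))
```

A smaller distance between the specification and claims mixtures would suggest closer alignment, although how the talk's drafting alignment measure is actually assembled from such quantities is not something this sketch tries to reproduce.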

The lecture was outstanding, and audience members came from the Isenberg School, the Department of Mechanical and Industrial Engineering, and the Department of Civil and Environmental Engineering, among other departments at UMass Amherst. I was delighted that the announcement for his talk made the UMass Amherst homepage!
After a lot of applause, we took a group photo of some members of the audience with the speaker and then headed off to lunch.
The conversations at lunch were fabulous and ranged from updates on what is happening in the Isenberg School and UMass to GE and even Armenia, the country where Professor Khachatryan was born. And, as is our tradition, we topped off the delicious lunch at the University Club with dessert.
After dessert, Professor Khachatryan was interviewed for the chapter's YouTube channel, and I will let you know when the interview gets posted.

We thank our guest speaker, Professor Davit Khachatryan, for such an illuminating and fascinating lecture! I must say that our Isenberg School of Management PhD alums in Management Science are incredible educators and role models.