

Synthetic Data Improves Microsoft’s Next-Gen Conversational AI Engine

At a glance 

The Spur Group helped Microsoft improve modeling within its structured, synthetic data production, resulting in an evolving, scalable conversational artificial intelligence (AI) engine. Our collaboration improved functionality performance, increased data collection relevancy, reduced annotation overhead, and expanded production capacity.

What we delivered 

  • Data strategy, engineering, and analysis 
  • AI/machine learning 
  • Content strategy and development
  • Project, process, and program management 

Microsoft’s next-generation AI engine

The quantity of usable data and the quality of deep learning algorithms determine how “smart” an AI engine can become. However, most ventures reach the end of their data supply long before their algorithms begin to falter.  

Microsoft, a leader in technology advancements, acquired Semantic Machines to serve as its next-generation AI engine. The AI needed to be trained on practical technologies so that it could respond accurately to end users. To that end, Microsoft decided to study the AI’s calendaring ability: scheduling relevant meetings, reading daily agendas, and responding to similar tasks. This allowed the tech company to focus on four components:

  1. Improving data quality and product performance 
  2. Building a tailored data set focused on mitigating identified gaps and training their AI 
  3. Streamlining data production processes and defining best practices  
  4. Developing strategies to focus data production on accelerating product development  

Barriers to AI engine development 

Microsoft sought The Spur Group’s expertise to innovate its data production processes and systems. The technology leader asked us to help improve the data quality and acceptance of its highly sophisticated conversational AI engine.

Limited data production scale was holding back product and feature development, and the lack of available data hindered the AI engine’s deep modeling and machine learning components.

The outsourced data production needed higher quality and relevancy standards. Minimal training data across key scenarios also created gaps in the AI knowledge base.

Refining the data production process 

Our team developed a strategy for tackling large-scale data production with Microsoft. We aided the client in recruiting the right talent, building the culture, and developing the execution strategy.

“We would not be in the position we are today without The Spur Group. They brought a level of expertise and professionalism that was above and beyond,” said Mikko Ollila, Microsoft Principal Program Manager. 

Within two weeks of strategy development, we assembled a team of data scientists to audit Microsoft’s data production, workflow, processes, systems, and algorithms. Taking a hands-on approach, we spent the initial phase working side by side with Microsoft’s team to fully understand the end-to-end workflow. We reviewed existing processes and uncovered gaps in the data.

Once the audit was complete, we shifted our focus to working in lockstep with AI researchers to define gaps in the knowledge base and response generation. The targeted data enabled our team to create 56 classifications and predictions to improve the AI’s conversational responses. Subsequently, we developed a Markov chain algorithm to provide automatic language categorization.
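
To illustrate the general idea of Markov chain language categorization, here is a minimal Python sketch. It is not the algorithm built for Microsoft, whose details are not described here. It assumes one first-order, word-level Markov chain per category with add-one smoothing, and it assigns an utterance to the category whose chain gives it the highest likelihood. The class name, the category labels (`create_event`, `read_agenda`), and the sample utterances are all hypothetical.

```python
import math
from collections import defaultdict


class MarkovChainClassifier:
    """Hypothetical sketch: one first-order word-level Markov chain per category."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing  # additive (Laplace) smoothing constant
        self.transitions = {}       # category -> {previous word -> {next word -> count}}
        self.vocab = set()

    @staticmethod
    def _tokens(text):
        # Start and end markers let the chain model how utterances open and close.
        return ["<s>"] + text.lower().split() + ["</s>"]

    def fit(self, examples):
        """Count word-to-word transitions separately for each category."""
        for text, category in examples:
            counts = self.transitions.setdefault(
                category, defaultdict(lambda: defaultdict(int))
            )
            tokens = self._tokens(text)
            self.vocab.update(tokens)
            for prev, nxt in zip(tokens, tokens[1:]):
                counts[prev][nxt] += 1
        return self

    def log_likelihood(self, text, category):
        """Smoothed log-probability of the utterance under one category's chain."""
        counts = self.transitions[category]
        vocab_size = len(self.vocab) + 1  # +1 reserves mass for unseen words
        tokens = self._tokens(text)
        total = 0.0
        for prev, nxt in zip(tokens, tokens[1:]):
            row = counts.get(prev, {})
            numerator = row.get(nxt, 0) + self.smoothing
            denominator = sum(row.values()) + self.smoothing * vocab_size
            total += math.log(numerator / denominator)
        return total

    def predict(self, text):
        """Return the category whose chain assigns the highest likelihood."""
        return max(self.transitions, key=lambda c: self.log_likelihood(text, c))


# Hypothetical calendaring utterances, invented for illustration only.
EXAMPLES = [
    ("schedule a meeting with anna tomorrow", "create_event"),
    ("schedule a call with the team on friday", "create_event"),
    ("book a meeting for monday morning", "create_event"),
    ("what is on my agenda today", "read_agenda"),
    ("read my agenda for tomorrow", "read_agenda"),
    ("what meetings do i have on friday", "read_agenda"),
]

clf = MarkovChainClassifier().fit(EXAMPLES)
print(clf.predict("schedule a meeting on monday"))
print(clf.predict("what is on my agenda for friday"))
```

A production system would likely need far more data, better tokenization, and possibly higher-order chains. The sketch only shows how transition counts can drive automatic categorization.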

Improved data management and process optimization 

  • 14% increase in average functionality performance
  • 33% reduction in annotation overhead
  • 36 data migrations

Quantitative results show that we improved Microsoft’s data management. Our teams raised average functionality performance by 14%. While managing AI performance across more than 100 product features, we also performed 36 data migrations to ensure data usability and model acceptance.

“Our partnership significantly accelerated the evolution of our AI engine and our progress toward our product vision,” Ollila said. 

In terms of process optimization, our team saved thousands of hours in data production through machine teaching and automation strategies. We built automated solutions that reduced manual annotation requirements by up to 96%. Our teams also helped increase data collection relevancy and reduce annotation overhead by 33%. Finally, we expanded production capacity by training an additional 18 specialists and 12 annotators in conversational AI best practices and processes.