Our Core Technologies

Our expertise spans machine learning, signal processing, statistical modeling and computer vision. We have a strong research and industrial background in automatic speech recognition, human action and activity recognition, multimodal gesture recognition, multi-sensor and multimodal fusion, natural language processing, sign language modeling and recognition, data analysis and visualization, social signal processing and behavior analytics. We work with established tools and technologies, and we are continuously extending our in-house machine learning components.

Core technology:
Deep Learning

Deep learning (DL) is an area of machine learning that has seen a strong resurgence in recent years. It has already led to breakthroughs in fields such as object recognition in computer vision. With DL we can model high-level data abstractions by using a deep graph with multiple processing layers, each composed of linear and non-linear transformations. It is now considered a cutting-edge technology and is employed in a variety of applications, where it produces state-of-the-art results, either outperforming traditional "shallow" approaches or yielding similar results without any "hand-crafting" effort. Our team employs DL for classification, recognition, detection, forecasting and optimization, applying architectures such as deep neural networks (DNNs), convolutional neural networks (CNNs), deep belief networks (DBNs), recurrent neural networks (RNNs) and long short-term memory (LSTM) networks to problems in computer vision, automatic speech recognition, natural language processing, stock forecasting and recommendation systems, to name but a few. Please check out our use cases.
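As a rough illustration of "multiple processing layers composed of linear and non-linear transformations", the sketch below runs one input through a tiny two-layer network in plain Python. The function names (linear, relu, forward), the choice of ReLU as the non-linearity and the hand-picked weights are all assumptions made for this example; it is not our production code, and a real model would learn its weights from data.

```python
def linear(x, weights, bias):
    # Linear (affine) transformation: one output per weight row, y = W x + b.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    # Non-linear transformation: negative values are clipped to zero.
    return [max(0.0, v) for v in values]


def forward(x, layers):
    # Apply each layer in turn; ReLU follows every layer except the last.
    for index, (weights, bias) in enumerate(layers):
        x = linear(x, weights, bias)
        if index < len(layers) - 1:
            x = relu(x)
    return x


# Hypothetical, hand-picked weights: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]),
    ([[2.0, 1.0]], [0.5]),
]

output = forward([3.0, 1.0], layers)
print(output)
```

Stacking more (weights, bias) pairs in the layers list makes the network deeper; the non-linearity between layers is what lets the stack represent more than a single linear map.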

Expertise in Automatic
Speech Recognition

Our automatic speech recognition engine is based on high-end acoustic and language models, providing customizable speech-to-text solutions with state-of-the-art performance and accuracy.


  • Cloud/desktop-based versions; lighter version for mobiles/tablets
  • Vocabulary and grammar customizations
  • Keyword spotting
  • Offline term retrieval, content analysis, call analytics
  • Alignment of speech with its text transcription for long or noisy audio, even when the transcription contains errors
  • Speaker adaptation
  • Voice activity detection
  • Multi-language support (currently en-US and el-GR); extensible to further languages

Expertise in Video processing
& automatic

We provide features that cover a wide range of modeling and recognition applications, such as multimodal events, human actions and hand gestures.


  • Cloud/desktop-based versions
  • Visual event vocabulary and grammar customizations
  • Visual event spotting
  • Offline visual event/action retrieval
  • Video analytics, visual and multimodal content analysis

Expertise in Activity detection,
analytics & recognition

We also learn events and patterns related to motion, using sensor data from smartphones or wearable devices. Examples include data from sports, dance and art, car driving and bicycle riding.


  • Cloud/desktop-based versions
  • Activity detection and recognition (e.g. sit, walk, run, drive)
  • Transportation mode recognition
  • Driving trip detection
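
To give a feel for how motion patterns can be picked up from sensor data, here is a minimal sketch that splits accelerometer readings into fixed-size windows and labels each window "still" or "moving" from the spread of the acceleration magnitude. Everything in it is an assumption for illustration: the function names, the window size, the 0.5 threshold and the sample readings are hypothetical, and a simple threshold rule stands in for the learned models used in practice.

```python
import math
import statistics


def magnitude(sample):
    # Euclidean norm of one (x, y, z) accelerometer reading.
    ax, ay, az = sample
    return math.sqrt(ax * ax + ay * ay + az * az)


def windows(values, size, step):
    # Yield consecutive fixed-size windows; an incomplete tail is dropped.
    for start in range(0, len(values) - size + 1, step):
        yield values[start:start + size]


def label_windows(samples, size=4, step=4, threshold=0.5):
    # Hypothetical rule: high variation in magnitude means movement.
    magnitudes = [magnitude(s) for s in samples]
    return ["moving" if statistics.pstdev(w) > threshold else "still"
            for w in windows(magnitudes, size, step)]


# Made-up readings in m/s^2: a resting device, then a shaken one.
still = [(0.0, 0.0, 9.8)] * 4
moving = [(0.0, 0.0, 9.8), (3.0, 4.0, 12.0), (0.0, 0.0, 6.0), (0.0, 0.0, 13.0)]

labels = label_windows(still + moving)
print(labels)
```

In a learned system the per-window statistics would typically become input features for a classifier rather than being compared against a fixed threshold.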