As an architecture team developing solutions for many startups, we are frequently asked one question: how can we incorporate machine learning algorithms into our product? The question comes mainly from startup founders who are eager to differentiate their product by incorporating machine learning early in the product development cycle.
A startup has no large data set at the beginning with which to train machine learning models. However, a machine learning algorithm is effective only when its model has been trained on data specifically related to your users.
Here are some ways startups can gradually bridge the gap between a product that works well for initial users without a large data set and a fully matured solution that uses machine learning algorithms with models trained on large amounts of data gathered over time.
- Ask the user short but relevant questions in your application or portal. Use the answers to build user profiles and provide personalization.
- Capture user location, device data and other context data in your application or portal and add it to the user profile.
- Capture user behavior in the app and feed it back into your personalization engine to enrich the initial profile.
- Use a rules engine to correlate different profile values and their weights, and classify users based on the rules engine output.
- Use regular expressions and look-up dictionaries to interpret free text for natural language understanding.
- Use a hybrid approach for text interpretation: some natural language processing libraries like Stanford NLP provide part-of-speech (POS) and named entity recognition (NER) tags that you can use in regular expressions along with the text to provide a richer interpretation of the text.
- Narrow down the interpretation of unstructured data based on other related structured data.
- Tag data based on rules or regular expressions.
- Provide a manual review of tags or meta-data extracted from unstructured data, along with ways to correct them. This can be done through a backend portal that gives reviewers an easy way to quickly inspect the unstructured data and make corrections. There are also third-party services like Amazon Mechanical Turk that can be used to augment and correct meta-data using human intelligence.
- Provide ways for the user to manually tag data in your backend admin portal.
- Write regression tests to calculate precision, recall, and accuracy values based on manually tagged data. As you improve your rules and algorithms, keep running the regression tests to make sure these values are improving.
- Use a weighted average of attributes. Initially, you have structured data and meta-data extracted from unstructured data, but not enough data to decide which attributes are important through feature engineering. In such cases, assign weights to the attributes based on human knowledge and calculate the final predicted value as a weighted average of these attributes.
- Use the tagged data to train ML models.
- Leverage data acquired through your products in the market.
- Scrape relevant data from sources on the web and use it to train ML models.
- Generate synthetic data programmatically and use it to train the models: the success of this method depends entirely on how closely the synthetic data matches real data, including its randomness and distributions, so choose this approach carefully.
- New cloud-based services with prebuilt vertical AI/ML models, accessible through APIs, are emerging. Using one can be a first step toward incorporating ML in your product.
- Use machine learning APIs from cloud providers with built-in models. For speech and text processing, you can use Amazon Lex, Transcribe, Polly, Comprehend, and Translate; Google Dialogflow, Cloud Natural Language API, Cloud Speech API, and Cloud Translation API; or the Microsoft Speech and Language APIs. For video and image processing, you can use Amazon Rekognition, Google's Cloud Vision APIs, or the Microsoft Azure image and video processing APIs.
- Use MLaaS services like Amazon SageMaker, Microsoft Azure Machine Learning Studio, Google Cloud Machine Learning Engine, and IBM Watson Machine Learning Studio.
- If you wish to develop your stack from scratch, choose a technology stack like Apache Spark or Flink that provides libraries to run ML algorithms at scale: you can initially run a rules engine or procedural programs at scale within these frameworks until enough data is gathered to train ML models. You can then introduce ML algorithms within the same frameworks without needing to choose another stack.
- Make sure to capture all the raw data right from the start. When you have enough data, use it to train your supervised learning models, and you are on your way to taking advantage of machine learning algorithms in your product.
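
To make the tagging and regression-test points in the list concrete, here is a minimal Python sketch, not a definitive implementation. It tags free text with a look-up dictionary and one regular expression, then computes precision, recall, and accuracy against manually tagged samples. The dictionary entries, the `ORD-12345` order number format, the tag names, and the sample texts are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical look-up dictionary: word -> tag.
LOOKUP = {"refund": "billing", "invoice": "billing", "password": "account"}
# Hypothetical regular expression rule for order numbers such as "ORD-12345".
ORDER_ID = re.compile(r"\bORD-\d{5}\b")


def tag_text(text):
    """Return the set of tags that the rules assign to a piece of free text."""
    tags = set()
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in LOOKUP:
            tags.add(LOOKUP[word])
    if ORDER_ID.search(text):
        tags.add("order")
    return tags


def evaluate(samples, tag):
    """Compute (precision, recall, accuracy) for one tag.

    samples is a list of (text, manually_assigned_tags) pairs.
    """
    tp = fp = fn = tn = 0
    for text, expected in samples:
        predicted = tag in tag_text(text)
        actual = tag in expected
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(samples) if samples else 0.0
    return precision, recall, accuracy


# Hypothetical manually tagged data, e.g. exported from a review portal.
MANUALLY_TAGGED = [
    ("I need a refund for ORD-12345", {"billing", "order"}),
    ("Cannot reset my password", {"account"}),
    ("Please send the invoice again", {"billing"}),
    ("Charged twice this month", {"billing"}),          # the rules miss this one
    ("Where is the invoice template stored?", set()),   # the rules over-tag this one
]

if __name__ == "__main__":
    print(evaluate(MANUALLY_TAGGED, "billing"))
```

In a regression suite, one option is to store the last accepted values per tag and fail the build when a rule change pushes any of them below that baseline.
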
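
The rules engine and weighted-average points in the list can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a recommended design: the attribute names, the weights, the thresholds, and the segment names are all hypothetical, and in practice they would come from your own domain knowledge.

```python
# Hypothetical attribute weights chosen from human knowledge, not learned from data.
WEIGHTS = {"stated_interest": 0.5, "recent_activity": 0.3, "location_match": 0.2}


def weighted_score(profile, weights=WEIGHTS):
    """Weighted average of attribute values, each expected in the range 0..1.

    Attributes missing from the profile are skipped and the remaining
    weights are renormalized, so sparse early profiles still get a score.
    """
    present = {name: w for name, w in weights.items() if name in profile}
    total = sum(present.values())
    if total == 0:
        return 0.0
    return sum(profile[name] * w for name, w in present.items()) / total


# Hypothetical rules, evaluated in order; the first matching rule wins.
RULES = [
    (lambda p, s: p.get("is_new_user", False) and s < 0.5, "onboarding"),
    (lambda p, s: s >= 0.7, "high_intent"),
    (lambda p, s: s >= 0.4, "browsing"),
]


def classify(profile):
    """Classify a user profile into a segment using the score and the rules."""
    score = weighted_score(profile)
    for condition, segment in RULES:
        if condition(profile, score):
            return segment
    return "default"


if __name__ == "__main__":
    user = {"stated_interest": 1.0, "recent_activity": 0.5, "location_match": 1.0}
    print(weighted_score(user), classify(user))
```

Keeping the weights and rules in plain data structures means a reviewer can adjust them without touching the scoring code, and the same scoring function can later be compared against a trained model once enough data exists.
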
What other techniques have you used in the early stages of product development, without access to large amounts of data, for personalization or natural language understanding?