Product matching in digital marketplaces: Multimodal model based on the transformer architecture
Abstract
In this paper we analyze the problem of intelligent product matching in digital marketplaces, which requires evaluating the similarity of product records that may differ in format, content or volume of multimodal data. The research lies at the intersection of entity resolution (ER) methods, namely record matching, and multimodal data analysis. The topic is highly relevant in a fast-growing platform economy in which the e-commerce market is expanding rapidly. The main purpose of this research is to develop and test an intelligent multimodal model based on the transformer architecture that improves the accuracy and robustness of product matching in digital marketplaces. The authors developed a model that integrates textual, visual and tabular attributes, which makes it possible to identify similar products, find competitive offers, detect duplicates and perform product clustering and segmentation more effectively. The proposed approach is based on the self-attention mechanism, which models contextual and semantic relations between data of different natures. Vector representations of text descriptions are extracted with language models, in particular the Sentence-BERT architecture; the visual component is handled by Vision Transformer; and tabular data are processed using specialized mechanisms for learning on structured data based on TabTransformer. The experiment demonstrated that the developed multimodal model efficiently solves the task of product matching in digital marketplaces under significant variability of product items and data heterogeneity. Additionally, the results suggest that the model can be adapted successfully to other product categories. The results obtained confirm that applying the multimodal approach to product matching in digital marketplaces is effective and practical. This allows e-commerce market participants to significantly improve the quality of inventory management, increase pricing efficiency and strengthen their competitive advantages.
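The fusion idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes that the three encoders (text, image, tabular) have already produced embeddings projected to a common dimension, replaces the learned query, key and value projections with the identity, and scores a product pair by cosine similarity of mean-pooled outputs. All function names here (`self_attention`, `fuse`, `match_score`) are hypothetical and chosen for illustration only.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: identity projections stand in for the learned
    query/key/value matrices of a real transformer layer.
    """
    dim = len(tokens[0])
    attended = []
    for query in tokens:
        weights = softmax([dot(query, key) / math.sqrt(dim) for key in tokens])
        attended.append(
            [sum(w * value[i] for w, value in zip(weights, tokens)) for i in range(dim)]
        )
    return attended


def fuse(text_vec, image_vec, table_vec):
    """Treat each modality embedding as one token, attend, then mean-pool."""
    attended = self_attention([text_vec, image_vec, table_vec])
    dim = len(text_vec)
    return [sum(tok[i] for tok in attended) / len(attended) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def match_score(product_a, product_b):
    """Similarity of two products, each given as (text, image, table) embeddings."""
    return cosine(fuse(*product_a), fuse(*product_b))


if __name__ == "__main__":
    # Toy 4-dimensional embeddings; real encoders produce far larger vectors.
    offer_1 = ([1.0, 0.0, 0.5, 0.2], [0.9, 0.1, 0.4, 0.3], [0.2, 0.8, 0.1, 0.0])
    offer_2 = ([-1.0, 0.3, -0.5, 0.1], [-0.8, 0.2, -0.4, 0.0], [0.1, -0.9, 0.2, 0.5])
    print("same offer:", match_score(offer_1, offer_1))
    print("different offers:", match_score(offer_1, offer_2))
```

In a trained system the score would typically be compared against a threshold, or passed to a classification head, to decide whether two records describe the same product; that decision step is left out of this sketch.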
References
Fletcher A., Ormosi P. L., Savani R. (2023) Recommender systems and supplier competition on platforms. Journal of Competition Law & Economics, vol. 19, no. 3, pp. 397–426. https://doi.org/10.1093/joclec/nhad009
Hussien F.T.A., Rahma A.M.S., Abdulwahab H.B. (2021) An e-commerce recommendation system based on dynamic analysis of customer behavior. Sustainability, vol. 13, no. 19, article 10786. https://doi.org/10.3390/su131910786
Chen F., Liu X., Proserpio D. et al. (2020) Studying product competition using representation learning. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’20), pp. 1261–1268. https://doi.org/10.1145/3397271.3401041
Hu S., Wei M. M., Cui S. (2023) The role of product and market information in an online marketplace. Production and Operations Management, vol. 32, no. 10, pp. 3100–3118. https://doi.org/10.1111/poms.14025
Cheung M., She J., Sun W., Zhou J. (2019) Detecting online counterfeit-goods seller using connection discovery. ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 15, no. 2, article 35. https://doi.org/10.1145/3311785
Sun J., Zhang X., Zhu Q. (2020) Counterfeiters in online marketplaces: Stealing your sales or sharing your costs. Journal of Retailing, vol. 96, no. 2, pp. 189–202. https://doi.org/10.1016/j.jretai.2019.07.002
Köpcke H., Thor A., Rahm E. (2010) Evaluation of entity resolution approaches on real-world match problems. Proceedings of the VLDB Endowment, vol. 3, nos. 1–2, pp. 484–493. https://doi.org/10.14778/1920841.1920904
Cohen W.W., Ravikumar P., Fienberg S.E. (2003) A comparison of string distance metrics for name-matching tasks. Proceedings of Workshop on Information Integration (IJCAI-03), pp. 73–78.
Singh R., Meduri V.V., Elmagarmid A. et al. (2017) Synthesizing entity matching rules by examples. Proceedings of the VLDB Endowment, vol. 11, no. 2, pp. 189–202. https://doi.org/10.14778/3149193.3149199
Wang J., Li G., Yu J.X., Feng J. (2011) Entity matching: How similar is similar. Proceedings of the VLDB Endowment, vol. 4, no. 10, pp. 622–633. https://doi.org/10.14778/2021017.2021020
Angermann H. (2022) TaxoMulti: Rule-based expert system to customize product taxonomies for multi-channel e-commerce. SN Computer Science, vol. 3, article 177. https://doi.org/10.1007/s42979-022-01070-8
Mao M., Chen S., Zhang F. et al. (2021) Hybrid ecommerce recommendation model incorporating product taxonomy and folksonomy. Knowledge-Based Systems, vol. 214, article 106720. https://doi.org/10.1016/j.knosys.2020.106720
Aanen S. S., Vandic D., Frasincar F. (2015) Automated product taxonomy mapping in an e-commerce environment. Expert Systems with Applications, vol. 42, no. 3, pp. 1298–1313. https://doi.org/10.1016/j.eswa.2014.09.032
Ristoski P., Petrovski P., Mika P., Paulheim H. (2018) A machine learning approach for product matching and categorization: Use case: Enriching product ads with semantic structured data. Semantic Web, vol. 9, no. 5, pp. 707–728. https://doi.org/10.3233/SW-180300
Shah K., Kopru S., Ruvini J. D. (2018) Neural network based extreme classification and similarity models for product matching. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans – Louisiana, vol. 3, pp. 8–15. Association for Computational Linguistics. https://doi.org/10.18653/v1/N18-3002
Vaswani A., Shazeer N., Parmar N. et al. (2017) Attention is all you need. Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17), Long Beach, CA, USA, pp. 6000–6010. https://dl.acm.org/doi/pdf/10.5555/3295222.3295349
Zhang H., Shafiq M.O. (2024) Survey of transformers and towards ensemble learning using transformers for natural language processing. Journal of Big Data, vol. 11, article 25. https://doi.org/10.1186/s40537-023-00842-0
Mikolov T., Chen K., Corrado G., Dean J. (2013) Efficient estimation of word representations in vector space. arXiv:1301.3781. https://doi.org/10.48550/arXiv.1301.3781
Pennington J., Socher R., Manning C. D. (2014) GloVe: Global vectors for word representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 1532–1543.
He K., Zhang X., Ren S., Sun J. (2016) Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, pp. 770–778. https://doi.org/10.1109/CVPR.2016.90
Ba J.L., Kiros J.R., Hinton G.E. (2016) Layer normalization. arXiv:1607.06450. https://doi.org/10.48550/arXiv.1607.06450
Devlin J., Chang M. W., Lee K., Toutanova K. (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, Minnesota, vol. 1, pp. 4171–4186. Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1423
Reimers N., Gurevych I. (2019) Sentence-BERT: Sentence embeddings using Siamese BERT-networks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 3982–3992. Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1410
Wu Z., Shen C., van den Hengel A. (2019) Wider or deeper: Revisiting the ResNet model for visual recognition. Pattern Recognition, vol. 90, pp. 119–133. https://doi.org/10.1016/j.patcog.2019.01.006
Tan M., Le Q. (2019) EfficientNet: Rethinking model scaling for convolutional neural networks. Proceedings of the 36th International Conference on Machine Learning, vol. 97, pp. 6105–6114.
Dosovitskiy A., Beyer L., Kolesnikov A. et al. (2021) An image is worth 16x16 words: Transformers for image recognition at scale. arXiv:2010.11929. https://doi.org/10.48550/arXiv.2010.11929
Radford A., Kim J. W., Hallacy C. et al. (2021) Learning transferable visual models from natural language supervision. Proceedings of the 38th International Conference on Machine Learning, vol. 139, pp. 8748–8763.
Caron M., Touvron H., Misra I. et al. (2021) Emerging properties in self-supervised vision transformers. arXiv:2104.14294. https://doi.org/10.48550/arXiv.2104.14294
Huang X., Khetan A., Cvitkovic M. et al. (2020) TabTransformer: Tabular data modeling using contextual embeddings. arXiv:2012.06678. https://doi.org/10.48550/arXiv.2012.06678
Gorishniy Y., Rubachev I., Khrulkov V. et al. (2021) Revisiting deep learning models for tabular data. Proceedings of the 35th International Conference on Neural Information Processing Systems (NIPS’21), article 1447, pp. 18932–18943.