Research Article

Recognition and Classification Model of Music Genres and Chinese Traditional Musical Instruments Based on Deep Neural Networks

Ke Xu

School of Art and Design, Qingdao University of Technology, Qingdao 266033, Shandong, China

Correspondence should be addressed to Ke Xu; [email protected]

Received 19 April 2021; Revised 31 May 2021; Accepted 7 June 2021; Published 22 June 2021

Academic Editor: Shah Nazir

Hindawi Scientific Programming, Volume 2021, Article ID 2348494, 8 pages, https://doi.org/10.1155/2021/2348494

Copyright © 2021 Ke Xu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The teaching of ideological and political theory courses and daily ideological and political education are two important parts of education for college students. With the iterative update of information technology, the individualized development of students, and the reform and innovation of ideological and political education, higher goals and requirements have been put forward for ideological and political education. Some universities have developed new paths in the teaching model, but they have not considered the evaluation module and paid little attention to their own development. They only paid attention to the fact that it injected fresh blood into the reform of the education model and ideological education but ignored the improvement of their own quality. Therefore, with these limitations, the learning effect is not satisfactory. Keeping in view these issues, this article takes the concept of deep learning and the ideological and political education of college students as the starting point and then analyzes the new precise and personalized concepts, new forms of intelligent teaching and evaluation, and new models of intelligent learning that deep learning brings to college students' ideological and political education. This is a new path of intelligent linkage with the subject, object, and mediator. It can deepen the reform of the education and teaching mode of individualization, accuracy, interactivity, and vividness of college students' ideological and political education and improve its evaluation and management. The experimental results of the study showed the effectiveness of the proposed study.

1. Introduction

Music is an abstract art that uses sound as a means of expression to reflect human emotions in real life [1]. As an important component of human spiritual life, music occupies an important position in daily life. Music can improve concentration, relieve people's pressure from work and study, and benefit physical and mental health [2]; music can bring people aural pleasure and spiritual enjoyment [3], help get rid of bad emotions such as sadness and loneliness, and make people full of energy and passion. With the rapid development of Internet technology and digital multimedia technology, digital media resources represented by audio and video have obtained good transmission channels and convenient storage media, and digital music resources and Internet music entertainment consumer users have shown explosive growth. The era of digital music has arrived.

Music information extraction [4] has become a popular research direction in the field of computer science, and music genre classification is an important research topic within it. Music genre is a notable label that distinguishes music, and it is also the category that listeners pay most attention to and retrieve the most. In the past, the classification of music genres mostly used manual labeling, that is, asking professionals with a background in music and high musical literacy to label music works by category. With the continuous emergence of music creation and online uploads, the digital music resource library on the Internet has become increasingly large, and manual labeling methods have gradually failed to meet the needs. To classify a



large digital music resource library, manual labeling would consume a great deal of manpower and time; moreover, the labeling results are subjective, and the labeling standards cannot be completely unified because they depend on the different professionals who label the music. Therefore, the automatic classification of music [5–7] has gradually become a research hotspot. Automatic genre classification can effectively solve the problems of high cost and time consumption in human labeling. Through an algorithm, a unified classification standard can be formulated, the algorithm can be continuously optimized, and a highly accurate and objective classification result can be obtained.

Manually extracted music features [8] have limited applicability and poor robustness, and it is difficult for them to describe the deep features and timing characteristics of music. Moreover, current music genre classification tasks mainly use traditional machine learning classifiers, including BP neural networks, support vector machines, and nearest neighbor classifiers. Because of their shallow structure, these classifiers limit the learning of music features, and it is difficult to extract more effective features to represent music, which affects the accuracy of classification. In recent years, deep neural networks [9–12] have achieved good results in natural language processing, computer vision [13–16], and other research fields. A deep neural network model can automatically learn deeper features from shallow features and can reflect the local relevance of the input data. Deep learning thus provides a new solution for the automatic classification of music.

Therefore, this study first investigated music genre recognition [17, 18] and classification algorithms [19, 20] based on deep neural networks and improved the algorithm. Compared with the classic approach that directly extracts the acoustic or musical features of a piece and trains a classifier to obtain the recognition and classification results, this algorithm improves the accuracy of music genre recognition and classification. At the same time, for the recognition and classification of musical instruments, this study proposes a Chinese traditional musical instrument recognition and classification algorithm based on the deep belief network. The deep belief network is used in the feature extraction task of traditional Chinese musical instrument music [21], which reduces the work of manually extracting and identifying features. The recognition and classification effect is also improved compared with the classic algorithm. The main innovation points of this study are the following:

(i) Combining Bi-GRU and an attention mechanism, a novel music genre classification model is proposed that can learn more significant music features, thereby improving the accuracy of classification.

(ii) A Chinese traditional musical instrument recognition and classification algorithm is proposed based on a deep belief network. The deep belief network is used in the feature extraction task of Chinese traditional musical instrument music, which reduces manual feature engineering and improves the recognition and classification effect.

This study is structured as follows. Section 2 presents the background of the study. The methodology is given in Section 3, with details in the subsections. Section 4 explains the experiments and results. Section 5 concludes the study.

2. Background

With the rise and advancement of information technology, the individualized development of students, and the reform and innovation of ideological and political education, higher goals and requirements have been put forward for ideological and political education. The following are the details of this section.

2.1. Blues. It originated from the amateur music of poor black slaves in the southern United States. It originally had no accompaniment, only solo singing with emotional content, and later combined with the European chord structure to form music of alternating singing and guitar. The blues are based on the pentatonic scale, which is composed of five tones arranged in pure fifths.

2.2. Classical. It is the traditional musical art of Western music, created against the background of mainstream European culture. The most prominent feature of classical music is that its works generally use notation to record the score, so that rhythm and pitch can be recorded in detail, which is also conducive to the direct coordination of multiple performers. Many types of musical instruments are used in classical music, including woodwind, brass, percussion, keyboard, bowed, and plucked stringed instruments.

2.3. Jazz. It originated from the blues, combining and absorbing classical music, folk music, and other musical styles on the basis of African music traditions, and gradually formed today's diverse jazz music.

2.4. Country. It originated in the southern United States and is a kind of popular music with ethnic characteristics. The main characteristics of country music are its simple tunes, steady rhythm, mainly narrative lyrics, and a strong local flavor, mostly in the form of ballads with two-part or three-part form. Country music is mostly solo or chorus, accompanied by harmonica, guitar, violin, and other instruments. Its themes are generally love, country life, cowboy humor, family, God, and country.

2.5. Rock. It originated in the mid-1950s and developed under the influence of blues and country music. It is characterized by prominent vocals, played with guitar, bass, and drum accompaniment; keyboard instruments such as electronic organs, organs, and pianos are also often used. Rock music has a strong beat centered on various guitar sounds.


2.6. Metal. It is a kind of rock music that developed early on in Britain and the United States. Metal music is characterized by high explosive power, weight, and speed. Its weight is reflected in the low register of electric guitars and bass; its speed is reflected in the beat: the beat of metal music can reach more than 200 BPM, while the beat range of general pop music is only 80–130 BPM. The core instruments of metal music are electric guitar, electric bass, and drums, which control the rhythm and melody.

2.7. Disco. It is a kind of electronic music that originated from African American folk dance and jazz dance. Its rhythm mixes the characteristics of rock music, jazz, and Latin American music. As ballroom music, disco is characterized by a strong sense of rhythm, arranged with lively string music. Disco is generally in 4/4 time with an accent on every beat, at about 120 BPM.

2.8. Pop. It originated in Britain and the United States in the mid-1950s. Popular music is eclectic, often borrowing elements from other styles of music, but it also has its core elements: its structure is relatively short, usually about three minutes.

2.9. Hip-Hop. It originated in New York, USA, where it was popular at African American neighborhood gatherings. Hip-hop consists of two main components, rap and DJing. The performer sings by speaking words according to the rhythm of the instruments or a synthesized beat.

2.10. Reggae. It is derived from the popular music of Ska and Rock Steady, which evolved in Jamaica, and is the general term for various dance music in Jamaica.

2.11. Electronic. It is music made using electronic musical instruments and electronic technology. In electronic music, a variety of genres are often combined and modulated into unique timbres through electronic musical instruments and synthesizers, forming a unique style. Commonly used electronic musical instruments include electric guitars, electric basses, synthesizers, and electronic organs.

2.12. Punk. It is simple rock music derived from Garage Rock and pre-punk rock, consisting of three chords and a simple main melody.

3. Methodology

3.1. Feature Sequence Extraction of Music Segments. The process of extracting the feature sequence of the music segments is shown in Figure 1. First, the music file is analyzed to extract the note feature matrix; then, main melody extraction and segment division are performed based on the note feature matrix; finally, combining the segment-division time points with the main melody of the music, a feature vector based on the main melody is extracted for each segment. These vectors compose the feature sequence of the music segments and serve as the input of the later classifier.

3.1.1. Main Melody Extraction. When listening to a piece of music, the perceptual information that people mainly obtain from the sense of hearing is the main melody. The main melody is the soul of music and interprets its theme; it is the key to music classification and an important basis for distinguishing music genres. This section studies and implements a fast and effective Skyline main melody extraction algorithm for extracting the main melody from music files.

We first define the relevant attributes of the notes. Let n_i and n_{i+1} denote two adjacent notes, s_i and s_{i+1} the start times of these two notes, p_i and p_{i+1} their pitches, and e_i and e_{i+1} their end times.

The input of the Skyline algorithm is the note feature matrix. The following are the specific steps of the Skyline algorithm:

(1) Arrange the note vectors in the note feature matrix in ascending order of their start time and remove the note vectors of channel 10 (percussion instruments).

(2) Traverse the note feature matrix. For note vectors with the same start time, keep the note vector with the highest pitch and discard the others.

(3) For two adjacent note vectors n_i and n_{i+1}, if s_i < s_{i+1}, e_i > s_{i+1}, and p_i < p_{i+1} are satisfied, let e_i = s_{i+1}.
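The three steps above can be sketched in Python as follows. The dict-based note representation (fields "s", "e", "p", "ch" for start, end, pitch, and channel) is an illustrative assumption, not the paper's actual data structure:

```python
# A minimal sketch of the Skyline steps described above, assuming each note
# is a dict with start time "s", end time "e", pitch "p", and channel "ch".
def skyline(notes):
    # Step 1: sort by start time and drop channel-10 (percussion) notes.
    notes = sorted((n for n in notes if n["ch"] != 10), key=lambda n: n["s"])

    # Step 2: among notes sharing a start time, keep only the highest pitch.
    kept = []
    for n in notes:
        if kept and kept[-1]["s"] == n["s"]:
            if n["p"] > kept[-1]["p"]:
                kept[-1] = n
        else:
            kept.append(n)

    # Step 3: when a note overlaps the next, higher note, truncate its end.
    for cur, nxt in zip(kept, kept[1:]):
        if cur["s"] < nxt["s"] and cur["e"] > nxt["s"] and cur["p"] < nxt["p"]:
            cur["e"] = nxt["s"]
    return kept
```

The result is a single monophonic line, the highest-pitch "skyline" of the piece, which is then treated as the main melody.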

3.1.2. Music Segment Division. Firstly, the sound file is sampled, framed, and coded, and a piano-roll matrix is used to model the music; then, the similarity between any two frames is calculated by Euclidean distance to generate a self-similarity matrix, and a special Gaussian convolution kernel is constructed and convolved along the diagonal of the self-similarity matrix to generate a novelty curve. The novelty curve is a time-series curve describing the changes in the musical performance. Finally, the peak points are

Figure 1: The specific process of extracting the feature sequence of a music segment (music file → music feature matrix → main melody extraction and music segment division → music segment feature sequence).


extracted from the novelty curve, and the music is segmented at those points. The core idea of the algorithm is to estimate the instantaneous musical novelty by analyzing the local self-similarity of the performance: at a significantly novel point in time, the music played shortly before that point and the music played shortly after it each have a high degree of self-similarity, while there is a fairly low cross-similarity between the past and the future at that point. Simply put, in a short period before this point in time the style of playing is similar; after this point, the piece changes to another style of playing. The artistic style of the music has undergone a major change, and the emotions and themes expressed have also changed, so the music can be divided into segments at such points.
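The segmentation described above can be sketched in NumPy as a Foote-style novelty computation: a Gaussian-tapered checkerboard kernel slid along the diagonal of the self-similarity matrix. The frame features, kernel size, and Gaussian width below are illustrative assumptions; the paper does not specify its exact kernel:

```python
import numpy as np

def novelty_curve(frames, kernel_size=8, sigma=0.4):
    """frames: (n_frames, n_features). Returns one novelty value per frame."""
    n = frames.shape[0]
    # Self-similarity from pairwise Euclidean distances; negate so that
    # larger values mean more similar.
    diff = frames[:, None, :] - frames[None, :, :]
    ssm = -np.sqrt((diff ** 2).sum(-1))

    # Gaussian-tapered checkerboard kernel: positive on the past-past and
    # future-future quadrants, negative on the two cross quadrants
    # (np.sign(outer(t, t)) is zero on the center row/column).
    t = np.arange(-kernel_size, kernel_size + 1) / kernel_size
    g = np.exp(-(t ** 2) / (2 * sigma ** 2))
    kernel = np.outer(g, g) * np.sign(np.outer(t, t))

    # Correlate the kernel along the main diagonal of the SSM.
    nov = np.zeros(n)
    k = kernel_size
    for i in range(k, n - k):
        nov[i] = np.sum(kernel * ssm[i - k:i + k + 1, i - k:i + k + 1])
    return nov
```

Inside a homogeneous passage the positive and negative quadrants cancel; at a style boundary the cross quadrants see low similarity while the diagonal quadrants see high similarity, so the curve peaks there.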

3.2. Attention Mechanism. When humans observe visual images, the brain quickly scans the images that appear in the field of view and directs the line of sight to the area of interest. The brain allocates different amounts of attention according to the different areas in the field of view: for the areas in focus, it allocates more attention resources to observe carefully and obtain more details of the target area, while information from other, less useful areas is ignored. The attention mechanism [22, 23] in deep learning is similar: it is also a mechanism of attention resource allocation. It can filter out the key information that is most conducive to a deep learning task from a large amount of information, thereby improving the performance of deep learning tasks such as detection [24], prediction [25], and recognition [26, 27].

Figure 2 shows a simplified schematic diagram of an encoder-decoder model that introduces the attention mechanism. An encoder-decoder model with attention can effectively overcome the limitations of the basic model: the encoder no longer converts all the information of the input sequence into a single fixed-length context vector. For each output, it focuses on finding the salient, useful information related to the current output in the input data and calculates a different context vector, allowing the model to better learn the alignment between input and output.

Taken separately, the attention mechanism can be understood as a query calculation process. Figure 3 is a generalized structure diagram of the attention mechanism, where x is the input sequence data and y is the query. First, for the query y, the attention score of y against each input x_i is calculated through the function f; these scores are then mapped to a probability distribution between 0 and 1 through the softmax function. Finally, the probability distribution and the corresponding inputs are weighted and summed to produce the output value of the attention mechanism.

The calculation of the attention mechanism is as follows:

attention = sum_{i=1}^{n} softmax(f(x_i, y)) * x_i. (1)
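Equation (1) can be sketched directly in NumPy. The scoring function f is left as the caller's choice; the dot-product score used in the usage example is only one common option, not something specified by the paper:

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attention(xs, y, f):
    """Equation (1): weight each input x_i by softmax(f(x_i, y)) and sum.
    xs: (n, d) inputs; y: (d,) query; f: scoring function f(x_i, y) -> scalar."""
    scores = np.array([f(x, y) for x in xs])
    weights = softmax(scores)            # probability distribution over inputs
    return (weights[:, None] * xs).sum(axis=0)
```

For example, with a dot-product score, `attention(xs, y, lambda x, q: x @ q)` returns a weighted average of the inputs that is dominated by the input most aligned with the query.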

3.3. Classification Model. Compared with 2D convolutional networks, 3D convolutional networks can better model temporal information through 3D convolution and 3D pooling operations. In a two-dimensional convolutional network, convolution and pooling are performed only in space; in a three-dimensional convolutional network, they are performed in both time and space. A 2D convolutional network outputs an image when processing an image, and it still outputs an image when multiple images (regarded as different channels) are processed; therefore, the temporal information of the input data is lost after each convolution operation in a two-dimensional convolutional network. Only three-dimensional convolution can preserve the temporal information of the input signal in the output. The same principle applies to 2D pooling versus 3D pooling.

Figure 4 is the network model structure diagram for the classification of music genres in this article. The classification network model designed in this study can be divided into three parts according to their functions, namely, the input layer, the hidden layer, and the output layer. The input of the input layer is the sequence of music segment features extracted from the music. The main function of the hidden layer is to learn the final feature representation of the music; it is composed of Bi-GRU, the attention mechanism, and a fully connected layer.
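The Bi-GRU part of the hidden layer can be sketched in plain NumPy as a forward pass only. Biases are omitted, training is not shown, and all weight names and shapes are illustrative assumptions rather than the paper's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, p):
    """One GRU step. p holds weights Wz, Wr, Wh, each of shape
    (hidden, input + hidden), for the update gate, reset gate, and candidate."""
    xh = np.concatenate([x, h])
    z = sigmoid(p["Wz"] @ xh)                                # update gate
    r = sigmoid(p["Wr"] @ xh)                                # reset gate
    h_tilde = np.tanh(p["Wh"] @ np.concatenate([x, r * h]))  # candidate state
    return (1 - z) * h + z * h_tilde

def bi_gru(xs, p_fwd, p_bwd, hidden):
    """Run a forward and a backward GRU over the sequence and concatenate
    the two hidden states at every step: output shape (T, 2 * hidden)."""
    T = len(xs)
    hf, hb = np.zeros(hidden), np.zeros(hidden)
    fwd, bwd = [], [None] * T
    for t in range(T):                 # forward direction
        hf = gru_step(xs[t], hf, p_fwd)
        fwd.append(hf)
    for t in reversed(range(T)):       # backward direction
        hb = gru_step(xs[t], hb, p_bwd)
        bwd[t] = hb
    return np.stack([np.concatenate([fwd[t], bwd[t]]) for t in range(T)])
```

Each row of the output combines the forward state (context up to step t) with the backward state (context after step t), which is what lets the model use both past and future segments of the piece.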

In the attention mechanism, this article uses the following formula to calculate the attention score corresponding to each feature vector:

e_i = tanh(W H_i + b), (2)

where e_i is the attention score of the feature vector H_i at step i in H.

Figure 2: Schematic diagram of the codec (encoder-decoder) model with the introduction of the attention mechanism (inputs x_1…x_n, context vectors c_1…c_m, outputs y_1…y_m).


Then, the calculated attention scores are mapped to the range (0, 1) through the softmax function, and the attention probability distribution over the feature vectors is obtained:

a_i = softmax(e_i) = exp(e_i) / sum_{k=1}^{L} exp(e_k). (3)

The calculated attention probability distribution and the feature vectors of the feature representation H are weighted and summed to obtain the feature vector representation v of the music file:

v = sum_{i=1}^{L} a_i H_i. (4)
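Equations (2)-(4) amount to a learned weighted average of the Bi-GRU outputs. A minimal NumPy sketch, assuming a single score vector W and scalar bias b (the shapes are illustrative):

```python
import numpy as np

def attention_pool(H, W, b):
    """Equations (2)-(4): score each feature vector H_i with tanh(W H_i + b),
    softmax the scores into a_i, and return v = sum_i a_i * H_i.
    H: (L, d) sequence of feature vectors; W: (d,); b: scalar."""
    e = np.tanh(H @ W + b)             # (2): one score per time step
    e = e - e.max()                    # shift for numerical stability
    a = np.exp(e) / np.exp(e).sum()    # (3): attention distribution
    return a @ H                       # (4): weighted sum over time
```

Because the a_i sum to 1, v stays on the same scale as the individual feature vectors while emphasizing the steps with the highest scores.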

The combination of Bi-GRU and the attention network allows the model to effectively learn, from the input music segment feature sequence, the information that carries different weights for the genre classification of each piece of music, including both forward and backward context. The useful information learned from the segment feature sequence helps to improve the accuracy of classification.

At the end of the hidden layer, the music feature vector extracted by the attention mechanism is passed to the fully connected layer to calculate a confidence score for each genre. The output layer uses the softmax function to map the output of the hidden layer to the probability of each genre label for the music file. Finally, the genre tag with the highest probability is selected as the genre tag of the music file.

3.4. Instrument Recognition. This study uses the music signal characteristics of traditional Chinese musical instruments to identify the instruments. We regard each 2-second segment of an instrument's music signal as a sample, use the MFCC of the sample as the input feature, and input it into a deep belief network with H hidden layers (as shown in Figure 5); the output layer, a softmax layer, outputs the predicted label of the musical instrument.

4. Experiments and Results

4.1. Cross-Entropy Cost Function. The essence of neural network training is to iterate continuously to minimize the loss function so that the model parameters converge. This study uses the cross-entropy loss function to describe the difference between the predicted value output by the network model and the target expected value. The output layer of the network model calculates the probability of each genre through the softmax function and then calculates the cross-entropy loss. The cross-entropy loss function is defined as follows:

Figure 3: Attention mechanism.

Figure 4: Classification model (inputs x_1…x_L, n stacked Bi-GRU hidden layers, attention, max-pooling and average-pooling, two fully connected layers, and a softmax output).


C = -(1/n) sum_x [ y ln(a) + (1 - y) ln(1 - a) ], (5)

where C represents the loss, n is the number of samples, x is an input sample, a is the predicted value output by the network model for input x, and y is the target expected value for input x.
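Equation (5) can be written directly in NumPy. This is a minimal sketch of the loss itself, applied per output, not the paper's full training loop:

```python
import numpy as np

def cross_entropy(a, y):
    """Equation (5): mean binary cross-entropy over n samples, where a is the
    model's predicted probability and y the target value for each sample."""
    a = np.asarray(a, dtype=float)
    y = np.asarray(y, dtype=float)
    return -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))
```

The loss shrinks as the predicted probabilities move toward the targets, which is what gradient descent on C exploits during training.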

4.2. Adam. In the process of training the network model, the learning rate has an important impact on the model's performance, and it is one of the hyperparameters that is difficult to set. This article uses the Adam optimization algorithm as the optimization method for the network model. Adam is an adaptive learning rate algorithm that performs excellently in practice and is widely used. The Adam algorithm designs independent adaptive learning rates for different parameters by calculating first-order and second-order moment estimates of the gradient. The formulas for adjusting the network parameters are as follows:

v_t = k_1 v_{t-1} + (1 - k_1) g_t,
s_t = k_2 s_{t-1} + (1 - k_2) g_t^2,
v̂_t = v_t / (1 - k_1^t),
ŝ_t = s_t / (1 - k_2^t),
Δθ = α v̂_t / (sqrt(ŝ_t) + ε), (6)

where g_t is the gradient at step t, k_1 and k_2 are the decay rates of the first- and second-order moment estimates, α is the step size, ε is a small constant for numerical stability, and the parameters are updated as θ ← θ - Δθ.
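One Adam update following equation (6) can be sketched in NumPy as below; the default hyperparameter values are the commonly used ones and are an assumption, since the paper does not state its settings:

```python
import numpy as np

def adam_step(theta, grad, state, lr=0.001, k1=0.9, k2=0.999, eps=1e-8):
    """One Adam update per equation (6): first- and second-moment estimates
    with bias correction, then a per-parameter scaled step theta - delta."""
    state["t"] += 1
    state["v"] = k1 * state["v"] + (1 - k1) * grad       # first moment v_t
    state["s"] = k2 * state["s"] + (1 - k2) * grad ** 2  # second moment s_t
    v_hat = state["v"] / (1 - k1 ** state["t"])          # bias correction
    s_hat = state["s"] / (1 - k2 ** state["t"])
    return theta - lr * v_hat / (np.sqrt(s_hat) + eps)
```

For example, repeatedly calling `adam_step` with the gradient of a simple quadratic drives the parameter toward its minimum, with the step size for each coordinate adapted by its own moment estimates.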

4.3. Evaluation Environment. To carry out the experiment smoothly, we prepared the experimental data in advance. This article downloads genre-labeled MIDI music files from Internet sites dedicated to sharing music to construct a real data set, collecting a total of 2000 music files. There are 5 genres in the data set: classical, country, dance, folk, and metal. The number of MIDI music files of each genre is shown in Table 1.

4.4. Experimental Results. A special Gaussian convolution kernel is convolved along the diagonal of the self-similarity matrix to obtain the novelty curve. After smoothing the novelty curve, the peak points are extracted from it and used as the time points for segment division. The smoothed novelty curve and the extracted peak points are shown in Figure 6.

According to the experimental settings, 6 groups of comparative experiments were carried out; a brief description of the settings is shown in Table 2. Comparing Experiment 1 and Experiment 2, the classification effect of the initially extracted feature set input to a BP neural network is far lower than that of the 11 features explored and selected for the genre classification task in Experiment 2 and input to the same BP neural network. It can be seen that the initially extracted music features are not suitable for the music genre classification task in this study, which indicates that feature extraction is not easily universal: feature sets usually need to be constructed according to the actual classification task. Meanwhile, the validity of the feature set selected in this study for the music genre classification task is verified.

Comparing Experiment 2 and Experiment 3, we find that in Experiment 3, where the MIDI file is divided into sections, each section is used as the analysis unit, the features of each section are extracted with the same feature combination to form the section feature sequence, and the sequence is input into the classification network, Bi-GRU can learn a deeper expression of the music's temporal and semantic information from the input segment feature sequence. This effectively improves the accuracy of music classification, and the classification effect is better than that of the traditional MIDI music classification method based on a BP neural network.

Comparing Experiment 4, Experiment 5, and Experiment 6, the method used to divide the music into segments, and hence the extracted segment feature sequence, affects the final classification performance. In Experiment 5 and Experiment 6, the music was divided into segments at equal time intervals of 5 seconds and 10 seconds, and the final classification accuracy was lower than that obtained in Experiment 4 using the segment division method introduced in this article. The probable reason is that the development of a musical melody is a process of repetition and change with certain transition boundaries. Dividing music into segments of equal duration does not take this characteristic into account, and the extracted segment feature sequence cannot describe the

Figure 5: DBN-based recognition and classification network structure for traditional Chinese musical instruments (stacked RBM1, RBM2, and RBM3).


music well, so classification performance suffers. In Experiment 4, this study finds the mutation points of the music and divides the segments there, which achieves a higher classification effect. The experimental results verify the effectiveness of the music segmentation method used in this study.

5. Conclusion

In this study, we propose a method of music genre classification based on deep learning. Working from the feature sequence of the input music segments, recurrent neural networks and the attention mechanism are studied, and a Bi-GRU with an attention mechanism is used to design the classification network model. Bi-GRU is good at processing sequence data: it can learn the contextual semantics and deep features of the music from the segment feature sequence. The attention mechanism is added to automatically assign different attention weights to the features learned by Bi-GRU from different segments and to learn more significant music features, thereby improving the accuracy of classification. In addition, this study also proposes a recognition and classification algorithm for traditional Chinese musical instruments based on deep belief networks. The experiments of the study have achieved credible results.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The author has no conflicts of interest regarding the publication of this study.

References

[1] J. A. Sloboda and P. N. Juslin, "Psychological perspectives on music and emotion," in Music and Emotion: Theory and Research, pp. 71–104, Oxford University Press, Oxford, UK, 2001.

[2] A. Kresovich, M. K. Reffner Collins, D. Riffe, and F. R. D. Carpentier, "A content analysis of mental health discourse in popular rap music," JAMA Pediatrics, vol. 175, no. 3, pp. 286–292, 2021.

[3] T. Eerola, J. K. Vuoskoski, H.-R. Peltola, V. Putkinen, and K. Schafer, "An integrative review of the enjoyment of sadness associated with music," Physics of Life Reviews, vol. 25, pp. 100–121, 2018.

[4] Y. Li, W. Hu, and Y. Wang, "Music rhythm customized mobile application based on information extraction," in Proceedings of the 4th International Conference on Smart Computing and Communication, pp. 304–309, Birmingham, UK, October 2019.

[5] F. Medhat, D. Chesmore, and J. Robinson, "Automatic classification of music genre using masked conditional neural networks," in Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM), pp. 979–984, IEEE, New Orleans, LA, USA, November 2017.

[6] S. Vishnupriya and K. Meenakshi, "Automatic music genre classification using convolution neural network," in Proceedings of the 2018 International Conference on Computer Communication and Informatics (ICCCI), pp. 1–4, IEEE, Coimbatore, India, January 2018.

[7] S. Shetty and S. Hegde, "Automatic classification of carnatic music instruments using MFCC and LPC," in Data Management, Analytics and Innovation, pp. 463–474, Springer, Singapore, 2020.

[8] Y. T. Chen, C. H. Chen, S. Wu, and C. C. Lo, "A two-step approach for classifying music genre on the strength of AHP weighted musical features," Mathematics, vol. 7, no. 1, 19 pages, 2019.

[9] R. Liu, X. Ning, W. Cai, and G. Li, "Multiscale dense cross-attention mechanism with covariance pooling for hyperspectral image scene classification," Mobile Information Systems, vol. 2021, Article ID 9962057, 15 pages, 2021.

[10] C. Yan, G. Pang, X. Bai, Z. Zhou, and L. Gu, "Beyond triplet loss: person re-identification with fine-grained difference-aware pairwise loss," IEEE Transactions on Multimedia, 2021.

[11] Y. Ding, X. Zhao, Z. Zhang, W. Cai, and N. Yang, "Multiscale graph sample and aggregate network with context-aware learning for hyperspectral image classification," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 14, pp. 4561–4572, 2021.

Table 1: Number of music files in five genres.

Classical 400, Country 386, Pop 375, Rock 440, Metal 399; Total 2000.

Figure 6: Novelty curve and peak point.

Table 2: Comparison of experimental results of music classification.

Experiment 1: BP neural network + local features, Acc = 0.75.
Experiment 2: BP neural network + global features, Acc = 0.86.
Experiment 3: Bi-GRU + dense, Acc = 0.88.
Experiment 4: Bi-GRU + attention + dense, Acc = 0.91.
Experiment 5: Bi-GRU + attention + dense (5-second segments), Acc = 0.87.
Experiment 6: Bi-GRU + attention + dense (10-second segments), Acc = 0.88.

[12] Y. Tong, L. Yu, S. Li, J. Liu, H. Qin, and W. Li, "Polynomial fitting algorithm based on neural network," ASP Transactions on Pattern Recognition and Intelligent Systems, vol. 1, no. 1, pp. 32–39, 2021.

[13] X. Ning, K. Gong, W. Li, L. Zhang, X. Bai, and S. Tian, "Feature refinement and filter network for person re-identification," IEEE Transactions on Circuits and Systems for Video Technology, 2020.

[14] W. Cai, Z. Wei, R. Liu, Y. Zhuang, Y. Wang, and X. Ning, "Remote sensing image recognition based on multi-attention residual fusion networks," ASP Transactions on Pattern Recognition and Intelligent Systems, vol. 1, no. 1, pp. 1–8, 2021.

[15] X. Zhang, Y. Yang, Z. Li, X. Ning, Y. Qin, and W. Cai, "An improved encoder-decoder network based on strip pool method applied to segmentation of farmland vacancy field," Entropy, vol. 23, no. 4, p. 435, 2021.

[16] X. Ning, X. Wang, S. Xu et al., "A review of research on co-training," Concurrency and Computation: Practice and Experience, 2021.

[17] D. Bisharad and R. H. Laskar, "Music genre recognition using convolutional recurrent neural network architecture," Expert Systems, vol. 36, no. 4, Article ID e12429, 2019.

[18] S. Iloga, O. Romain, and M. Tchuente, "A sequential pattern mining approach to design taxonomies for hierarchical music genre recognition," Pattern Analysis and Applications, vol. 21, no. 2, pp. 363–380, 2018.

[19] S. Oramas, F. Barbieri, O. Nieto, and X. Serra, "Multimodal deep learning for music genre classification," Transactions of the International Society for Music Information Retrieval, vol. 1, no. 1, pp. 4–21, 2018.

[20] H. Bahuleyan, "Music genre classification using machine learning techniques," 2018, http://arxiv.org/abs/1804.01149.

[21] R. Yang, L. Feng, H. Wang, J. Yao, and S. Luo, "Parallel recurrent convolutional neural networks-based music genre classification method for mobile devices," IEEE Access, vol. 8, pp. 19629–19637, 2020.

[22] W. Cai and Z. Wei, "Remote sensing image classification based on a cross-attention mechanism and graph convolution," IEEE Geoscience and Remote Sensing Letters, pp. 1–5, 2020.

[23] W. Cai, B. Liu, Z. Wei, M. Li, and J. Kan, "TARDB-Net: triple-attention guided residual dense and BiLSTM networks for hyperspectral image classification," Multimedia Tools and Applications, vol. 80, no. 7, pp. 11291–11312, 2021.

[24] Z. Chu, M. Hu, and X. Chen, "Robotic grasp detection using a novel two-stage approach," ASP Transactions on Internet of Things, vol. 1, no. 1, pp. 19–29, 2021.

[25] W. Sun, P. Zhang, Z. Wang, and D. Li, "Prediction of cardiovascular diseases based on machine learning," ASP Transactions on Internet of Things, vol. 1, no. 1, pp. 30–35, 2021.

[26] L. Sun, W. Li, X. Ning, L. Zhang, X. Dong, and W. He, "Gradient-enhanced softmax for face recognition," IEICE Transactions on Information and Systems, vol. E103.D, no. 5, pp. 1185–1189, 2020.

[27] Y. Zhang, W. Li, L. Zhang, X. Ning, L. Sun, and Y. Lu, "AGCNN: adaptive Gabor convolutional neural networks with receptive fields for vein biometric recognition," Concurrency and Computation: Practice and Experience, Article ID e5697, 2020.


large digital music resource library, if manual labeling is used, it will consume a great deal of manpower and time; moreover, the labeling results are subjective, and the labeling standards cannot be completely unified, since they are limited by the different professionals who label the music. Therefore, the automatic classification of music [5–7] has gradually become a research hotspot. Automatic music genre classification can effectively solve the problem of costly and time-consuming manual labeling: through an algorithm, a unified classification standard can be formulated, the algorithm can be continuously optimized, and highly accurate, objective classification results can be obtained.

Manually extracted music features [8] have limited applicability: their robustness is poor, and it is difficult for them to describe the deep features and temporal characteristics of music. Moreover, current music genre classification tasks mainly use traditional machine learning classifiers, including BP neural networks, support vector machines, and nearest-neighbor classifiers. Because of their shallow structure, these classifiers limit the learning of music features, and it is difficult to extract more effective features to represent music, which affects classification accuracy. In recent years, deep neural networks [9–12] have achieved good results in natural language processing, computer vision [13–16], and other research fields. A deep neural network model can automatically learn deeper features from shallow features and can reflect the local relevance of the input data. Deep learning thus provides a new solution for the automatic classification of music.

Therefore, this study first investigates music genre recognition [17, 18] and classification algorithms [19, 20] based on deep neural networks and improves them. Compared with the classic approach, which directly extracts acoustic or musical features and trains a classifier to obtain recognition and classification results, the improved algorithm achieves higher accuracy in music genre recognition and classification. At the same time, for the recognition and classification of musical instruments, this study proposes a Chinese traditional musical instrument recognition and classification algorithm based on the deep belief network. The deep belief network is used in the feature extraction task for traditional Chinese musical instrument music [21], which reduces the work of manually extracting and identifying features; the recognition and classification performance is also improved over the classic algorithm. The main innovations of this study are as follows.

(i) Combining Bi-GRU and the attention mechanism, a novel music genre classification model is proposed that can learn more salient music features, thereby improving classification accuracy.

(ii) A Chinese traditional musical instrument recognition and classification algorithm is proposed based on a deep belief network. The deep belief network is used in the feature extraction task for Chinese traditional musical instrument music, improving the recognition and classification performance.

This study is structured as follows. Section 2 presents the background of the study. Section 3 describes the methodology, with details in its subsections. Section 4 explains the experiments and results. Section 5 concludes the study.

2. Background

With the rise and advancement of information technology, the individualized development of students, and the reform and innovation of ideological and political education, higher goals and requirements have been put forward for ideological and political education. The details of this section follow.

2.1. Blues. It originated from the amateur music of poor black slaves in the southern United States. Originally it had no accompaniment, being solo singing with emotional content; later it combined with the European chord structure to form music of alternating singing and guitar. The blues is based on the pentatonic scale, which is composed of five tones arranged in pure fifths.

2.2. Classical. It is the traditional musical art of Western music, created against the background of mainstream European culture. The most prominent feature of classical music is that its works generally use notation to record the score, so that rhythm and pitch can be recorded in detail, which also facilitates the direct coordination of multiple performers. Many types of musical instruments are used in classical music, including woodwind, brass, percussion, keyboard, bowed, and plucked stringed instruments.

2.3. Jazz. It originated from the blues, combining and absorbing classical music, folk music, and other musical styles on the basis of African musical traditions, gradually forming today's diverse jazz music.

2.4. Country. It originated in the southern United States and is a kind of popular music with ethnic characteristics. The main characteristics of country music are its simple tunes, steady rhythm, mainly narrative lyrics, and strong local flavor, mostly in the form of ballads in two-part or three-part form. Country music is mostly solo or chorus singing, accompanied by harmonica, guitar, violin, and other instruments. Its themes are generally love, country life, cowboys, humor, family, God, and country.

2.5. Rock. It originated in the mid-1950s and developed under the influence of blues and country music. It is characterized by prominent vocals accompanied by guitar, bass, and drums; keyboard instruments such as electronic organs, organs, and pianos are also often used. Rock music has a strong beat centered on various guitar sounds.


2.6. Metal. It is a kind of rock music that developed early on in Britain and the United States. Metal music is characterized by high explosive power, weight, and speed. Its weight is reflected in the low register of the electric guitars and bass; its speed is reflected in the tempo: the beat of metal music can exceed 200 BPM, while general pop music ranges only from 80 to 130 BPM. The core instruments of metal music are electric guitar, electric bass, and drums, which control the rhythm and melody.

2.7. Disco. It is a kind of electronic music that originated from African American folk dance and jazz dance. In rhythm, it mixes the characteristics of rock music, jazz, and Latin American music. As ballroom music, disco is characterized by a strong sense of rhythm, arranged with lively string music. Disco is generally in 4/4 time with every beat accented, at about 120 BPM.

2.8. Pop. It originated in Britain and the United States in the mid-1950s. Popular music is eclectic, often borrowing elements of other styles of music, but pop music also has its core elements: its structure is relatively short, usually about three minutes.

2.9. Hip-Hop. It originated in New York, USA, where it was popular among African Americans and at neighborhood gatherings. Hip-hop consists of two main components, rap and DJing: the performer sings by speaking the words according to the rhythm of the instruments or synthesis.

2.10. Reggae. It is derived from the popular music styles Ska and Rock Steady, which evolved in Jamaica, and is the general term for various Jamaican dance music.

2.11. Electronic. It is music made using electronic musical instruments and electronic technology. In electronic music, a variety of genres are often combined and modulated into unique timbres through electronic instruments and synthesizers to form a unique style. Commonly used electronic instruments include electric guitars, electric basses, synthesizers, and electronic organs.

2.12. Punk. It is simple rock music derived from Garage Rock and pre-punk rock, consisting of three chords and a simple main melody.

3. Methodology

3.1. Feature Sequence Extraction of Music Segments. The process of extracting the feature sequence of music segments is shown in Figure 1. First, the music file is parsed to extract the note feature matrix; then main melody extraction and segment division are performed based on the note feature matrix; finally, combining the segment division time points with the main melody of the music, a feature vector based on the main melody is extracted for each segment. These vectors compose the feature sequence of the music segments and serve as the input of the later classifier.

3.1.1. Main Melody Extraction. When listening to a piece of music, the perceptual information that people mainly obtain through hearing is the main melody. The main melody is the soul of music and interprets its theme; it is the key to music classification and an important basis for distinguishing music genres. This section studies and implements the fast and effective Skyline algorithm for extracting the main melody from music files.

We define the relevant attributes of the notes. Let n_i and n_{i+1} denote two adjacent notes, s_i and s_{i+1} the start times of these two notes, p_i and p_{i+1} their pitches, and e_i and e_{i+1} their end times.

The input of the Skyline algorithm is the note feature matrix. The specific steps of the algorithm are as follows:

(1) Arrange the note vectors in the note feature matrix in ascending order of start time and remove the note vectors of channel 10 (percussion instruments).

(2) Traverse the note feature matrix. For note vectors with the same start time, keep the one with the highest pitch and discard the others.

(3) For two adjacent note vectors n_i and n_{i+1}, if s_i < s_{i+1}, e_i > s_{i+1}, and p_i < p_{i+1} are satisfied, let e_i = s_{i+1}.
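As an illustration, the three steps can be sketched in a few lines of Python; the dict-based note representation (start, end, pitch, channel fields) is a hypothetical stand-in for the note feature matrix described above:

```python
def skyline(notes):
    """Skyline main-melody extraction (steps 1-3 from the text)."""
    # Step 1: ascending start time; drop channel-10 percussion notes.
    notes = sorted((n for n in notes if n['channel'] != 10),
                   key=lambda n: n['start'])
    # Step 2: among notes sharing a start time, keep the highest pitch.
    best = {}
    for n in notes:
        cur = best.get(n['start'])
        if cur is None or n['pitch'] > cur['pitch']:
            best[n['start']] = dict(n)
    melody = [best[s] for s in sorted(best)]
    # Step 3: if a lower note overlaps the next, higher note, truncate it
    # at the next note's onset (e_i = s_{i+1}).
    for a, b in zip(melody, melody[1:]):
        if a['start'] < b['start'] and a['end'] > b['start'] and a['pitch'] < b['pitch']:
            a['end'] = b['start']
    return melody
```

The result is a monophonic note sequence, one note per onset, from which the per-segment melody features can then be computed.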

3.1.2. Music Segment Division. First, the sound file is sampled, framed, and encoded, and a piano-roll matrix is used to model the musical performance. Then the similarity between any two frames is calculated with the Euclidean distance to generate a self-similarity matrix, and a special Gaussian convolution kernel is constructed and convolved along the diagonal of the self-similarity matrix to generate a novelty curve, a time-series curve describing changes in the musical performance. Finally, peak points are extracted from the novelty curve and used to divide the music into segments. The core idea of the algorithm is to estimate the instantaneous musical novelty by analyzing the local self-similarity of the performance: at a significantly novel point in time, the music played shortly before that point and the music played shortly after it each have a high degree of self-similarity, while the cross-similarity between past and future at that point is quite low. Simply put, in a short period before such a point the style of playing is similar; after it, the music changes to another style. The artistic style of the performance undergoes a major change, and the emotions and themes expressed also change, so the music can be divided into segments at these points.

Figure 1: The specific process of extracting the feature sequence of a music segment (music file → music feature matrix → main melody extraction and music segment division → music segment feature sequence).
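A minimal NumPy sketch of this pipeline, under assumed choices (the kernel size and the simple peak rule are illustrative; the text does not fix them):

```python
import numpy as np

def novelty_curve(features, kernel_size=32):
    """Novelty via a Gaussian-tapered checkerboard kernel slid along the
    diagonal of a self-similarity matrix built from Euclidean distances."""
    n = len(features)
    dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    sim = -dist                               # higher value = more similar
    k = kernel_size // 2
    ax = np.arange(-k, k)
    gauss = np.exp(-(ax ** 2) / (2.0 * (k / 2.0) ** 2))
    sign = np.sign(ax + 0.5)                  # -1 before the boundary, +1 after
    kernel = np.outer(gauss, gauss) * np.outer(sign, sign)
    nov = np.zeros(n)
    for i in range(k, n - k):                 # correlate along the diagonal
        nov[i] = np.sum(kernel * sim[i - k:i + k, i - k:i + k])
    return nov

def peak_points(nov):
    """Segment boundaries: local maxima above the mean of the curve."""
    t = nov.mean()
    return [i for i in range(1, len(nov) - 1)
            if nov[i] > t and nov[i] >= nov[i - 1] and nov[i] > nov[i + 1]]
```

With a frame-feature matrix whose statistics change at some frame, the novelty curve peaks near that frame, and the peak indices become the segment boundaries.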

3.2. Attention Mechanism. When humans observe a visual scene, the brain quickly scans the image in the field of view and directs the gaze to the area of interest. The brain allocates different amounts of attention to different areas of the visual field: for the areas the viewer focuses on, more attention resources are allocated to observe carefully and obtain more details of the target area, while other, useless areas of the view are ignored. The attention mechanism [22, 23] in deep learning is similar: it is also a mechanism of attention-resource allocation. It can filter out, from a large amount of information, the key information that is most useful to a deep learning task, thereby improving the performance of tasks such as detection [24], prediction [25], and recognition [26, 27].

Figure 2 shows a simplified schematic diagram of the encoder-decoder model with the attention mechanism. Adding attention effectively overcomes the limitations of the plain encoder-decoder model: the encoder no longer converts all the information of the input sequence into a single fixed-length context vector. For each output, the model focuses on finding the salient, useful information related to the current output in the input data and computes a different context vector, allowing the model to better learn the alignment between input and output.

Taken separately, the attention mechanism can be understood as a query-and-calculate process. Figure 3 is a generalized structure diagram of the attention mechanism, where x is the input sequence data and y is the query. First, for the query y, the attention score between y and each input x_i is calculated through the function f; these scores are then mapped to a probability distribution over (0, 1) through the softmax function. Finally, each input is weighted by its corresponding probability, and the weighted sum gives the output value of the attention mechanism.

The attention mechanism is calculated as follows:

attention = \sum_{i=1}^{n} \mathrm{softmax}(f(x_i, y)) \cdot x_i.  (1)

3.3. Classification Model. Compared with 2D convolutional networks, 3D convolutional networks can better model temporal information through 3D convolution and 3D pooling operations. In a two-dimensional convolutional network, convolution and pooling are performed in space; in a three-dimensional convolutional network, they are performed in time and space. As noted above, a 2D convolutional network outputs an image when processing an image, and it also outputs an image when operating on multiple images (which are regarded as different channels). Therefore, the temporal information of the input data is lost after each convolution operation in a two-dimensional convolutional network. Only three-dimensional convolution preserves the temporal information of the input signal in its output. The same principle applies to 2D pooling versus 3D pooling.
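The difference can be seen from output-shape arithmetic alone; the clip and kernel sizes below are illustrative, not taken from the paper:

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output length along one axis after a convolution."""
    return (size + 2 * pad - kernel) // stride + 1

# 2D convolution: 16 frames enter as 16 input channels; channels are summed
# out, so each filter produces a single 2D map and the time axis disappears.
frames, height, width = 16, 112, 112          # illustrative clip size
shape_2d = (conv_out(height, 3), conv_out(width, 3))

# 3D convolution: time is a real axis, so each filter produces a 3D volume
# and temporal information survives the layer.
shape_3d = (conv_out(frames, 3), conv_out(height, 3), conv_out(width, 3))
```

The 2D output has no temporal axis at all, while the 3D output keeps a (shortened) temporal axis that later layers can still exploit.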

Figure 4 is the network model structure diagram for the classification of music genres in this article. The classification network model designed in this study can be divided into three parts according to function: the input layer, the hidden layer, and the output layer. The input of the input layer is the sequence of music segment features extracted from the music. The main function of the hidden layer is to learn the final feature representation of the music; it is composed of the Bi-GRU, the attention mechanism, and a fully connected layer.

In the attention mechanism, this article uses the following formula to calculate the attention score corresponding to each feature vector:

e_t = \tanh(W H_t + b),  (2)

where e_t is the attention score of the feature vector H_t at time t in H.

Figure 2: Schematic diagram of the encoder-decoder model with the attention mechanism (inputs x_1, ..., x_n are encoded; per-output context vectors c_1, ..., c_m feed the decoder, which produces outputs y_1, ..., y_m).

Then the calculated attention scores are mapped to the range (0, 1) through the softmax function, yielding the attention probability distribution over the feature vectors:

a_i = \mathrm{softmax}(e_i) = \frac{\exp(e_i)}{\sum_{k=1}^{L} \exp(e_k)}.  (3)

The attention probability distribution and the feature vectors of the representation H are then weighted and summed to obtain the feature vector representation v of the music file:

v = \sum_{i=1}^{L} a_i H_i.  (4)
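Equations (2)-(4) amount to a score, normalize, and pool computation over the Bi-GRU outputs. A dependency-free sketch, where the weight vector w and scalar bias b stand in for the learned W and b:

```python
import math

def attention_pool(H, w, b):
    """Attention pooling per equations (2)-(4).

    H: list of L Bi-GRU output vectors (one per music segment);
    w, b: illustrative stand-ins for the learned parameters.
    """
    # (2) scalar attention score for each time step
    e = [math.tanh(sum(wj * hj for wj, hj in zip(w, h)) + b) for h in H]
    # (3) softmax over the scores -> attention probabilities
    m = max(e)
    exp_e = [math.exp(x - m) for x in e]
    z = sum(exp_e)
    a = [x / z for x in exp_e]
    # (4) probability-weighted sum of the vectors -> music representation v
    v = [sum(a[i] * H[i][j] for i in range(len(H))) for j in range(len(H[0]))]
    return v, a
```

The probabilities a sum to one, so v always lies in the convex hull of the segment vectors, with the most salient segments weighted most heavily.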

Combining Bi-GRU with the attention network allows the model to effectively learn, from the input segment feature sequence, the information in each piece of music that carries different weights for genre classification, including both forward and backward information; the useful information thus learned more accurately from the sequence helps improve classification accuracy.

At the end of the hidden layer, the music feature vector extracted by the attention mechanism is passed to the fully connected layer to calculate a confidence score for each genre. The output layer uses the softmax function to map the output of the hidden layer to the probability of each genre label for the music file. Finally, the genre tag with the highest probability is selected as the genre tag of the music file.

3.4. Instrument Recognition. This study uses the music-signal characteristics of traditional Chinese musical instruments to identify the instruments. We regard each 2-second segment of an instrument's music signal as a sample, use the MFCC of the sample as the input feature, and feed it into a deep belief network with H hidden layers (as shown in Figure 5); the output softmax layer then outputs the predicted label of the musical instrument.
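A sketch of this forward pass with illustrative sizes (13 MFCC coefficients, three 64-unit RBM layers, and 10 classes are assumptions; the randomly initialized weights stand in for a pretrained network):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes: 13 MFCCs per 2-s sample, H = 3 stacked RBM layers,
# 10 instrument classes.
layer_sizes = [13, 64, 64, 64]
weights = [rng.normal(0.0, 0.1, (a, b)) for a, b in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros(b) for b in layer_sizes[1:]]
w_out, b_out = rng.normal(0.0, 0.1, (64, 10)), np.zeros(10)

def dbn_predict(mfcc_vec):
    """Forward pass: stacked RBM (sigmoid) layers, then a softmax output."""
    h = mfcc_vec
    for w, b in zip(weights, biases):
        h = sigmoid(h @ w + b)          # mean-field activation of each RBM
    logits = h @ w_out + b_out
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()              # predicted instrument probabilities

probs = dbn_predict(rng.normal(size=13))
```

In a real system each RBM would first be pretrained layer by layer before this stack is fine-tuned as a classifier; only the inference direction is sketched here.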

4. Experiments and Results

4.1. Cross-Entropy Cost Function. The essence of neural network training is iteratively minimizing the loss function until the model parameters converge. This study uses the cross-entropy loss function to describe the difference between the predicted value output by the network model and the target expected value. The output layer of the network model computes the probability of each genre through the softmax function and then computes the cross-entropy loss. The cross-entropy loss function is defined as follows:

Figure 3: Attention mechanism.

Figure 4: Classification model (input segment features x_1, ..., x_L; stacked Bi-GRU hidden layers; attention; softmax; max-pooling and average-pooling; fully connected layers; output).

C = -\frac{1}{n} \sum_{x} [y \ln(a) + (1 - y)\ln(1 - a)],  (5)

where C represents the loss, n is the number of samples, x is an input sample, a is the predicted value output by the network model for input x, and y is the target expected value for input x.
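A direct transcription of equation (5), with a clip added to guard the logarithm (the eps guard is our addition, not from the paper):

```python
import math

def cross_entropy(targets, predictions, eps=1e-12):
    """Equation (5): C = -(1/n) * sum_x [ y ln(a) + (1 - y) ln(1 - a) ]."""
    total = 0.0
    for y, a in zip(targets, predictions):
        a = min(max(a, eps), 1.0 - eps)   # guard against ln(0)
        total += y * math.log(a) + (1.0 - y) * math.log(1.0 - a)
    return -total / len(targets)
```

For a confident, correct prediction the loss is small; for example, cross_entropy([1, 0], [0.9, 0.1]) equals -ln(0.9), about 0.105.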

4.2. Adam. In the process of training the network model, the learning rate has an important impact on the model's performance, and it is one of the hyperparameters that are difficult to set. This article uses the Adam optimization algorithm as the optimization method for the network model. Adam is an adaptive learning-rate algorithm that performs excellently in practice and is widely used. It designs independent adaptive learning rates for different parameters by calculating first-order and second-order moment estimates of the gradient. The formulas for adjusting the network parameters are as follows:

v_t = k_1 v_{t-1} + (1 - k_1) g_t,
s_t = k_2 s_{t-1} + (1 - k_2) g_t^2,
\hat{v}_t = \frac{v_t}{1 - k_1^t}, \quad \hat{s}_t = \frac{s_t}{1 - k_2^t},
\Delta\theta = \frac{\alpha \hat{v}_t}{\sqrt{\hat{s}_t} + \varepsilon}.  (6)
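One update step of equation (6) can be sketched as follows; the decay rates k1 and k2 keep the paper's notation (they are commonly written as beta1 and beta2):

```python
import math

def adam_step(theta, grad, state, alpha=0.001, k1=0.9, k2=0.999, eps=1e-8):
    """One parameter update following equation (6)."""
    v, s, t = state
    t += 1
    v = k1 * v + (1.0 - k1) * grad            # first-moment estimate v_t
    s = k2 * s + (1.0 - k2) * grad ** 2       # second-moment estimate s_t
    v_hat = v / (1.0 - k1 ** t)               # bias-corrected estimates
    s_hat = s / (1.0 - k2 ** t)
    theta -= alpha * v_hat / (math.sqrt(s_hat) + eps)
    return theta, (v, s, t)

# Usage: minimizing f(theta) = theta^2, whose gradient is 2 * theta.
theta, state = 5.0, (0.0, 0.0, 0)
for _ in range(4000):
    theta, state = adam_step(theta, 2.0 * theta, state, alpha=0.01)
```

Because the step is normalized by the second-moment estimate, the effective step size stays near alpha regardless of the raw gradient scale.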

4.3. Evaluation Environment. To carry out the experiments smoothly, we prepared the experimental data in advance. Genre-labeled MIDI music files were downloaded from Internet sites dedicated to sharing music to construct a real data set, collecting 2000 music files in total. The data set contains 5 genres: classical, country, pop, rock, and metal. The number of MIDI music files of each genre is shown in Table 1.

4.4. Experimental Results. A special Gaussian convolution kernel is convolved along the diagonal of the self-similarity matrix to obtain the novelty curve. After smoothing the novelty curve, the peak points are extracted and used as the time points for segment division. The smoothed novelty curve and the extracted peak points are shown in Figure 6.

According to the experimental settings, 6 groups of comparative experiments were carried out; a brief description of the settings is given in Table 2. Comparing Experiment 1 and Experiment 2, the classification performance of the feature set extracted in Experiment 1 and input to the BP neural network is far lower than that of the 11 features explored and selected for the genre classification task in Experiment 2 and input to the same network. It can be seen that generic extracted music features are not suitable for the music genre classification task in this study, which indicates that feature extraction does not generalize easily and feature sets usually need to be constructed according to the actual classification task. Meanwhile, the validity of the feature set selected in this study for the music genre classification task is verified.

Comparing Experiment 2 and Experiment 3, we find that in Experiment 3 dividing the MIDI file into segments, using the segment as the analysis unit, extracting the same feature combination for each segment to form the segment feature sequence, and inputting it into the classification network lets Bi-GRU learn a deeper representation of the music's temporal and semantic information from the input segment feature sequence. This effectively improves the accuracy of music classification, and the classification performance is better than the traditional MIDI music classification method based on a BP neural network.

Comparing Experiment 4, Experiment 5, and Experiment 6 shows that the method used to divide the music into segments, and hence the extracted segment feature sequence, affects the final classification performance. In Experiments 5 and 6, the music was divided into segments at equal time intervals of 5 seconds and 10 seconds, and the final classification accuracy was lower than that obtained in Experiment 4 with the segment division method introduced in this article. The likely reason is that the development of a musical melody is a process of repetition and change with certain transition boundaries. Dividing music into equal-duration segments ignores this characteristic, so the extracted segment feature sequence cannot describe the music well, which hurts classification performance. In Experiment 4, this study finds the mutation points of the performance to divide the music into segments, which achieves a higher classification accuracy. The experimental results verify the effectiveness of the music segmentation method used in this study.

Figure 5: DBN-based recognition and classification network structure of traditional Chinese musical instruments (stacked RBM1, RBM2, and RBM3 layers).

5. Conclusion

In this study, we propose a method of music genre classification based on deep learning. Starting from the feature sequence of the input music segments, recurrent neural networks and the attention mechanism are studied, and Bi-GRU combined with an attention mechanism is used to design the classification network model. Bi-GRU is good at processing sequence data: it can learn the contextual semantics and deep features of the music from the segment feature sequence. The attention mechanism is added to automatically assign different attention weights to the features that Bi-GRU learns from different segments and to learn the more salient music features, thereby improving classification accuracy. In addition, this study proposes a recognition and classification algorithm for traditional Chinese musical instruments based on deep belief networks. The experiments of the study achieved credible results.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The author has no conflicts of interest regarding the publication of this study.

References

[1] J A Sloboda and P N Juslin ldquoPsychological perspectives onmusic and emotionrdquo Music and Emotion eory andResearch pp 71ndash104 Oxford University Press Oxford UK2001


Scientific Programming


Recognition and Classification Model of Music Genres and Chinese Traditional Musical Instruments Based on Deep Neural Networks

2.6. Metal. It is a kind of rock music that was developed in Britain and the United States in its early days. Metal music has the characteristics of high explosive power, weight, and speed. Its weight is reflected in the low register of electric guitars and bass. The speed is reflected in the beat: the beat of metal music can reach more than 200 BPM, while the beat range of general pop music is only 80–130 BPM. The core instruments of metal music are the electric guitar, electric bass, and drums, which control the rhythm and melody.

2.7. Disco. It is a kind of electronic music that originated from African American folk dance and jazz dance. In rhythm, it mixes the characteristics of rock music, jazz, and Latin American music. As ballroom music, disco is characterized by a strong sense of rhythm, arranged with lively string music. Disco is generally in 4/4 time with every beat accented, at about 120 BPM.

2.8. Pop. It originated in Britain and the United States in the mid-1950s. Popular music is eclectic, often borrowing elements of other styles of music. But pop music also has its core elements: its structure is relatively short, usually about three minutes.

2.9. Hip-Hop. It originated in New York, USA, where it was popular among African Americans and at neighborhood gatherings. Hip-hop consists of two main components: rap and DJing. The performer sings by speaking the words according to the rhythm of the instruments or the synthesized accompaniment.

2.10. Reggae. It is derived from the popular music of Ska and Rock Steady, which evolved in Jamaica. It is the general term for various dance music in Jamaica.

2.11. Electronic. It is a kind of music made using electronic musical instruments and electronic technology. In electronic music, a variety of genres are often combined, and they are modulated into unique timbres through electronic musical instruments and synthesizers to form a unique style. Commonly used electronic musical instruments include electric guitars, electric basses, synthesizers, and electronic organs.

2.12. Punk. It is simple rock music derived from Garage Rock and pre-punk rock, consisting of three chords and a simple main melody.

3. Methodology

3.1. Feature Sequence Extraction of Music Segments. The process of extracting the feature sequence of a music segment is shown in Figure 1. First, the music file is parsed to extract the note feature matrix; then, main melody extraction and segment division are performed based on the note feature matrix; finally, combining the segment-division time points with the main melody of the music, a feature vector based on the main melody is extracted for each segment. These vectors compose the feature sequence of the music segment and serve as the input of the later classifier.

3.1.1. Main Melody Extraction. When listening to a piece of music, the perceptual information that people mainly obtain through hearing is the main melody. The main melody is the soul of music and interprets its theme; it is the key to music classification and an important basis for distinguishing music genres. This section studies and implements a fast and effective Skyline main-melody extraction algorithm for extracting the main melody from music files.

We define the relevant attributes of the notes. Let n_i and n_{i+1} denote two adjacent notes; s_i and s_{i+1} denote the start times of these two notes, respectively; p_i and p_{i+1} denote their pitches, respectively; and e_i and e_{i+1} denote their end times, respectively.

The input of the Skyline algorithm is the note feature matrix. The specific steps of the Skyline algorithm are as follows:

(1) Arrange the note vectors in the note feature matrix in ascending order of their start times, and remove the note vectors of channel 10 (percussion instruments).

(2) Traverse the note feature matrix. For note vectors with the same start time, keep the note vector with the highest pitch and discard the other note vectors.

(3) For two adjacent note vectors n_i and n_{i+1}, if s_i < s_{i+1}, e_i > s_{i+1}, and p_i < p_{i+1} are satisfied, let e_i = s_{i+1}.
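The three steps above can be sketched as follows. This is an illustrative sketch only: the (start, end, pitch, channel) tuple representation and the function name are assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the Skyline main-melody extraction described above.
# A note is assumed to be a (start, end, pitch, channel) tuple.

def skyline(notes):
    """Extract the main melody from a list of (start, end, pitch, channel) notes."""
    # Step 1: drop channel-10 percussion notes and sort by start time.
    melodic = [n for n in notes if n[3] != 10]
    melodic.sort(key=lambda n: n[0])

    # Step 2: among notes sharing a start time, keep only the highest pitch.
    best = {}
    for start, end, pitch, ch in melodic:
        if start not in best or pitch > best[start][2]:
            best[start] = (start, end, pitch, ch)
    result = sorted(best.values(), key=lambda n: n[0])

    # Step 3: truncate a note that overlaps a later, higher-pitched note.
    out = []
    for i, (start, end, pitch, ch) in enumerate(result):
        if i + 1 < len(result):
            nstart, _, npitch, _ = result[i + 1]
            if start < nstart and end > nstart and pitch < npitch:
                end = nstart
        out.append((start, end, pitch, ch))
    return out
```

For example, a long low note starting together with a higher note is dropped in step 2, and a note overlapping a later, higher note is cut short at the later note's onset in step 3.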

3.1.2. Music Segment Division. First, the sound file is sampled, framed, and encoded, and a piano-roll matrix is used to model the musical performance. Then the similarity between any two frames is calculated with the Euclidean distance to generate a self-similarity matrix, and a special Gaussian convolution kernel is constructed and convolved along the diagonal of the self-similarity matrix to generate a novelty curve. The novelty curve is a time-series curve describing changes in the musical performance. Finally, the peak points are extracted from the novelty curve and used to segment the music.

Figure 1: The specific process of extracting the feature sequence of a music segment (music file → music feature matrix → main melody extraction and music segment division → music segment feature sequence).

The core idea of the algorithm is to estimate instantaneous musical novelty by analyzing the local self-similarity of the performance. At a significantly novel point in time, the music played shortly before that point and the music played shortly after it each exhibit a high degree of self-similarity, while the cross-similarity between the past and the future at that point is fairly low. Simply put, in a short period before this point the style of playing is similar; after this point, the composition changes to another style of playing. The artistic style of the performance has undergone a major change, and the emotions and themes expressed have also changed, so the music can be divided into segments at such points.
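The segmentation idea above (self-similarity matrix plus a Gaussian-tapered checkerboard kernel slid along the diagonal) can be sketched as follows. The feature frames, kernel size, and distance-based similarity are illustrative assumptions, not the paper's exact implementation.

```python
# Illustrative sketch of Foote-style novelty detection: build a self-similarity
# matrix from frame distances, then correlate a Gaussian-tapered checkerboard
# kernel along its main diagonal to obtain the novelty curve.
import numpy as np

def novelty_curve(frames, kernel_size=16):
    """frames: (n_frames, n_features) piano-roll-like feature matrix."""
    n = len(frames)
    # Self-similarity from pairwise Euclidean distances (negated: closer = more similar).
    diff = frames[:, None, :] - frames[None, :, :]
    ssm = -np.sqrt((diff ** 2).sum(axis=-1))

    # Gaussian-tapered checkerboard kernel: +1 in past-past/future-future
    # quadrants, -1 across the boundary.
    half = kernel_size // 2
    idx = np.arange(-half, half)
    taper = np.exp(-(idx ** 2) / (2 * (half / 2) ** 2))
    checker = np.sign(idx)[:, None] * np.sign(idx)[None, :]
    kernel = checker * taper[:, None] * taper[None, :]

    # Slide the kernel along the main diagonal of the SSM.
    nov = np.zeros(n)
    for t in range(half, n - half):
        nov[t] = (kernel * ssm[t - half:t + half, t - half:t + half]).sum()
    return nov
```

Peaks of the returned curve mark frames where the performance before and after the frame differ sharply, which is exactly where the text proposes to cut segments.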

3.2. Attention Mechanism. When humans observe a visual scene, the brain quickly scans the image appearing in the field of view and directs the gaze to the area of interest. The brain allocates different amounts of attention to different areas of the image: for the areas the viewer focuses on, more attention resources are allocated to careful observation, obtaining more detailed information about the target area, while other, less useful areas of the view are ignored. The attention mechanism [22, 23] in deep learning is similar: it, too, is a mechanism of attention-resource allocation. It can filter out, from a large amount of information, the key information that is most conducive to the task at hand, thereby improving the performance of deep learning tasks such as detection [24], prediction [25], and recognition [26, 27].

Figure 2 shows a simplified schematic diagram of the encoder-decoder model with the attention mechanism. The attention mechanism effectively alleviates the limitations of the basic encoder-decoder model: the encoder no longer compresses all the information of the input sequence into a single fixed-length context vector. Instead, for each output, the model focuses on finding the salient information related to the current output in the input data and computes a different context vector, allowing the model to better learn the alignment between input and output.

Taken by itself, the attention mechanism can be understood as a query process. Figure 3 is a generalized structural diagram of the attention mechanism, where x is the input sequence data and y is the query. First, given the query y, the attention score between y and each input x_i is calculated through a function f; these scores are then mapped to a probability distribution between 0 and 1 through the softmax function. Finally, the inputs are weighted by this probability distribution and summed to produce the output value of the attention mechanism.

The attention mechanism is calculated as follows:

attention = Σ_{i=1}^n softmax(f(x_i, y)) · x_i.   (1)
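A minimal numpy sketch of Equation (1), using a dot product as an illustrative choice for the scoring function f (the paper does not fix a particular f here):

```python
# Score each input x_i against the query y, softmax the scores, and return
# the weighted sum of the inputs, as in Eq. (1).
import numpy as np

def attention(xs, y):
    """xs: (n, d) input vectors; y: (d,) query vector."""
    scores = xs @ y                       # f(x_i, y) = x_i . y (illustrative choice)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()              # softmax over the n scores
    return weights @ xs                   # sum_i softmax(f(x_i, y)) * x_i
```

The output is a convex combination of the inputs, concentrated on the inputs that score highest against the query.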

3.3. Classification Model. Compared with 2D convolutional networks, 3D convolutional networks can better model temporal information through 3D convolution and 3D pooling operations. In a two-dimensional convolutional network, convolution and pooling are performed only in space; in a three-dimensional convolutional network, they are performed in both time and space. A 2D convolutional network outputs an image when processing an image, and it likewise outputs an image when operating on multiple images treated as different channels; therefore, the temporal information of the input data is lost after each convolution operation in a two-dimensional convolutional network. Only three-dimensional convolution preserves the temporal information of the input signal in its output. The same principle applies to 2D pooling and 3D pooling.

Figure 4 is the structure diagram of the music genre classification network model in this article. The classification network model designed in this study can be divided into three parts according to function, namely, the input layer, the hidden layer, and the output layer. The input of the input layer is the sequence of music segment features extracted from the music. The main function of the hidden layer is to learn the final feature representation of the music; the hidden layer is composed of a Bi-GRU, the attention mechanism, and a fully connected layer.

In the attention mechanism, this article uses the following formula to calculate the attention score corresponding to each feature vector:

e_t = tanh(W H_t + b),   (2)

where e_t is the attention score of the feature vector H_t at time t in H.

Figure 2: Schematic diagram of the encoder-decoder model with the attention mechanism (inputs x_1 … x_n are encoded, context vectors c_1 … c_m are computed via attention, and the decoder produces outputs y_1 … y_m).

Then the calculated attention scores are mapped to the range (0, 1) through the softmax function, and the attention probability distribution over the feature vectors is obtained:

a_i = softmax(e_i) = exp(e_i) / Σ_{k=1}^L exp(e_k).   (3)

The calculated attention probability distribution and the feature vectors of the feature representation H are then weighted and summed to obtain the feature vector representation v of the music file:

v = Σ_{i=1}^L a_i H_i.   (4)
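Equations (2)–(4) together define an attention pooling over the Bi-GRU outputs. A hedged numpy sketch follows; W and b would be learned parameters in the real model and are random or zero here for illustration:

```python
# Sketch of Eqs. (2)-(4): score each time step of H with tanh(W H_t + b),
# softmax the scores, and pool H into a single vector v.
import numpy as np

def attention_pool(H, W, b):
    """H: (L, d) hidden states; W: (d,) weights; b: scalar bias -> v: (d,)."""
    e = np.tanh(H @ W + b)    # Eq. (2): attention score e_t per time step
    a = np.exp(e - e.max())
    a /= a.sum()              # Eq. (3): softmax distribution a_i
    return a @ H              # Eq. (4): v = sum_i a_i H_i

rng = np.random.default_rng(0)
H = rng.standard_normal((6, 4))                 # e.g., 6 time steps, 4-dim states
v = attention_pool(H, rng.standard_normal(4), 0.0)
```

Because the weights a_i sum to one, v is a convex combination of the time-step representations: uniform when all scores tie, and concentrated on the most informative segments otherwise.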

The combination of the Bi-GRU and the attention mechanism allows the model to effectively learn, from the input music segment feature sequence, the information that carries different weights for the genre classification of each piece of music, including both forward and backward context. The useful information learned from the segment feature sequence helps to improve the accuracy of classification.

At the end of the hidden layer, the music feature vector v extracted by the attention mechanism is passed to the fully connected layer to calculate the confidence score of each genre. The output layer uses the softmax function to map the output of the hidden layer to the probability of each genre label for the music file. Finally, the genre tag with the highest probability is selected as the genre tag of the music file.

3.4. Instrument Recognition. This study uses the music-signal characteristics of traditional Chinese musical instruments to identify them. We regard each 2-second segment of an instrument's music signal as a sample, use the MFCC of the sample as the input feature, and input it into a deep belief network with H hidden layers (as shown in Figure 5); through the output layer, a softmax layer, the network outputs the predicted label of the musical instrument.
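To make the per-sample MFCC feature concrete, here is a simplified, self-contained computation (frame, window, power spectrum, mel filterbank, log, DCT). Real systems typically use an audio library; the frame sizes and filter counts below are assumptions for illustration, not the paper's settings.

```python
# Self-contained simplified MFCC sketch (numpy only) for a 2-second sample.
import numpy as np

def simple_mfcc(signal, sr, n_fft=512, hop=256, n_mels=26, n_mfcc=13):
    # Frame the signal and apply a Hann window to each frame.
    frames = np.lib.stride_tricks.sliding_window_view(signal, n_fft)[::hop]
    frames = frames * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # Triangular mel filterbank spanning 0 .. sr/2.
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    imel = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = imel(np.linspace(0, mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(power @ fb.T + 1e-10)

    # DCT-II to decorrelate; keep the first n_mfcc coefficients per frame.
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[:, None] + 0.5) * n[None, :n_mfcc])
    return logmel @ dct  # (n_frames, n_mfcc)
```

Each 2-second sample thus becomes a small matrix of cepstral coefficients, which is the kind of input the deep belief network described above consumes.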

4. Experiments and Results

4.1. Cross-Entropy Cost Function. The essence of neural network training is to iteratively minimize the loss function until the model parameters converge. This study uses the cross-entropy loss function to describe the difference between the predicted value output by the network model and the target expected value. The output layer of the network model calculates the probability of each genre through the softmax function and then computes the cross-entropy loss. The cross-entropy loss function is defined as follows:

Figure 3: Attention mechanism.

Figure 4: Classification model (inputs x_1, x_2, …, x_L; stacked hidden layers; attention with softmax weighting; max-pooling and average-pooling; fully connected layers; output).

C = -(1/n) Σ_x [y ln(a) + (1 - y) ln(1 - a)],   (5)

where C represents the loss, n is the number of samples, x is an input sample, a is the predicted value output by the network for input x, and y is the target expected value for input x.
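For concreteness, Equation (5) can be computed as follows; the epsilon clipping is a standard numerical guard added for illustration, not part of the paper's definition:

```python
# Numpy sketch of Eq. (5), the binary-form cross-entropy cost comparing
# predicted values a against targets y.
import numpy as np

def cross_entropy(y, a, eps=1e-12):
    """y, a: arrays of targets and predictions in (0, 1)."""
    a = np.clip(a, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))
```

The loss approaches zero when predictions match the targets and grows without bound as a confident prediction lands on the wrong side.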

4.2. Adam. In the process of training the network model, the learning rate has an important impact on model performance, and it is one of the hyperparameters that are difficult to set. This article uses the Adam optimization algorithm as the optimization method for the network model. Adam is an adaptive learning-rate algorithm that performs excellently in practice and is widely used. The Adam algorithm designs independent adaptive learning rates for different parameters by calculating first-order and second-order moment estimates of the gradient. The formulas for adjusting the network parameters are as follows:

v_t = k_1 v_{t-1} + (1 - k_1) g_t,
s_t = k_2 s_{t-1} + (1 - k_2) g_t^2,
v̂_t = v_t / (1 - k_1^t),
ŝ_t = s_t / (1 - k_2^t),
Δθ = -α v̂_t / (√ŝ_t + ε),   (6)

where g_t is the gradient at step t, k_1 and k_2 are the exponential decay rates of the first- and second-order moment estimates, α is the learning rate, and ε is a small constant for numerical stability.
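A minimal sketch of the update in Equation (6); k_1 and k_2 play the roles of the usual first- and second-moment decay rates, and the default values shown are common choices, not hyperparameters stated by the paper:

```python
# Numpy sketch of one Adam parameter update per Eq. (6).
import numpy as np

def adam_step(theta, grad, v, s, t, alpha=1e-3, k1=0.9, k2=0.999, eps=1e-8):
    v = k1 * v + (1 - k1) * grad            # first-moment estimate
    s = k2 * s + (1 - k2) * grad ** 2       # second-moment estimate
    v_hat = v / (1 - k1 ** t)               # bias corrections
    s_hat = s / (1 - k2 ** t)
    theta = theta - alpha * v_hat / (np.sqrt(s_hat) + eps)
    return theta, v, s

# Toy usage: minimize f(x) = x^2 (gradient 2x) for a few hundred steps.
x, v, s = np.array(5.0), 0.0, 0.0
for t in range(1, 501):
    x, v, s = adam_step(x, 2 * x, v, s, t, alpha=0.05)
```

Because the update divides the bias-corrected first moment by the root of the second moment, each parameter effectively receives its own step size, which is the adaptivity the text describes.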

4.3. Evaluation Environment. In order to carry out the experiment smoothly, we prepared the experimental data in advance. This article downloads genre-labeled MIDI music files from websites dedicated to sharing music and constructs a real data set, collecting a total of 2000 music files. There are 5 genres in the data set: classical, country, pop, rock, and metal. The number of MIDI music files of each genre is shown in Table 1.

4.4. Experimental Results. A special Gaussian convolution kernel is convolved along the diagonal of the self-similarity matrix to obtain the novelty curve. After smoothing the novelty curve, the peak points are extracted from it and used as the time points for segment division. The smoothed novelty curve and the extracted peak points are shown in Figure 6.

According to the experimental settings, 6 groups of comparative experiments were carried out; a brief description of the settings is shown in Table 2. Comparing Experiment 1 and Experiment 2, the classification performance of the generic extracted feature set fed to the BP neural network is far lower than that of the 11 features explored and selected for the genre classification task in Experiment 2. It can be seen that the generic music features are not suitable for the genre classification task in this study, which indicates that feature extraction does not transfer universally: feature sets usually need to be constructed according to the actual classification task. Meanwhile, the validity of the feature set selected in this study for the music genre classification task is verified.

Comparing Experiment 2 and Experiment 3: in Experiment 3, we divide the MIDI file into segments, use the segment as the analysis unit, extract the features of each segment with the same feature combination to form the segment feature sequence, and input it into the classification network. Bi-GRU can learn deeper temporal and semantic representations of the music from the input segment feature sequence, which effectively improves the accuracy of music classification; the classification effect is better than the traditional MIDI music classification method based on a BP neural network.

Comparing Experiment 4, Experiment 5, and Experiment 6: the way the music is divided into segments, and hence the extracted segment feature sequence, affects the final classification performance. In Experiment 5 and Experiment 6, the music was divided into segments at equal time intervals of 5 seconds and 10 seconds, respectively, and the final classification accuracy was lower than that obtained in Experiment 4 using the segment-division method introduced in this article. The possible reason is that the development of a musical melody is a process of repetition and change, with certain transition boundaries; dividing the music into equal durations does not take this characteristic into account, so the extracted segment feature sequence cannot describe the music well, which hurt the classification performance. In Experiment 4, this study finds the mutation points of the musical performance to divide the segments, which achieves a higher classification accuracy. The experimental results verify the effectiveness of the music segmentation method used in this study.

Figure 5: DBN-based recognition and classification network structure for traditional Chinese musical instruments (stacked RBM1, RBM2, and RBM3).

5. Conclusion

In this study, we propose a method of music genre classification based on deep learning. Based on the feature sequence of the input music segments, recurrent neural networks and the attention mechanism are studied, and the classification network model is designed using Bi-GRU and the attention mechanism. Bi-GRU is good at processing sequence data: it can learn the contextual semantics and deep features of the music from the segment feature sequence. The attention mechanism is added to automatically assign different attention weights to the features that Bi-GRU learns from different segments and to learn the more significant musical features, thereby improving the accuracy of classification. In addition, this study also proposes a recognition and classification algorithm for traditional Chinese musical instruments based on deep belief networks. The experiments have achieved credible results.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The author has no conflicts of interest regarding the publication of this study.

References

[1] J. A. Sloboda and P. N. Juslin, "Psychological perspectives on music and emotion," in Music and Emotion: Theory and Research, pp. 71–104, Oxford University Press, Oxford, UK, 2001.

[2] A. Kresovich, M. K. Reffner Collins, D. Riffe, and F. R. D. Carpentier, "A content analysis of mental health discourse in popular rap music," JAMA Pediatrics, vol. 175, no. 3, pp. 286–292, 2021.

[3] T. Eerola, J. K. Vuoskoski, H.-R. Peltola, V. Putkinen, and K. Schafer, "An integrative review of the enjoyment of sadness associated with music," Physics of Life Reviews, vol. 25, pp. 100–121, 2018.

[4] Y. Li, W. Hu, and Y. Wang, "Music rhythm customized mobile application based on information extraction," in Proceedings of the 4th International Conference on Smart Computing and Communication, pp. 304–309, Birmingham, UK, October 2019.

[5] F. Medhat, D. Chesmore, and J. Robinson, "Automatic classification of music genre using masked conditional neural networks," in Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM), pp. 979–984, IEEE, New Orleans, LA, USA, November 2017.

[6] S. Vishnupriya and K. Meenakshi, "Automatic music genre classification using convolution neural network," in Proceedings of the 2018 International Conference on Computer Communication and Informatics (ICCCI), pp. 1–4, IEEE, Coimbatore, India, January 2018.

[7] S. Shetty and S. Hegde, "Automatic classification of carnatic music instruments using MFCC and LPC," in Data Management, Analytics and Innovation, pp. 463–474, Springer, Singapore, 2020.

[8] Y. T. Chen, C. H. Chen, S. Wu, and C. C. Lo, "A two-step approach for classifying music genre on the strength of AHP weighted musical features," Mathematics, vol. 7, no. 1, 19 pages, 2019.

[9] R. Liu, X. Ning, W. Cai, and G. Li, "Multiscale dense cross-attention mechanism with covariance pooling for hyperspectral image scene classification," Mobile Information Systems, vol. 2021, Article ID 9962057, 15 pages, 2021.

[10] C. Yan, G. Pang, X. Bai, Z. Zhou, and L. Gu, "Beyond triplet loss: person re-identification with fine-grained difference-aware pairwise loss," IEEE Transactions on Multimedia, 2021.

[11] Y. Ding, X. Zhao, Z. Zhang, W. Cai, and N. Yang, "Multiscale graph sample and aggregate network with context-aware learning for hyperspectral image classification," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 14, pp. 4561–4572, 2021.

Table 1: Number of music files in five genres.
  Classical 400 | Country 386 | Pop 375 | Rock 440 | Metal 399 | Total 2000

Figure 6: Novelty curve and peak points.

Table 2: Comparison of experimental results of music classification.
  Experiment 1: BP neural network + local features, Acc 0.75
  Experiment 2: BP neural network + global features, Acc 0.86
  Experiment 3: Bi-GRU + dense, Acc 0.88
  Experiment 4: Bi-GRU + attention + dense, Acc 0.91
  Experiment 5: Bi-GRU + attention + dense (5 seconds), Acc 0.87
  Experiment 6: Bi-GRU + attention + dense (10 seconds), Acc 0.88

[12] Y. Tong, L. Yu, S. Li, J. Liu, H. Qin, and W. Li, "Polynomial fitting algorithm based on neural network," ASP Transactions on Pattern Recognition and Intelligent Systems, vol. 1, no. 1, pp. 32–39, 2021.

[13] X. Ning, K. Gong, W. Li, L. Zhang, X. Bai, and S. Tian, "Feature refinement and filter network for person re-identification," IEEE Transactions on Circuits and Systems for Video Technology, 2020.

[14] W. Cai, Z. Wei, R. Liu, Y. Zhuang, Y. Wang, and X. Ning, "Remote sensing image recognition based on multi-attention residual fusion networks," ASP Transactions on Pattern Recognition and Intelligent Systems, vol. 1, no. 1, pp. 1–8, 2021.

[15] X. Zhang, Y. Yang, Z. Li, X. Ning, Y. Qin, and W. Cai, "An improved encoder-decoder network based on strip pool method applied to segmentation of farmland vacancy field," Entropy, vol. 23, no. 4, p. 435, 2021.

[16] X. Ning, X. Wang, S. Xu, et al., "A review of research on co-training," Concurrency and Computation: Practice and Experience, 2021.

[17] D. Bisharad and R. H. Laskar, "Music genre recognition using convolutional recurrent neural network architecture," Expert Systems, vol. 36, no. 4, Article ID e12429, 2019.

[18] S. Iloga, O. Romain, and M. Tchuente, "A sequential pattern mining approach to design taxonomies for hierarchical music genre recognition," Pattern Analysis and Applications, vol. 21, no. 2, pp. 363–380, 2018.

[19] S. Oramas, F. Barbieri, O. Nieto, and X. Serra, "Multimodal deep learning for music genre classification," Transactions of the International Society for Music Information Retrieval, vol. 1, no. 1, pp. 4–21, 2018.

[20] H. Bahuleyan, "Music genre classification using machine learning techniques," 2018, http://arxiv.org/abs/1804.01149.

[21] R. Yang, L. Feng, H. Wang, J. Yao, and S. Luo, "Parallel recurrent convolutional neural networks-based music genre classification method for mobile devices," IEEE Access, vol. 8, pp. 19629–19637, 2020.

[22] W. Cai and Z. Wei, "Remote sensing image classification based on a cross-attention mechanism and graph convolution," IEEE Geoscience and Remote Sensing Letters, pp. 1–5, 2020, in press.

[23] W. Cai, B. Liu, Z. Wei, M. Li, and J. Kan, "TARDB-Net: triple-attention guided residual dense and BiLSTM networks for hyperspectral image classification," Multimedia Tools and Applications, vol. 80, no. 7, pp. 11291–11312, 2021.

[24] Z. Chu, M. Hu, and X. Chen, "Robotic grasp detection using a novel two-stage approach," ASP Transactions on Internet of Things, vol. 1, no. 1, pp. 19–29, 2021.

[25] W. Sun, P. Zhang, Z. Wang, and D. Li, "Prediction of cardiovascular diseases based on machine learning," ASP Transactions on Internet of Things, vol. 1, no. 1, pp. 30–35, 2021.

[26] L. Sun, W. Li, X. Ning, L. Zhang, X. Dong, and W. He, "Gradient-enhanced softmax for face recognition," IEICE Transactions on Information and Systems, vol. E103.D, no. 5, pp. 1185–1189, 2020.

[27] Y. Zhang, W. Li, L. Zhang, X. Ning, L. Sun, and Y. Lu, "AGCNN: adaptive Gabor convolutional neural networks with receptive fields for vein biometric recognition," Concurrency and Computation: Practice and Experience, Article ID e5697, 2020, in press.

8 Scientific Programming

Page 4: Recognition and Classification Model of Music Genres and

extracted from the novelty curve and segmented -e coreidea of the algorithm is to estimate the instantaneous musicnovelty by analyzing the local self-similarity of musicplaying at a significant novel point in time the music playedin the past or the future at that point in time is within a shortperiod of time It has a high degree of self-similarity andthere is a fairly low cross-similarity between the past and thefuture at this point in time Simply put in a short period oftime before this point in time the musical style of playing issimilar After this point in time the musical composition ischanged to another style of playing -e artistic style ofplaying music has undergone major changes and theemotions and themes expressed have also changed so themusic segments can be divided

32 Attention Mechanism When humans are observingvisual images the human brain quickly scans the images thatappear in the field of view and controls the line of sight to fallon the area that you want to focus on -e human brain willallocate different attention to observation according todifferent areas in the field of view image For the areas thatthe field of view focuses on the human brain will allocatemore attention resources to observe carefully to obtain moredetails of the target area Information will be ignored forother useless areas of view-e attentionmechanism [22 23]in deep learning is similar to this It is also a mechanism ofattention resource allocation It can filter out key infor-mation that is more conducive to deep learning tasks from alarge amount of information thereby improving the per-formance of deep learning tasks such as detection [24]prediction [25] and recognition [26 27]

Figure 2 shows a simplified schematic diagram of theencoding and decoding model that introduces the attentionmechanism-e codec model with the attention mechanismcan effectively improve its limitations -e encoder nolonger converts all the information of the input sequenceinto a fixed-length context vector For different outputs itwill focus on finding significant useful information related tothe current output from the input data and calculate dif-ferent context vectors Allow the model to better learn thealignment of input and output

Taken separately the attention mechanism can be un-derstood as a query calculation process Figure 3 is a gen-eralized structure diagram of the attention mechanism x isthe input sequence data and y is the query First input ycalculate the attention score of y and each input xi throughthe function f and then map the probability distributionbetween 0 and 1 through the softmax function Finally theprobability distribution and each input are correspondinglyweighted -en calculate the output value of the attentionmechanism

-e calculation equation of the attention mechanism isas follows

attention 1113944n

i1softmax f xi y( 1113857( 1113857lowast xi (1)

33 Classification Model Compared with 2D convolutionalnetworks 3D convolutional networks can better model timeinformation through 3D convolution and 3D pooling op-erations In a two-dimensional convolutional network theprocess of convolution and pooling is completed in space Ina three-dimensional convolutional network they perform intime and space In the introduction of 3D convolutionalnetwork above it was proposed that images should beoutput when 2D convolutional network is processing im-ages and images should also be output when multipleimages (which are regarded as different channels) are op-erated -erefore the time information of input data will belost after each convolution operation in the two-dimensionalconvolutional network Only three-dimensional convolu-tion can preserve the time information of the input signaland produce the output quantity -e same principle can beapplied to 2D pooling and 3D pooling

Figure 4 is the network model structure diagram of theclassification of music genres in this article-e classificationnetwork model designed in this study can be divided intothree parts according to different functions namely theinput layer the hidden layer and the output layer -e inputof the input layer is a sequence of musical segment featuresextracted frommusic -e main function of the hidden layeris to learn the final feature representation of music -ehidden layer is composed of Bi-GRU attention mechanismand fully connected layer

In the attention mechanism, this article uses the following formula to calculate the attention score corresponding to each feature vector:

$$ e_t = \tanh(W H_t + b) \qquad (2) $$

where e_t is the attention score of the feature vector H_t at time t in H.

Figure 2: Schematic diagram of the codec (encoder-decoder) model with the introduction of the attention mechanism. (The encoder maps the inputs x1, …, xn to context vectors c1, …, cm, from which the decoder produces the outputs y1, …, ym.)

4 Scientific Programming

Then, the calculated attention scores are mapped to the value range (0, 1) through the softmax function, and the attention probability distribution over the feature vectors is obtained:

$$ a_i = \operatorname{softmax}(e_i) = \frac{\exp(e_i)}{\sum_{k=1}^{L} \exp(e_k)} \qquad (3) $$

The calculated attention probability distribution and the feature vectors of the feature representation H are weighted and summed to obtain the feature vector representation v of the music file:

$$ v = \sum_{i=1}^{L} a_i H_i \qquad (4) $$
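Equations (2)–(4) chain together as score, normalize, and weighted sum. A minimal Python sketch with illustrative values for W and b:

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(H, W, b):
    """Equations (2)-(4) over a sequence H of feature vectors.
    W (a single row here, an illustrative shape) and b score each H_t."""
    # (2) e_t = tanh(W H_t + b)
    e = [math.tanh(sum(w * h for w, h in zip(W, Ht)) + b) for Ht in H]
    # (3) a_i = softmax(e_i)
    a = softmax(e)
    # (4) v = sum_i a_i * H_i
    dim = len(H[0])
    v = [sum(ai * Ht[d] for ai, Ht in zip(a, H)) for d in range(dim)]
    return a, v

H = [[0.2, 0.8], [0.9, 0.1], [0.5, 0.5]]  # toy Bi-GRU outputs
a, v = attend(H, W=[1.0, -1.0], b=0.0)
```

With this W, the second vector scores highest, so it dominates the weighted sum v — exactly the "different attention weights per segment" behavior described above.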

The combination of the Bi-GRU and the attention mechanism allows the model to effectively learn, from the input music, the information of different weights most valuable to the genre classification of each piece, including both forward and backward context. The useful information learned from the segment feature sequence helps to improve the accuracy of classification.

At the end of the hidden layer, the music feature vector v extracted by the attention mechanism is passed to the fully connected layer, which calculates the confidence score of each genre. The output layer uses the softmax function to map the output of the hidden layer to the probability of each genre label to which the music file may belong. Finally, the genre tag with the highest probability is selected as the genre tag of the music file.
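The output stage just described — fully connected confidence scores, softmax, then argmax over genre labels — can be sketched as follows; the weight matrix, bias, and two-dimensional v are invented illustrative values:

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def predict_genre(v, W, b, labels):
    """Fully connected layer turns the feature vector v into one
    confidence score per genre; softmax maps the scores to probabilities;
    the highest-probability label is the prediction."""
    scores = [sum(wi * vi for wi, vi in zip(row, v)) + bj
              for row, bj in zip(W, b)]
    probs = softmax(scores)
    return labels[probs.index(max(probs))], probs

labels = ["classical", "country", "dance", "folk", "metal"]
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0], [0.0, -1.0]]
b = [0.0] * 5
pred, probs = predict_genre([0.9, 0.1], W, b, labels)
```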

3.4. Instrument Recognition. This study uses the music signal characteristics of traditional Chinese musical instruments to identify the instruments. We regard each 2-second segment of an instrument's music signal as a sample, use the MFCC of the sample as the input feature, and input it into a deep belief network with H hidden layers (as shown in Figure 5); the output layer, a softmax layer, outputs the predicted label of the musical instrument.
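The 2-second sampling step can be sketched as a simple framing function; the sample rate and signal here are toy values, and the MFCC extraction itself (e.g., via a signal-processing library) is omitted:

```python
def split_into_samples(signal, sr, seconds=2.0):
    """Cut a mono signal into fixed-length samples of `seconds` each
    (the paper uses 2-second segments); a trailing partial chunk is
    dropped."""
    step = int(sr * seconds)
    return [signal[i:i + step]
            for i in range(0, len(signal) - step + 1, step)]

sr = 4                     # toy sample rate: 4 samples per "second"
signal = list(range(22))   # 5.5 "seconds" of audio
samples = split_into_samples(signal, sr)
```

Each resulting chunk would then be converted to MFCCs and fed to the deep belief network.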

4. Experiments and Results

4.1. Cross-Entropy Cost Function. The essence of neural network training is to iterate continuously to minimize the loss function; this is the process of model parameter convergence. This study uses the cross-entropy loss function to describe the difference between the predicted value output by the network model and the target expected value. The output layer of the network model calculates the probability of each genre through the softmax function and then calculates the cross-entropy loss. The definition of the cross-entropy loss function is as follows:

Figure 3: Attention mechanism.

Figure 4: Classification model. (The figure shows the input segment features x1, x2, …, xL passing through stacked hidden layers, the attention mechanism, max-pooling and average-pooling, two fully connected (FC) layers, and a softmax layer that produces the output.)

$$ C = -\frac{1}{n} \sum_{x} \left[ y \ln(a) + (1 - y) \ln(1 - a) \right] \qquad (5) $$

where C represents the loss, n is the number of samples, x is an input sample, a is the predicted value output by the network model for input x, and y is the target expected value for input x.
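A direct transcription of equation (5) for binary targets, with two made-up batches showing that confident correct predictions yield a small loss and confident wrong ones a large loss:

```python
import math

def cross_entropy(samples):
    """Equation (5): average cross-entropy over (a, y) pairs, where
    a is the predicted probability and y the target in {0, 1}."""
    n = len(samples)
    total = 0.0
    for a, y in samples:
        total += y * math.log(a) + (1 - y) * math.log(1 - a)
    return -total / n

# Confident, correct predictions give a small loss ...
low = cross_entropy([(0.9, 1), (0.1, 0)])
# ... confident, wrong ones a large loss.
high = cross_entropy([(0.1, 1), (0.9, 0)])
```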

4.2. Adam. In the process of training the network model, the learning rate has an important impact on the model's performance, and it is one of the hyperparameters that are difficult to set. This article uses the Adam optimization algorithm as the optimization method of the network model. Adam is an adaptive learning rate algorithm that performs excellently in practice and is widely used. The Adam algorithm designs an independent adaptive learning rate for each parameter by calculating the first-order moment estimate and the second-order moment estimate of the gradient. The formulas for adjusting the network parameters are as follows:

$$
\begin{aligned}
v_t &= k_1 v_{t-1} + (1 - k_1)\, g_t, \\
s_t &= k_2 s_{t-1} + (1 - k_2)\, g_t^2, \\
\hat{v}_t &= \frac{v_t}{1 - k_1^t}, \qquad
\hat{s}_t = \frac{s_t}{1 - k_2^t}, \\
\Delta\theta &= \alpha \frac{\hat{v}_t}{\sqrt{\hat{s}_t} + \varepsilon}.
\end{aligned} \qquad (6)
$$
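One scalar Adam update per equation (6) can be written as below. The descent sign in the final line and the coefficient values k1 = 0.9, k2 = 0.999 are conventional choices, not taken from the paper:

```python
import math

def adam_step(theta, grad, state, t,
              alpha=0.001, k1=0.9, k2=0.999, eps=1e-8):
    """One Adam update following equation (6) for a single scalar
    parameter; `state` carries the moment estimates (v, s)."""
    v, s = state
    v = k1 * v + (1 - k1) * grad          # first-moment estimate
    s = k2 * s + (1 - k2) * grad ** 2     # second-moment estimate
    v_hat = v / (1 - k1 ** t)             # bias correction
    s_hat = s / (1 - k2 ** t)
    theta = theta - alpha * v_hat / (math.sqrt(s_hat) + eps)
    return theta, (v, s)

# Minimize f(theta) = theta**2 (gradient 2 * theta) for a few steps.
theta, state = 5.0, (0.0, 0.0)
for t in range(1, 201):
    theta, state = adam_step(theta, 2 * theta, state, t, alpha=0.1)
```

Because the step size is normalized by the square root of the second moment, the effective step stays near alpha regardless of the raw gradient magnitude.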

4.3. Evaluation Environment. In order to carry out the experiment smoothly, we prepared the experimental data in advance. This article downloads genre-labeled MIDI music files from an Internet site dedicated to sharing music, constructs a real data set, and collects a total of 2000 music files. There are 5 genres in the data set, including classical, country, dance, folk, and metal. The number of MIDI music files of each genre is shown in Table 1.

4.4. Experimental Results. A special Gaussian convolution kernel is convolved along the diagonal of the self-similarity matrix to obtain the novelty curve. After the novelty curve is smoothed, peak points are extracted from it and used as the time points for segment division. The smoothed novelty curve and the extracted peak points are shown in Figure 6.
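The smoothing and peak-picking step can be sketched as follows; the box-filter smoothing and simple local-maximum rule are stand-ins for the paper's unspecified choices, and the novelty values are made up:

```python
def moving_average(curve, width=3):
    """Simple smoothing of the novelty curve (a box filter; the paper
    smooths too, but the exact method is unspecified)."""
    half = width // 2
    out = []
    for i in range(len(curve)):
        window = curve[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def peaks(curve):
    """Indices of local maxima - candidate segment boundaries."""
    return [i for i in range(1, len(curve) - 1)
            if curve[i - 1] < curve[i] > curve[i + 1]]

novelty = [0, 1, 3, 1, 0, 2, 5, 2, 0, 1, 4, 1, 0]
boundaries = peaks(moving_average(novelty))
```

The peak indices would then be mapped back to time points at which the music is split into segments.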

According to the experimental settings, 6 groups of comparative experiments were carried out; a brief description of the settings is shown in Table 2. By comparing Experiment 1 and Experiment 2, it can be concluded that the classification effect of the generic extracted feature set input to the BP neural network is far lower than that of the 11 features explored and selected for the genre classification task in Experiment 2 and input to the same network. It can be seen that the generic extracted features are not suitable for the music genre classification task in this study, which indicates that feature sets do not transfer easily between tasks and usually need to be constructed according to the actual classification task. Meanwhile, the validity of the feature set selected in this study for the music genre classification task is verified.

Comparing Experiment 2 and Experiment 3, we observe that in Experiment 3 we divide the MIDI file into sections, use the section as the analysis unit, extract the features of each section with the same feature combination, form the section feature sequence, and input it into the classification network. The Bi-GRU can learn a deeper expression of the music's temporal and semantic information from the input segment feature sequence, which effectively improves the accuracy of music classification; the classification effect is better than the traditional MIDI music classification method based on the BP neural network.

Comparing Experiment 4, Experiment 5, and Experiment 6, the method by which the music is divided into segments, and hence the extracted segment feature sequence, affects the final classification performance. In Experiment 5 and Experiment 6, the music was divided into segments at equal time intervals of 5 seconds and 10 seconds, and the final classification accuracy was lower than that obtained in Experiment 4 using the segment division method introduced in this article. The possible reason is that the development of a music melody is a process of repetition and change with certain transition boundaries. Dividing music into segments of equal duration does not take this characteristic into account, so the extracted segment feature sequence cannot describe the

Figure 5: DBN-based recognition and classification network structure diagram of traditional Chinese musical instruments (stacked restricted Boltzmann machines RBM1, RBM2, and RBM3).


music well, so classification performance suffered. In Experiment 4, this study finds the mutation points of the music playing to divide it into segments, which achieves a higher classification effect. The experimental results verify the effectiveness of the music segmentation method used in this study.

5. Conclusion

In this study, we propose a method of music genre classification based on deep learning. Recurrent neural networks and the attention mechanism are studied for processing the feature sequence of the input music segments, and the Bi-GRU and attention mechanism are used to design the classification network model. The Bi-GRU is good at processing sequence data; it can learn the contextual semantics and deep features of the music from the segment feature sequence. The attention mechanism is added to automatically assign different attention weights to the features learned by the Bi-GRU from different segments and to learn the more salient music features, thereby improving the accuracy of classification. In addition, this study also proposes a recognition and classification algorithm for traditional Chinese musical instruments based on deep belief networks. The experiments of the study have achieved credible results.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The author has no conflicts of interest regarding the publication of this study.


Table 1: Number of music files in five genres.

Genre:  Classical  Country  Pop  Rock  Metal  Total
Files:  400        386      375  440   399    2000

Figure 6: Novelty curve and peak points.

Table 2: Comparison of experimental results of music classification.

Experiment  Category                                         Acc
1           BP neural network + local features               0.75
2           BP neural network + global features              0.86
3           Bi-GRU + dense                                   0.88
4           Bi-GRU + attention + dense                       0.91
5           Bi-GRU + attention + dense (5-second segments)   0.87
6           Bi-GRU + attention + dense (10-second segments)  0.88



8 Scientific Programming

Page 5: Recognition and Classification Model of Music Genres and

-en the calculated attention score is mapped to thevalue range (0 1) through the softmax function and theattention probability distribution of each feature vector isobtained

ai softmax ei( 1113857 exp ei( 1113857

1113936Lk1 exp ek( 1113857

(3)

-e calculated attention probability distribution andeach feature vector of the feature representation H areweighted and summed to obtain the feature vector repre-sentation v of the music file

v 1113944L

i1aiHi (4)

-e combination of Bi-GRL and the attention mecha-nism network allows the model to effectively learn the valueinformation of the different weights of the genre classifi-cation of each piece of music including forward andbackward value information and more accurately from theinput music -e useful information learned from thesegment feature sequence is helpful to improve the accuracyof classification

At the end of the hidden layer the music feature vectorpowder extracted by the attention mechanism of the fullyconnected layer is used to calculate the confidence score ofeach genre -e output layer uses the softmax function to

map the output of the hidden layer to the probability of eachgenre label to which the music file belongs Finally the genretag with the highest probability is selected as the genre tag ofthe music file

34 InstrumentRecognition -is study uses the music signalcharacteristics of traditional Chinese musical instruments toidentify musical instruments We regard the 2-second-segment musical instrument music signal as a sample usethe MFCC of the sample as the input feature and input itinto a deep belief network with H hidden layers (as shown inFigure 5) and through the output layer the softmax layeroutputs the predicted label of the musical instrument

4 Experiments and Results

41 Cross-Entropy Cost Function -e essence of neuralnetwork training is to continuously iterate to minimize theloss function and the process of model parameter conver-gence -is study uses the cross-entropy loss function todescribe the difference between the predicted value outputby the network model and the target expected value -eoutput layer of the network model calculates the probabilityof each genre through the softmax function and then cal-culates the cross-entropy loss function -e definition of thecross-entropy loss function is as follows

h1 h2 h3 h4

x1 x2 x3 x4

h0 c hprime3hprime2hprime1

y1 y2 y1

Figure 3 Attention mechanism

1h11hn

1h2x1

x2

xL

OutputA

ttent

ion

Softm

ax

Max

-po

olin

gA

vera

ge-

pool

ing

FC la

yer

FC la

yer

2h12hn

2h2

nh1nhn

nh2

Figure 4 Classification model

Scientific Programming 5

C 1n

1113944x

y ln(a) +(1 minus y)ln(1 minus a) (5)

where C represents the loss n is the number of samples x isthe input sample a is the output predicted value of thenetwork model input x and a is the target expected value ofthe network model input x

42Adam In the process of training the network model thesize of the learning rate has an important impact on theimprovement of the modelrsquos performance and the learningrate is one of the hyperparameters that are difficult to set-is article uses Adam optimization algorithm as the op-timization method of the network model Adam algorithm isan adaptive learning rate algorithm which has excellentperformance in practice and is widely used -e Adam al-gorithm designs independent adaptive learning rates fordifferent parameters by calculating the first-order momentestimation and the second-order moment estimation of thegradient -e calculation formula for adjusting the networkparameters is as follows

vt k1vtminus1 + 1 minus k1( 1113857gt

st k2stminus1 + 1 minus k2( 1113857g2t

1113954vt vt

1 minus kt1

1113954st st

1 minus kt2

Δθ α1113954vt

1113954st + ε1113968

(6)

43 Evaluation Environment In order to carry out the ex-periment smoothly we prepare the experimental data inadvance -is article downloads genre-labeled MIDI musicfiles from the Internet dedicated to sharingmusic constructsa real data set and collects a total of 2000 music files -ereare 5 genres in the data set including classical countrydance folk and metal -e number of 1VIIDI music files ofeach genre is shown in Table 1

44 Experimental Results A special Gaussian convolutionkernel is used to convolve along the diagonal of the self-similar matrix to obtain the novelty curve After smoothingthe novelty curve the peak point is extracted from it andused as the time point for segment division -e smoothednovelty curve and the extracted peak points are shown inFigure 6

According to the experimental settings 6 groups ofcomparative experiments were carried out -e brief de-scription of the experimental settings is shown in Table 2 Bycomparing Experiment 1 and Experiment 2 it can beconcluded that the classification effect of the extractedfeature set input to BP neural network for classificationexperiment is far lower than the classification effect of the 11features explored and selected in Experiment 2 according tothe genre classification task input to BP neural network Itcan be seen that the extracted music features are not suitablefor the classification task of music genres in this study whichindicates that feature extraction is not easy to be universaland feature sets usually need to be constructed according tothe actual classification task Meanwhile the validity of thefeature sets selected in this study in the classification task ofmusic genres is verified

Comparing Experiment 2 and Experiment 3 we canobtain that in Experiment 3 we divide the MIDI file intosections use the section as the analysis unit extract thefeatures of the section with the same feature combinationform the section feature sequence and input it into theclassification network Bi-GRL can learn the deeper ex-pression of music about time sequence and semantic in-formation from the input music segment feature sequencewhich can effectively improve the accuracy of music clas-sification and the classification effect is better than thetraditional 1VIIDI music classification method based on BPneural network

Comparing Experiment 4 Experiment 5 and Experi-ment 6 the music segment is divided into different methodsand the extracted music segment feature sequence will affectthe final classification performance In Experiment 5 andExperiment 6 the music was divided into segments withequal time intervals of 5 seconds and 10 seconds and thefinal classification accuracy of the experiment was lowerthan that obtained in Experiment 4 using the segment di-vision method introduced in this article -e possible reasonis that the development of music melody is a process ofrepetition and change and there is a certain transitionboundary -e division of music with equal duration doesnot take into account this music characteristic and theextracted segment feature sequence cannot describe the

RBM3

RBM2

RBM1

Figure 5 DBM-based recognition and classification networkstructure diagram of traditional Chinese musical instruments

6 Scientific Programming

music well so it affected classification performance InExperiment 4 this study finds the mutation points of musicplaying to divide the music segment which can achieve ahigher classification effect -e experimental results verifythe effectiveness of the music segmentation method used inthis study

5 Conclusion

In this study we propose a method of music genre classi-fication based on deep learning According to the charac-teristic sequence of the input music segment the cyclicneural network and attention mechanism are studied andthe Bi-GRU and attention mechanism are used to design theclassification network model Bi-GRU is good at processingsequence data It can learn the contextual semantics anddeep features of music from the sequence feature sequence-e attention mechanism is added to automatically assigndifferent attention weights to the features learned by Bi-GRUfrom different segments and learn more significant musicfeatures thereby improving the accuracy of classification In

addition this study also proposes a recognition and clas-sification algorithm for traditional Chinese musical in-struments based on deep belief networks -e experimentalresults of the study have achieved credible results

Data Availability

-e data used to support the findings of this study are in-cluded within the article

Conflicts of Interest

-e author has no conflicts of interest regarding the pub-lication of this study

References

[1] J A Sloboda and P N Juslin ldquoPsychological perspectives onmusic and emotionrdquo Music and Emotion eory andResearch pp 71ndash104 Oxford University Press Oxford UK2001

[2] A Kresovich M K Reffner Collins D Riffe andF R D Carpentier ldquoA content analysis of mental healthdiscourse in popular rap musicrdquo JAMA Pediatrics vol 175no 3 pp 286ndash292 2021

[3] T Eerola J K Vuoskoski H-R Peltola V Putkinen andK Schafer ldquoAn integrative review of the enjoyment of sadnessassociated with musicrdquo Physics of Life Reviews vol 25pp 100ndash121 2018

[4] Y Li W Hu and Y Wang ldquoMusic rhythm customizedmobile application based on information extractionrdquo inProceedings of the 4th International Conference on SmartComputing and Communication pp 304ndash309 BirminghamUK October 2019

[5] F Medhat D Chesmore and J Robinson ldquoAutomaticclassification of music genre using masked conditional neuralnetworksrdquo in Proceedings of the 2017 IEEE InternationalConference on Data Mining (ICDM) pp 979ndash984 IEEE NewOrleans LA USA November 2017 In press

[6] S Vishnupriya and K Meenakshi ldquoAutomatic music genreclassification using convolution neural networkrdquo in Pro-ceedings of the 2018 International Conference on ComputerCommunication and Informatics (ICCCI) pp 1ndash4 IEEECoimbatore India January 2018

[7] S Shetty and S Hegde ldquoAutomatic classification of carnaticmusic instruments using MFCC and LPCrdquo Data Manage-ment Analytics and Innovation Springer Singaporepp 463ndash474 2020

[8] Y T Chen C H Chen S Wu and C C Lo ldquoA two-stepapproach for classifying music genre on the strength of AHPweighted musical featuresrdquo Mathematics vol 7 no 119 pages 2019 In press

[9] R Liu X Ning W Cai and G Li ldquoMultiscale dense cross-attention mechanism with covariance pooling for hyper-spectral image scene classificationrdquo Mobile Information Sys-tems vol 2021 Article ID 9962057 15 pages 2021

[10] C Yan G Pang X Bai Z Zhou and L Gu ldquoBeyond tripletloss person re-identification with fine-grained difference-aware pairwise lossrdquo IEEE Transactions on Multimedia 2021

[11] Y Ding X Zhao Z Zhang W Cai and N Yang ldquoMultiscalegraph sample and aggregate network with context-awarelearning for hyperspectral image classificationrdquo IEEE Journal

Table 1 Number of music files in five genres

Classical Country Pop Rock Metal Total400 386 375 440 399 2000

35

30

25

20

15

10

5

00 200 400 600 800 1000 1200 1400 1600

Figure 6 Novelty curve and peak point

Table 2 Comparison of experimental results of musicclassification

Experimentnumber Category Acc

1 BP neural network + local features 0752 BP neural network + global features 0863 Bi-GRU+dense 0884 Bi-GRU+ attention +dense 091

5 Bi-GRU+ attention + dense (5seconds) 087

6 Bi-GRU+ attention + dense (10seconds) 088

Scientific Programming 7

of Selected Topics in Applied Earth Observations and RemoteSensing vol 14 pp 4561ndash4572 2021 In Press

[12] Y Tong L Yu S Li J Liu H Qin and W Li ldquoPolynomialfitting algorithm based on neural networkrdquo ASP Transactionson Pattern Recognition and Intelligent Systems vol 1 no 1pp 32ndash39 2021

[13] X Ning K Gong W Li L Zhang X Bai and S TianldquoFeature refinement and filter network for person re-identi-ficationrdquo IEEE Transactions on Circuits and Systems for VideoTechnology 2020

[14] W Cai Z Wei R Liu Y Zhuang Y Wang and X NingldquoRemote sensing image recognition based on multi-attentionresidual fusion networksrdquo ASP Transactions on PatternRecognition and Intelligent Systems vol 1 no 1 pp 1ndash8 2021

[15] X Zhang Y Yang Z Li X Ning Y Qin and W Cai ldquoAnimproved encoder-decoder network based on strip poolmethod applied to segmentation of farmland vacancy fieldrdquoEntropy vol 23 no 4 p 435 2021

[16] X Ning X Wang S Xu et al ldquoA review of research on co-trainingrdquo Concurrency and Computation Practice and Ex-perience 2021

[17] D Bisharad and R H Laskar ldquoMusic genre recognition usingconvolutional recurrent neural network architecturerdquo ExpertSystems vol 36 no 4 Article ID e12429 2019

[18] S Iloga O Romain and M Tchuente ldquoA sequential patternmining approach to design taxonomies for hierarchical musicgenre recognitionrdquo Pattern Analysis and Applications vol 21no 2 pp 363ndash380 2018

[19] S Oramas F Barbieri O Nieto and X Serra ldquoMultimodaldeep learning for music genre classificationrdquo Transactions ofthe International Society for Music Information Retrievalvol 1 no 1 pp 4ndash21 2018

[20] H Bahuleyan ldquoMusic genre classification using machinelearning techniquesrdquo 2018 httparxivorgabs180401149

[21] R Yang L Feng H Wang J Yao and S Luo ldquoParallelrecurrent convolutional neural networks-based music genreclassification method for mobile devicesrdquo IEEE Access vol 8pp 19629ndash19637 2020 In press

[22] W Cai and Z Wei ldquoRemote sensing image classificationbased on a cross-attention mechanism and graph convolu-tionrdquo IEEE Geoscience and Remote Sensing Letters pp 1ndash52020 In Press

[23] W Cai B Liu Z Wei M Li and J Kan ldquoTARDB-Net triple-attention guided residual dense and BiLSTM networks forhyperspectral image classificationrdquo Multimedia Tools andApplications vol 80 no 7 pp 11291ndash11312 2021

[24] Z Chu M Hu and X Chen ldquoRobotic grasp detection using anovel two-stage approachrdquo ASP Transactions on Internet ofings vol 1 no 1 pp 19ndash29 2021

[25] W Sun P Zhang Z Wang and D Li ldquoPrediction of car-diovascular diseases based on machine learningrdquo ASPTransactions on Internet of ings vol 1 no 1 pp 30ndash352021

[26] L Sun W Li X Ning L Zhang X Dong and W HeldquoGradient-enhanced softmax for face recognitionrdquo IEICETransactions on Information and Systems vol E103D no 5pp 1185ndash1189 2020

[27] Y Zhang W Li L Zhang X Ning L Sun and Y LuldquoAGCNN adaptive gabor convolutional neural networks withreceptive fields for vein biometric recognitionrdquo Concurrencyand Computation Practice and Experience Article ID e56972020 In press

8 Scientific Programming

Page 6: Recognition and Classification Model of Music Genres and

C 1n

1113944x

y ln(a) +(1 minus y)ln(1 minus a) (5)

where C represents the loss n is the number of samples x isthe input sample a is the output predicted value of thenetwork model input x and a is the target expected value ofthe network model input x

42Adam In the process of training the network model thesize of the learning rate has an important impact on theimprovement of the modelrsquos performance and the learningrate is one of the hyperparameters that are difficult to set-is article uses Adam optimization algorithm as the op-timization method of the network model Adam algorithm isan adaptive learning rate algorithm which has excellentperformance in practice and is widely used -e Adam al-gorithm designs independent adaptive learning rates fordifferent parameters by calculating the first-order momentestimation and the second-order moment estimation of thegradient -e calculation formula for adjusting the networkparameters is as follows

vt k1vtminus1 + 1 minus k1( 1113857gt

st k2stminus1 + 1 minus k2( 1113857g2t

1113954vt vt

1 minus kt1

1113954st st

1 minus kt2

Δθ α1113954vt

1113954st + ε1113968

(6)

43 Evaluation Environment In order to carry out the ex-periment smoothly we prepare the experimental data inadvance -is article downloads genre-labeled MIDI musicfiles from the Internet dedicated to sharingmusic constructsa real data set and collects a total of 2000 music files -ereare 5 genres in the data set including classical countrydance folk and metal -e number of 1VIIDI music files ofeach genre is shown in Table 1

44 Experimental Results A special Gaussian convolutionkernel is used to convolve along the diagonal of the self-similar matrix to obtain the novelty curve After smoothingthe novelty curve the peak point is extracted from it andused as the time point for segment division -e smoothednovelty curve and the extracted peak points are shown inFigure 6

According to the experimental settings 6 groups ofcomparative experiments were carried out -e brief de-scription of the experimental settings is shown in Table 2 Bycomparing Experiment 1 and Experiment 2 it can beconcluded that the classification effect of the extractedfeature set input to BP neural network for classificationexperiment is far lower than the classification effect of the 11features explored and selected in Experiment 2 according tothe genre classification task input to BP neural network Itcan be seen that the extracted music features are not suitablefor the classification task of music genres in this study whichindicates that feature extraction is not easy to be universaland feature sets usually need to be constructed according tothe actual classification task Meanwhile the validity of thefeature sets selected in this study in the classification task ofmusic genres is verified

Comparing Experiment 2 and Experiment 3: in Experiment 3 we divide the MIDI file into sections, use the section as the analysis unit, extract the features of each section with the same feature combination, form the section feature sequence, and input it into the classification network. Bi-GRU can learn a deeper representation of the music's temporal and semantic information from the input segment feature sequence, which effectively improves the accuracy of music classification; the classification effect is better than that of the traditional MIDI music classification method based on a BP neural network.
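The attention pooling over the Bi-GRU's per-segment outputs can be sketched in NumPy, assuming the common additive (tanh-scoring) form of the mechanism. The weight matrix W, bias b, and scoring vector u are hypothetical stand-ins for learned parameters, and random vectors replace the actual Bi-GRU segment states:

```python
import numpy as np

def attention_pool(h, W, b, u):
    """Additive attention over Bi-GRU segment outputs h (n_segments, d):
    score each segment, normalize with softmax, return the weighted sum."""
    scores = np.tanh(h @ W + b) @ u          # one relevance score per segment
    scores -= scores.max()                   # numerical stability for softmax
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ h, alpha                  # context vector and weights

rng = np.random.default_rng(0)
d = 8
h = rng.normal(size=(6, d))                  # stand-in for 6 segment states
W, b, u = rng.normal(size=(d, d)), np.zeros(d), rng.normal(size=d)
context, alpha = attention_pool(h, W, b, u)
```

The weights alpha sum to one, so segments the network deems more genre-salient contribute more to the context vector that feeds the final dense classification layer.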

Comparing Experiment 4, Experiment 5, and Experiment 6, the method used to divide the music into segments, and hence the extracted segment feature sequence, affects the final classification performance. In Experiments 5 and 6, the music was divided into segments at equal time intervals of 5 seconds and 10 seconds, respectively, and the final classification accuracy was lower than that obtained in Experiment 4 using the segment division method introduced in this article. A possible reason is that the development of a musical melody is a process of repetition and change with certain transition boundaries. Dividing the music into equal-duration segments ignores this characteristic, so the extracted segment feature sequence cannot describe the music well, which hurts classification performance. In Experiment 4, this study locates the mutation points of the music to divide it into segments, which achieves a higher classification accuracy. The experimental results verify the effectiveness of the music segmentation method used in this study.

[Figure 5: DBN-based recognition and classification network structure of traditional Chinese musical instruments, built from three stacked RBMs (RBM1, RBM2, RBM3).]

6 Scientific Programming
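The two segmentation strategies being compared can be sketched as follows. The frame counts and the 100-frames-per-second resolution are illustrative assumptions, not values from the article:

```python
def segments_from_boundaries(n_frames, boundaries):
    """Split frame indices at detected change points (Experiment 4 style)."""
    cuts = [0] + sorted(boundaries) + [n_frames]
    return [(a, b) for a, b in zip(cuts[:-1], cuts[1:]) if b > a]

def segments_fixed(n_frames, frames_per_segment):
    """Split frames into equal-length windows (Experiment 5/6 style)."""
    return [(i, min(i + frames_per_segment, n_frames))
            for i in range(0, n_frames, frames_per_segment)]

# Boundary-based segments from hypothetical novelty-curve peaks
segs_nov = segments_from_boundaries(1600, [300, 750, 1200])
# Fixed 5-second windows, assuming 100 frames per second
segs_5s = segments_fixed(1600, 500)
```

The boundary-based split keeps each segment musically homogeneous, while the fixed windows can cut across a transition and mix material from two sections in one feature vector.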

5. Conclusion

In this study, we propose a method of music genre classification based on deep learning. For the feature sequences of the input music segments, recurrent neural networks and the attention mechanism are studied, and a Bi-GRU with an attention mechanism is used to design the classification network model. Bi-GRU is good at processing sequence data: it can learn the contextual semantics and deep features of the music from the segment feature sequence. The attention mechanism is added to automatically assign different attention weights to the features that the Bi-GRU learns from different segments and to emphasize the more salient music features, thereby improving classification accuracy. In addition, this study also proposes a recognition and classification algorithm for traditional Chinese musical instruments based on deep belief networks. The experimental results of the study are credible.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The author has no conflicts of interest regarding the publication of this study.


Table 1: Number of music files in five genres.

Genre:  Classical  Country  Pop  Rock  Metal  Total
Files:  400        386      375  440   399    2000

[Figure 6: Novelty curve and peak points (vertical axis 0–35; horizontal axis 0–1600).]

Table 2: Comparison of experimental results of music classification.

Experiment  Category                                   Acc
1           BP neural network + local features         0.75
2           BP neural network + global features        0.86
3           Bi-GRU + dense                             0.88
4           Bi-GRU + attention + dense                 0.91
5           Bi-GRU + attention + dense (5 seconds)     0.87
6           Bi-GRU + attention + dense (10 seconds)    0.88

