OGE physics procedure. Instructions for the preparation and conduct of the exam on the subject "Physics
Lesson summary: Visualization of information in text documents (Grade 8, lesson 25, textbook by L. Bosova).
In this lesson, students generalize what they know about creating lists, tables and graphic objects and about how these can be used in text documents.
Sections of the summary: planned educational results; training tasks solved; key concepts learned in the lesson; ICT tools used in the lesson; e-learning resources.
Features of the presentation of the lesson content
1. Organizational moment (1 minute)
2. Review (5 minutes)
3. Learning new material (20 minutes)
Slide 1 - title of the presentation;
Slide 2 - keywords;
Slide 3 - visualization (diagram with examples);
Slide 4 - lists (diagram);
Slide 5 - tables (diagrams); view and discuss the animation "Working with tables";
Slide 6 - graphic images (diagrams);
Slide 7 - the most important points.
Questions and tasks
4. Practical part (15 minutes)
5. Summing up the lesson. Homework assignment. Grading (4 minutes)
Homework. All material for the lesson is in the archive (2.35 MB, rar): lesson summary.

REVIEW No. 1. Determine which group of operations (editing or formatting) each of the following actions belongs to:
- Replacing one character with another
- Inserting a missing word
- Changing the font
- Removing a fragment of text
- Justifying text
- Automatic spell check
- Changing line spacing
- Resizing page margins
- Deleting an erroneous character
- Search and replace
- Moving text fragments

REVIEW No. 2. Determine which group (character properties or paragraph properties) each of the following properties belongs to:
- Font
- Alignment
- Spacing after
- First-line indent
- Typeface style
- Color
- Line spacing
- Left indent
- Spacing before
- Font size
- Right indent

REVIEW No. 3. In which of the following sentences are the spaces between words and punctuation marks placed correctly? Where are the mistakes?
1) From your native land - die, do not go.
2) Speech is not to weave bast shoes.
3) Where he was born, there he fit.
4) Saying is funny, hiding is a sin.

REVIEW No. 4. Select the parameters that are set when configuring the page:
- Orientation
- Style
- Font size
- Paper size
- Page numbers
- Margins
- Line spacing
- Indentation
- Paragraph alignment
- Typeface style

Visualization is the presentation of information in a visual form. Textual information is presented as lists, tables and diagrams and is accompanied by illustrations (photographs, diagrams, drawings).
Visualization of information: list, table, chart, illustration.

List example: list of Grade 8 subjects
1. Algebra
2. English language
3. Biology
4. Geography
5. Geometry
6. Informatics and ICT
7. History
8. Literature
9. OBZh
10. Social Studies
11. Russian language
12. Physics
13. Chemistry
14. Drawing

Table example: grades
Student          Mathematics   Informatics
Ivanov Sasha     5             4
Orlova Katya     4             5
Petrov Victor    5             5

Chart example: color chart.

LISTS
All kinds of enumerations in documents are formatted as lists.
List items are treated as paragraphs formatted according to a single pattern.
Lists can be numbered or bulleted.
Lesson schedule (example of a numbered list):
1. Russian language
2. Algebra
3. OBZh
4. Social Studies
5. Biology
6. Technology
List of subjects studied in grade 7 (example of a bulleted list):
- Russian language
- Literature
- Algebra
- Geometry
- Physics
By structure, lists are single-level or multilevel. A list whose element is itself a list is called multilevel. Example:
Chapter 1. Information and information processes
§ 1.1. Information and its properties
1.1.1. Information and signal
1.1.2. Types of information
1.1.3. Properties of information
§ 1.2. Information processes
1.2.1. The concept of an information process
1.2.2. Collection of information

LIST CREATION TOOLS
Bulleted and numbered lists can be created quickly with the buttons on the toolbar.

CHANGING BULLETED LISTS
To change the appearance of the bullet, use the Change button. The Change Bulleted List window appears, which offers additional bullets. Clicking the Marker button opens the Symbol dialog box, in which any symbol can be selected as the list bullet.

CHANGING NUMBERED LISTS
To create your own version of a numbered list, click the Change button. The Modify Numbered List window appears. The Number format field specifies the text before and after the number of the list item, for example ")". The Numbering field sets the numbering style. The Start with ... field specifies the number (or letter) from which the list should begin. To change the font of the item numbers, use the Font button.

PRESENTING THE LIST OF COMPUTER DEVICES AS A MULTILEVEL LIST WITH FOUR LEVELS OF NESTING
The source text is a plain column of lines: Devices of a modern computer; Processor; Memory; RAM; Long-term memory; Hard magnetic disk; Floppy disk; Flash memory; Optical discs; CD; DVD; Input devices; Keyboard; Mouse; Scanner; Graphic tablet; Digital camera; Microphone; Joystick; Output devices; Monitor; LCD monitor; Cathode ray tube monitor; Printer; Dot matrix printer; Inkjet printer; Laser printer.

Give the first line a formatting style, for example Heading 1.
Convert the remaining lines to a multilevel list. To do this, select all the remaining lines and give the command Format-List. In the List dialog box, go to the Multilevel tab and select a list type there. The lines are now numbered in a single level, from 1. Processor to 26. Laser printer.

Select items 3 - 10 (RAM to DVD) and lower their level with the Increase Indent button. The list takes the following form:
1. Processor
2. Memory
2.1. RAM
2.2. Long-term memory
2.3. Hard magnetic disk
2.4. Floppy disk
2.5. Flash memory
2.6. Optical discs
2.7. CD
2.8. DVD
3. Input devices
4. Keyboard
5. Mouse
6. Scanner
7. Graphic tablet
8. Digital camera
9. Microphone
10. Joystick

Select items 2.3 - 2.8 and lower their level in the same way. They become 2.2.1. Hard magnetic disk, 2.2.2. Floppy disk, 2.2.3. Flash memory, 2.2.4. Optical discs, 2.2.5. CD and 2.2.6. DVD.
Select items 2.2.5 - 2.2.6 and lower their level. They become 2.2.4.1. CD and 2.2.4.2. DVD.
Repeat the same operations for the other items in the list. The finished list:
Devices of a modern computer
1. Processor
2. Memory
2.1. RAM
2.2. Long-term memory
2.2.1. Hard magnetic disk
2.2.2. Floppy disk
2.2.3. Flash memory
2.2.4. Optical discs
2.2.4.1. CD
2.2.4.2. DVD
3. Input devices
3.1. Keyboard
3.2. Mouse
3.3. Scanner
3.4. Graphic tablet
3.5. Digital camera
3.6. Microphone
3.7. Joystick
4. Output devices
4.1. Monitor
4.1.1. LCD monitor
4.1.2. Cathode ray tube monitor
4.2. Printer
4.2.1. Dot matrix printer
4.2.2. Inkjet printer
4.2.3. Laser printer

In the Russian-speaking sector of the Internet there are very few practical tutorials (and even fewer with example code) on the analysis of text messages in Russian. I therefore decided to pull the material together and work through an example of clustering, since clustering does not require labeled training data.
    import numpy as np
    import pandas as pd
    import nltk
    import re
    import os
    import codecs
    from sklearn import feature_extraction
    import mpld3
    import matplotlib.pyplot as plt
    import matplotlib as mpl

    import vk
    # pass the access token of the session
    session = vk.Session(access_token="")
    # URL to get access_token; replace tvoi_id with the id of the VK application you created:
    # https://oauth.vk.com/authorize?client_id=tvoi_id&scope=friends,pages,groups,offline&redirect_uri=https://oauth.vk.com/blank.html&display=page&v=5.21&response_type=token
    api = vk.API(session)
    poss = []
    id_pab = -59229916  # ids of public pages start with a minus, user ids have no minus
    info = api.wall.get(owner_id=id_pab, offset=0, count=1)
    kolvo = (info[0] // 100) + 1  # assumed: the first element of the response is the post count
    shag = 100
    sdvig = 0
    h = 0
    import time
    # while h ...  (the body of the download loop is missing here)

We read the data into an array and proceed to normalization, that is, reducing each word to its base form. This can be done in several ways: with the Porter stemmer, with MyStem, or with PyMorphy2. A word of warning: MyStem works through a wrapper, so it is very slow. We will settle on the Porter stemmer, although nothing prevents you from using the others or combining them (for example, running PyMorphy2 first and the Porter stemmer afterwards).

    titles = open("material4.csv", "r", encoding="utf-8", errors="ignore").read().split("\n")
    print(str(len(titles)) + " requests read")

    from nltk.stem.snowball import SnowballStemmer
    stemmer = SnowballStemmer("russian")

    def token_and_stem(text):
        # assumed tokenization (the original expression is not preserved): NLTK sentences, then words
        tokens = [word for sent in nltk.sent_tokenize(text) for word in nltk.word_tokenize(sent)]
        filtered_tokens = []
        for token in tokens:
            # keep only tokens that contain Cyrillic letters
            if re.search("[а-яА-Я]", token):
                filtered_tokens.append(token)
        stems = [stemmer.stem(t) for t in filtered_tokens]
        return stems

    def token_only(text):
        # same assumed tokenization as in token_and_stem, without stemming
        tokens = [word for sent in nltk.sent_tokenize(text) for word in nltk.word_tokenize(sent)]
        filtered_tokens = []
        for token in tokens:
            if re.search("[а-яА-Я]", token):
                filtered_tokens.append(token)
        return filtered_tokens

    # Create dictionaries (arrays) from the resulting stems
    # ... allwords_tokenized)  (the preceding statements are missing here)

Pymorphy2

    import pymorphy2
    morph = pymorphy2.MorphAnalyzer()
    G = []
    for i in titles:
        h = i.split(" ")
        s = ""
        for k in h:
            # parse() returns a list of analyses; take the first one
            p = morph.parse(k)[0].normal_form
            s += " "
            s += p
        G.append(s)
    pymof = open("pymof_pod.txt", "w", encoding="utf-8", errors="ignore")
    pymofcsv = open("pymofcsv_pod.csv", "w", encoding="utf-8", errors="ignore")
    for item in G:
        pymof.write("%s\n" % item)
        pymofcsv.write("%s\n" % item)
    pymof.close()
    pymofcsv.close()

pymystem3
The analyzer executables for the current operating system are downloaded and installed automatically the first time the library is used.

    from pymystem3 import Mystem
    m = Mystem()
    A = []
    for i in titles:
        lemmas = m.lemmatize(i)
        A.append(lemmas)
    # this array can be saved to a file ("pickled")
    import pickle
    with open("mystem.pkl", "wb") as handle:
        pickle.dump(A, handle)

Create a TF-IDF weight matrix. We will treat each search query as a document (the same is done when analyzing posts on Twitter, where each tweet is a document). We take TfidfVectorizer from the sklearn package and the stop words from the nltk package (they first have to be downloaded via nltk.download()).
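To make the weighting concrete, here is a minimal sketch of TF-IDF computed by hand on three toy queries. It uses only the standard library and the textbook formula tf * log(N / df); scikit-learn's TfidfVectorizer applies smoothing and normalization on top of this, so its numbers will differ. The toy queries and the function name tfidf are illustrative assumptions, not part of the original code.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document (textbook tf * log(N / df))."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # document frequency: in how many documents each term occurs
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# hypothetical toy queries, each treated as a separate document
queries = ["buy red shoes", "buy blue shoes", "repair red car"]
for query, w in zip(queries, tfidf(queries)):
    print(query, {t: round(v, 3) for t, v in w.items()})
```

A term that occurs in only one query ("blue") gets a higher weight than a term shared by several queries ("buy"), which is what lets the clustering step tell the queries apart.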
The parameters of the vectorizer can be adjusted as you see fit, from the upper and lower document-frequency bounds to the maximum n-gram length (here, 3).

    stopwords = nltk.corpus.stopwords.words("russian")
    # you can expand the list of stop words (for Russian text these should be Russian word forms)
    stopwords.extend(["what", "this", "so", "here", "be", "how", "c", "k", "on"])

    from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
    n_featur = 200000
    tfidf_vectorizer = TfidfVectorizer(max_df=0.8, max_features=10000, min_df=0.01,
                                       stop_words=stopwords, use_idf=True,
                                       tokenizer=token_and_stem, ngram_range=(1, 3))
    # in a notebook, prefix the next line with %time to measure the run time
    tfidf_matrix = tfidf_vectorizer.fit_transform(titles)
    print(tfidf_matrix.shape)

    num_clusters = 5

    # K-means
    from sklearn.cluster import KMeans
    km = KMeans(n_clusters=num_clusters)
    idx = km.fit(tfidf_matrix)
    clusters = km.labels_.tolist()
    print(clusters)
    print(km.labels_)

    # MiniBatchKMeans
    from sklearn.cluster import MiniBatchKMeans
    mbk = MiniBatchKMeans(init="random", n_clusters=num_clusters)  # init: "k-means++", "random" or an ndarray
    mbk.fit_transform(tfidf_matrix)
    mbk.fit(tfidf_matrix)
    miniclusters = mbk.labels_.tolist()
    print(mbk.labels_)

    # DBSCAN
    from sklearn.cluster import DBSCAN
    db = DBSCAN(eps=0.3, min_samples=10).fit(tfidf_matrix)
    labels = db.labels_
    labels.shape
    print(labels)

    # Agglomerative clustering
    from sklearn.cluster import AgglomerativeClustering
    # n_clusters value assumed (it is not preserved here); newer scikit-learn releases name the affinity parameter "metric"
    agglo1 = AgglomerativeClustering(n_clusters=num_clusters, affinity="euclidean")
    # affinity can be any of cosine, l1, l2, manhattan; try them in turn
    answer = agglo1.fit_predict(tfidf_matrix.toarray())
    answer.shape
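As a rough illustration of what the K-means step does internally, the sketch below implements plain Lloyd's algorithm on a handful of 2-D points using only the standard library. It is not scikit-learn's implementation: the function name kmeans, the toy points, and the naive "first k points" initialization are assumptions made for the example (scikit-learn defaults to k-means++ initialization).

```python
def kmeans(points, k, iterations=10):
    """Plain Lloyd's algorithm: returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # assignment step: nearest centroid by squared Euclidean distance
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# hypothetical toy data: two visually separate groups
points = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (0.1, 0.2)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 1, 1, 0]
```

With TF-IDF data the points are the rows of the weight matrix instead of 2-D coordinates, but the two alternating steps are the same.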
    # k-means
    clusterkm = km.labels_.tolist()
    # minikmeans
    clustermbk = mbk.labels_.tolist()
    # dbscan
    clusters3 = labels
    # agglo
    # clusters4 = answer.tolist()

    # the index arguments below are assumed; their values are not preserved here
    frame = pd.DataFrame(titles, index=[clusterkm])
    # k-means
    out = {"title": titles, "cluster": clusterkm}
    frame1 = pd.DataFrame(out, index=[clusterkm], columns=["title", "cluster"])
    # mini
    out = {"title": titles, "cluster": clustermbk}
    frame_minik = pd.DataFrame(out, index=[clustermbk], columns=["title", "cluster"])
    frame1["cluster"].value_counts()
    frame_minik["cluster"].value_counts()

First we need to calculate the distance between the vectors, for which we use the cosine distance. Articles on the subject suggest subtracting the cosine similarity from one, so that the result has no negative values and lies in the range from 0 to 1, and we will do the same:

    from sklearn.metrics.pairwise import cosine_similarity
    dist = 1 - cosine_similarity(tfidf_matrix)
    dist.shape

The Incremental PCA algorithm is used as a replacement for principal component analysis (PCA) when the data set to be decomposed is too large to fit in RAM. IPCA builds a low-rank approximation of the input using an amount of memory that does not depend on the number of input samples.

    # Principal component analysis (PCA)
    from sklearn.decomposition import IncrementalPCA
    icpa = IncrementalPCA(n_components=2, batch_size=16)
    icpa.fit(dist)
    demo2 = icpa.transform(dist)
    xs, ys = demo2[:, 0], demo2[:, 1]

    # PCA 3D
    icpa = IncrementalPCA(n_components=3, batch_size=16)
    icpa.fit(dist)
    ddd = icpa.transform(dist)
    xs, ys, zs = ddd[:, 0], ddd[:, 1], ddd[:, 2]

    # You can look at the result right away
    # from mpl_toolkits.mplot3d import Axes3D
    # fig = plt.figure()
    # ax = fig.add_subplot(111, projection="3d")
    # ax.scatter(xs, ys, zs)
    # ax.set_xlabel("X")
    # ax.set_ylabel("Y")
    # ax.set_zlabel("Z")
    # plt.show()

    from matplotlib import rc
    # enable Russian characters on the chart
    font = {"family": "Verdana"}  # , "weight": "normal"}
    rc("font", **font)

    # colors can be generated for the clusters
    import random
    def generate_colors(n):
        color_list = []
        for c in range(0, n):
            r = lambda: random.randint(0, 255)
            color_list.append("#%02X%02X%02X" % (r(), r(), r()))
        return color_list

    # set the colors
    cluster_colors = {0: "#ff0000", 1: "#ff0066", 2: "#ff0099", 3: "#ff00cc", 4: "#ff00ff"}
    # give names to the clusters; because of the randomness let them be just 0-4
    cluster_names = {0: "0", 1: "1", 2: "2", 3: "3", 4: "4"}
    # %matplotlib inline

    # create a data frame with the coordinates (from the PCA), the cluster numbers and the queries themselves
    df = pd.DataFrame(dict(x=xs, y=ys, label=clusterkm, title=titles))
    # group by cluster
    groups = df.groupby("label")

    fig, ax = plt.subplots(figsize=(72, 36))  # choose figsize to taste
    for name, group in groups:
        ax.plot(group.x, group.y, marker="o", linestyle="", ms=12,
                label=cluster_names[name], color=cluster_colors[name], mec="none")
        ax.set_aspect("auto")
        ax.tick_params(axis="x", which="both", bottom=False, top=False, labelbottom=False)
        ax.tick_params(axis="y", which="both", left=False, top=False, labelleft=False)
    ax.legend(numpoints=1)  # show only one point per legend entry
    # add labels with the search query at each x, y position
    # for i in range(len(df)):
    #     ax.text(df.loc[i, "x"], df.loc[i, "y"], df.loc[i, "title"], size=6)
    # show the graph
    plt.show()
    plt.close()

[Figure: example with 10 clusters]

Not quite what I expected. We will use mpld3 to turn the drawing into an interactive graph.

    # Plot
    fig, ax = plt.subplots(figsize=(25, 27))
    ax.margins(0.03)
    # groups_mbk: the grouping built from the MiniBatchKMeans labels (its definition is not shown here)
    for name, group in groups_mbk:
        points = ax.plot(group.x, group.y, marker="o", linestyle="", ms=12,  # ms=18
                         label=cluster_names[name], mec="none", color=cluster_colors[name])
        ax.set_aspect("auto")
        labels = [i for i in group.title]  # assumed: tooltip text is the query itself
        tooltip = mpld3.plugins.PointHTMLTooltip(points[0], labels, voffset=10, hoffset=10)  # , css=css)
        mpld3.plugins.connect(fig, tooltip)  # , TopToolbar()
        ax.axes.get_xaxis().set_ticks([])
        ax.axes.get_yaxis().set_ticks([])
        # ax.axes.get_xaxis().set_visible(False)
        # ax.axes.get_yaxis().set_visible(False)
    ax.set_title("Mini K-Means", size=20)  # groups_mbk
    ax.legend(numpoints=1)
    mpld3.disable_notebook()
    # mpld3.display()
    mpld3.save_html(fig, "mbk.html")
    mpld3.show()
    # mpld3.save_json(fig, "vivod.json")
    # mpld3.fig_to_html(fig)

    N = 100  # assumed value; N is not defined in the original
    fig, ax = plt.subplots(figsize=(51, 25))
    scatter = ax.scatter(np.random.normal(size=N), np.random.normal(size=N),
                         c=np.random.random(size=N), s=1000 * np.random.random(size=N),
                         alpha=0.3, cmap=plt.cm.jet)
    ax.grid(color="white", linestyle="solid")
    ax.set_title("Clusters", size=20)
    labels = ["point {0}".format(i + 1) for i in range(N)]
    tooltip = mpld3.plugins.PointLabelTooltip(scatter, labels=labels)
    mpld3.plugins.connect(fig, tooltip)
    mpld3.show()

    fig, ax = plt.subplots(figsize=(72, 36))
    for name, group in groups:
        points = ax.plot(group.x, group.y, marker="o", linestyle="", ms=18,
                         label=cluster_names[name], mec="none", color=cluster_colors[name])
        ax.set_aspect("auto")
        labels = [i for i in group.title]  # assumed, as above
        tooltip = mpld3.plugins.PointLabelTooltip(points[0], labels=labels)
        mpld3.plugins.connect(fig, tooltip)
    ax.set_title("K-means", size=20)
    mpld3.display()

If you want a 3D view with zoom, there is the Plotly service, which has a plugin for Python.
Plotly 3D

    # for example, just a 3D graph of the obtained values
    import plotly
    plotly.__version__
    import plotly.plotly as py  # in plotly 4 and later this module lives in the separate chart_studio package
    import plotly.graph_objs as go
    trace1 = go.Scatter3d(x=xs, y=ys, z=zs, mode="markers",
                          marker=dict(size=12,
                                      line=dict(color="rgba(217, 217, 217, 0.14)", width=0.5),
                                      opacity=0.8))
    data = [trace1]
    layout = go.Layout(margin=dict(l=0, r=0, b=0, t=0))
    fig = go.Figure(data=data, layout=layout)
    py.iplot(fig, filename="cluster-3d-plot")

Results can be seen here: Example

The final step is to perform hierarchical (agglomerative) clustering with Ward's method in order to build a dendrogram.

    from scipy.cluster.hierarchy import ward, dendrogram
    linkage_matrix = ward(dist)
    fig, ax = plt.subplots(figsize=(15, 20))
    ax = dendrogram(linkage_matrix, orientation="right", labels=titles)
    plt.tick_params(axis="x", which="both", bottom=False, top=False, labelbottom=False)
    plt.tight_layout()
    # save the picture
    plt.savefig("ward_clusters2.png", dpi=200)

Unfortunately, natural language research still has many unresolved questions, and not all data can be grouped into well-defined clusters easily. But I hope that this guide will increase interest in the topic and provide a basis for further experiments.
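For readers who want to try the Ward step without the VK data, here is a small sketch on toy points. It assumes NumPy and SciPy are installed; the six 2-D points are made up for illustration, and fcluster is added only to show how the tree can be cut into flat clusters (the original code stops at drawing the dendrogram). Passing no_plot=True makes dendrogram compute the layout without drawing anything.

```python
import numpy as np
from scipy.cluster.hierarchy import ward, fcluster, dendrogram

# toy observations: two tight groups in 2-D (illustrative data)
points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                   [5.0, 5.0], [5.1, 5.2], [5.2, 4.9]])

linkage_matrix = ward(points)  # (n - 1) x 4 table of merges
flat = fcluster(linkage_matrix, t=2, criterion="maxclust")  # cut the tree into 2 clusters
tree = dendrogram(linkage_matrix, no_plot=True)  # layout only, nothing is drawn

print(linkage_matrix.shape)
print(flat)
print(tree["ivl"])  # leaf labels in dendrogram order
```

Each row of the linkage matrix records one merge: the two clusters joined, the Ward distance at which they were joined, and the size of the new cluster.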