import streamlit as st

# Custom CSS for better styling
st.markdown("""
""", unsafe_allow_html=True)

# Title
st.markdown('<div class="main-title">Grammar Analysis & Dependency Parsing</div>', unsafe_allow_html=True)

# Introduction Section
st.markdown("""
<div class="section">

    <p>Understanding the grammatical structure of sentences is crucial in Natural Language Processing (NLP) for various applications such as translation, text summarization, and information extraction. This page focuses on Grammar Analysis and Dependency Parsing, which help in identifying the grammatical roles of words in a sentence and the relationships between them.</p>

    <p>We use Spark NLP, a robust library for NLP tasks, to perform Part-of-Speech (POS) tagging and Dependency Parsing, enabling us to analyze sentences at scale with high accuracy.</p>

""", unsafe_allow_html=True) # Understanding Dependency Parsing st.markdown('
Understanding Dependency Parsing
', unsafe_allow_html=True) st.markdown("""

    <p>Dependency Parsing is a technique for understanding the grammatical structure of a sentence by identifying the dependencies between its words. It maps out relationships such as subject-verb and adjective-noun, which are essential for understanding the sentence's meaning.</p>

    <p>In Dependency Parsing, each word in a sentence is linked to the word it depends on (its head), with one word, typically the main verb, serving as the root. The result is a tree-like structure called a dependency tree, sketched below. This structure helps in various NLP tasks, including information retrieval, question answering, and machine translation.</p>

""", unsafe_allow_html=True) # Implementation Section st.markdown('
# Implementation Section
st.markdown('<div class="sub-title">Implementing Grammar Analysis & Dependency Parsing</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">

    <p>The following example demonstrates how to implement a grammar analysis pipeline using Spark NLP. The pipeline includes stages for tokenization, POS tagging, and dependency parsing, extracting the grammatical relationships between the words in a sentence.</p>

""", unsafe_allow_html=True) st.code(''' import sparknlp from sparknlp.base import * from sparknlp.annotator import * from pyspark.ml import Pipeline import pyspark.sql.functions as F # Initialize Spark NLP spark = sparknlp.start() # Stage 1: Document Assembler document_assembler = DocumentAssembler()\\ .setInputCol("text")\\ .setOutputCol("document") # Stage 2: Tokenizer tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token") # Stage 3: POS Tagger postagger = PerceptronModel.pretrained("pos_anc", "en")\\ .setInputCols(["document", "token"])\\ .setOutputCol("pos") # Stage 4: Dependency Parsing dependency_parser = DependencyParserModel.pretrained("dependency_conllu")\\ .setInputCols(["document", "pos", "token"])\\ .setOutputCol("dependency") # Stage 5: Typed Dependency Parsing typed_dependency_parser = TypedDependencyParserModel.pretrained("dependency_typed_conllu")\\ .setInputCols(["token", "pos", "dependency"])\\ .setOutputCol("dependency_type") # Define the pipeline pipeline = Pipeline(stages=[ document_assembler, tokenizer, postagger, dependency_parser, typed_dependency_parser ]) # Example sentence example = spark.createDataFrame([ ["Unions representing workers at Turner Newall say they are 'disappointed' after talks with stricken parent firm Federal Mogul."] ]).toDF("text") # Apply the pipeline result = pipeline.fit(spark.createDataFrame([[""]]).toDF("text")).transform(example) # Display the results result.select( F.explode( F.arrays_zip( result.token.result, result.pos.result, result.dependency.result, result.dependency_type.result ) ).alias("cols") ).select( F.expr("cols['0']").alias("token"), F.expr("cols['1']").alias("pos"), F.expr("cols['2']").alias("dependency"), F.expr("cols['3']").alias("dependency_type") ).show(truncate=False) ''', language='python') # Example Output st.text(""" +------------+---+------------+---------------+ |token |pos|dependency |dependency_type| +------------+---+------------+---------------+ |Unions |NNP|ROOT |root | |representing|VBG|workers |amod | |workers |NNS|Unions |flat | |at |IN |Turner |case | |Turner |NNP|workers |flat | |Newall |NNP|say |nsubj | |say |VBP|Unions |parataxis | |they |PRP|disappointed|nsubj | |are |VBP|disappointed|nsubj | |' |POS|disappointed|case | |disappointed|JJ |say |nsubj | |' |POS|disappointed|case | |after |IN |talks |case | |talks |NNS|disappointed|nsubj | |with |IN |stricken |det | |stricken |NN |talks |amod | |parent |NN |Mogul |flat | |firm |NN |Mogul |flat | |Federal |NNP|Mogul |flat | |Mogul |NNP|stricken |flat | +------------+---+------------+---------------+ """) # Visualizing the Dependencies st.markdown('
Visualizing the Dependencies
', unsafe_allow_html=True) st.markdown("""

    <p>For a visual representation of the dependencies, you can use the <code>spark-nlp-display</code> module, an open-source tool that makes visualizing dependencies straightforward and easy to integrate into your workflow.</p>

    <p>First, install the module with pip:</p>
    <pre><code>pip install spark-nlp-display</code></pre>
    <p>Then use the <code>DependencyParserVisualizer</code> class to render the dependency tree:</p>

""", unsafe_allow_html=True) st.code(''' from sparknlp_display import DependencyParserVisualizer # Initialize the visualizer dependency_vis = DependencyParserVisualizer() # Display the dependency tree dependency_vis.display( result.collect()[0], # single example result pos_col="pos", dependency_col="dependency", dependency_type_col="dependency_type", ) ''', language='python') st.image('images/DependencyParserVisualizer.png', caption='The visualization of dependencies') st.markdown("""

    <p>This code snippet generates a visual dependency tree like the one shown above, clearly laying out the grammatical relationships between the words of the sentence. The <code>spark-nlp-display</code> module provides an intuitive way to visualize complex dependency structures, aiding in the analysis and understanding of sentence grammar.</p>

""", unsafe_allow_html=True) # Model Info Section st.markdown('
Choosing the Right Model for Dependency Parsing
', unsafe_allow_html=True) st.markdown("""

    <p>For dependency parsing, this page uses two models: <code>dependency_conllu</code>, which predicts the head of each token (the unlabeled dependencies), and <code>dependency_typed_conllu</code>, which labels each dependency with its grammatical relation. Both are trained on a large annotated corpus and are effective for extracting grammatical relations between words in English sentences.</p>

    <p>To explore more models tailored for different NLP tasks, visit the <a href="https://sparknlp.org/models">Spark NLP Models Hub</a>. The sketch below shows how an alternative model slots into the pipeline.</p>

""", unsafe_allow_html=True) # References Section st.markdown('
# References Section
st.markdown('<div class="sub-title">References</div>', unsafe_allow_html=True)
st.markdown("""
""", unsafe_allow_html=True)

# Community & Support Section
st.markdown('<div class="sub-title">Community & Support</div>', unsafe_allow_html=True)
st.markdown("""
""", unsafe_allow_html=True)

# Quick Links Section
st.markdown('<div class="sub-title">Quick Links</div>', unsafe_allow_html=True)
st.markdown("""
""", unsafe_allow_html=True)