anas-awadalla committed
Commit · f947031
1 Parent(s): 63b0be5
demo
This view is limited to 50 files because it contains too many changes.
- app.py +3 -3
- interleaved/0268b91c-d8cb-444e-bf70-26fc54262247.json +0 -0
- interleaved/05550f0f-3e27-4481-b3cf-676a221a8694.json +0 -0
- interleaved/057ccf20-02a3-4266-a79a-8a9779c45720.json +0 -1
- interleaved/183cde06-53bd-4407-b24a-d295c076a7ac.json +0 -1
- interleaved/1f3871b0-6748-4f39-bb0b-2ede13792e64.json +0 -1
- interleaved/1fc58fee-418f-4a85-a2a3-03d04325f176.json +0 -1
- interleaved/250a5b7d-e025-4204-a47e-6b21cd29cf0b.json +0 -1
- interleaved/2d328561-a846-46aa-a53d-6d844ae1f60a.json +0 -1
- interleaved/2f8c1d56-6be3-415a-9358-b481e8a65f39.json +0 -1
- interleaved/30739d01-e7eb-425f-a4af-b10691602f27.json +0 -1
- interleaved/31b0ec11-cd68-49bf-9782-a873465d616e.json +0 -1
- interleaved/31d06618-214b-4e98-a0b2-fa0db92a38fb.json +0 -1
- interleaved/31f4cbea-1545-4d39-9cbb-96dfdf153c54.json +0 -1
- interleaved/32af7d79-4044-46c0-8c74-8d556f6ef048.json +0 -1
- interleaved/3517ec99-6b25-47bd-a632-1c2460eb2dd2.json +0 -1
- interleaved/353218de-9e77-47e5-bc6e-a92ef77e1d8a.json +0 -1
- interleaved/35f92248-c552-4a88-8d96-5177e36fa323.json +0 -0
- interleaved/4e500ebb-cd91-423c-a953-8937940777d7.json +0 -1
- interleaved/4e6a6eb2-7bdd-4b79-bf64-540c0b5baf04.json +0 -1
- interleaved/4edabd45-dc9b-4fbf-b0db-3960b3e55dd1.json +0 -1
- interleaved/51533f23-be07-46e7-aba8-0a8bb9876a31.json +0 -1
- interleaved/53cdb676-2c21-40c3-94c7-751c004486ed.json +0 -1
- interleaved/5633735f-7e43-4d1f-884b-6b5b094bbdeb.json +0 -0
- interleaved/6435bf60-e53f-4d7e-a4a2-e46caaab519f.json +0 -1
- interleaved/6617c785-e5e7-4d8b-bb2d-5b06a20c4af3.json +0 -1
- interleaved/69f52948-906f-4d25-91be-ed91018fe5a8.json +0 -1
- interleaved/71fce5d5-32d0-4fe2-ac14-04e5fd0e5c5d.json +0 -1
- interleaved/7574a956-538e-45fc-b320-95feb03ad24a.json +0 -1
- interleaved/791770e7-09e9-4143-ad37-1ea7b380b96a.json +0 -1
- interleaved/7abd84cd-4c9a-4e7a-9e49-d40137c8f19d.json +0 -1
- interleaved/7af41fa6-8a38-4cbc-94f5-9be61687a40f.json +0 -1
- interleaved/7c1a651f-7521-4d43-ac2d-89e7d6358012.json +0 -0
- interleaved/86d16242-9603-42dd-bc6d-bdf9bdf9de04.json +0 -0
- interleaved/8f9ee91b-a578-419a-8870-763dab8950b5.json +0 -0
- interleaved/960eac75-a38c-45ba-93e3-a2e5c089d991.json +0 -1
- interleaved/9c83aa6b-5fdf-4aa7-94d5-986f6fe45433.json +0 -0
- interleaved/9ccc1adf-0e6a-49c6-9ebd-1206072217c3.json +0 -1
- interleaved/9fe3fdc9-6c09-4a04-8a84-eb04c9d865e8.json +0 -0
- interleaved/a5f98735-69c2-4045-ae2a-cf0b7d14b293.json +0 -1
- interleaved/a8d84fbd-67c2-4757-946c-4e588a705707.json +0 -1
- interleaved/b5daf6d0-df91-4fac-9835-4deefa717737.json +0 -1
- interleaved/b9950160-41cc-4149-9024-2a76cc042ff4.json +0 -0
- interleaved/ba26c516-c466-4b3d-9a9f-7394934d40e5.json +0 -1
- interleaved/c02f9657-d44c-438e-98b2-8ec7fda3b4b6.json +0 -0
- interleaved/c5a529f5-8540-4ff6-8e70-ccf5b47cf642.json +0 -1
- interleaved/c765e44b-e539-413e-a890-08f557e2ee5d.json +0 -1
- interleaved/dcd0961b-ade9-42f3-a3e1-36d8ad815e8d.json +0 -0
- interleaved/df14c47a-2a4c-4414-af6d-e2fef9e76757.json +0 -1
- interleaved/e0216c74-8b09-4cfc-bf19-7b8bdb07cb21.json +0 -1
app.py
CHANGED
@@ -26,7 +26,7 @@ def display_image_caption_pairs(json_data):
         st.markdown(f"**Caption:** {caption}")
 
 def display_interleaved_text_and_images(text, images_bytes):
-    pattern = r'<img
+    pattern = r'<img[^>]*>'
     segments = re.split(pattern, text)
     for i, segment in enumerate(segments):
         st.markdown(segment)
@@ -77,8 +77,8 @@ def main():
         display_image_caption_pairs(json_data)
     else:
         images_bytes = json_data['images']
-        # if there are no images remove the json file
-        # if len(images_bytes) ==
+        # # if there are no images remove the json file
+        # if len(images_bytes) <= 1 or (len(images_bytes[0]) == 1 and len(images_bytes)==1) or json_data['txt'] == "":
         # print(f"Removing {selected_file}")
         # os.remove(selected_file)
         # st.session_state.file_index -= 1
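The app.py change makes the pattern match a complete `<img ...>` tag, so `re.split` cuts the text at each tag and discards the tag itself. The sketch below illustrates that behaviour, along with a non-destructive variant of the commented-out emptiness check. The helper names `split_on_img_tags` and `is_empty_sample` are hypothetical and do not appear in app.py. The record layout (`txt`, `images`) is taken from the deleted JSON files in this commit.

```python
import re

# Same pattern as the updated line in app.py: "<img", any non-">" chars, then ">".
IMG_TAG = re.compile(r'<img[^>]*>')


def split_on_img_tags(text):
    """Split text at every <img ...> tag; the tags themselves are dropped."""
    return IMG_TAG.split(text)


def is_empty_sample(json_data):
    """Hypothetical helper: report (rather than delete) samples the
    commented-out check in app.py would have removed.

    The middle clause of the original condition,
    (len(images_bytes[0]) == 1 and len(images_bytes) == 1), is already
    covered by len(images_bytes) <= 1, so it is omitted here.
    """
    images = json_data.get("images", [])
    return len(images) <= 1 or json_data.get("txt", "") == ""


if __name__ == "__main__":
    print(split_on_img_tags('intro <img src="a.png"> middle <img> end'))
    print(is_empty_sample({"txt": "", "images": []}))
```

Returning a boolean instead of calling `os.remove` keeps the check safe to run while browsing. Whether the files should actually be deleted is left to the caller.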
interleaved/0268b91c-d8cb-444e-bf70-26fc54262247.json
DELETED
The diff for this file is too large to render.
interleaved/05550f0f-3e27-4481-b3cf-676a221a8694.json
DELETED
The diff for this file is too large to render.
interleaved/057ccf20-02a3-4266-a79a-8a9779c45720.json
DELETED
@@ -1 +0,0 @@
-{"txt": "", "images": []}
interleaved/183cde06-53bd-4407-b24a-d295c076a7ac.json
DELETED
@@ -1 +0,0 @@
-{"txt": "", "images": []}
interleaved/1f3871b0-6748-4f39-bb0b-2ede13792e64.json
DELETED
@@ -1 +0,0 @@
-{"txt": "", "images": []}
interleaved/1fc58fee-418f-4a85-a2a3-03d04325f176.json
DELETED
@@ -1 +0,0 @@
-
{"txt": "\\label{sec:concept-architecture}\n\nIn this section, we first introduce the general concept of our UG and the associated design challenges in the context of aerial manipulation.\nWe then take an in-depth look at the electro-mechanical, pneumatic, and software components.\n\n\n\\subsection{Concept}\nMultirotor platforms come with many benefits but also with a set of limitations. The most relevant ones are their limited payload capability, the constrained volume for attachments, their underactuated nature, and the challenging dynamics coupling. \nThe dynamics coupling is particularly important for aerial systems carrying manipulators \\cite{Kremer2022}, \nbut it also poses a problem for simpler 'claw' setups where only the grasping element gets in contact with the environment.\nElastic elements inserted in the construction of the grasping device efficiently reduce the dynamic coupling by softening the hard socks associated with typical grasping operations. \nThose elastic elements are inherently present in UGs as represented by their soft membrane.\nUGs are thus an ideal fit, provided they can be constructed to fit the size, weight and power envelope of aerial platforms.\n\nOur proof-of-concept aerial platform is a medium-sized (wheelbase of \\SI{430}{\\milli\\meter}), modified \\textit{AscTec Firefly} hexacopter with a maximum payload capacity of \\SI{1}{\\kilo\\gram}. 
\nThis airframe conveniently features a cargo bay measuring $\\SI{120}{\\milli\\meter} \\times \\SI{120}{\\milli\\meter}$, which is used as the anchor point for TRIGGER.\nThis particular mounting scheme with the gripper oriented toward the bottom is commonly called a 'claw'.\n\nCompatibility with state-of-the-art autopilots (e.g., Pixhawk) is assured by either directly connecting the gripper to the autopilot via UART or by connecting it to the corresponding companion computer using USB.\nFor ease of integration, our concept envisions being directly powered by the UAV's main 3S-4S battery, which eliminates carrying an additional battery.\nLightweight construction, modularity, and tight integration of the electronics, the sensors, and the software are the driving concepts of TRIGGER.\n\nTo make our work easily reproducible, accessible, and low-cost (below $\\$100$, without the manufacturing equipment), we limited our design to widely available and inexpensive manufacturing techniques, where a Fused Deposition Modeling (FDM) printer and a high-power single-stage vacuum pump represent the bulk of the cost. Furthermore, we designed our grasper around customary off-the-shelf parts.\n\nThe complete grasping system is detailed in \\cref{fig:gripper-asm}. \nIts major subsystems are explained hereafter.\n\n\n\n\\subsection{Pneumatics and Mechanics}\nThe role of the pneumatic system is twofold: \\begin{enumerate*}\n \\item to pressurize the membrane and thus allow the contained granular material to flow easily within the free, air-filled volume;\n \\item to vaccumize the membrane and consequently jam (i.e., solidify) the granular material\n\\end{enumerate*}.\n\n\nUGs can be realized in two distinct topologies, i.e., either as closed-loop or open-loop systems. \nIn closed-loop systems, the fluid surrounding the granular material stays contained within the system. 
An example of such a system is the magnetorheological fluid-based UG shown in \\cite{Nishida2016} for the hydraulic UG presented in \\cite{Sakuma2018}.\nGenerally, these systems have the main disadvantage that the fluid has to stay contained within the system (e.g., in tanks that add weight and cost) and that leakage must be considered as a critical failure mode.\nOn the other hand, open-loop systems exchange their fluid with their environment. \nThe operating fluid in that case is thus typically air, resp. water for underwater applications \\cite{Licht2016}.\nThose systems have the salient advantage that their fluid is abundantly present in their surroundings, which eliminates the storage needs and reduces the severity of leakage, e.g., due to membrane rupture.\nOpen-loop systems are generally better suited for lightweight construction and require less engineering effort.\n\nTherefore, the pneumatic system presented in this paper (\\cref{fig:pneumatic-arrangement}) has an open-loop structure and uses air as its operating fluid.\nIt consists of two small, non-reversible diaphragm pumps (P1, P2) coupled to two pneumatic solenoid 2/1-way valves (V1, V2). \nThe air pressure in the system is measured by the Microelectromechanical System (MEMS) pressure sensor $P$.\nThis particular setup is very low cost and has a favorable mass distribution due to symmetry.\nBy design, diaphragm pumps act as one-way check-valves, not restricting the airflow in their nominal direction, which therefore requires closing the valve associated with the antagonistic pump such that they can establish a pressure differential.\nThis particular topology also permits to seal off the system.\nThe membrane can thus remain pressurized (resp. in a state of vacuum) without powering the pumps, which saves energy.\n\nWe utilize two 12V, \\SI{7}{\\watt}, \\texttt{SC3704PM} diaphragm pumps rated for a pressure differential of \\SI{46}{\\kilo\\pascal} at \\SI{2}{\\liter\\per\\minute}. 
The miniature 2/1 air valves are of type \\texttt{SCO520FVG}.\nOur low-power pneumatic system typically consumes less than \\SI{10}{\\watt}, contrary to other systems frequently featured in the literature, which are using heavy (more than \\SI{1}{\\kilo\\gram}) stationary, high-power vacuum pumps in the \\SI{500}{\\watt} range and reaching pressure differentials beyond \\SI{80}{\\kilo\\pascal} \\cite{Brown2010}, \\cite{Santarossa2021}.\nNote that our lower-power system naturally comes with longer cycle times and a lower maximum pressure differential (we measured approximately \\SI{28}{\\kilo\\pascal}), $3\\times$ lower than conventional solutions.\nHowever, we will show in \\cref{sec:experiments} that this does not adversely affect the performance of the UG.\n\n\n\n\nConcerning the mechanical structure, our modular design approach is shown in \\cref{fig:gripper-asm,fig:gripper-balloon-module}. It consists of three larger sub-assemblies, namely \\begin{enumerate*}\n \\item the base, containing the pumps, valves, and controller board,\n \\item the gripper-floor, forming the interface between the pneumatic system and the detachable membrane module,\n \\item the membrane module, which firmly holds onto the filled, custom silicone membrane. 
\n It contains a paper filter that seals off the filler material from the environment while permitting air to circulate freely.\n A mechanical support structure prevents it from tearing under load.\n The membrane module is firmly pressed against the cast-in-place silicone seal on the gripper-floor by screwing the wedge onto the external printed thread to create an air-tight seal.\n\\end{enumerate*}\n\n\n\n\nThis modular concept has three main advantages: \\textit{first}, it enables quick iteration on membrane module designs; \\textit{second}, it allows to quickly and effortlessly swap between different membrane modules during the tests; \\textit{third}, it enables the platform to be compatible with different types of grippers, given that there are some geometries that cannot be picked up by a UG (e.g., large flat surfaces), which require highly specialized grippers such as vacuum cups.\n\nOur UG is designed to be mounted like a 'claw' on a multirotor; therefore, it does double duty, i.e., it operates as a gripper but also serves as the landing gear. \nAs such, it is dimensioned to withstand the total weight and impact of a landing UAV, which comes with several advantages that we discuss in \\cref{sec:discussion}.\n\n\n\n\n\\subsection{Material Selection}\nOur membrane is made from the soft silicone rubber \\textit{Trollfactory Type 23}, shore hardness 10 A with \\SI{600}{\\percent} elongation at break.\nThe reasons for selecting such a soft rubber are twofold: \\textit{first}, it allows us to widen the tolerances on the membrane's thickness as small deviations no longer have a significant impact on the overall stiffness;\n\\textit{second}, it maximizes the contact area between the membrane and the payload and, therefore, the quality of the grasp is increased. 
\nFurthermore, this particular silicone can be mixed with a silicon additive called \\textit{deadener} (also sometimes referred to as \\textit{slacker}), which gives the silicone more human skin-like physical properties.\nThis further increases the softness of the material and, more importantly, makes it sticky.\nThe intensity of those effects is controlled by the relative amount of additive added to the mixture.\nThis specific silicone rubber is very viscous (\\SI{14}{\\pascal\\second}) and thus does not flow easily.\nThis aspect has to be considered for the mold design and casting process to avoid trapping air inside the mold and thus creating voids in the thin membrane.\n\nFor the printed structural parts, PET-G was chosen over PLA for its higher impact resistance and lower density. \nFurthermore, PLA is prone to creep under sustained load. \nThe structural parts would not benefit from high-end polymers such as PA6-CF or PEEK as there are no special requirements concerning the stiffness or heat resistance that could motivate such a choice.\n\nAiming for a lightweight design, we choose EPS as filler material as it has a density of only \\SI{17}{\\gram\\per\\liter}, which is by an order of magnitude lower than other commonly used materials such as ground coffee or glass beads (\\cref{tab:filler-materials}).\nMoreover, the soft EPS particles develop higher holding forces than rigid particles due to the squeezing effect, which is a result of the elasticity of the EPS beads themselves \\cite{Santarossa2021}.\n\n\\begin{table}\n\\centering\n\\begin{tabular}{lll}\n\\toprule\nMaterial & Density (\\si{\\gram\\per\\liter}) & Particle Size (\\si{\\milli\\meter}) \\\\ \\midrule\nEPS & 17 & 1-4 \\\\\nCoffee & 308 & 0.2-2 \\\\\nPolymer & 940 & 0.1-0.2 \\\\\nGlass & 2500 & 0.2-0.4 \\\\ \\bottomrule\n\\end{tabular}\n\\caption{Comparison of filler materials. 
EPS has by far the lowest density.}\n\\label{tab:filler-materials}\n\\end{table}\n\n\n\\subsection{Fabrication}\nBased on our previous experience with silicone \\cite{ZhiliChenHamedRahimiNohooji2016}, we chose a silicone casting process to create the membrane. \nWe created a three-part mold (i.e., left and right shell, plus core) printed from PET-G using a common FDM 3D printer. \nOur approach is similar to \\cite{Sakuma2018}; however, due to the very thin \\SI{0.6}{\\milli\\meter} membrane and the high viscosity of the silicone rubber, the process had to be adapted. \nMore precisely, instead of pouring the silicone into the mold, we inject it directly through the core using a syringe (\\cref{fig:mold-casting}). \nThis technique enables very thin-walled castings (assuming proper alignment of the shells). \nBut, more importantly, it allows the silicone mixture to spread evenly with a fairly low risk of catching air bubbles in the process.\nThe usual precautions should be taken when working with silicone, such as properly degassing the silicone after mixing.\nOur membrane has a nominal diameter of \\SI{80}{\\milli\\meter}, a nominal thickness of \\SI{0.6}{\\milli\\meter}, a height of \\SI{60}{\\milli\\meter}, an encompassing volume of \\SI{0.2}{\\liter} and a total mass of only \\SI{18}{\\gram} (without filler).\n\n\n\nThe structural parts were also fabricated from PET-G using FDM printing. \nThe resulting parts have proven to be sufficiently airtight using optimal print settings. \nAt the mating point of two structural parts (i.e., \\MARKERCIRCLE{5} and \\MARKERCIRCLE{9} in \\cref{fig:gripper-balloon-module}), a silicone gasket is introduced that assures an air-tight connection between the two parts.\n\nOur filler material consists of a mixture of Expanded Polystyrene (EPS) beads of various sizes ranging from \\SI{1}{\\milli\\meter} to \\SI{4}{\\milli\\meter}. 
\nContrary to rigid filler materials such as glass beads, the softness of the particles gives birth to a squeezing effect which is reported to increase the holding force within certain limits \\cite{Santarossa2021}. \nAnother consideration for the choice of the filler material was the density or, more precisely, the resulting weight of the filled membrane. \nEPS beads have a very low density of approx. \\SI{17}{\\gram\\per\\liter} and thus do not add much mass to the system. \nOther materials such as ground coffee with a density of \\SI{308}{\\gram\\per\\liter} or glass beads with \\SI{2500}{\\gram\\per\\liter} result in significant extra weight.\nWe added \\SI{2.2}{\\gram} filler material (\\SI{0.13}{\\liter}) to the membrane corresponding to a fill ratio of \\SI{66}{\\percent}.\n\nThe total mass of the assembly (\\SI{380}{\\gram}) is distributed among the different components as shown in \\cref{fig:mass-distribution}. \nThe pneumatic system represents the bulk of the mass (\\SI{160}{\\gram}), followed by the structural plastic parts (\\SI{115}{\\gram}) and the fasteners (\\SI{35}{\\gram}), fittings and tubing (less than \\SI{8}{\\gram}).\nThe mass added by the filler material (\\SI{2.2}{\\gram}) is, however, negligible.\n\n\n\n\\subsection{Electronics and Firmware}\n\nThe system depicted in the block diagram in \\cref{fig:sensors-and-controller} is implemented on a single, completely custom $\\SI{47}{\\milli\\meter}\\times\\SI{47}{\\milli\\meter}$ controller board which is shown in \\cref{fig:gripper-asm}. \nIt is designed to work and integrate easily with common UAV hardware. 
\nAs such, it can be powered directly from the main power bus of the drone.\nFurthermore, it features USB and UART serial ports for communication with an autopilot or an off-board computer.\n\nAt the heart of the controller is an ultra-low power STM32L1 microcontroller that does the logic processing, collection/processing of the sensor data, the communication with the off-board peripherals, and the control of the quad-channel motor driver that powers the pneumatic hardware.\n\nDue to the low power requirements of the controller (less than \\SI{50}{\\milli\\ampere} at \\SI{12}{\\volt}) we have favored linear DC/DC regulators over switching converters for the $\\SI{5}{\\volt}$ and $\\SI{3.3}{\\volt}$ rails as the latter greatly increase the design complexity and cost. \nThe output stage (valves and pumps) is directly powered by the main power bus. \nCurrent chopping motor drivers ensure that each actuator operates at its nominal operating point regardless of the bus voltage.\n\n\nThe load cell and the onboard air pressure sensor provide the required data for the system to monitor itself and to work autonomously.\n\nThe processed sensor readings are exposed via serial to enable more advanced applications.\nSuch applications include activation force tracking, the possibility of feeding back the weight of the grasped payload to the controller as a known disturbance, and the detection of a successful or unsuccessful grasp after takeoff based on the load cell readings.\nWe refer to the measured force $F_m$ in gram-force '\\SI{}{\\gramforce}' and the measured pressure as $P$ in '\\SI{}{\\kilo\\pascal}'.\nThis enables applications such as controlling the activation force and also empowers the internal logic to control the pressure inside the membrane and to prevent conditions such as membrane rupture due to over-pressure and to assure a consistent air pressure while approaching the payload.\n\nWe define two pressure thresholds, namely $P_{min}=\\SI{-21}{\\kilo\\pascal}$, 
the lower trigger point, and $P_{max}=\\SI{0.5}{\\kilo\\pascal}$ the upper trigger point.\nThose trigger points are used to switch reliably between the 'closed' and 'opened' states of the gripper.\nIn particular, $P \\geq P_{max}$ signals that the membrane is full and any additional air would stretch the membrane (consequently increasing the internal pressure). \n$P \\leq P_{min}$ signals that a vacuum is established and thus the gripper is considered 'closed'.\n\nThe firmware on the MCU is making use of \\textit{FreeRTOS}, running two tasks using preemptive multitasking as shown in \\cref{alg:tasks}. \nTask 1 handles sensors and actuation, and task 2 handles serial communication.\nInter-task communication takes place over thread-safe FIFO queues. \nFor the underlying state machine (automaton), we direct the reader to the \\hyperref[sec:appendix]{Appendix}.\n\n\n\n\n\\begin{algorithm}[tb]\n \n \\caption{FreeRTOS, sensor acquisition and actuation}\n \\label{alg:tasks}\n \\begin{algorithmic}\n \\Procedure{task 1: sensors and actuation}{}\n \\State let $k_{gr}$ be the current gripper state\n \\State let $S(k_{gr})$ be the automaton in \\cref{fig:automaton}\n \\State let $f_1$, $f_2$ be lowpass FIR filters\n \\Loop\n \\State collect push button states $\\mathbf{u_{bt}}$\n \\State fetch raw sensor data $P^*$, $F_m^*$\n \\State process sensor data $P \\gets f_1(P^*)$, $F_m \\gets f_2(F_m^*)$\n \\State create state vector $\\mathbf{q} \\gets (t, k_{gr}, P, F_m)$\n \\State process automaton $\\mathbf{u_a} \\gets S(\\mathbf{q}, \\mathbf{u})$)\n \\State apply $\\mathbf{u_a}$ to actuators \n \\EndLoop\n \\EndProcedure\n \\end{algorithmic}\n \\begin{algorithmic}\n \\Procedure{task 2: communication}{}\n \\Loop\n \\State outbound communication, send $\\mathbf{q}$\n \\State inbound communication, receive $\\mathbf{u_{usr}}$\n \\State create command vector $\\mathbf{u} \\gets (\\mathbf{u_{usr}}, \\mathbf{u_{bt}})$\n \\EndLoop\n \\EndProcedure\n 
\\end{algorithmic}\n\\end{algorithm}\n\n\n\\subsection{Grasping Procedure}\n\\label{sec:grasping-procedure}\n\n\n\n\nAlthough usage of our UG by hand is straightforward and allows grasping of a variety of shapes and materials (see \\cref{fig:grasped-objects}), a defined grasping procedure is required for our aerial platform such that successful grasps can be achieved without relying on human intuition.\nTypically this procedure consists of four main steps (\\cref{fig:procedure}):\n\\begin{enumerate}\n \\item The grasp starts by pushing the fluidized gripper against the payload. \n Doing so elastically deforms the membrane and the filler material flows freely, distributing itself around the payload. \n At this point, valves V1 and V2 are still closed such that the free volume remains unchanged. \n The evacuation phase is then triggered once the measured force reaches the desired activation force, i.e., $F_{m} \\geq F_a$. \n \\item Evacuating the air out of the membrane takes a couple of seconds (governed by the flow rate of the pumps). \n During that period, the membrane shrinks, and the contact force drops in response to that unless the gripper is further moved toward the payload. \n In the context of low activation forces, it is essential to keep good contact with the payload. \n Failure to do so will lead to a poor or unsuccessful grasp as the filler hardens without properly surrounding the payload. \n We thus track the nominal activation force during this interval. \n Other publications in this field usually avoid this step by pushing the gripper with a very high force into the payload, e.g., with \\SI{17}{\\newton} as seen in \\cite{GomezPaccapelo2021}, which is, however, not possible with most small to medium aerial systems.\n \\item Once the membrane's internal pressure satisfies $P \\leq P_{min}$ (vacuum), the grasping procedure is considered completed. The gripper is then retracted from the payload (here, at a constant velocity). 
\n Since the payload is fixated on the support and cannot be lifted, a negative force is measured corresponding to the holding force $F_h$. \n In practice, with the gripper mounted on a UAV, instead of the holding force, the actual weight of the lifted payload would be measured, which could be fed back into the autopilot.\n \\item Releasing the payload (i.e., opening resp. resetting the gripper) is achieved by pumping air into the membrane until $P \\geq P_{max}$ is reached, then closing valves V1 and V2. \n The gripper is now ready to grasp the next object.\n\\end{enumerate}\n\nThe activation force $F_{a}$ is an essential quantity for a successful grasping operation.\nAn insufficient activation force causes the grasping operation to fail.\nOn the other hand, choosing $F_{a}$ too high may destabilize the aerial platform and also cause damage to the force sensor and membrane. \nTherefore, the optimal activation force has to be chosen sufficiently high not to sacrifice performance but also as low as possible to minimize the impact on the aerial platform.\n\n", "images": []}
interleaved/250a5b7d-e025-4204-a47e-6b21cd29cf0b.json
DELETED
@@ -1 +0,0 @@
-{"txt": "", "images": []}
interleaved/2d328561-a846-46aa-a53d-6d844ae1f60a.json
DELETED
@@ -1 +0,0 @@
-
{"txt": "\\label{sec:aerial-application}\nThis section briefly presents a pick-and-release manipulation task with the UG attached to a UAV.\nThe overall goal of this experiment is to validate the fundamental concept shown in \\cref{fig:concept}.\n\nOur proof-of-concept platform based on the frame of the \\textit{AscTec Firefly} features a \\textit{Raspberry Pi 3} with a \\textit{Navio 2} running the \\textit{PX4} autopilot. \nWe created a custom \\textit{PX4} firmware module to communicate with the gripper and to link one of the remote control channels to trigger its opening resp, closing state transition.\nHerein, the UAV is carefully manually piloted in position control mode, relying on human intuition to keep a sufficient amount of activation force.\n\nThe whole experiment is pictured in \\cref{fig:aerial-application} and the supplementary video is available online \\footnote{Supplementary video: \\href{https://youtu.be/Az5bXnZUNlY}{https://youtu.be/Az5bXnZUNlY}}.\nThe task of the UAV is to take off, grab the payload (orange), and drop it in the drop-off area. \nDuring the setup of the experiment, the UAV is manually placed on the checkerboard. \nThe gripper is then closed using the push buttons on the controller board.\nThe membrane then forms a flat and rigid surface for the UAV to rest on.\nAt that stage, the UAV is ready to take off.\n\nAfter successful takeoff, the membrane of the UG is fluidized (triggered by the remote) and the payload is approached. \nIdeally, the membrane would hit the payload dead center, but the UG is fairly tolerant to positional errors. \nAfter the activation force crosses a threshold of \\SI{250}{\\gramforce}, the evacuation phase is automatically triggered and the filler material hardens, creating a firm grasp on the payload. 
\nNow, the drone is piloted to the drop-off zone, where it releases its payload by fluidizing the gripper, triggered via the remote control.\n\nLastly, the UAV is piloted back to the checkerboard, where it safely lands on the UG. Notice that the landing gear, although present, never touches the ground. \nIt was kept for safety reasons only.\n\n", "images": []}
interleaved/2f8c1d56-6be3-415a-9358-b481e8a65f39.json
DELETED
@@ -1 +0,0 @@
-{"txt": "", "images": []}
interleaved/30739d01-e7eb-425f-a4af-b10691602f27.json
DELETED
@@ -1 +0,0 @@
-{"txt": "", "images": []}
interleaved/31b0ec11-cd68-49bf-9782-a873465d616e.json
DELETED
@@ -1 +0,0 @@
-{"txt": "", "images": []}
interleaved/31d06618-214b-4e98-a0b2-fa0db92a38fb.json
DELETED
@@ -1 +0,0 @@
-{"txt": "", "images": []}
interleaved/31f4cbea-1545-4d39-9cbb-96dfdf153c54.json
DELETED
@@ -1 +0,0 @@
-{"txt": "", "images": []}
interleaved/32af7d79-4044-46c0-8c74-8d556f6ef048.json
DELETED
@@ -1 +0,0 @@
-{"txt": "", "images": []}
interleaved/3517ec99-6b25-47bd-a632-1c2460eb2dd2.json
DELETED
@@ -1 +0,0 @@
-
{"txt": "\\input{Figures/archi}\nIn this section, we will introduce the proposed cross-reference and local-global conditional network for solving few-shot image segmentation. \nIn the beginning, we describe our network in the 1-shot case. \nAfter that, we describe our finetuning scheme in the case of $k$-shot learning. Our network includes four key modules: the Siamese encoder, the cross-reference module, the conditional module, and the mask refinement module. The overall architecture is shown in Figure~\\ref{archtecture}.\n\n\\subsection{Method overview}\n\nUnlike previous existing few-shot segmentation methods \\cite{zhang2019canet,shaban2017one,Dong2018FewShotSS,mdl,fss1000} unilaterally guide the segmentation of query images with support images, our proposed CRCNet enables support, and query images guide the segmentation of each other. \nWe argue that the relationship between support-query image pairs is vital to few-shot segmentation learning. Experiments in Table~\\ref{Table:ablation-condition-cr} validate the effectiveness of our new architecture design.\nAs shown in Figure~\\ref{archtecture}, for every query-support pair, we encode the image pair into the features with the Siamese encoder, then apply the cross-reference module to mine out co-occurrent objects features. To fully utilize the annotated mask, the conditional module will incorporate the category information of support set annotations for foreground mask predictions. Finally, our mask refines module caches the confidence region maps recurrently for final foreground prediction. \nIn the case of $k$-shot learning, previous works \\cite{zhang2018sg,zhang2019canet,shaban2017one} only simply average the results of different 1-shot predictions. In contrast, our CRCNet adopts an optimization-based method that finetunes the model to use more support data. 
Table~\\ref{ablation:Fuse-and-FT} demonstrates the advantages of our method over previous works.\n\n\\subsection{Siamese encoder}\nThe Siamese encoder is a pair of parameter-shared convolutional neural networks that encode the query and support images to feature maps. Unlike the models in~\\cite{shaban2017one,rakelly2018conditional}, we use a shared feature encoder to encode the support and the query images. Our cross-reference module can provide better co-occurrent mine features to locate the foreground regions by embedding the images into the same space. To acquire representative feature embeddings, we use skip-connections to utilize multiple-layer features. As observed in CNN feature visualization literature\\cite{zhang2019canet,yosinski2015understanding}, features in lower layers often relate to low-level cues, and higher layers often relate to segment cues combined the lower level features and higher-level features and passing to followed modules.\n\n\n\\subsection{Cross-Reference Module} \\label{sec:cross-ref}\nThe cross-reference module is designed to mine co-occurrent features in two images and generate updated representations. The design of the module is shown in Figure~\\ref{co-segmentation}. Given two input feature maps generated by the Siamese encoder, we first use global average pooling to acquire the two images' global statistics. The two feature vectors are then sent to two fully connected (FC) layers, respectively. The Sigmoid activation function attached after the last FC layer transforms the vector values into the channel's importance, which is in the range of [0,1]. After that, the vectors in the two branches are fused by element-wise multiplication. Intuitively, only the two branches' common features will highly activate the fused importance vector. Finally, we use the fused vector to re-weight the input feature maps to generate reinforced feature representations. 
In comparison to the basic features, the reinforced features focus more on the co-occurrent representations. \n\nWe add a head to directly predict the two images' co-occurrent objects during training time based on the reinforced feature representations. This sub-task aims to facilitate the co-segmentation module's learning to mine better feature representations for the downstream tasks. To generate the co-occurrent objects' predictions in two images, the reinforced feature maps in the two branches are sent to a decoder to generate the predicted maps $QM_{sub}$ and $SM_{sub}$. \nThe decoder is composed of several convolutional layers and ASPP~\\cite{chen2018deeplab} layers. Finally, we generate a two-channel prediction with a convolutional layer corresponding to the foreground and background scores. \n\n\\input{Figures/co-seg}\n\n\\subsection{Conditional Module}\nAs shown in Figure~\\ref{Figure:condition}, we design a conditional module to incorporate the category information for foreground mask predictions efficiently. Our Conditional module is composed of global and local conditional modules. Given a support and query image pair, we first use a Siamese encoder to extract their features. Then we use the support annotations to filter out the irrelevant support features. We generate the global category-relevant vector with a masked global average pooling as a global condition to guide the query prediction. After that, we leverage the spatial support features as a local condition to enhance feature comparison. Finally, we fuse the global and local conditional features into representations to better predict the query masks. \n\n\\input{Figures/condition}\n\n\\textbf{Global conditional module.}\nThe global conditional module takes the feature representations generated by the Siamese encoder and a category-relevant vector as inputs. 
The category-relevant vector is the fused feature embedding of the target category, which is obtained by applying foreground average pooling~\\cite{zhang2019canet} over the category region. As the goal of few-shot segmentation is only to find the foreground mask of the assigned object category, the category-relevant vector serves as a condition to segment the target category.\nThe structure of our global conditional module is shown in Figure~\\ref{global-condition}. First, we bilinearly upsample the category-relevant vector to the same spatial size as the feature maps. Then we concatenate it with the query features. After that, we use a residual convolution to process the concatenated features. The global conditional modules in the support branch and the query branch have the same structure.\n\n\\input{Figures/global-condition}\n\n\\textbf{Local conditional module.}\nThe cross-reference module aims to mine the co-occurrent objects between the images along the channel dimension. In particular, to mine the co-occurrent regions of two feature maps, the cross-reference module exchanges global information and generates weights for the channels. To complement the cross-reference module, we propose a local conditional module that leverages region-level comparison between query and support images to further enhance the co-occurrent objects along the spatial dimensions.\nOur local conditional module leverages the spatial element-wise representations as a condition to capture local similarities. As depicted in Figure~\\ref{local-condition}, the local conditional module takes the query and support image features ($F_q$ and $F_s$, respectively) generated by the Siamese encoder as inputs. 
We generate a similarity matrix $M_{sim}$ between the query and support representations as follows:\n\n\\begin{equation}\n M_{sim}=\\theta(F_q) ^{T} \\bigotimes \\delta (F_s) .\n\\end{equation}\n\n\nHere $\\theta$ and $\\delta$ denote non-linear transformations, each implemented with a $1\\times1$ convolutional layer followed by a ReLU activation layer, and $\\bigotimes$ denotes matrix multiplication. Each row of the similarity matrix indicates the similarity of one query feature to all the support features. \nHowever, the similarity matrix $M_{sim}$ covers all the support features, including foreground and background, whereas we only need to enhance the features of the same category. We filter out the background and generate a foreground similarity matrix by reshaping the support annotation, expanding it to the same size as $M_{sim}$, and multiplying it with the similarity matrix $M_{sim}$. After that, the foreground similarity matrix is normalized row-wise to derive an attention matrix $M^{'}_{sim}$ from each position in the query feature to the support features: \n\n\\begin{equation}\n M^{'}_{sim}=Softmax(M_{sim} \\cdot \\Phi (Mask_{sup}) ).\n\\end{equation}\n\n\nHere $\\cdot$ denotes element-wise multiplication, and $\\Phi (Mask_{sup})$ denotes the reshaped and expanded support mask annotation. The softmax is performed row-wise. Finally, we generate the query attention maps $F^{'}_{q}$ by matrix multiplication with the support features: \n\n\\begin{equation}\n F^{'}_{q} = F_{s} \\bigotimes M^{'}_{sim}. \n\\end{equation}\n\nWe concatenate the outputs from the global and local conditional modules to generate our final conditional features. \n\n\\input{Figures/local-condition}\n\n\\subsection{Mask Refinement Module}\nAs is often observed in the weakly supervised semantic segmentation literature~\\cite{zhang2019canet,kolesnikov2016seed}, directly predicting the object masks can be difficult. 
It is a common principle to first locate seed regions and then refine the results. Based on this principle, we design a mask refinement module to refine the predicted mask step by step. Our motivation is that the probability maps from a single feed-forward prediction reflect where the confident regions of the model prediction are. Based on the confident regions and the image features, we can gradually optimize the mask and find the whole object regions. As shown in Figure~\\ref{memory}, our mask refinement module has two inputs. One is the confidence map saved in the cache, and the other is the concatenation of the outputs from the conditional and cross-reference modules. For the initial prediction, the cache is initialized with a zero mask, so the module makes its prediction solely based on the input feature maps. The module cache is updated with the generated probability map every time the module makes a new prediction. We run this module multiple times to generate a final refined mask.\n\nThe mask refinement module includes two main blocks: the global convolution block and the combined block.\nThe global convolution block~\\cite{peng2017large} aims to capture features in a large field-of-view while containing few parameters. It includes two groups of $1\\times7$ and $7\\times1$ convolutional kernels. The combined block fuses the feature branch from different feature levels and the cached branch to generate refined feature representations. \n\n\\input{Figures/memory}\n\n\\subsection{Finetuning for K-Shot Learning}\nIn the case of $k$-shot learning, we propose to finetune our network to take advantage of multiple labeled support images. As the network can make predictions for two images simultaneously, we can use at most $k^2$ image pairs to finetune our network. At the evaluation stage, we randomly sample an image pair from the labeled support set to finetune our model. 
We keep the parameters of the Siamese encoder fixed and finetune only the remaining modules. Our experiments demonstrate that our finetuning-based method consistently improves results when more labeled support images are available. In contrast, the fusion-based methods of previous works often saturate as the number of support images increases.\n\n\n\\subsection{Training}\nDuring training, we choose a pair of images that belong to the same category as the support image and the query image. Their roles are interchangeable. Our training objective includes four binary cross-entropy losses (BCEloss), which are applied to the query predicted masks ($\\mathcal{L}_{QM}$ and $\\mathcal{L}_{QMsub}$) and the support predicted masks ($\\mathcal{L}_{SM}$ and $\\mathcal{L}_{SMsub}$): \n\\begin{equation}\nBCEloss = - \\left(y\\log(p) + (1-y)\\log(1-p)\\right), \n\\end{equation}\n \\begin{equation}\n \\mathcal{L}= (\\mathcal{L}_{QM}+ \\mathcal{L}_{SM}) + \\lambda(\\mathcal{L}_{QMsub} + \\mathcal{L}_{SMsub}).\n \\label{loss_sum}\n \\end{equation}\n\nHere $p$ denotes the predicted probability, and $y$ denotes the ground truth label. $\\mathcal{L}_{QM}$ and $\\mathcal{L}_{SM}$ denote the losses on the final predicted query and support masks. $\\mathcal{L}_{QMsub}$ and $\\mathcal{L}_{SMsub}$ denote\nthe losses on the sub-predicted query and support masks discussed in Sec.~\\ref{sec:cross-ref}. We balance the two kinds of losses with the hyper-parameter $\\lambda$. In this paper, we set $\\lambda$ to 0.1. \n\n\n\\input{Tables/abalation-condition-cr}\n\n\\input{Tables/ablation-refine-scale}\n\n\n\\input{Tables/ablation-condition}\n\n\n\\input{Tables/fuse-ft}\n\n\\input{Tables/PFENet-finetune}\n\n\\input{Tables/ablation-sigmoid}\n\\input{Tables/abalation-local-condition-mask}\n\\input{Tables/voc-soa-miou}\n\n\\input{Tables/voc-soa-fbiou}", "images": []}
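The channel re-weighting described for the cross-reference module in the record above can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation: a single FC layer per branch stands in for the two FC layers mentioned in the text, feature maps are nested lists of shape C x H x W, and the names `cross_reference`, `w_q`, and `w_s` are hypothetical.

```python
import math


def global_avg_pool(fmap):
    """Average each channel of a C x H x W nested list, giving C values."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def fc_sigmoid(vec, weight):
    """One fully connected layer (weight is C_out x C_in) followed by a sigmoid."""
    out = []
    for row in weight:
        pre = sum(w * v for w, v in zip(row, vec))
        out.append(1.0 / (1.0 + math.exp(-pre)))
    return out


def reweight(fmap, importance):
    """Scale every spatial position of channel c by importance[c]."""
    return [[[importance[c] * x for x in row] for row in ch]
            for c, ch in enumerate(fmap)]


def cross_reference(f_q, f_s, w_q, w_s):
    """Return re-weighted query/support features and the fused importance vector."""
    imp_q = fc_sigmoid(global_avg_pool(f_q), w_q)  # channel importance, query branch
    imp_s = fc_sigmoid(global_avg_pool(f_s), w_s)  # channel importance, support branch
    fused = [a * b for a, b in zip(imp_q, imp_s)]  # high only if both branches agree
    return reweight(f_q, fused), reweight(f_s, fused), fused


# Toy input (assumed values): channel 0 is active in both images, channel 1 only in the support.
identity = [[1.0, 0.0], [0.0, 1.0]]
f_q = [[[4.0, 4.0]], [[-4.0, -4.0]]]
f_s = [[[4.0, 4.0]], [[4.0, 4.0]]]
new_q, new_s, fused = cross_reference(f_q, f_s, identity, identity)
```

In this toy case the fused importance stays high only for the channel that is strongly activated in both images, which matches the intuition given in the text for the element-wise multiplication.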
|
|
|
|
interleaved/353218de-9e77-47e5-bc6e-a92ef77e1d8a.json
DELETED
@@ -1 +0,0 @@
|
|
1 |
-
{"txt": "", "images": []}
|
|
|
|
interleaved/35f92248-c552-4a88-8d96-5177e36fa323.json
DELETED
The diff for this file is too large to render.
See raw diff
|
|
interleaved/4e500ebb-cd91-423c-a953-8937940777d7.json
DELETED
@@ -1 +0,0 @@
|
|
1 |
-
{"txt": "", "images": []}
|
|
|
|
interleaved/4e6a6eb2-7bdd-4b79-bf64-540c0b5baf04.json
DELETED
@@ -1 +0,0 @@
|
|
1 |
-
{"txt": "", "images": []}
|
|
|
|
interleaved/4edabd45-dc9b-4fbf-b0db-3960b3e55dd1.json
DELETED
@@ -1 +0,0 @@
|
|
1 |
-
{"txt": "", "images": []}
|
|
|
|
interleaved/51533f23-be07-46e7-aba8-0a8bb9876a31.json
DELETED
@@ -1 +0,0 @@
|
|
1 |
-
{"txt": "", "images": []}
|
|
|
|
interleaved/53cdb676-2c21-40c3-94c7-751c004486ed.json
DELETED
@@ -1 +0,0 @@
|
|
1 |
-
{"txt": "", "images": []}
|
|
|
|
interleaved/5633735f-7e43-4d1f-884b-6b5b094bbdeb.json
ADDED
The diff for this file is too large to render.
See raw diff
|
|
interleaved/6435bf60-e53f-4d7e-a4a2-e46caaab519f.json
DELETED
@@ -1 +0,0 @@
|
|
1 |
-
{"txt": "", "images": []}
|
|
|
|
interleaved/6617c785-e5e7-4d8b-bb2d-5b06a20c4af3.json
DELETED
@@ -1 +0,0 @@
|
|
1 |
-
{"txt": "", "images": []}
|
|
|
|
interleaved/69f52948-906f-4d25-91be-ed91018fe5a8.json
DELETED
@@ -1 +0,0 @@
|
|
1 |
-
{"txt": "", "images": []}
|
|
|
|
interleaved/71fce5d5-32d0-4fe2-ac14-04e5fd0e5c5d.json
DELETED
@@ -1 +0,0 @@
|
|
1 |
-
{"txt": "\\label{sec:discussion}\nThe interest in developing soft grippers for aerial vehicles stems from the fact that by leveraging the properties of soft materials, soft grippers are a natural match for aerial grasping.\nIn contrast to their rigid counterparts, soft grippers are tolerant toward unknown object geometries and surfaces and do not require high positional accuracy for successful grasps.\nBy developing a lightweight soft jamming universal gripper attached to a UAV we further advanced the potential of soft aerial grasping. \nCompared to available soft grippers for aerial grasping, the developed system exhibits several distinguishing characteristics.\n\n\\textit{First}, the developed gripper is highly integrated and modular. \nThe tight integration of the electronics, software, sensors and mechanics leads to significant weight savings and enables a well-defined grasping procedure that can be automated for use in autonomous systems.\nThe modularity not only helps in iterating the design rapidly, but it also addresses some concerns typically associated with UGs.\nBy having an explicit interface between the grasping part (membrane module) and the supporting hardware, we assure that the membrane is quick and easy (toolless) to swap in case of damage.\nOther types of pneumatic grippers could also make use of this interface, e.g., suction cups.\n\nThanks to the specific characteristics of our UG's construction, it is particularly well suited for aerial vehicles, comparable to the typical multi-fingered soft grippers, but with some unique features (e.g., omnidirectionality, or the ability to use as landing gear).\n\n\n\n\\textit{Second}, TRIGGER is omnidirectional in contrast to other available soft aerial grasping systems (e.g., claws), which are sensitive to the angle the payload is approached.\nThe same applies to lateral position errors, where the UG tolerates displacements as large as \\SI{60}{\\percent} of its diameter.\nThis relaxes the requirements in 
terms of necessary grasping accuracy, which is especially advantageous for aerial systems that are subjected to external disturbances (e.g., wind gusts, ground effect) and sensor inaccuracies.\n\nDuring contact, the UAV retains most of its degrees of freedom due to the gripper's elasticity.\nAs such, it can still rotate (pitch and roll) and therefore preserve hover conditions, but at the same time, the translational degrees of freedom are soft-locked by the friction between the payload and the gripper.\nThis would address one concern associated with soft finger grippers. \nThe authors of \\cite{Mishra2018} stated that during their experiment, the multicopter had to land on the ground due to ground effects and the resulting lack of precise position control.\nDuring our aerial experiment, we did not observe such a problem as the UAV passively stayed locked in place during the grasp.\n\n\n\n\\textit{Third}, unlike soft fingers, our UG forms a rigid-like flat surface once a vacuum is established and thus enables the UAV to rest on it. \nThis feature makes our manipulating UAV system exceptional as it removes the need for a dedicated landing gear that also often interferes with the attached gripper resp. 
the sensors required for autonomous grasping.\nMoreover, using the UG as landing gear further reduces the weight of the aerial system.\n\nThe system is also able (within limits) to compensate for some terrain imperfections (e.g., slanted surfaces or small rocks), assuring optimal takeoff and landing conditions.\nTraditional soft finger grippers often cannot prevent the payload from moving after the grasp is established, which can create further disturbances during flight.\nIn contrast, our UG forms a system that behaves much more like a single rigid body due to the jamming of the granular material.\n\n\\textit{Fourth}, hard shocks typically associated with the impact of two bodies are problematic both from a mechanical perspective (e.g., the risk of damage) and from a control perspective (e.g., potential instability).\nPassive mechanical compliance alleviates this problem by spreading the impact over a larger time interval.\n\nThe developed UG is completely soft during the first contact phase and is thus passively compliant, absorbing and dampening shocks.\nThis applies to both landing and grasping scenarios.\n\n\\textit{Fifth}, our gripper develops \\SI{15}{\\newton} of holding force on our test peg, which does not allow for geometric interlocking, so the grasp relies purely on friction and (to a much lesser extent) suction; this is therefore a worst-case scenario.\nAs indicated in \\cite{Kapadia2012}, geometric interlocking can dramatically increase the holding force.\nComparisons with other UGs are hard to make due to the lack of a standardized test procedure.\nHowever, comparing our results with the work of \\cite{Kapadia2012}, \\cite{Mishra2021} and \\cite{GomezPaccapelo2021}, it can be said that the measured holding force for objects without geometric interlocking is in the same neighborhood, i.e., \\SI{10}{\\newton}-\\SI{30}{\\newton}, while requiring significantly less power (less than $\\SI{10}{\\watt}$ against several hundred watts).\nConsequently, 
the cycle times of our solution are longer (\\SI{11}{\\second} against \\SI{4}{\\second}) and have to be handled properly, and failure to do so will result in degraded or even unsuccessful grasps.\nTherefore, in the larger context of UGs, our results indicate that high-power pumps are not strictly required.\nIn practice, fitting larger, heavier pumps is limited by the payload capacity of the aerial platform.", "images": []}
|
|
|
|
interleaved/7574a956-538e-45fc-b320-95feb03ad24a.json
DELETED
@@ -1 +0,0 @@
|
|
1 |
-
{"txt": "", "images": []}
|
|
|
|
interleaved/791770e7-09e9-4143-ad37-1ea7b380b96a.json
DELETED
@@ -1 +0,0 @@
|
|
1 |
-
{"txt": "", "images": []}
|
|
|
|
interleaved/7abd84cd-4c9a-4e7a-9e49-d40137c8f19d.json
DELETED
@@ -1 +0,0 @@
|
|
1 |
-
{"txt": "", "images": []}
|
|
|
|
interleaved/7af41fa6-8a38-4cbc-94f5-9be61687a40f.json
DELETED
@@ -1 +0,0 @@
|
|
1 |
-
{"txt": "", "images": []}
|
|
|
|
interleaved/7c1a651f-7521-4d43-ac2d-89e7d6358012.json
ADDED
The diff for this file is too large to render.
See raw diff
|
|
interleaved/86d16242-9603-42dd-bc6d-bdf9bdf9de04.json
DELETED
The diff for this file is too large to render.
See raw diff
|
|
interleaved/8f9ee91b-a578-419a-8870-763dab8950b5.json
ADDED
The diff for this file is too large to render.
See raw diff
|
|
interleaved/960eac75-a38c-45ba-93e3-a2e5c089d991.json
DELETED
@@ -1 +0,0 @@
|
|
1 |
-
{"txt": "Survival analysis serves as an important tool in healthcare to assess the risk of events, such as onset of disease~\\citep{wilson1998prediction} or death~\\citep{pocock1982long}, rehospitalization~\\citep{patterson1998intensive} and discharge from hospital~\\citep{wang2020survival}. Survival modeling has been widely used in clinical applications, including improving the prognosis of cancer~\\citep{faradmal2012survival, goldstraw2016iaslc, wang2019clinicopathological, lin2021effects, wang2021survnet}, predicting the onset of septic shock~\\citep{henry2015targeted}, assessing the survival time of heart failure patients~\\citep{ahmad2017survival, kojoria2004outcomes, jones2019survival, yin2022survival} and estimating the graft survival rate of kidney transplant patients~\\citep{lee2019long, rodrigues2019survival}. \n\nGiven patients' electronic health records including lab tests, vitals, radiology results and clinical notes, doctors need to determine a level of treatment based on the level of risk. For example, WHO guidelines suggest more aggressive treatments for higher risk cardiovascular disease patients~\\citep{world2007prevention}. Therefore, an accurate model of risk is necessary. \n\nRisk in survival analysis is characterized by the conditional distribution of the event time given a patient's healthcare records. What distinguishes survival analysis \nfrom traditional regression problems is that event times can be censored, i.e., only known to lie within a certain range. For example, patients may remain healthy throughout a 10-year coronary artery disease study~\\citep{wilson1998prediction} so it is only known that such patients survive at least 10 years. Discarding censored times may introduce bias into estimates by underestimating the time until an event, because later times are more likely to be censored and thus thrown away.\n\nLikelihood-based methods are used to estimate survival models \\citep{kalbfleisch2011statistical}. 
\nIn addition to the usual mass or density computed in maximum likelihood problems, the survival likelihood for censored data includes the survival function, i.e., one minus the cumulative distribution function (CDF) of the distribution. For many distributions, CDF evaluations require explicitly integrating the density. Recent advances in deep learning provide opportunities for flexible survival modeling~\\citep{lecun2015deep, ranganath2016deep}. However, flexible distributions utilizing deep learning, such as those modeled by GANs~\\citep{goodfellow2014generative, chapfuwa2018adversarial}, may not yield efficient CDF computation.\n\nTo keep estimation tractable, traditional survival analysis techniques make distributional assumptions, e.g. log-normal density or proportional hazards \\citep{kalbfleisch2011statistical, cox1972regression}. But this limits the flexibility of the model. To move beyond this, discrete time models divide continuous times into a sequence of bins~\\citep{miscouridou2018deep, lee2018deephit, kvamme2019continuous} and can \napproximate arbitrary continuous distributions increasingly\nwell as the number of bins increases.\n However, the choice of bin boundaries is troublesome: it is unclear how best to set the time intervals for each bin, \nand the survival function for times within a bin is ill-defined. ODE-based continuous time models~\\citep{tang2020soden, tang2022survival} specify the time-to-event distribution through ODEs. However, the training of ODE-based models is slow due to expensive numerical integration requiring many neural network evaluations for each forward pass~\\citep{kelly2020learning}.\n\nIn this work, we propose Survival Mixture Density Networks (Survival MDN). Survival MDN builds off mixture density networks (MDN) \\citep{bishop1994mixture} to allow flexible modeling. Since the time-to-event is positive in survival modeling, we apply an invertible positive function to the samples from MDNs. 
The CDF of Survival MDNs can be obtained easily through the evaluation of the CDF of\nthe mixture components of the MDN, which is simple for mixture components like Gaussians. We evaluate Survival MDN and baselines on four clinical datasets: SUPPORT, METABRIC, GBSG, and MIMIC. On all datasets, Survival MDN performs better than, or as well as, the baselines on concordance, integrated Brier Score and integrated binomial log-likelihood. We also show that training Survival MDNs can be 100 times faster than training the ODE-based model SODEN~\\citep{tang2020soden}.\\footnote{The code is available at https://github.com/XintianHan/Survival-MDN}\n\n\n\\subsection*{Generalizable Insights about Machine Learning in the Context of Healthcare}\nThe majority of flexible survival modeling relies on training with the Cox partial likelihood, discrete time modeling, or ordinary differential equations. Training with partial likelihood\nprecludes the use of stochastic gradient descent and is not scalable for large datasets. Discrete time models have issues with choosing bin boundaries and determining the survival probability for a particular time. ODE-based models use likelihood for training but are slow to train.\nOur proposed model, Survival MDN, has several advantages: 1) it is a continuous time model; 2) it makes fewer distributional assumptions;\n3) it can be trained with stochastic gradient descent; and 4) it is easier to use than discrete models and faster than ODE-based models.\nIn this section, we introduce the mathematical foundation of survival analysis and summarize related works. We then describe how our work is distinguished from previous works.\n\\subsection{Foundation of Survival Analysis}\nSurvival analysis studies the distribution of event time $T$ given covariates $X$.\nFor example, we would like to know when a patient may die after admission to the ICU. The event time is called the failure time or survival time. 
We consider the common scenario of \\textit{right-censoring} in this work, where only a lower bound of the survival time is observed for some patients. We call the lower bound the \\textit{censored time} $C$. When $T > C$, only the censored time $C$ is observed; when $T\\leq C$, the failure time $T$ is observed. We use $\\Delta = I\\{T\\leq C\\}$ to indicate whether the event time is observed and $U = \\min\\{T, C\\}$ to denote the observed time.\n\nA central quantity that appears in the estimation and use of survival models is the survival function $S(t|X) = P (T>t|X)$, i.e., the probability that a patient with covariates $X$ will survive beyond time $t$. By definition, $S(t|X) = 1 - \\text{CDF}(t|X)$. \n\nAssume we observe i.i.d. datapoints $\\{u_i, \\Delta_i, x_i\\}_{i=1}^N$ and that censoring is random, i.e., $T\\perp C |X$. Under these assumptions, \nand with $p(t|X)$ denoting the mass or probability density function (PDF) evaluated at $t$,\nthe survival likelihood function with a parameter $\\theta$ is proportional to~\\citep{kalbfleisch2011statistical}:\n\\[\n\\prod_{i=1}^N p_\\theta(u_i|x_i)^{\\Delta_i}S_\\theta(u_i|x_i)^{1-\\Delta_i}.\n\\]\nIn this work, we use the log-likelihood as a training objective function.\n\\subsection{Related Work}\n\\paragraph{Traditional Survival Analysis} Traditionally, survival analysis makes distributional assumptions. The Cox model~\\citep{cox1972regression} makes the proportional hazard assumption. The accelerated failure time (AFT) model~\\citep{buckley1979linear,wei1992accelerated} assumes that $\\log(T) = X^T\\theta +\\epsilon$, where $\\epsilon$ follows a log-logistic distribution. Multiple variants of Cox and AFT models~\\citep{aalen1980model, bennett1983analysis, cheng1995analysis, lin1995semiparametric, kalbfleisch2011statistical, wu2019flexible} have been proposed to introduce time-varying functions or different distributions. 
However, these extensions only use linear or simple non-linear models which may not be flexible enough to model complex data distributions. \\citet{avati2020countdown} use deep networks\nto produce the parameters of a lognormal.\nThough this can capture nonlinear dependence of the lognormal's parameters on the input,\nthe lognormal assumption may not be appropriate, e.g., if the true\nconditional distribution has more than one mode.\n\n\\paragraph{Deep Cox Models}\nThe Cox model has been extended with deep networks\nin several ways. DeepSurv~\\citep{katzman2018deepsurv} uses a neural network to model the relative risk $g(X;\\theta)$. Cox-Time~\\citep{kvamme2019time} further allows the relative risk to depend on time $t$. \\citet{kvamme2019continuous} assume the hazard is constant in predefined time intervals. \\citet{nagpal2021deep} use a mixture of Cox models parameterized by neural networks. These models optimize the partial likelihood function, which does not require access to survival functions. The partial likelihood is defined by\n\\[\n\\prod_{i:\\Delta_i = 1}\\frac{\\exp(g(u_i, x_i;\\theta))}{\\sum_{j\\in R_i} \\exp(g(u_i, x_j;\\theta))},\n\\]\nwhere $R_i = \\{j: u_j \\geq u_i\\}$, called the risk set, denotes the set of patients who survive at least as long as the $i$-th patient. The goal of maximizing the partial likelihood is to make patient $i$'s relative risk at $u_i$ greater than that of the other patients who survive longer. \nWhen there are thousands of datapoints, stochastic gradients are\nan efficient alternative to full gradient computation for maximizing likelihoods.\nHowever, evaluating a risk set requires the whole dataset, since the risk set can involve all patients. This disadvantage precludes the use of stochastic gradient descent for training. 
Though we can use mini-batches of patients to approximate the risk set $R_i$, there are no theoretical guarantees for convergence.\n\n\n\n\n\\paragraph{Deep Discrete Models} \nDeep categorical survival models~\\citep{miscouridou2018deep,fotso2018deep,goldstein2020x} divide the time axis into a sequence of bins and turn survival analysis into predicting a time's bin. These models use $K$ bins where the last\nbin includes all times greater than some value. DeepHit~\\citep{lee2018deephit} adds a rank-based loss and uses discrete models for competing risks. Nnet-survival~\\citep{biganzoli1998feed,gensheimer2019scalable} models the survival function by multiplications of conditional probabilities in previous time bins.\nThese discrete models can approximate\narbitrary smooth distributions with increasing fidelity as $K$ increases \\citep{miscouridou2018deep}.\n\nHowever, discrete models have their own problems. These models do not define what happens to the survival function estimation within a bin, at least without additional assumptions e.g. linearly interpolating the CDF. Next, it is challenging to choose the bin boundaries; it is unclear whether to set them by population percentiles or by regular intervals \\citep{kvamme2019continuous, tang2020soden, craig2021survival}. \nUsing regular intervals may lead times to concentrate into a small subset of bins. For percentiles, it is unclear whether we should include the censored times into the population. Percentiles of the observed failure times may not equal the percentiles of true failure times. Finally, deep discrete models are based on classification architectures, meaning that they may be overconfident and suffer the same poor calibration observed for deep classifiers \\citep{guo2017calibration}, as shown for survival analysis in \\cite{goldstein2020x}.\n\n\\paragraph{ODE-based Models} Recently, continuous time models with neural ODEs have been proposed\n\\citep{chen2018neural}. 
SODEN~\\citep{tang2020soden} considers the evolution of cumulative hazard functions as an ODE while \\citet{danks2022derivative} model the CDF by an ODE. \\citet{groha2020general} use ODEs for multi-state survival analysis. ODE-based models have tractable PDFs and CDFs. However, training neural ODEs is slow~\\citep{kelly2020learning} because of the expensive numerical integration inside ODE solvers. ODE-based models also involve extra hyperparameters related to ODE-solvers, including the solver type and tolerance level.\n\n\\paragraph{Other Deep Models} \\citet{chapfuwa2018adversarial}\nuse GANs for survival distribution modeling. But they do not use the likelihood as an objective for training since the PDF and CDF of GANs are intractable. The alternative, minimax training of GANs, is known to be unstable~\\citep{kodali2017convergence, bottou2018geometrical}. \\citet{ranganath2016deep} use deep exponential families~\\citep{ranganath2015deep} with Weibull likelihoods.\nThis approach necessitates the use of black-box variational inference with Monte Carlo gradients \\citep{ranganath2014black, mohamed2020monte}, which typically yields both a lower bound on the likelihood and noisier, slower optimizations. Survival stacking~\\citep{craig2021survival} casts the survival analysis as a classification task by predicting whether one patient is in other patients' risk sets. But for $N$ datapoints, survival stacking creates $O(N^2)$ classification problems which is not tractable for large datasets.\n\\paragraph{Our Model} In this work, we propose a new flexible survival model named Survival Mixture Density Networks. Survival MDNs utilize mixture density networks~\\citep{bishop1994mixture} to allow flexible modeling. With Gaussians as the base distributions, computing the model CDF and PDF requires the evaluation of standard functions and the error function. 
The error function can be obtained efficiently via common approximations~\\citep{abramowitz1988handbook} and Gaussian CDFs are implemented\nin most packages. Our simple approach can be trained through stochastic gradient descent and is much faster to train than ODE-based models. We compare our model with previous approaches in \\cref{tab:comp}.\n\\begin{table}[h]\n \\centering\n \\begin{tabular}{ccccc}\n \\toprule\n Model & Flexible & Continuous-time & SGD & Without ODE-Solver\\\\ \\midrule\n Cox & \\xmark & \\cmark & \\xmark & \\cmark\\\\\n DeepSurv & \\xmark & \\cmark & \\xmark & \\cmark\\\\\n DeepHit & \\cmark & \\xmark & \\cmark & \\cmark\\\\\n Nnet-survival & \\cmark & \\xmark & \\cmark & \\cmark\\\\\n Cox-Time & \\cmark & \\cmark & \\xmark & \\cmark\\\\\n SODEN & \\cmark & \\cmark & \\cmark & \\xmark\\\\\n Survival MDN & \\cmark & \\cmark & \\cmark & \\cmark \\\\ \\bottomrule\n \\end{tabular}\n \\caption{Comparison of Different Models}\n \\label{tab:comp}\n\\end{table}\nIn summary, we propose a continuous-time model that can be trained with stochastic gradients, without numerical ODE solving, and that moves beyond common modeling restrictions (e.g. 
that the density is log-normal or Cox).\nOur purpose is to build a survival model that has the following properties:\n\\begin{enumerate}\n \\item It has a differentiable PDF which can be evaluated efficiently.\n \\item It has a differentiable CDF which can be evaluated efficiently.\n \\item It is flexible enough to approximate a broad class of conditional time-to-event distributions $p(t|x)$ with support over $\\mathbb{R}^+$.\n\\end{enumerate}\nThe first two properties enable efficient training\nusing maximum likelihood and using stochastic gradients.\nExamples of the last property are models that do not make assumptions like lognormality or proportional hazards, or that can capture multiple modes.\n\n\\subsection{Mixture Density Networks}\nMixture Density Networks (MDNs)~\\citep{bishop1994mixture} form the key part of Survival MDNs. For a given $x$, MDNs model the conditional distribution $p(y|x)$ by mapping $x$ through a neural network to produce the weights and parameters of a mixture model.\n Mixture density networks are flexible approximators; for any given $x$, with enough components, MDNs can approximate a broad class of conditional densities $p(y|x)$ as closely as desired~\\citep{bishop1994mixture}. \n\nIn this work, we use Gaussian mixtures~\\citep{reynolds2000speaker, reynolds2009gaussian}. A discussion on different base distributions can be found in \\cref{appsec:base}. Assume we have $K$ components with weights $\\{w_i\\}_{i=1}^K$, means $\\{\\mu_i\\}_{i=1}^K$ and standard deviations $\\{\\sigma_i\\}_{i=1}^K$\n such that $\\sum_{i=1}^K w_i = 1$. The PDF of the Gaussian Mixture Density Network is given by\n\\[\np\\left(y|\\{w_i,\\mu_i,\\sigma_i \\}_{i=1}^K\\right) = \\sum_{i=1}^K w_i \\mathcal{N}\\left(y|\\mu_i, \\sigma_i^2\\right),\n\\]\nwhere we denote $\\mathcal{N}(y|\\mu_i, \\sigma_i^2)$ as the density of a Gaussian distributed random variable with mean $\\mu_i$ and variance $\\sigma_i^2$. 
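As a small illustration of the Gaussian mixture density defined above, the following sketch evaluates $\sum_i w_i \mathcal{N}(y|\mu_i, \sigma_i^2)$ for fixed parameters. It uses only the Python standard library; the function names `normal_pdf` and `mixture_pdf` and the parameter values are assumptions made for this example, and in a mixture density network the parameters would come from a neural network rather than being fixed.

```python
import math


def normal_pdf(y, mu, sigma):
    """Density of a Gaussian with mean mu and standard deviation sigma at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(y, weights, means, stds):
    """Weighted sum of Gaussian densities; weights are assumed to sum to 1."""
    return sum(w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, stds))


# Hypothetical two-component mixture with two separated modes.
weights = [0.3, 0.7]
means = [-1.0, 1.5]
stds = [0.5, 1.2]
density_at_zero = mixture_pdf(0.0, weights, means, stds)
```

Because the weights sum to one and each component integrates to one, the mixture is itself a valid density, which is what makes it usable in a likelihood.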
\n\nIn mixture density networks, we build the conditional distribution by mapping the covariates $x$ to parameters of the Gaussian Mixture Model through deep neural networks:\n\\[\n\\{w_i(x), \\mu_i(x), \\sigma_i(x)\\}_{i=1}^K = f_\\theta(x),\n\\]\nwhere $f_\\theta$ is a trainable neural network with parameters $\\theta$.\n\\subsection{Survival Mixture Density Networks}\nWe propose Survival Mixture Density Networks (Survival MDNs) to satisfy the properties we want for a survival model. \n\nThe sampling process for Survival MDN on a given input $x$ is \n\\begin{enumerate}\n \\item Calculate $\\{w_i(x), \\mu_i(x), \\sigma_i(x)\\}_{i=1}^K = f_\\theta(x)$.\n \\item Sample $y$ according to the PDF $\\sum_{i=1}^K w_i \\mathcal{N}\\left(y|\\mu_i, \\sigma_i^2\\right)$. To do so, first sample a component $i$ with probability equal to $w_i$ and then sample from $\\mathcal{N}(\\mu_i, \\sigma_i^2)$.\n \\item Map $y$ to the event time $t$ using $t = g(y) = \\log(1 + \\exp(y))$.\n\\end{enumerate}\nThe invertible \\texttt{softplus} function $g(y) = \\log(1 + \\exp(y))$ maps the sample from the mixture density network to the positive domain. Another common choice to map the input from $\\mathbb{R}$ to $\\mathbb{R}^+$ is \\texttt{exp}. We choose \\texttt{softplus} over \\texttt{exp} because \\texttt{exp} may place high density on very large times. \n\nNext, we show that the PDF and CDF of the Survival MDN are easy to compute. By the change of variables, the Survival MDN PDF at time $t$ for\ninput $x$ is:\n\\[\np(t|x)= \\Big|\\frac{d g^{-1}(t)}{dt}\\Big|\\Big(\\sum_{i=1}^K w_i(x) \\mathcal{N}\\left(g^{-1}(t)|\\mu_i(x), \\sigma_i^2(x)\\right)\\Big).\n\\]\nFor the simple choice of the \\texttt{softplus}, $g^{-1}(t) = \\log(\\exp(t) - 1)$ and the absolute value term equals $1/(1 - \\exp(-t))$. This term does not depend on the parameters of the neural network $f_\\theta$, so it does not contribute to gradients used for log-likelihood training. The Survival MDN CDF at time $t$ can be computed easily as well. 
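The sampling steps and the change-of-variables density above can be sketched in a few lines of standard-library Python. This is an illustrative sketch under stated assumptions, not the authors' code: the helper names and the fixed mixture parameters are hypothetical stand-ins for the network outputs, and the CDF helper relies on the standard identity that the normal CDF equals 0.5 * (1 + erf(z / sqrt(2))).

```python
import math
import random


def softplus(y):
    """g(y) = log(1 + exp(y)), written to avoid overflow for large |y|."""
    return max(y, 0.0) + math.log1p(math.exp(-abs(y)))


def softplus_inverse(t):
    """g^{-1}(t) = log(exp(t) - 1) for t > 0, in a numerically stable form."""
    return t + math.log(-math.expm1(-t))


def normal_pdf(y, mu, sigma):
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(y, mu, sigma):
    # Standard identity: Phi(z) = 0.5 * (1 + erf(z / sqrt(2))).
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))


def sample_time(weights, mus, sigmas, rng):
    """Pick a component, draw y from it, then map y to a positive time."""
    i = rng.choices(range(len(weights)), weights=weights)[0]
    y = rng.gauss(mus[i], sigmas[i])
    return softplus(y)


def survival_pdf(t, weights, mus, sigmas):
    """Mixture density at g^{-1}(t) times |d g^{-1}(t) / dt| = 1 / (1 - exp(-t))."""
    y = softplus_inverse(t)
    jacobian = 1.0 / (-math.expm1(-t))
    return jacobian * sum(w * normal_pdf(y, m, s) for w, m, s in zip(weights, mus, sigmas))


def survival_cdf(t, weights, mus, sigmas):
    """Weighted sum of Gaussian CDFs evaluated at g^{-1}(t)."""
    y = softplus_inverse(t)
    return sum(w * normal_cdf(y, m, s) for w, m, s in zip(weights, mus, sigmas))


# Hypothetical mixture parameters standing in for f_theta(x) at one input x.
weights, mus, sigmas = [0.3, 0.7], [-1.0, 2.0], [0.5, 1.5]
example_time = sample_time(weights, mus, sigmas, random.Random(0))
```

With enough samples, the empirical fraction of sampled times below a threshold should approach `survival_cdf` at that threshold, which gives a quick sanity check of the transformation.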
Denote $F(\\cdot|\\mu_i, \\sigma_i^2)$ as the CDF of the $i$-th component in the Gaussian mixture model. Denote $F(t|x)$ as the CDF of the Survival MDN and $F_{\\text{MDN}}(y|x)$ as the CDF of the underlying MDN. Since \\texttt{softplus} is an increasing invertible function, the CDF of the Survival MDN at time $t$ only requires evaluations\nof the underlying Gaussian CDFs:\n\n\\begin{align*}\n F(t|x) &= F_{\\text{MDN}}(g^{-1}(t)|x) \\\\\n &= \\int_{-\\infty}^{g^{-1}(t)} \\sum_{i=1}^K w_i(x) \\mathcal{N}\\left(y|\\mu_i(x), \\sigma_i^2(x)\\right) dy\\\\\n &= \\sum_{i=1}^K w_i(x) \\int_{-\\infty}^{g^{-1}(t)}\\mathcal{N}\\left(y|\\mu_i(x), \\sigma_i^2(x)\\right) dy \n\\\\\n &= \\sum_{i=1}^K w_i(x) F\\left(g^{-1}(t)|\\mu_i(x), \\sigma_i^2(x)\\right)\n\\end{align*}\nThe evaluation of Gaussian CDFs can be done efficiently through the error function $\\texttt{erf}(\\cdot)$, which is related to the standard normal CDF $\\Phi$ by $\\Phi(z) = \\frac{1}{2}\\left[1 + \\texttt{erf}\\left(z/\\sqrt{2}\\right)\\right]$:\n\\[\nF\\left(g^{-1}(t)|\\mu_i(x), \\sigma_i^2(x)\\right) = \\frac{1}{2}\\left[1 + \\texttt{erf}\\left(\\frac{g^{-1}(t) - \\mu_i(x)}{\\sqrt{2}\\,\\sigma_i(x)}\\right)\\right].\n\\]\nThe \\texttt{erf} function can be computed efficiently via common approximations~\\citep{abramowitz1988handbook} and the Gaussian CDF is implemented in most packages. Now we have satisfied the first two desired properties (PDF and CDF). The last property, flexibility, follows since the Survival MDN maps time-to-event densities to densities over the reals via $y =g^{-1}(t)$ and a mixture density network with enough components and a wide and deep enough network can approximate a broad class of smooth densities $p(y|x)$ as closely as desired~\\citep{bishop1994mixture}. For tabular data, the network in MDN is a feedforward neural network. Other types of networks can also be used. For example, for image data, one can use convolutional neural networks and for text data one can use transformers. 
Instead of logits for classification, these models produce the parameters of the Gaussian Mixture at the last layer in MDNs.\nIn this simulation experiment, we test Survival MDN and SODEN on a dataset where the proportional hazards assumption does not hold. We follow the simple simulation setting in SODEN \\citep{tang2020soden}. There are two groups of $x$'s, $x=0$ and $x=1$, and the ground truth survival function is:\n\\[\nS(t|x) = \\exp(-2t)\\cdot I\\{x=0\\} + \\exp(-2t^2) \\cdot I\\{x=1\\},\n\\]\nwhere $I$ is the indicator function. The survival curves of the two groups cross, so this survival distribution does not obey the proportional hazards (PH) assumption. Therefore, models that require the PH assumption cannot fit this dataset well. We generate $x$ from a Bernoulli distribution with probability 0.5 and then generate $t$ using the inverse CDF method. We sample the censoring time uniformly on $[0,2]$. Instead of simulating a fixed dataset, we use an ``online'' training method; in each iteration, we generate a new set of 1024 datapoints. We use the likelihood function for training. We train for 10,000 iterations for both SODEN and Survival MDN. \n\nWe show the resulting survival functions and ground truth in \\cref{fig:survival_func}. The survival functions of both Survival MDN and SODEN are close to the ground truth at both $x=0$ and $x=1$.\n\nIn this section, we compare Survival MDN with baselines Cox, DeepSurv, Cox-Time, Nnet-survival, DeepHit and SODEN. We use four different datasets: SUPPORT, METABRIC, GBSG and MIMIC. We evaluate all models on three different metrics: concordance, integrated binomial log-likelihood and integrated Brier score.\n\\subsection{Datasets}\nWe choose four different datasets: SUPPORT, METABRIC, GBSG and MIMIC. SUPPORT, METABRIC and GBSG are commonly used datasets for survival analysis, which can be found in the \\texttt{pycox} package. 
MIMIC is a dataset we preprocessed from MIMIC-iv~\\citep{johnson2020mimic} in PhysioNet~\\citep{goldberger2000physiobank}. We describe the details of the datasets here:\n\\begin{itemize}\n \\item SUPPORT: the Study to Understand Prognoses\nPreferences Outcomes and Risks of Treatment. It has 14 features. There are 8,873 datapoints, 32\\\n\\item METABRIC: the Molecular Taxonomy of\nBreast Cancer International Consortium. It has 9 features. There are 1,904 datapoints, 42\\\n\\item GBSG: The Rotterdam \\& German Breast Cancer Study Group. It has 7 features. There are 2,232 datapoints, 43\\\n\\item MIMIC: The Medical Information Mart for Intensive Care. The SODEN repository does not provide the data files for MIMIC. We choose patients that are alive 24 hours after admission to ICU. We define the event as mortality after admission. We define the censored time as the ICU discharged time. We collect time series features within the 24-hour window after the admission together with static features. For time series features, we use the minimum, mean and maximum within the window. We remove the features that are missing for more than half of the datapoints. Finally, we extract 65 features after preprocessing including common labs and vitals. 
There are 53,612 datapoints, 82\\\n\\end{itemize}\n\\begin{table}[h]\n \\centering\n \\begin{tabular}{cc|ccc}\n \\toprule\n $P(C>\\tau)$ & Model & $C_{\\tau}^{td}(\\uparrow)$ & IBLL$_{\\tau}$($\\uparrow$) & IBS$_{\\tau}$($\\downarrow$) \\\\ \\midrule\n $10^{-8}$ & Cox &0.596 $\\pm$ .002 & -0.568 $\\pm$ .001& 0.194 $\\pm$ .001 \\\\\n & DeepSurv &0.609 $\\pm$ .003& \\textbf{-0.559} $\\pm$ .002 & \\textbf{0.190} $\\pm$ .001\\\\\n & Cox-Time & 0.607 $\\pm$ .004 & 0.565 $\\pm$ .002 & 0.191 $\\pm$ .001 \\\\\n & Nnet-Survival & 0.624 $\\pm$ .003 & -0.570 $\\pm$ .004 &0.193 $\\pm$ .001 \\\\\n & DeepHit & \\textbf{0.631} $\\pm$ .003 & -0.583 $\\pm$ .006 & 0.197 $\\pm$ .001 \\\\\n & SODEN & 0.627 $\\pm$ .003 & -0.563 $\\pm$ .002 & 0.191 $\\pm$ .001\\\\\n & Survival MDN & 0.628 $\\pm$ .003 & \\textbf{-0.559} $\\pm$ .002 & \\textbf{0.190} $\\pm$ .002 \\\\ \\midrule\n $0.2$ & Cox &0.596 $\\pm$ .002 & -0.585 $\\pm$ .001 & 0.201 $\\pm$ .000\\\\\n & DeepSurv & 0.609 $\\pm$ .003 & -0.577 $\\pm$ .002 &0.197 $\\pm$ .001\\\\\n & Cox-Time & 0.606 $\\pm$ .004 & -0.583 $\\pm$ .002 &0.199 $\\pm$ .001 \\\\\n & Nnet-Survival & 0.623 $\\pm$ .003 & -0.586 $\\pm$ .003 & 0.201 $\\pm$ .001 \\\\\n & DeepHit & \\textbf{0.630} $\\pm$ .003 & -0.601 $\\pm$ .006 & 0.205 $\\pm$ .002 \\\\\n & SODEN & \\textbf{0.630} $\\pm$ .003 & -0.601 $\\pm$ .006 & 0.205 $\\pm$ .002 \\\\\n & Survival MDN & 0.628 $\\pm$ .003 & \\textbf{-0.575}$\\pm$ .002 & \\textbf{0.196} $\\pm$ .001 \\\\ \\midrule\n $0.4$ & Cox & 0.595 $\\pm$ .002 & -0.602 $\\pm$ .001 & 0.208 $\\pm$ .001 \\\\\n & DeepSurv & 0.608 $\\pm$ .002 & -0.595 $\\pm$ .002 &0.205 $\\pm$ .001 \\\\\n & Cox-Time & 0.605 $\\pm$ .004 & -0.601 $\\pm$ .002 & 0.207 $\\pm$ .001 \\\\\n & Nnet-Survival & 0.623 $\\pm$ .003 & -0.602 $\\pm$ .003 & 0.208 $\\pm$ .001 \\\\\n & DeepHit & \\textbf{0.630} $\\pm$ .003 & -0.619 $\\pm$ .007 & 0.212 $\\pm$ .002 \\\\\n & SODEN & 0.626 $\\pm$ .003 & -0.597 $\\pm$ .002 & 0.205 $\\pm$ .001\\\\\n & Survival MDN & 0.628 $\\pm$ .003 & 
\\textbf{-0.593} $\\pm$ .001 & \\textbf{0.204} $\\pm$ .001 \\\\ \\bottomrule\n \\end{tabular}\n \\caption{Evaluation of all models on SUPPORT with concordance ($C_{\\tau}^{td})$, integrated binomial log-likelihood (IBLL$_{\\tau}$) and integrated Brier score (IBS$_{\\tau}$). The \\textbf{bold} number indicates the best performance. We report mean $\\pm$ standard error on all metrics.}\n \\label{tab:support}\n\\end{table}\n\\subsection{Baselines}\nWe consider the following baseline models:\n\\begin{itemize}\n \\item Cox~\\citep{cox1972regression}: A linear model with the proportional hazards assumption.\n \\item DeepSurv~\\citep{katzman2018deepsurv}: A deep model with the linear function in Cox replaced by neural networks.\n \\item Cox-Time~\\citep{kvamme2019time}: A continuous time model that allows the relative risk in Cox to depend on time.\n \\item Nnet-Survival~\\citep{gensheimer2019scalable}: A discrete time model that models the conditional hazard in each time interval.\n \\item DeepHit~\\citep{lee2018deephit}: A deep discrete time model that further adds a rank-based loss to the likelihood as the training objective.\n \\item SODEN~\\citep{tang2020soden}: An ODE-based continuous time model.\n\\end{itemize}\nFor Cox, we use the implementation in the Python package \\texttt{lifelines}. For DeepSurv, Cox-Time, Nnet-Survival and DeepHit, we use the implementations in the Python package \\texttt{pycox}. 
For SODEN, we use the implementation from the SODEN repository.\n\n\\begin{table}[h]\n \\centering\n \\begin{tabular}{cc|ccc}\n \\toprule\n $P(C>\\tau)$ & Model & $C_{\\tau}^{td}(\\uparrow)$ & IBLL$_{\\tau}$($\\uparrow$) & IBS$_{\\tau}$($\\downarrow$) \\\\ \\midrule\n $10^{-8}$ & Cox &0.644 $\\pm$ .006 & -0.508 $\\pm$ .009& 0.169 $\\pm$ .002 \\\\\n & DeepSurv &0.635 $\\pm$ .007& -0.517 $\\pm$ .011 & 0.171 $\\pm$ .003\\\\\n & Cox-Time & 0.648 $\\pm$ .007 & -0.511 $\\pm$ .009 & 0.172 $\\pm$ .003 \\\\\n & Nnet-Survival & 0.666 $\\pm$ .005 & -0.510 $\\pm$ .007 &0.171 $\\pm$ .002 \\\\\n & DeepHit & \\textbf{0.674} $\\pm$ .006 & -0.514 $\\pm$ .004 & 0.174 $\\pm$ .002 \\\\\n & SODEN & 0.661 $\\pm$ .005 & -0.498 $\\pm$ .008 & 0.167 $\\pm$ .003\\\\\n & Survival MDN & 0.667 $\\pm$ .004 & \\textbf{-0.489} $\\pm$ .005 & \\textbf{0.165} $\\pm$ .002 \\\\ \\midrule\n $0.2$ & Cox & 0.639 $\\pm$ .006 & -0.521 $\\pm$ .006 &0.176 $\\pm$ .002\\\\\n & DeepSurv & 0.635 $\\pm$ .006 & -0.530 $\\pm$ .005 &0.179 $\\pm$ .002 \\\\\n & Cox-Time & 0.647 $\\pm$ .005 & -0.531 $\\pm$ .007 &0.179 $\\pm$ .002 \\\\\n & Nnet-Survival & 0.662 $\\pm$ .004 & -0.523 $\\pm$ .003 & 0.177 $\\pm$ .001 \\\\\n & DeepHit & \\textbf{0.671} $\\pm$ .004 & -0.533 $\\pm$ .003 & 0.182 $\\pm$ .001 \\\\\n & SODEN & 0.659 $\\pm$ .003 & -0.516 $\\pm$ .006 & 0.174 $\\pm$ .002 \\\\\n & Survival MDN & 0.662 $\\pm$ .004 & \\textbf{-0.510} $\\pm$ .003 & \\textbf{0.172} $\\pm$ .001 \\\\ \\midrule\n $0.4$ & Cox & 0.637$\\pm$ .006 & -0.521 $\\pm$ .006 & 0.175 $\\pm$ .002 \\\\\n & DeepSurv & 0.635 $\\pm$ .006 & -0.526 $\\pm$ .005 &0.178 $\\pm$ .002 \\\\\n & Cox-Time & 0.644 $\\pm$ .005 & -0.526 $\\pm$ .006 & 0.178 $\\pm$ .002 \\\\\n & Nnet-Survival & 0.660 $\\pm$ .003 & -0.519 $\\pm$ .003 & 0.176 $\\pm$ .001 \\\\\n & DeepHit &\\textbf{0.668} $\\pm$ .003 & -0.528 $\\pm$ .003 & 0.180 $\\pm$ .001 \\\\\n & SODEN & 0.658 $\\pm$ .004 & -0.528 $\\pm$ .003 & 0.180 $\\pm$ .001\\\\\n & Survival MDN & 0.660 $\\pm$ .002 & \\textbf{-0.508} 
$\\pm$ .003 & \\textbf{0.172} $\\pm$ .001 \\\\ \\bottomrule\n \\end{tabular}\n \\caption{Evaluation of all models on METABRIC with concordance ($C_{\\tau}^{td})$, integrated binomial log-likelihood (IBLL$_{\\tau}$) and integrated Brier score (IBS$_{\\tau}$). We report truncated metrics for $\\tau$'s satisfying $P(C>\\tau) = 10^{-8}, 0.2, 0.4$. The \\textbf{bold} number indicates the best performance. We report mean $\\pm$ standard error on all metrics.}\n \\label{tab:metabric}\n\\end{table}\n\\subsection{Evaluation Metrics}\nWe use the same evaluation metrics as SODEN~\\citep{tang2020soden}. They are concordance, integrated binomial log-likelihood and Brier score. The implementations can be found in the SODEN repository. We briefly describe the three metrics here and refer to \\citet{tang2020soden} for more detailed descriptions.\n\n\\paragraph{Concordance} The concordance index is originally proposed by \\citet{harrell1984regression}. It measures the probability that the relative order of the event time of two observations matches the predicted survival probabilities. \\citet{antolini2005time} further relaxes the proportional hazard assumption in Harrell's concordance to create time dependent concordance. \nBuilding off the inverse-weighting method in \\cite{cheng1995analysis},\n\\citet{uno2011c} \nintroduces inverse probability weighted concordance to remove the dependence on the censoring distribution. They use the survival function of the censoring time $G(t) = P(C>t)$ as the weight and the Kaplan-Meier estimator for $G(t)$. Under the completely random censoring assumption $C \\indep (T,X)$, the inverse probability weighted estimator is consistent. This assumption is\nroutinely made for evaluation, e.g.\nin \\cite{kvamme2019time, tang2020soden,han2021inverse}. Due to the limited number of observations, the estimator of the inverse weight $1/\\hat{G}(t)$ may be very large for some large-enough $t$. 
So \\citet{uno2011c} introduce a truncated version of the concordance estimator within a pre-specified time interval $[0,\\tau]$:\n\\[\nC_{\\tau}^{td} = \\frac{\\sum_{i:\\Delta_i=1, u_i < \\tau} \\sum_{j: u_i < u_j} I\\left(\\hat{S}(u_i|x_i) < \\hat{S}(u_i|x_j)\\right)/\\hat{G}^2(u_i)}{\\sum_{i:\\Delta_i=1, u_i <\\tau} \\sum_{j:u_i<u_j}1/\\hat{G}^2(u_i)},\n\\]\nwhere $I(\\cdot)$ is the indicator function. Here $\\tau$ is used to truncate the large times that have very small $\\hat{G}(t)$. We choose three $\\tau$'s that satisfy $\\hat{G}(\\tau) = 10^{-8}, 0.2, 0.4$. When $\\hat{G}(\\tau)=10^{-8}$, the truncated concordance is almost equal to the non-truncated version.\n\n\\paragraph{Integrated Brier Score} The Brier score (BS) measures the mean squared error between the ground-truth label and the predicted probability for a binary classifier. It measures both the calibration and discriminative performance \\citep{degroot1983comparison}. In survival analysis, we evaluate the Brier score at a given time $t$. The label is whether the patient survives after time $t$ and the predicted probability is the survival function. We also consider an inverse probability weighted estimator~\\citep{graf1999assessment,gerds2006consistent} for the Brier score at time $t$:\n\\[\n\\text{BS}(t) = \\frac{1}{N}\\sum_{i=1}^N \\left\\{\\frac{\\hat{S}^2(t|x_i) I(u_i \\leq t , \\Delta_i=1)}{\\hat{G}(u_i)} + \\frac{(1-\\hat{S}(t|x_i))^2 I(u_i > t)}{\\hat{G}(t)}\\right\\}.\n\\]\n\nTo consider all times, we use an integrated BS (IBS) over time interval $[0, \\tau]$:\n\\[\n\\text{IBS}_{\\tau} = \\frac{1}{\\tau}\\int_0^{\\tau} \\text{BS}(t) dt.\n\\]\nTo avoid extreme inverse weights, we also report results for $\\tau$'s that satisfy $\\hat{G}(\\tau) = 10^{-8}, 0.2, 0.4$. 
When $\\hat{G}(\\tau) = 10^{-8}$, $\\tau$ is almost equal to the maximum time in the data.\n\\begin{table}[h]\n \\centering\n \\begin{tabular}{cc|ccc}\n \\toprule\n $P(C>\\tau)$ & Model & $C_{\\tau}^{td}(\\uparrow)$ & IBLL$_{\\tau}$($\\uparrow$) & IBS$_{\\tau}$($\\downarrow$) \\\\ \\midrule\n $10^{-8}$ & Cox &0.645 $\\pm$ .009 & -0.523 $\\pm$ .009& 0.177 $\\pm$ .004 \\\\\n & DeepSurv &0.663 $\\pm$ .007& -0.509 $\\pm$ .010 & \\textbf{0.172} $\\pm$ .004\\\\\n & Cox-Time & 0.654 $\\pm$ .007 & -0.521 $\\pm$ .009 & 0.176 $\\pm$ .003 \\\\\n & Nnet-Survival & 0.661 $\\pm$ .006 & -0.516 $\\pm$ .008 &0.174 $\\pm$ .005 \\\\\n & DeepHit & 0.665 $\\pm$ .008 & \\textbf{-0.504} $\\pm$ .017 & 0.176 $\\pm$ .005 \\\\\n & SODEN & 0.661 $\\pm$ .012 & -0.514 $\\pm$ .017 & 0.173 $\\pm$ .004\\\\\n & Survival MDN & \\textbf{0.668} $\\pm$ .007 & \\textbf{-0.504} $\\pm$ .006 & \\textbf{0.172} $\\pm$ .003 \\\\ \\midrule\n $0.2$ & Cox & 0.645 $\\pm$ .009 & -0.519 $\\pm$ .007 &0.176 $\\pm$ .003\\\\\n & DeepSurv & 0.663 $\\pm$ .007 & -0.505 $\\pm$ .008 &0.170 $\\pm$ .002 \\\\\n & Cox-Time & 0.654 $\\pm$ .007 & -0.517 $\\pm$ .006 &0.175 $\\pm$ .002 \\\\\n & Nnet-Survival & 0.661 $\\pm$ .006 & -0.509 $\\pm$ .006 & 0.170 $\\pm$ .003 \\\\\n & DeepHit & 0.665 $\\pm$ .008 & -0.510 $\\pm$ .008 & 0.172 $\\pm$ .004 \\\\\n & SODEN & 0.661 $\\pm$ .012 & -0.510 $\\pm$ .009 & 0.172 $\\pm$ .004 \\\\\n & Survival MDN & \\textbf{0.668} $\\pm$ .007 & \\textbf{-0.501} $\\pm$ .006 & \\textbf{0.168} $\\pm$ .002 \\\\ \\midrule\n $0.4$ & Cox & 0.645$\\pm$ .009 & -0.519 $\\pm$ .007 & 0.176 $\\pm$ .003 \\\\\n & DeepSurv & 0.663$\\pm$ .007 & -0.505 $\\pm$ .008 &0.170 $\\pm$ .002 \\\\\n & Cox-Time & 0.654 $\\pm$ .007 & -0.517 $\\pm$ .006 & 0.175 $\\pm$ .002 \\\\\n & Nnet-Survival & 0.661 $\\pm$ .006 & -0.509 $\\pm$ .007 & 0.170 $\\pm$ .003 \\\\\n & DeepHit & 0.665 $\\pm$ .008 & -0.510 $\\pm$ .008 & 0.172 $\\pm$ .004 \\\\\n & SODEN & 0.661 $\\pm$ .012 & -0.510 $\\pm$ .009 & 0.172 $\\pm$ .004\\\\\n & Survival MDN & 
\\textbf{0.668} $\\pm$ .007 & \\textbf{-0.500} $\\pm$ .006 & \\textbf{0.168} $\\pm$ .002 \\\\ \\bottomrule\n \\end{tabular}\n \\caption{Evaluation of all models on GBSG with concordance ($C_{\\tau}^{td})$, integrated binomial log-likelihood (IBLL$_{\\tau}$) and integrated Brier score (IBS$_{\\tau}$). We report truncated metrics for $\\tau$'s satisfying $P(C>\\tau) = 10^{-8}, 0.2, 0.4$. The \\textbf{bold} number indicates the best performance. We report mean $\\pm$ standard error on all metrics.}\n \\label{tab:gbsg}\n\\end{table}\n\n\\paragraph{Integrated Binomial Log-Likelihood} Another common metric for survival analysis is the integrated binomial log-likelihood (IBLL). Unlike IBS, IBLL uses the binomial (Bernoulli) log-likelihood at each time step $t$:\n\\[\n\\text{BLL}(t) = \\frac{1}{N} \\sum_{i=1}^N \\left\\{\\frac{\\log(1-\\hat{S}(t|x_i))\\, I(u_i\\leq t, \\Delta_i=1)}{\\hat{G}(u_i)}+ \\frac{\\log(\\hat{S}(t|x_i))\\, I(u_i > t) }{\\hat{G}(t)}\\right\\}.\n\\]\n\nThe IBLL is defined by:\n\\[\n\\text{IBLL}_{\\tau} = \\frac{1}{\\tau}\\int_0^{\\tau} \\text{BLL}(t) dt.\n\\]\n We also report results for $\\tau$'s satisfying $\\hat{G}(\\tau) = 10^{-8}, 0.2, 0.4$. \n\n\\subsection{Experimental Setup}\nWe randomly split datasets into\ntraining, validation, and testing sets. We use the validation set to select the best training epoch and hyperparameters, and report the results on the test set. For SUPPORT/METABRIC/GBSG, we use 10 splits (8 for training, 1 for validation and 1 for test). For MIMIC, we use 5 splits (3 for training, 1 for validation, and 1 for test) since MIMIC is a larger dataset. We use random search to create 100 independent trials for different hyperparameters. We use the optimizer RMSProp~\\citep{tieleman2012lecture}. 
\n\n\n For Survival MDN, following \\citet{sudarshan2020deep}, we use a three-layer neural network that maps the features to a latent representation, and then from the latent representation we use three layers to output $w$'s, $\\mu$'s, $\\sigma$'s separately. We use a \\texttt{softmax} layer to ensure that the sum of $w$'s equals one and use an \\texttt{exp} function to ensure the standard deviations $\\sigma$'s are positive. We vary the number of mixture components from 5 to 20. Different architectures can be used depending on the input type.\n \n For other models, we vary the number of layers. Other hyperparameters include the hidden sizes, learning rate, batch normalization, momentum, dropout, and batch size. For DeepHit and Nnet-Survival, we vary the number of time intervals in addition. \n For other hyperparameters, we use the same tuning ranges as in \\cite{tang2020soden}. We show the tuning ranges in \\cref{sec:tune}.\n \n\\subsection{Results}\nWe report the results on the four datasets in \\cref{tab:support} (SUPPORT), \\cref{tab:metabric} (METABRIC), \\cref{tab:gbsg} (GBSG), and \\cref{tab:mimic} (MIMIC). For SUPPORT and METABRIC, we use the exact same splits as the SODEN repository so we use their results for the baselines.\n\nFor concordance, DeepHit has the best concordance on SUPPORT and METABRIC while the continuous time model Survival MDN has the best concordances on GBSG and MIMIC. For IBLL and IBS, Survival MDN has the best performance across all datasets. The IBLL and IBS care more about the exact survival probability prediction at each time. The discrete time model DeepHit may not yield an accurate estimate of the survival probability for a particular time since it does not distinguish the times inside one time interval. For the discrete time models, it is also challenging to choose the bin boundaries~\\citep{kvamme2019continuous, tang2020soden, craig2021survival}. 
The discrete models' concordance on MIMIC is worse than that of SODEN and Survival MDN. Continuous time models Survival MDN and SODEN have similar performance on concordance on the four datasets since they are both flexible continuous time models. There is little difference among $\\hat{G}(\\tau) = 10^{-8}, 0.2, 0.4$ for the concordance, IBLL and IBS on small datasets SUPPORT, METABRIC and GBSG, which is the same observation in SODEN~\\citep{tang2020soden}.\n\n\\begin{table}[h]\n \\centering\n \\begin{tabular}{cc|ccc}\n \\toprule\n $P(C>\\tau)$ & Model & $C_{\\tau}^{td}(\\uparrow)$ & IBLL$_{\\tau}$($\\uparrow$) & IBS$_{\\tau}$($\\downarrow$) \\\\ \\midrule\n $10^{-8}$ & Cox &0.642 $\\pm$ .002 & -0.211 $\\pm$ .001& 0.061 $\\pm$ .001 \\\\\n & DeepSurv &\\textbf{0.663} $\\pm$ .001& -0.212 $\\pm$ .003 & 0.061 $\\pm$ .001\\\\\n & Cox-Time & 0.653 $\\pm$ .001 & -0.210 $\\pm$ .003 & 0.061 $\\pm$ .001 \\\\\n & Nnet-Survival & 0.649 $\\pm$ .002 & -0.206 $\\pm$ .000 &0.061 $\\pm$ .001 \\\\\n & DeepHit & 0.647 $\\pm$ .002 & -0.206 $\\pm$ .001 & 0.061 $\\pm$ .001 \\\\\n & SODEN & 0.659 $\\pm$ .002 & \\textbf{-0.204} $\\pm$ .002 & 0.060 $\\pm$ .001\\\\\n & Survival MDN & 0.660 $\\pm$ .002 & \\textbf{-0.204} $\\pm$ .002 & \\textbf{0.059} $\\pm$ .001 \\\\ \\midrule\n $0.2$ & Cox & 0.711 $\\pm$ .004 & -0.473 $\\pm$ .133 &0.091 $\\pm$ .014\\\\\n & DeepSurv & 0.734 $\\pm$ .003 & -0.462 $\\pm$ .150 &0.089 $\\pm$ .015 \\\\\n & Cox-Time & 0.726 $\\pm$ .002 & -0.443 $\\pm$ .126 &0.061 $\\pm$ .001 \\\\\n & Nnet-Survival & 0.722 $\\pm$ .004 & -0.229 $\\pm$ .004 & 0.066 $\\pm$ .001 \\\\\n & DeepHit & 0.719 $\\pm$ .004 & -0.233 $\\pm$ .004 & 0.066 $\\pm$ .001 \\\\\n & SODEN & 0.733 $\\pm$ .002 & -0.229 $\\pm$ .004 & \\textbf{0.065} $\\pm$ .001 \\\\\n & Survival MDN & \\textbf{0.736} $\\pm$ .003 & \\textbf{-0.228} $\\pm$ .004 & \\textbf{0.065} $\\pm$ .001 \\\\ \\midrule\n $0.4$ & Cox & 0.780 $\\pm$ .002 & -0.588 $\\pm$ .136 & 0.071 $\\pm$ .031 \\\\\n & DeepSurv & 0.797 $\\pm$ .001 & -0.423 
$\\pm$ .202 &0.045 $\\pm$ .018 \\\\\n & Cox-Time & 0.790 $\\pm$ .002 & -0.501 $\\pm$ .267 & 0.037 $\\pm$ .010 \\\\\n & Nnet-Survival & 0.784 $\\pm$ .003 & -0.082 $\\pm$ .003 & \\textbf{0.018} $\\pm$ .001 \\\\\n & DeepHit & 0.787 $\\pm$ .003 & -0.083 $\\pm$ .002 & 0.019 $\\pm$ .001 \\\\\n & SODEN & \\textbf{0.805} $\\pm$ .005 & -0.084 $\\pm$ .002 & 0.019 $\\pm$ .001\\\\\n & Survival MDN & \\textbf{0.805} $\\pm$ .001 & \\textbf{-0.078} $\\pm$ .002 & \\textbf{0.018} $\\pm$ .001 \\\\ \\bottomrule\n \\end{tabular}\n \\caption{Evaluation of all models on MIMIC with concordance ($C_{\\tau}^{td})$, integrated binomial log-likelihood (IBLL$_{\\tau}$) and integrated Brier score (IBS$_{\\tau}$). We report truncated metrics for $\\tau$'s satisfying $P(C>\\tau) = 10^{-8}, 0.2, 0.4$. The \\textbf{bold} number indicates the best performance. We report mean $\\pm$ standard error on all metrics.}\n \\label{tab:mimic}\n\\end{table}\nThe training time of SODEN is much longer than that of Survival MDN. We measure the training time of the two models with the same hidden size (32) and number of layers (4) on METABRIC. For Survival MDN, we use 20 mixture components, the maximum in the tuning range. We show the test concordance versus the training time for Survival MDN and SODEN on a \\texttt{GeForce RTX 2080 Ti} in \\cref{fig:time_plot}. Survival MDN reaches the peak of the test concordance much faster than SODEN. On average, each epoch of Survival MDN takes 0.20 seconds while each epoch of SODEN takes 23.82 seconds. Per epoch, training Survival MDN is more than 100 times faster than training SODEN.\n\n\nSurvival modeling plays an important role in risk estimation and clinical decision making. We propose Survival MDN, a simple flexible continuous time survival model. We combine two simple yet elegant tools, mixture densities and change of variables, to produce flexible survival models. 
While recent approaches achieve similar flexibility, it is achieved at the expense of training time, complexity, and inconvenient hyper-parameters. Without introducing such complexity, Survival MDNs achieve similar or better performance.\n\n\\paragraph{Limitations} Currently, the proposed model, Survival MDN, mainly considers Gaussian Mixtures. Though Gaussian Mixtures have universal approximation power, a combination of different base distributions, e.g. generalized logistics, in mixture density networks may improve performance. \nRegarding experimental evaluation, the marginal censoring assumption used in the reweighting estimators is common practice in the literature, but may not be appropriate.\nEvaluation with censored data is impossible without assumptions, but it could be possible to improve evaluation by making conditional censoring assumptions.\n\n\\acks{\nThis work was made possible by the following grants/awards:\n\\begin{itemize}\n \\item NIH/NHLBI Award R01HL148248 \n \\item NSF Award 1922658 NRT-HDR: FUTURE Foundations, Translation, and Responsibility for Data Science.\n \\item NSF CAREER Award 2145542\n\\end{itemize}\nThe authors thank Weijing Tang, Jiaqi Ma, Qiaozhu Mei and Ji Zhu for providing a great codebase. 
The authors thank Weijing Tang for a detailed explanation of the codebase.}\n\n\n\\bibliography{references}\n\n\\appendix\n\\section{MIMIC SQL code}\n\\begin{verbatim}\nselect\n-- ids\npat.subject_id as subject_id, adm.hadm_id as hadm_id,icu.stay_id as stay_id,\n-- demographics\nCASE WHEN pat.gender=\"M\" THEN 1 ELSE 0 END as is_male,\nCASE WHEN adm.ethnicity=\"WHITE\" THEN 1 ELSE 0 END as is_white,\nicu_detail.admission_age as age,\n-- weight height \nfdw.weight ,\nfdh.height ,\n-- LOS\nicu.los as los_icu_days,\nicu_detail.los_hospital as los_hosp_days,\n-- death\n--icu_detail.icu_intime as icu_intime,\n--icu_detail.dod as dod,\nTIMESTAMP_DIFF(icu_detail.dod, icu_detail.icu_intime, HOUR) / 24 as time_to_death,\ncase \n when icu_detail.dod is null then 0 \n else 1\nend \nas death, \n-- vitals labs min max mean\nvitals.*,\nlabs.*,\nsofa.*\nfrom `physionet-data.mimic_core.patients` pat \ninner join \n `physionet-data.mimic_core.admissions` adm \n on pat.subject_id=adm.subject_id\ninner join \n `physionet-data.mimic_icu.icustays` icu \n on adm.subject_id=icu.subject_id\n and \n adm.hadm_id=icu.hadm_id \ninner join \n `physionet-data.mimic_derived.first_day_height` fdh\n on \n adm.subject_id = fdh.subject_id and icu.stay_id = fdh.stay_id \ninner join \n `physionet-data.mimic_derived.first_day_weight` fdw\n on \n adm.subject_id = fdw.subject_id and icu.stay_id = fdw.stay_id \ninner join \n `physionet-data.mimic_derived.icustay_detail` icu_detail \n on \n adm.subject_id=icu_detail.subject_id\n and \n adm.hadm_id=icu_detail.hadm_id \n and \n icu.stay_id=icu_detail.stay_id \ninner join \n `physionet-data.mimic_derived.first_day_sofa` sofa\n on \n adm.subject_id=sofa.subject_id\n and \n adm.hadm_id=sofa.hadm_id \n and \n icu.stay_id=sofa.stay_id \n\ninner join \n `physionet-data.mimic_derived.first_day_vitalsign` vitals\n on \n adm.subject_id=vitals.subject_id\n and \n icu.stay_id=vitals.stay_id\ninner join \n `physionet-data.mimic_derived.first_day_lab` labs\n on \n 
adm.subject_id=labs.subject_id\n and \n icu.stay_id=labs.stay_id\nwhere icu_detail.los_icu > 1\n and pat.gender is not null \n and adm.ethnicity is not null\n and adm.ethnicity != \"UNABLE TO OBTAIN\"\n and adm.ethnicity != \"UNKNOWN\" \n\\end{verbatim}\n\\section{Tuning Ranges of Hyperparameters}\nWe show the search range of hyperparameters in \n\\cref{tab:tune}.\n\\begin{table}[h]\n \\centering\n \\begin{tabular}{cc}\n \\\\\\toprule\n Batch size & $\\{32, 64, 128, 256\\}$ for METABRIC, GBSG \\\\\n & $\\{128, 256, 512\\}$ for SUPPORT \\\\\n & $\\{512, 1024\\}$ for MIMIC \\\\\n Number of layers & $\\{1,2,4\\}$ \\\\\n Hidden size & $[2^2, 2^7]$ \\\\\n Learning rate & $[10^{-4.5}, 10^{-1.5}]$ \\\\\n Weight decay & $[10^{-9}, 10^{-4}]$ \\\\\n Momentum & $[0.85, 0.99]$ \\\\\n Dropout & $\\{0, 0.1, 0.5\\}$ \\\\\n Batch normalization & $\\{\\text{True, False}\\}$ \\\\\n $\\alpha$ (Surrogate ranking loss in DeepHit) & $[0,1]$ \\\\\n $\\sigma$ (Surrogate ranking loss in DeepHit) & $\\{0.25, 1, 5\\}$ \\\\\n Number of intervals& \\{10, 50, 100, 200, 400\\} for SUPPORT, METABRIC, GBSG \\\\\n (DeepHit, Nnet-survival) & \\{50, 100, 200, 400, 800\\} for MIMIC \\\\\n \\bottomrule\n \\end{tabular}\n \\caption{Tuning ranges of hyperparameters}\n \\label{tab:tune}\n\\end{table}\n\n\\clearpage\n\\section{Discussion on Different Base Distributions}\n Here we compare the Gaussian base with an alternative base, the generalized logistic distribution, on marginal data-generating distributions. We use the following form of the generalized logistic distribution: \n \\[\nF(x; \\alpha) = 1 - \\frac{e^{-\\alpha x}}{(1 + e^{-x})^{\\alpha}}.\n \\]\nWe also shift the generalized logistic distribution using scale and location. In this generalized logistic distribution, we have one more parameter, $\\alpha$, which controls the magnitude of the power. \n\nWe consider three different marginal data generation cases:\n \\begin{itemize}\n \\item LogNormal distribution with $\\mu = 0.1$ and $\\sigma = 0.1$. 
The LogNormal distribution is commonly used in survival analysis. The variance of this data-generating distribution is small.\n \\item Student's t distribution with one degree of freedom, transformed to positive values through softplus. The Student's t distribution has a heavy tail.\n \\item Gamma distribution with shape 0.1 and scale 1. When the shape is smaller than one, the Gamma distribution puts a lot of mass on values close to zero. This may be hard for a mixture model to fit.\n \\end{itemize}\n\n We sample the censoring time uniformly from $[0,10]$. We again use online training, which generates a new batch of data in every update step. \n\n The result of fitting the LogNormal data is shown in \\cref{fig:lognormal}. The survival function of the Gaussian base overlaps with the ground truth, but the generalized logistic base cannot fit it well. \n \n \n The result of fitting the Student's t data is shown in \\cref{fig:studentT}. For the heavy-tailed Student's t data, both the Gaussian base and the generalized logistic base fit well, with survival functions overlapping the ground truth.\n \n \n The result of fitting the Gamma data is shown in \\cref{fig:gamma}. The generalized logistic base fits the Gamma data well, while there is some gap between the ground truth and the Gaussian base survival function. For Gamma data with a small shape, the generalized logistic base is a better choice. \n ", "images": []}
|
|
|
|
interleaved/9c83aa6b-5fdf-4aa7-94d5-986f6fe45433.json
DELETED
The diff for this file is too large to render.
See raw diff
|
|
interleaved/9ccc1adf-0e6a-49c6-9ebd-1206072217c3.json
DELETED
@@ -1 +0,0 @@
|
|
1 |
-
{"txt": "", "images": []}
|
|
|
|
interleaved/9fe3fdc9-6c09-4a04-8a84-eb04c9d865e8.json
ADDED
The diff for this file is too large to render.
See raw diff
|
|
interleaved/a5f98735-69c2-4045-ae2a-cf0b7d14b293.json
DELETED
@@ -1 +0,0 @@
|
|
1 |
-
{"txt": "", "images": []}
|
|
|
|
interleaved/a8d84fbd-67c2-4757-946c-4e588a705707.json
DELETED
@@ -1 +0,0 @@
|
|
1 |
-
{"txt": "Hybrid tomographic techniques, which use two different physical signals to obtain enhanced images, have been extensively studied.\nPhotoacoustic tomography (PAT) is the most successful example of hybrid biomedical imaging that is based on the photoacoustic effect discovered by Bell \\cite{bell80}. It offers the advantages of both pure optical and ultrasound imaging.\nPure ultrasound imaging typically provides high-resolution images with a low contrast between the cancerous and healthy tissues, whereas optical or radio-frequency electromagnetic imaging offers high contrast with low resolution. \nPAT incorporates ultrasound and optical or radio-frequency electromagnetic waves and provides high-contrast and high-resolution images.\n\nIn PAT, the object of interest is irradiated by pulsed nonionizing electromagnetic energy, causing a small level of heating in the interior. \nA pressure wave is generated by the resulting thermoelastic expansion and propagates through the object. \nThe electromagnetic energy absorbed is significantly higher in cancerous cells than in healthy tissues, and this absorbed energy represents the initial pressure.\nSpecifically, the initial pressure $f$ contains highly useful diagnostic information.\nThe pressure $p$ is measured using acoustic transducers placed along a surface completely (or partially) surrounding the object (see \\cite{ammaribjk10,ammarigjn13,kuchment14book,kuchmentk08,xuw06} or references therein).\n\nIn this section, we describe the mathematical model underlying PAT and address the key mathematical problems associated with it.\nWe assume that point-like broadband ultrasound transducers are located along the observation surface $\\Gamma$. 
\nThus, the measured data represent the values of the pressure $p(\\xx,t)$ along the observation surface $\\Gamma$, that is, $p(\\xx,t)|_{\\xx\\in \\Gamma}$.\nIt is assumed that $\\Gamma$ is the unit sphere $S^{n-1}$ and \nthe object of interest is inside the unit ball $B$.\nWith the speed of sound $c(\\xx)$ at a location $\\xx$, the following model is usually used to describe the propagating pressure wave $p(\\xx,t)$ generated in PAT for $n=2,3$:\n\\begin{equation}\\label{eq:pdeofpatorgin}\n\\begin{array}{ll}\n\\partial_t^2 p(\\xx ,t)=c(\\xx)\\triangle_{\\xx }p(\\xx ,t)\\qquad(\\xx ,t)\\in B\\times[0,\\infty)\\\\\np(\\xx ,0)=f(\\xx ) \\quad\\mbox{and}\\quad\\partial _t p(\\xx ,0)=0 \\\\\n\\partial_\\nu p(\\xx,t)=0\\quad\\mbox{on}\\quad(\\xx,t)\\in S^{n-1}\\times[0,\\infty).\n\\end{array}\n\\end{equation}\nHere $f(\\xx)$ is the required PAT image and $\\nu$ denotes the outward normal to $S^{n-1}$ (\\cite{ammarikk12,ammaridkk15}).\n\n\n\nOne of the mathematical problems associated with PAT is the recovery of the initial function $f$ from the values of the solution of the wave equation on $S^{n-1}$.\nWe define the wave forward operator as $\\mathcal W f(\\xx,t)=p(\\xx,t),$ $(\\xx,t)\\in B\\times [0,\\infty)$. 
\n\n\n\nThere are several methods for reconstructing $f$ from $\\mathcal Wf|_{S^{n-1}\\times[0,\\infty)}$ at constant speed \\cite{ammaribjk10,ammarigjn13} (several studies \\cite{anastasiozmr07,moonip18,dreierh20,finchhr07,finchpr04,moonjop16,kostlifbw01,kunyansky12,moonjmaa18,natterer12,nguyen09,sandbichlerkbbh15,moonzangerlip19} have focused on this topic, but $\n\\mathcal Wf $ satisfies the wave equation on the entire space without a boundary condition).\nIn this study, we obtain the singular value decomposition (SVD) of $\\mathcal W$ under the assumption that the speed of sound is a radial function, that is, $c(|\\xx|)$.\nIt is also reasonable to assume that the speed of sound is strictly positive, bounded away from $0$ and above, that is, $0<c_m< c(|\\xx|)< c_M$ for positive constants $c_m$ and $c_M$ \\cite{agranovskyk07,ammaridkk15,ammarikk12,stefanovu09}. \n\n\nSVD is a powerful tool used to analyze a compact operator \\cite{kazantsev15} and provides considerable insight into characterizing the range of the operator and inverting it for the corresponding inverse problems \\cite{louis84,natterer01}.\nAlthough the SVD of a spherical Radon transform has been studied previously \\cite{moonip20,quinto83}, to the best of our knowledge, this is the first study focusing on the SVD of the wave forward operator.\n\nIn Section \\ref{sec:SVD}, we present the SVD of the wave forward operator. 
To obtain the SVD, we find an orthonormal basis of $L^2(B, c(|\\xx|)^{1-n})$ consisting of the eigenfunctions of the following based on the Sturm--Liouville problem: \n $$\n c(|\\xx|)\\triangle_\\xx\\phi(\\xx)+\\mu^2_k\\phi(\\xx)=0 \\quad\\mbox{ and }\\quad\\partial_\\nu \\phi|_{S^{n-1}}=0.\n$$\n Section \\ref{sec:numerical} presents numerical simulations.\nThe continuous Galerkin finite element method (CG FEM) is used to determine the orthonormal basis $\\{\\phi\\}$ and demonstrate the validity of the inversion formula derived from the SVD.\nThis paper ends with a discussion of the wave forward operator with the Dirichlet boundary condition in Section \\ref{sec:discuss}.\n\n\n\n\\subsection{Preliminaries}\n\nIn this subsection, we introduce a certain non-separable Hilbert space on $S^{n-1}\\times [0,\\infty)$ that becomes the codomain of the wave forward operator. \nFirst, for two functions $g_1$ and $g_2$ on $[0,\\infty)$, \n$$\n\\langle g_1,g_2\\rangle_{H[0,\\infty)}=\\lim_{A\\to\\infty}\\frac2{A}\\intL_{0}^A g_1(t)\\overline{g_2(t)}{\\rm d}t\n$$\nis defined as an inner product that satisfies the following property:\n\\begin{prop}\\label{prop:hilbert1}\nFor $\\iota,\\iota'>0$,\n\\begin{equation}\\label{eq:expansionSVDH1}\n\\langle \\cos(\\iota\\cdot),\\cos(\\iota' \\cdot) \\rangle_{H[0,\\infty)}=\\delta_{\\iota,\\iota'}.\n\\end{equation}\n\\end{prop}\n\\begin{proof}\nDirect computation yields \n$$\n\\begin{array}{ll}\n\\langle \\cos(\\iota \\cdot),\\cos(\\iota \\cdot) \\rangle_{H[0,\\infty)}=\\displaystyle\\lim_{A\\to \\infty}\\frac2{A}\\intL_{0}^A \\cos^2(\\iota t){\\rm d}t=\\lim_{A\\to \\infty}\\left(1+\\frac{\\sin(2A \\iota)}{2 A\\iota}\\right)=1,\n\\end{array}\n$$\nand for $\\iota\\neq\\iota'$\n$$\n\\begin{array}{ll}\n\\langle \\cos(\\iota \\cdot),\\cos(\\iota' \\cdot) \\rangle_{H[0,\\infty)}\\displaystyle=\\lim_{A\\to \\infty}\\frac2{A}\\intL_{0}^A \\cos(\\iota t)\\cos(\\iota' t){\\rm d}t\n=\\lim_{A\\to 
\\infty}\\frac{2\\iota'\\cos(A\\iota)\\sin(A\\iota')-2\\iota\\cos(A\\iota')\\sin(A\\iota)}{(\\iota'^2-\\iota^2)A}=0.\n\\end{array}\n$$\n\\end{proof}\nFor $(l,\\iota)\\in\\mathbb Z\\times [0,\\infty)$, let \n$$\n\\Psi_{l,\\iota}(\\ttheta,t)=\\frac{e^{\\mathrm{i}l\\theta}\\cos(\\iota t)}{\\sqrt{2\\pi}} \\quad\\mbox{for}\\quad (\\ttheta,t)\\in S^1\\times [0,\\infty),\n$$\nand let $X(S^1\\times[0,\\infty))$ be the complex vector space comprising all finite linear combinations of these functions $\\Psi_{l,\\iota}$.\nFor $f,g\\in X(S^1\\times[0,\\infty))$,\n$$\n\\langle f,g \\rangle_H=\\lim_{A\\to\\infty}\\frac2{A}\\intL_{0}^A \\intL_{S^1}f(\\ttheta,t)\\overline{g(\\ttheta,t)}{\\rm d}S(\\ttheta){\\rm d}t\n$$\nis an inner product, which makes $X(S^1\\times[0,\\infty))$ a unitary space. Here, $\\bar z$\nis the complex conjugate of $z\\in\\mathbb C$. The completion of this space is a non-separable Hilbert space $H(S^1\\times[0,\\infty))$ such that $\\{\\Psi_{l,\\iota}\\}_{(l,\\iota)\\in \\mathbb Z\\times [0,\\infty)}$ is an orthonormal basis in $H(S^1\\times[0,\\infty))$ (see \\cite{rudin87}).\n\n\n\n\n\n\n\n\n\n\nTo obtain the SVD of the wave forward operator, we determine the eigenfunctions of \n $$\n c(|\\xx|)\\triangle_\\xx\\phi(\\xx)+\\mu^2_k\\phi(\\xx)=0\n\\quad\\xx\\in B\\quad\\mbox{ and }\\quad\\partial_\\nu \\phi|_{S^{n-1}}=0,\n$$\nwhich form an orthonormal basis for $L^2(B, c(|\\xx|)^{1-n})$, the $L^2$-space defined on $B$ with a weight $c(|\\xx|)^{1-n}$.\n\\label{sec:SVD}\n\nIn this section, we present the SVD of the wave forward operator $\\mathcal W$ with a radial variable speed.\n\n\\subsection{Two dimensions}\n\nFor $l\\in\\mathbb Z$, let $\\phi_{k,l}(\\xx)=h_{k,l}(|\\xx|)e^{\\mathrm{i}l\\theta_\\xx}/\\sqrt{2\\pi}$ and $\\tan\\theta_\\xx=x_2/x_1.$\nAs $\\triangle_\\xx\\phi =\\partial_{r_{}}^2\\phi+r_{ }^{-1}\\partial_{r_{}}\\phi+r_{ }^{-2}\\partial_{\\theta_{\\xx}}^2\\phi$, 
\n\\begin{equation}\\label{eq:helmholtz}\nc(|\\xx|)\\triangle_\\xx\\phi_k(\\xx)+\\mu^2_k\\phi_k(\\xx)=0\\quad\\mbox{on}\\quad \n \\xx=(r\\cos\\theta_{\\xx},r\\sin\\theta_{\\xx}) \\in B \\subset\\RR^2 \n\\end{equation}\ncan be transformed into \n$$\n c(r)\\left(\\partial_{r}^2h_{k,l}(r)e^{\\mathrm{i}l\\theta_{\\xx}}+r_{}^{-1}\\partial_{r}h_{k,l}(r)e^{\\mathrm{i}l\\theta_{\\xx}}-r_{}^{-2}l^2h_{k,l}(r)e^{\\mathrm{i}l\\theta_{\\xx}}\\right)+\\mu^2_{k,l}h_{k,l}(r)e^{\\mathrm{i}l\\theta_{\\xx}}=0,\n$$\nor equivalently\n\\begin{equation}\\label{eq:ode}\nr_{}^2\\frac{\\rm d^2}{{\\rm d}r^2}h_{k,l}(r_{})+r_{{}}^{}\\frac{\\rm d}{{\\rm d}r}h_{k,l}(r_{})+\\left(\\frac{r_{}^2\\mu_{k,l}^2}{c(r_{})}-l^2\\right)h_{k,l}(r_{})=0.\n\\end{equation}\nAfter removing the indices for convenience, \\eqref{eq:ode} can be written as\n\\begin{equation}\\label{eveq:ode}\n \\mathcal S(h)(r)= \\mu^2 r c(r)^{-1} h(r), \n\\end{equation}\nwhere \n\\begin{equation*}\n \\mathcal S(h)(r):=-\\frac{\\rm d}{{\\rm d}r} \\left(r\\frac{\\rm d}{{\\rm d}r} h(r) \\right)+\\frac{l^2}{r}h(r).\n\\end{equation*}\nIt must be noted that $ \\mathcal S$ is the Sturm--Liouville operator acting on a weighted $L^2$ space, $L^2((0,1], rc(r)^{-1})$, and \\eqref{eveq:ode} is its eigenvalue equation. Therefore, we can apply the theory of Sturm--Liouville operators. To ensure that the manuscript is concise as well as self-contained, the method of application is described in Appendix A of this paper. Using the method presented in Appendix A and applying the boundary condition, we obtain \n\\begin{equation}\\label{bc at 1}\n\\frac{\\rm d}{{\\rm d}r} h(1)=0\n\\end{equation}\nat the endpoint $r=1$. Then $\\mathcal S$ becomes a self-adjoint operator on $L^2((0,1], rc(r)^{-1})$ such that its spectrum is purely discrete (i.e., its eigenvalues are of finite multiplicities, and no continuous spectrum is observed); therefore, the set of eigenfunctions corresponding to $\\mathcal S$ forms an orthonormal basis on $L^2((0,1], rc(r)^{-1})$. 
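The radial eigenvalue problem \eqref{eveq:ode} with the Neumann condition \eqref{bc at 1} can be explored numerically. The sketch below is only an illustration and is not the CG FEM used in this paper: it assumes a cell-centred finite-volume grid on $(0,1]$, and the function name and parameters are hypothetical. The flux $r\,h'(r)$ vanishes at $r=0$ because of the factor $r$, and at $r=1$ because of the boundary condition.

```python
import numpy as np

def radial_neumann_modes(c, l=0, n_cells=400, n_modes=3):
    # Approximates -(r h')' + (l^2 / r) h = mu^2 (r / c(r)) h on (0, 1] with h'(1) = 0.
    dr = 1.0 / n_cells
    r_mid = (np.arange(n_cells) + 0.5) * dr   # cell centres
    flux = np.arange(n_cells + 1) * dr        # coefficient r at the cell faces
    flux[-1] = 0.0                            # zero flux at r = 1 (Neumann); flux[0] = 0 at r = 0
    main = (flux[:-1] + flux[1:]) / dr**2 + l**2 / r_mid
    off = -flux[1:-1] / dr**2
    K = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    w = r_mid / c(r_mid)                      # weight r / c(r) of the L^2 space
    s = 1.0 / np.sqrt(w)
    # Symmetrised generalised eigenproblem K h = mu^2 diag(w) h.
    vals, vecs = np.linalg.eigh(K * s[:, None] * s[None, :])
    mu = np.sqrt(np.clip(vals[:n_modes], 0.0, None))
    # Scale so that sum(h^2 * w) * dr = 1, the discrete weighted L^2 norm.
    h = vecs[:, :n_modes] * s[:, None] / np.sqrt(dr)
    return mu, r_mid, h
```

For the constant speed $c\equiv 1$ and $l=0$, the exact eigenfunctions are $J_0(\mu r)$ and the nonzero eigenvalues $\mu$ are the positive zeros of $J_1$ (approximately 3.8317 and 7.0156), which gives a simple check of the discretization.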
\n\n\n\n\n\n\n\n\n\n\\begin{comment}\nConsider \\eqref{eq:T} in the following Sturm--Liouville problem with boundary condition \\eqref{eq:IC}:\n\\begin{equation}\n\\displaystyle r\\frac{\\rm d^2}{{\\rm d}r^2} h(r)+\\frac{\\rm d}{{\\rm d}r}h(r)+\\left(\\frac{\\mu_{k,l}^2r}{c(r)}-\\frac{l^2}r\\right)h(r)=0\\quad\\mbox{on}\\quad (0,1]\\label{eq:SLP}.\n\\end{equation}\n\\end{comment}\n\nLet $h_{k,l}$ be the eigenfunctions of \\eqref{eveq:ode} with eigenvalue $\\mu_{k,l}$ such that for a fixed $l$, $\\mu_{1,l}<\\mu_{2,l}<\\mu_{3,l}\\cdots$. It is well known that the dimension of the eigenspace is one because of the separated boundary condition. The detailed description is provided in a previous study (Section 3.6 of \\cite{folland92}).\nThus, \n$$\n\\phi_{k,l}(\\xx)= h_{k,l}(|\\xx|)e^{\\mathrm{i}l\\theta_\\xx}/\\sqrt{2\\pi}, (k,l)\\in\\mathbb N\\times\\mathbb Z\n$$\nare the eigenfunctions of \n$$\nc(|\\xx|)\\triangle_\\xx\\phi(\\xx)+\\mu^2_k\\phi(\\xx)=0 \\quad\\mbox{ and }\\quad\\partial_\\nu \\phi|_{S^{1}}=0\n$$ \nwith eigenvalue $\\mu_{k,l}$ and they form an orthonormal basis of $L^2_{}(B,c(|\\xx|)^{-1})$, because \n$$\n||\\phi_{k,l}||_{L^2(c(\\xx)^{-1})}^2=(2\\pi)^{-1}\\intL_{0}^{2\\pi}\\intL\\half |h_{k,l}(r)e^{\\mathrm{i}l\\theta}|^2rc(r)^{-1}{\\rm d}r{\\rm d}\\theta=1.\n$$\n\nThus, for $f\\in L^2_{}(B,c(|\\xx|)^{-1})$, \n$$\nf(\\xx)=\\sum_{l\\in\\mathbb Z}\\sum_{k=1}^\\infty \\langle f,\\phi_{k,l}\\rangle \\phi_{k,l}(\\xx)\\quad\\mbox{and}\\quad \\langle f,g \\rangle=\\intL_{ B}f(\\xx)\\overline{g(\\xx)}c(\\xx)^{-1}{\\rm d}\\xx.\n$$\n\\begin{rmk}\nIf $h_{k,l}(1)=0$, then because of the uniqueness of the initial value problem of the second linear differential equations, the boundary condition $\\frac{\\rm d}{{\\rm d}r} h_{k,l}(1)=0$ (from \\eqref{bc at 1}) yields $h_{k,l}\\equiv0$. Therefore, for nonzero values, $h_{k,l}(1)\\neq0$. 
\n\\end{rmk}\n\n\n\n\n\n\n\n\n\n\n\n\n\nFor two Hilbert spaces $X$ and $Y$, let $\\mathcal A: X \\to Y$ be a linear operator.\nThe triple $\\{\\Phi_l,\\Psi_l,\\sigma_l\\}_{l\\ge 0}$ is an SVD of the operator $\\mathcal A$ if\n\\begin{itemize}\n\t\\item $\\{\\Phi_l\\}_l$ is an orthonormal basis in $X$;\n\t\\item $\\{\\Psi_l\\}_l$ is an orthonormal set in $Y$;\n\t\\item $\\{\\sigma_l\\}_l$ is a set of non-zero constants, $\\mathcal A\\Phi_l=\\sigma_l \\Psi_l.$\n\\end{itemize}\n\n\n\\begin{thm}\\label{thm:co}\nLet $\\phi_{k,l}(\\xx)= h_{k,l}(|\\xx|)e^{\\mathrm{i}l\\theta_\\xx}/\\sqrt{2\\pi}$, where $h_{k,l}$ is the solution of \\eqref{eq:ode} (with \\eqref{bc at 1}) corresponding to the eigenvalue $\\mu_{k,l}$ and $||h_{k,l}||_{L^2((0,1],rc(r)^{-1})}=1$.\nThen, \n\\begin{enumerate}\n\\item \nfor $f\\in \\operatorname{span}\\{\\phi_{k,l}:k\\in\\mathbb N,l\\in\\mathbb Z\\}$, we obtain\n$$\n\\mathcal W_{}f(\\xx,t)=\\sum_{k,l} \\langle f,\\phi_{k,l} \\rangle \\phi_{k,l}(\\xx)\\cos(\\mu_{k,l}t).\n$$ \n\\item $\\{\\phi_{k,l},\\Psi_{l,\\mu_{k,l}},h_{k,l}(1)\\}$ is an SVD of $\\mathcal W_{}\\cdot|_{S^{1}\\times[0,\\infty)}:L^2_{}(B,c(|\\xx|)^{-1})\\to H(S^1\\times[0,\\infty))$.\n\\item For $f\\in \\operatorname{span}\\{\\phi_{k,l}:k\\in\\mathbb N,l\\in\\mathbb Z\\}$, we obtain \n\\begin{equation}\\label{eq:main}\n\\langle f,\\phi_{k,l} \\rangle = |h_{k,l}(1)|^{-2} \\langle \\mathcal W_{}f,\\phi_{k,l}(\\cdot)\\cos(\\mu_{k,l} \\cdot)\\rangle_H.\n\n\\end{equation}\n\\end{enumerate}\n\\end{thm}\n\\begin{proof}\n\\textit{1}. We can easily check this.\n\n\\textit{2}. It must be noted $\\mathcal W_{}\\phi_{k,l}(\\ttheta,t)=h_{k,l}(1)\\Psi_{l,\\mu_{k,l}}(\\ttheta,t)$ and $\\{\\Psi_{l,\\iota}\\}_{(l,\\iota)\\in \\mathbb Z\\times [0,\\infty)}$ is an orthonormal basis in $H(S^1\\times[0,\\infty))$, as obtained in Proposition \n\\ref{prop:hilbert1}.\n\n\n\\textit{3}. 
Considering the right-hand side of \\eqref{eq:main}, we obtain\n$$\n\\begin{array}{l}\n \\langle \\mathcal W_{}f,\\phi_{k,l}(\\cdot)\\cos(\\mu_{k,l} \\cdot)\\rangle_H\\\\\n \\qquad\\qquad =\\displaystyle \\sum_{l',k'}\\intL_{S^1} \\overline{\\phi_{k,l}(\\ttheta)}\\phi_{k',l'}(\\ttheta) {\\rm d}S(\\ttheta) \\langle f,\\phi_{k',l'} \\rangle \\langle\\cos(\\mu_{k',l'}\\cdot),\\cos(\\mu_{k,l} \\cdot)\\rangle_{H[0,\\infty)}\\\\\n \\qquad\\qquad=\\displaystyle\\sum_{l',k'} |h_{k,l}(1)|^2 \\delta_{l,l'}\\delta_{k,k'} \\langle f,\\phi_{k,l}\\rangle,\n\\end{array}\n$$\nwhere in the first equality, we used the condition {\\textit 1}, and in the second equality, we used Proposition \\ref{prop:hilbert1} and the orthogonality of $e^{\\mathrm{i}l\\theta}$.\n \n\\end{proof}\n\n\n\n \\subsection{Three dimensions}\nIn this section, we consider the case of three dimensions ($n=3$). In this case, the same procedure, as described above, is followed (except for the slightly different eigenvalue equation \\eqref{eq:ode 3D}). 
\n \nFor $l=0,1,2,\\cdots $ and $ k=-l,\\cdots,l$, let $\\tilde\\phi_{m,lk}(\\xx)=\\tilde h_{m,l}(|\\xx|)Y_{lk}\\left(\\frac{\\xx}{|\\xx|}\\right)$, where $Y_{lk}$ are spherical harmonics.\nBecause $\\triangle_\\xx\\phi =r^{-2}\\partial_{r_{}} (r^2\\partial_{r_{}}\\phi)+r_{ }^{-2}\\triangle_{S^2}\\phi$, the equation \n\\begin{equation*}\\label{eq:helmholtz3d}\nc(|\\xx|)\\triangle_\\xx\\phi_m(\\xx)+\\mu^2_m\\phi_m(\\xx)=0\\quad\\mbox{on}\\quad B\\subset\\RR^3\n\\end{equation*}\nbecomes for $l=0,1,2,\\cdots$ and $ k=-l,\\cdots,l$,\n$$\n \\frac{c(r)}{r^{2}}\\left\\{\\frac{\\rm d}{{\\rm d}r}\\left(r^2\\frac{\\rm d}{{\\rm d}r} \\tilde h_{m,l}(r)\\right)Y_{lk}\\left(\\ttheta\\right)+\\tilde h_{m,l}(r)\\triangle_{S^2}Y_{lk}\\left(\\ttheta\\right)\\right\\}+\\tilde\\mu^2_{m,l}\\tilde h_{m,l}(r)Y_{lk}\\left(\\ttheta\\right)=0,\n$$\nor equivalently,\n\\begin{equation}\\label{eq:ode 3D}\n\\frac{\\rm d}{{\\rm d}r}\\left(r^2\\frac{\\rm d}{{\\rm d}r}\\tilde h_{m,l}(r_{})\\right)+\\left(\\frac{r_{}^2\\tilde\\mu_{m,l}^2}{c(r_{})}-l(l+1)\\right)\\tilde h_{m,l}(r_{})=0,\n\\end{equation}\nas $\\triangle_{S^2}Y_{lk}\\left(\\ttheta\\right)=-l(l+1)Y_{lk}\\left(\\ttheta\\right)$.\n\n\n\n\nSimilar to the case when $n=2$ (see Remark in Appendix A), the operator related to \\eqref{eq:ode 3D} with the boundary condition $\\frac{\\rm d}{{\\rm d}r}\\tilde h(1)=0$ is self-adjoint and its spectrum is purely discrete. Therefore, the corresponding eigenfunctions form an orthonormal basis in the weighted $L^2$-space $L^2((0,1],r^2c(r)^{-1})$. 
\n\nFurthermore, similar to the $n=2$ case, we can easily verify that $\\mathcal W_{}\\tilde\\phi_{m,lk}(\\ttheta,t)= \\tilde h_{m,l}(1)Y_{lk}(\\ttheta)\\cos(\\tilde\\mu_{m,l}t)$, and for $f\\in L^2_{}(B^1,c(|\\xx|)^{-1})$, we obtain\n$$\n\\mathcal W_{}f(\\xx,t)=\\sum_{m,l,k} \\langle f,\\tilde\\phi_{m,lk} \\rangle \\tilde \\phi_{m,lk}(\\xx)\\cos(\\tilde \\mu_{m,l}t)\n$$ \nand \n\\begin{equation*}\n\\langle f,\\tilde\\phi_{m,lk} \\rangle =\\displaystyle\\lim_{A\\to\\infty}\\frac{2}{ |\\tilde h_{m,l}(1)|^2 A}\\intL^{A}_0\\intL_{S^2} \\overline{\\tilde\\phi_{m,lk}(\\ttheta)}\\cos(\\tilde\\mu_{m,l} t)\\mathcal W_{}f(\\ttheta,t) {\\rm d}S(\\ttheta){\\rm d}t.\n\\end{equation*}\nFinally, it can be concluded that $\\{\\tilde\\phi_{m,lk}(\\xx),Y_{lk}(\\ttheta)\\cos(\\tilde\\mu_{m,l}t),\\tilde h_{m,l}(1)\\}$ is the SVD of the wave forward operator $\\mathcal W_{}\\cdot|_{S^{2}\\times[0,\\infty)}:L^2_{}(B^1,c(|\\xx|)^{-1})\\to H(S^2\\times[0,\\infty))$.\n\\begin{comment}\nFor any $f\\in L^2((0,1], r^2c(r)^{-1})$, the operator related to \n\n\\begin{equation}\\label{eq:T for 3D}\n\\displaystyle \\frac{c(r)}{r^2}\\left(-\\frac{\\rm d}{{\\rm d}r}\\left(r^2\\frac{\\rm d}{{\\rm d}r}\\right)+l(l+1)\\right)h(r)=f(r)\\quad\\mbox{on}\\quad (0,1] \n\\end{equation}\nwith the boundary condition $h'(1)=0.$ \n\\end{comment}\n\\label{sec:numerical}\n\nIn this section, we present 2D reconstruction results obtained from the data $\\mathcal W_{}f$ using the SVD of the wave forward operator $\\mathcal W_{}$ with radial variable coefficients given in Theorem \\ref{thm:co}. To numerically solve \\eqref{eq:pdeofpatorgin} and \\eqref{eq:helmholtz}, we consider a circular domain $B$ of radius 3 in 2D. For space discretization, we triangulate the circular domain $B$ with maximum mesh size $h_{max} = 0.15$. The resulting grid points in $B$ are unstructured owing to the geometry of the domain. For time discretization, we set the maximum time as $T = 800$ with a time-step size ${\\rm d}t = 0.0016$. 
To compute the eigenfunction $\\phi_{}$ of \\eqref{eq:helmholtz}, we use the continuous Galerkin (CG) FEM on the 2D circular domain $B$. The discrete problem with the CG FEM is as follows \\cite{gross07, larsson03}:\n\\begin{equation}\\label{eq:helmholtz_fem}\na(\\phi, v_h) = \\mu^2 m(\\phi, v_h) \\quad \\text{for all} \\quad v_h\\in V_h ,\n\\end{equation}\nwhere the finite element space $V_h$ is composed of piecewise linear functions and \n$$\na(u, v)=\\intL_B \\nabla u(\\xx) \\cdot \\nabla v(\\xx) {\\rm d}\\xx\\quad \\mbox{ and }\\quad m(u, v)=\\intL_B c(|\\xx|)^{-1} u(\\xx) v(\\xx) {\\rm d}\\xx.\n$$\n\nTo generate data from the phantom distribution $f$ and orthonormal basis set $\\{\\Psi_k\\}$, we must numerically compute $\\mathcal W_{}f$ and $\\mathcal W \\phi$. Therefore, we consider the following discrete wave propagation using the CG FEM with a backward-Euler scheme \\cite{gross07, larsson03}:\n\\begin{equation}\\label{eq:pat_fem}\n\\begin{array}{rl}\n\\displaystyle c(\\xx)^{-1}\\frac{1}{\\Delta t ^2}\\left(p_h^{n+1}-2p_h^n+p_h^{n-1}, v_h\\right)+ a\\left( p_h^{n+1}, v_h\\right)&=0\\\\\n\\displaystyle p_h^0=f~ \\mbox{or}~ \\phi_k \\quad\\mbox{and}\\quad\n\\frac{1}{\\Delta t}\\left(p_h^1 - p_h^{0}\\right) &=0,\n\\end{array}\n\\end{equation}\nfor all $v_h \\in V_h$.\nHere, the bilinear form $a$ and finite element space $V_h$ are the same as those in \\eqref{eq:helmholtz_fem}. \n\n\n\nWe tested our methods on two different distributions of the coefficient $c(|\\xx|)$, as shown in Figure \\ref{fig:coef}. The coefficient in Figure \\ref{fig:coef}(a) is continuous and defined as $c_1(|\\xx|)=1/(1+|\\xx|^2)$. The coefficient in Figure \\ref{fig:coef}(b) takes the values one and five in the blue and red regions, respectively. These two coefficients reproduce the fact that the speed of the ultrasonic wave continuously decreases as it moves away from the center and it is discontinuous before and after passing through a specific object. 
\n \n\\begin{algorithm}\n\\caption{SVD Reconstruction algorithm}\\label{SVD algorithm}\n\\begin{algorithmic}[1]\n\\Procedure{SVD}{$f$}\\Comment{The phantom distribution $f$}\n \\State $\\mathbf{b}\\approx \\mathcal Wf|_{S^{1}\\times[0,\\infty)}$ \\Comment{Solving \\eqref{eq:pat_fem} with the initial condition $f$}\n \\State Find eigenfunctions $\\phi_{k}$ and eigenvalues $\\mu_k$ \\Comment{Solving \\eqref{eq:helmholtz_fem}}\n \\For{$k = 1,\\dots,N$}\\Comment{$N$ is the number of eigenfunctions}\n \\State Compute $\\Psi_{k} \\approx \\mathcal W_{}\\phi_{k}|_{S^{1}\\times[0,\\infty)}$ \\Comment{Solving \\eqref{eq:pat_fem} with the initial condition $\\phi_k$}\n \\State $A(:,k) \\gets \\Psi_{k}$\n \\EndFor\n \\State Solve $AX=\\mathbf{b}$\n \\State Reconstruction $\\tilde{f} = \\sum_{k=1}^{N} x_k \\phi_{k}$\n \\State \\textbf{return} $\\tilde{f}$ \\Comment{Reconstruction image of $f$ is $\\tilde{f}$}\n\\EndProcedure\n\\end{algorithmic}\n\\end{algorithm}\n\nAlgorithm \\ref{SVD algorithm} describes the image reconstruction procedure using the SVD of the wave forward operator $\\mathcal W$. \n\nTo obtain the data, we use the phantom distribution $f_1$ or $f_2$ as an initial condition, solve \\eqref{eq:pat_fem}, and then generate the data $\\mathbf{b}$ by restricting the solution to the sensing region.\nTo generate the orthonormal basis $\\phi_k$ of Theorem \\ref{thm:co}, we solve the discrete problem in \\eqref{eq:helmholtz_fem}. Additionally, we solve \\eqref{eq:pat_fem} with the initial condition $\\phi_k$, in the same way as when generating the data $\\mathbf{b}$, to obtain the function $\\Psi_k$ defined in the sensing region. We then assemble the matrix $A$ whose $k$-th column is $\\Psi_k$ and solve the linear system $AX=\\mathbf{b}$. Finally, the image is numerically reconstructed as the linear combination of the orthonormal basis $\\{\\phi_k\\}$ with the coefficients given by the solution $X$. 
\n \nFigure \\ref{fig:recon} shows the reconstruction results of the SVD algorithm described in the previous subsection applied to two different data cases $f_1$ and $f_2$ for radial variable coefficients. The first initial distribution $f_1$ can verify the discontinuous properties of an object by matching the coefficient $c_2(|\\xx|)$, and the second $f_2$ can verify whether it is generally applicable to more diverse shapes. \nIn Figure \\ref{fig:recon}, the second column shows the reconstruction results for the continuous coefficient $c_1(|\\xx|)$, and the third column shows the reconstruction results for the discontinuous coefficients $c_2(|\\xx|)$. For the reconstruction of the phantom distributions $f_1$ and $f_2$, we used 1473 and 1533 orthonormal basis functions $\\Psi_k$, respectively. The reconstruction results indicate that the SVD algorithm yields good performance, regardless of the continuity of the coefficients. \n \n\n\\label{sec:discuss}\n\nIn this study, we determine the SVD of the wave forward operator with the Neumann boundary condition and radial variable speed.\nTo obtain the SVD, we first find an orthonormal basis $L^2_{}(B,c(|\\xx|)^{-1})$ consisting of the eigenfunctions of \n $ c(|\\xx|)\\triangle_\\xx\\phi(\\xx)+\\mu^2_k\\phi(\\xx)=0$ and $\\partial_\\nu \\phi|_{S^{n-1}}=0.$\n Subsequently, after obtaining $\\mathcal W \\phi$, we show that $\\mathcal W \\phi$ is orthogonal.\n\nThis strategy can be applied to the wave forward operator with the Dirichlet boundary condition and radial variable speed.\n\nSpecifically, the wave forward operator {\\color{red} $\\mathcal W_D$} satisfies the wave equations with $\\mathcal W_Df(\\xx ,0)=f(\\xx )$, $\\partial _t \\mathcal W_Df(\\xx ,0)=0$, and $\\mathcal W_Df(\\ttheta,t)=0.$ \nFinally, when the data can be represented as $\\partial_\\nu\\mathcal W_Df(\\ttheta,t)$, the SVD of $\\partial_\\nu\\mathcal W_D$ for $n=2$ can be obtained using the following steps.\n\\begin{enumerate}\n\\item It can be shown 
that there exist eigenfunctions $h_{k,l,D}(r)$ of \n\\begin{eqnarray*}\n&\\displaystyle\\frac{c(r)}r\\left(-\\frac{\\rm d}{{\\rm d}r}\\left(r\\frac{\\rm d}{{\\rm d}r}\\right)+\\frac{l^2}r\\right)h(r)=\\mu_{k,l,D}^2h(r) \\,\\, \\textrm{ with } \\,\\, h(1)=0\n\\end{eqnarray*}\n(with eigenvalues $\\mu_{k,l,D}$) such that they form an orthonormal basis of $L^2_{}((0,1],rc(r)^{-1})$.\n\\item For $\\phi_{k,l,D}=h_{k,l,D}(|\\xx|)e^{\\mathrm{i}l\\theta_\\xx}/\\sqrt{2\\pi}$, $\\partial_\\nu\\mathcal W_D\\phi_{k,l,D}=\\frac{\\rm d}{{\\rm d}r}h_{k,l,D}(1)\\Psi_{l,\\mu_{k,l,D}}$. \n\\item For $f\\in L^2_{}(B^1,c(|\\xx|)^{-1})$, we obtain\n$$\n\\mathcal W_{D}f(\\xx,t)=\\sum_{k,l} \\langle f,\\phi_{k,l,D}\\rangle \\phi_{k,l,D}(\\xx)\\cos(\\mu_{k,l,D}t).\n$$ \n\\item $\\{\\phi_{k,l,D},\\Psi_{l,\\mu_{k,l,D}},\\frac{\\rm d}{{\\rm d}r}h_{k,l,D}(1)\\}$ is an SVD of $\\partial_\\nu\\mathcal W_{D}\\cdot|_{S^{1}\\times[0,\\infty)}:L^2_{}(B^1,c(|\\xx|)^{-1})\\to H(S^1\\times[0,\\infty))$.\n\n\n\n\n\n\\end{enumerate}\nA similar procedure can be followed to obtain the corresponding result for $\\partial_\\nu\\mathcal W_D$ for $n=3$.\n\n\n\\appendix\nIn this section, we show that there exist eigenfunctions $h_{k,l}(r)$ of \n\\begin{eqnarray}\n&\\displaystyle \\frac{\\rm d}{{\\rm d}r}\\left(r\\frac{\\rm d}{{\\rm d}r}\\right)h(r)+ \\frac {\\mu^2 r}{c(r)}h(r)-\\frac{l^2}r h(r)=0\\quad\\mbox{on}\\quad (0,1]\\label{eq:T1}\\\\\n& h(0+) \\mbox{ exists and is finite,} \\quad h(1_{})=0 \\quad[h'(1)=0].\\label{eq:IC}\n\\end{eqnarray}\n with eigenvalues $\\mu_{k,l}$ such that they form an orthonormal basis of $L^2_{}((0,1],rc(r)^{-1})$.\n\n{\\color{red}Let $\\phi(r;\\mu)$ and $\\psi(r;\\mu)$ be two solutions of \\eqref{eq:T1}, and $\\phi( 0+;\\mu)$ exist and be finite.??}\nThen, the Wronskian of $\\phi$ and $\\psi$ satisfies $W'[\\phi,\\psi](r;\\mu)=-W[\\phi,\\psi](r;\\mu)/r$. 
Thus, $W[\\phi,\\psi](r)=\\frac{C}r$ $C>0$.\nLet $v_0(r;\\mu)=C_0\\phi(r;\\mu)$ and \n$$v_1(r;\\mu)=\\phi(1;\\mu)\\psi(r;\\mu )-\\psi(1;\\mu)\\phi(r;\\mu )\\qquad [v_1(r;\\mu)=\\phi'(1;\\mu)\\psi(r;\\mu )-\\psi'(1;\\mu)\\phi(r;\\mu )].$$\nThen, $v_1(1;\\mu)=0 [v_1'(1;\\mu)=0]$, $v_1'(r;\\mu)=\\phi(1;\\mu)\\psi'(r;\\mu )-\\psi(1;\\mu)\\phi'(r;\\mu)$, and \n$$\n\\begin{array}{ll}\nW[v_0,v_1](r;\\mu)&\\displaystyle =v_0(r;\\mu)v_1'(r;\\mu)-v_0'(r;\\mu)v_1(r;\\mu)\\\\\n&\\displaystyle=C_0\\phi(r;\\mu)\\{\\phi(1;\\mu)\\psi'(r;\\mu )-\\psi(1;\\mu)\\phi'(r;\\mu)\\}-C_0\\phi'(r;\\mu)\\{\\phi(1;\\mu)\\psi(r;\\mu )-\\psi(1;\\mu)\\phi(r;\\mu )\\}\\\\\n&\\displaystyle =C_0\\phi(1;\\mu)\\{\\phi(r;\\mu)\\psi'(r;\\mu )-\\phi'(r;\\mu) \\psi(r;\\mu )\\} \\\\\n&= C_0\\phi(1;\\mu)W[\\phi,\\psi](r;\\mu)= CC_0\\phi(1;\\mu)r^{-1}.\n \\end{array}\n $$\n $$\n\\left[\\begin{array}{ll} W[v_0,v_1](r;\\mu)\n&\\displaystyle=C_0\\phi(r;\\mu)\\{\\phi'(1;\\mu)\\psi'(r;\\mu )-\\psi'(1;\\mu)\\phi'(r;\\mu)\\}-C_0\\phi'(r;\\mu)\\{\\phi'(1;\\mu)\\psi(r;\\mu )-\\psi'(1;\\mu)\\phi(r;\\mu )\\}\\\\\n\n&\n= CC_0\\phi'(1;\\mu)r^{-1}.\\end{array}\\right]\n$$\n{\\color{red} In addition, $v_0$ and $v_1$ are analytic.?? }\nIf $\\mu_*$ exists such that $W[v_0,v_1](r;\\mu_*)=0$, then $v_0$ and $v_1$ are constant multiples of one another. Such solutions are precisely the eigenfunctions for Eqs. \\eqref{eq:T1} and \\eqref{eq:IC}. Hence, we obtain $\\phi(1;\\mu_*)=0$ $[\\phi'(1;\\mu_*)=0]$.\nLet $\\mu_k$ be the eigenvalues of Eqs. \\eqref{eq:T1} and \\eqref{eq:IC}, with $\\mu_1\\le \\mu_2\\le \\cdots$.{\\color{red} (Uncountable? 
Identity theorem of analytics?)}\nThe Green's function for \\eqref{eq:T1} is given by \n$$\n\\begin{array}{ll}\nG(r,s,\\mu)&\\displaystyle =\\frac{v_0(r_-;\\mu)v_1(r_+;\\mu)}{s W[v_0,v_1](s;\\mu)},\\qquad r_-=\\min\\{r,s\\},r_+=\\max\\{r,s\\} \\\\\n&\\displaystyle =\\frac{\\phi(r_-;\\mu )\\{\\phi(1;\\mu)\\psi(r_+;\\mu)-\\psi(1;\\mu)\\phi(r_+;\\mu)\\}}{CC_0\\phi(1;\\mu)}\\\\\n&\\displaystyle =-\\sum^\\infty_{k=1} \\operatorname{Res}_{\\zeta=\\mu_{k,l}^2}\\frac{\\psi(1;\\zeta^\\frac12)\\phi(r_-;\\zeta^\\frac12 )\\phi(r_+;\\zeta^\\frac12 )}{CC_0(\\mu^2-\\zeta)\\phi(1;\\zeta^\\frac12)}\\mbox{\\color{red}analytic? }\\\\\n&\\displaystyle =-\\sum^\\infty_{k=1}\\frac{\\psi(1;\\mu_{k,l})\\phi(r;\\mu_{k,l})\\phi(s;\\mu_{k,l})}{CC_0(\\mu^2-\\mu_{k,l}^2)\\partial_\\mu \\phi(1;\\mu_{k,l})\\color{red}\\partial_\\mu ?} \\mbox{(see \\cite[(10.15)]{folland94})},\n\\end{array}\n$$\n$$\n\\left[G(r,s,\\mu)\\displaystyle =-\\sum^\\infty_{k=1}\\frac{\\psi'(1;\\mu_{k,l})\\phi(r;\\mu_{k,l})\\phi(s;\\mu_{k,l})}{CC_0(\\mu^2-\\mu_{k,l}^2)\\partial_\\mu^2 \\phi(1;\\mu_{k,l})} \\right]\n$$\nwhere, in the first equality, we used the method of variation of parameters.\nLet $h\\in C^2[0,1]$ with $h(0)=h(1)=0$. Suppose that $\\mu^2$ is not an eigenvalue and that $f(r)=\\frac{\\rm d}{{\\rm d}r}\\left(r\\frac{\\rm d}{{\\rm d}r}\\right)h(r)-\\frac{l^2}r h(r)+\\frac{\\mu^2 r}{c(r)} h(r)$. 
\nThen, we have \n\\begin{equation}\\label{eq:complete}\n\\begin{array}{ll}\n\\displaystyle h(r)=\\intL^1_0 G(r,s,\\mu) f(s){\\rm d}s&\\displaystyle =-\\sum^\\infty_{k=1}\\frac{\\psi(1;\\mu_{k,l})\\phi(r;\\mu_{k,l} )}{CC_0(\\mu^2-\\mu_{k,l})\\partial_\\mu \\phi(1;\\mu_{k,l})}\\intL^1_0 \\phi(s;\\mu_{k,l} )f(s){\\rm d}s.\\\\\n\\displaystyle &\\displaystyle \\left[=-\\sum^\\infty_{k=1}\\frac{\\psi'(1;\\mu_{k,l})\\phi(r;\\mu_{k,l} )}{CC_0(\\mu^2-\\mu_{k,l})\\partial_\\mu^2 \\phi(1;\\mu_{k,l})}\\intL^1_0 \\phi(s;\\mu_{k,l} )f(s){\\rm d}s.\\right]\n\\end{array}\n\\end{equation}\nBecause $\\phi(r;\\mu)$ satisfies \n$$\n\\displaystyle \\frac{\\rm d}{{\\rm d}r}\\left(r\\frac{\\rm d}{{\\rm d}r}\\right)\\phi(r;\\mu)-\\frac{l^2}r \\phi(r;\\mu)+ \\frac {\\mu^2 r}{c(r)}\\phi(r;\\mu)=\\frac {(\\mu^2-\\mu_{k,l}^2) r}{c(r)}\\phi(r;\\mu),\n$$\nintegration by parts yields\n$$\n\\intL^1_0 \\phi(s;\\mu_{k,l} )f(s){\\rm d}s=(\\mu^2-\\mu_{n,l}^2)\\intL^1_0 \\phi(s;\\mu_{k,l})h(s)\\frac{s}{c(s)}{\\rm d}s.\n$$\nThus, \\eqref{eq:complete} can be written as\n$$\n\\begin{array}{ll}\n\\displaystyle h(r)=\\sum^\\infty_{k=1}c_k \\phi(r;\\mu_{k,l})\\quad\\mbox{where}\\quad &\\displaystyle c_k=\\frac{2\\psi(1;\\mu_{k,l})}{CC_0\\partial_\\mu \\phi(1;\\mu_{k,l})}\\intL^1_0 \\phi(s;\\mu_{k,l})h(s)\\frac{s}{c(s)}{\\rm d}s.\\\\\n&\\displaystyle \\left[c_k=\\frac{\\psi'(1;\\mu_{k,l})}{CC_0\\partial_\\mu^2 \\phi(1;\\mu_{k,l})}\\intL^1_0 \\phi(s;\\mu_{k,l})h(s)\\frac{s}{c(s)}{\\rm d}s.\\right]\n\\end{array}\n$$\nA routine limiting argument then shows that this expansion is valid for an arbitrary $h\\in L^2_{}((0,1],rc(r)^{-1})$. {\\color{red}Orthonormality....}\n\\end{comment}", "images": []}
|
|
|
|
interleaved/b5daf6d0-df91-4fac-9835-4deefa717737.json
DELETED
@@ -1 +0,0 @@
|
|
1 |
-
{"txt": "The Teichm\\\"{u}ller space $\\CT(S)$ of a surface $S$ of finite topological type, with no boundary and of negative Euler characteristic $\\chi(S)$ is the space of isotopy classes of (complete and finite volume) Riemannian metrics on $S$ of constant curvature $-1$. Teichm\\\"{u}ller space is not compact but Thurston showed in \\cite{Thu} how it can be compactified by the space $\\BP_+\\CM\\CL(S)$ of projective measured laminations on $S$. The starting point of Thurston's compactification is the embedding of $\\CT(S)$ into the space $\\BP_+(\\BR_+^{\\FC(S)})=\\Lfaktor{\\BR_+^{\\FC(S)}}{\\BR_+}$:\n\t\\begin{center}\n\t\t$\\begin{array}{ccccccccccc}\n\t\t\\ell & : & \\mathcal{T}(S) & \\to & \\BP_+(\\BR_+^{\\FC(S)}) \\\\\n\t\t& & X & \\mapsto & \\BR_+\\ell_X(\\cdot); \\\\\n\t\t\\end{array}$\n\t\\end{center}\n\tHere $\\ell_X$ is the length function associated to the hyperbolic structure $X$ on $S$ and $\\FC(S)$ is the set of free homotopy classes of essential closed curves of $S$. What Thurston did is to prove that the image of $\\ell$ is locally compact and to identify the boundary of $\\CT(S)$ in $\\BP_+(\\BR_+^{\\FC(S)})$ with $\\BP_+\\CM\\CL(S)$.\n\t\n\t\\begin{theo*}[Thurston's compactification]\n\t\tIf $S$ is a finite analytic type surface with negative Euler characteristic, then the accumulation points of $\\CT(S)$ in $\\BP_+(\\BR_+^{\\FC(S)})$ are the projective classes of functions $\\gamma\\mapsto i(\\lambda,\\gamma)$ where $\\lambda\\in\\CM\\CL(S)$ is a measured lamination on $S$.\n\t\\end{theo*}\n\tThurston's original proof is explained in \\cite{ast}. Some versions using real-trees are given by Morgan\u2013Shalen \\cite{MS}, Bestvina \\cite{Bestvina} or Paulin \\cite{Paulin2}. An overview of the different compactification methods is availlable in \\cite{Paulin} or \\cite{Ohshika}. 
A compactification of the set of flat structures using geodesic currents is given in \\cite{DLR}; note that this article treats both compact and non-compact surfaces. Here, we will be mostly interested in a very elegant argument, for closed surfaces, due to Bonahon \\cite{Bon88}. Let's sketch the proof. Recall that geodesic currents are $\\pi_1(S)$-invariant Radon measures on the set of bi-infinite geodesics of the universal cover of $S$. Bonahon embeds $\\CT(S)$ into the space $\\CC(S)$ of geodesic currents of $S$, sending each element $X\\in\\CT(S)$ of the Teichm\\\"{u}ller space to the associated Liouville current $L_X\\in\\CC(S)$. The Liouville current satisfies two important properties:\n\t\\begin{align}\n\t& i(L_X,\\gamma)=\\ell_X(\\gamma) \\quad \\text{for every essential closed curve } \\gamma\\text{, and} \\label{Prop1} \\\\\n\t& i(L_X,L_X)=\\pi^2|\\chi(S)|.\\label{Prop2}\n\t\\end{align}\n\tHere, $i : \\CC(S)\\times\\CC(S)\\to\\BR_+$ is the intersection form, a continuous bilinear map extending the usual geometric intersection number between curves. Compactness of $S$ implies compactness of the space $\\BP_+\\CC(S)$ of projective currents. It follows that each sequence $(X_n)_{n\\in\\BN}$ in Teichm\\\"{u}ller space admits a subsequence, say the whole sequence, which projectively converges to a non-zero current $\\mu$, meaning that there are positive reals $\\varepsilon_n$ such that $\\lim\\limits_{n\\to\\infty}\\varepsilon_nL_{X_n}=\\mu$. The continuity of $i$ and property (\\ref{Prop1}) ensure that the length functions $\\ell_{X_n}(\\cdot)$ converge projectively to $i(\\mu,\\cdot)$. Moreover, $\\varepsilon_n$ tends to zero unless $X_n$ converges in $\\CT(S)$. Knowing that $\\varepsilon_n\\xrightarrow[n\\to\\infty]{}0$, property (\\ref{Prop2}) ensures that $i(\\mu,\\mu)=0$, meaning that $\\mu$ is a measured lamination, as we needed to prove. 
\n\t\n\t\n\tWe stress that Bonahon's argument, with all its simplicity, only applies to closed surfaces. We will come back later to this specificity and to the obstructions to a direct extension of his argument. Recently, Bonahon and \\v{S}ari\\'{c} have given another proof of this theorem using geodesic currents. The arguments in \\cite{Bon21} are geared to infinite type surfaces; it is worth noting that working in such a general context entails the loss of the simplicity of Bonahon's original proof.\n\t\n\tOur goal here is to adapt Bonahon's original argument to be able to deal with non-compact surfaces of finite analytic type.\n\t\n\t\n\tLet's look at the difficulties that prevent the extension of Bonahon's proof to the non-compact case. The intersection form, especially its continuity, is the linchpin of Bonahon's original proof. However, continuity fails when the surface is not compact, even if it has finite area (see \\cite{DS} or Example \\ref{NCI} below). We will therefore change our point of view to allow us to benefit from the continuity of $i$. We will consider currents on $\\Sigma$ instead of $S$, where $\\Sigma$ is a compact hyperbolic surface with geodesic boundary whose interior is homeomorphic to $S$, that is $S=\\Sigma\\setminus\\partial\\Sigma$. The second key ingredient of Bonahon's proof is the existence of the Liouville current but, as we will see, when working with currents on $\\Sigma$ we lose the Liouville current.\n\t\n\t\\begin{named}{Proposition~\\ref{NLC}} Let $\\Sigma$ be a compact hyperbolic surface with non-empty boundary and $X$ a hyperbolic structure on $S=\\Sigma\\setminus \\partial\\Sigma$. 
There is no current $L_X$ on $\\Sigma$ which satisfies $i(L_X,\\gamma)=\\ell_X(\\gamma)$ for every essential closed curve $\\gamma\\in\\FC(\\Sigma)$.\n\t\\end{named}\n\tIn order to recover a version of properties (\\ref{Prop1}) and (\\ref{Prop2}), we will, for every hyperbolic structure $X$ on $S$, replace the Liouville current $L_X$ by specific sequences of random geodesics $(\\gamma^{(X)}_n)_{n\\in\\BN}$, that is, sequences of essential closed geodesics whose associated probability measures in $T^1X$ converge to the Liouville measure with respect to the weak-$*$ topology. They will be chosen to satisfy (\\ref{Prop1}) and (\\ref{Prop2}) asymptotically, that is:\n\t\n\t\\begin{align}\n\t&\\lim\\limits_{n\\to\\infty} i\\left(\\frac{\\gamma_n}{\\ell_X(\\gamma_n)},\\gamma\\right)=\\frac{\\ell_X(\\gamma)}{\\pi^2|\\chi(S)|} \\quad \\text{for every essential closed curve }\\gamma,\\label{Prop1Bis}\n\t\\end{align}\n\t\n\t\\begin{align}\n\t&\\lim\\limits_{n\\to\\infty} i\\left(\\frac{\\gamma_n}{\\ell_X(\\gamma_n)},\\frac{\\gamma_n}{\\ell_X(\\gamma_n)}\\right)=\\frac{1}{\\pi^2|\\chi(S)|}. \\label{Prop2Bis}\n\t\\end{align}\n\tAs discussed in \\cite{ES}, any sequence of random geodesics $(\\gamma_n)_{n\\in\\BN}$ satisfies (\\ref{Prop1Bis}). Moreover, if the surface is compact then (\\ref{Prop2Bis}) is ensured for every sequence of random geodesics. However, for a non-compact surface, arbitrary sequences of random geodesics do not necessarily satisfy (\\ref{Prop2Bis}); see Example \\ref{DivSelfInt} below. Indeed, a large part of this article will be dedicated to building sequences of random geodesics satisfying this property for non-compact surfaces. 
\n\t\n\t\\begin{theo}\\label{Resultat}\n\t\tFor every complete and finite area hyperbolic structure $X$ on a finite analytic type surface $S$ of negative Euler characteristic $\\chi(S)$, there is a sequence $(\\Curve{X}{n})_{n\\in\\BN}$ of random geodesics such that:\n\t\t\\begin{align*}\n\t\t\\lim\\limits_{n\\to\\infty} i\\left(\\frac{\\Curve{X}{n}}{\\ell_X(\\Curve{X}{n})},\\frac{\\Curve{X}{n}}{\\ell_X(\\Curve{X}{n})}\\right)=\\frac{1}{\\pi^2|\\chi(S)|}.\n\t\t\\end{align*}\n\t\\end{theo}\n\t\\noindent Theorem~\\ref{Resultat} is actually part of a more technical result, Theorem~\\ref{Lemma}, that we will prove in section 3. The main additional content of Theorem~\\ref{Lemma} is to ensure that the convergence rates in (\\ref{Prop1Bis}) and (\\ref{Prop2Bis}) hold with no dependence on the structure $X$. This uniformity will be important to complete the proof of Thurston's compactification in section 4. Moreover, the proof of the theorem also ensures that we can control the behavior of sequences of random geodesics in cusp neighborhoods. \n\t\n\t\n\t\n\t\\subsection*{Acknowledgements} \n\tI am grateful to Juan Souto for our discussions and for all his suggestions about this paper. I would like to thank the people who gave me their feedback on the first version of this document, especially Francis Bonahon, Fr\u00e9d\u00e9ric Paulin, Beatrice Pozzetti, and Dylan Thurston for their remarks and Didac Martinez-Granado and Arya Vadnere for their questions.\n\tI also want to thank the PhD students of the IRMAR in ergodic theory for our debates on hyperbolic geometry, Barbara Schapira for her help with computations and Jing Tao for the time she gave me, her advice and her comments on my work. Finally, I thank the anonymous referee for their careful review of this document.\nIn this section, we give some technical results and definitions. We refer the reader to \\cite{Bon88}, \\cite{Bon86} and \\cite{ES} for details. 
From now on, let $S$ be a non-compact surface of finite analytic type, with negative Euler characteristic $\\chi=\\chi(S)<0$. We denote by $X,X',X_n...$ points in the Teichm\\\"{u}ller space of $S$, or maybe just the underlying complete and finite area hyperbolic structure. Note that, although not specified, all the hyperbolic structures are complete and finite area. Moreover, we will write $Z$ to refer indifferently to any finite area hyperbolic surface, possibly with punctures or with geodesic boundaries.\n\tIf $S$ is endowed with a hyperbolic structure $X$ then every free homotopy class of essential closed curves contains a unique geodesic representative, so we identify a class with its geodesic representative when the hyperbolic structure is fixed. We will denote by $\\FC(S)$ the set of free homotopy classes of essential closed curves -by essential we mean non-null-homotopic and non-peripheral- or equivalently the set of essential closed geodesics. Let also $\\Sigma$ be a compact hyperbolic surface with geodesic boundary whose interior is homeomorphic to $S$. We fix a homeomorphism between $S$ and $\\Sigma\\setminus\\partial\\Sigma$. 
This homeomorphism immediately induces a correspondence between the essential closed curves of $S$ and the ones of $\\Sigma$, that is \n\t\\begin{equation} \\label{CorresCurves}\n\t\\FC(S)=\\FC(\\Sigma).\n\t\\end{equation}\n\t\\p \n\tThe homeomorphism $S=\\Sigma\\setminus\\partial\\Sigma$ also gives an identification between measured laminations of $S$ and the ones of $\\Sigma$ supported by $\\Sigma\\setminus\\partial\\Sigma$:\n\t\\begin{equation}\\label{CorresLam}\n\t\\CM\\CL(S)=\\{\\lambda\\in\\CM\\CL(\\Sigma)|\\lambda \\text{ supported by } \\Sigma\\setminus\\partial\\Sigma\\}.\n\t\\end{equation}\n\tThe identifications (\\ref{CorresCurves}) and (\\ref{CorresLam}) will allow us to work on $\\Sigma$ rather than on $S$.\n\t\n\t\n\t\\subsection{Currents on surfaces}\n\tWe now recall a few properties of currents that we will need in the following. A \\textit{geodesic current} on $Z$ is a $\\pi_1(Z)$-invariant Radon measure on the set of bi-infinite geodesics on the universal cover $\\Tilde{Z}$ of $Z$ (even if the surface has non-empty boundary). The space $\\CC(Z)$ of geodesic currents on $Z$ was introduced by Bonahon in \\cite{Bon86} and is endowed with the weak-$*$ topology. For more information on currents we refer to \\cite{Bon86}, \\cite{Bon88}, \\cite{AL} and \\cite[Chap. 3]{ES}.\n\t\n\tThe currents we will be mainly interested in are weighted multicurves and measured laminations and we will always consider currents on the compact surface $\\Sigma$. An advantage of doing so is that when $Z$ is compact, the topological space $\\CC(Z)$ is locally compact, and the associated projective space $\\BP_+\\CC(Z)=\\Lfaktor{\\CC(Z)\\setminus \\{0\\}}{\\BR_+}$ is compact. Moreover, in the compact case, the geometric intersection number between curves extends to a continuous bilinear map $i : \\CC(Z)\\times\\CC(Z)\\to\\BR_+$. 
It will be important later on to know that this form gives us a characterisation of the measured laminations as being the currents $\\mu\\in \\CC(Z)$ satisfying $i(\\mu,\\mu)=0$. We can also notice that the boundary curves are characterised by a zero intersection form with every current. As mentioned earlier, the reason why we want to work with the currents on the compact surface $\\Sigma$, rather than with the currents on $S$, is that the continuity of the intersection number fails in the latter case.\n\t\n\t\\begin{exam}[Discontinuity of the intersection form in the non-compact case]\\label{ObstructionPreuve}\n\t\t\\label{NCI}\n\t\t\n\t\tTake a hyperbolic surface with at least two cusps, fix an embedded horocycle around each of them, and a simple geodesic arc between those curves which meets them orthogonally. Note that this arc is part of a cusp-to-cusp geodesic arc $\\gamma$. Consider a sequence of closed curves $(\\gamma_n)_{n\\in\\BN}$, where $\\gamma_n$ is the geodesic homotopic to the closed curve which runs along the geodesic arc mentioned above, turns $n$ times around the first cusp following the fixed horocycle, goes back along the geodesic arc and turns $n$ times around the second cusp as in \\cref{obstruction}. The self-intersection number of such a sequence is going to grow without bound. On the other hand, it approaches the weight 2 current associated to $\\gamma$ which has $0$ self-intersection number. \n\t\\end{exam}\t\n\t\n\tSee \\cite[Prop. 5.1]{DS} for a more detailed discussion on that obstruction to a continuous extension of the intersection number on the space of currents for non-compact surfaces.\n\t\n\tExample~\\ref{ObstructionPreuve} shows that there is no continuous extension of the intersection number for currents on $S$; this is the reason why we chose to work with currents on the compact surface $\\Sigma$ instead of the currents on $S=\\Sigma\\setminus\\partial\\Sigma$. 
This solves the problem of continuity of $i(\\cdot,\\cdot)$ but raises a new problem: we won't be able to consider the Liouville current anymore.\n\t\n\t\\begin{prop}\\label{NLC}\n\t\tLet $\\Sigma$ be a compact hyperbolic surface with non-empty boundary and $X$ a hyperbolic structure on $S=\\Sigma\\setminus \\partial\\Sigma$. There is no current $L_X$ on $\\Sigma$ which satisfies $i(L_X,\\gamma)=\\ell_X(\\gamma)$ for every essential closed curve $\\gamma\\in\\FC(\\Sigma)$.\n\t\\end{prop}\n\t\n\t\\begin{proof}\n\t\tIf $\\gamma$ is a closed geodesic and $\\mu$ a weighted multicurve of $\\Sigma$ then \n\t\t\\begin{equation} \\label{intersectionCourbes}\n\t\ti(\\gamma,\\mu)=\\min\n\t\t\\left\\{ \\sharp(\\gamma'\\cap\\mu) ,\n\t\t\\begin{split}\n\t\t\\gamma'\\text{ piecewise geodesic homotopic to } \\gamma \\\\ \\text{ in }\\mu\\text{-general position}\n\t\t\\end{split}\n\t\t\\right\\},\n\t\t\\end{equation}\n\t\twhere a piecewise geodesic homotopic to $\\gamma$ is in $\\mu$-general position if the set of geodesics passing through the corners has vanishing $\\mu$ measure.\n\t\t\n\t\t\n\t\t\n\t\tNow, consider $b_1$ and $b_2$ two boundary components of $\\Sigma$, maybe the same, and $\\gamma$ a non-trivial geodesic arc joining them. For every $k$, we define $\\gamma_k$ as the unique closed geodesic homotopic to the piecewise geodesic which follows $\\gamma$, turns $k$ times around $b_1$, follows back $\\gamma$ and turns $k$ times around $b_2$. We obtain from \\cref{intersectionCourbes} that for any weighted multicurve $\\mu$, $$i(\\gamma_k,\\mu)\\leq k\\sharp(b_1\\cap\\mu)+k\\sharp(b_2\\cap\\mu)+2\\sharp(\\gamma\\cap\\mu)=2\\sharp(\\gamma\\cap\\mu).$$\n\t\t\n\t\t\n\t\tWe want to extend the previous inequality for $\\mu$ a current, to do so we need a well-defined notion of intersection with $\\gamma$. 
For this purpose we can embed $\\Sigma$ into the closed doubled surface $D\\Sigma$; for more details about how to pass from $\\Sigma$ to $D\\Sigma$ the reader can refer to \\cite{Tri}. Hence, $\\CC(\\Sigma)$ is a subset of $\\CC(D\\Sigma)$, the double $\\hat{\\gamma}$ of $\\gamma$ is a curve and in $\\CC(D\\Sigma)$ we have \n\t\t\\begin{align} \\label{majoration}\n\t\ti(\\gamma_k,\\mu)\\leq 2i(\\hat{\\gamma},\\mu), \n\t\t\\end{align}\n\t\tfor any $\\mu$ weighted multicurve of $\\Sigma$. Moreover, the weighted multicurves are dense in $\\CC(\\Sigma)$ and the intersection number is continuous in $\\CC(D\\Sigma)$ so \\cref{majoration} implies that\n\t\t\\begin{equation}\\label{intersectionCourant}\n\t\t\\forall \\nu\\in\\CC(\\Sigma),\\quad i(\\gamma_k,\\nu)\\leq2i(\\hat{\\gamma},\\nu)<\\infty.\n\t\t\\end{equation}\n\t\t\n\t\t\n\t\tHowever, $\\lim\\limits_{k\\to\\infty} \\ell_X(\\gamma_k)=\\infty$ for any hyperbolic structure $X$ on $S$, so \\cref{intersectionCourant} prevents the intersection with any fixed current from producing the length.\n\t\\end{proof}\n\t\n\t\n\t\\subsection{Cusp neighborhoods and intersection number}\n\t\n\tEverything in the next section relies on a good understanding of the behaviour of geodesics in cusps.\n\tMore precisely, if $X$ is a hyperbolic structure on $S$ then we denote by $H^i_{k}$ the embedded horosphere of length $1/k$ around the $i$-th cusp. The horosphere $H^i_{k}$ bounds the horoball $B^i_{k}$ of area $1/k$. We will refer to $H^i_{k}$ and $B^i_{k}$ as the horosphere and horoball of depth $k$. We also denote by $X^k$ the compact core of $X$ bounded by the horospheres $H^i_{k}$ and by $\\CB^k$ its complement:\n\t\\begin{align}\\label{defXB}\n\tX^k=X\\setminus\\bigcup\\limits_{i}B^i_{k}, \\quad \\quad \\CB^k=\\bigcup\\limits_{i}B^i_{k}.\n\t\\end{align}\n\tThere is a direct link between the number of times a curve turns around a cusp and the depth it reaches \\cite[Prop. 3.4]{BPT}. 
It follows that every curve that goes deep into a cusp has a large self-intersection number. To make this link clearer we recall a notion introduced in \\cite[Def. 2.6]{ES}: the \\textit{peripheral self-intersection number}.\n\t\n\t\\begin{defi} \\label{SelfIntNumb}\n\t\tLet $Z$ be a hyperbolic surface (compact or not) and recall that a peripheral subgroup of $\\pi_1(Z)$ is nothing other than a cyclic subgroup generated by a non-essential closed curve. The \\textbf{peripheral self-intersection number} $i_{per}(\\gamma,\\gamma)$ of $\\gamma\\in\\FC(Z)$ is the supremum over all maximal peripheral subgroups $G \\subset \\pi_1(Z)$ of the maximal number of times that the image of a lift $\\tilde{\\gamma}$ of $\\gamma$ under $\\tilde{Z}\\to\\Lfaktor{\\Tilde{Z}}{G}$ meets itself transversely.\n\t\\end{defi}\n\t\n\tThe peripheral self-intersection number is a topological invariant. It is thus independent of the metric on $S$, or more specifically, of whether one considers the curves on $S$ or on $\\Sigma$. Moreover, for every compact subset $K$ of $Z\\setminus \\partial Z$ there is an upper bound for the peripheral self-intersection number of the closed geodesics contained in $K$. Conversely, for every $N>0$ there is a compact subset $K_N$ of $Z\\setminus \\partial Z$ that contains all the geodesics $\\gamma$ with $i_{per}(\\gamma,\\gamma)\\leq N$ \\cite[Lem. 2.7]{ES}. In the absence of boundary, one can easily quantify this property.\n\t\n\t\n\t\\begin{lemm}\\label{ipbound}\n\t\tLet $X$ be a non-compact finite topological type surface with no boundary, and let $\\gamma$ be an essential closed curve on $X$. This curve has support on $X^k$ if and only if $i_{per}(\\gamma,\\gamma)\\leq 4k$.\n\t\\end{lemm} \n\t\n\t\\begin{proof}\n\t\tIf we think of the curves of $\\pi_1(X)$ as deck transformations then a peripheral subgroup of $\\pi_1(X)$ is a subgroup generated by a parabolic element. 
Let's study a given cusp $C_i$; we can assume that the correspondence between $\\tilde{X}$ and $\\BH^2$ is such that an associated maximal parabolic element is $z\\mapsto z+1$. In that case, $H_k^i$ lifts to the horizontal line $\\{\\Im(z)=k\\}$ and if $\\gamma$ is a closed geodesic of $X$ then the number of times that the image of a lift $\\tilde{\\gamma}$ under $\\tilde{X}\\to\\Lfaktor{\\Tilde{X}}{\\langle z\\mapsto z+1\\rangle}$ meets itself transversely is $\\sharp\\{n\\in\\BZ\\setminus\\{0\\} | \\tilde{\\gamma}\\cap(\\tilde{\\gamma}+n)\\neq\\emptyset\\}$. Now, $\\gamma$ stays in $X^k$ around $C_i$ if and only if its lifts stay below the line $\\{\\Im(z)=k\\}$, if and only if its lifts are half circles of radius at most $k$. Such a geodesic of $\\BH^2$ meets at most $4k$ translations of itself ($n=\\pm 1,\\pm 2,\\ldots,\\pm 2k$). The same process applies to every cusp, and then to every maximal parabolic subgroup, and we obtain the lemma.\n\t\\end{proof}\nIn this section we prove that for all non-compact hyperbolic surfaces of finite volume with no boundary there are sequences of random geodesics satisfying (\\ref{Prop2Bis}). However, we will first see with Example~\\ref{DivSelfInt} that in the non-compact case not all the sequences of random geodesics have this property.\n\t\n\t\n\t\\subsection{Sequences of random geodesics}\n\tAs we saw in Proposition~\\ref{NLC}, the Liouville current does not exist anymore in our setting. However, for every (complete and finite area) hyperbolic structure $X$ on $S$ the Liouville measure on $T^1X$ still exists. Recall that the \\textit{Liouville measure} $\\CL_X$ is the measure on the unit tangent bundle $T^1X$, obtained by pushing forward the Haar measure on $\\PSL_2(\\BR)$ and normalized so that $\\CL_X(T^1X)=2\\pi \\vol_X(S)=4\\pi^2|\\chi(S)|$. 
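Let us quickly check this normalization; this is a standard computation that we add here only as an illustration. By the Gauss--Bonnet theorem, a complete hyperbolic structure of finite area satisfies $\\vol_X(S)=2\\pi|\\chi(S)|$, so that\n\t\\begin{equation*}\n\t\\CL_X(T^1X)=2\\pi\\vol_X(S)=2\\pi\\cdot 2\\pi|\\chi(S)|=4\\pi^2|\\chi(S)|.\n\t\\end{equation*}\n\tFor instance, if $S$ is a once-punctured torus then $\\chi(S)=-1$, hence $\\vol_X(S)=2\\pi$ and $\\CL_X(T^1X)=4\\pi^2$. 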
We are going to consider geodesics approximating the Liouville measure in the following sense.\n\t\n\t\\begin{defi}A sequence $(\\gamma_n)_{n\\in\\BN}$ of essential closed geodesics on $X$ is a \\textbf{sequence of random} \\textbf{geodesics} if the associated probability measures converge to $\\CL_X$ with respect to the weak-$*$ topology, meaning that:\n\t\t\\begin{equation*}\\medint\\int_{T^1X}f\\dfrac{d\\gamma_n}{\\ell_X(\\gamma_n)} \\underset{n\\to+\\infty}{\\longrightarrow} \\medint\\int_{T^1X}f\\dfrac{d\\CL_X}{4\\pi^2|\\chi(S)|},\n\t\t\\end{equation*}\n\t\tfor every $f \\in C_c^0(T^1X)$ continuous and compactly supported function on $T^1X$. \\end{defi} \n\t\n\t\\begin{rema}\n\t\tWe will generally use the notation $\\hat{\\gamma}$ for the renormalisation $\\dfrac{\\gamma}{\\ell_X(\\gamma)}$.\n\t\\end{rema}\n\t\n\tThe Birkhoff ergodic theorem, together with the ergodicity of the geodesic flow, implies the existence of such sequences of geodesics. We refer to \\cite[Chap. 2]{ES} for some facts about sequences of random geodesics that we will use here.\n\t\n\tThe construction of the Liouville measure ensures that for a compact subsurface $K$ of $X$ we have $\\CL_X(T^1K)=2\\pi\\vol_X(K)$. 
Then, if the boundary of $K$ is smooth, the Portmanteau Theorem implies that for every sequence of random geodesics $(\\gamma_n)_{n\\in\\BN}$ we have\n\t\\begin{equation*}\n\t\\dfrac{\\ell_X(\\gamma_n\\cap K)}{\\ell_X(\\gamma_n)} \\underset{n\\to+\\infty}{\\longrightarrow} \\dfrac{\\vol_X(K)}{2\\pi|\\chi(S)|}.\n\t\\end{equation*}\n\tApplying this property to our compact core $X^k$ we have\n\t\\begin{equation} \\label{areaComp}\n\t\\dfrac{\\ell_X(\\gamma_n\\cap X^k)}{\\ell_X(\\gamma_n)} \\underset{n\\to+\\infty}{\\longrightarrow} \\dfrac{\\vol_X(X^k)}{2\\pi|\\chi(S)|},\n\t\\end{equation}\n\tand hence, \n\t\\begin{equation} \\label{areaHoro}\n\t\\dfrac{\\ell_X(\\gamma_n\\cap \\CB^k)}{\\ell_X(\\gamma_n)} \\underset{n\\to+\\infty}{\\longrightarrow} \\dfrac{\\vol_X(\\CB^k)}{2\\pi|\\chi(S)|}.\n\t\\end{equation}\n\t\n\t\n\tWhat is much more surprising is that sequences of random geodesics can also be used to compute lengths. More concretely, we have\n\t\\begin{equation}\n\t\\label{lengthSeg} \\dfrac{i(\\gamma_n,I)}{\\ell_X(\\gamma_n)} \\underset{n\\to+\\infty}{\\longrightarrow} \\frac{\\ell_X(I)}{\\pi^2|\\chi(S)|},\n\t\\end{equation}\n\tfor every compact geodesic segment $I$ in $X$. This property is basically due to Bonahon \\cite[Prop. 14]{Bon88}; we also refer the reader to \\cite[Prop. 2.4]{ES} for details. A direct consequence of (\\ref{lengthSeg}) is that we can use random geodesics $(\\gamma_n)_{n\\in\\BN}$ to compute the length of any essential geodesic $\\gamma\\in\\FC(S)$:\n\t\\begin{equation}\\label{length}\n\t\\dfrac{i(\\gamma_n,\\gamma)}{\\ell_X(\\gamma_n)} \\underset{n\\to+\\infty}{\\longrightarrow} \\frac{\\ell_X(\\gamma)}{\\pi^2|\\chi(S)|}.\n\t\\end{equation}\n\tNote that in this equation the curve $\\gamma$ is fixed, meaning that, a priori, equation \\eqref{length} does not say anything about $i(\\gamma_n,\\gamma_n)$. However, for compact sets (\\ref{lengthSeg}) holds uniformly. 
As a consequence, cutting the geodesics $\\gamma_n$ into geodesic segments we have \n\t\\begin{equation}\n\t\\label{compact}\n\ti\\left(\\dfrac{\\gamma_n}{\\ell_X(\\gamma_n)},\\dfrac{\\gamma_{n|K}}{\\ell_X(\\gamma_{n|K})}\\right) \\underset{n\\to+\\infty}{\\longrightarrow} \\dfrac{1}{\\pi^2|\\chi|}\n\t\\end{equation}\n\tfor $K$ any fixed compact subsurface of $X$. \n\t\n\tAll those considerations about sequences of random geodesics apply to compact surfaces; hence, if $S$ were compact, applying (\\ref{compact}) to $K=S$ we would immediately have that every sequence of random geodesics satisfies (\\ref{Prop2Bis}). However, that is not necessarily true in general.\n\t\\begin{exam}\\label{DivSelfInt}\n\t\tFirst, note that an excursion of length $\\ell$ into some $B^i_{k}$ has between $ke^{\\ell/2}-2$ and $4ke^{\\ell/2}$ self-intersections. Consider now a sequence of random geodesics $(\\gamma_n)_{n\\in\\BN}$. Add to $\\gamma_n$ an excursion of length $6\\log(\\ell_X(\\gamma_n))$ at depth $k_n\\xrightarrow[n\\to\\infty]{}\\infty$ and pull it tight into a new geodesic $\\gamma_n'$. If we add the excursions in a well-chosen way (for example, gluing it at the deepest point of an excursion) then the $(\\gamma_n')_{n\\in\\BN}$ are still random geodesics and \n\t\t\\[ \\frac{i(\\gamma'_n,\\gamma'_n)}{\\ell_X(\\gamma'_n)^2} \\approx \\frac{i(\\gamma_n,\\gamma_n)+k_n\\ell_X(\\gamma_n)^3}{(\\ell_X(\\gamma_n)+6\\log(\\ell_X(\\gamma_n)))^2}\\underset{ +\\infty}{\\sim} \\frac{i(\\gamma_n,\\gamma_n)}{\\ell_X(\\gamma_n)^2}+k_n\\ell_X(\\gamma_n)\\xrightarrow[n\\to\\infty]{}\\infty.\\]\n\t\tOne can also refer to the arguments in Lemma~\\ref{RandomGeod} below to prove that such sequences of random geodesics exist. \n\t\\end{exam}\n\tIn \\cite[Cor. 
11.2]{Lalley2} or \\cite{Lalley1}, Lalley gives a construction of random geodesics that justifies the use of the term "random": if for all $n$ the geodesic $\\gamma_n$ is randomly chosen among the geodesics of length at most $n$ then $(\\gamma_n)_{n\\in\\BN}$ is a sequence of random geodesics with probability 1. Hence, we wonder which proportion of sequences of random geodesics satisfies (\\ref{Prop2Bis}). This problem might be linked to the study of the length of cusp excursions for random geodesics, see for example \\cite{Haas}, \\cite{Poll} or \\cite{Sull} and the references therein.\n\t\n\t\n\tAnyway, the above example makes clear that to obtain (\\ref{Prop2Bis}) in the non-compact case we have to control the excursions of the sequences of random geodesics into cusp neighborhoods. We will do it through the cutting process described below. \n\t\n\t\n\t\\subsection{Cutting process} Suppose that $X$ is a fixed complete and finite area hyperbolic structure for $S$. Recall that $X^t$ denotes the compact core of $X$ bounded by the horospheres of length $1/t$ around the cusps of $S$ and that $\\CB^t=X\\setminus X^t$ is its complement. Given two parameters $k\\in\\BN$ and $0<\\theta<\\pi/4$, and a curve $\\gamma$ we want to cut the excursions of $\\gamma$ in $\\CB^k$ in order to prevent $\\gamma$ from leaving $X^{k/\\sin(\\theta)}$. To do so, we will study $\\gamma$ through its lifts in the universal cover $\\tilde{X}$ of $X$. We focus here on a given cusp but we apply the same construction around each cusp of $X$. For $t\\geq 1$ we denote by $H_t$ the horosphere of depth $t$ around this cusp and $B_t$ the horoball it bounds. Since $X$ is a hyperbolic surface endowed with a complete hyperbolic metric, its universal cover identifies with $\\BH^2$, and we can suppose that the parabolic element associated to the cusp we are interested in is $z\\mapsto z+1$. 
With this normalization $H_t$ lifts to the horizontal line $\\{\\Im(z)=t\\}$ and we have that if a curve enters $H_t$ with some angle $\\alpha\\in\\left[0,\\pi/2\\right)$ then it reaches the horosphere $H_{t/\\sin(\\alpha)}$ (we measure the non-oriented angle with the normal to the horosphere). We want to cut $\\gamma$ in order to replace its long excursions into $B_k$ (\\textit{i.e.} the ones which cross $H_{k/\\sin(\\theta)}$) by short ones (excursions staying between $H_{k/\\sin(2\\theta)}$ and $H_{k/\\sin(\\theta)}$). To make it explicit we describe the process on the universal cover.\n\t\n\t\n\t\n\t\n\tIf $\\gamma$ makes excursions in $B_k$ we are going to modify $\\gamma$, explaining the process on a fixed lift $\\tilde{\\gamma}$ which makes an excursion in the horoball $\\{\\Im(z)>k\\}$ bounded by $\\{\\Im(z)=k\\}$, but the same process applies to all lifts of $B_k$. First, if $\\tilde{\\gamma}$ enters with an angle greater than $\\theta$ then we don't change it. On the other hand, if it enters with an angle smaller than $\\theta$ then we replace this arc by a geodesic arc $I$ which enters with angle between $\\theta$ and $2\\theta$ and whose exit point coincides with the exit point of a different lift $\\tilde{\\gamma}'$ of $\\gamma$ (see \\cref{Figure1}). This is always possible as long as $2k\\cotan(\\theta)-2k\\cotan(2\\theta)\\geq 1$. If we apply the same process to all the excursions of $\\gamma$ around every cusp then $\\gamma$ is replaced by a closed piecewise geodesic $\\gamma'$. \n\t\n\tNow, pulling $\\gamma'$ tight we obtain a closed geodesic $\\gamma^*$: we refer to $\\gamma^*$ as the \\textit{geodesic obtained by cutting process of parameters $k$ and $\\theta$ from $\\gamma$}. Note that if $\\theta$ is small then $\\gamma'$ and $\\gamma^*$ have basically the same length; more precisely, they can be mapped to one another through a homotopy with small displacement and without disturbing the lengths too much. 
For the lengths, it is easy to see that there is some $e_\\theta\\xrightarrow[\\theta\\to 0]{}0$, independent of $X$, such that for every $k\\geq 1$ and $\\theta$ small \n\t\\begin{equation} \\label{ratio}\n\t\\ell_X(\\gamma')\\leq(1+e_\\theta)\\ell_X(\\gamma^*).\n\t\\end{equation}\n\tHere $\\ell_X(\\gamma')$ refers to the arc length of $\\gamma'$; we will use this abuse of notation again but its meaning is clear from the context.\n\t\n\t\n\t\n\t\\subsection{Construction of controlled sequences of random geodesics}\n\t\n\t\\begin{lemm} \\label{CompLength}\n\t\tThere is some $\\theta_0>0$ such that if $(\\gamma_n)_{n\\in\\BN}$ is a sequence of random geodesics on $X$ and $(\\gamma_{n}^*)_{n\\in\\BN}$ is obtained from the $\\gamma_n$ applying the cutting process of parameters $k>1$ and $\\theta>\\theta_0$ then there is $\\mu_n\\xrightarrow[n\\to\\infty]{}0$ such that \n\t\t\\begin{equation*}\n\t\t1\\leq\\dfrac{\\ell_X(\\gamma_n)}{\\ell_X(\\gamma_{n}^*)}\\leq (1+\\mu_n)\\dfrac{\\vol_X(S)}{\\vol_X(X^{k})}(1+e_\\theta),\n\t\t\\end{equation*}\n\t\tfor every $n$. Here, $e_\\theta$ is as in \\eqref{ratio}.\n\t\\end{lemm}\n\t\n\t\\begin{proof}\n\t\tWe use the same notation as in the description of the cutting process, and, as above, we denote by $\\ell_X(\\gamma_n')$ the arc length of the piecewise geodesics.\n\t\t\n\t\tWe take $\\theta_0$ small enough such that \\eqref{ratio} occurs.\n\t\tThe $\\gamma_n$ being random geodesics, (\\ref{areaComp}) ensures that we can find a sequence $\\mu_n\\xrightarrow[n\\to\\infty]{}0$ such that \t$\\frac{\\ell_X(\\gamma_n)}{\\ell_X(\\gamma_{n|X^{k}})}=(1+\\mu_n)\\frac{\\vol_X(S)}{\\vol_X(X^{k})}$. The construction of $\\gamma_n'$ ensures that $\\gamma_{n|X^{k}}=\\gamma'_{n|X^{k}}$, thus $\\frac{\\ell_X(\\gamma_{n|X^{k}})}{\\ell_X(\\gamma_n')}\\leq 1$ and if $\\theta>\\theta_0$ then $\\frac{\\ell_X(\\gamma_n')}{\\ell_X(\\gamma^*_{n})}\\leq(1+e_\\theta)$. 
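Explicitly, and only as an illustration of how these relations combine, one can factor the ratio through the two intermediate lengths:\n\t\t\\begin{equation*}\n\t\t\\frac{\\ell_X(\\gamma_n)}{\\ell_X(\\gamma^*_{n})}=\\frac{\\ell_X(\\gamma_n)}{\\ell_X(\\gamma_{n|X^{k}})}\\cdot\\frac{\\ell_X(\\gamma_{n|X^{k}})}{\\ell_X(\\gamma_n')}\\cdot\\frac{\\ell_X(\\gamma_n')}{\\ell_X(\\gamma^*_{n})}\\leq(1+\\mu_n)\\frac{\\vol_X(S)}{\\vol_X(X^{k})}\\cdot 1\\cdot(1+e_\\theta).\n\t\t\\end{equation*}\n\t\t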
The upper bound follows from those three inequalities.\n\t\t\n\t\tNow, $\\gamma_n$ and $\\gamma'_n$ coincide on $X^k$ but $\\gamma'_n$ has shorter excursions than $\\gamma_n$ in $\\CB^k$, hence, $\\frac{\\ell_X(\\gamma_n)}{\\ell_X(\\gamma'_n)}\\geq 1$. The geodesic $\\gamma_{n}^*$ is the unique geodesic representative of the free homotopy class of $\\gamma'_n$ which proves that $\\frac{\\ell_X(\\gamma'_n)}{\\ell_X(\\gamma_{n}^*)}\\geq 1$ and the lower bound follows.\n\t\\end{proof}\n\t\n\t\n\t\\begin{lemm}\\label{RandomGeod}\n\t\tLet $(\\gamma_n)_{n\\in\\BN}$ be a sequence of random geodesics. If $(\\gamma_{n}^*)_{n\\in\\BN}$ is obtained from $(\\gamma_n)_{n\\in\\BN}$ applying the cutting processes of parameters $k_n\\xrightarrow[n\\to\\infty]{}\\infty$ and $\\theta_n\\xrightarrow[n\\to\\infty]{} 0$, then $(\\gamma_{n}^*)_{n\\in\\BN}$ is a sequence of random geodesics.\n\t\\end{lemm}\n\t\\begin{proof}\n\t\tIn this proof, we denote by $\\tilde{\\gamma}$ the canonical lift of a geodesic $\\gamma$ to the unit tangent bundle of $X$.\n\t\t\n\t\tLet $f\\in C^0_c(T^1X)$ be a continuous and compactly supported function on $T^1X$; there is $K$ a compact core of $X$ such that $\\supp(f)\\subset T^1K$. Since $k_n\\xrightarrow[n\\to\\infty]{} \\infty$, there is $n_0\\in\\BN$ such that for all $n\\geq n_0, \\quad\\gamma_{n|K}=\\gamma'_{n|K}$. \n\t\tThe homotopy between $\\gamma'_n$ and $\\gamma^*_n$ implies that the arcs of $\\gamma_{n|K}$ are freely homotopic to geodesic arcs of $\\gamma^*_n$. Such a homotopy induces a projection from $\\gamma_{n|K}$ to $\\gamma^*_{n}$ and lifts to $\\Psi_n : \\tilde{\\gamma}_{n|K} \\to \\tilde{\\gamma}^*_{n}$, which is a homeomorphism on its image. The homotopy can be chosen to have low displacement, that is $d(p,\\Psi_n(p))\\leq\\varepsilon_n\\xrightarrow[n\\to\\infty]{}0$ for every $p\\in\\tilde{\\gamma}_{n|K}$, and not to distort the lengths too much. 
Moreover, we can find $\\varphi_n : [0,\\ell_X(\\gamma_{n|K})] \\to \\BR_+$ a piecewise smooth reparametrization of $[0,\\ell_X(\\gamma_{n|K})]$ such that for all $t\\in[0,\\ell_X(\\gamma_{n|K})]$, $\\Psi_n(\\tilde{\\gamma}_{n|K}(t))=\\tilde{\\gamma}^*_{n}(\\varphi_n(t))$. The homotopy between $\\gamma'_n$ and $\\gamma^*_n$ does not distort too much the lengths, hence, we have some $\\delta_n \\xrightarrow[n\\to\\infty]{}0$ such that $1-\\delta_n\\leq \\varphi_n'\\leq 1+\\delta_n$ where it is defined.\n\t\t\n\t\t\n\t\tFix some $\\mu>0$. A compactly supported continuous function is uniformly continuous, thus, there is $\\varepsilon_\\mu>0$ such that if $d(p,q)\\leq \\varepsilon_\\mu$ then $|f(p)-f(q)|\\leq \\mu$. We can suppose that for every $n\\geq n_0$, $\\varepsilon_n\\leq \\varepsilon_\\mu$. We have \n\t\t\\begin{equation*}\n\t\t\\int\\limits_{T^1X}fd\\gamma^*_{n}= \\int_0^{\\ell_X(\\Psi_n(\\gamma_{n|K}))}f\\circ\\tilde{\\gamma}^*_{n}(t)dt=\\int_0^{\\ell_X(\\gamma_{n|K})}f\\circ\\tilde{\\gamma}^*_{n}(\\varphi_n(s))\\varphi_n'(s)ds,\n\t\t\\end{equation*}\n\t\tit follows that\n\t\t{\\tiny\n\t\t\t\\begin{align*}\n\t\t\t&\\quad(1-\\delta_n)\\int_0^{\\ell_X(\\gamma_{n|K})}f(\\Psi_n(\\tilde{\\gamma}_{n|K}(s))ds\\leq\\int\\limits_{T^1X}fd\\gamma^*_{n}\\leq (1+\\delta_n)\\int_0^{\\ell_X(\\gamma_{n|K})}f(\\Psi_n(\\tilde{\\gamma}_{n|K}(s))ds\\\\\n\t\t\t&\\Rightarrow (1-\\delta_n)\\left( \\int\\limits_{T^1X}f d\\gamma_n -\\mu\\ell_X(\\gamma_{n|K})\\right)\\leq \\int\\limits_{T^1X}fd\\gamma^*_{n} \\leq(1+\\delta_n)\\left(\\int\\limits_{T^1X}f d\\gamma_n +\\mu\\ell_X(\\gamma_{n|K})\\right)\\\\\n\t\t\t&\\Rightarrow(1-\\delta_n)\\frac{\\ell_X(\\gamma_n)}{\\ell_X(\\gamma^*_{n})} \\left(\\int\\limits_{T^1X}f d\\hat{\\gamma}_n -\\mu \\right)\\leq \\int\\limits_{T^1X}fd\\hat{\\gamma}^*_{n} \\leq (1+\\delta_n)\\frac{\\ell_X(\\gamma_n)}{\\ell_X(\\gamma^*_{n})}\\left( \\int\\limits_{T^1X}f d\\hat{\\gamma}_n +\\mu\\right)\n\t\t\t\\end{align*}}\n\t\tAdapting the proof of 
Lemma~\\ref{CompLength} we have $\\dfrac{\\ell_X(\\gamma_n)}{\\ell_X(\\gamma^*_{n})}\\xrightarrow[n\\to\\infty]{}1$, and passing to the limit in $n$ we obtain \n\t\t{\\small\n\t\t\t\\begin{equation*}\n\t\t\t\\int_{T^1X}f\\dfrac{d\\CL_X}{4\\pi^2|\\chi(S)|}-\\mu\\leq\\underline{\\lim}_n\\int\\limits_{T^1X}fd\\hat{\\gamma}^*_{n}\\leq \\overline{\\lim}_n \\int\\limits_{T^1X}fd\\hat{\\gamma}^*_{n}\\leq \\int_{T^1X}f\\dfrac{d\\CL_X}{4\\pi^2|\\chi(S)|}+\\mu.\n\t\t\t\\end{equation*}}\n\t\tThis is true for all $\\mu$, hence, $\\lim\\limits_{n\\to\\infty} \\medint\\int\\limits_{T^1X}fd\\hat{\\gamma}^*_{n} =\\medint \\int_{T^1X}f\\dfrac{d\\CL_X}{4\\pi^2|\\chi(S)|}$ and we have proved that $(\\gamma^*_{n})_{n\\in\\BN}$ is a sequence of random geodesics.\n\t\\end{proof}\n\t\n\t\n\tNow, for every hyperbolic structure $X$ on $S$, we will be able to build sequences $(\\Curve{X}{n})_{n\\in\\BN}$ of random geodesics satisfying (\\ref{Prop2Bis}). Moreover, we will build them in such a way that neither the converging rates in (\\ref{Prop2Bis}) and (\\ref{length}), nor the peripheral self-intersection numbers $i_{per}(\\Curve{X}{n},\\Curve{X}{n})$ depend on $X$.\n\t\\begin{theo}\\label{Lemma}\n\t\tFor every complete and finite area hyperbolic structure $X$ on a finite analytic type surface of negative Euler characteristic $S$, there is a sequence $(\\Curve{X}{n})_{n\\in\\BN}$ of random geodesics such that :\n\t\t\\begin{align*}\n\t\t\\lim\\limits_{n\\to\\infty} i\\left(\\frac{\\Curve{X}{n}}{\\ell_X(\\Curve{X}{n})},\\frac{\\Curve{X}{n}}{\\ell_X(\\Curve{X}{n})}\\right)=\\frac{1}{\\pi^2|\\chi(S)|}.\n\t\t\\end{align*}\n\t\tMore precisely, they can be chosen such that\n\t\t\\begin{enumerate}\n\t\t\t\\item $i(\\hatCurve{X}{n},\\hatCurve{X}{n})\\leq \\dfrac{1}{\\pi^2\\left|\\chi(S)\\right|}\\left(1+\\dfrac{1}{n}\\right),\\,\\forall n \\in\\BN,$\n\t\t\t\\item $\\forall \\alpha \\in\\FC(S),\\,\\exists 
n_\\alpha\\in\\BN:\\,\\left|i(\\hatCurve{X}{n},\\alpha)\\left(\\frac{\\ell_X(\\alpha)}{\\pi^2|\\chi|}\\right)^{-1}-1\\right|\\leq \\dfrac{3}{n},\\, \\forall n\\geq n_\\alpha,$\n\t\t\t\\item $ i_{per}(\\Curve{X}{n},\\Curve{X}{n})\\leq C_n,\\, \\forall n \\in\\BN,$\n\t\t\\end{enumerate}\n\t\twhere $C_n$ and $n_\\alpha$ do not depend on $X$. \n\t\\end{theo}\n\t\n\t\\begin{proof}\n\t\tTo obtain the desired sequence $(\\Curve{X}{n})_{n\\in\\BN}$ we start with an arbitrary sequence of random geodesics $(\\gamma_n)_{n\\in\\BN}$. For every $p$ we set $k_p=e^{p/2}$ and $\\theta_p=e^{-p/2}$. If we apply the cutting process with parameters $k_p$ and $\\theta_p$ to the sequence $(\\gamma_n)_{n\\in \\BN}$ then we obtain a sequence $(\\Tilde{\\gamma}_n^p)_{n\\in \\BN}$ of piecewise geodesics and, by pulling it tight, a sequence $(\\gamma_n^p)_{n\\in\\BN}$ of geodesics. We will choose the $(\\Curve{X}{N})_{N\\in\\BN}$ among the $\\gamma_n^p$.\n\t\t\n\t\t\n\t\t\n\t\tFirst, we study the self-intersection number of those $\\gamma_n^p$. As $\\gamma_n^p$ is the geodesic representative of $\\Tilde{\\gamma}_n^p$, its self-intersection number is at most the number of self-intersections of $\\Tilde{\\gamma}_n^p$. To count it, we divide $X$ into two parts, the compact core $X^{k}$ and its complement $\\CB^{k}$. On $X^{k}$, the geodesic arcs $\\Tilde{\\gamma}_{n|X^k}^p$ and $\\gamma_{n|X^k}$ are identical so $\\Tilde{\\gamma}_n^p$ has $i(\\gamma_{n|X^k},\\gamma_n)$ self-intersections. On the complement, we count the self-intersections of $\\tilde{\\gamma}_n^p$ considering its different excursions in $\\CB^k$: \n\t\t\\begin{align*} i(\\gamma_n^p,\\gamma_n^p) & \\leq i(\\gamma_n\\cap X^k,\\gamma_n) + \\sum\\limits_{I,J \\text{ excursions in } \\CB^k} i(I,J). \n\t\t\\end{align*}\n\t\tWe can distinguish two types of pairs $(I,J)$: the ones where at least one of the excursions stays in $\\CB^k\\cap X^{k/\\sin(2\\theta)}$, and the ones where both $I$ and $J$ reach $\\CB^{k/\\sin(2\\theta)}$. 
In the first case, $I$ and $J$ meet at most as many times as the corresponding excursions of $\\gamma_n$ and hence:\n\t\t\\begin{align*} i(\\gamma_n^p,\\gamma_n^p) & \\leq i(\\gamma_n\\cap X^{k/\\sin(2\\theta)},\\gamma_n) + \\sum\\limits_{\\substack{I,J \\text{ excursions in } \\CB^k\\\\ \\text{which reach } \\CB^{k/\\sin(2\\theta)}}} i(I,J). \n\t\t\\end{align*}\n\t\t\\p \n\t\tMoreover, an excursion of $\\tilde{\\gamma}_n^p$ in $\\CB^k$ which reaches $\\CB^{k/\\sin(2\\theta)}$ has a length of at least $\\ln(1/\\theta)$, a lower bound for the length of the geodesic arc which enters with angle $2\\theta$. It follows that there are at most $\\frac{\\ell_X(\\gamma_n\\cap \\CB^k)}{\\ln(1/\\theta)}$ such excursions. Also, the intersection number of two excursions reaching $\\CB^{k/\\sin(2\\theta)}$ is at most $4k/\\theta$, the self-intersection number of the excursion which enters with angle $\\theta$. All in all, \n\t\t\\begin{align*} i(\\gamma_n^p,\\gamma_n^p) & \\leq i(\\gamma_n\\cap X^{k/2\\theta},\\gamma_n) + \\left( \\frac{\\ell_X(\\gamma_n\\cap \\CB^k)}{\\ln(1/\\theta)} \\right)^2 \\frac{4k}{\\theta}.\n\t\t\\end{align*}\n\t\t\\p \n\t\tApplying equations (\\ref{compact}) and (\\ref{areaHoro}) we have \n\t\t\\begin{align*}\n\t\t& i(\\gamma_n\\cap X^{k/2\\theta} ,\\gamma_n)=(1+\\varepsilon_n^p)\\frac{\\ell_X(\\gamma_n)\\ell_X(\\gamma_n\\cap X^{k/2\\theta})}{\\pi^2|\\chi|} \\text{ and} \\\\\n\t\t& \\ell_X(\\gamma_n\\cap \\CB^k)=(1+\\delta_n^p)\\frac{\\ell_X(\\gamma_n)}{2\\pi|\\chi|}\\frac{C}{k} \\quad \\text{where $C$ is the number of cusps of $S$,} \n\t\t\\end{align*}\n\t\twhere $\\varepsilon_n^p\\xrightarrow[n\\to\\infty]{}0$ and $\\delta_n^p\\xrightarrow[n\\to\\infty]{}0$ depend on $X$.\n\t\tAs a consequence, \n\t\t$$ i(\\gamma_n^p,\\gamma_n^p)\\leq (1+\\varepsilon_n^p)\\frac{\\ell_X(\\gamma_n)^2}{\\pi^2|\\chi|}+\\left((1+\\delta_n^p)\\frac{C\\ell_X(\\gamma_n)}{2\\pi|\\chi|\\cdot k \\cdot \\ln(1/\\theta)}\\right)^2 \\frac{4k}{\\theta} $$\n\t\tand we obtain an upper 
bound for the self-intersection number of the normalized curves:\n\t\t\\begin{align}\n\t\ti\\left(\\frac{\\gamma_{n}^p}{\\ell_X(\\gamma_{n}^p)},\\frac{\\gamma_n^p}{\\ell_X(\\gamma_{n}^p)}\\right)\\leq \\frac{1}{\\pi^2|\\chi|} \\left((1+\\varepsilon_n^p)+(1+\\delta_n^p)^2\\frac{C^2}{|\\chi|}\\frac{4}{p^2} \\right)\\left(\\frac{\\ell_X(\\gamma_n)}{\\ell_X(\\gamma_n^p)}\\right)^2.\\label{autoi}\n\t\t\\end{align}\n\t\t\n\t\tWe next study the intersection number of the $\\gamma_n^p$ with closed curves. The set $\\FC(S)$ is infinite and can be enumerated with $\\FC(S)=\\{\\alpha_q|q\\in\\BN\\}$ in such a way that $i_{per}(\\alpha_q,\\alpha_q)\\leq 4q$ for every $q$. This enumeration is fixed whatever the structure $X$. Recall that for every $p$ we have $k=k_p=e^{p/2}$ and $\\theta=\\theta_p=e^{-p/2}$. Hence, when $p$ is big enough, for $q\\leq p$ the curve $\\alpha_q$ is included in $X^q\\subset X^{k}$. However in $X^{k}$ we have $i(\\gamma_n,\\cdot)=i(\\gamma_n^p,\\cdot)$ thus \n\t\t\\begin{equation}\n\t\ti(\\hat{\\gamma}_n^p,\\alpha_q)=\\dfrac{\\ell_X(\\gamma_n)}{\\ell_X(\\gamma_n^p)}i(\\hat{\\gamma}_n,\\alpha_q).\\label{EgualIntersection}\n\t\t\\end{equation}\n\t\t\n\t\t\n\t\tNow, applying Lemma~\\ref{CompLength}, for every $p$ there is $\\mu_n^p\\xrightarrow[n\\to\\infty]{}0$, depending on $X$, such that\n\t\t\n\t\t\\begin{equation}1\\leq \\frac{\\ell_X(\\gamma_{n})}{\\ell_X(\\gamma^p_{n})}\\leq\n\t\t(1+\\mu_n^p)\n\t\t\\frac{\\vol_X(S)}{\\vol_X(X^{k})}(1+e_p). \\label{RappLong} \\end{equation}\n\t\twith $e_p=e_{e^{-p/2}}$ with the notation of \\eqref{ratio}.\n\t\t\n\t\t\n\t\tTherefore, there are $m_p$ large enough such that $\\varepsilon_{m_p}^p,\\, \\delta_{m_p}^p,\\, \\mu_{m_p}^p\\leq\\frac{1}{p}$, and $\\left|\\dfrac{i(\\hat{\\gamma}_{m_p},\\alpha_q)}{\\ell_X(\\alpha_q)/\\pi^2|\\chi| }-1\\right|\\leq \\frac{1}{p}$ for every $q\\leq p$. 
Thus (\\ref{RappLong}) and (\\ref{autoi}) give us\n\t\t\n\t\t\\begin{align}\n\t\t& 1\\leq \\frac{\\ell_X(\\gamma_{m_p})}{\\ell_X(\\gamma^p_{m_p})}\\leq\n\t\t\\frac{\\vol_X(S)}{\\vol_X(X^{k})}\n\t\t(1+\\frac{1}{p})(1+e_p)\\xrightarrow[p\\to\\infty]{}\\, 1, \\label{conv1}\n\t\t\\end{align}\n\t\t\\begin{multline}\n\t\ti(\\hat{\\gamma}^p_{m_p},\\hat{\\gamma}^p_{m_p})\\leq \\frac{1}{\\pi^2|\\chi|}\\left(1+(1+\\frac{1}{p})\\frac{4C^2}{p^2|\\chi|} \\right)\n\t\t\\frac{\\vol_X(S)}{\\vol_X(X^{k})}\n\t\t(1+\\frac{1}{p})^2(1+e_p) \\label{conv2}\\\\\n\t\t\\xrightarrow[p \\to \\infty ]{} \\frac{1}{\\pi^2|\\chi|}.\n\t\t\\end{multline}\n\t\t\\p\n\t\tThe terms on the right in inequalities (\\ref{conv1}) and (\\ref{conv2}) do not depend on $X$ anymore so, for $N$ an integer there is $p_N$, independent from $X$ and with $p_N> p_{N-1}$, such that $1\\leq\\dfrac{\\ell_X(\\gamma_ {p_N})}{\\ell_X(\\gamma^{p_N}_{m_{p_N}})}\\leq 1+\\frac{1}{N}$ and $i(\\hat{\\gamma}^{p_N}_{m_{p_N}},\\hat{\\gamma}^{p_N}_{m_{p_N}})\\leq \\frac{1}{\\pi^2|\\chi|}(1+\\frac{1}{N})$. As a consequence, we can take $\\Curve{X}{N}=\\gamma^{p_N}_{m_{p_N}}$. 
\n\t\t\n\t\t\\p The previous constructions ensure that $i(\\hatCurve{X}{N},\\hatCurve{X}{N})\\leq \\frac{1}{\\pi^2|\\chi|}(1+\\frac{1}{N})$, and we have proved (1) in the statement of the theorem.\n\t\t\n\t\t\\p Applying Proposition \\ref{ipbound} we have $i_{per}(\\Curve{X}{N},\\Curve{X}{N})\\leq 4e^{p_N}$ where $p_N$ does not depend on $X$, which gives us the third point.\n\t\t\n\t\t\\p Finally, (\\ref{EgualIntersection}) and the choice of $p_N$ and $m_p$ imply that \n\t\t\\begin{equation*}\n\t\t1-\\frac{3}{N}\\leq (1-\\frac{1}{N})\\leq \\frac{i(\\hatCurve{X}{N},\\alpha_q)}{\\ell_X(\\alpha_q)/\\pi^2|\\chi|}\\leq(1+\\frac{1}{N})^2\\leq 1+\\frac{3}{N},\\quad \\forall q\\leq N,\n\t\t\\end{equation*}\n\t\thence, we obtain the second point with $n_\\alpha=q$ when $\\alpha=\\alpha_q$.\n\t\t\n\t\t\n\t\tMoreover, up to passing to a subsequence, the $(\\Curve{X}{N})_{N\\in\\BN}$ are built from the sequence $(\\gamma_N)_{N\\in\\BN}$ of random geodesics through cutting processes of parameters $k_N=e^{p_N/2}\\xrightarrow[N\\to\\infty]{}\\infty$ and $\\theta_N=e^{-p_N/2}\\xrightarrow[N\\to\\infty]{}0$. As a consequence, Lemma~\\ref{RandomGeod} ensures that we have built a sequence of random geodesics. Finally, for $K$ a compact subsurface of $X$ we have\n\t\t\\begin{equation*}\n\t\ti\\left(\\hatCurve{X}{n},\\frac{\\Curve{X}{n|K}}{\\ell_X(\\Curve{X}{n|K})}\\right)\\leq i(\\hatCurve{X}{n},\\hatCurve{X}{n}) \\leq \\dfrac{1}{\\pi^2|\\chi|}(1+\\frac{1}{n}), \n\t\t\\end{equation*}\n\t\tand if we pass to the limit, using (\\ref{compact}), we obtain that \n\t\t$$\\lim\\limits_{N\\to\\infty}i(\\hatCurve{X}{N},\\hatCurve{X}{N})=\\dfrac{1}{\\pi^2|\\chi|}.$$\n\t\\end{proof}\nArmed with Theorem~\\ref{Lemma}, we are now able to prove Thurston's compactification theorem. 
As we already mentioned in the introduction, the starting point of this compactification is the embedding of $\\CT(S)$ and $\\BP_+\\CM\\CL(S)$ into the space $\\BP_+(\\BR_+^{\\FC(S)})$:\n\t\\begin{center} \n\t\t$\\begin{array}{ccccccccccc}\n\t\t\\ell & : & \\mathcal{T}(S) & \\to & \\BP_+(\\BR_+^{\\FC(S)}) & \\quad \\\\\n\t\t& & X & \\mapsto & \\BR_+\\ell_X(\\cdot), \\\\\n\t\t\n\t\t\\iota & : & \\BP_+\\CM\\CL(S) & \\to & \\BP_+(\\BR_+^{\\FC(S)}) \\\\\n\t\t&& \\lambda & \\mapsto & \\BR_+i(\\lambda,\\cdot). \\\\\n\t\t\\end{array}$\n\t\\end{center}\n\tThe image of $\\CT(S)$ in $\\BP_+(\\BR_+^{\\FC(S)})$ is included into a compact set (use \\cref{MajorLength} for instance), thus, the closure $\\overline{\\CT}(S)$ of $\\CT(S)$ is compact. The boundary of this set is given by the following theorem. \n\t\\begin{theo*}[Thurston's compactification]\\label{Thurston}\n\t\tIf $S$ is a finite analytic type surface with negative Euler characteristic then the accumulation points of $\\CT(S)$ in $\\BP_+(\\BR_+^{\\FC(S)})$ are the projective classes of functions $\\gamma\\mapsto i(\\lambda,\\gamma)$ where $\\lambda\\in\\CM\\CL(S)$ is a measured lamination on $S$.\n\t\\end{theo*}\n\t\n\tOur arguments apply to the compact case, but for the sake of concreteness we will focus on non-compact surfaces.\n\t\n\t\n\tLet $X_k\\in\\CT(S)$ be a sequence which converges in $\\BP_+(\\BR_+^{\\FC(S)})$ and leaves all compact sets of $\\CT(S)$, meaning that there are a non-zero element $F$ of $\\BR_+^{\\FC(S)}$ and a sequence $(\\varepsilon_k)_{k\\in\\BN}$ of positive real numbers such that $\\lim\\limits_{k\\to\\infty} \\varepsilon_k\\ell_k(\\cdot)=F$ pointwise (we have written $\\ell_k$ for $\\ell_{X_k}$). We will prove that $F$ is given by taking the intersection number with a suitable measured lamination. \n\t\n\tFix a filling curve $\\beta$ on $S$, that is a closed curve such that the connected components of $S\\setminus \\beta$ are balls and annular neighborhoods of the cusps. 
Such a curve gives us a bound on the length of every curve $\\gamma\\in\\FC(S)$, namely, \n\t\\begin{equation} \\label{MajorLength}\n\t\\ell_X(\\gamma)\\leq \\ell_X(\\beta) i(\\gamma,\\beta)(1+i(\\gamma,\\gamma))\n\t\\end{equation}\n\tfor every hyperbolic structure $X$ \\cite[Lem. 2.1]{STH}. Since $F=\\lim\\limits_{k\\to\\infty} \\varepsilon_k\\ell_k(\\cdot)$ is non-zero, there is $\\gamma\\in\\FC(S)$ with $F(\\gamma)\\neq 0$. We obtain from (\\ref{MajorLength}) that $0<F(\\gamma)\\leq F(\\beta)(1+i(\\gamma,\\gamma))i(\\gamma,\\beta)$ and hence that $F(\\beta)\\neq 0$. Since we are only interested in convergence in $\\BP_+(\\BR_+^{\\FC(S)})$, we can assume that $F(\\beta)=1$, meaning that\n\t\\begin{align*}\n\t\\lim\\limits_{k\\to\\infty}\\delta^k\\frac{\\ell_k(\\cdot)}{\\pi^2|\\chi|}=F,\n\t\\end{align*}\n\twhere $\\delta^k=\\frac{\\pi^2|\\chi|}{\\ell_k(\\beta)}$.\n\t\n\tWe will now prove that $F$ is of the form $i(\\mu,\\cdot)$ where $\\mu$ is a measured lamination on $S$. \n\t\n\t\n\t\n\tApplying Theorem~\\ref{Lemma} to each $X_k$, we obtain some sequences of essential closed geodesics $(\\Curve{k}{n})_{n\\in\\BN}=(\\hatCurve{X_k}{n})_{n\\in\\BN}$ with $\\lim\\limits_{n\\to\\infty}i(\\Curve{k}{n}/\\ell_k(\\Curve{k}{n}),\\cdot)=\\ell_k(\\cdot)/\\pi^2|\\chi|$. As all along, let $\\Sigma$ be a compact complete hyperbolic surface with boundary whose interior is homeomorphic to $S$ and let's identify $\\FC(S)$ with $\\FC(\\Sigma)$. In particular, we can consider the weighted curves $\\hatCurve{k}{n}=\\Curve{k}{n}/\\ell_k(\\Curve{k}{n})$ as currents of $\\Sigma$. The space $\\BP_+\\CC(\\Sigma)$ being compact each $(\\hatCurve{k}{n})_{k\\in\\BN}$ projectively converges to a non-zero current $\\mu_n\\in\\CC(\\Sigma)$. \n\t\n\t\n\tWe first want to show that the $\\mu_n$ are measured laminations. 
Consider the sequence $(\\hatCurve{k}{n})_{k\\in\\BN}$ for $n$ fixed, there are some $\\varepsilon^k_n>0$ such that $\\varepsilon^k_n\\hatCurve{k}{n}$ tends to $\\mu_n$ up to a subsequence in $k$. So, by diagonal extraction we can suppose that $\\varepsilon^k_n\\hatCurve{k}{n}\\xrightarrow[k\\to\\infty]{}\\mu_n$ for every $n$. What we have to show is that $\\lim\\limits_{k\\to\\infty}\\varepsilon_n^k = 0$ for every $n$. \n\t\\p \n\tThe sequence $(X_k)_{k\\in\\BN}$ leaves every compact set of $\\CT(S)$ so there is a simple closed curve $\\alpha$ such that $\\lim\\limits_{k\\to\\infty} \\ell_k(\\alpha)=\\infty$. Recall that to prove Theorem~\\ref{Lemma} we have enumerated $\\FC(S)=\\{\\alpha_n|n\\in\\BN\\}$ such that $i_{per}(\\alpha_n,\\alpha_n)\\leq 4n$, since $\\alpha$ is a simple curve we can suppose that $\\alpha=\\alpha_1$. The $\\Curve{k}{n}$ come from Theorem~\\ref{Lemma} thus $\\left| i(\\hatCurve{k}{n},\\alpha)\\left(\\frac{\\ell_k(\\alpha)}{\\pi^2|\\chi|}\\right)^{-1}-1 \\right|\\leq\\dfrac{3}{n}$ whatever $k$ and $n$. By hypothesis $\\ell_k(\\alpha)\\xrightarrow[k\\to\\infty]{}\\infty$ and we can suppose, up to a shift in $n$, that for every $n$, $\\left| i(\\hatCurve{k}{n},\\alpha)\\left(\\frac{\\ell_k(\\alpha)}{\\pi^2|\\chi|}\\right)^{-1}-1 \\right|<\\dfrac{1}{2}$. As a consequence $i(\\hatCurve{k}{n},\\alpha)\\xrightarrow[k\\to\\infty]{}\\infty$. However, $\\infty>i(\\mu_n,\\alpha)\n\t=\\lim\\limits_{k\\to\\infty}\\varepsilon^k_n i(\\hatCurve{k}{n},\\alpha)$ thus $\\varepsilon^k_n\\xrightarrow[k\\to\\infty]{}0$ for every $n$, and $i(\\hatCurve{k}{n},\\hatCurve{k}{n})$ is bounded independently from $k$ and $n$, hence, $i(\\mu_n,\\mu_n)=\\lim\\limits_{k\\to\\infty}(\\varepsilon^k_n)^2i(\\hatCurve{k}{n},\\hatCurve{k}{n})=0$ and $\\mu_n$ is a measured lamination on $\\Sigma$. By construction, $i_{per}(\\Curve{k}{n},\\Curve{k}{n})\\leq C_n$ for every $k$ and $n$, as mentioned earlier (or in \\cite[Lem. 
2.7]{ES}) it ensures that for $n$ fixed the $\\Curve{k}{n}$ are all included in the same compact subsurface of $\\Sigma\\setminus\\partial\\Sigma$. It follows that $\\mu_n$ is supported on a compact set of $\\Sigma\\setminus\\partial\\Sigma$ and by (\\ref{CorresLam}) it is a measured lamination of $S$.\n\t\n\tRecall that $\\beta$ is a filling curve of $S$, as a consequence, $i(\\mu_n,\\beta)\\neq 0$ and hence, we can suppose that $i(\\mu_n,\\beta)=1$ for every $n$ and we obtain\n\t\\begin{equation}\\label{convergence}\n\t\\lim\\limits_{k\\to\\infty} \\delta^k_n \\hatCurve{k}{n} =\\mu_n \\text{ in } \\CC(\\Sigma), \n\t\\end{equation}\n\twhere $\\delta^k_n=\\frac{1}{i(\\beta,\\hatCurve{k}{n})}$ is well-defined.\n\t\n\t\n\tTo sum up, we have the following convergence diagram, where all the convergences are pointwise.\n\t\\begin{center}\n\t\t\\begin{tabular}{ccccccc}\n\t\t\t$\\delta^1\\dfrac{\\ell_{X_1}(.)}{\\pi^2|\\chi|}$ & $\\delta^2\\dfrac{\\ell_{X_2}(.)}{\\pi^2|\\chi|}$ & $\\cdots$ & $\\delta^k\\dfrac{\\ell_{X_k}(.)}{\\pi^2|\\chi|}$ & $\\cdots$ & $\\xrightarrow[\\quad]{ } $ & $F\\in\\BR_+^{\\FC(S)}$ \\\\\n\t\t\t$\\big\\uparrow$ & $\\big\\uparrow$ & $\\cdots$ & $\\big\\uparrow$ & $\\cdots$ & & $\\big\\uparrow$? 
\\\\\n\t\t\t$\\vdots$ & $\\vdots$ & $\\cdots$ & $\\vdots$ & $\\cdots$ & & $\\vdots$ \\\\\n\t\t\t$\\delta^1_ni(\\hatCurve{1}{n},\\cdot)$ & $\\delta^2_ni(\\hatCurve{2}{n},\\cdot)$ & $\\cdots$ & $\\delta^k_ni(\\hatCurve{k}{n},\\cdot)$ & $\\cdots$ & $\\xrightarrow{\\quad}$ & $i(\\mu_n,\\cdot)$ \\\\\n\t\t\t$\\vdots$ & $\\vdots$ & $\\cdots$ & $\\vdots$ & $\\cdots$ & $\\vdots$ & $\\vdots$ \\\\\n\t\t\t$\\delta^1_2i(\\hatCurve{1}{2},\\cdot)$ & $\\delta_2^2i(\\hatCurve{2}{2},\\cdot)$ & $\\cdots$ & $\\delta^k_2i(\\hatCurve{k}{2},\\cdot)$ & $\\cdots$ & $\\xrightarrow{\\quad}$ & $i(\\mu_2,\\cdot)$ \\\\\n\t\t\t$\\delta_1^1i(\\hatCurve{1}{1},\\cdot)$ & $\\delta^2_1i(\\hatCurve{2}{1},\\cdot)$ & $\\cdots$ & $\\delta^k_1i(\\Curve{k}{1},\\cdot)$ & $\\cdots$ & $\\xrightarrow{\\quad}$ & $i(\\mu_1,\\cdot)$ \\\\\n\t\t\\end{tabular}\n\t\\end{center}\n\tWe want $F$ to be the pointwise limit of $(i(\\mu_n,\\cdot))_{n\\in\\BN}$. To prove it, it is sufficient to show that the convergence $\\delta^k_ni(\\hatCurve{k}{n},\\gamma)\\xrightarrow[n\\to \\infty]{ }\\delta^k\\frac{\\ell_k(\\gamma)}{\\pi^2|\\chi|}$ is uniform in $k$ when $\\gamma\\in\\FC(S)$ is fixed.\n\t\n\tIf $\\gamma\\in\\FC(S)$ is fixed then Theorem~\\ref{Lemma} ensures that $\\left| \\dfrac{\\delta^k_n}{\\delta^k} -1\\right|\\leq \\epsilon_n$ and $ \\left| i(\\hatCurve{k}{n},\\gamma)\\left(\\dfrac{\\ell_k(\\gamma)}{\\pi^2|\\chi|}\\right)^{-1}-1 \\right|\\leq\\epsilon_n$ for every $k$ and for $n$ large enough ($n_\\gamma$ and $n_\\beta$ do not depend on $k$) with $\\epsilon_n \\xrightarrow[n\\to\\infty]{}0$. 
Moreover, fixing $\\gamma$ we know that $\\delta^k\\frac{\\ell_k(\\gamma)}{\\pi^2|\\chi|}\\xrightarrow[k\\to\\infty]{}F(\\gamma)$ hence the sequence $(\\delta^k\\frac{\\ell_k(\\gamma)}{\\pi^2|\\chi|})_{k\\in\\BN}$ is bounded by some $d_\\gamma$ and we obtain\n\t\\begin{equation*}\n\t\\left| \\delta^k_ni(\\hatCurve{k}{n},\\gamma)-\\delta^k\\frac{\\ell_k(\\gamma)}{\\pi^2|\\chi|}\\right| \\leq v_nd_\\gamma\\xrightarrow[n\\to\\infty]{}0.\n\t\\end{equation*}\n\tHence, the convergence holds uniformly in $k$, and $\\lim\\limits_{n\\to\\infty}\\lim\\limits_{k\\to\\infty} \\delta^k_ni(\\hatCurve{k}{n},\\gamma) = \\lim\\limits_{k\\to\\infty}\\lim\\limits_{n\\to\\infty} \\delta^k_ni(\\hatCurve{k}{n},\\gamma)$, which implies that $F(\\gamma)=\\lim\\limits_{n\\to\\infty}i(\\mu_n,\\gamma)$. Moreover, $\\CM\\CL(S)$ is a closed subset of $\\BR_+^{\\FC(S)}$ hence $F(\\cdot)=\\lim\\limits_{n\\to\\infty}i(\\mu_n,\\cdot)$ is of the form $F(\\cdot)=i(\\mu,\\cdot)$ where $\\mu\\in\\CM\\CL(S)$, which was what we needed to prove. \\qed\n\t\n\t\\bibliographystyle{plain}\n\t\\bibliography{Bibliography.bib}", "images": []}
interleaved/b9950160-41cc-4149-9024-2a76cc042ff4.json
ADDED
The diff for this file is too large to render.
See raw diff
|
|
interleaved/ba26c516-c466-4b3d-9a9f-7394934d40e5.json
DELETED
@@ -1 +0,0 @@
|
|
1 |
-
{"txt": "", "images": []}
interleaved/c02f9657-d44c-438e-98b2-8ec7fda3b4b6.json
DELETED
The diff for this file is too large to render.
See raw diff
|
|
interleaved/c5a529f5-8540-4ff6-8e70-ccf5b47cf642.json
DELETED
@@ -1 +0,0 @@
|
|
1 |
-
{"txt": "Binomial edge ideals were introduced by Herzog, Hibi, Hreinsd\u00f3ttir, Kahle and Rauh \\cite{HHHKR} and, independently, by Ohtani \\cite{Oh} as a generalization of determinantal ideals. The main purpose was to understand the interplay between the combinatorial invariants of a graph and the algebraic invariants of its associated binomial edge ideal. It is worth mentioning that before binomial edge ideals were defined, other ways to encode combinatorial objects into ideals had been introduced. In this context, one of the greatest contributions came from Stanley and Reisner who established a bijection between simplicial complexes and squarefree monomial ideals. Also, Villarreal \\cite{Vi} introduced the notion of edge ideals of graphs, which are ideals generated by monomials $x_ix_j$ corresponding to the edges of $G$.\\\\\n\n Let $R=\\mathbb{K}[x_1, \\ldots,x_n, y_1, \\ldots,y_n]$ and let $G$ be a graph on $n$ vertices with edges $E(G)$. We write $J_G$ to denote the \\emph{binomial edge ideal} of $G$, that is\n\\begin{equation*}\nJ_G= \\left( f_{ij}:=x_i y_j-x_j y_i \\mid \\{i,j\\} \\in E(G)\\right).\n\\end{equation*}\n\n\nIn other words, $J_G$ is the ideal generated by the $2$-minors of the generic matrix\n$$ X_{n}=\n\\begin{bmatrix}\n x_1 & x_2 & x_3 & \\dots & x_n \\\\\n y_1 & y_2 & y_3 & \\dots & y_n\n\\end{bmatrix}\n$$\n\nwhose column indices are given by the edges of $G$. \\par \nIf we take $G=K_n$ to be the complete graph on $n$ vertices, then it is clear from the definition that its binomial edge ideal $J_{K_n}$ is the ideal of $2$-minors of $X_n$. This is why binomial edge ideals can be considered a generalization of determinantal ideals.\\par\n\n\n\n\n\n\nIn \\cite{HHHKR} the authors study the algebraic properties of binomial edge ideals in terms of combinatorial invariants of the underlying graph. 
Among other things, they prove that binomial edge ideals are radical, and they give a combinatorial description of their minimal primes (see \\cite[Theorem 3.2]{HHHKR}). Furthermore, they show that the only graphs whose binomial edge ideals have a quadratic Gr\\\"obner basis (with respect to a diagonal term order) are those such that if $\\{i,k\\} \\in E(G)$ then $\\{i,j\\} \\in E(G)$ and $\\{j,k\\} \\in E(G)$ for all integers $1 \\leq i <j<k\\leq n$. They call these \\emph{closed graphs}.\n\\begin{thm}\\cite[Theorem 1.1]{HHHKR}\\label{thmHHH}\n$G$ is a closed graph if and only if the natural generators of $J_G$ form a Gr\\\"obner basis with respect to a diagonal term order.\n\\end{thm}\n\nA generalization of this theorem to determinantal facet ideals of simplicial complexes has recently been found in \\cite[Theorems 82 and 87]{BSV}. These results correct a theorem stated in \\cite{EHHM} and provide a partial answer to a question by Almousa\u2013Vandebogert \\cite{AV}.\\\\\n\n\n\nLater, Matsuda \\cite{Ma} extended the algebraic approach introduced by Herzog et al. in \\cite{HHHKR} to a larger class of graphs, which he called \\emph{weakly closed graphs}.\n\\begin{defn} Let $G$ be a simple graph on $[n]$. $G$ is said to be \\emph{weakly closed} if there exists a labeling of the vertices such that for all integers $1 \\leq i <j<k\\leq n$, if $\\{i,k\\} \\in E(G)$ then $\\{i,j\\} \\in E(G)$ or $\\{j,k\\} \\in E(G)$.\n\\end{defn}\n\nWeakly closed graphs are a generalization of closed graphs. In fact, while the definition of closed graphs requires that both $\\{i,j\\}$ and $\\{j,k\\}$ are edges of $G$, for weakly closed graphs it is enough that one of them is an edge of $G$.\n\n\n\\begin{oss}\nIt is worth pointing out that closed graphs and weakly-closed graphs were already well-known and widely studied in combinatorics where it is customary to refer to them as \\emph{unit-interval graphs} and \\emph{co-comparability graphs} (i.e. 
graphs whose complement is the comparability graph of a poset \\cite[Theorem 1.9]{Ma}). So, it would be more accurate to say that these graphs were re-discovered by Herzog et al. and Matsuda from an algebraic perspective. \n\\end{oss}\n\nWith the above in mind, it is easy to see that complete multipartite graphs and interval graphs are weakly closed and that weakly closed graphs are perfect (see \\cite{Ma} for more details).\\\\\n\nIn \\cite{Ma} Matsuda began the study of binomial edge ideals of weakly-closed graphs. In particular, assuming that $\\mathbb{K}$ has positive characteristic, he generalized Ohtani's theorem about $F$-purity of binomial edge ideals associated with complete multipartite graphs (see \\cite[Theorem 3.1]{Oh}).\n\n\\begin{thm}\\cite[Theorem 2.3]{Ma}\\label{Mafp}\nLet $G$ be a weakly closed graph and let $J_G$ be the binomial edge ideal associated with $G$. Then $R/J_G$ is $F$-pure.\n\\end{thm}\n\nThe above result, together with Theorem \\ref{thmHHH}, motivates us to continue the work of Matsuda on binomial edge ideals associated with weakly-closed graphs, with a special focus on their interaction with Knutson ideals (for a short overview of Knutson ideals, see Section 2). In particular, similarly to what has been done in \\cite{HHHKR} for closed graphs, we give an algebraic characterization of weakly closed graphs in terms of their binomial edge ideals.\n\n\\begin{mthm*}[Theorem \\ref{cf-wc}] $G$ is a weakly closed graph on $[n]$ if and only if (there exists a labeling such that) $J_G$ is a Knutson ideal associated with $f=y_1 f_{12} \\ldots f_{n-1n}x_n \\in R$.\n\\end{mthm*}\n\nTo prove this theorem, we start off by proving that binomial edge ideals of weakly closed graphs are Knutson ideals. By the properties of Knutson ideals, the following result also gives an alternative proof of Theorem \\ref{Mafp} in positive characteristic:\n\n\\begin{propA}[Proposition \\ref{wc-cf}]\\label{introwc-cf}\nLet $G$ be a weakly closed graph on $[n]$. 
Then its binomial edge ideal $J_G$ is a Knutson ideal. \\par \nIn particular, if $\\mathbb{K}$ has positive characteristic, then $\\mathbb{K}[X]/J_G$ is $F$-pure.\n\\end{propA}\n\n\n\n\nMoreover, the proof of this result suggests the following characterization of weakly closed graphs in terms of minimal primes of their binomial edge ideals.\n\n\\begin{propA}[Proposition \\ref{psps}] \\label{introprimes}$G$ is a weakly closed graph if and only if each minimal prime of its binomial edge ideal can be written as a sum of determinantal ideals on (disjoint) adjacent columns.\n\\end{propA}\n\nThe advantage of this alternative proof of Matsuda's theorem is that it easily extends to a larger class of ideals, introduced by Rauh \\cite{Ra} and called \\emph{generalized binomial edge ideals}. Thus we get the following:\n\n\\begin{propA}[Proposition \\ref{wc-cf-gen}]\nLet $G$ be a weakly closed graph on $[n]$. Then its generalized binomial edge ideal $\\mathfrak{J}_G$ is a Knutson ideal. \\par \nIn particular, if $\\mathbb{K}$ has positive characteristic, then $\\mathbb{K}[X]/\\mathfrak{J}_G$ is $F$-pure.\n\\end{propA}\n\n\n\nIn Section 4 we use Proposition \\ref{introprimes} to prove that the converse of Proposition \\ref{introwc-cf} also holds. This completes the proof of the main theorem, i.e. the set of binomial edge ideals in $\\mathcal{C}_f$ coincides with that of binomial edge ideals of weakly closed graphs (Theorem \\ref{cf-wc}).\\par \nFor this purpose, we will also need the following description of all minimal primes of Knutson ideals associated with $f=y_1 f_{12} f_{23} \\ldots f_{n-1n}x_n \\in R$.\n\n\\begin{propA}[Proposition \\ref{PrIdCf}]\\label{introMP} Let $I$ be a Knutson ideal associated with $f$ and let $P \\in \\Min (I)$. 
Then\n$$P= \\left(\\left(y_1,\\ldots,y_{k-1}\\right)+\\left(x_u\\right)_{U \\subset \\{1,\\ldots, k-1\\}}\\right)+L+\\left(\\left(x_{l+1},\\ldots,x_n\\right)+\\left(y_v\\right)_{V \\subset \\{l+1,\\ldots, n\\}}\\right)$$\nwhere $L\\subset \\mathbb{K}[x_k,x_{k+1},\\ldots, x_l]$ is a minimal prime of the binomial edge ideal of a weakly closed graph and each of the three summands may possibly be the zero ideal.\n\\end{propA}\n\n\nThe proof of Proposition \\ref{introMP} is quite technical. However, it is worth noticing that knowing all minimal primes of Knutson ideals associated with $f$ is a quite strong result in the study of Knutson ideals of generic matrices. In fact, Knutson ideals are radical and minimal prime ideals can be considered as the \\lq \\lq building blocks\" of radical ideals. Hence, heuristically, having a characterization of all minimal primes of Knutson ideals is not so far from having a characterization of Knutson ideals themselves. In this sense, this is a step forward towards a partial solution of the problem of finding a characterization of all Knutson ideals of generic matrices (see \\cite{Se2}), in the specific case of $2 \\times n$ generic matrices. The general result could have interesting consequences on Gr\\\"obner bases of determinantal-like ideals.\\\\\n\n\\textbf{Acknowledgements.} I would like to thank Matteo Varbaro for several enlightening and helpful discussions. 
I am also grateful to Bruno Benedetti for some observations on the combinatorial aspects of the topic, and on the connection with determinantal facet ideals.\nKnutson ideals were first introduced by Conca and Varbaro in \\cite{CV} and they were named after Knutson's work \\cite{Kn} on compatibly split ideals and degeneration.\n \n \\begin{defn}[Knutson ideals] \\label{K.I.} Let $f \\in S= \\mathbb{K}[x_1,\\ldots,x_n]$ be a polynomial such that its leading term $\\lt (f)$ is a squarefree monomial for some term order $\\prec$.\nDefine $\\mathcal{C}_f$ to be the smallest set of ideals satisfying the following conditions:\n\\begin{enumerate}\n\\item[1.] $(f) \\in \\mathcal{C}_f$;\n\\item[2.] If $I \\in \\mathcal{C}_f$ then $I:J \\in \\mathcal{C}_f$ for every ideal $J \\subseteq S$;\n\\item[3.] If $I$ and $J$ are in $\\mathcal{C}_f$ then also $I+J$ and $I \\cap J$ must be in $\\mathcal{C}_f$.\n\\end{enumerate} \nIf $I$ is an ideal in $\\mathcal{C}_f$, we say that $I$ is a \\emph{Knutson ideal associated with} $f$. More generally, we say that $I$ is a \\emph{Knutson ideal} if $I \\in \\mathcal{C}_f $ for some $f$.\n \\end{defn}\n \n By the assumption on its initial form, if $\\deg f=n$, then $f$ defines a splitting map on $S$. Therefore, Knutson ideals turn out to be compatibly split ideals with respect to this map, in the sense of Schwede \\cite{Sc}. As a consequence, they define $F$-pure rings in positive characteristic. In addition, Knutson ideals\nhave square-free initial ideals (see \\cite{Kn},\\cite{Se1}). Hence, the extremal Betti numbers, Castelnuovo-Mumford regularity, and depth of Knutson ideals coincide\nwith the corresponding numerical invariants of their initial ideals (see \\cite{CV}). Lastly, Gr\\\"obner bases of Knutson ideals behave \\lq\\lq well\\rq\\rq\\ with respect to sums. This fact makes computations of Gr\\\"obner bases easier in many cases. Hence, Knutson ideals also provide a useful tool to solve problems in applied algebra. 
\\par \nFor all these reasons, these ideals are objects of particular interest in computational algebra, combinatorial commutative algebra, and tight closure theory. More details about the properties of Knutson ideals in any characteristic can be found in \\cite{Se1}. These properties were first proved by Knutson \\cite{Kn} in the case $\\mathbb{K}=\\mathbb{Z}/p\\mathbb{Z}$. \n\n\\begin{oss}\n It is useful to point out that since every ideal of $\\mathcal{C}_f$ is radical, the second condition in Definition \\ref{K.I.} can be replaced by the following:\n \\begin{itemize}\n \\item[$2^\\prime .$] If $I \\in \\mathcal{C}_f$ then $\\mathcal{P} \\in \\mathcal{C}_f$ for every $\\mathcal{P} \\in \\Min(I)$.\\\\\n \\end{itemize}\n \\end{oss}\n \n It has already been shown that some interesting classes of ideals are Knutson ideals; for instance determinantal ideals of Hankel matrices \\cite{Se1} and generic matrices \\cite{Se2}. In this paper we prove that also binomial edge ideals of weakly closed graphs are Knutson ideals. Actually, we prove a stronger result: a graph is weakly closed if and only if its binomial edge ideal is a Knutson ideal.\\\\\n\nIn order to prove the main theorem of this paper, we first need to specify the polynomial $f$ defining the family of Knutson ideals we are considering. Since binomial edge ideals are a generalization of the ideal of 2-minors of a generic matrix of size $2 \\times n$, the choice of $f$ is going to be the standard one for generic matrices of any size (see \\cite[Theorem 2.1]{Se2}), as explained below. \\\\\n\nGiven a graph $G$ on $n$ vertices and $X_{n}$ the generic matrix of size $2 \\times n$ \n\n\\[ X_{n}=\n\\begin{bmatrix}\n x_1 & x_2 & x_3 & \\dots & x_n \\\\\n y_1 & y_2 & y_3 & \\dots & y_n\n\\end{bmatrix}\n\\]\n\nwe define \n$$f=y_1 f_{12} f_{23} \\cdots f_{n-1 n} x_n \\in R.$$\n\nIn other words $f$ is the product of all the minors of $X$ whose diagonals are the diagonals of $X$. 
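For instance, assuming the usual convention $f_{i,i+1}=x_i y_{i+1}-x_{i+1}y_i$ for the $2$-minor on columns $i$ and $i+1$ (this convention is an assumption of the illustration), for $n=3$ one gets\n$$f=y_1\\,(x_1y_2-x_2y_1)\\,(x_2y_3-x_3y_2)\\,x_3,$$\nand picking the main-diagonal term of each factor yields the squarefree monomial $y_1\\cdot x_1y_2\\cdot x_2y_3\\cdot x_3=x_1y_1x_2y_2x_3y_3$. 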
Thus, if we equip $R$ with a diagonal term order, we get\n\n$$\\lt (f)= \\prod \\limits_{i=1}^{n} x_i y_i.$$\n\nWe can then construct the Knutson family of ideals associated with this $f$. This choice of $f$ allows us to apply some known results from \\cite{Se2}. In particular, in the next section we make use of the following lemma.\n\n\n\n\\begin{lem} \\cite[Theorem 2.1]{Se2}\\label{lemSe2}\n Let $X=X_n$ and $f$ be as in the previous discussion. Denote by $X_{[a,b]}$ the submatrix of $X$ with adjacent columns from $a$ to $b$. \n\n Then $$I_{t}(X_{[a,b]}) \\in \\mathcal{C}_f$$ for every $1 \\leq a \\leq b \\leq n$ and $t \\in \\{1,2\\}$ .\n\\end{lem}\n\nMore generally, this result can be stated for any generic matrix of size $m \\times n$ and for any size of the minors: given a generic matrix, all determinantal ideals on adjacent columns (or adjacent rows) are Knutson ideals for the standard choice of $f$ (see \\cite[Theorem 2.1]{Se2} for more details).\nWe start off by proving that the binomial edge ideal attached to a weakly closed graph is a Knutson ideal.\n\n\\begin{prop}\\label{wc-cf} Let $G$ be a weakly closed graph on the vertex set $\\{1,\\ldots,n\\}$ and let $f= y_1 f_{12} f_{23}\\cdots f_{n-1 n} x_n \\in R$. Then (there exists a labeling of the vertices such that) $$J_G \\in \\mathcal{C}_f.$$\nIn particular, if $\\mathbb{K}$ has positive characteristic then $R/J_G$ is $F$-pure.\n\\end{prop}\n\nThe strategy to prove Proposition \\ref{wc-cf} is to write each of the minimal primes of the binomial edge ideal $J_G$ as a sum of determinantal ideals on adjacent columns, so that we can apply Lemma \\ref{lemSe2}.\n\n\\begin{proof}[Proof of Proposition \\ref{wc-cf}]\nFor each subset $S\\subset [n]$ and $T= [n] \\setminus S$, define $G_1, \\ldots, G_{c(S)}$ to be the connected components of $G_T$ (i.e. the restriction of $G$ to $T$) and let $\\widetilde{G}_1, \\ldots,\\widetilde{G}_{c(S)}$ be the corresponding complete graphs on their vertices. 
Set \n$$P_S:= \\left( \\bigcup_{i \\in S} \\{x_i,y_i\\}\\right)+ J_{\\widetilde{G}_1}+\\ldots+ J_{\\widetilde{G}_{c(S)}} .$$ \\par\n$P_S$ is a prime ideal and it has been shown (see \\cite{HHHKR}) that the primary decomposition of the binomial edge ideal associated with $G$ is given by\n$$J_G= \\bigcap_{S \\subseteq [n]} P_S.$$\\par\nIf we prove that $P_S$ is a Knutson ideal for each $S$, we get that $J_G \\in \\mathcal{C}_f$.\\par\nFirst of all, notice that by Lemma \\ref{lemSe2}, we know that $\\left( x_s,y_s \\right) \\in \\mathcal{C}_f$ for every $s \\in S$, because it is the ideal of $1$-minors on column $s$. So their sum $\\left( \\bigcup_{i \\in S} \\{x_i,y_i\\}\\right)$ is also in $\\mathcal{C}_f$. Unfortunately, the lemma does not apply to $J_{\\widetilde{G}_{i}}$ because the vertices of $G_i$ might not be consecutive. Thus, we need to reduce to the case where each of the $J_{\\widetilde{G}_{i}}$'s is an ideal of minors on adjacent columns of $X$ (equivalently, it is the binomial edge ideal of a complete graph on consecutive vertices) so that we can apply Lemma \\ref{lemSe2}.\\par \n For this purpose, fix $i \\in \\{1, \\ldots,c(S)\\}$ and let $V(G_i)=V(\\widetilde{G}_{i}):= \\left\\lbrace j_1, \\ldots, j_{t_i}\\right\\rbrace$.\\par\nIf $j_{k+1}=j_{k}+1$ for every $k=1, \\ldots, t_i-1$, that is, if the vertices of $G_i$ are consecutive, then $J_{\\widetilde{G}_{i}}=I_2 (X_{[j_1,j_{t_i}]}) \\in \\mathcal{C}_f$ by Lemma \\ref{lemSe2}. \\par \nAssume instead that $j_k-j_{k-1} >1$ for some $k$ and let $l$ be a vertex of $G$ such that $j_{k-1}<l<j_k$. Since $G_i$ is connected, there exist $m,n \\in V(G_i)$ such that $m<l<n$ and $\\{m,n\\} \\in E(G_i)\\subset E(G)$. This implies that either $\\{m,l\\} \\in E(G)$ or $\\{l,n\\} \\in E(G)$, because $G$ is weakly closed. Assume that $l \\notin S$. Then $l$ must be in the same connected component of $m$ and $n$. This would imply that $l \\in V(G_i)$, a contradiction. 
In other words, we have just shown that if there is a \\lq\\lq gap\" between the vertices of the connected component $G_i$, every vertex $l$ in this gap must be in $S$. But then, we can add all these missing vertices to $V(G_i)$ and replace $J_{\\widetilde{G}_i}$ with $\\overline{J}_{\\widetilde{G}_i}:=I_2(X_{[j_1, j_{t_i}]})$ without changing our prime ideal $P_S$. Indeed, in doing so, we are adding some $2$-minors to our original prime ideal, but these minors were already contained in $P_S$, since they are contained in the ideal $\\left( \\bigcup_{i \\in S} \\{x_i,y_i\\}\\right)$.\\par \nIn conclusion, if we replace each of the $J_{\\widetilde{G}_i}$'s with its corresponding $\\overline{J}_{\\widetilde{G}_i}:=I_2 ( X_{[a_i, b_i]})$, we get \n$$P_S= \\left( \\bigcup_{s \\in S} \\{x_s,y_s\\}, J_{\\widetilde{G}_1}, \\ldots, J_{\\widetilde{G}_{c(S)}} \\right)= \\left( \\bigcup_{s \\in S} \\{x_s,y_s\\}, I_2 ( X_{[a_1, b_1]}), \\ldots, I_2 ( X_{[a_{c(S)}, b_{c(S)}]}) \\right).$$\nFinally, we can apply Lemma \\ref{lemSe2} and we obtain that $P_S \\in \\mathcal{C}_f$ for every $S \\subset [n]$, because it is the sum of ideals of minors on adjacent columns. This completes the proof.\n\\end{proof}\n\nUsing the same notation as in the proof of Proposition \\ref{wc-cf}, we set \n$$\\overline{P}_S:= \\left( \\bigcup_{s \\in S} \\{x_s,y_s\\}, \\overline{J}_{\\widetilde{G}_1}, \\ldots, \\overline{J}_{\\widetilde{G}_{c(S)}} \\right)$$ \nwhere each of the $\\overline{J}_{\\widetilde{G}_i}$'s is an ideal of 2-minors on adjacent columns. It has just been shown that if $G$ is a weakly-closed graph, each of the minimal primes of its binomial edge ideal is a sum of determinantal ideals on adjacent columns, that is $P_S=\\overline{P}_S$ for every minimal prime of $J_G$. 
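\\par To illustrate the replacement step on a small case (an example of ours, with the convention $f_{ij}=x_iy_j-x_jy_i$ assumed), let $G$ be the graph on $\\{1,2,3\\}$ with edges $\\{1,3\\}$ and $\\{2,3\\}$, which is weakly closed with respect to this labeling, and take $S=\\{2\\}$. Then $G_T$ has the single connected component $\\{1,3\\}$, whose vertices are not consecutive, and\n$$P_S=(x_2,y_2,f_{13}).$$\nSince $f_{12}=x_1y_2-x_2y_1$ and $f_{23}=x_2y_3-x_3y_2$ both lie in $(x_2,y_2)$, we get\n$$P_S=(x_2,y_2)+(f_{12},f_{13},f_{23})=(x_2,y_2)+I_2(X_{[1,3]})=\\overline{P}_S.$$\nThis particular $P_S$ is not a minimal prime of $J_G$, since it strictly contains $P_{\\emptyset}=I_2(X_{[1,3]})$, but the gap-filling argument in the proof applies to every $S$, so it serves as a check of the identity.\\par 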
Actually, it turns out that the only binomial edge ideals whose minimal primes can be written in this way are those arising from weakly closed graphs.\\\\\n\n We have the following characterization of weakly closed graphs in terms of minimal primes of their binomial edge ideals.\n\n\\begin{prop}\\label{psps}Let $G$ be a connected graph and let $J_G= \\bigcap_{S\\subseteq [n]} P_S$ be the primary decomposition of the binomial edge ideal associated with $G$. The following are equivalent\n\\begin{itemize}\n\\item[(1)]$G$ is weakly closed. \n\\item[(2)]There exists a labeling of the vertices of $G$ such that $P_S=\\overline{P}_S$ for every minimal prime of $J_G$.\n\\end{itemize}\n\\end{prop}\n\n\\begin{proof}\n$(1) \\Rightarrow (2)$ has already been proved. \\par \n$(2) \\Rightarrow (1)$ Let us assume by contradiction that $G$ is not weakly closed. We will show that for every labeling of the vertices it is possible to find $S\\subseteq [n]$ such that $P_S \\neq \\overline{P}_S$ and we are done.\\par\nSince $G$ is not weakly closed, for each labeling of the vertices there exist $k,l,m \\in V(G)$ with $k<l<m$ such that $\\{k,m\\}\\in E(G)$ and $\\{k,l\\},\\{l,m\\} \\notin E(G)$. Nonetheless, there are a finite number of paths of length $\\geq 2$ connecting $k$ to $l$ in $G \\setminus \\{m\\}$ and $m$ to $l$ in $G \\setminus \\{l\\}$. 
We will denote them as follows:\n\n\\begin{align*}\np_1: \\quad k,p_{11}&,p_{12},\\ldots,l\\\\\np_2: \\quad k,p_{21}&,p_{22},\\ldots,l\\\\\n & \\vdots\\\\\np_r: \\quad k,p_{r1}&,p_{r2},\\ldots,l\\\\\nq_1:\\quad m,q_{11}&,q_{12},\\ldots,l\\\\\nq_2: \\quad m,q_{21}&,q_{22},\\ldots,l\\\\\n&\\vdots \\\\\nq_t: \\quad m,q_{t1}&,q_{t2},\\ldots,l.\n\\end{align*} \nNow, take $S$ to be the set of the first vertices of the previous paths, that is\n $$S=\\{ p_{11},p_{21}, \\ldots, p_{r1},q_{11},q_{21}, \\ldots,q_{t1}\\}.$$ Then $P_S$ is a minimal prime of $J_G$ and we claim that $P_S \\neq \\overline{P}_S$.\\par \n By definition of $S$, $\\{k,m\\}$ and $l$ do not belong to the same connected component of $G_{[n] \\setminus S}$. Without loss of generality we can assume that $G_1$ is the connected component of $\\{k,m\\}$ and $G_2$ is the connected component of $l$. Observe that $x_l,y_l \\notin \\bigcup_{s \\in S} \\{x_s,y_s\\}$. Thus, it is straightforward to see that \n $$\\left( \\bigcup_{s \\in S} \\{x_s,y_s\\}, J_{\\widetilde{G}_1}, J_{\\widetilde{G}_2} \\right) \\neq \\left( \\bigcup_{s \\in S} \\{x_s,y_s\\}, \\overline{J}_{\\widetilde{G}_1}, \\overline{J}_{\\widetilde{G}_{2}} \\right).$$\n This shows that $P_S \\neq \\overline{P}_S$.\n\\end{proof}\n\nThe proofs of the previous results rely on two main ingredients: the primary decomposition of binomial edge ideals, and Lemma \\ref{lemSe2} on determinantal ideals on adjacent columns. Actually, these two facts hold in a more general setting. Thus Propositions \\ref{wc-cf} and \\ref{psps} extend to a larger class of ideals, called \\emph{generalized binomial edge ideals}. These ideals were introduced by Rauh \\cite{Ra} as a generalization of binomial edge ideals. \\par \nLet $G$ be a graph on $n$ vertices and let $X$ be the generic matrix of size $m \\times n$ with entries $x_{ij}$. 
We can attach to $G$ an ideal in the polynomial ring $\\mathbb{K}[X]=\\mathbb{K}[x_{ij} \\mid 1 \\leq i \\leq m, 1 \\leq j \\leq n]$ defined as follows, \n\\begin{equation*}\n\\mathfrak{J}_G= \\sum \\limits_{\\{i,j\\} \\in E(G)} I_2 \\begin{pmatrix}\n x_{1i} & x_{1j} \\\\\n x_{2i} & x_{2j}\\\\\n \\vdots & \\vdots\\\\\n x_{mi} & x_{mj}\n\\end{pmatrix} .\n\\end{equation*}\n$\\mathfrak{J}_G$ is the \\emph{generalized binomial edge ideal of }$G$. If $m=2$ this definition boils down to the usual definition of binomial edge ideal.\\par\nIn \\cite{Ra} the author proves that generalized binomial edge ideals are radical and the primary decomposition is analogous to that of classical binomial edge ideals. Furthermore, as already said, Lemma \\ref{lemSe2} holds for any generic matrix of size $m \\times n$ and for any size of the minors (see \\cite[Theorem 2.1]{Se2}). Hence, if $G$ is a weakly closed graph, the proofs of Proposition \\ref{wc-cf} and \\ref{psps} easily generalize to $\\mathfrak{J}_G$.\n\n\\begin{prop}\\label{wc-cf-gen} Let $G$ be a weakly closed graph on $[n]$. Define $f \\in \\mathbb{K}[X]$ to be the product of all the minors of $X$ corresponding to all the diagonals of $X$ (see \\cite[Theorem 2.1]{Se2} for further details). Then $\\mathfrak{J}_G \\in \\mathcal{C}_f$. \\par \nIn particular, if $\\mathbb{K}$ has positive characteristic, then $\\mathbb{K}[X]/\\mathfrak{J}_G$ is $F$-pure.\n\\end{prop}\n\n\n\\begin{prop}\\label{psps-gen}Let $G$ be a connected graph and let $\\mathfrak{J}_G= \\bigcap_{S\\subseteq [n]} \\mathfrak{P}_S$ be the primary decomposition of the generalized binomial edge ideal associated with $G$. Then $G$ is weakly closed if and only if there exists a labeling of the vertices of $G$ such that $\\mathfrak{P}_S$ is a sum of determinantal ideals on adjacent columns.\n\\end{prop}\nThe goal of this section is to use the characterization of weakly closed graphs found in Proposition \\ref{psps} to prove the converse of Proposition \\ref{wc-cf}. 
Thereby we find an algebraic characterization of weakly closed graphs.\n\n\\begin{thm}\\label{cf-wc}\n$G$ is weakly closed $\\Leftrightarrow$ ($\\exists$ a labeling of the vertices such that) $J_G \\in \\mathcal{C}_f$.\n\\end{thm}\n\nSo far, we have the following:\n\n\n\n\n\n\n\n $$\\xymatrix@=3em{\nG \\text{ weakly closed}\\ar@{<=>}[rr]\\ar@{=>}[d]\\ar@{=>}[rrd]^{ \\ \\ \\text{ Matsuda }}_{\\car \\mathbb{K}=p>0}&& P_S=\\overline{P}_S, \\ \\forall P_S \\in \\Min (J_G)\\\\\nJ_G \\in \\mathcal{C}_f\\ar@{=>}[rr]_{\\car \\mathbb{K}=p>0}&& J_G \\text{ $F$-pure}.\n}$$\n\n\n\nTo prove Theorem \\ref{cf-wc} we first characterize all the minimal primes of the ideals in $\\mathcal{C}_f$.\n\\begin{prop} \\label{PrIdCf}\nLet $I\\in \\mathcal{C}_f$ and let $P \\in \\Min (I)$. Then\n$$P= \\left(y_1,\\ldots,y_{k-1}\\right)+\\left(x_u\\right)_{U \\subset \\{1,\\ldots, k-1\\}}+L_S +\\left(x_{l+1},\\ldots,x_n\\right)+\\left(y_v\\right)_{V \\subset \\{l+1,\\ldots, n\\}}$$\nwhere $L_S=\\overline{L}_S\\subseteq \\mathbb{K}[x_{k},y_{k},\\ldots,x_{l},y_{l}]$ is a minimal prime of a weakly closed graph, and each of the three summands may possibly be the zero ideal.\n\\end{prop}\n\nThis characterization, together with \\cite[Theorem 3.2]{HHHKR}, enables us to prove Theorem \\ref{cf-wc}. In fact, it shows that among these minimal primes, those which are minimal primes of binomial edge ideals have the property described in Proposition \\ref{psps}. Hence, the underlying graphs must be weakly-closed.\\\\\n\nSince the proof of Proposition \\ref{PrIdCf} is fairly technical, we defer it until after the proof of the main theorem of this section.\n\n\\begin{proof}[Proof of Theorem \\ref{cf-wc}]\nWe have already proved in Proposition \\ref{wc-cf} that if $G$ is weakly closed, $J_G$ is a Knutson ideal associated with $f$. It remains to prove that if $J_G$ is a Knutson binomial edge ideal, then $G$ is a weakly closed graph. \\par \nLet $J_G$ be a binomial edge ideal in $\\mathcal{C}_f$. 
By \\cite[Theorem 3.2]{HHHKR}, we know that a primary decomposition of $J_G$ is given by \n$$J_G= \\bigcap_{S \\subseteq [n]} P_S$$\nwhere $P_S:= \\Bigl( \\bigcup_{i \\in S} \\{x_i,y_i\\}\\Bigr)+ J_{\\widetilde{G}_1}+\\ldots+ J_{\\widetilde{G}_{c(S)}}.$ On the other hand, by Proposition \\ref{PrIdCf}, we know that $P_S =\\overline{P}_S$ for every minimal prime of $J_G$. Hence, the claim follows from Proposition \\ref{psps}.\n\\end{proof}\n\nIn view of the proof of Proposition \\ref{PrIdCf}, we first recall a property of Knutson ideals that we are going to use, more or less explicitly, throughout this section.\n \\begin{oss}\\label{sumint}\n If $I,J,K$ are Knutson ideals associated with $f$, then sum distributes over intersection:\n $$I +(J \\cap K)=(I+J) \\cap (I+K).$$\n This fact easily follows from \\cite[Remark 1]{Se1} and from the fact that the union of Gr\\\"obner bases of Knutson ideals associated with $f$ is a\nGr\\\"obner basis of their sum.\n \\end{oss}\n\nBy definition, $\\mathcal{C}_f$ is constructed from $(f)$ by taking its minimal primes, their sums, their intersections, and iterating. Since by Remark \\ref{sumint} sums distribute over finite intersections, we only need to prove Proposition \\ref{PrIdCf} for sums of minimal prime ideals of Knutson ideals.\\par \nFurthermore, by definition, the sum of two Knutson ideals is again a Knutson ideal. Since we know that binomial edge ideals of weakly closed graphs are in $\\mathcal{C}_f$, so are their minimal primes and the sums of these minimal primes. However, the sum of two prime ideals need not be prime. Hence, one of the first things we need to check in order to prove Proposition \\ref{PrIdCf} is the following:\n\\begin{lem}\\label{lemmaP+Q}\nAssume that $P$ and $Q$ are minimal primes of the binomial edge ideals of two weakly closed graphs, so that $P=\\overline{P}$ and $Q=\\overline{Q}$. 
Then every minimal prime $L$ of the sum $P+Q$ has the property $L=\\overline{L}$ described in the previous section.\n\\end{lem}\n\nBefore proving the lemma, we make the following observation about the structure of binomial edge ideals with two associated primes.\n\n\\begin{oss}\nLet $P$ be a minimal prime of a binomial edge ideal on $n$ vertices. Then it has the form \n$$P=\\left( \\bigcup_{i \\in S} \\{x_i,y_i\\}, J_{G_1}, \\ldots, J_{G_t} \\right)$$ where each $G_i$ is a complete graph with vertex set $V_i$. Denote by $\\widetilde{V}$ the set of vertices that do not appear in $P$. Then $I_2(X_{[1,n]}) \\cap P$ is the primary decomposition of the binomial edge ideal of the graph\n$$G= K_{S} \\cup K_{S,\\widetilde{V}}\\cup K_{S,V_1} \\cup \\ldots \\cup K_{S,V_t}\\cup G_1 \\cup \\ldots \\cup G_t $$ \n where $K_S$ denotes the complete graph on $S$ and $K_{S,V_i}$ denotes the complete bipartite graph on $S$ and $V_i$.\\par \n Actually, in \\cite{Sh} Sharifan proved that if $G$ is a connected graph on $[n]$, then\n $|\\Ass(J_G)|=2$ if and only if $G$ is the join of a complete graph $G_1$ and a graph $G_2$ which is a disjoint union of complete graphs.\\par \n Let us take as an example $P= (x_3,y_3, I_2(X_{[4,6]}))$. Then $S=\\{3\\}$, $V_1=\\{4,5,6\\}$ and $\\widetilde{V}= \\{1,2\\}$. Hence,\n $$G=K_{3,V_1} \\cup K_{3,\\widetilde{V}} \\cup K_{[4,6]}$$\nand $P \\cap I_2(X_{[1,6]})= J_{G}=([3,6],[3,5],[3,4],[2,3],[1,3],[5,6],[4,6],[4,5])$. \n \\end{oss}\n\nWith this in mind we can now prove Lemma \\ref{lemmaP+Q}.\n\n\\begin{proof}\nBy assumption, there exist weakly closed graphs $G_1$ and $G_2$ such that $P \\in \\Min (J_{G_1})$ and $Q \\in \\Min (J_{G_2})$.\nLet $P+Q=L_1\\cap \\ldots\\cap L_t$ be the minimal primary decomposition of $P+Q$. 
We want to prove that the $L_i$ are minimal prime ideals of the binomial edge ideal of a weakly closed graph, so that $L_i=\\overline{L_i}$ for every $i$.\\par\nNote that we can always choose $n$ big enough such that $L_i\\nsupseteq I_2(X_{[1,n]})$ for every $i\\in \\{1, \\ldots,t\\}$. Let $X=X_{[1,n]}$.\\par \nAssume for the moment that $P,Q \\subseteq I_2(X)$. Then $P$ and $Q$ contain no variables, that is\n\\begin{align*}\nP&= J_{\\overline{G}_P}\\\\\nQ&=J_{\\overline{G}_Q}\n\\end{align*}\n\nwhere $\\overline{G}_P$ and $\\overline{G}_Q$ are unions of disjoint complete graphs on consecutive vertices (because $G_1$ and $G_2$ are weakly closed). Hence\n$$P+Q= J_{\\overline{G}_P}+J_{\\overline{G}_Q}= J_{\\overline{G}_P\\cup \\overline{G}_Q}.$$\nSince $\\overline{G}_P\\cup \\overline{G}_Q$ is a weakly closed graph, $L=\\overline{L}$ for every ideal $L \\in \\Min (P+Q)$.\\par\nNow assume without loss of generality that $P \\nsubseteq I_2(X)$. \n By Proposition \\ref{wc-cf}, we know that $J_{G_1}, J_{G_2} \\in \\mathcal{C}_f$ and so are $P$ and $Q$. Hence $P+Q \\in \\mathcal{C}_f$. Now we consider the intersections \n\\begin{align*}\nI_2(X) &\\cap P \\\\\nI_2(X) &\\cap Q.\n\\end{align*}\nBy the previous remark, we know that these are binomial edge ideals. Furthermore, since $P=\\overline{P}$ and $Q=\\overline{Q}$, these intersections are binomial edge ideals of two weakly closed graphs, say $\\widetilde{G}_1$ and $\\widetilde{G}_2$. 
Again by Theorem \\ref{wc-cf}, $J_{\\widetilde{G}_1}$ and $J_{\\widetilde{G}_2}$ are Knutson ideals of $f$, so $$J_{\\widetilde{G}_1 \\cup \\widetilde{G}_2}=J_{\\widetilde{G}_1}+J_{\\widetilde{G}_2} \\in \\mathcal{C}_f.$$\nBy Remark \\ref{sumint}\n \n\\begin{equation*}\n\\begin{split}\nJ_{\\tilde{G}_1 \\cup \\tilde{G}_2}=J_{\\tilde{G}_1}+J_{\\tilde{G}_2}&=( I_2(X) \\cap P)+( I_2(X) \\cap Q)\\\\\n&=I_2(X) \\cap (I_2(X) +Q) \\cap (I_2(X) +P) \\cap (P+Q)\\\\\n&= I_2(X) \\cap (P+Q)\\\\\n&= I_2(X) \\cap (L_1 \\cap \\ldots \\cap L_t)\n\\end{split}\n\\end{equation*}\n\nIf this were the minimal primary decomposition of $J_{\\tilde{G}_1 \\cup \\tilde{G}_2}$, then $L_1 , \\ldots,L_t$ would be minimal prime ideals of a binomial edge ideal of a weakly closed graph, hence the thesis.\\par \nSince we have assumed that $P \\nsubseteq I_2(X)$, clearly $I_2(X) \\nsupseteq L_1 \\cap \\ldots\\cap L_t=P+Q$. Moreover $L_i \\nsupseteq I_2(X) \\cap L_1 \\cap \\ldots\\cap L_{i-1} \\cap L_{i+1} \\cap \\ldots\\cap L_t $, otherwise $L_i$ would contain either $I_2(X)$ or $L_j$ for some $j\\neq i$, but this is impossible by the choice of $X$ and the fact that the $L_i$ are minimal primes of $P+Q$. This shows that $I_2(X) \\cap (L_1 \\cap \\ldots \\cap L_t)$ is the minimal primary decomposition of $J_{\\tilde{G}_1 \\cup \\tilde{G}_2}$ and we are done.\n\\end{proof}\n\n\nTo construct $\\mathcal{C}_f$, we start from the ideal $$(f)=(x_n f_{12}\\cdots f_{n-1,n} y_1)$$ and we take its minimal primes. Thus we obtain the following ideals:\n$$(x_n), (f_{12}),(f_{23}),\\ldots,(f_{n-1n}),(y_1).$$\nAmong them the only binomial edge ideals are those of the form $(f_{i,i+1})$, which corresponds to the graph with exactly one edge, namely $\\{i,i+1\\}$, which is clearly weakly closed. If we take the sum of these binomial edge ideals, we obtain binomial edge ideals of (union of) paths on consecutive vertices. We can then consider their associated primes, the sum of these primes, and the intersections. 
Iterating this procedure, by Lemma \\ref{lemmaP+Q} and Remark \\ref{sumint}, the prime ideals we obtain are always of the form \n$$P_S=\\overline{P}_S= \\left( \\bigcup_{s \\in S} \\{x_s,y_s\\}, I_2 ( X_{[a_1, b_1]}), \\ldots, I_2 ( X_{[a_{c(S)}, b_{c(S)}]}) \\right).$$\nIn particular, if the intersection of these primes is a binomial edge ideal $J_G$, $G$ must be weakly closed by Proposition \\ref{psps}.\\\\\n\nIt remains to investigate the case when we start from an ideal that contains $(x_n)$ or $(y_1)$ and we iteratively take its minimal primes, their sums and the minimal primes of these sums. It can be shown that in this case we obtain prime ideals which can be written as the sum of an ideal generated by variables and an ideal $L$ with $L=\\overline{L}$ as in Proposition \\ref{PrIdCf}.\\\\\n\n Let $I$ be the sum of some minimal primes of $(f)$ that contains $(y_1)$ but not $(x_n)$. Then \n $$I=(y_1,f_{i_1,i_1+1},\\ldots,f_{i_k ,i_k+1}).$$\n Note that we can always reduce to the case $I=(y_1,f_{1,2},\\ldots,f_{k-1, k})$. In fact, if $i_1 >1$ or if there exists an index $j$ such that $i_{j+1} \\neq i_j +1$, then we can write $I$ as a sum of ideals on disjoint sets of variables, say $I= I_1+\\ldots+I_t$. Hence the minimal primes of $I$ are sums of minimal primes of the $I_j$'s. Among them, the only minimal primes that we still have to study are those of the ideal $(y_1,f_{1,2},\\ldots,f_{k-1, k})$ for some $k \\in \\{1, \\ldots,n\\}$. The case in which $I$ is the sum of some minimal primes of $(f)$ that contains $(x_n)$ but not $(y_1)$ is analogous.\\par \n \n \\begin{prop}\\label{lemma1}\nLet $I:=(y_1,f_{1,2},\\ldots,f_{k-1, k})$ and let $P \\in \\Min (I)$. 
Then \n \\begin{enumerate}\n \\item $P=(y_1, \\ldots, y_k)$, or\n \n \\item $P=\\left(y_1,y_2, \\ldots, y_{i-1},x_{i-1}\\right)\n+P_S $ for some $i \\in \\{2,\\ldots,k\\}$, where $P_S $ is a minimal prime of the binomial edge ideal of the path $$\\{i,i+1\\}, \\ldots, \\{k-1,k\\}.$$ Hence, $P_S =\\overline{P_S}$.\n\\end{enumerate}\n\n\nAnalogously, if $I:= (f_{k ,k+1}, \\ldots,f_{n-1,n},x_n)$ and $Q \\in \\Min (I)$ then \n\n\\begin{enumerate}\n\\item $Q=\\left( x_k, \\ldots,x_n \\right)$, or\n \n\\item $Q=\\left( y_{l+1},x_{l+1},x_{l+2}, \\ldots,x_n\\right)+ Q_T$ for some $l \\in \\{k,\\ldots,n-1\\}$, where $Q_T $ is a minimal prime of the binomial edge ideal of the path $$\\{k,k+1\\}, \\ldots, \\{l-1,l\\}.$$\nHence, $Q_T =\\overline{Q_T}$.\n\n\\end{enumerate}\n\\end{prop}\n\n\\begin{proof}\nWe prove only the first part of the proposition. Symmetrically, one can prove the analogous result for $Q$. We start off by noticing that \n$$I=(y_1, x_1 y_2,f_{2,3}, \\ldots, f_{k-1, k}).$$\nTherefore, if $P$ is a minimal prime of $I$, we have that $P \\supset (x_1 y_2)$. Since $P$ is prime, we have two possibilities, either $x_1 \\in P$ or $y_2 \\in P$.\\par \nIf $x_1 \\in P$, then $P \\supseteq (x_1,y_1)+ I \\supseteq I$ and if we set $\\tilde{I}=(x_1,y_1)+ I$, we have that $P \\in \\Min (\\tilde{I})$. But $\\tilde{I}= (x_1,y_1)+(f_{2,3},\\ldots, f_{k-1,k})$ is a sum of two ideals generated by polynomials in disjoint sets of variables. Hence we conclude that $P= (x_1,y_1)+ P_S$ where $P_S$ is a minimal prime of the ideal $(f_{2,3},\\ldots, f_{k-1,k})$.\\par \nIf $y_2 \\in P$, then $P \\supseteq (y_1,y_2)+ I \\supseteq I$ and if we set, this time, $\\tilde{I}=(y_1,y_2)+ I$, we get that $P \\in \\Min (\\tilde{I})$. Thus we need to study the minimal primes of $\\tilde{I}$. Again, we notice that\n$$\\tilde{I}=(y_1,y_2,x_2 y_3,f_{3,4},\\ldots,f_{k-1,k}).$$\nThus, $P\\supset \\tilde{I} \\supset (x_2 y_3)$ and, since $P$ is a prime ideal, we deduce that either $x_2 \\in P$ or $y_3 \\in P$. 
We can then iterate the previous argument and get the thesis.\n\\end{proof}\n\n\\begin{oss}\\label{step1} More generally, if $P \\in \\Min \\left( y_1,f_{1,2},\\ldots,f_{n-1,n},x_n\\right)$, then\n\n\\begin{enumerate}\n\\item $P=(y_1,x_1, \\ldots,x_n)$ or $P=(y_1, \\ldots,y_n,x_n)$, or \n \\item $P=\\left( y_1,y_2, \\ldots, y_{i-1},x_{i-1} \\right)\n+P_S + \\left( y_{l+1},x_{l+1},x_{l+2}, \\ldots,x_n \\right)$ \n \nwhere $P_S$ is a minimal prime of the binomial edge ideal of the path\n$$\\{i,i+1\\}, \\ldots, \\{l-1,l\\}.$$\n\nHence, $P_S =\\overline{P_S}$.\\par\n\\end{enumerate}\n\\end{oss}\n\n\nPutting together Lemma \\ref{lemmaP+Q}, Proposition \\ref{lemma1}, and Remark \\ref{step1} we get the following result about the primary decomposition of sums of minimal primes of $(f)$.\n\n\\begin{lem}\\label{step2} Let $P_1, \\ldots,P_k$ be minimal primes of $ \\left( f \\right)$ and let $Q \\in \\Min (P_1+\\ldots+P_k)$. Then\n\\begin{equation}\\label{primiQ}\nQ=\\left( y_1,y_2, \\ldots, y_{i-1},x_{i-1} \\right)\n+L + \\left( y_{l+1},x_{l+1},x_{l+2}, \\ldots,x_n \\right)\n\\end{equation}\n \nwhere $L =\\overline{L} \\subset \\mathbb{K}[x_i,x_{i+1},\\ldots,x_{l-1},x_l]$ is a minimal prime of the binomial edge ideal of a weakly closed graph and each of the three summands may possibly be the zero ideal.\n\n\\end{lem}\n\nThis argument can be generalized in order to see what happens if we take the sum of two prime ideals as in (\\ref{primiQ}) in order to characterize all the minimal primes of Knutson ideals associated with $f$.\\\\\n\nThe next proposition generalizes Proposition \\ref{lemma1}.\n\n\\begin{prop}\\label{lemma2}\nConsider two ideals of the form\n\\begin{align*}\nP_1&=\\left(y_1,\\ldots,y_{i-1}\\right)+\\left(x_j\\right)_{J_1 \\subset \\{1,\\ldots, i-1\\}}+P_{S_1}, \\qquad \\ P_{S_1}=\\overline{P_{S_1}}\\\\ \nP_2&=\\left(y_1,\\ldots,y_{k-1}\\right)+\\left(x_j\\right)_{J_2 \\subset \\{1,\\ldots, k-1\\}}+P_{S_2}, \\qquad P_{S_2}=\\overline{P_{S_2}} 
\n\\end{align*}\nwhere $P_{S_1} \\subset \\mathbb{K}[x_i,x_{i+1},\\ldots,x_{n-1},x_n]$ and $P_{S_2}\\subset \\mathbb{K}[x_k,x_{k+1},\\ldots,x_{n-1},x_n]$ are minimal primes of binomial edge ideals of weakly closed graphs. Let $P \\in \\Min \\left(P_1+P_2\\right)$. Then there exists an integer $l$ such that\n$$P= \\left(y_1,\\ldots,y_{l}\\right)+\\left(x_u\\right)_{U \\subset \\{1,\\ldots, l\\}}+L$$\nwith $L=\\overline{L}$ in $ \\mathbb{K}[x_{l+1},y_{l+1},\\ldots,x_{n},y_{n}]$.\\par \nSymmetrically, if we consider\n\\begin{align*}\nP_1&=\\left(x_i,\\ldots,x_n\\right)+\\left(y_v\\right)_{J_1 \\subset \\{i,\\ldots, n\\}}+P_{S_1}, \\qquad \\ P_{S_1}=\\overline{P_{S_1}}\\\\ \nP_2&=\\left(x_k,\\ldots,x_n\\right)+\\left(y_v\\right)_{J_2 \\subset \\{k,\\ldots, n\\}}+P_{S_2}, \\qquad P_{S_2}=\\overline{P_{S_2}} \n\\end{align*}\nwhere $P_{S_1} \\subset \\mathbb{K}[x_1,x_{2},\\ldots,x_{i-1}]$ and $P_{S_2}\\subset \\mathbb{K}[x_1,x_{2},\\ldots,x_{k-1}]$ are minimal primes of binomial edge ideals of weakly closed graphs. Let $P \\in \\Min \\left(P_1+P_2\\right)$. Then there exists an integer $l$ such that\n$$P= \\left(x_l,\\ldots,x_n\\right)+\\left(y_v\\right)_{V \\subset \\{l,\\ldots, n\\}}+L$$\nwith $L=\\overline{L}$ in $ \\mathbb{K}[x_{1},y_{1},\\ldots,x_{l-1},y_{l-1}]$.\\\\\n\\end{prop}\n\n\\begin{proof}\nWe only prove the first part of the proposition. The second part follows by symmetry. \n\nBy hypothesis $P_{S_1}= \\overline{P_{S_1}}$ in $\\mathbb{K}[x_i,x_{i+1},\\ldots,x_{n-1},x_n]$ and $P_{S_2}=\\overline{P_{S_2}}$ in $\\mathbb{K}[x_k,x_{k+1},\\ldots,x_{n-1},x_n]$. 
This means that we can write them as\n\\begin{align*}\nP_{S_1}=\\overline{P_{S_1}}=& \\Bigl( \\bigcup \\limits_{s \\in S_1}\\{x_s,y_s\\}\\Bigr) +I_2(X_{[r_1,t_1])})+\\ldots+I_2(X_{[r_{c_1},t_{c_1}]})\\\\\n P_{S_2}=\\overline{P_{S_2}}=& \\Bigl( \\bigcup \\limits_{s \\in S_2}\\{x_s,y_s\\}\\Bigr) +I_2(X_{[\\tilde{r}_1,\\tilde{t}_1])})+\\ldots+I_2(X_{[\\tilde{r}_{c_2},\\tilde{t}_{c_2}]})\n\\end{align*}\n with $i\\leq r_1 < t_1< r_{2} < t_{2} <\\ldots< t_{c_1}\\leq n$ and $ k\\leq \\tilde{r}_1 < \\tilde{t}_1 <\\tilde{r}_{2} < \\tilde{t}_{2}<\\ldots <\\tilde{t}_{c_2}\\leq n$. Without loss of generality we can assume that $i \\leq k$. We want to study the minimal primes of the ideal $P_1+P_2$. First of all we notice that\n \n \\begin{align*}\nP_1+P_2 =&\\left(y_l\\right)_{l\\leq k-1}+ \\left(x_j\\right)_{j \\in J_1 \\cup J_2}&& +\\left( \\bigcup \\limits_{s \\in S_1}\\{x_s,y_s\\}\\right)+I_2(X_{[r_1,t_1])})+\\ldots+I_2(X_{[r_{c_1},t_{c_1}]})\\\\\n& &&+\\left( \\bigcup \\limits_{s \\in S_2}\\{x_s,y_s\\}\\right) +I_2(X_{[\\tilde{r}_1,\\tilde{t}_1])})+\\ldots+I_2(X_{[\\tilde{r}_{c_2},\\tilde{t}_{c_2}]})\n\\end{align*}\n\n\n\n\n\n\n\n\n\n\n\nand, possibly dropping some summands, we can assume that $ t_1 \\geq k$.\\par \nBut then, keeping in mind that $y_i, \\ldots,y_{k-1} \\in P_1+P_2$, we can write this sum as\n\n\\begin{align*}\nP_1+P_2 &= \\left(y_l\\right)_{l\\leq k-1}+ \\left(x_j\\right)_{j \\in J_1 \\cup J_2} + \\left( x_s\\right)_{ \\substack{{s \\in S_1}\\\\{s \\leq k-1}}}+\\\\\n &+\\Bigl( \\bigcup _{ \\substack{{s \\in S_1}\\\\{s \\geq k}}} \\{x_s,y_s\\}\\Bigr)+I_2(X_{[r'_1,t_1])})+I_2(X_{[r_{2},t_{2}]})+\\ldots +I_2(X_{[r_{c_1},t_{c_1}]})+\\\\\n &+\\left(x_u y_v\\right)_{\\substack {u \\in U_1:=\\{r_1,\\ldots,k-1\\}\\\\ v \\in V_1:=\\{k,\\ldots,t_1\\}}}+P_{S_2}\n\\end{align*}\nwith $r'_1 \\geq k$.\\par\n\nIf $P \\in \\Min (P_1+P_2)$, then $P \\supseteq Q_1$ for some $Q_1 \\in \\Min \\left(x_u y_v\\right)_{\\substack {u \\in U_1 \\\\ v \\in V_1}} $. 
Therefore $$P \\in \\Min \\left(Q_1+\\left(P_1+P_2\\right)\\right).$$ On the other hand, there are only two possibilities for $Q_1$\n\\begin{itemize}[itemsep=1pt,topsep=4pt] \n \\item[\u2022]$Q_1=\\left(x_u\\right)_{u \\in U_1}$\n \\item[\u2022]$Q_1=\\left(y_v \\right)_{v \\in V_1}.$\n\\end{itemize}\n\n\\begin{itemize}[itemsep=1pt,topsep=4pt,leftmargin=0.1in] \n\\item[]\\textbf{1st case:} If $Q_1 =\\left(x_u\\right)_{u \\in U_1} $, we have\n \\begin{align*}\n Q_1 +\\left(P_1+P_2\\right)=\\left(y_l\\right)_{l\\leq k-1}+ \\left(x_j\\right)_{ j \\in U} +\\left( P'_{S_1}+P_{S_2}\\right) \n\\end{align*}\nwhere \n\\begin{align*}\nU&:= \\left(J_1 \\cup J_2 \\cup S_1\\cup U_1\\right) \\cap \\{1, \\ldots,k-1\\}\\\\\nP'_{S_1}&:= \\Bigl( \\bigcup _{ \\substack{{s \\in S_1}\\\\{s \\geq k}}} \\{x_s,y_s\\}\\Bigr) +I_2(X_{[r'_1,t_1])})+I_2(X_{[r_2,t_2])})+\\ldots+I_2(X_{[r_{c_1},t_{c_1}]}) \n\\end{align*}\n and $k \\leq r'_1< t_1<r_2<\\ldots<t_{c_1} \\leq n$. \n Thus, we have written $Q_1+\\left(P_1+P_2\\right)$ as the sum of two ideals of polynomials on disjoint sets of variables, namely \n $\\left(y_l\\right)_{l\\leq k-1}+ \\left(x_j\\right)_{ j \\in U}$ and \n $P'_{S_1}+P_{S_2}$. Since $P \\in \\Min \\left(Q_1+\\left(P_1+P_2\\right)\\right)$, we get that \n $$P= \\left(y_l\\right)_{l\\leq k-1}+ \\left(x_u\\right)_{U \\subset \\{1,\\ldots, k-1\\}}+L $$\n where $L \\in \\Min (P'_{S_1}+P_{S_2})$. 
By Lemma \\ref{lemmaP+Q}, we know that $L =\\overline{L} \\subseteq \\mathbb{K}[x_k,\\ldots,x_n]$ is a minimal prime of the binomial edge ideal of a weakly closed graph and we are done.\\\\\n \n \n \\item[] \\textbf{2nd case:} If $Q_1 =\\left(y_v \\right)_{v \\in V_1}$, then $I_2(X_{[r'_{1},t_{1}]})\\in \\left( y_l\\right)_{l \\leq t_1}$ and we get\n \\begin{align*}\n Q_1 +\\left(P_1+P_2\\right)&=\\left( y_l\\right)_{l \\leq t_1}+\\left(x_j\\right)_{j \\in J_1 \\cup J_2} &\\null&+\\Bigl( \\bigcup \\limits_{s \\in S_1}\\{x_s,y_s\\}\\Bigr) +I_2(X_{[r_{2},t_{2}]})+\\ldots +I_2(X_{[r_{c_1},t_{c_1}]})\\\\\n & &\\null &+\\Bigl( \\bigcup \\limits_{s \\in S_2}\\{x_s,y_s\\}\\Bigr) +I_2(X_{[\\tilde{r}_1,\\tilde{t}_1])})+\\ldots+I_2(X_{[\\tilde{r}_{c_2},\\tilde{t}_{c_2}]}).\n \\end{align*}\n \n where $r_2>t_1$ and again, possibly dropping some summands, we can assume that $ \\tilde{t}_1 \\geq t_1+1$.\\par \n As before, keeping in mind that $y_1, \\ldots,y_{t_1} \\in Q_1 +\\left(P_1+P_2\\right)$, we can write \n \\begin{align*}\nQ_1+\\left(P_1+P_2\\right)&= \\left(y_l\\right)_{l\\leq t_1}+ \\left(x_j\\right)_{j \\in J_1 \\cup J_2} + \\left( x_s\\right)_{ \\substack{{s \\in S_1 \\cup S_2}\\\\{s \\leq t_1}}}+\\\\\n &+\\Bigl( \\bigcup _{ \\substack{{s \\in S_1}\\\\{s \\geq t_1+1}}} \\{x_s,y_s\\}\\Bigr)+I_2(X_{[r_{2},t_{2}]})+\\ldots +I_2(X_{[r_{c_1},t_{c_1}]})+\\\\\n & +\\Bigl( \\bigcup _{ \\substack{{s \\in S_2}\\\\{s \\geq t_1+1}}} \\{x_s,y_s\\}\\Bigr)+I_2(X_{[{\\tilde{r}_1}^{\\prime},\\tilde{t}_1])})+\\ldots+I_2(X_{[\\tilde{r}_{c_2},\\tilde{t}_{c_2}]})\\\\\n &+\\left(x_u y_v\\right)_{\\substack {u \\in \\widetilde{U}_1:=\\{\\tilde{r}_1,\\ldots,t_1\\}\\\\ v \\in \\widetilde{V}_1:=\\{t_1+1,\\ldots,\\tilde{t}_1\\}}} \n\\end{align*}\nwith ${\\tilde{r}_1}^{\\prime} \\geq t_1+1$. 
Since $P \\in \\Min \\left(Q_1+\\left(P_1+P_2\\right)\\right)$, the prime $P$ must contain a minimal prime $Q_2$ of the ideal $\\left(x_u y_v\\right)_{\\substack {u \\in \\widetilde{U}_1\\\\ v \\in \\widetilde{V}_1}}$. Hence $$P \\in \\Min \\left(Q_2+ Q_1+\\left(P_1+P_2\\right) \\right).$$ Again, there are only two possibilities for $Q_2$, namely\n \\begin{itemize}[itemsep=1pt,topsep=4pt] \n \\item[\u2022]$Q_2=\\left( x_u \\right)_{u \\in \\widetilde{U}_1}$\n \\item[\u2022]$Q_2=\\left( y_v \\right)_{v \\in \\widetilde{V}_1}.$\n\\end{itemize}\nand we can repeat the same argument as before.\\par \n\nThus, if $Q_2=\\left( x_u \\right)_{u \\in \\widetilde{U}_1}$ we have that\n\\begin{align*}\n Q_2+Q_1 +\\left(P_1+P_2\\right)=\\left(y_l\\right)_{l\\leq t_1}+ \\left(x_j\\right)_{ j \\in U} +\\left( P''_{S_1}+P'_{S_2}\\right) \n\\end{align*}\nwhere \n\\begin{align*}\nU&:= \\left(J_1 \\cup J_2 \\cup S_1 \\cup S_2 \\cup \\widetilde{U}_1 \\right) \\cap \\{1, \\ldots,t_1\\}\\\\\nP''_{S_1}&:=\\Bigl( \\bigcup _{ \\substack{{s \\in S_1}\\\\{s \\geq t_1+1}}} \\{x_s,y_s\\}\\Bigr)+I_2(X_{[r_{2},t_{2}]})+\\ldots +I_2(X_{[r_{c_1},t_{c_1}]})\\\\\nP'_{S_2}&:= \\Bigl( \\bigcup _{ \\substack{{s \\in S_2}\\\\{s \\geq t_1+1}}} \\{x_s,y_s\\}\\Bigr)+I_2(X_{[{\\tilde{r}_1}^{\\prime},\\tilde{t}_1]})+\\ldots+I_2(X_{[\\tilde{r}_{c_2},\\tilde{t}_{c_2}]}).\n\\end{align*}\nSince $r_2,{\\tilde{r}_1}^{\\prime}>t_1$, we have written $ Q_2+ Q_1+\\left(P_1+P_2\\right)$ as a sum of two ideals of polynomials on disjoint sets of variables, namely \n $\\left(y_l\\right)_{l\\leq t_1}+ \\left(x_j\\right)_{ j \\in U}$ and \n $P''_{S_1}+P'_{S_2}$. Since $P \\in \\Min \\left(Q_2+Q_1+\\left(P_1+P_2\\right)\\right)$, we get that \n $$P= \\left(y_l\\right)_{l\\leq t_1}+ \\left(x_u\\right)_{U \\subset \\{1,\\ldots, t_1\\}}+L $$\n where $L \\in \\Min (P''_{S_1}+P'_{S_2})$. 
By Lemma \\ref{lemmaP+Q}, we know that $L =\\overline{L} \\subseteq \\mathbb{K}[x_{t_1+1},\\ldots,x_n]$ is a minimal prime of the binomial edge ideal of a weakly closed graph and we are done.\\\\\n\nIf instead $Q_2=\\left( y_v \\right)_{v \\in \\widetilde{V}_1}$, we can iterate the above procedure. At each step we add the variables generating a new ideal $Q_p$ to the generators of $P_1+P_2$, but we still have $P \\in \\Min (Q_p+\\ldots+Q_1+P_1+P_2)$. Since we have a finite number of variables, the procedure must terminate after a finite number of steps.\n\\end{itemize}\n\\end{proof}\n\nFrom Proposition \\ref{lemma2} and Lemma \\ref{lemmaP+Q}, we obtain a generalization of Lemma \\ref{step2}.\n\n\\begin{lem}\\label{corP}\nLet $P_1$ and $P_2$ be two prime ideals of the form\n\\begin{align*}\nP_1&=\\left(y_1,\\ldots,y_{a-1}\\right)+\\left(x_u\\right)_{U_1 \\subset \\{1,\\ldots, a-1\\}}+P_{S_1}+\\left( x_{b+1},\\ldots,x_n \\right)+\\left( y_v \\right)_{V_1 \\subset \\{b+1,\\ldots, n\\}}\\\\ \nP_2&=\\left(y_1,\\ldots,y_{c-1}\\right)+\\left(x_u\\right)_{U_2 \\subset \\{1,\\ldots, c-1\\}}+P_{S_2}+\\left(x_{d+1},\\ldots,x_n\\right)+\\left(y_v\\right)_{V_2 \\subset \\{d+1,\\ldots, n\\}}\n\\end{align*}\nwhere $P_{S_1}=\\overline{P_{S_1}}\\subset \\mathbb{K}[x_a,x_{a+1}, \\ldots, x_{b}]$ and $ P_{S_2}=\\overline{P_{S_2}}\\subset \\mathbb{K}[x_c,x_{c+1}, \\ldots, x_{d}]$ are minimal primes of binomial edge ideals of weakly closed graphs and each of the summands may possibly be the zero ideal. Let $P \\in \\Min \\left(P_1+P_2\\right)$. 
Then\n$$P= \\left(y_1,\\ldots,y_{i-1}\\right)+\\left(x_u\\right)_{U \\subset \\{1,\\ldots, i-1\\}}+L+\\left(x_{j+1},\\ldots,x_n\\right)+\\left(y_v\\right)_{V \\subset \\{j+1,\\ldots, n\\}}$$\nwith $L=\\overline{L}\\subseteq \\mathbb{K}[x_{i},y_{i},\\ldots,x_{j},y_{j}]$ and each of the three summands may possibly be the zero ideal.\n\\end{lem}\n\nFinally, putting together Lemma \\ref{step2} and Lemma \\ref{corP}, we manage to identify all the minimal primes of the ideals in $\\mathcal{C}_f$ and prove Proposition \\ref{PrIdCf}.\n\n\n\n\n\n\n\n\\begin{proof}\nWe recall that $\\mathcal{C}_f$ is constructed from $(f)$ by taking its minimal primes, their sums, their intersections, and iterating. Since by Remark \\ref{sumint}, finite sums and intersections commute, we only need to prove the result for sums of minimal prime ideals.\\par We know that the minimal primes of $(f)$ are\n$$(x_n), (f_{1,2}),(f_{2,3}),\\ldots,(f_{n-1,n}),(y_1).$$\nThese primes have the desired form.\\par \nIf we take the sum of some of these prime ideals, by Lemma \\ref{step2} the minimal primes of this sum are of the form\n$$ P=\\left( y_1,y_2, \\ldots, y_{i-1},x_{i-1} \\right)\n+P_S + \\left( y_{l+1},x_{l+1},x_{l+2}, \\ldots,x_n \\right)$$\n where $P_S= \\overline{P_S}\\subset \\mathbb{K}[x_i,x_{i+1},\\ldots,x_{l-1},x_l]$ is a minimal prime of the binomial edge ideal of a weakly closed graph.\nNote that these minimal primes satisfy the hypotheses of Lemma \\ref{corP}. Thus applying Lemma \\ref{corP} and iterating this procedure we get the thesis.\n\\end{proof}\n\n\n\n\n\n\n\n\n\n\nFrom Theorem \\ref{cf-wc}, we know that Knutson binomial edge ideals are exactly those binomial edge ideals associated with weakly closed graphs. 
In view of this result, one might hope to generalize this theorem to higher dimensions, that is, to characterize all Knutson determinantal facet ideals.\\par \nDeterminantal facet ideals of simplicial complexes are the natural extension of binomial edge ideals of graphs. In \\cite{BSV}, the authors introduce ``unit-interval'', ``under-closed'', and ``weakly-closed'' simplicial complexes as natural $d$-dimensional generalizations of unit-interval, interval, and weakly-closed graphs, and they investigate their connections with Hamiltonian paths and determinantal facet ideals. \\par \n\nBy \\cite[Theorem 77]{BSV}, determinantal facet ideals of semi-closed simplicial complexes are Knutson ideals, and we know from \\cite[Example 73]{BSV} that this result does not extend to weakly closed simplicial complexes. However, there could be other simplicial complexes whose determinantal facet ideals are Knutson ideals. \\par \nThis is a challenging question, related to the problem of finding a primary decomposition of determinantal facet ideals. As we have seen, the proof of Theorem \\ref{cf-wc} makes heavy use of primary decompositions of binomial edge ideals, which are well known. In contrast, the primary decomposition of a determinantal facet ideal is still unknown in general, even though there have been some steps in this direction (\\cite{HS}, \\cite{MR}). This makes it hard to generalize the previous proofs to higher dimensions.\n\n\\begin{thebibliography}{20}\n\\addcontentsline{toc}{chapter}{Bibliography} \n\\bibitem[AV]{AV} A. Almousa, K. Vandebogert, \\emph{Determinantal facet ideals for smaller minors}, Arch. Math., vol. 118, 247--256, 2022.\n\\bibitem[BSV]{BSV} B. Benedetti, L. Seccia, M. Varbaro, \\emph{Hamiltonian paths, unit-interval complexes, and determinantal facet ideals}, to appear in Adv. 
Applied Math., 2022.\n\\bibitem[CV]{CV} A. Conca, M. Varbaro, \\emph{Squarefree Gr\\\"obner degeneration}, Invent. math., vol. 221, 713--730, 2020.\n\\bibitem[EHHM]{EHHM} V. Ene, J. Herzog, T. Hibi, F. Mohammadi, \\emph{Determinantal facet ideals}, Michigan Mathematical Journal, vol. 62, no. 1, 39--57, 2013.\n\\bibitem[HHHKR]{HHHKR} J. Herzog, T. Hibi, F. Hreinsdottir, T. Kahle, J. Rauh, \\emph{Binomial edge ideals and conditional independence statements}, Adv. Applied Math., vol. 45, 317--333, 2010.\n\\bibitem[HS]{HS} S. Hosten, S. Sullivant, \\emph{Ideals of adjacent minors}, J. Algebra, vol. 277, no. 2, 2004.\n\\bibitem[Kn]{Kn} A. Knutson, \\emph{Frobenius splitting, point-counting, and degeneration}, arXiv:0911.4941, 2009.\n\\bibitem[Ma]{Ma} K. Matsuda, \\emph{Weakly closed graphs and F-purity of binomial edge ideals}, Algebra Colloquium, vol. 25, 567--578, 2018.\n\\bibitem[MR]{MR} F. Mohammadi, J. Rauh, \\emph{Prime splitting of determinantal ideals}, Communications in Algebra, vol. 46, no. 5, 2278--2296, 2018.\n\\bibitem[Na]{Na} H. Narasimhan, \\emph{The irreducibility of ladder determinantal varieties}, J. Algebra, vol. 102, 162--185, 1986.\n\\bibitem[Oh]{Oh} M. Ohtani, \\emph{Binomial Edge Ideals of Complete Multipartite Graphs}, Communications in Algebra, vol. 41, no. 10, 3858--3867, 2013.\n\\bibitem[Ra]{Ra} J. Rauh, \\emph{Generalized binomial edge ideals}, Advances in Applied Mathematics, vol. 50, no. 3, 409--414, 2013.\n\\bibitem[Sc]{Sc} K. Schwede, \\emph{F-adjunction}, Algebra and Number Theory, vol. 3, no. 8, 907--950, 2009.\n\\bibitem[Se1]{Se1} L. Seccia, \\emph{Knutson ideals and determinantal ideals of Hankel matrices}, Journal of Pure and Applied Algebra, vol. 225, no. 12, 2021.\n\\bibitem[Se2]{Se2} L. Seccia, \\emph{Knutson ideals of generic matrices}, Proceedings of the AMS, vol. 150, no. 5, 1967--1973, 2022.\n\\bibitem[Sh]{Sh} L. Sharifan, \\emph{Binomial edge ideals with special set of associated primes}, Comm. Algebra, vol. 43, no. 
2, 503--520, 2015.\n\\bibitem[Vi]{Vi} R.H. Villarreal, \\emph{Cohen-Macaulay graphs}, Manuscripta Math., vol. 66, 277--293, 1990.\n\\end{thebibliography}", "images": []}
interleaved/c765e44b-e539-413e-a890-08f557e2ee5d.json
DELETED
@@ -1 +0,0 @@
|
|
1 |
-
{"txt": "", "images": []}
interleaved/dcd0961b-ade9-42f3-a3e1-36d8ad815e8d.json
ADDED
The diff for this file is too large to render.
See raw diff
interleaved/df14c47a-2a4c-4414-af6d-e2fef9e76757.json
DELETED
@@ -1 +0,0 @@
|
|
1 |
-
{"txt": "", "images": []}
interleaved/e0216c74-8b09-4cfc-bf19-7b8bdb07cb21.json
DELETED
@@ -1 +0,0 @@
|
|
1 |
-
{"txt": "\n\n\nDeveloping a more human-like dialogue system has been an important topic in artificial intelligence, where one of the major challenges is to maintain a consistent persona~\\cite{li2016persona,qian2017assigning,mazare2018training,song2019exploiting,wolf2019transfertransfo,song2020generating,liu2020you,song2021bob}. Key-value lists were first used to construct structured profiles explicitly, including name, gender, age, and location~\\cite{qian2017assigning,zheng2019personalized,zheng2020pre}.\nMore recently, \\citet{zhang2018personalizing} define the personality as several textual persona sentences, as shown in \\figref{table:example}. As unstructured personas are natural, vivid, and facilitate the description of complicated personalities, they have sparked wide interest in developing generators of personality-consistent responses~\\cite{fillwock2018identification}.\nSeveral works aim to enhance the understanding of predefined textual personas. \n\\citet{wolf2019transfertransfo} first employ a pretrained model that leverages a general dialogue corpus to understand textual personas better.\n\\citet{song2021bob} pretrain an encoder with non-dialogue inference data to strengthen consistency understanding. \n\\citet{xu2020neural} propose enriching predefined personas by searching related topics, and \\citet{majumder2020like} generalize predefined personas by leveraging commonsense to guess the underlying personas.\n\n\nHowever, existing methods for generating responses based on textual persona sentences have several limitations. \n\\textbf{First}, \nmost current methods adopt only the predefined personas for response generation, and thus easily fail to generate a reasonable response when confronted with the OOP problem.\nAs shown in \\figref{table:example}, Q2 and Q3 are two typical OOP examples: a query about, e.g., ``family situation'' cannot be answered directly with the given personas. 
\nHowever, without more external knowledge, \nthe agent may fabricate inappropriate personas (e.g., ``have four children'') that are inconsistent with its prior personas, which is the so-called personality-inconsistent generation problem.\n\\textbf{Second}, \nalthough several works expand predefined personas for generation, they merely focus on paraphrasing a specific predefined persona,\nwithout considering the consistency of the expanded persona with the given query and the other predefined personas~\\cite{xu2020neural, majumder2020like}. Such methods may produce an extended persona that is unsuitable for the response, or even inconsistent with the rest of the personas, leading to contradiction problems.\n\\textbf{Third}, \nsome methods~\\cite{zheng2019personalized,zheng2020pre} simply fuse all personas into the generation process without distinction, \nwhich may lead to an inappropriate response with an inconsistent persona.\nAs shown in \\figref{table:example}, \n``I am kayaker'' may be more relevant to Q4; however, the agent still grafts on ``I have two cats'', because ``I have pets'' is a more common persona in the whole dataset than ``I am kayaker''. This is the so-called long-tail bias problem. \nUnder such circumstances, it is non-trivial to solve the OOP problem directly.\n\nIn this paper, we argue for the importance of addressing the OOP problem, which may significantly improve the consistency of existing personalized dialogue systems. \nRecalling the examples shown in \\figref{table:example}, for OOP queries (e.g., Q2 and Q3), a reasonable solution is to obtain an appropriate persona from an external knowledge source based on the predefined personas.\nHowever, the generator may overlook the appropriate persona we add (e.g., R4), so we must filter the existing textual personas before generating responses. 
\nTherefore, we design a pipeline that first retrieves a persona and then selects among personas to address the OOP problem.\nAccordingly, our research starts by asking: \\textbf{What is the principle for retrieving a persona for an OOP query?}\nHere, the first important issue is whether the retrieved personas are semantically consistent with the predefined ones.\nFor example, ``I don't like pets'' obviously implies ``I don't have a dog or a cat''.\nTherefore, the retrieved personas should be compared with the predefined ones to check for semantic conflicts. \nThe second question is: \\textbf{How can we ensure that the chatbot (e.g., the generated response) is endowed with the retrieved persona?}\nGenerally, existing generative models tend to select frequently occurring personas for generation. To avoid generic responses, the generation model should not use all of the retrieved personas. \nInstead, we encourage the model to select the most query-relevant persona before generation, significantly improving the relevance of the generated response to the context.\n\nTherefore, this paper proposes a novel retrieval-to-prediction pipeline consisting of \\emph{PRM} and \\emph{PS-Transformer}. Specifically, \\emph{PRM} is designed as a ranking module that extends personas by retrieving from a global persona set\\footnote{In this paper we simply take all personas from the test set of ConvAI2 as our global personas.}.\nIn particular, we leverage Natural Language Inference (NLI) to select personas that do not conflict with predefined personas. \n\\emph{PS-Transformer} adopts a \\emph{Target-Guided Persona Scorer} to predict the applicability of each persona to the query using posterior information. 
With such a persona distribution, our proposed model is able to select the most suitable persona to generate responses.\nWe build a challenging set named \\underline{I}nadequate-\\underline{T}iny-\\underline{ConvAI2} (IT-ConvAI2) by removing the query-related personas from the original ConvAI2 dataset.\nIn this way, we verify that the \\emph{PRM} can reliably extend a suitable new persona to tackle the OOP problem\nand help \\emph{PS-Transformer} generate personality-consistent responses.\nOn both IT-ConvAI2 and ConvAI2, \nwe demonstrate that our method directly improves the coherence of generation at the personality level. \n\nThe main contributions of this research are summarized as follows:\n\n\\textbf{First}, we propose a novel framework for solving the OOP problem in dialogue generation. This framework involves two processes, i.e., conflict-detecting persona retrieval and dialogue generation with selected personas.\n\n\\textbf{Second}, we are the first to leverage NLI to estimate the coherence between persona candidates and predefined personas. Extensive experiments demonstrate that our proposed \\emph{PRM} can gather better personas than others.\n\n\\textbf{Third}, we propose a novel \\emph{PS-Transformer} that introduces the \\emph{Target-Guided Persona Scorer} to predict persona distributions instead of fusing personas indiscriminately. The \\emph{PS-Transformer} yields the best results on both IT-ConvAI2 and ConvAI2.\n\\subsection{Personalized Dialogue}\n\nAlthough neural response generation models have achieved promising results \\cite{vinyals2015neural, shang2015neural, hochreiter1997long, zhang2020dialogpt, zhao2017learning, target2021wei, emotion2019wei}, they are still unsatisfactory. \nPrevious work~\\cite{eggins2005analysing} found that topic changing significantly satisfies conversational participants. 
Furthermore, \\citet{mitsuda2019information} proposed that $78.5$\\\nAfter this, \\citet{qian2017assigning} proposed the WD Profile Dataset, and \\citet{zhang2018personalizing} proposed ConvAI2. Such personalized dialogue datasets contributed to the development of both retrieval-based and generation-based personalized dialogue models.\n\nAmong retrieval-based methods \\cite{zhang2018personalizing,gu2019dually,gu2021partner}, \\citet{gu2021partner} found it helpful to utilize personas in response selection. \nAlthough our proposed \\emph{PRM} retrieves personas, our pipeline does not belong to the retrieval-based methods, because it does not directly take the retrieval results as responses but uses them as the basis for generation, which facilitates the generation of informative and consistent responses.\n\nAmong generation-based methods, \n\\citet{li2016persona} first took user embeddings as an implicit persona in multi-turn dialogues. However, this approach relied on expensive speaker-tagged dialogue data.\nRecent works incorporated explicit personas into generation in two ways:\n(1) \\citet{qian2017assigning} and \\citet{zheng2020pre} defined personality as structured key-value profiles consisting of basic personal information such as name, age, and location. \n(2) \\citet{zhang2018personalizing} contributed a chat-oriented dataset, taking personality as a predefined collection of textually described persona sentences. \nMost persona dialogue methods~\\cite{zhang2018personalizing, wolf2019transfertransfo, song2019exploiting, song2020generating, yavuz2019deepcopy, song2021bob} focused on how to better understand personas in the latter high-quality corpus.\nSpecifically, \n\\citet{zhang2018personalizing} employed a basic Seq2Seq model that concatenates personas with the query without distinguishing them. 
\n\\citet{wolf2019transfertransfo} first introduced transfer learning by fine-tuning a pretrained model to improve the quality of generation.\nHowever, all methods above take the agent's personality as a predefined closed set. Once the query goes beyond the predefined personas (the OOP problem), the agent tends to fabricate a new persona, resulting in a risk of inconsistent personality. To tackle the problem, we propose our retrieval-to-prediction pipeline that extends the personas before generation.\n\n\\subsection{Natural Language Inference}\nThe task of Natural Language Inference (NLI) is to learn a function $f_{\\mathrm{NLI}}(p,h) \\in \\{\\mathrm{E}, \\mathrm{N}, \\mathrm{C}\\}$, where $p$ and $h$ denote premise and hypothesis, respectively. The outputs E, N, and C represent the entailment, neutral, and contradiction relations between premises and hypotheses.\nSince the release of the large-scale corpus SNLI \\cite{bowman2015large}, deep neural network approaches have made promising progress~\\cite{chen2017enhanced,gong2018natural,kim2019semantic}.\n\\citet{welleck2019dialogue} modeled the detection of conversational consistency as an NLI task and proposed the Dialogue NLI dataset.\n\\citet{song2020generating} adopted an RL framework to leverage NLI knowledge as a reward. \n\\citet{song2021bob} further pretrained on the NLI task to ensure that generated responses entail the predefined personas.\n\nMotivated by this, we argue that NLI is crucial for persona retrieval, as it identifies the relations between persona candidates and predefined personas.\nWe therefore consider entailment and conflict with the predefined personas from the NLI perspective when \\emph{PRM} retrieves a persona, thus providing a suitable persona for the generative model.\n\n\\subsection{Knowledge Enhanced Dialogue}\nThe incorporation of knowledge has been shown to be an effective way to improve the performance of dialogue generation. 
There is a trend to leverage domain-specific knowledge bases to ground neural models \\cite{xu2017incorporating,zhu2017flexible,gu2016incorporating,zou2021multi,zhao22multiview,pan2021context}, in which textual persona sentences are one of the most frequently considered kinds of knowledge \\cite{lian2019learning}.\nRecently, \\citet{lian2019learning} point out that, compared with the knowledge posterior distribution, which further considers the actual knowledge used in real responses, the prior distribution has a large variance; it is therefore difficult for existing models to select the appropriate knowledge based on the prior distribution alone during training. On this basis, \\citet{song2019exploiting} and \\citet{gu2019dialogwae} use posterior distributions to ensure that knowledge is better utilized in generating responses.\n\nWe borrow the idea of leveraging the posterior distribution to select the appropriate knowledge, with several differences in motivation and methodology: (1) We use the posterior distribution to select the actual personas rather than traditional knowledge in the grounded response. 
(2) Rather than fusing all personas into one representation~\\cite{gu2019dialogwae}, we model the persona selection distribution.\n\n\\label{sec:method}\n\n\\subsection{Task Definition}\\label{sec:taskdef}\n\nIn this paper, personalized dialogue generation aims at endowing a dialogue system with a consistent personality in order to build a human-like conversation system. It can be formally defined as follows: \ngiven a query $\\mathcal{Q} = \\{q_i\\}_{i=1}^{m}$ and a set of predefined personas $\\mathcal{P} = \\{p_1, p_2, \\dots, p_{n}\\}$, where each persona is a sentence $p_i = \\{w_j\\}_{j=1}^{m} (i \\in \\{1, 2, \\dots, n\\})$, the task is to generate a response $\\mathcal{R} = \\{r_i\\}_{i=1}^{n_r}$ coherent with both the query and the agent's personas.\n\nAs shown in \\figref{fig:case}, assuming a global persona collection $\\mathcal{P}_{Global}$ for all agents, the personas belonging to a specific agent can be denoted by $\\mathcal{P}_{Agent}\\ (\\mathcal{P}_{Agent} \\subsetneqq \\mathcal{P}_{Global})$. Usually, we manually predefine a persona set $\\mathcal{P}\\ (\\mathcal{P} \\subsetneqq \\mathcal{P}_{Agent})$ for the agent.\nUnder this assumption, we divide queries into A-type and B-type. \nA-type queries can be answered based on a predefined persona $p_{\\mathcal{Q}}\\ (p_\\mathcal{Q} \\in \\mathcal{P})$, while B-type queries require us to detect a new persona $p_g\\ (p_g \\in \\mathcal{P}_{Agent},\\ p_g \\notin \\mathcal{P})$.\n\nTo simplify the problem, we assume that the global persona set $\\mathcal{P}_{Global}$ contains at least one persona related to the query. This paper therefore focuses on retrieving a suitable persona $p_g$ from $\\mathcal{P}_{Global}$ and generating responses coherent with the extended personas. 
The personalized dialogue generation can be briefly stated as follows:\n\\begin{equation}\n\\begin{aligned}\n& \\mathbf{P} (\\mathcal{R}|\\mathcal{Q}, \\mathcal{P}, \\mathcal{P}_{Global}) \\\\\n= & \\mathbf{P} (\\mathcal{R}|\\mathcal{Q}, \\mathcal{P} \\cup \\{p_g\\}) \\cdot \\mathbf{P}(p_g | \\mathcal{Q}, \\mathcal{P}, \\mathcal{P}_{Global}), \\\\\n\\end{aligned}\n\\end{equation}\nwhere $\\mathbf{P}(p_g | \\mathcal{Q}, \\mathcal{P}, \\mathcal{P}_{Global})$ denotes detecting a new persona $p_g$ from $\\mathcal{P}_{Global}$ given the current query $\\mathcal{Q}$ and the predefined personas $\\mathcal{P}$,\nand $\\mathbf{P}(\\mathcal{R}|\\mathcal{Q}, \\mathcal{P} \\cup \\{p_g\\}) = \\prod_{t=1}^{n_r} \\mathbf{P}(r_t|\\mathcal{Q}, \\mathcal{P} \\cup \\{p_g\\}, r_{<t})$ represents the response generation based on both the context query $\\mathcal{Q}$ and the extended personas $\\mathcal{P}\\cup \\{p_g\\}$.\n\n\\subsection{Overview}\\label{sec:overview}\n\nAs shown in \\figref{fig:overview}, we design a retrieval-to-prediction pipeline that combines persona extending and response generation. The pipeline consists of two stages: \n\\emph{PRM} retrieves a persona from the global persona set based on the predefined personas and the context query.\nThen \\emph{PS-Transformer} generates a response with the query and the extended personas. The details of our method are explained in Section~\\ref{sec:retriever} and Section~\\ref{sec:generator}, respectively.\n\n\\subsection{Persona Retrieval Model}\\label{sec:retriever}\n\n\nThe \\emph{Persona Retrieval Model (PRM)} is responsible for addressing $\\bm{P}(p_g | \\mathcal{Q}, \\mathcal{P}, \\mathcal{P}_{Global})$, i.e., ranking all the candidate personas and picking the most suitable one for the agent. 
\nFirst, we manually prepare a collection of persona candidates $\\mathcal{P}_{Global}$\\footnote{\n    To avoid label leakage, \n    we make sure that none of the candidates were used for training our sentence-pair model and NLI model.}\nfrom the ConvAI2 dataset. As shown in \\figref{fig:overview}, the \\emph{PRM} ranks all the candidates based on their relevance to both the query and the predefined personas.\n\nWe employ a sentence-pair matching model to estimate the logical association between the query and a persona candidate. The score predicted by this binary classification model is the query-persona relevance, as stated in Equation \\ref{eq:relevance}.\n\\begin{equation}\\label{eq:relevance}\n\\begin{aligned}\nr & = \\mathrm{Related}(\\mathcal{Q}, p_g)\n\\end{aligned}\n\\end{equation}\n\n\\citet{song2020generating} find that NLI models can be used to calculate the coherence between response and query. Inspired by their work, we adopt a standard pretrained NLI model \\cite{gao2021adapting} to check textual entailment and conflict between the persona candidate and the predefined personas.\nWe take the maximum over the predefined personas to encourage the persona candidate $p_g$ to be close to the predefined personas $\\mathcal{P}$, as shown in Equation~\\ref{eq:entail}.\n\\begin{equation}\\label{eq:entail}\n\\begin{aligned}\ne & = \\mathrm{Entail}(\\mathcal{P}, p_g) \\\\\n& = \\max \\limits_{p_i \\in \\mathcal{P}}\\{\\mathrm{Entail}(p_i, p_g)\\}\n\\end{aligned}\n\\end{equation}\nwhere $\\mathrm{Entail}(\\cdot,\\cdot)$ is the entailment score predicted by our model. 
\nThe persona candidate $p_g$ should be penalized if it conflicts with any predefined persona in $\\mathcal{P}$, so we also take the maximum to calculate the conflict score for the persona candidate $p_g$ in Equation~\\ref{eq:conflict}.\n\\begin{equation}\\label{eq:conflict}\n\\begin{aligned}\nc & = \\mathrm{Conflict}(\\mathcal{P}, p_g) \\\\\n& = \\max \\limits_{p_i \\in \\mathcal{P}}\\{\\mathrm{Conflict}(p_i, p_g)\\}\n\\end{aligned}\n\\end{equation}\nwhere $\\mathrm{Conflict}(\\cdot,\\cdot)$ is the conflict score given by our model. \n\nWe propose two approaches to combine the scores $r,e,c$ as our ranking methods:\n\n\\begin{enumerate}\n\n\\item \\textbf{Heuristic Rules} (NLI\\textsubscript{HR}): We first retrieve the top-10 candidates from $\\mathcal{P}_{Global}$ with the highest $r$ scores; these persona candidates should be the most relevant to the query $\\mathcal{Q}$. Then, among the 3 candidates with the lowest $c$ scores, we take the persona with the highest $e$ score, to\nencourage both low conflict with and strong entailment by the predefined personas.\n\n\\item \\textbf{Weight Combination} (NLI\\textsubscript{WC}): We adopt three weights $\\alpha$, $\\beta$, and $\\gamma$ to construct a combined score $S=\\alpha\\cdot r + \\beta\\cdot (1-c) + \\gamma\\cdot e$. Then we sort the candidates by $S$ in descending order and take the first one as the result. In this paper we set $\\alpha =0.75, \\beta=0.25, \\gamma=0.10$.\n\n\\end{enumerate}\n\n\\subsection{Posterior-scored Transformer}\\label{sec:generator}\n\nThe dialogue generator we propose is a transformer-based model, shown in \\figref{fig:generator}. Following the champion model in the ConvAI2 competition~\\cite{dinan2020second}, we adopt OpenAI GPT~\\cite{radford2018improving} as our weight-shared encoder $\\mathrm{Encoder_{GPT}}(\\cdot)$ and decoder $\\mathrm{Decoder_{GPT}}(\\cdot)$. 
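As a rough illustration of the two PRM ranking strategies from Section~\ref{sec:retriever}, the following Python sketch assumes that the scores $r$, $e$, and $c$ of each candidate have already been computed by the sentence-pair and NLI models. The `Candidate` class and all function names are hypothetical and are not taken from the paper's implementation; only the top-10/top-3 cut-offs and the weights $\alpha=0.75$, $\beta=0.25$, $\gamma=0.10$ come from the text.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Candidate:
    """Hypothetical container for one persona candidate p_g and its scores."""
    text: str
    r: float  # query-persona relevance, Related(Q, p_g)
    e: float  # entailment score, max over predefined personas
    c: float  # conflict score, max over predefined personas


def max_over_personas(pair_scores: Sequence[float]) -> float:
    """Aggregate pairwise NLI scores (one per predefined persona) by maximum.

    Raises ValueError if the predefined persona set is empty.
    """
    return max(pair_scores)


def rank_heuristic(cands: Sequence[Candidate], top_r: int = 10, top_c: int = 3) -> Candidate:
    """Heuristic Rules: keep the top_r most relevant candidates, then the
    top_c of those with the lowest conflict, and return the one with the
    highest entailment score."""
    most_relevant = sorted(cands, key=lambda x: x.r, reverse=True)[:top_r]
    least_conflicting = sorted(most_relevant, key=lambda x: x.c)[:top_c]
    return max(least_conflicting, key=lambda x: x.e)


def rank_weighted(cands: Sequence[Candidate], alpha: float = 0.75,
                  beta: float = 0.25, gamma: float = 0.10) -> Candidate:
    """Weight Combination: return the candidate maximizing
    S = alpha * r + beta * (1 - c) + gamma * e."""
    return max(cands, key=lambda x: alpha * x.r + beta * (1.0 - x.c) + gamma * x.e)
```

Both functions return a single candidate, mirroring the text's choice of one extended persona $p_g$ per query. Returning the full sorted list instead would be a straightforward variation if several personas were to be added.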
\n\n\\subsubsection{\\textbf{Target-Guided Persona Scorer}}\\label{sec:tgp}\n\nLet $\\mathcal{Q}, p_i, \\mathcal{G}$ denote the query, the $\\mathrm{i^{th}}$ persona, and the ground truth (also known as the target response), respectively. As stated in Equation~\\ref{eq:encode}, we first adopt $\\mathrm{Encoder_{GPT}(\\cdot)}$ to turn the token-level embeddings into fixed-length representations at time step $t$.\n\\begin{equation}\\label{eq:encode}\n\\begin{aligned}\n\\mathbf{E}_{\\mathcal{Q}}, \\mathbf{H}_{\\mathcal{Q}}^{t} &= \\mathrm{Encoder_{GPT}}(\\mathcal{Q}) \\\\\n\\mathbf{E}_{\\mathcal{G}}, \\mathbf{H}_{\\mathcal{G}}^{t} &= \\mathrm{Encoder_{GPT}}(\\mathcal{G}) \\\\\n\\mathbf{E}_{p_i}, \\mathbf{H}_{p_i}^{t} &= \\mathrm{Encoder_{GPT}}(p_i), i = 1,2,...,n_p \n\\end{aligned}\n\\end{equation}\nwhere $\\mathbf{E}_*$ represents the time-independent sentence embeddings of each input after self-attention only, and $\\mathbf{H}_*^t$ denotes the hidden states of each input after interacting with the generated $\\mathcal{G}_{<t}$.\n\nMulti-head attention (denoted as $\\mathrm{MHA}$, \\cite{vaswani2017attention}) is used to compute the importance of the $\\mathrm{i^{th}}$ persona to either the query $\\mathcal{Q}$ or the ground truth $\\mathcal{G}$.\nFor each persona $p_i$ we calculate the attention $\\mathbf{A}_i^*$ in Equation~\\ref{eq:attn}.\n\n\\begin{equation}\\label{eq:attn}\n\\begin{aligned}\n\\mathbf{A}_i^{pri} &= \\mathrm{MHA}_{pri}(\\mathbf{Q}=\\mathbf{E}_{p_i}, \\mathbf{K}=\\mathbf{E}_{\\mathcal{Q}}, \\mathbf{V}=\\mathbf{E}_{\\mathcal{Q}})\\\\\n\\mathbf{A}_i^{post} &= \\mathrm{MHA}_{post}(\\mathbf{Q}=\\mathbf{E}_{p_i}, \\mathbf{K}=\\mathbf{E}_{\\mathcal{G}}, \\mathbf{V}=\\mathbf{E}_{\\mathcal{G}})\n\\end{aligned}\n\\end{equation}\n\nThe attention $\\mathbf{A}_i^*$ denotes the importance of each persona to the response. 
A two-layer feedforward perceptron (MLP) with a sigmoid activation is used to turn each attention $\\mathbf{A}_i^*$ into a single weight, as stated in Equation~\\ref{eq:mlp}.\n\\begin{equation}\\label{eq:mlp}\n\\begin{aligned}\nw_i^{*} = \\sigma (\\mathrm{MLP}(\\mathbf{A}_i^*)),\\ (* = post\\ \\mathrm{or}\\ pri)\n\\end{aligned}\n\\end{equation}\n\nThe binary cross-entropy (BCE) loss is adopted to train the weight $w_i^{post}$ against its label $w_i$:\n\\begin{equation}\\label{loss:bce}\n\\begin{aligned}\n\\mathcal{L}_1 = -[w_i \\log w_i^{post} + (1-w_i) \\log (1-w_i^{post})]\n\\end{aligned}\n\\end{equation}\n\nIn addition, the cosine embedding loss is used to pull the attentions of the prior and posterior networks together, as stated in Equation~\\ref{loss:cos}.\n\\begin{equation}\\label{loss:cos}\n\\begin{aligned}\n\\mathcal{L}_2=1-\\mathrm{cos}(\\mathbf{A}_i^{post},\\mathbf{A}_i^{pri})\n\\end{aligned}\n\\end{equation}\n\n\n\\subsubsection{\\textbf{Decoder for Weighted-sum Attentions}}\\label{sec:decode}\n\nFirst, the representation $\\mathbf{H}_{\\mathcal{P}}$ for the predefined persona set $\\mathcal{P}$ is aggregated from the $\\mathbf{H}_{p_i}$ in Equation~\\ref{eq:encode}, weighted by the $w_i^{post}$ given by Equation~\\ref{eq:mlp}.\n\n\\begin{equation}\\label{eq:aggp}\n\\begin{aligned}\n\\mathbf{H}_{\\mathcal{P}}^{t} = \\sum_{i=1}^{n_p} w_i^{post} \\cdot \\mathbf{H}_{p_i}^{t}\n\\end{aligned}\n\\end{equation}\n\nTo take both the query and the previously generated words into account,\nat each timestamp $t$ of decoding, the representations of the query, the personas, and the previously generated words are treated equally. 
The prediction of word $r_t$ is stated in Equation~\\ref{eq:decode}.\n\\begin{equation}\\label{eq:decode}\n\\begin{aligned}\n\\mathbf{H}_{dec}^t &= \\mathrm{mean}(\\mathbf{H}_\\mathcal{Q}^{t}, \\mathbf{H}_\\mathcal{P}^{t}, \\mathbf{H}_{\\mathcal{G}}^{t}) \\\\\nr_t &= \\mathrm{Decoder_{GPT}}(\\mathbf{H}_{dec}^t)\n\\end{aligned}\n\\end{equation}\nwhere $\\mathrm{mean(\\cdot)}$ denotes the element-wise average of the given matrices.\n\nIn essence, the \\emph{PS-Transformer} reads the persona set $\\mathcal{P}$ and the query ${\\mathcal{Q}}$ to predict the target response $\\mathcal{G}$. We therefore apply the negative log-likelihood loss during training.\n\\begin{equation}\n\\begin{aligned}\n\\mathcal{L}_3 &=-\\log \\left(\\bm{P}(\\mathcal{G}|\\mathcal{P}, \\mathcal{Q})\\right) \\\\\n&=-\\sum_{t=1}^{|\\mathcal{G}|} \\log (\\bm{P}(r_{t}|\\mathcal{P}, \\mathcal{Q}, \\mathcal{G}_{<t}))\n\\end{aligned}\n\\end{equation}\n\n\n\\subsubsection{\\textbf{Inference Network}}\\label{sec:infer}\nSimilar to Equation~\\ref{eq:aggp}, the predefined personas are soft-selected by weighted summation based on the $w_i^{pri}$ predicted in Equation~\\ref{eq:mlp}:\n\\begin{equation}\n\\begin{aligned}\n\\mathbf{H}_{\\mathcal{P}}^{t} = \\sum_{i=1}^{n_p} w_i^{pri} \\cdot \\mathbf{H}_{p_i}^{t}\n\\end{aligned}\n\\end{equation}\n\nDuring decoding, the response is generated autoregressively, as stated in Equation~\\ref{eq:infer}.\n\\begin{equation}\\label{eq:infer}\n\\begin{aligned}\n\\mathbf{H}_{dec}^t &= \\mathrm{mean}(\\mathbf{H}_\\mathcal{Q}^{t}, \\mathbf{H}_\\mathcal{P}^{t}, \\mathbf{H}_{\\mathcal{R}}^{t}) \\\\\nr_t &= \\mathrm{Decoder_{GPT}}(\\mathbf{H}_{dec}^t)\n\\end{aligned}\n\\end{equation}\n\n\n\\begin{table*}[!th]\n \\centering\n \\caption{Automatic evaluation on IT-ConvAI2. 
In this evaluation, we adopt NLI\\textsubscript{WC} in Section~\\ref{sec:retriever} as the \\emph{PRM}.}\n \\begin{tabular}{cccccccccccccc}\n \\toprule\n \\multirow{3}[6]{*}{\\textbf{Model}} & \\multirow{3}[6]{*}{\\textbf{Pretrained}} & & \\multicolumn{5}{c}{IT-ConvAI2} & & \\multicolumn{5}{c}{IT-ConvAI2 with \\emph{PRM}} \\\\\n\\cmidrule{4-8}\\cmidrule{10-14} & & & \\textbf{Consist} & & \\multicolumn{3}{c}{\\textbf{Quality}} & & \\textbf{Consist} & & \\multicolumn{3}{c}{\\textbf{Quality}} \\\\\n\\cmidrule{4-4}\\cmidrule{6-8}\\cmidrule{10-10}\\cmidrule{12-14} & & & Entail & & BLEU & ROUGE & CIDEr & & Entail & & BLEU & ROUGE & CIDEr \\\\\n \\midrule\n Seq2Seq & \\xmark & & 0.115 & & 5.62 & 1.71 & 8.77 & & 0.178 & & 5.69 & 1.71 & 9.06 \\\\\n PerCVAE & \\xmark & & 0.306 & & 2.26 & 0.93 & 4.46 & & 0.380 & & 2.27 & 0.96 & 4.22 \\\\\n DialogWAE & \\xmark & & 0.077 & & 4.13 & 1.12 & 5.81 & & 0.103 & & 3.84 & 1.09 & 5.27 \\\\\n \\midrule\n Transformer & \\cmark & & 0.539 & & 6.21 & 1.55 & 10.56 & & 0.495 & & 6.17 & 1.52 & 11.11 \\\\\n TransferTransfo & \\cmark & & 0.546 & & 5.12 & 1.34 & 13.23 & & 0.645 & & 5.18 & 1.36 & 12.85 \\\\\n BoB & \\cmark & & 0.505 & & 5.39 & 1.43 & 11.39 & & 0.628 & & 5.35 & 1.40 & 10.74 \\\\\n \\midrule\n \\textbf{PS-Transformer } & \\cmark & & \\textbf{0.560} & & \\textbf{7.12} & \\textbf{1.71} & \\textbf{14.43} & & \\textbf{0.670} & & \\textbf{7.35} & \\textbf{1.73} & \\textbf{15.88} \\\\\n \\bottomrule\n \\end{tabular}\n \\label{tab:result_inde}\n\\end{table*}\n\\subsection{Research Questions} \\label{sec:rq}\nTo fully demonstrate the superiority of our method, we conduct experiments to verify the following six research questions (RQ): \n\\begin{itemize}\n \\item (\\textbf{RQ1}): Can our proposed pipeline, consisting of \\emph{PRM} and \\emph{PS-Transformer}, yield good results on automatic metrics in response to OOP queries? 
(See Section~\\ref{sec:rq1})\n \\item (\\textbf{RQ2}): Can our proposed \\emph{PRM} actually solve the OOP problem to some extent? Will the quality of the response generated by \\emph{PS-Transformer} be better if the OOP problem is solved better? (See Section~\\ref{sec:rq2})\n \\item (\\textbf{RQ3}): Can our proposed \\emph{PS-Transformer} more accurately select the personality used to generate the response, compared to other baselines? (See Section~\\ref{sec:rq3})\n \\item (\\textbf{RQ4}): What is the impact of the key components in the \\emph{PS-Transformer} on performance? (See Section~\\ref{sec:rq4})\n \\item (\\textbf{RQ5}): Can \\emph{PS-Transformer}'s performance on IT-ConvAI2 be generalized to the original ConvAI2? (See Section~\\ref{sec:rq5})\n \\item (\\textbf{RQ6}): How do the responses of our method differ from those of the baselines? (See Section~\\ref{sec:rq6})\n\\end{itemize}\n\n\n\n\n\\subsection{Datasets}\\label{sec:dataset}\n\n\\textbf{ConvAI2}\\footnote{ConvAI2 is available at \\url{https://github.com/facebookresearch/ParlAI/tree/master/parlai/tasks/convai2}.} was released for the second Conversational Intelligence Challenge~\\cite{dinan2020second}, and each of the two speakers in a conversation is assigned at least five persona descriptions. The dataset contains 17,878/1,000 multi-turn dialogues conditioned on 1,155/100 personas for train/test. \n\n\\textbf{Inadequate-Tiny-ConvAI2 (IT-ConvAI2)} \nSince ConvAI2 encourages conversation participants to exchange their persona information, speakers tend to express their personas actively without being asked, resulting in fewer OOP queries than in actual practice. To obtain a realistic evaluation of persona-missing conversation, we build IT-ConvAI2 in two steps: \n(1) We first extract queries asking for persona and responses related to personas, respectively. 
If the extracted query and response are present in a conversation triad $(query,response,personas)$, we will collect them into Tiny-ConvAI2.\n(2) To build IT-ConvAI2, for each conversation in Tiny-ConvAI2, those personas involved in response will be removed.\nAs a result, we manually collect 1,595 conversations as IT-ConvAI2.\n\n\n\n\n\\subsection{Baseline Methods}\n\nWe compared our proposed approach with the following strong models:\n\n\\begin{itemize}\n \\item Generative Based: \n \\textbf{Seq2Seq}~\\cite{zhang2018personalizing} is a traditional LSTM-based encoder-decoder model prepending all personas to the query. \n \\textbf{PerCVAE}~\\cite{song2019exploiting} further incorporates personas with contexts by a memory network.\n \\textbf{DialogWAE}~\\cite{gu2019dialogwae} contains a conditional Wasserstein Auto-Encoder, and we adapt it to personalized dialogue generation by concatenating personas with the query directly.\n \\item Pre-training \\& Fine-tuning Based: \n \\textbf{Transformer}~\\cite{dinan2020second} achieves state-of-the-art performance in the manual metrics of the ConvAI2 competition while \\textbf{TransferTransfo}~\\cite{wolf2019transfertransfo} tops automatic evaluations. \n \\textbf{BERT-Over-BERT (BoB)}~\\cite{song2021bob} contains two decoders pretrained on NLI task. 
It is good at generating responses entailed by personas.\n\\end{itemize}\n\n\n\n\n\\subsection{Evaluation Metrics} \\label{sec:metrics}\n\n\\subsubsection{Automatic Evaluation.} \nTo highlight the quality of generation on both personality and contextual aspects, \nwe evaluate each response on two aspects:\n\\begin{itemize}\n\\item \\textbf{Consistency}: \nFollowing \\citet{dziri2019evaluating}, we employ ESIM\\footnote{We use an NLI model different from the one in \\emph{PRM} for a fair evaluation.}~\\cite{chen2017enhanced} to automatically evaluate the \\textbf{entailment score} between the generated response $\\mathcal{R}$ and the agent's personas $\\mathcal{P} = \\{p_1,p_2,...,p_n\\}$:\n\\begin{equation}\n\\begin{aligned}\ne' & = \\mathrm{Entail}'(\\mathcal{P}, \\mathcal{R}) \\\\\n& = \\max\\limits_{p_i \\in \\mathcal{P}}\\{\\mathrm{Entail}'(p_i, \\mathcal{R})\\}\n\\end{aligned}\n\\end{equation}\n\\item \\textbf{Quality}: \n\\textbf{BLEU}~\\cite{papineni2002bleu} and \\textbf{ROUGE}~\\cite{lin2004rouge} are used to measure the relevance between the ground truth and the generated response. We also employ \\textbf{CIDEr}~\\cite{vedantam2015cider} to capture the overlap of persona information between the machine response and the human reference.\n\\end{itemize}\n\n\\subsubsection{Human Evaluation for PRM} \nThree master's students working in the field of dialogue were asked to evaluate each \\emph{PRM} according to three metrics:\n\\begin{itemize}\n\\item \\textbf{Query-relevance ${S_{q}^p}$ (0-1)}: Indicates whether the retrieved persona is related to the query, using a 1/0 scoring scheme.\n\\item \\textbf{Persona-entailment ${S_{p}^p}$ (0-2)}: Scores the entailment between the retrieved persona and the query. 
0 means conflict, 1 means neutral and 2 means entailment.\n\\item \\textbf{DCG@3}: We collect the top three retrieved results for each method and calculate the DCG@3 in Equation~\\ref{eq:dcg}.\n\\begin{equation}\\label{eq:dcg}\n\\mathrm{DCG}_{3}=\\sum_{i=1}^{3} \\frac{2^{rel_{i}}-1}{\\log _{2}(i+1)}\n\\end{equation}\nwhere $rel_{i} = \\bm{S_{p}^p}$ if the retrieved persona is related to the query, otherwise $rel_{i} = 0$.\n\\end{itemize}\n\n\\subsubsection{Human Evaluation for PS-Transformer.} \nThree judges are asked to evaluate Query-relevance $S_{q}$ (1-3), Persona-entailment $S_{p}$ (1-3) and Response-fluency $S_{r}$ (1-3) of generated responses:\n\\begin{itemize} \n \\item For \\textbf{Query-relevance $S_{q}$}, 1 point means that the response is irrelevant with the query. 2 point means that the response is relevant with query, but is the general response. 3 means that the response perfectly answers the query. \n \\item \\textbf{Persona-entailment $S_{p}$} measures whether the response is entailed with predefined personas. 1 means the response doesn't contain any persona. 2 means the response contains persona but not in predefined persona set. 3 means the response contains predefined persona. \n \\item \\textbf{Response-fluency $S_{r}$} is used to evaluate the syntactic and logical fluency of the response. The higher the score, the better the performance. 3 point means that the response is both grammatically and logically correct.\n\\end{itemize}\n\n\n\\subsection{Implementation Details}\\label{sec:training}\n\\begin{itemize}\n\\item The sentence-pair classifier and the NLI scorer of \\emph{PRM} are both BERT-based models. We manually annotate one thousand related $(query, persona)$ pairs for training the sentence pair classifier. 
NLI scorer is pretrained on both SNLI\\footnote{The SNLI is available at \\url{https://nlp.stanford.edu/projects/snli}.} and MultiNLI\\footnote{The MultiNLI is available at \\url{https://cims.nyu.edu/~sbowman/multinli}.}\\cite{williams2017broad}, then is finetuned on DNLI\\footnote{The DNLI is available at \\url{https://wellecks.github.io/dialogue_nli}.}.\n\n\\item To train the \\emph{Target-Guided Persona Scorer}, we follow \\citet{song2019exploiting} labelling each response with its corresponding persona by inverse document frequency. The response has a tf-idf similarity with each persona, and we label each $(response, persona)$ pair with 1/0 according to whether the similarity is higher than a threshold.\n\n\\item We employ OpenAI's GPT~\\cite{radford2018improving} to initialize \\emph{Transformer}, \\emph{TransferTransfo} and our \\emph{PS-Transformer}. \n\\end{itemize}\n\\label{sec:result}\n\n\\subsection{Automatic Evaluations (RQ1)}\\label{sec:rq1}\n\nAs shown in \\tabref{tab:result_inde}, we evaluate all the methods on \\emph{IT-ConvAI2}, and \\emph{IT-ConvAI2 with PRM}, respectively, and we have the following observations:\n\n\\textbf{Our pipeline method has the best overall performance in response to OOP queries.} \\emph{PS-Transformer} outperforms all baselines regardless of whether our proposed \\emph{PRM} is applied to baselines or not. In particular, for \\emph{PS-Transformer}, the personality of \\emph{PRM} retrieval brings a significant improvement to the entailment of the response. 
This result shows that, from the perspective of the generative model, the \\emph{PRM} retrieval results are well suited to responding to OOP queries.\n\n\\textbf{The \\emph{Persona Retrieval Model (PRM)} helps almost all methods generate personality-consistent responses.} \nAll methods except \\emph{Transformer} reach a higher entailment score after being given a new persona by \\emph{PRM}.\nThus, we can draw the general conclusion that retrieving a suitable persona using our proposed \\emph{PRM} in response to an OOP query can help the vast majority of generators to produce a personality-consistent response. It also helps to reduce the risk of fabricating a random persona and generating personality-conflicting responses. In addition, only \\emph{PS-Transformer}, when combined with \\emph{PRM}, shows a significant improvement not only in entailment but also in response generation quality, which implies that \\emph{PS-Transformer} is better than the baselines at selecting and utilizing personality information.\n\n\n\n\n\n\\subsection{Human Evaluations for \\emph{PRM}s (RQ2)}\\label{sec:rq2}\n\n\n\\begin{table}[!t]\n \\centering\n\\captionsetup{justification=justified}\n\\caption{The left part shows the human evaluation of \\emph{PRM}s. The Fleiss' kappa values of $S_q^p, S_p^p$, DCG@3 are 0.62, 0.49, and 0.57, respectively, indicating \\emph{Substantial}, \\emph{Moderate}, and \\emph{Moderate agreement}. The maximum values of $S_q^p$ and $S_p^p$ are 1 and 2, respectively. 
The right shows automatic evaluations for generated responses based on different \\emph{PRM}s.}\n \\setlength{\\tabcolsep}{4pt}{\n \n \\begin{tabular}{cccccccc}\n \\toprule\n \\multirow{2}[4]{*}{\\textbf{Model}} & \\multicolumn{3}{c}{\\textbf{Retrieval Quality}} & & \\multicolumn{3}{c}{\\textbf{Response Quality}} \\\\\n\\cmidrule{2-4}\\cmidrule{6-8} & $S_q^p$ & $S_p^p$ & DCG@3 & & Entail & Conflict & BLEU \\\\\n \\midrule\n BM25 & 0.35 & 0.86 & 0.77 & & 0.592 & 0.237 & 5.65 \\\\\n \\midrule\n Classify\\textsubscript{SP} & 0.56 & 0.87 & 1.01 & & 0.650 & 0.304 & 7.16 \\\\\n \\midrule\n NLI\\textsubscript{HR} & 0.55 & 0.90 & 1.15 & & 0.643 & 0.241 & 7.35 \\\\\n \\textbf{NLI\\textsubscript{WC}} & \\textbf{0.59} & \\textbf{0.97} & \\textbf{1.33} & & \\textbf{0.670} & \\textbf{0.214} & \\textbf{7.35} \\\\\n \\bottomrule\n \\end{tabular}\n }\n \\label{tab:retriever}\n\\end{table}\n\n\n\nTo demonstrate the significance of our retrieval methods in Section~\\ref{sec:retriever}, we prepare some \\emph{PRM}s based on other retrieval methods for an ablation-like experiment:\n(1) \\textbf{BM25} algorithm is used to retrieve the persona most similar to the query in lexical.\n(2) \\textbf{Sentence-pair Classifier (Classify\\textsubscript{SP})} only adopts the $r$ score in Equation~\\ref{eq:relevance} to rank persona candidates.\nWe employ judges to evaluate a random sample of 150 items per \\emph{PRM} according to metrics mentioned in Section~\\ref{sec:metrics}.\nWe further adopt \\emph{PS-Transformer} to generate responses based on extended personas from different \\emph{Persona Retrieval Model}s, and we apply automatic evaluations on those responses.\n\n\\newcommand{\\myprm}{\\emph{NLI\\textsubscript{WC}}}\n\n\\textbf{Our proposed \\emph{PRM} can effectively solve the OOP problem, and the NLI contributes to improving retrieval performance.}\nAs retrieval quality is shown in \\tabref{tab:retriever}, 59\\\nSpecifically, {\\myprm} significantly outperforms \\emph{BM25} in terms of 
$S^p_q$ score, indicating that it is effective to consider the semantic relevance between query and retrieved persona. In addition, {\\myprm} significantly outperforms \\emph{Classify\\textsubscript{SP}} in terms of $S^p_p$ score and DCG@3 score, which indicates that natural language inference can effectively reduce the possibility of conflict between retrieved persona and predefined personas, thus improving the final persona retrieval performance.\n\n\\textbf{The better the OOP Problem is solved, the higher the response quality of our proposed pipeline method.}\nAs response quality is shown in \\tabref{tab:retriever},\n\\emph{BM25} retrieves persona considering text similarity only, which makes the retrieved persona weakly correlated with query and even becomes noise during generation. Therefore, \\emph{BM25} produces less improvement in Entail score than other \\emph{PRM}s and reduces BLEU score that reflects generative performance.\nSince \\emph{Classify\\textsubscript{SP}} ignores the relevance between retrieved persona and predefined personas, the retrieved persona may conflict with predefined personas. In such a case, no matter which persona the generative model selects, the response will conflict with the existing persona set, resulting in a higher Conflict score. 
\nCompared to \\emph{Classify\\textsubscript{SP}}, NLI-based \\emph{PRM}s (\\emph{NLI\\textsubscript{HR}} and \\emph{NLI\\textsubscript{WC}}) reduce the risk of personality conflicts by considering the NLI relevance from retrieval candidates to predefined personas. Responses generated with such \\emph{PRM}s also achieve higher BLEU scores than those generated with the other \\emph{PRM}s.\nThe results show that \\emph{NLI\\textsubscript{WC}} outperforms all other \\emph{PRM}s in persona retrieval and is most helpful in improving the Entail score and reducing the Conflict score of generated responses.\n\n\n\n\n\n\n\\subsection{Human Evaluations for Generative Models (RQ3)}\\label{sec:humaneval}\\label{sec:rq3}\n\n\n\\begin{table}[t]\n \\centering\n \\captionsetup{justification=justified}\n \\caption{Human evaluation of all the generative methods. The Fleiss' kappa values of $S_{q}, S_{p}, S_{r}$ are 0.56, 0.70, and 0.42, respectively, indicating \\emph{Moderate}, \\emph{Substantial} and \\emph{Moderate agreement}. }\n \\begin{tabular}{cccc}\n \\toprule\n \\textbf{Model} & $S_q$ & $S_p$ & $S_r$ \\\\\n \\midrule\n Seq2Seq & 1.40 & 1.75 & 2.59 \\\\\n PerCVAE & 1.37 & 1.70 & 2.54 \\\\\n DialogWAE & 1.25 & 1.24 & 2.32 \\\\\n \\midrule\n Transformer & 1.96 & 1.93 & 2.93 \\\\\n TransferTransfo & 2.12 & 2.40 & 2.90 \\\\\n BoB & 2.16 & 2.39 & 2.89 \\\\\n \\midrule\n \\textbf{PS-Transformer} & \\textbf{2.24} & \\textbf{2.44} & \\textbf{2.95} \\\\\n \\bottomrule\n \\end{tabular}\n \\label{tab:overallscore}\n\\end{table}\n\n\n\nWe randomly sample 150 predictions from the column \\emph{IT-ConvAI2 with PRM} in \\tabref{tab:result_inde} and invite three graduate students to evaluate them. 
The human evaluation results are shown in \\tabref{tab:overallscore}, and \\figref{fig:spe} shows the detailed compositions of $S_{q}$ and $S_{p}$ scores.\n\n\\newcommand{\\mygen}{\\emph{PS-Transformer}}\n\n\\textbf{The \\emph{PS-Transformer} outperforms baselines by generating persona-targeted responses.}\nAs the detailed composition of $S_{p}$ is shown in \\figref{fig:spe}, the generated results of {\\mygen} have a higher probability of containing predefined personas than all baselines. Compared to \\emph{TransferTransfo} and \\emph{BoB}, {\\mygen} has a lower probability of fabricating persona. This is because {\\mygen} determines which personas should be used before generation, so those selected personas are more likely to be reflected in the response. \nIn addition, as the detailed composition of $S_{q}$ is shown in \\figref{fig:spe}, responses generated by {\\mygen} are the most consistent with queries because the personas selected by {\\mygen} in advance are strongly correlated with queries. 
Thus the responses generated based on the selected personas strongly correlate with the context.\nNot only do the results demonstrate that the \\emph{Target-Guided Persona Scorer} plays a vital role in accurately selecting persona to generate context-coherence responses,\nbut they are also consistent with the automatic evaluation result that \\emph{PS-Transformer} significantly outperforms other methods in both personality coherence and generating quality.\n\n\n\n\n\n\n\\subsection{Ablation Study (RQ4)}\\label{sec:rq4}\n\n\n\\begin{table}[t]\n \\centering\n \\caption{Ablation study on \\emph{IT-ConvAI2 with PRM}.}\n \n \\setlength{\\tabcolsep}{3pt}{\n \\begin{tabular}{lccccc}\n \\toprule\n \\multicolumn{1}{c}{\\multirow{2}[4]{*}{\\textbf{Settings}}} & \\textbf{Consist} & & \\multicolumn{3}{c}{\\textbf{Quality}} \\\\\n\\cmidrule{2-2}\\cmidrule{4-6} & Entail & & BLEU & ROUGE & CIDEr \\\\\n \\midrule\n \\textbf{PS-Transformer} & \\textbf{0.670} & & \\textbf{7.35} & \\textbf{1.73} & \\textbf{15.88} \\\\\n \\midrule\n - w/o Posterior Network & 0.660 & & 6.71 & 1.64 & 14.67 \\\\\n - w/o Scorer & 0.356 & & 3.54 & 1.02 & 10.05 \\\\\n \\bottomrule\n \\end{tabular}\n }\n \\label{tab:ablation}\n\\end{table}\n\nAs reported in \\tabref{tab:ablation}, we designed and evaluated two variants of \\emph{PS-Transformer}: \n(1) We first remove the posterior network (Eq.~\\ref{loss:cos}) by directly training the model with prior attention $\\mathbf{A}_i^{pri}$. It means we drop the actual personas used in real responses modeled by posterior distribution. It results in deteriorated performance, indicating the importance of the guidance from posterior information.\n(2) We remove the entire scoring mechanism (Eq.~\\ref{loss:bce}) by treating all personas equally while generating. 
The significant decrements of all metrics indicate that considering the relevance of personas to query and accurately selecting personas plays an important role in generating personality-consistent and high-quality responses.\n\n\n\n\n\n\n\\subsection{Effectiveness of \\emph{PS-Transformer} on ConvAI2 (RQ5)}\\label{sec:rq5}\n\n\n\\begin{table}[!t]\n \\centering\n \\caption{Automatic evaluation on original ConvAI2.}\n \n \\begin{tabular}{cccccc}\n \\toprule\n \\multirow{2}[4]{*}{\\textbf{Model}} & \\textbf{Consist} & & \\multicolumn{3}{c}{\\textbf{Quality}} \\\\\n\\cmidrule{2-2}\\cmidrule{4-6} & Entail & & BLEU & ROUGE & CIDEr \\\\\n \\midrule\n Seq2Seq & 0.092 & & 5.12 & 1.43 & 9.41 \\\\\n PerCVAE & 0.287 & & 2.44 & 0.91 & 5.49 \\\\\n DialogWAE & 0.047 & & 3.71 & 1.07 & 5.68 \\\\\n \\midrule\n Transformer & 0.544 & & 5.47 & 1.37 & 10.87 \\\\\n TransferTransfo & 0.508 & & 4.70 & 1.13 & 13.74 \\\\\n BoB & 0.499 & & 4.33 & 1.10 & 9.81 \\\\\n \\midrule\n \\textbf{PS-Transformer } & \\textbf{0.546} & & \\textbf{6.58} & \\textbf{1.49} & \\textbf{14.57} \\\\\n \\bottomrule\n \\end{tabular}\n \\label{tab:result_ori}\n \n\\end{table}\n\n\n\\begin{table*}[!htbp]\n \\centering\n \\caption{Case Study. Responses in red are consistent with agent's personas, and in blue are in conflict with them.}\n \\begin{tabular}{cp{4.5cm}p{4.5cm}p{4.5cm}}\n \\toprule\n \\textbf{Query} & Hello, I'm a art major in college, what do you do? & I am getting in spring, you married? & What grade are you in? \\\\\n \\midrule\n \\multirow{3}[2]{*}{\\textbf{Predefined Personas}} & I enjoy poetry. & I was poor growing up. & I play with my friends on the playground. \\\\\n \\multicolumn{1}{c}{} & I am a huge star wars fan. & Cheeseburgers are my favorite food. & Mickey mouse is my favorite character. \\\\\n \\multicolumn{1}{c}{} & I played football for a division of a college. & I like watching war documentaries. & I love to go to disney world every year. 
\\\\\n \\midrule\n \\textbf{Extended Persona} & I am studying to be a computer programmer. & I'm married. & I am in the third grade. \\\\\n \\midrule\n Seq2Seq & I am a scientist. I love to write poetry. & No, but i do like watching the flintstones. you? & I am in \\textcolor[rgb]{0.2, 0.1, 0.8}{the first grade}. how about you ? \\\\\n PerCVAE & Who r you on there? & Oh no i live at a beach. & I love to sleep in. \\\\\n DialogWAE & Do you want to go to the beach? & Wow I love to travel. i love it. & Do have any pets? \\\\\n Transformer & \\textcolor[rgb]{0.2, 0.1, 0.8}{I am a} \\textcolor[rgb]{0.9, 0.3, 0.5}{computer science} major at hs. & I am \\textcolor[rgb]{0.2, 0.1, 0.8}{getting married in 6 months}. & I am in my \\textcolor[rgb]{0.2, 0.1, 0.8}{last year of high school}. \\\\\n TransferTransfo & \\textcolor[rgb]{0.2, 0.1, 0.8}{I am a} \\textcolor[rgb]{0.9, 0.3, 0.5}{programmer}. What do you do? & Yes, \\textcolor[rgb]{0.9, 0.3, 0.5}{I am married} to my wife. How about you? & My grade is \\textcolor[rgb]{0.2, 0.1, 0.8}{5th grade} and you? \\\\\n BoB & \\textcolor[rgb]{0.2, 0.1, 0.8}{I'm a} \\textcolor[rgb]{0.9, 0.3, 0.5}{computer programmer} in college. & No, \\textcolor[rgb]{0.2, 0.1, 0.8}{I am not married}. I am a single mom. & I am in \\textcolor[rgb]{0.2, 0.1, 0.8}{10th grade}. I love it. \\\\\n \\midrule\n \\textbf{PS-Transformer} & I am a student, but I am \\textcolor[rgb]{0.9, 0.3, 0.5}{hoping to be a computer science major}. & Yes, \\textcolor[rgb]{0.9, 0.3, 0.5}{I'm married} to my wonderful husband. & I am in \\textcolor[rgb]{0.9, 0.3, 0.5}{third grade}. \\\\\n \\bottomrule\n \\end{tabular}\n \\label{tab:casestudy}\n\\end{table*}\n\nAs stated in \\tabref{tab:result_ori}, the performance of all the methods on ConvAI2 is consistent with those on IT-ConvAI2 (in \\tabref{tab:result_inde}). 
\nCompared to IT-ConvAI2, a large part of the conversations in ConvAI2 do not even need personas to be answered, but our proposed {\\mygen} still outperforms all other baselines. The \\emph{Target-Guided Persona Scorer} not only selects personas related to the query, but also excludes irrelevant personas as noise, avoiding the forced use of personas when generating responses. \n\n\n\n\n\\subsection{Case Study (RQ6)}\\label{sec:rq6}\n\n\nIn this section, we present an in-depth analysis of the responses generated by our proposed approach at the level of personality consistency. As shown in \\tabref{tab:casestudy}, we prepare three cases generated by different models to illustrate the advantages of our approach in personalized dialogue generation.\n\n\\textbf{For the first case}: The results suggest that the response generated by our approach is more consistent with personas. \nFor instance, the response ``I am a student, but I am hoping to be a computer science major.'' preserves the persona ``to be a programmer'', whereas other methods focus on ``programmer'' only.\n\n\n\\textbf{For the second case}: The persona retrieved by \\emph{PRM} is proper for the query. The responses generated by \\emph{TransferTransfo} and \\emph{PS-Transformer} are coherent at both personality and semantic levels, whereas other methods give wrong or irrelevant answers. It should be noted that although we fix the agent's persona as ``married'', it is still possible for agents to fabricate personas about ``gender'', which is a potential problem for further research.\n\n\n\\textbf{For the third case}: The persona retrieved by \\emph{PRM} is related to the query and strongly entails all the predefined personas. 
Though it is hard to accurately exploit a persona with numeric information such as ``third grade'', \\emph{PS-Transformer} still generates a response that leverages the proper persona, whereas the others give wrong answers.\n\n\n\n\\subsection{Limitations} \\label{sec:limit}\n\nA major limitation of our proposed pipeline is that the global persona set used by \\emph{PRM} is constructed in advance, so the pipeline still cannot handle OOP queries that fall outside the entire global persona set. \nA potential solution is to introduce a large-scale commonsense knowledge graph (e.g., ConceptNet~\\cite{speer2017conceptnet}) to infer new personas; the utilization of knowledge graphs is left as a direction for future research.\n\nIn this paper, we propose to tackle the OOP problem in personalized dialogue generation. To this end, we formally define the persona extending task and demonstrate that Natural Language Inference can help \\emph{PRM} retrieve a coherent persona for generating the response. \nBesides, the \\emph{PS-Transformer} introduces a posterior network named \\emph{Target-Guided Persona Scorer} that \nhelps select personas accurately, which in turn helps generate personality-consistent responses.\nFor future work, we will explore how the extended persona affects the next extension to generalize the \\emph{retrieval-to-prediction} paradigm over multi-turn conversations.\n\nThis work was supported in part by the National Natural Science Foundation of China under Grant No.61602197, Grant No.L1924068, Grant No.61772076, in part by CCF-AFSG Research Fund under Grant No.RF20210005, and in part by the fund of Joint Laboratory of HUST and Pingan Property \\& Casualty Research (HPL). \n\n\\bibliographystyle{ACM-Reference-Format}\n\\balance\n\\bibliography{cikm22}", "images": []}